Adds example ES|QL Knowledge Base integration with static data #9007
Thanks for creating this package, this clarifies some things to me.
What I see clear is that we need:
- Support to install static documents.
- Support for `rank_features`.
What is still not clear to me is how this interacts with other data, mainly:
- Is this used to work with data in other indexes or data streams?
- Does it use elastic-agent?
If these packages only manage one data stream, and the data is not collected by agents, I would say that we need a new specific package type.
packages/esql_knowledge_base/data_stream/esql_knowledge_base/agent/stream/stream.yml.hbs
  type: keyword
  description: Model used to generate the vector
- name: tokens
  type: rank_features
I guess we need to add support for this type.
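For context, a minimal sketch of what a `rank_features` field holds, assuming the `tokens` field from the diff above. In Elasticsearch this field type stores a flat map of feature names to strictly positive numbers (with the default settings). The sample document and the `validate_tokens` helper are hypothetical and only illustrate the shape of the data.

```python
# Mapping fragment for the `tokens` field shown in the diff above.
MAPPING = {
    "properties": {
        "tokens": {"type": "rank_features"},
    }
}


def validate_tokens(tokens):
    """Return True if `tokens` is a flat map of feature name -> strictly positive number."""
    if not isinstance(tokens, dict):
        return False
    return all(
        isinstance(name, str)
        and isinstance(weight, (int, float))
        and not isinstance(weight, bool)
        and weight > 0
        for name, weight in tokens.items()
    )


# Hypothetical static document: the token weights are made-up values.
doc = {"tokens": {"esql": 1.7, "query": 0.9}}
```

A check like this could run when static documents are validated, before they are installed.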
* "Generate an ES|QL query for the top 10 countries with the most sales"
* "Generate an ES|QL query for my most recent open security detection alerts of high risk"
Is the data queried supposed to be in data streams managed by this package, or in other data streams or indexes?
The assistant-generated query would query other data streams/indices that are not managed by this package. It can get a bit confusing, but the flow is as follows:
- User prompts assistant to generate ES|QL query for fetching some data in their cluster as in examples above
- Assistant identifies user wants to generate an ES|QL query, and so calls the custom ES|QL Query Generation Tool
- ES|QL Query Generation Tool does a vector search against this package's data streams for relevant documents to aid in the generation of the query
- Assistant uses documents as context and returns an ES|QL query to the user
- User executes ES|QL query (which should be querying whatever index the model inferred from the original user request)
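The retrieval step in the flow above can be sketched roughly as follows. This is an illustrative in-memory stand-in, not the assistant's actual tool: the `KNOWLEDGE_BASE` contents, the toy three-dimensional vectors, and the `retrieve` and `build_prompt` helpers are all hypothetical. In practice the vector search would run against the package's data streams.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Stand-in for the static documents installed by the package (made-up content).
KNOWLEDGE_BASE = [
    {"text": "STATS ... BY groups rows and computes aggregates", "vector": [1.0, 0.0, 0.0]},
    {"text": "SORT orders rows; LIMIT caps the row count", "vector": [0.0, 1.0, 0.0]},
    {"text": "WHERE filters rows by a condition", "vector": [0.0, 0.0, 1.0]},
]


def retrieve(query_vector, k=2):
    """Return the k knowledge base documents most similar to the query vector."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda d: cosine(query_vector, d["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(user_request, query_vector, k=2):
    """Combine the retrieved documents and the user's request into one prompt string."""
    context = "\n".join(f"- {d['text']}" for d in retrieve(query_vector, k))
    return f"Context:\n{context}\n\nRequest: {user_request}"
```

The generated query itself then targets whatever index the model infers from the request; the knowledge base documents only serve as context.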
I'm not sure I understand the implications here, but generally speaking the static data in these packages is a standalone/independent resource to be referenced by the assistant when it needs to perform a specific task. Perhaps the ES|QL query generation example muddied the waters a bit here, but a different example would be a Knowledge Base package containing embeddings for the entirety of a book or of transcripts from a podcast. On initialization the assistant would fetch existing KB indices/packages, parse the package description for what it contains/provides, then register a tool that says 'query these data streams when asked about this topic'. So by just installing the
No immediate need for an agent configuration to append to the static data, but I definitely could see future use cases where you might want to keep adding data to these knowledge base indices. Or put differently, I could see use cases for existing integrations to include static data like this. E.g. what if the
From an architectural perspective, are you thinking of a separate package that is referenced by the 'main/ingest' package, as you outlined in elastic/package-spec#351? As a solutions dev, and from the user's perspective, I currently prefer having this functionality in the same integration, for the capability and flexibility mentioned above. I'm just coming back up to speed with the package-spec and surrounding infrastructure, so I totally understand if that goes against some explicit separation of concerns that we're trying to keep here...and in that case don't mind me 😅.
💔 Build Failed
cc @spong
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as |
Hoping to pick this back up sometime next week. Will open corresponding |
I've had to re-focus on some immediate items for |
Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution! |
Important
This is a work-in-progress example integration for proving out elastic/package-spec#693. Contents subject to change.
Proposed commit message
This is an example integration in support of the elastic/package-spec#693 change proposal for creating a 'Knowledge Base' integration that provides both data streams, and corresponding static content to be loaded into those data streams.
After discussion with @jsoriano, we have moved the content/mappings from the `data_stream` directory to a new `knowledge_base` directory, which contains any number of directories for the knowledge bases you want to install (similar to `data_stream`'s). In those directories are both the fields and the static documents (as a JSON array in a `*.json` file) within a `documents` folder. See spec to update: package-spec/spec/integration/data_stream/spec.yml
Example structure:
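As a rough illustration of the layout described above, here is a minimal sketch of how an installer might walk it. It assumes a `knowledge_base/<name>/documents/*.json` layout where each file holds a JSON array. The `iter_documents` helper is hypothetical and is not part of package-spec or Fleet.

```python
import json
from pathlib import Path


def iter_documents(package_root):
    """Yield (knowledge_base_name, document) pairs from knowledge_base/*/documents/*.json."""
    kb_root = Path(package_root) / "knowledge_base"
    for kb_dir in sorted(p for p in kb_root.iterdir() if p.is_dir()):
        for doc_file in sorted((kb_dir / "documents").glob("*.json")):
            docs = json.loads(doc_file.read_text(encoding="utf-8"))
            # Each file is expected to hold a JSON array of static documents.
            if not isinstance(docs, list):
                raise ValueError(f"{doc_file} must contain a JSON array")
            for doc in docs:
                yield kb_dir.name, doc
```

Each yielded pair could then be turned into an index request against the data stream or index backing that knowledge base.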
Checklist
`changelog.yml` file.
Author's Checklist
How to test this PR locally
Related issues
elastic/package-spec#693
Screenshots