-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Assistant] Automatically Install Knowledge Base #182763
Conversation
…ipeline and esql docs via KB data client
…atus checks when FF is enabled
// Do not pre-gen _id for bulk create operations to avoid `version_conflict_engine_exception` | ||
// TODO: Breaks Attack Discovery anonymization-field creation :( | ||
{ create: { _index: this.options.index } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting ES generate ID's ended up breaking Attack Discovery as it validated anonymizationFields.id
to be UUID.
After speaking with @YulNaumenko, we decided to relax this constraint, so it and all other ID's that ES may have a chance at generating have been updated to be UUID or NonEmptyString in this commit: ddf93a8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, as mentioned in the source comment above, we removed the explicit id generation _id: uuid.v4()
as this was causing version_conflict_engine_exception
errors during the bulk document load of KB entries.
Incidentally, all the documents were actually loaded without issue, however we couldn't determine why the errors were thrown (which ultimately mess with application retry/error logic), so opted for having ES generate the id's. I have reached out to the ES folks (internal slack) to see if perhaps this is related to the ELSER ingest_pipeline as none of the other DataClient assets like anonymization-fields have this issue when bulk creating.
return { | ||
'@timestamp': createdAt, | ||
created_at: createdAt, | ||
created_by: user.profile_uid ?? 'unknown', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is unknown
ever a valid profile_uid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, just using it as a sentinel value at the moment as the authenticatedUser
can be null on the kbDataClient (if for some reason it was not instantiated from a request, or an unauthenticated request).
@YulNaumenko has started putting auth checks on routes (like here in post_actions_execute), so I think either that or individual checks in the kbDataClient methods that require an auth'd user as you mentioned below would be good here as we discussed. Or perhaps we can make authenticatedUser
mandatory on the client and just return a null client (which it can be now) if the authc.getCurrentUser() call returns null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user is mandatory, how we will make "shared"/public knowledge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. We could use a skeleton 'system authenticatedUser', but I'm thinking it might be better to leave optional and do null checks as needed. I'll see what this looks like when I get closer to fleshing out the RBAC portions of the API. Thanks Yuliia!
...lastic_assistant/server/ai_assistant_data_clients/knowledge_base/get_knowledge_base_entry.ts
Show resolved
Hide resolved
x-pack/plugins/elastic_assistant/server/ai_assistant_data_clients/knowledge_base/index.ts
Show resolved
Hide resolved
seq_no_primary_term: true, | ||
}); | ||
const conversation = transformESSearchToKnowledgeBaseEntry(response); | ||
return conversation[0] ?? null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an exceptional case, and if so, should it throw instead of returning null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the same pattern as @YulNaumenko put in place with the conversation data client here:
Line 77 in 1c26791
return conversation[0] ?? null; |
Let's discuss further with @YulNaumenko, as when you and I discussed offline, there are a few other failure points in this get action to consider, including any potential latency in writing/reading on serverless as well.
Looks like I need to update that variable name though, so I'll go ahead and do that 😅
{ | ||
match: user.profile_uid | ||
? { 'users.id': user.profile_uid } | ||
: { 'users.name': user.username }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is user.username
a valid fallback when user.profile_uid
does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question... Again, I was following the existing pattern put in place with conversations here:
Lines 38 to 40 in 1c26791
match: user.profile_uid | |
? { 'users.id': user.profile_uid } | |
: { 'users.name': user.username }, |
I'm not sure here, so let's discuss with @YulNaumenko as there may be some cloud/ESS/Serverless things to consider around identity/users (as this seems to indicate profile_uid
is not present everywhere, by maybe username
is).
@@ -21,7 +21,7 @@ export interface AIAssistantDataClientParams { | |||
kibanaVersion: string; | |||
spaceId: string; | |||
logger: Logger; | |||
indexPatternsResorceName: string; | |||
indexPatternsResourceName: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find! 🙏
x-pack/plugins/elastic_assistant/server/lib/data_stream/documents_data_writer.ts
Outdated
Show resolved
Hide resolved
return response.map((doc) => doc.id); | ||
} catch (e) { | ||
this.logger.error(`Error loading data into KB\n ${e}`); | ||
return []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider letting this error propagate, because if it doesn't, the caller must compare the length of the input documents
with the length of the returned response here, in order to detect errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to leave this one as-is for now since it matches the current behavior of the addDocuments()
function. But will revisit this as I work through the tool creation next week and (possibly) move document creation out of esStore.
x-pack/plugins/elastic_assistant/server/routes/knowledge_base/entries/create_route.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @spong for carefully slotting in this improvement to the KB setup UX while ensuring the current path continues to work 🙏
✅ Desk tested locally
LGTM 🚀
💚 Build Succeeded
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Page load bundle
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @spong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Exited about the next steps of making KB so easy to use!
Summary
This PR is
Phase 1
of the Knowledge Base work for8.15
, which includes automatically setting up the Knowledge base (this PR), introducing new generic KB tools for recall/retrieval, a CRUD API for for managing Knowledge Base Entries, and a basic UI for Knowledge Base Entry management (all captured in this issue). Once complete, this will also provide the opportunity to remove the!isEnabledKnowledgeBase
code paths, directing all interactions through our LangChain Agent pipeline.This PR sets the ground work for all of the above by moving ELSER setup and Knowledge Base data management to use the new
AssistantDataClient
architecture used for Conversations, AnonymizationFields and Prompts.This feature is currently behind the
assistantKnowledgeBaseByDefault
experimental feature flag, which can be enabled by adding the following to yourkibana.dev.yml
:Once enabled, an
Install Knowledge Base
button will be shown when starting a new conversation. Note: UX is still under development.Useful Dev Tools Queries
The new
assistantKnowledgeBaseByDefault
flows are quite resilient, and so everything should function as expected even if one piece of the puzzle is missing or incomplete. Here are some dev tool queries to check and delete individual resources, which is nice for testing. For instance, you can nuke the ingest pipeline, or ELSER, and theInstall KB
button will appear and function as intended.Note
Since the existing API's were used, with forked logic for the
assistantKnowledgeBaseByDefault
FF, the existing KB Settings UI still functions as expected, and can be used for deleting and re-initializing the KB. This functionality will most likely go away with updates to the KB UI, but is nice for testing in the interim.Useful Dev Tools Queries
New Features:
assistantKnowledgeBaseByDefault
experimental feature flag and exposed throughassistantCapabilities
APIassistantFeatures
made available in AssistantProvider and tests (no need to individually plumb new features)AIAssistantDataClient
for creating Data Streams, Component Templates, and Ingest PipelineAIAssistantDataClient
to automatically install ELSER viainstallElasticModel()
API, then deploy via TrainedModelsAPIaddKnowledgeBaseDocuments()
for creating KB entries from LangChainDocuments
withinAIAssistantDataClient
ElasticsearchStore
to take akbDataClient
for use in document adding/retrievalChanges not behind FF:
getELSER()
helper function to be called byinternalUser
to prevent ml privilege requirements to setup Knowledge Base once a privileged user has enabled ELSERget/post/delete
knowledge base routes to createesStore
asinternalUser
to enable all assistant users the ability to enable the KB once ELSER has been installed (currently they503
if currentUser doesn't haveread_ingest
andmanage
/read
for the.kibana-elastic-ai-assistant-kb index
)get/post/delete
knowledge base routes with assistant API access controls:tags: ['access:elasticAssistant'],
Checklist
Delete any items that are not applicable to this PR.