[Obs AI Assistant] knowledge base integration tests #189000
Conversation
/ci
…unian/kibana into 188999-kb-integration-tests
Flaky Test Runner Stats
🎉 All tests passed! - kibana-flaky-test-suite-runner#6629
[✅] x-pack/test/observability_ai_assistant_api_integration/enterprise/config.ts: 25/25 tests passed.
Added a comment, but overall ML changes LGTM
Co-authored-by: James Gowdy <jgowdy@elastic.co>
if (!license.hasAtLeast('enterprise')) {
  return defaultModelId;
}
if (configModelId) {
should we move this up before the license check?
I put it below just to make sure someone without an enterprise license could not try to use it, but I'm not sure it matters, and because at the moment we are only testing the enterprise functionality, where using the assistant is allowed. Though, I'm not sure why we need doInit to start with and create the ES assets if the license is not enterprise.
I think we await the license check here to make sure the license check is completed. We don't really care whether it is valid, from the looks of it - why even return a model id if the license is not valid? So I think we should move the license check out of this code, and throw an invalid license error further down. WDYT?
It sounds like you and Søren had a similar conversation: https://github.com/elastic/kibana/pull/181700/files#r1579341084
It seems like we were catching the invalid license error that ML was throwing because we didn't want to log it. So we check if it's enterprise and just return the "default model id" even though it's of no use. I was also thinking we should not call doInit, or should stop getModelId from being called further up, if they don't have the enterprise license.
I moved the config check before the license check and added back a comment that had been removed, which I think was helpful.
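For reference, a minimal sketch of what the reordered check could look like. Only the configModelId / defaultModelId / license pieces come from the snippet above; the function name, the resolveRecommendedModelId helper, and the argument shapes are illustrative assumptions rather than the PR's actual code.

```ts
interface License {
  hasAtLeast(level: 'enterprise'): boolean;
}

// Illustrative sketch only; names and signatures are assumptions.
async function getElserModelId({
  configModelId,
  defaultModelId,
  licensePromise,
  resolveRecommendedModelId,
}: {
  configModelId?: string;
  defaultModelId: string;
  licensePromise: Promise<License>;
  resolveRecommendedModelId: () => Promise<string>;
}): Promise<string> {
  // The config override (internal use) now takes precedence, before the license check.
  if (configModelId) {
    return configModelId;
  }

  // Await the license so the check has definitely completed before we continue.
  const license = await licensePromise;
  if (!license.hasAtLeast('enterprise')) {
    // Without an enterprise license the assistant is not usable anyway,
    // so fall back to the default model id rather than asking ML.
    return defaultModelId;
  }

  // Otherwise resolve the recommended ELSER model id (hypothetical helper).
  return resolveRecommendedModelId();
}
```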
await ml.testResources.cleanMLSavedObjects();
});

it('returns correct status after knowledge base is setup', async () => {
should we test what happens before the model is imported?
It seems, during setup, we assume that no installation error is going to happen. The call to create the model throws an error that isn't caught:
{
  name: 'ResponseError',
  message: 'action_request_validation_exception\n' +
    '\tRoot causes:\n' +
    '\t\taction_request_validation_exception: Validation Failed: 1: [model_type] must be set if [definition] is not defined.;2: [inference_config] must not be null.;'
}
This causes pRetry to call itself again and repeat the process until it finally fails after 12 attempts, and we display a toast with a 500 internal server error.
These validation errors should not happen, given elser2 should already have these properties set somewhere, but I think we should check the call to create the model and stop if it fails for whatever reason. If I understand correctly, pRetry is mainly checking whether the model has finished installing, and we depend on errors from the getTrainedModels stats call for that?
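For context, a minimal TypeScript sketch of the pattern being described, with hypothetical names (the ModelClient interface and its methods are illustrative; only pRetry and the 12-attempt budget come from the discussion above):

```ts
import pRetry from 'p-retry';

interface ModelClient {
  getStats(): Promise<boolean>; // resolves true once the model is fully installed
  install(): Promise<void>;     // e.g. wraps ml.putTrainedModel
}

async function setupModel(client: ModelClient) {
  await pRetry(
    async () => {
      let installed = false;
      try {
        installed = await client.getStats(); // throws if the model was never installed
      } catch (error) {
        // On a clean slate this branch runs once: install the model, then
        // re-throw so pRetry re-checks the status on the next attempt. But if
        // install() itself fails (e.g. the validation error above), getStats()
        // keeps throwing, we land here on every attempt, and the route finally
        // responds with a 500 once the retries are exhausted.
        await client.install();
        throw error;
      }
      if (!installed) {
        // Model exists but is not fully defined yet; throw so pRetry polls again.
        throw new Error('model still installing');
      }
    },
    { retries: 12 }
  );
}
```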
What I think happens is (and maybe we can e.g. use named functions to clarify this):
- (1) Check whether the model is installed
-- 1A. If this check throws an error, and it's a resource_not_found_exception or status_exception, install the model, and re-evaluate step (1). This is what happens on a clean slate.
-- 1B. If this check does not throw, but returns false (the model is installing, but not fully defined), retry (1) in n seconds.
-- 1C. If this check does not throw, and returns true, consider the model successfully installed and available for deployment, and continue to step (2).
- (2) Deploy the installed model
-- 2A. If this throws with a resource_not_found_exception or status_exception, catch the error and continue to (3).
-- 2B. If any other error, fail and exit the process.
- (3) Check if the model has been successfully deployed
-- 3A. If successfully deployed, exit the process with a 200.
-- 3B. If not successfully deployed, re-evaluate (3) in n seconds.
To answer your question, I think we don't really retry the installModel call - we retry the calls that determine whether the model is installed. This is a consequence of the fact that a model install can be in progress. One thing that might clear this up is to move the installModel call out of the pRetry. Perhaps it should be something like:
installModelIfDoesNotExist()
pollForModelInstallCompleted()
deployModelIfNeeded()
pollForModelDeploymentCompleted()
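A rough sketch of that sequencing, assuming hypothetical signatures: only the four function names above come from the suggestion, while the ModelClient interface, the error matching, and the retry counts are illustrative assumptions.

```ts
import pRetry from 'p-retry';

interface ModelClient {
  isInstalled(): Promise<boolean>; // may throw resource_not_found_exception on a clean slate
  install(): Promise<void>;
  deploy(): Promise<void>;         // may throw status_exception if already deploying
  isDeployed(): Promise<boolean>;
}

// Hypothetical helper: treat "not found" / "status" errors as expected.
function isKnownModelError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return (
    message.includes('resource_not_found_exception') || message.includes('status_exception')
  );
}

// (1) / 1A: install only when the model is missing; if the install call itself
// fails, let that error propagate instead of looping on the status check.
async function installModelIfDoesNotExist(client: ModelClient) {
  const installed = await client.isInstalled().catch((error) => {
    if (isKnownModelError(error)) {
      return false;
    }
    throw error;
  });
  if (!installed) {
    await client.install();
  }
}

// 1B / 1C: poll until the model definition is fully present.
async function pollForModelInstallCompleted(client: ModelClient) {
  await pRetry(
    async () => {
      if (!(await client.isInstalled())) {
        throw new Error('model is still installing');
      }
    },
    { retries: 12 }
  );
}

// (2): start a deployment; 2A swallows the known errors, 2B fails hard.
async function deployModelIfNeeded(client: ModelClient) {
  try {
    await client.deploy();
  } catch (error) {
    if (!isKnownModelError(error)) {
      throw error;
    }
  }
}

// (3): poll until the deployment reports as started.
async function pollForModelDeploymentCompleted(client: ModelClient) {
  await pRetry(
    async () => {
      if (!(await client.isDeployed())) {
        throw new Error('model is not deployed yet');
      }
    },
    { retries: 12 }
  );
}

export async function setupKnowledgeBase(client: ModelClient) {
  await installModelIfDoesNotExist(client);
  await pollForModelInstallCompleted(client);
  await deployModelIfNeeded(client);
  await pollForModelDeploymentCompleted(client);
}
```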
For (1), that makes sense and aligns with the behaviour I'm seeing. The problem arises when the model cannot install: we get "stuck" in 1A and call installModel repeatedly. This happens when putTrainedModel() inside installModel throws, which means getTrainedModels() will continue to throw because we never started the install.
Thanks, I'll see if I can clear it up and add error handling for when the model install can't start or throws.
@dgieselaar I updated the model installation process in (1). Let me know if that's easier to read.
…unian/kibana into 188999-kb-integration-tests
Flaky Test Runner Stats
🎉 All tests passed! - kibana-flaky-test-suite-runner#6669
[✅] x-pack/test/observability_ai_assistant_api_integration/enterprise/config.ts: 25/25 tests passed.
Thanks a ton, this is great!
/ci
💚 Build Succeeded
Closes #188999

- Adds modelId, for internal use, to override the ELSER model id.
- Updates knowledgeBaseService.setup() to fix a bug where, if the model failed to install when calling ml.putTrainedModel, we would get stuck polling and retrying the install. We were assuming that the first error thrown when the model does not yet exist would only happen once, after which we could return true or false and poll for whether installation is done. But the installation itself could also fail, causing getTrainedModelsStats to continuously throw and the install to be attempted over and over. Now the user immediately gets an error if the model fails to install, and the polling does not happen.