Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Support multiple model deployments #155375

Merged

Conversation

darnautov
Copy link
Contributor

@darnautov darnautov commented Apr 20, 2023

Summary

With elastic/elasticsearch#95168 it's possible to provide an optional deployment ID for start, stop and infer.
elastic/elasticsearch#95440 also updated the _stats API to provide state per deployment.

This PR update the Trained Models UI to support multiple model deployments.

  • Adds support for specifying deployment ID during the start model deployment

image

  • Stopping specific deployments

image

  • Specify the deployment ID for the Test model action

image

  • Show deployment stats for every deployment

image

  • Show pipelines with associated deployments

image

Checklist

@darnautov darnautov added release_note:enhancement :ml Feature:3rd Party Models ML 3rd party models Team:ML Team label for ML (also use :ml) v8.8.0 labels Apr 20, 2023
@darnautov darnautov self-assigned this Apr 20, 2023
@darnautov darnautov force-pushed the ml-154886-support-multiple-deployments branch from 8ad8e57 to 9893f1d Compare April 24, 2023 09:38
@darnautov darnautov changed the title [ML] Support deployment ID parameter for starting and stopping model deployment [ML] Support multiple model deployments Apr 24, 2023
@darnautov darnautov marked this pull request as ready for review April 24, 2023 17:47
@darnautov darnautov requested a review from a team as a code owner April 24, 2023 17:47
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@@ -491,7 +491,6 @@ export function getMlClient(
return mlClient.startTrainedModelDeployment(...p);
},
async updateTrainedModelDeployment(...p: Parameters<MlClient['updateTrainedModelDeployment']>) {
await modelIdsCheck(p);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These calls to modelIdsCheck are needed to restrict access to models that are in the current space only.
Without them a user can update, stop and infer a model from spaces they do not have access to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here with deployment IDs. We don't create saved objects for them, because they only live in deployment stats

Copy link
Member

@jgowdyelastic jgowdyelastic Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I this is going to need extra work which could be done after as a bug fix.
We need to know the model ID for each of these commands, otherwise we can't enforce space awareness.
We could change the endpoints so modelId is always sent as part of the request body, and we rename the path variable to deploymentId, e.g.
/api/ml/trained_models/{deploymentId}/deployment/_stop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more, we'll have to leave the path variable as modelId so it is compatible with the Elasticsearch API that it forwards the request on to.
So the real model ID will probably need to be called something else, but still added to the body so we can check the space.

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested latest changes and LGTM

@darnautov darnautov enabled auto-merge (squash) April 25, 2023 17:40
Copy link
Member

@jgowdyelastic jgowdyelastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, LGTM

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
ml 3.4MB 3.4MB +4.5KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
ml 77.2KB 77.3KB +124.0B
Unknown metric groups

ESLint disabled line counts

id before after diff
enterpriseSearch 16 18 +2
securitySolution 395 398 +3
total +5

Total ESLint disabled count

id before after diff
enterpriseSearch 17 19 +2
securitySolution 475 478 +3
total +5

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @darnautov

@darnautov darnautov merged commit 609228b into elastic:main Apr 25, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Apr 25, 2023
@darnautov darnautov deleted the ml-154886-support-multiple-deployments branch April 25, 2023 18:48
jgowdyelastic added a commit that referenced this pull request May 3, 2023
…6238)

Fixes issues raised in
#155375 (comment)
Kibana trained model endpoints for `_stop`, `_update` and `infer` now
require the model ID was well as the deployment ID to be passed to them.


Also fixes the stop trained model api when stopping more than one model.
It's very likely the elasticsearch `_stop` api will not support a comma
separated list of deployment IDs for this release, and so this change
calls `_stop` in a loop for each deployment. It also allows for better
reporting if any of the deployments fail to stop.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request May 3, 2023
…stic#156238)

Fixes issues raised in
elastic#155375 (comment)
Kibana trained model endpoints for `_stop`, `_update` and `infer` now
require the model ID was well as the deployment ID to be passed to them.

Also fixes the stop trained model api when stopping more than one model.
It's very likely the elasticsearch `_stop` api will not support a comma
separated list of deployment IDs for this release, and so this change
calls `_stop` in a loop for each deployment. It also allows for better
reporting if any of the deployments fail to stop.

(cherry picked from commit 9559bee)
kibanamachine added a commit that referenced this pull request May 3, 2023
#156238) (#156483)

# Backport

This will backport the following commits from `main` to `8.8`:
- [[ML] Fixing space checks for recently changed trained model apis
(#156238)](#156238)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"James
Gowdy","email":"jgowdy@elastic.co"},"sourceCommit":{"committedDate":"2023-05-03T08:42:12Z","message":"[ML]
Fixing space checks for recently changed trained model apis
(#156238)\n\nFixes issues raised
in\r\nhttps://github.com//pull/155375#discussion_r1176594934\r\nKibana
trained model endpoints for `_stop`, `_update` and `infer`
now\r\nrequire the model ID was well as the deployment ID to be passed
to them.\r\n\r\n\r\nAlso fixes the stop trained model api when stopping
more than one model.\r\nIt's very likely the elasticsearch `_stop` api
will not support a comma\r\nseparated list of deployment IDs for this
release, and so this change\r\ncalls `_stop` in a loop for each
deployment. It also allows for better\r\nreporting if any of the
deployments fail to
stop.","sha":"9559bee36139c5a896b70ab16c57d1e24c6a94d9","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["non-issue",":ml","release_note:skip","Feature:3rd
Party
Models","v8.8.0","v8.9.0"],"number":156238,"url":"#156238
Fixing space checks for recently changed trained model apis
(#156238)\n\nFixes issues raised
in\r\nhttps://github.com//pull/155375#discussion_r1176594934\r\nKibana
trained model endpoints for `_stop`, `_update` and `infer`
now\r\nrequire the model ID was well as the deployment ID to be passed
to them.\r\n\r\n\r\nAlso fixes the stop trained model api when stopping
more than one model.\r\nIt's very likely the elasticsearch `_stop` api
will not support a comma\r\nseparated list of deployment IDs for this
release, and so this change\r\ncalls `_stop` in a loop for each
deployment. It also allows for better\r\nreporting if any of the
deployments fail to
stop.","sha":"9559bee36139c5a896b70ab16c57d1e24c6a94d9"}},"sourceBranch":"main","suggestedTargetBranches":["8.8"],"targetPullRequestStates":[{"branch":"8.8","label":"v8.8.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.9.0","labelRegex":"^v8.9.0$","isSourceBranch":true,"state":"MERGED","url":"#156238
Fixing space checks for recently changed trained model apis
(#156238)\n\nFixes issues raised
in\r\nhttps://github.com//pull/155375#discussion_r1176594934\r\nKibana
trained model endpoints for `_stop`, `_update` and `infer`
now\r\nrequire the model ID was well as the deployment ID to be passed
to them.\r\n\r\n\r\nAlso fixes the stop trained model api when stopping
more than one model.\r\nIt's very likely the elasticsearch `_stop` api
will not support a comma\r\nseparated list of deployment IDs for this
release, and so this change\r\ncalls `_stop` in a loop for each
deployment. It also allows for better\r\nreporting if any of the
deployments fail to
stop.","sha":"9559bee36139c5a896b70ab16c57d1e24c6a94d9"}}]}] BACKPORT-->

Co-authored-by: James Gowdy <jgowdy@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:3rd Party Models ML 3rd party models :ml release_note:enhancement Team:ML Team label for ML (also use :ml) v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants