-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor deployed model labels #346
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be better to make the prefix consistent across all labels.
Can you review #348 and rebase this PR against that. I think the e2e flakiness is due to kserve |
Co-authored-by: yadavsunny05 <yadav.sunny05@gmail.com>
Co-authored-by: yadavsunny05 <yadav.sunny05@gmail.com>
Co-authored-by: yadavsunny05 <yadav.sunny05@gmail.com>
8f40c83
to
9181b97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @ariefrahmansyah
|
||
// InitKubernetesLabeller builds a new KubernetesLabeller Singleton | ||
func InitKubernetesLabeller(p string) { | ||
prefix = p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do validation whether the prefix is valid or not? e.g "goto/gojek/" is not a valid prefix, if this value passed during init of merlin api, user won't be able to do anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion. We added the validation for prefix.
api/models/metadata.go
Outdated
labelEnvironment = "gojek.com/environment" | ||
labelUsersPrefix = "gojek.com/%s" | ||
LabelAppName = "app" | ||
LabelComponent = "component" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to expose this const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, only LabelOrchestratorName that used by other packages, so yeah, we will change other to local variables.
Co-authored-by: yadavsunny05 <yadav.sunny05@gmail.com>
lgtm |
<!-- Thanks for sending a pull request! Here are some tips for you: 1. Run unit tests and ensure that they are passing 2. If your change introduces any API changes, make sure to update the e2e tests 3. Make sure documentation is updated for your PR! --> **What this PR does / why we need it**: <!-- Explain here the context and why you're making the change. What is the problem you're trying to solve. ---> 1. The image-building jobs in Merlin are timing out. After some investigation, we found that one of the root causes is the node pool got scaled down resulting in the image building pods to be rescheduled. This PR adds "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to avoid the pod get killed and rescheduled. 2. In Refactor deployed model labels #346, we managed to propagate user labels to models and batch prediction jobs, but not to image builder jobs (here). This PR includes adding project + model version labels to image builder job. 3. This PR also improve e2e test by: a. Removing the duplicate of kserve-controller-manager statefulset as in kserve 0.9.0 manifest has both of statefulset and deployment b. Removing kserve-controller-manager's cpu limits **Does this PR introduce a user-facing change?**: <!-- If no, just write "NONE" in the release-note block below. If yes, a release note is required. Enter your extended release note in the block below. If the PR requires additional action from users switching to the new release, include the string "action required". For more information about release notes, see kubernetes' guide here: http://git.k8s.io/community/contributors/guide/release-notes.md --> ```release-note NONE ``` **Checklist** - [x] Added unit test, integration, and/or e2e tests - [x] Tested locally - [x] Updated documentation - [ ] Update Swagger spec if the PR introduce API changes - [ ] Regenerated Golang and Python client if the PR introduce API changes
What this PR does / why we need it:
The labels implementation of Merlin resources (model services, batch jobs) is having two major limitations:
These drawbacks limit the user to have more flexibility on using label for their tracking.
This PR make following changes:
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?:
Checklist