Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor deployed model labels #346

Merged
merged 7 commits into from
Feb 6, 2023
Merged

Conversation

ariefrahmansyah
Copy link
Contributor

@ariefrahmansyah ariefrahmansyah commented Jan 30, 2023

What this PR does / why we need it:

The labels implementation of Merlin resources (model services, batch jobs) is having two major limitations:

  1. Hard coded to use "gojek.com/" as prefix so user can change it
  2. Only propagate project's labels even though version labels are available via new_model_version() function.

These drawbacks limit the user to have more flexibility on using label for their tracking.

This PR make following changes:

  1. Allow user to specify label prefix
  2. Merge version labels with project labels. Version labels have more priority.
  3. Introduce "component" labels to tag model-version, model-endpoint, batch-job, and image-builder resources.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:


Checklist

  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduce API changes

@ariefrahmansyah ariefrahmansyah marked this pull request as ready for review February 2, 2023 07:13
Copy link
Member

@pradithya pradithya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be better to make the prefix consistent across all labels.

api/models/metadata.go Outdated Show resolved Hide resolved
@pradithya
Copy link
Member

Can you review #348 and rebase this PR against that. I think the e2e flakiness is due to kserve

Copy link
Member

@pradithya pradithya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ariefrahmansyah


// InitKubernetesLabeller builds a new KubernetesLabeller Singleton
func InitKubernetesLabeller(p string) {
prefix = p
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do validation whether the prefix is valid or not? e.g "goto/gojek/" is not a valid prefix, if this value passed during init of merlin api, user won't be able to do anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion. We added the validation for prefix.

api/models/metadata.go Outdated Show resolved Hide resolved
labelEnvironment = "gojek.com/environment"
labelUsersPrefix = "gojek.com/%s"
LabelAppName = "app"
LabelComponent = "component"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to expose this const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, only LabelOrchestratorName that used by other packages, so yeah, we will change other to local variables.

Co-authored-by: yadavsunny05 <yadav.sunny05@gmail.com>
@tiopramayudi
Copy link
Contributor

lgtm

@ariefrahmansyah ariefrahmansyah merged commit 14d23c9 into main Feb 6, 2023
@ariefrahmansyah ariefrahmansyah deleted the refactor-deployment-labels branch February 6, 2023 16:19
shydefoo added a commit that referenced this pull request Feb 21, 2023
* upstream/main:
  Refactor deployed model labels (#346)
  [Pyfunc] Add ability to push metrics to Prometheus push gateway (#311)
ariefrahmansyah added a commit that referenced this pull request Apr 17, 2023
<!--  Thanks for sending a pull request!  Here are some tips for you:

1. Run unit tests and ensure that they are passing
2. If your change introduces any API changes, make sure to update the
e2e tests
3. Make sure documentation is updated for your PR!

-->

**What this PR does / why we need it**:
<!-- Explain here the context and why you're making the change. What is
the problem you're trying to solve. --->

1. The image-building jobs in Merlin are timing out. After some
investigation, we found that one of the root causes is the node pool got
scaled down resulting in the image building pods to be rescheduled. This
PR adds "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to
avoid the pod get killed and rescheduled.
2. In Refactor deployed model labels #346, we managed to propagate user
labels to models and batch prediction jobs, but not to image builder
jobs (here). This PR includes adding project + model version labels to
image builder job.
3. This PR also improve e2e test by:
a. Removing the duplicate of kserve-controller-manager statefulset as in
kserve 0.9.0 manifest has both of statefulset and deployment
  b. Removing kserve-controller-manager's cpu limits

**Does this PR introduce a user-facing change?**:
<!--
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in
the block below.
If the PR requires additional action from users switching to the new
release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
NONE
```

**Checklist**

- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [x] Updated documentation
- [ ] Update Swagger spec if the PR introduce API changes
- [ ] Regenerated Golang and Python client if the PR introduce API
changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants