Tweak node pool usage #984
Conversation
LGTM, I will present this in the next meeting and we can merge once the others have had a look as well. Thanks @leej3 for the contributions!!!
Sounds good, thanks. One thing to consider is whether to target some of the clearml pods using affinity, as we have done for forward-auth. On our deployment the clearml nodes use a lot of resources, and our understanding is that the clearml server itself needs quite a lot (or at least we haven't explored what the minimum is). It may be more efficient to run all auxiliary pods somewhere other than the clearml node pool.
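As a rough sketch of the affinity idea (the label key/value are illustrative assumptions, not the exact ones used for forward-auth):

```yaml
# Hypothetical pod spec fragment: prefer scheduling auxiliary pods away from
# the clearml node pool. The label key/value are assumptions for illustration.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: app
              operator: NotIn
              values:
                - clearml
```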
```diff
@@ -226,6 +226,8 @@ redis: # configuration from https://github.com/bitnami/charts/blob/master/bitnam
   master:
     name: "{{ .Release.Name }}-redis-master"
     port: 6379
+    nodeSelector:
```
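(The hunk is truncated here; assuming the selector value matches the label discussed below, the completed values.yaml entry would read roughly like this sketch, which is not copied from the PR diff.)

```yaml
# Sketch of the completed redis block; the nodeSelector value is an assumption.
master:
  name: "{{ .Release.Name }}-redis-master"
  port: 6379
  nodeSelector:
    app: clearml
```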
How is the node being labeled "app: clearml"?
Or is this an assumption that the node will have this label?
> How is the node being labeled "app: clearml"?

Good point. It's a default value set in the variable.tf file that propagates to the chart's values.yaml (see here).
> Or is this an assumption that the node will have this label?

It is a requirement. We manually set this label for our deployment. We are not sure whether this is automated when QHub deploys clearml on the cloud providers (as described here).
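For illustration, this is the shape of the label the clearml node needs to carry; the node name is a placeholder, and in our case the label was applied manually rather than by the provider:

```yaml
# Hypothetical Node object excerpt showing the manually applied label.
apiVersion: v1
kind: Node
metadata:
  name: clearml-worker-0   # placeholder node name
  labels:
    app: clearml           # matches the chart's nodeSelector
```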
That drew our attention to the missing redis/mongodb dynamic value setting; fixed now.
Additionally, instead of targeting all clearml pods at the general pool, we want to be able to target the services to different node pools separately. The agent should run on a large node, but all other services (including kube-system components) should be targeted at a general pool with smaller nodes. @viniciusdc and I will take a look. We'll submit this separately.
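A rough sketch of the split we have in mind; the service names and pool labels below are illustrative assumptions, not the chart's actual keys:

```yaml
# Hypothetical values.yaml fragment: run the agent on the large clearml pool
# and the remaining services on a smaller general pool.
agent:
  nodeSelector:
    app: clearml     # large nodes
apiserver:
  nodeSelector:
    app: general     # smaller, general-purpose nodes
webserver:
  nodeSelector:
    app: general
```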
@viniciusdc could you update this PR when you test clearml? I think this may already be fixed.
Based on a comparison between the main.tf file from this PR and the one currently on QHub, this has not been fixed yet. So we need to move these changes to the 0.4.0 standards, then we can merge (after a new review). As this will be part of #1217, I will move it to the same milestone.
Allocates some stray clearml resources to the clearml node-pool and targets the forwardauth deployment resource to the general node pool.
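As a rough illustration of the forwardauth change (the general-pool label key/value are assumptions, not the exact values used):

```yaml
# Hypothetical excerpt of the forwardauth deployment's pod template, pinning
# it to the general node pool.
spec:
  template:
    spec:
      nodeSelector:
        app: general
```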