Tweak node pool usage #984

Closed · wants to merge 3 commits
Conversation

@leej3 (Contributor) commented Jan 5, 2022

Allocates some stray clearml resources to the clearml node pool and targets the forwardauth deployment resource to the general node pool.
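For context, a minimal sketch of the kind of values.yaml change this implies. The top-level keys and the `app: general` label are illustrative assumptions; only the `app: clearml` label actually appears in this PR's diff:

```yaml
# Sketch only: pin clearml services to the clearml node pool and the
# forwardauth deployment to the general pool via nodeSelector.
# The key names and the "app: general" label are assumptions.
clearml:
  nodeSelector:
    app: clearml      # label expected on the clearml node pool (see diff below)
forwardauth:
  nodeSelector:
    app: general      # assumed label for the general node pool
```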

@leej3 leej3 requested a review from viniciusdc January 5, 2022 17:24
@viniciusdc (Contributor) left a comment

LGTM, I will present this in the next meeting with the others, and we can merge once they have had a look as well. Thanks @leej3 for the contributions!

@leej3 (Contributor, Author) commented Jan 6, 2022

Sounds good, thanks. One thing to consider is whether to target some of the clearml pods using affinity, as we have done in the case of forward-auth. On our deployment the clearml nodes use a lot of resources, and my understanding is that the clearml server needs quite a lot of them (or at least we haven't explored what the minimum is). It may be more efficient to run all auxiliary pods somewhere other than the clearml node pool.
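For reference, a node-affinity stanza of the kind mentioned above might look like this. It is a sketch, not the actual forward-auth manifest; only the `app=clearml` label is taken from this PR's diff:

```yaml
# Sketch: pod-level node affinity steering a pod onto nodes labeled
# app=clearml. The real forward-auth change may differ in detail.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: app
              operator: In
              values:
                - clearml
```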

The review then turned to this hunk from the chart's values.yaml:

```diff
@@ -226,6 +226,8 @@ redis: # configuration from https://github.com/bitnami/charts/blob/master/bitnam
 master:
   name: "{{ .Release.Name }}-redis-master"
   port: 6379
+  nodeSelector:
+    app: clearml
```
A Member commented:

How is the node being labeled "app: clearml"?

A Member commented:

Or is this an assumption that the node will have this label?

@leej3 (Contributor, Author) replied:

> How is the node being labeled "app: clearml"?

Good point. It's a default value set in the variable.tf file that propagates to the chart's values.yaml (see here).

> Or is this an assumption that the node will have this label?

It is a requirement. We set this label manually for our deployment. We are not sure whether this is automated when QHub deploys clearml on the cloud providers (as described here).
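In other words, the chart's nodeSelector entry only matches nodes that already carry the label; where QHub does not apply it automatically, it has to be added by hand (e.g. with `kubectl label node <node-name> app=clearml`). A sketch of the matching values.yaml fragment:

```yaml
# Sketch: this selector schedules pods only onto nodes that already
# have the app=clearml label; the label itself is applied outside
# the chart (manually or by the provisioning tooling).
nodeSelector:
  app: clearml
```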

@leej3 (Contributor, Author) replied:

That drew our attention to the missing redis/mongodb dynamic value setting; fixed now.

@leej3 (Contributor, Author) commented Jan 7, 2022

Additionally, instead of targeting all clearml pods to the general pool, we want to be able to target the services to different node pools separately: the agent should run on a large node, while all other services (including kube-system components) should be targeted to a general pool with smaller nodes. @viniciusdc and I will take a look and submit this separately; a sketch of the idea follows.
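A hedged sketch of what that per-service split could look like in the chart values. The service keys and the `app: general` label are assumptions, not the eventual change:

```yaml
# Sketch: reserve the large clearml pool for the resource-hungry agent
# and push the lighter services to the general pool. Keys and labels
# here are illustrative assumptions.
agentservices:
  nodeSelector:
    app: clearml      # large nodes reserved for the clearml agent
apiserver:
  nodeSelector:
    app: general      # assumed label for the smaller general pool
webserver:
  nodeSelector:
    app: general
```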

@leej3 leej3 mentioned this pull request Jan 17, 2022
@magsol magsol added this to the Release v0.4.1 milestone Apr 26, 2022
@costrouc (Member) commented:
@viniciusdc could you update this PR when you test clearml? I think this may already be fixed.

@viniciusdc (Contributor) commented May 5, 2022

Based on a comparison between the main.tf files from this PR and the one currently in QHub, this has not been fixed yet. We need to bring these changes up to the 0.4.0 standards, and then we can merge (after a new review).

As this will be part of #1217, I will move this to the same milestone.

@viniciusdc (Contributor) commented:
I will be closing this due to the changes made for v0.4.0 and #1292. These changes will be included once a PR for #1281 is merged. Thanks, @leej3, for pointing out the issue.

@viniciusdc viniciusdc closed this May 24, 2022
@iameskild iameskild removed this from the Release v0.4.2 milestone Jun 2, 2022
@trallard trallard deleted the tweak_node_pool_usage branch October 7, 2022 16:41
Labels: area: terraform 💾, needs: follow-up 📫