-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use spot instances for dask workers #605
Conversation
- Don't change the instance type of the dask workers - so no mixed instances. This is a straight port of what we have now to just use spot instances. - x1 spot instances can't be created due to quota issues, so we remove them from the pool. Fixes 2i2c-org#490
I've deployed and tested this. |
Issues to create from here:
|
thanks so much for tackling this! @yuvipanda I've created a tracking issue for this one, think we could use that to track the extra items you wanted to follow-up on? And merge this PR since it is deployed? #607 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With resolved comments, this LGTM
eksctl/carbonplan.jsonnet
Outdated
// Node definitions for notebook nodes. Only `instanceType` is | ||
// supported as a property. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Node definitions for notebook nodes. Only `instanceType` is | |
// supported as a property. | |
// Node definitions for dask worker nodes. Only `instanceType` is | |
// supported as a property. |
@@ -60,12 +69,12 @@ cluster { | |||
"hub.jupyter.org/dedicated": "user:NoSchedule" | |||
}, | |||
|
|||
} + n for n in nodes | |||
} + n for n in notebookNodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know jsonnet, but there is an inconsistency here.
Below it sais } for n in daskNodes
, but here it sais } + n for n in notebookNodes
. I'll go ahead and guess this is wrong, but perhaps its the other that is wrong? No clue.
} + n for n in notebookNodes | |
} n for n in notebookNodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@consideRatio yeah, I was trying to be 'clever' and just be able to set instanceType
for both non-spot and spot node pools. However, this becomes unmaintainable soon, so I've reverted that. Now the config is the same as what you'll pass to eksctl.
eksctl/libsonnet/nodegroup.jsonnet
Outdated
volumeSize: 80, | ||
labels+: { | ||
// Add instance type as label to nodegroups, so they | ||
// can be picked up by the autoscaler | ||
'node.kubernetes.io/instance-type': $.instanceType, | ||
// can be picked up by the autoscaler. If using spot instances, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// can be picked up by the autoscaler. If using spot instances, | |
// can be picked up by the autoscaler. If using spot instances, |
Indentation nitpick
For spot instances, `instanceType` can not be set on the nodePool object, or eksctl complains. Before this commit, we were still just specifying 'instanceType', but magically converting it to the appropriate incantation for spot instances. This leads to confusion, and missing features (for example, the ability to specify multiple instanceTypes). This commit switches us to using just plain eksctl schema, at the cost of a slightly more verbose config
Thanks a lot, @consideRatio! what do you think of it now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I don't feel comfortable in this situation: either I press merge or I don't.
If I merge, I'm not feeling comfortable as I don't know what happens in the CI system and feel that I have a responsibility if I merge something and the CI system messes things up in a deployment or something happen making me think so.
If I don't press merge, I feel that it would slowed down a process that isn't worth slowing down after one approved review in this case.
I'll opt to not press merge this time and count on self-merge is to be considered an acceptable policy given this PR review approval.
@consideRatio we're discussing that topic over in here too: 2i2c-org/team-compass#175 |
mixed instances. This is a straight port of what we have now
to just use spot instances.
so we remove them from the pool.
Fixes #490