adds AWS infra for instantiating and destroying all baseline substratus components #170
Conversation
Hey @brandonjbjelland and team, I had a quick thought I wanted to pass along so I don't forget about it.
Karpenter may be a good option to get the intelligent auto-provisioning you are looking for; see the Karpenter docs on provisioners. I'll double-check what I ended up doing to get it working. I'm looking forward to testing Substratus out on Monday.
You've got a point there. I don't see a critical need yet, so we can skip them for now.
Spinning up EKS clusters with Terraform is such a pain, maybe by design? I wasn't expecting it to be so complex. The eksctl tool does seem to make it easier, though I'm not sure whether it fits all our needs or takes away too much flexibility. I personally prefer Terraform, but seeing the struggle and complexity of EKS, it might be fine to consider something like eksctl.

In the end we expect most users to already have a K8s cluster when they want to use Substratus, and those end users will choose their own tooling of choice to manage EKS + node groups. So the main purpose of the bundled installer/EKS cluster creator is mostly the development and PoC phase: getting rolling quickly with minimal issues.
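For a sense of scale, a minimal eksctl cluster config is roughly the following sketch; the cluster name, region, and node group sizing are placeholders, not anything Substratus actually ships:

```yaml
# Minimal eksctl ClusterConfig sketch; all names and sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: substratus-dev      # placeholder cluster name
  region: us-west-2         # placeholder region
managedNodeGroups:
  - name: default
    instanceType: m5.large  # placeholder instance type
    desiredCapacity: 2
```

A config like this is created with `eksctl create cluster -f cluster.yaml` and torn down with `eksctl delete cluster -f cluster.yaml`.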
@BOsterbuhr thanks for the pointer! Karpenter looks viable here. Sidebar: if you're hoping to use Substratus on AWS, we're just getting started on adding support. Running on GCP is the paved path we have today.
In my experience, if you are just trying to get dev/PoC support for AWS quickly, then eksctl is probably the way to go. As for Karpenter, yeah, unfortunately AWS doesn't seem to be in a rush to support other cloud providers. Sidebar: I'll test on GCP first to get a better understanding of everything and will watch for AWS support.
We discussed internally today and arrived at this same consensus. In short, delivering a cluster is not our value add, not where we should spend time, and not where we should put a lot of code (which inevitably will rot, attract feature requests of its own, etc.). We just need the simplest possible way to get a minimum cluster up in each supported provider for someone starting at zero - that may or may not be through Terraform. Here it seems eksctl fits that bill.
Though we're being very careful with the dependencies we take on, I don't know that it changes our calculation here - Karpenter definitely presents itself as the simplest way to auto-scale EKS on heterogeneous hardware with minimal overhead. We very much feel incentivized to avoid having a long list of node groups for the different flavors of GPU-supported instances if that's our alternative.
Thank you! 🙏 Any feedback is highly appreciated, @BOsterbuhr! ❤️
Good stuff! Added some comments, but none of them are big deals
Karpenter seems to have Karpenter-specific node labels you have to use in your pod spec. This might require some more design discussion, and might require making use_karpenter a flag or exposing it on Substratus resources somehow?
Can you explain what you mean? Your pods shouldn't have to even know Karpenter exists.
Let's say you want to run a pod on an A100 GPU - how would you ensure the pod gets scheduled on a node that has an A100 GPU, on Karpenter vs non-Karpenter? You might have nodes with T4, V100 and A100 in the same cluster.

Note I might be totally wrong since I haven't used Karpenter myself. I was reading this: https://karpenter.sh/preview/concepts/scheduling/ That doc made me believe that in order for Karpenter to create a node that has an A100, I would have to set a nodeSelector in the pod to karpenter.k8s.aws/instance-gpu-name = a100 OR use affinity rules.

I've got a GCP background so this is my first time seriously looking into Karpenter. For reference, I'm hoping there is a label like cloud.google.com/gke-accelerator on EKS as well.
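For illustration, the nodeSelector approach described in that scheduling doc would look roughly like this; the pod name, image, and resource request are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: a100-workload                      # placeholder name
spec:
  nodeSelector:
    # Karpenter well-known label; asks Karpenter to provision an A100-backed instance
    karpenter.k8s.aws/instance-gpu-name: a100
  containers:
    - name: main
      image: my-training-image:latest      # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```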
Oh ok, that makes sense. So instead of making the end user understand that you are using Karpenter, you could just create a provisioner with a User-Defined Label as a requirement and then have your end user use that label as a node selector.
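A rough sketch of that idea, assuming a hypothetical user-defined label substratus.ai/accelerator and a pre-existing AWSNodeTemplate named default (Karpenter v1alpha5 API):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu-a100                       # hypothetical provisioner name
spec:
  labels:
    substratus.ai/accelerator: a100    # hypothetical user-defined label
  requirements:
    - key: karpenter.k8s.aws/instance-gpu-name
      operator: In
      values: ["a100"]
  providerRef:
    name: default                      # assumes an existing AWSNodeTemplate
```

Pods would then only set `nodeSelector: {substratus.ai/accelerator: a100}` and never reference Karpenter's own labels directly.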
The issue is that there doesn't seem to be a node label that exposes the GPU type on AWS unless you use Karpenter. At the same time, we don't want to depend on Karpenter, and we want to ensure Substratus works well without it. A key principle of Substratus is to minimize dependencies so it's easier to get Substratus to run in any EKS cluster.

Actually, I might be wrong altogether and should just get a GPU node on EKS to verify myself. Seems there is in fact a label that would have the info we're looking for: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go#L38C28-L38C31
Yeah, you're right - it looks like that is where the Google label you mentioned is coming from as well: https://github.com/kubernetes/autoscaler/blob/fc5870f8eaf850dd1e18a5884a7491168dc5d8a0/cluster-autoscaler/cloudprovider/gce/gce_cloud_provider.go#L37

The only issue I could see is one I think you all brought up previously: when using the Kubernetes autoscaler you have to manage a separate node group for each different instance type. But that may be worth it if you don't want any dependencies.
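For reference, the two cluster-autoscaler GPU label keys those files define, used as pod-spec nodeSelector fragments; the values here are placeholders and depend on how the GPU node groups are actually labeled:

```yaml
# Pod spec fragment on EKS (cluster-autoscaler label from aws_cloud_provider.go);
# the value is a placeholder set by whoever labels the GPU node group.
nodeSelector:
  k8s.amazonaws.com/accelerator: nvidia-tesla-a100
---
# Equivalent fragment on GKE (label from gce_cloud_provider.go).
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-tesla-a100
```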
What is this change?
This change adds a script that we (and users) can use to create a complete substratus environment on a new AWS account.
Why make this change?
#12
Caveats, questions, TBD
Q: Are the local/ephemeral SSDs important for any sort of workload we support? Trying to understand if that's critical here too.
A: Size is important but local SSDs probably not. Leaving this out for now.
Unknown: I added Karpenter but haven't thoroughly tested how well it works at auto-provisioning; GPU-backed nodes in particular still need testing.
Caveat: The features we get enabled on a GKE cluster through simple flags are incredibly fussy on EKS and I don't have them working. I'll dig further (maybe I've missed some events) but I think we might need to manage these on our own - I've never seen the equivalent features fail on GKE. ~Equivalent features baked into EKS configuration fail consistently. I've seen timeouts across `coredns`, `vpc-cni`, and `ebs-csi` regardless of the order of deployment or how I do it. So far I've tried `eks_add_on` resources (docs) and other approaches; they all fail. We need to attach some of the IRSA roles created in the code here to those resources however we instantiate them, so if that's out of scope of Terraform, some additional outputs will be necessary (i.e., ARNs of the roles).~
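If cluster creation does end up going through eksctl per the discussion above, these same add-ons can be declared alongside their IRSA role attachments in the ClusterConfig - a rough sketch, with placeholder account ID, role names, and cluster metadata:

```yaml
# eksctl ClusterConfig excerpt; account ID, role names, and metadata are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: substratus-dev      # placeholder cluster name
  region: us-west-2         # placeholder region
iam:
  withOIDC: true            # IRSA requires the cluster's OIDC provider
addons:
  - name: vpc-cni
    serviceAccountRoleARN: arn:aws:iam::111122223333:role/vpc-cni-irsa   # placeholder
  - name: coredns
  - name: aws-ebs-csi-driver
    serviceAccountRoleARN: arn:aws:iam::111122223333:role/ebs-csi-irsa   # placeholder
```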
With time, I'll add to this...