
new OVH cluster #2414

Merged — 14 commits merged into jupyterhub:master from the ovh-terraform branch on Nov 21, 2022

Conversation

@minrk (Member) commented Nov 14, 2022

implementing #2407

So far:

  • cluster is deployed with terraform
  • registry is deployed with terraform, along with accounts for credentials
  • binderhub is deployed to the new cluster (currently at https://ovh-test.mybinder.org)
  • added new cluster to CI scripts as ovh2

So far it's simpler and less bespoke than the previous OVH cluster: we are using standard credentials, letsencrypt, a load-balancer Service, etc. Nothing is manually created and no nodes need special treatment.
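For context, the cluster resource itself looks roughly like this (a minimal sketch assuming the ovh/ovh terraform provider; the name and kubernetes version shown are illustrative, not the actual config):

```hcl
# Minimal sketch of the managed-kubernetes cluster resource, assuming the
# ovh/ovh terraform provider. Name and version here are illustrative.
resource "ovh_cloud_project_kube" "cluster" {
  service_name = var.service_name   # the OVH cloud project id
  name         = "mybinder-ovh2"    # illustrative name
  region       = "GRA9"             # region picked so far (see below)
  version      = "1.24"             # illustrative kubernetes version
}
```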

Still to do:

  • figure out the right node flavors (after quotas are worked out)
  • select data center
  • put the cluster on a private network?
  • assign actual domain name, once we are confident we've deployed the cluster for the last time.
  • put docker credentials in github repo secrets

Questions for @mael-le-gal and OVH folks, especially since I'm new to the OVH tools:

  • data centers - do you have any suggestions for region in which to run the cluster and/or registry? I've picked GRA/GRA9 for now.
  • tokens for OVH - I created credentials following the steps in the ovh docs to use in terraform, but it's quite unclear what permissions those credentials actually have. It seems like the token has permission to act as me, which is not what we want. Presumably, I should create credentials that only have access to this particular project, but I can't find any way to manage the token credentials once they've been created. Where do these tokens/access get managed? Is there an example of creating a 'service account' with appropriately limited access to only the project it is meant to control?
  • are there any examples of using OVH object storage for terraform state? We use backend = gcs for google, and I've done the same with the OVH deploy, but it would probably be nice to store it in the OVH object store. I'm guessing the s3 backend can be configured with the right options, I just don't know what they are.
  • the quotas on the account are currently very low. It's asking me for billing info, but I shouldn't need any billing info on my account to manage the project. Does this mean only @SylvainCorlay can do this? If so, @SylvainCorlay can you order the 20VM quota? We may need 50, but let's start with 20 for now. We can't really deploy a toy instance with the base limit, since it doesn't have enough RAM for a single cluster node.
  • The version of Harbor in the managed service is quite old (2.0.1 from 2020; current is 2.6), which causes some problems because the harbor terraform provider doesn't support Harbor older than 2.2. What is the chance of Harbor getting upgraded? I've worked around it, since I think this will take time, but it would be nice to have a reasonably up-to-date registry.

Thanks!

@minrk mentioned this pull request Nov 14, 2022
@mael-le-gal (Contributor) commented Nov 15, 2022

Hello @minrk

Here are some answers to your questions:

About the region:

Your previous cluster was hosted in the GRA region (which stands for Gravelines), so I suggest selecting the same.

About the OVH token:

They are all related to our API.
The description of the API can be found here: https://api.ovh.com/console/
In your case, the routes you need will mainly be under the /cloud/project/{serviceName}/* section. If you replace serviceName with your project ID and create your token with that route configured, then your token will be scoped only to your project.
If you later want to manage your created tokens, you can do so by calling the routes starting with /me/api/credential. The current system is not intuitive; sadly, I agree on that.
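For example, a token scoped that way is then consumed on the terraform side roughly like this (a sketch assuming the ovh/ovh provider; the variable names are illustrative):

```hcl
# Sketch: the ovh provider consumes the three credentials created against
# the API (application key/secret plus the consumer key whose access rules
# were restricted to /cloud/project/{serviceName}/* at creation time).
provider "ovh" {
  endpoint           = "ovh-eu"                   # EU API endpoint
  application_key    = var.ovh_application_key    # illustrative variable names
  application_secret = var.ovh_application_secret
  consumer_key       = var.ovh_consumer_key
}
```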

About the S3 provider:

Maybe this can help you

Before doing that, I guess you will have to create your S3 user and create your bucket.
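Something along these lines should work for the backend configuration (a sketch only: the bucket name and key are placeholders, the region/endpoint assume GRA, and the skip_* flags are needed because terraform's s3 backend assumes AWS):

```hcl
terraform {
  backend "s3" {
    bucket   = "mybinder-tfstate"                # placeholder bucket name
    key      = "ovh2/terraform.tfstate"          # placeholder state path
    region   = "gra"                             # assumes the GRA region
    endpoint = "https://s3.gra.io.cloud.ovh.net" # OVH object storage endpoint
    # the backend is written for AWS, so skip the AWS-specific checks
    skip_credentials_validation = true
    skip_region_validation      = true
  }
}
```

Credentials would then come from the usual AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, set to the S3 user's keys.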

@minrk marked this pull request as draft November 15, 2022 11:22
@mael-le-gal (Contributor) commented

About the registry:

I got an answer from the registry team:
They are aware that the registry version is an old one.
Harbor version 2.4.1 should be available soon (December).
They are working on improving the speed of their delivery process, and next year they should be able to deliver new versions faster (2.5 / 2.6).

s3 bucket created by hand (just like gcs)
because one pull secret can't grant access to multiple projects

this will be fixed in harbor 2.2
@minrk (Member, Author) commented Nov 15, 2022

Thanks! I was able to migrate state to s3 on OVH.

The cluster is now deployed on a private network, which seems right.

The node flavors don't quite line up with what we're used to. We typically use GCP's 'highmem' nodes, which have a ~7:1 RAM:CPU ratio, whereas OVH offers ~4:1 or ~12:1, but nothing in between. We have used 4-CPU nodes for the core pool and 8-CPU nodes for the user pool.

I suggest we start with node pools:

  • b2-15 (4 cpu, 15GB) for core, and
  • r2-60 (4 cpu, 60GB) for user nodes

Given the ratios in OVH quotas (~6 GB/core), we could also go with b2-60 (16 CPU, 60 GB) for user nodes and have roughly twice the CPU headroom per user that we have on GKE.
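In terraform, the two proposed pools would look roughly like this (a sketch assuming the ovh/ovh provider's nodepool resource; resource names and node counts are illustrative):

```hcl
# Sketch of the two proposed pools, assuming the ovh/ovh provider's
# ovh_cloud_project_kube_nodepool resource. Node counts are illustrative.
resource "ovh_cloud_project_kube_nodepool" "core" {
  service_name  = var.service_name
  kube_id       = ovh_cloud_project_kube.cluster.id
  name          = "core"
  flavor_name   = "b2-15"   # 4 vCPU, 15 GB RAM
  desired_nodes = 2
  min_nodes     = 1
  max_nodes     = 3
}

resource "ovh_cloud_project_kube_nodepool" "user" {
  service_name  = var.service_name
  kube_id       = ovh_cloud_project_kube.cluster.id
  name          = "user"
  flavor_name   = "r2-60"   # 4 vCPU, 60 GB RAM
  desired_nodes = 2
  min_nodes     = 1
  max_nodes     = 6
}
```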

I also don't know if there's anything special we want to do with disks for image building. On GKE we mount SSDs for this, but I think perhaps the boot-disk SSDs are fine? At least okay for now.

So if we're confident in the cluster setup so far, I can update the domain and merge to try to start deploying on CI.

@minrk (Member, Author) commented Nov 15, 2022

@mael-le-gal where can we see logs, e.g. kubernetes cluster event logs? I tried clicking on the 'Logs Data Platform' in the cloud project sidebar, but then it tried to get me to buy it on my personal account. How do we associate logs with an existing cloud project that has the credits, etc.?

@minrk (Member, Author) commented Nov 15, 2022

Quota has been bumped, so the cluster is now using b2-15 and r2-60.

@mael-le-gal (Contributor) commented

> @mael-le-gal where can we see logs, e.g. kubernetes cluster event logs? I tried clicking on the 'Logs Data Platform' in the cloud project sidebar, but then it tried to get me to buy it on my personal account. How do we associate logs with an existing cloud project that has the credits, etc.?

Currently I think you can't, except from the control panel UI directly.

More info here

The global OVHcloud offer is currently missing an integrated observability solution, so that all logs from all services end up in the same place for customers. We are aware of that limitation, and it's on the long-term roadmap.

@minrk (Member, Author) commented Nov 17, 2022

@mael-le-gal can I associate a cloud public IP with a kubernetes load balancer? We do that in other deployments so that if the public kubernetes Service gets recreated, the IP doesn't get released. As far as I can tell, though, I can only associate public IPs with instances, not load balancers.

@minrk changed the title [WIP] new OVH cluster → new OVH cluster Nov 17, 2022
@minrk marked this pull request as ready for review November 17, 2022 12:27
now that it seems like we're keeping this cluster
@minrk (Member, Author) commented Nov 17, 2022

OK! I think this is ready to go to the next step. The cluster is deployed, and I've deployed it manually with python deploy.py ovh2. I believe I've made all the changes necessary for the cluster to get updated from CI here as well, but the only way to know is to merge and try it.

Note that this will not make ovh2 part of the federation. That's a separate step. This will just get the new cluster deployed from CI instead of my laptop, to ensure everything really is working.

so's we don't forget
@consideRatio (Member) left a comment

Looked good to me overall! I just had some ideas related to imagePullSecret

Two review threads on mybinder/templates/image-pull-secret.yaml (outdated, resolved)
"--limits",
"memory=250Mi",
"--requests",
"memory=200Mi",
@minrk (Member, Author) commented on this diff

@mael-le-gal coredns was getting OOMKilled after we applied our coredns config to ban a bunch of IPs, so we have to raise this a bit. I'm not sure if that's feedback that would be useful to you.

@minrk (Member, Author) commented Nov 21, 2022

@consideRatio thanks for the review! Going ahead, since the pull secret that drew the review comments turned out to be entirely unnecessary and has been removed.

@minrk merged commit a392225 into jupyterhub:master Nov 21, 2022
@minrk deleted the ovh-terraform branch November 21, 2022 07:25
@minrk mentioned this pull request Nov 21, 2022