new OVH cluster #2407

Closed
2 of 6 tasks
minrk opened this issue Nov 10, 2022 · 8 comments

Comments

@minrk
Member

minrk commented Nov 10, 2022

OVH is migrating us to a new OVH project as part of renewed funding via NumFOCUS.

We'll have an actual cloud project, so we'll have full control of the cluster and anything else we need to deploy, instead of being handed a pre-provisioned cluster; that means we'll be able to do things like upgrade and recycle nodes. The new setup should also make the cluster less unique, since OVH now has a managed Kubernetes service (I think we can use load balancers, letsencrypt, etc., unlike the current cluster). The project is owned by NumFOCUS, as the direct recipient of the funding.

Things to do:

  • get folks access to the project (I think folks have to create an OVHcloud account and send their user id (rm111111-ovh) to @SylvainCorlay as the NumFOCUS representative)
  • (figure out if we can get project admin permissions to add users, so @SylvainCorlay doesn't have to be the only one who can do it)
  • deploy the container registry and cluster with terraform, like we do for GKE (see the sketch after this list)
  • add the new cluster to the federation as ovh2
  • when things are going smoothly, remove ovh from the federation so OVH can shut it down
  • (later) after ovh is shut down, migrate the ovh domain(s) to the new cluster
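For the terraform step, a minimal sketch of the workflow, assuming a terraform/ovh directory that mirrors the existing GKE layout (the directory name is an assumption, nothing is settled yet):

    cd terraform/ovh  # hypothetical directory, mirroring the GKE setup
    terraform init    # fetch providers and initialize state
    terraform plan    # review the planned registry + cluster resources
    terraform apply   # create the container registry and managed k8s cluster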

I'll have time to work on this starting next week when I finish with most of my teaching for the semester.

cc @mael-le-gal


@SylvainCorlay

Regarding admin access to the NumFOCUS institutional account: it was created with a dedicated email, so the credentials could be made available in a password manager. For most operations (besides adding and removing maintainers), being an admin on the project should be the only requirement.

minrk mentioned this issue Nov 14, 2022
@minrk
Member Author

minrk commented Nov 14, 2022

I've started testing out deployment with terraform in #2414, and so far it seems to be going fine. I've hit some hiccups with failed deployments that produce no error messages (specifically, deploying the registry only got 'state=ERROR' with no message), but after retrying a couple of times it went okay.
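
Practically, the retry was just re-running apply; if a resource stays stuck in that ERROR state, it can also be recreated explicitly (the resource address below is illustrative, not taken from the actual config):

    # plain retry: the ERROR cleared after a couple of attempts
    terraform apply
    # or force recreation of the stuck resource
    terraform apply -replace=ovh_cloud_project_containerregistry.registry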

@manics
Member

manics commented Nov 15, 2022

I'm happy to help/look, could I have access please?

@minrk
Member Author

minrk commented Nov 16, 2022

@manics You have to create an account, then send your user id (rm111111-ovh) to Sylvain, and he can add you to the project.

@consideRatio
Member

@minrk I've deployed prometheus 16.0.0 now, but not on ovh2 or similar.

There is a manual step you need to take: before upgrading to the currently latest mybinder chart, first scale the prometheus server replicas to 0 and delete the node-exporter daemonset.

See #2419; practically, these commands or similar, adjusted for name and namespace:

    kubectl scale deploy prometheus-server --replicas=0
    kubectl delete ds prometheus-node-exporter
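
The same steps with explicit namespaces, as a sketch (the prometheus namespace name is an assumption, adjust to the actual deployment):

    # scale the old server down so the upgraded one doesn't contend with it
    kubectl -n prometheus scale deploy prometheus-server --replicas=0
    # daemonset selectors are immutable, so the chart's new labels mean the
    # old daemonset has to be deleted rather than upgraded in place
    kubectl -n prometheus delete ds prometheus-node-exporter
    # then: helm upgrade to the latest mybinder chart as usual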

@rmorshea

rmorshea commented Dec 23, 2022

Unsure if this is related, but I find that some widgets fail when my repos are deployed via https://ovh2.mybinder.org even though they work fine when deployed via https://mybinder.org/. Is this expected at the moment?

@choldgraf
Member

What's the status on this one? Is there anything we should try to complete in this effort as part of shutting down GKE?

@minrk
Copy link
Member Author

minrk commented Apr 19, 2023

OVH2 is fully operational and taking ~15% of traffic right now (I don't understand why our load-balancing doesn't bring that closer to the ~33% its stated capacity ought to produce). I think this can be closed in favor of more specific reliability issues like #2514.

minrk closed this as completed Apr 19, 2023