Automate cluster setup for Azure #512

Closed
5 tasks done
yuvipanda opened this issue Jul 13, 2021 · 9 comments · Fixed by #513
@yuvipanda
Member

yuvipanda commented Jul 13, 2021

Description

We should write terraform code to programmatically deploy hubs on Azure infrastructure, similar to our AWS and GCP terraform deployments. We already have two hubs running on Azure (#288 and #413) that we can use as reference.

Benefit

Azure is probably the least common cloud that our communities want to run infrastructure on. However, it's still one of "the big three" and will certainly be requested again in the future. Moreover, since we're already running two hubs on Azure, a terraform deployment would make those hubs easier to update and maintain.

Implementation

Here is the terraform code for the UToronto deployment. We should clean this up, generalize it, and move it to this repository.

We should try to use AzureFile if possible so that we don't have to hassle with setting up our own filesystems on Azure.

Steps to complete this goal

@yuvipanda
Member Author

/cc @sgibson91 who has a lot of Azure experience!

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jul 13, 2021
- Prefixes in GCP keep resources separate from each other.
  Azure has the native concept of a ResourceGroup for the same,
  so we do not need prefixes.
- Overall structure tries to be as close to GCP as possible,
  but Azure specific naming is used when necessary. For example,
  Azure uses the word `vm_size` where GCP uses `machine_type`,
  and we use `vm_size` here.
- Set up notebook and dask nodes with the same labels and taints as
  we do on GCP. We can reuse the same logic in our deployers
  this way.
- Remove the NFS server + ansible setup. Similar to EFS, we
  should try to use https://azure.microsoft.com/en-in/services/storage/files/
  on Azure.

Ref 2i2c-org#512
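
For illustration, the structure described in the commit message above might look roughly like the following Terraform. This is only a sketch and not the code from the referenced commits: the variable names, the cluster and pool names, the default VM sizes, and the specific label and taint values (taken from common JupyterHub and Dask conventions) are all assumptions, and some argument names differ between `azurerm` provider versions.

```hcl
# Hypothetical sketch: every name and default below is illustrative.
provider "azurerm" {
  features {}
}

variable "resourcegroup_name" {
  type = string
}

variable "location" {
  type = string
}

# Azure calls this "vm_size" where GCP calls it "machine_type".
variable "notebook_vm_size" {
  type    = string
  default = "Standard_E4s_v3" # assumed default
}

variable "dask_vm_size" {
  type    = string
  default = "Standard_E4s_v3" # assumed default
}

# The resource group keeps one deployment's resources separate from
# others, so no name prefix is needed as on GCP.
resource "azurerm_resource_group" "hub" {
  name     = var.resourcegroup_name
  location = var.location
}

resource "azurerm_kubernetes_cluster" "hub" {
  name                = "hub-cluster"
  location            = azurerm_resource_group.hub.location
  resource_group_name = azurerm_resource_group.hub.name
  dns_prefix          = "hub-cluster"

  default_node_pool {
    name       = "core"
    vm_size    = "Standard_E4s_v3"
    node_count = 1
  }

  identity {
    type = "SystemAssigned"
  }
}

# Notebook nodes: label and taint values are assumed, following the
# convention used in the Zero to JupyterHub documentation.
resource "azurerm_kubernetes_cluster_node_pool" "notebook" {
  name                  = "notebook"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.hub.id
  vm_size               = var.notebook_vm_size
  node_count            = 1

  node_labels = {
    "hub.jupyter.org/node-purpose" = "user"
  }
  node_taints = ["hub.jupyter.org_dedicated=user:NoSchedule"]
}

# Dask worker nodes: label and taint values are likewise assumed.
resource "azurerm_kubernetes_cluster_node_pool" "dask" {
  name                  = "dask"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.hub.id
  vm_size               = var.dask_vm_size
  node_count            = 1

  node_labels = {
    "k8s.dask.org/node-purpose" = "worker"
  }
  node_taints = ["k8s.dask.org_dedicated=worker:NoSchedule"]
}
```

Keeping the labels and taints identical across clouds is what would let the deployer logic stay cloud-agnostic. Storage via Azure Files is left out of this sketch.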
@yuvipanda
Member Author

@sgibson91 do you have any experience with Azure Files? I am hoping we can use that the way we use EFS on AWS

@sgibson91
Member

A small amount, nothing that ever managed to get into production though.

This issue kinda devolved into security of the data in the volume and authentication methods, i.e., using the Storage Access Key probably isn't the best move as it has full read/write access. But I also don't think any of the proposed solutions actually work for a Kubernetes deployment (they came from the context of a Data Safe Haven which is a bunch of VMs not controlled by k8s). So I'm just flagging that in case you want to factor it into your considerations.

@choldgraf choldgraf changed the title Add Azure support to our terraform code Automate deploying clusters on Azure Aug 31, 2021
@choldgraf choldgraf changed the title Automate deploying clusters on Azure Automate cluster setup for Azure Aug 31, 2021
@choldgraf
Member

Could we keep this one open until our azure cluster setup is documented? We have a placeholder page for this here: https://pilot-hubs.2i2c.org/en/latest/howto/operate/new-cluster/azure.html

Doesn't need to be complex since we're using terraform. I wonder if we could just reuse much of the instructions here: https://pilot-hubs.2i2c.org/en/latest/howto/operate/new-cluster/google-cloud.html ?

@choldgraf choldgraf reopened this Sep 8, 2021
@yuvipanda
Member Author

Yep, that PR should've closed #373 not this one! Sorry for not verifying that.

@choldgraf
Member

No problem 🙂

@choldgraf
Member

Next steps on this one: @yuvipanda and @sgibson91 will collaborate on some rough documentation for this, and we can refine it in the future when we need to deploy a hub on Azure.

@choldgraf
Member

I'm removing this from our backlog and replacing it with something a bit more focused on the documentation, since that's the only remaining step: #718

We can close this one when that is done as well

@sgibson91
Member

#718 is merged so I believe this can be closed now


3 participants