Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor GCP terraform code + add MOEM-IGE hub #429

Merged
merged 27 commits into from
Jun 24, 2021

Conversation

yuvipanda
Copy link
Member

@yuvipanda yuvipanda commented May 25, 2021

  • Setup the cluster with the terraform google provider, instead
    of the higher level gke module. The code gets simpler, and
    makes more terraform features (like for_each) accessible more easily.
  • Allow multiple notebook and dask nodepools to be set up. Most
    research hubs want 2-3 options of notebook sizes to optimize for
    spend. I attempted to use gke node autoprovisioning instead of
    requiring manual nodepool provisioning, but it consistently
    provisioned nodes bigger than required. We should re-evaluate it
    later.
  • Expose the GCP SA used by the k8s nodes to the user pods. A highly
    restricted SA is used for this, to prevent damage as much as possible.
    Users can then make requests to GCS buckets in other
    projects on behalf of this project.
  • Dask Nodepools will default to matching sizes of the notebook
    nodepools. Can be overriden if necessary.
  • Split terraform code into multiple files for easier maintenance
  • Move tfvars files into a subdirectory and split terraform code
    into multiple files for easier maintenance
  • Remove unused terraform variables
  • Add some inline terraform docs
  • Setup MOEM-IGE cluster + hub with new terraform code
  • traefik tag bump was required to get LE working. It's
    already bumped in newer z2jh versions
  • NFS server was again set up manually, and needed the insecure
    flag - even though other hubs are setup the same way and didn't
    need this. NFS situation needs to be sorted.

Ref #207

TODO:

@damianavila
Copy link
Contributor

I think I am confused here... or I am missing some context.

There is a Grenoble hub already in some of the existing cluster: https://github.com/2i2c-org/pilot-hubs/blob/master/config/hubs/2i2c.cluster.yaml#L305

Why we are deploying a new cluster with the meom name and a new hub there?

@yuvipanda
Copy link
Member Author

I just added a little more context in #207 (comment). Basically they got their own project and we're moving the cluster there.

@damianavila
Copy link
Contributor

Thanks for the clarification and the additional info @yuvipanda!

terraform/meom.tfvars Outdated Show resolved Hide resolved
@damianavila
Copy link
Contributor

needed the insecure flag

Mmm... that is weird... maybe the setup was slightly different in comparison with the previous NFS server?

@yuvipanda
Copy link
Member Author

Mmm... that is weird... maybe the setup was slightly different in comparison with the previous NFS server?

Yeah, I spent many hours trying to figure out what could possibly be different but couldn't :(

terraform/meom.tfvars Outdated Show resolved Hide resolved
terraform/meom.tfvars Outdated Show resolved Hide resolved
@yuvipanda yuvipanda changed the title Add hub for MOEM-IGE group Refactor GCP terraform code + add MOEM-IGE hub May 30, 2021
@yuvipanda
Copy link
Member Author

yuvipanda commented May 30, 2021

I did a major refactor, added inline docs, and updated the PR description :) I've deployed this too.

terraform/main.tf Outdated Show resolved Hide resolved
terraform/main.tf Outdated Show resolved Hide resolved
terraform/main.tf Outdated Show resolved Hide resolved
terraform/main.tf Outdated Show resolved Hide resolved
terraform/main.tf Outdated Show resolved Hide resolved
@damianavila
Copy link
Contributor

I like what I saw here... the splitting helps to understand the structure and should be easier to maintain in the future.
Left some tiny comments, but none of them should be a blocker. We should be able to merge this one soon.

terraform/main.tf Outdated Show resolved Hide resolved
Calls terraform-docs dynamically from sphinx
- Remove terraform-specific dir, not enough docs there.
  Autogenerated reference docs will be moved to a different
  PR as I can't seem to get conda to work
Can't seem to get conda to work on RTD for now
Leftover from environment.yml conversion
@yuvipanda
Copy link
Member Author

Ok, I added docs as well and addressed your comments, @damianavila!

Long-term, (2) is the appropriate way to do this for everyone. However, it affects the size
of the core node pool, since it runs some components in the cluster. For now, we use (1) for
single-tenant clusters, and (2) for multi-tenant clusters. If nobody wants a scratch GCS bucket,
neither option is required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last sentence suggests, IMHO, that we can disable the config connector (2) and the matadata concealment (1), but AFAIK, the option (1) is always available even if we do not want a scratch bucket, right?

Copy link
Contributor

@damianavila damianavila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is getting big, and what we currently have is in good shape.
I think we should merge this one ASAP and, eventually iterate on leftovers.
Thanks for all the efforts in the docs and explanatory comments, @yuvipanda.

@yuvipanda
Copy link
Member Author

yay, thank you, @damianavila :)

Copy link
Member

@sgibson91 sgibson91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just spotted a couple of typos

.readthedocs.yaml Outdated Show resolved Hide resolved
docs/topic/cluster-design.md Outdated Show resolved Hide resolved
yuvipanda and others added 2 commits June 23, 2021 22:04
Co-authored-by: Sarah Gibson <44771837+sgibson91@users.noreply.github.com>
Co-authored-by: Sarah Gibson <44771837+sgibson91@users.noreply.github.com>
@yuvipanda
Copy link
Member Author

w00t, tyvm @damianavila and @sgibson91!

This wasn't working earlier because of a typo in
.readthedocs.yml that I didn't spot

This reverts commit c77ef3e.
Since conda seems to now work with RTD

This reverts commit 70b39e8.
--output requires a newer version of terraform-docs on
conda-forge
(conda-forge/go-terraform-docs-feedstock#25).
@yuvipanda
Copy link
Member Author

Thanks to @sgibson91's spot I managed to get conda working on RTD, but need a newer version of terraform-docs. I opened conda-forge/go-terraform-docs-feedstock#25, and we can autogenerate terraform reference docs after that is merged!

@yuvipanda yuvipanda merged commit d8ae087 into 2i2c-org:master Jun 24, 2021
@damianavila
Copy link
Contributor

I opened conda-forge/go-terraform-docs-feedstock#25, and we can autogenerate terraform reference docs after that is merged!

It was already merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants