[New Hub] Allen Institute Summer Workshop on the Dynamic Brain #1440

Closed
7 of 9 tasks
colliand opened this issue Jun 20, 2022 · 147 comments
@colliand
Contributor

colliand commented Jun 20, 2022

Hub Description

The Allen Institute requests a hub for their upcoming Summer Workshop on the Dynamic Brain

Community Representative(s)

@mabuice
@fcollman

Important dates

Note: the target and required dates are outdated; we need to update them according to new information and prioritization.

  • Target start date: 2022-07-31
  • Required start date: 2022-08-05
  • Any important dates for usage: The event runs 2022-08-20 through 2022-09-04.

Hub Authentication Type

GitHub Authentication (e.g., @MyGitHubHandle)

Hub logo information

Hub user image

Extra features you'd like to enable

  • Specific cloud provider or datacenter: AWS
  • Dedicated Kubernetes cluster
  • Scalable Dask Cluster

Other relevant information

No response

Hub URL

allen-swdb.2i2c.cloud

Hub Type

daskhub

Tasks to deploy the hub

  • Engineer who will deploy the hub is assigned
  • Deploy information filled in above
  • Initial Hub deployment PR: Add allen-swdb hub #1585
  • Administrators able to log on
  • Community Representative satisfied with hub environment
  • Hub now in steady-state
@GeorgianaElena GeorgianaElena changed the title from "[New Hub]" to "[New Hub] Allen Institute Summer Workshop on the Dynamic Brain" Jun 21, 2022
@damianavila damianavila moved this to Needs Shaping / Refinement in DEPRECATED Engineering and Product Backlog Jun 21, 2022
@damianavila
Contributor

@colliand request ack. I have added the issue to our backlog board, and we will prioritize the hub deployment according to available engineering resources so we can deploy the hub in a timely manner.

@mabuice, @fcollman, we will ping you soon with some questions about the specifics of the hub deployment.

@damianavila damianavila removed their assignment Jun 21, 2022
@fcollman

Where can I read instructions about the requirements for the course Docker image (i.e., what should be running on which ports, etc.)?

Will plan on putting materials and Dockerfile in this repo (just a stub for now)
https://github.com/AllenInstitute/swdb_2022

@colliand
Contributor Author

Hi Forrest! This and related pages in 2i2c's docs may help. Our engineers will likely have better pointers soon. https://docs.2i2c.org/en/latest/admin/howto/environment/index.html

@sgibson91
Member

@fcollman The best thing to do is to fork this repo: https://github.com/2i2c-org/hub-user-image-template and follow the instructions Jim linked to set up your environment. There are also docs on using nbgitpuller to distribute materials: https://docs.2i2c.org/en/latest/admin/howto/content.html
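
For reference, here is a minimal sketch of how an nbgitpuller link for the course materials might be built. The hub URL and content repo are the ones named in this thread; the helper name `nbgitpuller_link` and the `urlpath` value are assumptions for illustration, so adjust them to the real setup.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo_url: str, branch: str = "main",
                     urlpath: str = "") -> str:
    """Build a link that asks nbgitpuller on `hub_url` to pull `repo_url`."""
    params = {"repo": repo_url, "branch": branch}
    if urlpath:
        # Where to send the user after the pull, e.g. a JupyterLab folder view.
        params["urlpath"] = urlpath
    # `user-redirect` lets JupyterHub route each user to their own server.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


if __name__ == "__main__":
    link = nbgitpuller_link(
        hub_url="https://allen-swdb.2i2c.cloud",
        repo_url="https://github.com/AllenInstitute/swdb_2022",
        branch="main",
        # Assumption: open the pulled folder in JupyterLab after the pull.
        urlpath="lab/tree/swdb_2022/",
    )
    print(link)
```

Students who click such a link are asked to log in, and the repo is then pulled into their home directory before they land on the requested view.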

@colliand
Contributor Author

Following a suggestion from @mabuice, I am adding Saskia de Vries @saskiad to this issue thread and related email chains. Welcome Saskia!

@mabuice

mabuice commented Jul 22, 2022

Adding @aamster and @morriscb, representatives from the Allen Institute Technology team to this thread.

@mabuice

mabuice commented Jul 22, 2022

Hub repo is here: https://github.com/AllenInstitute/swdb_2022_hub_image

(I have not yet gone through the steps in the instructions other than creating this repo.)

Content repo is here: https://github.com/alleninstitute/swdb_2022
The content repo contains the necessary splash logo:
https://github.com/AllenInstitute/swdb_2022/blob/main/resources/cropped-SummerWorkshop_Header.png

If I understand correctly, the above hub_image is all we need to get something off the ground, and we can continue to configure and modify it as we go forward.

We have an AWS account with an associated domain name that we'll be using for this. We will serve data to students via an S3 bucket through this account, so whatever is necessary on the configuration end to facilitate this will need to be done (hence connecting @aamster and @morriscb, who will be preparing the data on our side).

From reading through the docs I understand (more or less) how to configure the hub image and how to prepare content (although we'll likely have questions as we go through this process). I'm unclear on how persistent storage for the students works and how we configure that (or if that's just part of the magic that happens in the background).

@mabuice

mabuice commented Jul 25, 2022

Alright, I've followed the instructions on this page: https://docs.2i2c.org/en/latest/admin/howto/environment/hub-user-image-template-guide.html#hub-user-image-template-guide-how-to up to step 6, which unless I've misunderstood something is as far as I can go.

@mabuice

mabuice commented Jul 26, 2022

@damianavila Are there any further steps you are waiting on from us? When is a reasonable timeline to expect an initial hub to be up and running?

@damianavila
Contributor

@mabuice, this new hub request is currently in the pipeline of new hubs to be deployed.
I am trying to secure resources to make this happen as soon as possible.

According to the initial request, the event is happening at the end of August. Can you confirm?
Can you also clarify how much time you might need to interact with the hub and get ready? Is a few days before the event enough?

Btw, I will update the dates at the top of the issue to reflect reality as soon as I get more clarity about your hub availability needs. Thanks!!

@mabuice

mabuice commented Jul 26, 2022

The event itself starts August 20th. We were hoping to have the initial hub up and running by the end of July so that we can iterate on the environment and test the various parts we need to incorporate (data from AWS, MySQL server for one dataset, etc.) as well as making sure we can get TAs and staff comfortable with administering the environment and fix our course materials beforehand. We would like as much time for that as you can give us. This is why I threw together the basic hub linked above.

@colliand
Contributor Author

I updated the Required start date in the top entry of the issue to 2022-07-31.

@damianavila
Contributor

@mabuice, thanks for the additional information!
I have assigned the task to @yuvipanda who is the engineer who is going to deploy the new hub.

> I updated the Required start date in the top entry of the issue to 2022-07-31.

I moved that date to be the target date and added a few more days to the required date.
Realistically, I think we are going to be close to the target date, but I do not want to promise something if there is a chance of delays. Given that today is July 27th, I think adding a few buffer days sets the right expectations.

> We have an AWS account with an associated domain name that we'll be using for this.

@mabuice, can you provide access to @yuvipanda? Thanks!!
Ref: https://infrastructure.2i2c.org/en/latest/topic/cloud-auth.html#access-individual-aws-accounts.

@mabuice

mabuice commented Jul 27, 2022

For hooking up the right AWS account I’m going to connect @yuvipanda with @aamster and @morriscb.

@yuvipanda
Member

ty, @mabuice! I'm at yuvipanda@2i2c.org for an invite.

@aamster

aamster commented Jul 28, 2022

@yuvipanda I just granted you access. You should have received an email. Let me know if you have what you need.

@yuvipanda
Member

@aamster can confirm I have access! \o/

I'll also grant access to the other 2i2c engineers shortly. Hub should be up in a day or two. Thanks!

@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Jul 28, 2022
@yuvipanda
Member

I'm working on this now!

@yuvipanda
Member

I'm setting this up in us-west-2 (Oregon) now. I had to delete the existing empty VPC (swdb-2020-vpc) to make room for this, as we were at the limit of 5 VPCs.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 2, 2022
- Also add a generate-cluster command to our deployer that generates
  the jsonnet & tfvars files from a template

Ref 2i2c-org#1440
@mabuice

mabuice commented Aug 19, 2022

Might need cudatoolkit==11 on the dot. The K80 might not be compatible with later versions.

But let me try explicitly specifying pytorch-gpu...

@yuvipanda
Member

ok, the GPU profile should now only be available to people with the GPU team on the staging hub. Can you get someone else to test it too? if it works I'll deploy it to production.

@mabuice

mabuice commented Aug 19, 2022

will do.

@mabuice

mabuice commented Aug 19, 2022

pytorch -> pytorch-gpu fixes the issue, even with cudatoolkit==11.

Solved.

@mabuice

mabuice commented Aug 19, 2022

@yuvipanda: @fcollman can see the gpu instance on the staging hub.

@yuvipanda
Member

@mabuice I see the GPU profile! but same as before with the image - I can see nvidia-smi show the GPU, but pytorch doesn't see it. I am using the latest tag available in https://quay.io/repository/mabuice/swdb2022?tab=tags which is a5fb5c42540b. I see that you have admin now, so you should be able to set the image for staging too at https://staging.allen-swdb.2i2c.cloud/services/configurator/.

@mabuice

mabuice commented Aug 20, 2022

The current image with explicitly installed pytorch-gpu has torch.cuda.is_available()==True. We're good on that end. Tested on staging.

@yuvipanda Looks to me like we can update the main Hub.
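
For anyone repeating this test, here is a rough sketch of the two checks discussed above: whether `nvidia-smi` sees the GPU, and whether PyTorch does. The function name `gpu_report` is made up for illustration. One common cause of the mismatch described earlier is a CPU-only PyTorch build, though the thread does not confirm that was the cause here.

```python
import shutil
import subprocess


def gpu_report() -> dict:
    """Report whether the driver and PyTorch each see a GPU."""
    report = {"nvidia_smi": False, "torch_installed": False, "torch_cuda": False}

    # Driver-level check: nvidia-smi exits with 0 when it can talk to a GPU.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        report["nvidia_smi"] = result.returncode == 0

    # Framework-level check: a CPU-only PyTorch build reports False here
    # even when nvidia-smi shows a GPU.
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_cuda"] = bool(torch.cuda.is_available())
    if report["torch_cuda"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(gpu_report())
```

If `nvidia_smi` is true but `torch_cuda` is false, the installed PyTorch package is the first thing to look at.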

@yuvipanda
Member

@mabuice done!

image

@mabuice

mabuice commented Aug 22, 2022

@yuvipanda Is the user of a given instance on the sudoer list? Is there a sudo password?

@yuvipanda
Member

@mabuice there's no sudo enabled in the container. What is the use case you're trying to solve?

@mabuice

mabuice commented Aug 22, 2022

The current use case is that one of our more technically minded TAs asked me if we had sudo access, so I said I didn’t think so, but I’d ask. :).

@yuvipanda
Member

@mabuice aaah, cool :) Almost everything they could do with sudo access, they can do by modifying the image via PRs!

How's the event going?

@mabuice

mabuice commented Aug 22, 2022

That’s kinda what I figured. 😀

Things are going pretty well. We’ll see how it goes when things ramp up later in the week.

@yuvipanda
Member

@mabuice just wanted to check-in - how is it going? :)

@fcollman

It's going well! We have had no major issues in the introduction portion of the dataset, and we even used a couple of nbgitpuller links live during the course with relatively few issues.

We do seem to sporadically run into edge cases that I don't totally understand, where people aren't able to complete the nbgitpuller pull, and we revert to renaming their folders and pulling a fresh copy. But that has been the minority of the experience.
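
A rough sketch of the fallback described above, in case it helps TAs: move the existing folder aside under a timestamped name so the next nbgitpuller click creates a fresh copy. The helper name `set_aside` and the folder name `swdb_2022` are assumptions; the real folder is whatever the pull created in the student's home directory.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def set_aside(folder: Path) -> Optional[Path]:
    """Rename `folder` to a timestamped backup; return the new path, or None if absent."""
    if not folder.exists():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = folder.with_name(f"{folder.name}-backup-{stamp}")
    # Avoid clobbering an earlier backup made within the same second.
    n = 1
    while backup.exists():
        backup = folder.with_name(f"{folder.name}-backup-{stamp}-{n}")
        n += 1
    folder.rename(backup)
    return backup


if __name__ == "__main__":
    # Assumption: the materials were pulled into ~/swdb_2022.
    moved = set_aside(Path.home() / "swdb_2022")
    print(f"Moved to {moved}" if moved else "Nothing to move")
```

The student's edits stay in the backup folder, so nothing is lost, and they can copy notebooks back after the fresh pull.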

@yuvipanda
Member

@fcollman if you can report the errors they get in https://github.com/jupyterhub/nbgitpuller/, that would be helpful!

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 26, 2022
- Also add a generate-cluster command to our deployer that generates
  the jsonnet & tfvars files from a template

Ref 2i2c-org#1440
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 29, 2022
- Also add a generate-cluster command to our deployer that generates
  the jsonnet & tfvars files from a template

Ref 2i2c-org#1440
@yuvipanda
Member

@fcollman @mabuice how did this go?

@colliand
Contributor Author

This hub can be decommissioned at the end of September. FYI @damianavila .

@mabuice

mabuice commented Sep 20, 2022

@yuvipanda It went very well! Thank you for your help!

@damianavila
Contributor

Thanks @mabuice for your feedback!

Keeping this issue open until we merge: #1585.

> This hub can be decommissioned at the end of September. FYI @damianavila .

Btw, I have created a decommission issue over here: #1722.

@damianavila damianavila moved this from In progress to Waiting in DEPRECATED Engineering and Product Backlog Sep 20, 2022
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Sep 30, 2022
- Also add a generate-cluster command to our deployer that generates
  the jsonnet & tfvars files from a template

Ref 2i2c-org#1440
@choldgraf
Member

I'm removing the "due date" for this issue because we've now deployed this hub. I believe that we can also close this issue (especially once #1722 is complete) but will leave that to others to decide since you have more context than I do.

@yuvipanda
Member

This is done!

Repository owner moved this from Waiting to Complete in DEPRECATED Engineering and Product Backlog Oct 4, 2022
9 participants