[New Hub] LEAP Pangeo #1050
Comments
Hey all - I put down some details for the new LEAP Pangeo hub that we're deploying for @rabernat. I think that we need to clarify some of the information above in order to know what kind of environment / hardware to set up. @rabernat could you take a look at the questions in the top comment and resolve them w/ answers or discussion? |
@rabernat ok I've deployed a standard Dask-based hub at https://leap.2i2c.cloud! It's configured to be similar to the Pangeo hub. Next steps:
I'll work on adding GPUs as well. |
Hi Folks! This is awesome. Sorry for not responding earlier to this issue. Somehow I missed the notification.
Just me for now. May add others later.
👍
We would like to use the latest image from https://github.com/pangeo-data/pangeo-docker-images/tags, currently at 2022.02.04. However, that is probably not possible due to #1031, which is preventing us from updating to the latest image due to dask gateway incompatibilities. We also want all of the different machine types to have the option to launch the ML version of the image. However, the ML notebook image has a small 🐛 right now (see pangeo-data/pangeo-docker-images#294). We would like to add a larger machine type, something equivalent to e2-standard-8, with 8 vCPUs and 32 GB memory. We will need an option to attach a GPU. I am not sure which one and would appreciate a rundown of the options / costs. Going forward, it would be great to be able to select any of the tags from a dropdown, as part of a matrix of profile-list spawner options (see jupyterhub/kubespawner#307).
I have created the
GCP us-central1 would probably be ideal, like the other cluster. |
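For illustration, the "matrix of profile-list spawner options" requested above could be flattened into a plain KubeSpawner `profile_list` by taking the product of machine sizes and images. This is only a sketch under stated assumptions: the size names, the "medium" resources, and the Docker image names are hypothetical placeholders, and only the 8 vCPU / 32 GB size and the 2022.02.04 tag come from the comment above.

```python
from itertools import product

# Hypothetical machine sizes. Only the 8 vCPU / 32 GB entry reflects the
# request above; the "medium" entry is an assumption for illustration.
MACHINE_SIZES = {
    "medium": {"cpu_limit": 4, "mem_limit": "16G"},
    "large": {"cpu_limit": 8, "mem_limit": "32G"},
}

# Assumed image names; the tag is the one mentioned in the comment above.
IMAGES = {
    "pangeo-notebook": "pangeo/pangeo-notebook:2022.02.04",
    "ml-notebook": "pangeo/ml-notebook:2022.02.04",
}


def build_profile_list(sizes, images):
    """Return one profile per (machine size, image) combination."""
    profiles = []
    for (size_name, resources), (image_name, image) in product(
        sizes.items(), images.items()
    ):
        profiles.append(
            {
                "display_name": f"{size_name} / {image_name}",
                "slug": f"{size_name}-{image_name}",
                # Traits applied to the spawner when this profile is chosen.
                "kubespawner_override": {"image": image, **resources},
            }
        )
    return profiles


profile_list = build_profile_list(MACHINE_SIZES, IMAGES)
```

In a hub config, the resulting list would presumably be assigned to `c.KubeSpawner.profile_list`. The list grows multiplicatively with every new size or tag, which is part of why a richer options form is discussed in jupyterhub/kubespawner#307.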
@rabernat I've investigated #1031 (comment) and I think it's sorted out. The LEAP hub now has the latest pangeo image. Next things to do:
|
@rabernat I've redone the size options available: These put one user per node as well, which I think is a better fit for research hubs. |
Has access been granted to the Also, where does this hub configuration live? |
I actually just had a new report that it IS working, so I think we are good in terms of authorization. |
So I just got some good feedback from the LEAP Executive Committee about this hub. First off, everyone is very excited and happy to have the hub up! 🎉 So thanks for getting this off the ground. 🙏 Most of the feedback is from PIs who are very experienced at using HPC resources for supporting large research groups. I think these points will be quite universal for large and complex communities like LEAP, so I hope they can stimulate some useful discussion.
Onboarding Tutorials
As specified in our contract, 2i2c will provide onboarding training for the hubs. Question for 2i2c: What is the timeframe and process for organizing these training sessions?
Offboarding
Over the 5-10 years of this project, people will exit the project. We need a sustainable approach to not only onboarding but offboarding. Question for 2i2c: Beyond simply removing their access via the github group, what is the process for offboarding them and specifically purging their user data from storage so we don't continuously accumulate abandoned data?
Tiering of Access
There is a huge range of different types of participants in LEAP and users of LEAP-Pangeo: from high-schoolers who will participate in a hackathon for 1 day to senior faculty who will do cutting edge research over many years. It seems inevitable that we will need different tiers of access. Specifically, we would like to limit certain profile-list options (e.g. GPUs) to certain user groups. Question for 2i2c: is it possible to associate distinct profiles with different user sub-groups?
Metrics and Report
This one will be difficult I think, but I am stating it clearly here: it is important for LEAP to have user-level breakdowns of hub usage and costs. This is what PIs who work on HPC centers are used to and this is what they expect here. Specifically, I would like to do a query for a specific user (e.g. myself
The sum of these individual user costs should roughly add up to the total hub cost. The reason for this is based on the PIs' years of experience on HPC where a small number of users (sometimes maliciously) consume a disproportionate amount of resources. Identifying and diagnosing such situations is imperative. Question for 2i2c: What technical developments are required to deliver this granularity of reporting? What is a reasonable time-frame for implementation? |
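To make the "user costs add up to the total hub cost" requirement concrete, one simple possibility is to split the hub bill in proportion to each user's measured usage. This is a hedged sketch and not a description of any existing 2i2c reporting: the function name, the user names, and the choice of node-hours as the usage metric are all assumptions for illustration.

```python
def apportion_cost(total_cost, usage_by_user):
    """Split total_cost across users in proportion to their usage.

    usage_by_user maps a user name to a usage figure in any single
    consistent unit (node-hours is assumed here).
    """
    total_usage = sum(usage_by_user.values())
    if total_usage == 0:
        # Nothing was used, so no cost can be attributed to anyone.
        return {user: 0.0 for user in usage_by_user}
    return {
        user: total_cost * usage / total_usage
        for user, usage in usage_by_user.items()
    }


# Hypothetical monthly bill and per-user node-hours.
costs = apportion_cost(1000.0, {"alice": 30.0, "bob": 10.0})
```

By construction the per-user figures sum to the total, so shared overhead (idle nodes, core services) is spread across users by usage share rather than reported separately. Whether that is the right accounting choice is exactly the kind of question the reporting issues would need to settle.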
@rabernat Great questions! I opened 2i2c-org/features#8 to discuss offboarding. I'll let @choldgraf speak about some of the other questions. I also know we already have issues wrt reporting elsewhere... |
Hey @rabernat - thanks for these follow-up questions and requests. For some of them there are plans in the works, and others will need more investigation and discussion before moving forward. I'll touch on each below:
Right now, we have a job position open for the person that will spearhead these efforts: https://2i2c.org/jobs/2022/product-community-lead/ . We expect to start reviewing applications in a week or so, and will hire somebody on a rolling basis once we find the right candidate. I expect that process to take another month at least. In the meantime, I wonder how we can have the most impact with low-hanging fruit for the LEAP community. Can we discuss the most important things to focus on in the issues linked below? If there are specific needs that LEAP has right now, we can create focused issues for them.
See the issue below where we're tracking this question. Semi-related: we have these offboarding docs but those are for an entire hub migrating off the service, not for the regular "churn" of users on a hub.
I don't believe this is currently possible in JupyterHub. I looked around in KubeSpawner but didn't find anything about this specifically, so I've opened up the issue below to track and discuss:
We are tracking development efforts to improve reporting / monitoring in these two issues that are both actively under development. I'm not sure what the timeline is on them, but I think we'll be able to track hub-level usage/costs by the end of Q2 or so.
Our current targets are to calculate "usage and costs" at the hub level, and at the user-level focus on "usage" (memory, CPU, etc) rather than calculate costs per-se. Let's discuss this one in those more specific issues? |
I actually think this is possible, it's just not default out-of-the-box and requires custom logic. See @consideRatio's wonderful Discourse post on the topic here: https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449 (I will add this to the related issue too. Edit: Ah, I see it's already been mentioned over there!) |
@sgibson91 good point! Indeed @consideRatio provided some helpful comments there as well. I've opened up a 2i2c issue to track this one, since it seems the change wouldn't be in KubeSpawner but instead would be in our config / deployment: #1120
I believe that we have all major parts of this hub worked out, so once #1074 is merged I think we can close this issue and spot-check more feature improvements or issues in support channels + dev issues. Anybody object to that? |
It was great to read jupyterhub/kubespawner#589 (comment) and @consideRatio's suggestion of how to implement custom spawner logic. It sounds like this is technically feasible for 2i2c today. Based on this I would like to request that 2i2c implement this sort of customized spawner for the LEAP hub. To begin, we would like two tiers:
Having tiered access is very important to the LEAP executive committee. Delivering this feature quickly will be a win for 2i2c in terms of demonstrating ability to be responsive to feature requests, building trust from the LEAP PIs. |
It makes me happy that you found what I've written helpful, @rabernat! @rabernat are If so I think the following issue is of very high relevance to address: jupyterhub/oauthenticator#492. It is about retaining the information captured during authentication about github org/team membership for later use. That could for example be when a user is about to be presented spawn options, which happens at a separate time from login even though the two can follow in quick succession. |
I've opened up an issue to track this action, since it is complex enough that I think it warrants its own description / implementation discussion, etc: Also added it to our project backlog so that we can consider it in the context of the other development efforts we're undertaking. Agreed that having a nice story for this will be impactful for many, and it would be extra useful since LEAP could use this feature right now. |
yes, and both are public:
There is also
|
@rabernat note they don't look to be public to me, I get |
I am checking in to see if there is any progress on the issue of the custom spawner for the LEAP hub? I would like to be able to share an update with the LEAP executive committee. |
@rabernat I am going to start actively working on it this week, and should have an update on how long this might take soon. |
We agreed at the planning meeting this is completed and any follow-ups already have dedicated issues. |
profile_list is now dynamically generated, based on the GH teams the user is a part of. This list of teams is refreshed only during login, so the user needs to log out and log back in to see new teams! This also means that users removed from teams on GH will still have access to the profiles until they are logged out from the admin panel too (to be fixed). This approach is taken over customizing options_form to protect against users just bypassing the options form and using the API directly to spawn servers. Deployed to the LEAP hub, with 'large' & 'huge' available only to leap-stc:leap-pangeo-research members, not to leap-stc:leap-pangeo-users members - based on 2i2c-org#1050 (comment) Fixes 2i2c-org#1146
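A minimal sketch of the approach described above, not the code that was actually deployed: KubeSpawner accepts a callable for `profile_list`, so the list can be filtered per user from team membership stored at login. The `allowed_teams` helper key, the `"teams"` key in auth state, and the 'small' and 'medium' profile names are assumptions for illustration; only 'large', 'huge', and the two team names come from the commit message.

```python
import asyncio

USERS = "leap-stc:leap-pangeo-users"
RESEARCH = "leap-stc:leap-pangeo-research"

# "allowed_teams" is a hypothetical helper key, not a KubeSpawner field.
# It is stripped before the profiles are handed back to the spawner.
PROFILES = [
    {"display_name": "Small", "slug": "small", "allowed_teams": {USERS, RESEARCH}},
    {"display_name": "Medium", "slug": "medium", "allowed_teams": {USERS, RESEARCH}},
    {"display_name": "Large", "slug": "large", "allowed_teams": {RESEARCH}},
    {"display_name": "Huge", "slug": "huge", "allowed_teams": {RESEARCH}},
]


def filter_profiles(profiles, user_teams):
    """Keep only the profiles that at least one of the user's teams may use."""
    user_teams = set(user_teams)
    return [
        {key: value for key, value in profile.items() if key != "allowed_teams"}
        for profile in profiles
        if profile["allowed_teams"] & user_teams
    ]


async def dynamic_profile_list(spawner):
    """Callable form of profile_list, evaluated for each user.

    Assumes the authenticator saved the user's teams in auth state under
    a "teams" key at login time (an assumption, not a documented key).
    """
    auth_state = await spawner.user.get_auth_state() or {}
    return filter_profiles(PROFILES, auth_state.get("teams", []))
```

Because the filtering happens where the profile list itself is produced, a request that skips the options form and names a 'large' or 'huge' profile directly through the API should find no such profile for that user. It also shows why a logout is needed after a team change: the team list is only as fresh as the stored auth state.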
Hub Description
LEAP Pangeo is an extension of the Pangeo project to new communities around research and education with Machine Learning. The hub's environment will be nearly identical to the Pangeo Hubs, and run on GKE, though the setup might be slightly different and we should get clarifications from @rabernat.
Community Representative(s)
@rabernat
Not sure if there are others serving as leads on the project.
Important dates
Hub Authentication Type
Other (may not be possible, please specify in comments)
Hub logo information
TODO: @rabernat does this look correct?
Hub user image
TODO: @rabernat can you advise here? Is this the Pangeo user image?
Extra features you'd like to enable
TODO: @rabernat does it need to be in a specific data center?
Other relevant information
There is a GCP billing account with credits for this hub. It is under the 2i2c.org GCP organization. Here are the details:
Hub URL
leap.pangeo.2i2c.cloud
Hub Type
daskhub
Tasks to deploy the hub
Follow-up issues