
Overloaded dask gateway fix #1777

Merged 6 commits into develop on May 4, 2023

Conversation

@Adam-D-Lewis (Member) commented May 3, 2023

Reference Issues or PRs

Closes #1750.

What does this implement/fix?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

Any other comments?

@Adam-D-Lewis (Member, Author)

I spun up 50 clusters in 30 seconds with the following code (after the nodes were already spun up):

%%time
import concurrent.futures

# define the function to run in each thread: create a new Dask Gateway cluster
def my_function(num):
    from dask_gateway import Gateway
    gateway = Gateway()
    return gateway.new_cluster()

# create a thread pool with 50 threads
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    # submit the function to the thread pool 50 times
    results = [executor.submit(my_function, i) for i in range(50)]

    # retrieve the results of the function calls as they complete
    for i, future in enumerate(concurrent.futures.as_completed(results)):
        print(i + 1, future.result())

@dharhas (Member) commented May 3, 2023

@Adam-D-Lewis can you test getting options from Dask Gateway as well? Also, run those scaling tests on the existing Nebari cluster and see if they fail.

@Adam-D-Lewis (Member, Author) commented May 3, 2023

@dharhas I did both of those things. The test fails on nebari.quansight.dev, and options worked as expected.

@pavithraes pavithraes added type: enhancement 💅🏼 New feature or request area: integration/Dask Issues related to Dask on QHub status: in progress 🏗 This task is currently being worked on labels May 3, 2023
@Adam-D-Lewis (Member, Author) commented May 3, 2023

I can probably fix it so it works with self-signed certs by using the k8s service endpoint (e.g. curl -kL http://nebari-conda-store-server.dev.svc:5000/conda-store/api/v1/environment -H "Authorization: Bearer <token>")
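For illustration, the same request can be sketched from Python, assuming the in-cluster service name, port, and endpoint path from the curl example above (the helper names here are hypothetical, not part of this PR):

```python
import json
import ssl
import urllib.request

# Hypothetical helper: build the conda-store API request against the
# in-cluster k8s service endpoint (service name/port from the curl example).
def build_request(token, base="http://nebari-conda-store-server.dev.svc:5000"):
    return urllib.request.Request(
        f"{base}/conda-store/api/v1/environment",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_environments(token):
    # Skip certificate verification, mirroring curl's -k flag; this is only
    # reasonable for in-cluster traffic to a self-signed endpoint.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(build_request(token), context=ctx) as resp:
        return json.load(resp)
```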

@iameskild (Member) left a comment


I tested this and everything looks good to me! I had to use ProcessPoolExecutor instead of:

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:

Either way, I was able to connect to 50 dask schedulers within a matter of seconds 🎉
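For reference, the ProcessPoolExecutor variant differs only in the executor class, but the submitted function must be defined at module top level so it can be pickled and sent to worker processes. A minimal runnable sketch, with a cheap stand-in for gateway.new_cluster() so it runs without a gateway server:

```python
from concurrent.futures import ProcessPoolExecutor

# Must be a top-level function: ProcessPoolExecutor pickles it to send to
# worker processes. In the real test this body would create a dask_gateway
# cluster; a string stand-in keeps the sketch runnable anywhere.
def start_cluster(num):
    return f"cluster-{num}"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as executor:
        # map preserves submission order, unlike as_completed
        results = list(executor.map(start_cluster, range(50)))
    print(len(results), results[0])  # 50 cluster-0
```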

@rsignell-usgs (Contributor) commented May 4, 2023

This is awesome! @kcpevey let's test this out on the ESIP Nebari training deployment at https://nebari-workshop.esipfed.org as soon as it's ready. I've added you as an admin on the private repo: https://github.com/ESIPFed/nebari-workshop

It looks like we need to get @amsnyder to push the deployment as she was the one who configured and deployed this! Let's continue this discussion on ESIP slack...

@Adam-D-Lewis Adam-D-Lewis merged commit fc18977 into develop May 4, 2023
@Adam-D-Lewis Adam-D-Lewis deleted the overloaded_dask_gateway_fix branch May 4, 2023 14:03
@amsnyder commented May 5, 2023

We would like to push these updates to our deployment so that we can use them in a workshop we are running on May 10th.

Would you be able to create a release with these updates so that we can pull them?

@iameskild (Member)

Hi @amsnyder, we're working on getting a new release out either today or early next week :)

@amsnyder commented May 5, 2023

Thanks @iameskild! If the release is delayed, is it possible for us to just update our images from:

default_images:
  jupyterhub: quay.io/nebari/nebari-jupyterhub:2023.4.1
  jupyterlab: quay.io/nebari/nebari-jupyterlab:2023.4.1
  dask_worker: quay.io/nebari/nebari-dask-worker:2023.4.1

to:

default_images:
  jupyterhub: quay.io/nebari/nebari-jupyterhub:main
  jupyterlab: quay.io/nebari/nebari-jupyterlab:main
  dask_worker: quay.io/nebari/nebari-dask-worker:main

@Adam-D-Lewis (Member, Author)

Thanks @iameskild! If the release is delayed, is it possible for us to just update our images from:

Unfortunately, I don't think that will work, since conda-store needs to add a token for Dask Gateway, which is only done during the nebari deploy step with the latest version of Nebari. You may be able to pip install the latest version of the develop branch of nebari (as shown in https://stackoverflow.com/questions/20101834/pip-install-from-git-repo-branch) if needed, and deploy with this fix. I think that would work, but you may encounter other unforeseen issues doing that.

@dharhas (Member) commented May 5, 2023

Is there anything holding back the release at this point?

@kcpevey (Contributor) commented May 5, 2023

No blockers. We're hoping to get it out today 🤞

@pavithraes pavithraes added status: approved 💪🏾 This PR has been reviewed and approved for merge and removed status: in progress 🏗 This task is currently being worked on labels May 8, 2023
Successfully merging this pull request may close these issues.

Potential issues when using Dask-Gateway with multiple simultaneous users
7 participants