Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short-circuit root-ish check for many deps #5113

Merged
merged 1 commit into from
Jul 24, 2021

Conversation

gjoseph92
Copy link
Collaborator

While looking into #5083 I happened to notice that the dashboard felt very sluggish. I profiled with py-spy and discovered that the scheduler was spending 20% of runtime calculaing sum(map(len, group._dependencies)) < 5! A quick print statement showed some task groups depended on 25,728 other groups (each of size 1). We can easily skip those.

I originally had this conditional in the PR but we removed it for simplicity: #4967 (comment); turns out it was relevant after all!

Screen Shot 2021-07-23 at 3 27 20 PM

FYI @d-v-b this solves your performance regression from #5083.

cc @mrocklin @jrbourbeau

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

While looking into dask#5083 I happened to notice that the dashboard felt very sluggish. I profiled with py-spy and discovered that the scheduler was spending 20% of runtime calculaing `sum(map(len, group._dependencies)) < 5`! A quick print statement showed some task groups depended on 25,728 other groups (each of size 1). We can easily skip those.

I originally had this conditional in dask#4967 but we removed it for simplicity: dask#4967 (comment); turns out it was relevant after all!
@mrocklin mrocklin merged commit 9c30f38 into dask:main Jul 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants