Commit

Short-circuit root-ish check for many deps (#5113)
While looking into #5083 I happened to notice that the dashboard felt very sluggish. I profiled with py-spy and discovered that the scheduler was spending 20% of its runtime calculating `sum(map(len, group._dependencies)) < 5`! A quick print statement showed that some task groups depended on 25,728 other groups (each of size 1). We can easily skip those by checking `len(group._dependencies) < 5` first, so the sum is never computed for groups with many dependencies.

I originally had this conditional in #4967 but we removed it for simplicity: #4967 (comment); turns out it was relevant after all!
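
To illustrate why the extra condition helps, here is a minimal sketch of the short-circuit, not the scheduler's actual code. `FakeGroup`, `check_sum_only`, `check_short_circuit`, and `count_len_calls` are hypothetical names made up for this example; the sketch assumes every dependency group holds at least one task, so a group with 5 or more dependencies can never pass the sum check anyway.

```python
class FakeGroup:
    """Hypothetical stand-in for a task group (not distributed's TaskGroup)."""

    calls = 0  # how many times len() was called on any FakeGroup

    def __init__(self, size, dependencies=()):
        self.size = size
        self.dependencies = set(dependencies)

    def __len__(self):
        FakeGroup.calls += 1
        return self.size


def check_sum_only(group):
    # Old condition: walks every dependency group, however many there are.
    return sum(map(len, group.dependencies)) < 5


def check_short_circuit(group):
    # New condition: len() of the set is O(1); `and` skips the sum
    # entirely when there are 5 or more dependency groups.
    deps = group.dependencies
    return len(deps) < 5 and sum(map(len, deps)) < 5


def count_len_calls(check, group):
    """Run `check` and report (result, number of FakeGroup.__len__ calls)."""
    FakeGroup.calls = 0
    result = check(group)
    return result, FakeGroup.calls


# One group depending on 25,728 single-task groups, as in the message above.
many_deps = FakeGroup(100_000, [FakeGroup(1) for _ in range(25_728)])
print(count_len_calls(check_sum_only, many_deps))
print(count_len_calls(check_short_circuit, many_deps))
```

Both checks reject the group, but the short-circuit version does so without calling `len` on any dependency group.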
gjoseph92 authored Jul 24, 2021
1 parent a1893b1 commit 9c30f38
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions distributed/scheduler.py
@@ -2485,6 +2485,7 @@ def decide_worker(self, ts: TaskState) -> WorkerState:
         if (
             valid_workers is None
             and len(group) > self._total_nthreads * 2
+            and len(group._dependencies) < 5
             and sum(map(len, group._dependencies)) < 5
         ):
             ws: WorkerState = group._last_worker
