Consider candidates that don't hold any dependencies in decide_worker #4925
You're not wrong to be concerned. 12us is a decent chunk of time. It looks like `random.sample` isn't cheap.
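A rough reproduction of the kind of measurement in question (the pool size and sample size here are my assumptions, not the original benchmark):

```python
import random
import timeit

# Hypothetical stand-in for the scheduler's pool of workers.
workers = [f"tcp://127.0.0.1:{port}" for port in range(40000, 41000)]

# How long does sampling 10 of 1000 workers take?
t = timeit.timeit(lambda: random.sample(workers, 10), number=10_000)
print(f"random.sample: {t / 10_000 * 1e6:.1f} us per call")
```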
This might help with efficiency while also keeping things simple?
Some time is spent just working with `sortedcontainers`.

As a side note, it seems `random.sample` won't take a `dict` or its views directly, so this includes coercing to a `list`.
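A small illustration (not the original snippet; this is CPython behavior around the 3.8/3.9 era this PR targeted):

```python
import random

d = {f"worker-{i}": i for i in range(100)}

try:
    random.sample(d.values(), 3)  # dict values views aren't sequences
except TypeError as e:
    print(e)  # "Population must be a sequence or set. ..."

# Coercing to a list first works, but the copy costs time:
print(random.sample(list(d.values()), 3))
```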
SortedValuesView does say lookups are O(log(n)), but it seems like it's more than that.
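One way to see the constant factor (an illustrative microbenchmark, assuming `sortedcontainers` is available, as it is in a dask environment):

```python
import timeit
from sortedcontainers import SortedDict

sd = SortedDict((f"worker-{i:04d}", i) for i in range(1000))
values = sd.values()  # SortedValuesView: indexable, but each lookup pays overhead
plain = list(values)

for name, obj in [("SortedValuesView", values), ("list", plain)]:
    t = timeit.timeit(lambda: obj[500], number=100_000)
    print(f"{name}[500]: {t / 100_000 * 1e9:.0f} ns per lookup")
```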
It takes any indexable, which `sortedcontainers.SortedDict.values` provides. Indexability is the reason why we use `sortedcontainers`.
For comparison:
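The original numbers are missing here; below is a sketch of the kind of comparison being made. The `who_has` structure is my stand-in for the scheduler's, not its real layout:

```python
import random
import timeit

workers = [f"worker-{i}" for i in range(1000)]

# Hypothetical: 100 dependencies, each replicated on a few workers.
who_has = [random.sample(workers, 3) for _ in range(100)]

def who_has_selection():
    # Union of the holders of every dependency.
    return {ws for holders in who_has for ws in holders}

for name, fn in [
    ("who_has selection", who_has_selection),
    ("random.sample", lambda: random.sample(workers, 20)),
]:
    t = timeit.timeit(fn, number=10_000)
    print(f"{name}: {t / 10_000 * 1e6:.1f} us per call")
```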
So just to be clear: the random sample is the thing to try to avoid, not the `who_has` selection. In the uncommon case of many, many dependencies duplicated over many workers, that selection is still on par with the random sample.
So sampling the keys and then using them to look up values is 2x faster with SortedDict, and the same as what we'd get with a plain dict:
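Roughly (my sketch of that approach, not the original timing code):

```python
import random
import timeit
from sortedcontainers import SortedDict

sd = SortedDict((f"worker-{i:04d}", i) for i in range(1000))

def sample_values_directly():
    return random.sample(sd.values(), 10)      # indexes through the sorted view

def sample_keys_then_lookup():
    keys = random.sample(list(sd.keys()), 10)  # sample keys...
    return [sd[k] for k in keys]               # ...then O(1) hash lookups

for fn in (sample_values_directly, sample_keys_then_lookup):
    t = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {t / 10_000 * 1e6:.1f} us per call")
```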
Also, `random.choices` is 2x faster than `random.sample` (makes sense since it doesn't have to hold onto any state); see the comparison below. I'd feel okay about `random.choices`; we'd get slightly fewer extra workers in the pool, but it's probably worth the time savings.
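A sketch of that comparison (again my reconstruction, not the original numbers):

```python
import random
import timeit

pool = [f"worker-{i}" for i in range(1000)]

# choices samples with replacement, so it carries no bookkeeping state;
# sample must track what it has already picked.
for fn in (random.sample, random.choices):
    t = timeit.timeit(lambda: fn(pool, k=10), number=10_000)
    print(f"{fn.__name__}: {t / 10_000 * 1e6:.1f} us per call")
```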
If we're okay with duplicates then we can go much faster.
Iterating values of a SortedDict is slow, but it's fast on a plain dict; see the comparison below.
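For instance (illustrative only):

```python
import timeit
from sortedcontainers import SortedDict

sd = SortedDict((f"worker-{i:04d}", i) for i in range(1000))
d = dict(sd)

for name, values in [("SortedDict", sd.values()), ("dict", d.values())]:
    t = timeit.timeit(lambda: list(values), number=10_000)
    print(f"{name}.values(): {t / 10_000 * 1e6:.1f} us per full iteration")
```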
So we just have to make sure we're passing in `self._workers_dv.values()`, not `self._workers.values()`.

This is decent:
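Something like this sketch, where `workers_dv` and `pick_candidates` are my stand-ins (`workers_dv` playing the role of the scheduler's `self._workers_dv`, the plain-dict mirror of the SortedDict):

```python
import random

def pick_candidates(workers_dv: dict, k: int = 10) -> list:
    # Copy the values view to a list (at plain-dict iteration speed),
    # then sample with replacement.
    return random.choices(list(workers_dv.values()), k=k)
```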
But we can go much faster and avoid a copy with an iterator:
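The original snippet is missing; one no-copy iterator formulation, with exactly the drawback described in the EDIT below, would be to keep each value with probability `k/n` in a single pass:

```python
import random

def pick_candidates_iter(workers_dv: dict, k: int = 10) -> list:
    # One pass over the values iterator, no list copy. Each worker is
    # kept independently with probability k/n, so this returns *about*
    # k items -- and possibly fewer.
    p = k / len(workers_dv)
    return [ws for ws in workers_dv.values() if random.random() < p]
```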
EDIT: The iterator method could return fewer than `k` items though. So particularly if there's one idle worker, that's not ideal. I feel like `random.choices(list(pd.values()), k=10)` is okay for now. If future profiling shows it's too slow, then we could revisit the iterator approach.
It looks like `random.sample` will take `d.keys()`, from playing around with it locally. So we can probably skip the coercion to a `list` in that case. Though maybe we've already moved well beyond this at this point.

I'd probably shy away from getting too clever around random item selection. It's probably more likely to have surprising behavior that we only discover later. Just my 2 cents 🙂
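For example (version-dependent behavior: Python 3.9 deprecates sampling from set-like populations and 3.11 makes it a TypeError):

```python
import random

d = {f"worker-{i}": i for i in range(100)}

# On the Pythons current at the time (<= 3.10), dict keys are set-like
# and random.sample coerces them internally:
print(random.sample(d.keys(), 3))
```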
I think we're beyond that, since we're only taking `dict_values` objects in this function now. Making it able to support both dicts and sets was a bit of a mess and not worth it IMO.

I didn't go with the too-clever itertools option because it did have surprising behavior. I did have to add another conditional though, because `choices` can repeat items: fcb165e. So perhaps even that's too clever?
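The shape of that repeat-handling might be something like the sketch below; the actual change is in fcb165e, and `pool`/`pick_extra_workers` are hypothetical names:

```python
import random

def pick_extra_workers(pool: list, k: int = 10) -> set:
    # random.choices samples with replacement, so deduplicate the
    # repeats -- meaning we may end up with fewer than k unique workers.
    return set(random.choices(pool, k=k))
```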
Now that we have `idle_workers` in this function, should this be `idle_workers or all_workers`?
Interesting; yes it probably should be.
Actually, now that I look at this again, maybe we do need the sleep.

If we're scheduling `inc`s then it's not unreasonable to schedule six of these things on one worker. That's where the data is. This computation is very cheap. But in contrast, the dependency is very small. We may need the scheduler to learn that `slowinc` is slow, and so heavy relative to the `int` dependency. Maybe:
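(My reconstruction of the missing snippet, using distributed's `slowinc` test helper, which sleeps for `delay` seconds and returns `x + 1`; the exact numbers are assumptions.)

```python
from distributed import Client
from distributed.utils_test import slowinc

if __name__ == "__main__":
    client = Client(processes=False)
    dep = client.submit(int, 1)  # tiny (~28 byte) shared dependency
    # With delay=0.1 the ~100ms compute cost dwarfs the sub-microsecond
    # transfer cost, so the scheduler should learn these tasks are
    # "heavy" relative to their dependency and spread them out.
    futures = [client.submit(slowinc, dep, delay=0.1, pure=False)
               for _ in range(10)]
    print(client.gather(futures))
    client.close()
```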
This way we're confident that the computational cost of the future will far outweigh the communication cost, but things are still fast-ish.
Yeah, that's why I originally went with `sleep`. It also feels more like the case we're trying to test for. Though the test still passes with 6d91816 (and note that that test fails on main), so I'm not sure what we're thinking about wrong?
Currently 28 bytes / 100MB/s is still just smaller than however fast `inc` ran in (a few microseconds probably). It would be safer to bump this up to milliseconds though.
gen_cluster tests can run in less than a second, but more than 100ms. So sleeps of around that 100ms time are more welcome than 1s sleeps.
The test is passing as-is; do you still think I should change it back to `sleep`?
It would become a problem if the `inc` ran in 28 bytes / 100MB/s ≈ 0.3us or less. `inc` does actually run this fast. But presumably the infrastructure around the `inc` is taking longer. If our timing infrastructure got much faster then this might result in intermittent errors. It probably doesn't matter at this scale, but it's probably a good change to make if it takes only a few minutes.

We might also introduce a penalty of something like 1ms for any communication (I'm surprised that we don't do this today, actually) which might tip the scales in the future.
I'll switch back to a short sleep for now.
That seems very worth doing.
We maybe used to do this? It might be worth checking the logs. The `worker_objective` function is pretty clean if you wanted to add this. We should probably wait until we find a case where it comes up in an impactful way though, just to keep from cluttering things.
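For reference, the 1ms-per-transfer idea from above could slot in roughly like this. This is a standalone sketch loosely modeled on `worker_objective`, not the scheduler's real signature; `WorkerStub` and the parameter names are hypothetical:

```python
from collections import namedtuple

WorkerStub = namedtuple("WorkerStub", ["occupancy", "nthreads"])

def worker_objective(ws, dep_holders, dep_nbytes,
                     bandwidth=100e6, comm_penalty=0.001):
    # Estimated start time = queued work + transfer time + a flat
    # ~1ms penalty per dependency that must be fetched.
    comm_bytes = sum(n for holders, n in zip(dep_holders, dep_nbytes)
                     if ws not in holders)
    n_transfers = sum(1 for holders in dep_holders if ws not in holders)
    stack_time = ws.occupancy / ws.nthreads
    return stack_time + comm_bytes / bandwidth + comm_penalty * n_transfers

w = WorkerStub(occupancy=0.05, nthreads=4)
print(worker_objective(w, dep_holders=[{w}, set()], dep_nbytes=[28, 28]))
```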
Yeah, I think it should be a separate PR, for cleanliness.