Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider candidates that don't hold any dependencies in
decide_worker
#4925base: main
Are you sure you want to change the base?
Consider candidates that don't hold any dependencies in
decide_worker
#4925Changes from 6 commits
5b17f55
4a81ca8
c57fd72
d810d2d
768d660
346ab17
420c99e
0a004b2
b050d14
9e99b7f
f6acdc4
524da73
a5d37ae
5842ca8
a159245
bb991d1
58b4bf8
13975cb
fcb165e
cd382c6
cc57a8b
13911bc
f2445fe
38e6b57
5794540
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have idle_workers in this function should this be
idle_workers or all_workers
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting; yes it probably should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I look at this again, maybe we do need the sleep.
If we're scheduling
inc
s then it's not unreasonable to schedule six of these things on one worker. That's where the data is. This computation is very cheap. But in contrast, the dependency is very small. We may need the scheduler to learn thatslowinc
is slow, and so heavy relative to theint
dependency.Maybe
This way we're confident that the computational cost of the future will far outweigh the communciation cost, but things are still fast-ish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's why I originally went with
sleep
. It also feels more like the case we're trying to test for. Though the test still passes with 6d91816 (and note that that test fails on main), so I'm not sure what we're thinking about wrong?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently 28 bytes / 100MB/s is just still smaller than however fast inc ran in (a few microseconds probably). It would be safer to bump this up to milliseconds though.
gen_cluster tests can run in less than a second, but more than 100ms. So sleeps of around that 100ms time are more welcome than 1s sleeps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is passing as-is; do you still think I should change it back to
sleep
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would become a problem if the
inc
ran in28 bytes / 100MB/s = 0.2us
or less. Inc does actually run this fast.But presumably the infrastructure around the inc is taking longer. If our timing infrsatructure got much faster then this might result in intermittent errors. It probably doesn't matter at this scale, but it's probably a good idea if it takes only a few minutes.
We might also introduce a penalty of something like 1ms for any communication (I'm surprised that we don't do this today actually) which might tip the scales in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch back to a short sleep for now.
That seems very worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We maybe used to do this? It might be worth checking logs. The
worker_objective
function is pretty clean if you wanted to add this. We should probably wait until we find a case where it comes up though in an impactful way just to keep from cluttering things.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it should be a separate PR for cleanliness