25% performance regression in merges #7052
Comments
Thanks for reporting and even identifying the relevant code change, @wence-! @hendrikmakait, do you have some time to look into this?
Hi @wence, thanks for bringing this to my attention! #6975 was by design going to hurt some workloads: we're essentially trading fewer out-of-memory scenarios (which might have fatal consequences on a worker) for increased runtime of some workloads. I had not noticed any performance hits in the integration tests we run.

Do you have an example workload and cluster configuration (e.g. cluster size, available RAM, number of workers) that I could try to replicate? If you have time to investigate further, would you mind exploring the effect of increasing the new transfer limit? It might be that we have to revisit the default setting, or this imperfect approach to limiting memory load caused by data transfer altogether.
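For anyone wanting to experiment along these lines, Dask settings can be overridden through `dask.config` (or the matching `DASK_...` environment variable). The exact option name added by #6975 isn't quoted in this thread, so the key below is a placeholder; a minimal sketch might look like:

```python
import dask
from distributed import Client, LocalCluster

# "distributed.worker.transfer-limit" is a PLACEHOLDER key, not the real
# option introduced by #6975 -- substitute the actual name from
# distributed's configuration reference.
with dask.config.set({"distributed.worker.transfer-limit": "500 MiB"}):
    cluster = LocalCluster(n_workers=4)  # config must be set before workers start
    client = Client(cluster)
    # ... run the merge workload here and compare timings ...
```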
These benchmarks are running on a high-performance network (depending on the worker pairings, between 12 and 45 GiB/s uni-directional bandwidth), so the default of capping the bytes grabbed in multiple "small" messages from a single worker at 50 MB total is getting in the way (I can send multiple GiB of data in less than a second). I think what is happening is that previously there might have been two messages in flight between any given pair of workers at any one time, whereas the changed logic now limits us to a single message. So I think that #6975 fixed the logic in terms of limiting the total bytes in flight, but it hurts throughput on this kind of fast network.
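To put those numbers together (a back-of-the-envelope sketch using only the figures quoted above): at 12-45 GiB/s a 50 MB message occupies the link for just a few milliseconds, so if only one such message is in flight per peer, most of the elapsed time is per-message scheduling and round trips rather than data transfer.

```python
# Rough transfer time of one 50 MB gather message at the quoted bandwidths.
message_bytes = 50 * 10**6
for bandwidth_gib_s in (12, 45):
    bandwidth_bytes_s = bandwidth_gib_s * 2**30
    ms = message_bytes / bandwidth_bytes_s * 1e3
    print(f"{bandwidth_gib_s:>2} GiB/s: ~{ms:.1f} ms per 50 MB message")
```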
I think there might be a problem with the related logic; let me take a closer look at the implementation.
This feels like a good idea regardless of the problem at hand; I'll put together a PR.
Fixed by #7071
Our weekly multi-node benchmarking (working on making this publicly visible) shows a performance regression in simple dataframe merges, which I can pinpoint to #6975. (This was briefly reverted in #6994 and then reintroduced in #7007).
More specifically, #6975 changes the decision making in `_select_keys_for_gather` (distributed/worker_state_machine.py, lines 1654 to 1665 at commit 2b23840).
Prior to this change, the logic was as in distributed/worker_state_machine.py, lines 1620 to 1630 at commit b133009.
Note the difference in whether we fetch the top-priority task. If I remove the part of the decision-making logic that looks at `self.incoming_transfer_bytes`, then performance goes back to where it was previously.
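The referenced snippets aren't reproduced here, so the following is only an illustrative toy contrast of the difference being described, not the actual `_select_keys_for_gather` code; apart from `incoming_transfer_bytes`, all names and parameters are made up.

```python
def select_keys_old(queue, message_bytes_limit):
    """Old-style sketch: the top-priority key is always taken, then more
    keys are packed until the per-message byte limit is reached."""
    selected, total = [], 0
    for key, nbytes in queue:  # queue: priority-ordered (key, nbytes) pairs
        if selected and total + nbytes > message_bytes_limit:
            break
        selected.append(key)
        total += nbytes
    return selected


def select_keys_new(queue, message_bytes_limit,
                    incoming_transfer_bytes, transfer_bytes_limit):
    """New-style sketch: a worker-wide budget on bytes already in flight
    (cf. incoming_transfer_bytes) is also respected, so even the
    top-priority key may be skipped while other transfers are running."""
    selected, total = [], 0
    for key, nbytes in queue:
        if incoming_transfer_bytes + total + nbytes > transfer_bytes_limit:
            break  # the difference: this can fire before anything is selected
        if selected and total + nbytes > message_bytes_limit:
            break
        selected.append(key)
        total += nbytes
    return selected
```

On a fast network, this style of worker-wide check means that once one gather message is outstanding, a subsequent selection may pick nothing until it completes, which would match the "two messages in flight down to one" observation above.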
I'm not sure of the correct way to square this circle. I don't understand how the change in `_select_keys_for_gather` interacts with the intention of the PR to throttle data transfer.

cc @hendrikmakait (as author of #6975)