DGX Nightly Benchmark run 20210504 #139

Open
quasiben opened this issue May 4, 2021 · 3 comments
quasiben commented May 4, 2021

Benchmark history

Benchmark Image

Raw Data

<Client: 'tcp://127.0.0.1:37895' processes=10 threads=10, memory=503.79 GiB>
Distributed Version: 2021.04.1+9.g233ec884

simple       6.015e-01 +/- 4.029e-02
shuffle      2.293e+01 +/- 6.89e-01
rand_access  5.897e-03 +/- 3.28e-03
anom_mean    1.064e+02 +/- 1.488e+00

Raw Values

simple
[0.61249677 0.55235898 0.62748533 0.55971711 0.61404489 0.58228994
0.68462942 0.57052608 0.56799022 0.6434607 ]
shuffle
[22.32830022 22.36646918 22.03776295 22.9777403 22.39722683 22.7571989
22.76775758 24.11153089 23.46421872 24.04703911]
rand_access
[0.0078597 0.00342729 0.00465147 0.00341666 0.00960192 0.00371119
0.00467234 0.01374343 0.00334072 0.00454751]
anom_mean
[106.24792361 105.50825132 105.92475792 110.57412884 105.40724264
105.72732002 106.89931533 105.76425968 106.86963659 105.23619132]

Dask Profiles

Scheduler Execution Graph

Sched Graph Image

@jakirkham (Collaborator) commented:

This includes Rick's recent HLG timeseries PR (dask/dask#7615).

@jakirkham (Collaborator) commented:

It's worth comparing this to the results in issue #137, where we profiled a workload that uses very little communication (details in that issue and its references). What stands out there is that transitions take barely any time once communication stops playing a significant role; in fact, that result shows no obviously slow parts at all.

The result here, by contrast, suggests more time is spent in communication (read at 13.67% and write at 7.17%) than in `_transition` (12.39%), and I think the previous result indicates even that may be underselling how much time communication itself eats up.

IOW, things like replacing the communication layer with asyncio (to leverage uvloop) (dask/distributed#4513), or simply using UCX and improving serialization, are likely more important at this stage. There's probably still some value in things like using C APIs for individual transitions (dask/distributed#4650), since ~3.57% is spent exclusively in `_transition`, likely due to current Python call overhead, but I expect that gain is smaller than the former two items.
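To make the comparison concrete, the quoted percentages can be tallied. A trivial sketch (the communication-vs-transition split is my reading of the profile above, not something the profiler reports directly):

```python
# Percentages quoted from the scheduler profile above.
profile = {"read": 13.67, "write": 7.17, "_transition": 12.39}

# Communication-side time is read + write.
comm = profile["read"] + profile["write"]
print(f"communication: {comm:.2f}%  vs  _transition: {profile['_transition']:.2f}%")
# communication (20.84%) exceeds _transition (12.39%)
```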

cc @jrbourbeau @madsbk @rjzamora @quasiben @mrocklin

@jakirkham (Collaborator) commented:

Building on Rick's non-shuffle example in PR #141, the results now look analogous to what is seen with shuffle: most time is spent in communication, followed by transitions.
