Profiling data of Scheduler.update_graph for very large graph #7998
I did some memory profiling of the scheduler [1] based on 145c13a. I submitted a large array workload with about 1.5MM tasks. The scheduler requires about 6GB of RAM to hold the computation state in memory; the peak is a bit higher since some intermediate state is required (mostly for dask.order). Once this plateau is reached, computation starts. The memory usage at the plateau breaks down roughly as shown in the attached profile [1].
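For context, the graph shape matters more than the specific computation. The original workload isn't included in the issue, so the following is only a hypothetical reproducer that yields a seven-digit task count from a chunked array:

```python
import dask.array as da

# Hypothetical reproducer (the actual workload is not given in the issue).
# 100_000 / 100 = 1_000 chunks per axis, i.e. 1MM chunks for the random
# layer alone; the elementwise op and the tree reduction push the total
# graph size well past 1.5MM tasks.
a = da.random.random((100_000, 100_000), chunks=(100, 100))
result = (a + 1).sum()
# Submitting `result.compute()` against a distributed scheduler exercises
# Scheduler.update_graph with a graph of this magnitude.
```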
The two big contributions worth discussing are the TaskState objects themselves and the stringification of keys.

The tracing for the TaskState object is a little fuzzy (possibly because it is using slots?) but it largely points to the usage of sets in TaskState. Indeed, even empty sets allocate a comparatively large amount of memory (216 bytes each on CPython 3.10). With 9 sets and 3 dictionaries we're already at a lower bound per TaskState of

```python
# Python 3.10.11
In [9]: format_bytes(
   ...:     9 * sys.getsizeof(set())
   ...:     + 3 * sys.getsizeof(dict())
   ...: )
Out[9]: '2.09 kiB'
```

which adds up to almost 3GiB alone for 1.5MM tasks. The actual memory use is even better than this calculation suggests (not sure what went wrong here...).

The other large contribution is the stringification of keys [2][3][4]. Stringify does not cache/deduplicate str values, nor is the Python interpreter able to intern our keys (afaik, interning is only possible with ASCII chars), so every call to stringify effectively allocates new memory (see the deduplication sketch after the footnotes).

This suggests that we should either remove or rework stringification and possibly consider a slimmer representation of our TaskState object.
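As a minimal sketch of what a slimmer representation could look like, the set-valued attributes could be allocated lazily so that the common empty case costs only a None pointer. The class and attribute names below are illustrative, not distributed's actual TaskState:

```python
import sys

class SlimTaskState:
    # Illustrative sketch, not the real scheduler class: set-valued
    # attributes stay None until first written, so an "empty" TaskState
    # pays for a few pointers instead of ~216 bytes per empty set.
    __slots__ = ("key", "_dependencies", "_dependents", "_waiting_on")

    def __init__(self, key: str):
        self.key = key
        self._dependencies = None
        self._dependents = None
        self._waiting_on = None

    @property
    def dependencies(self) -> set:
        # Materialize on first access; callers keep using ts.dependencies.
        if self._dependencies is None:
            self._dependencies = set()
        return self._dependencies

print(sys.getsizeof(set()))  # 216 on 64-bit CPython 3.10
```

The trade-off is an extra None check per access, and read-only call sites would ideally receive a shared empty frozenset instead of materializing a new set.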
[1] scheduler_memory_profile.html.zip
[2] distributed/scheduler.py line 4769 in 145c13a
[3] distributed/scheduler.py line 4767 in 145c13a
[4] distributed/scheduler.py line 4773 in 145c13a
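The deduplication mentioned above could be as simple as a memoizing wrapper. `stringify` below is a stand-in for distributed's helper, and the unbounded cache is an assumption that trades cache memory for deduplicated strings:

```python
from functools import lru_cache

def stringify(key) -> str:
    # Stand-in for distributed's key-stringification helper.
    return str(key)

@lru_cache(maxsize=None)
def stringify_cached(key) -> str:
    # Dask keys are hashable (str, or tuples of str/int), so they can be
    # cache keys directly. Repeated calls for the same key now return the
    # same str object instead of allocating a fresh one each time.
    return stringify(key)
```

Since each key is stringified at several call sites [2][3][4], deduplication would also let identical key strings share a single allocation across those sites.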
Note that the above graph wasn't using any annotations. Annotations will add one more stringification for annotated keys.
I recently had the pleasure of seeing how the scheduler reacts to a very large graph. Not too well.
I submitted a graph with a couple million tasks. Locally it looks like 2.5MM tasks, but the scheduler later reports fewer; anyhow, it's seven digits. update_graph ran for about 5min, blocking the event loop for that entire time (#7980).
What is eating up the most time is dumps (see the attached profile). It also looks like the TaskState objects and all the foo attached to them are taking up about 65% of the memory, which in this case is about 82GiB. Assuming we're at 2MM tasks, that's roundabout 40KB per TaskState. That's quite a lot.
scheduler-profile.zip
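One way to sanity-check that per-object figure on a live scheduler would be a deep-size measurement; pympler is an assumption here, not something used in the issue:

```python
from pympler import asizeof  # third-party; an assumption, not used above

def taskstate_deep_size(scheduler) -> int:
    """Deep size in bytes of one TaskState from a live Scheduler.

    Caveat: asizeof follows references, so objects shared with other
    tasks (e.g. TaskStates reachable via dependency sets) inflate it.
    """
    ts = next(iter(scheduler.tasks.values()))
    return asizeof.asizeof(ts)
```

The back-of-the-envelope version is simply 82GiB / 2MM ≈ 43KiB, which matches the roundabout 40KB figure above.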
Nothing to do here, this is purely informational.