Turn off dask profiler by default #19
Comments
Following as it could be interesting for coffea-casa AF as well :) |
Dear @nsmith-, |
I've realized that in this setup, the dask config arguments are difficult to propagate to the workers. Just adding things to the yaml or with
cluster = LPCCondorCluster(
    job_script_prologue=[
        "export DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL=1d",
        "export DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE=2d",
    ],
)
and can confirm they get respected:
In [1]: client.run(lambda: dask.config.get("distributed.worker.profile"))
Out[1]:
{'tcp://131.225.191.76:10001': {'interval': '1d',
  'cycle': '2d',
  'enabled': True,
  'low-level': False}}
@yimuchen perhaps with this patch you can also play more easily with |
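For readers who want to generate such prologue lines from dotted config keys rather than typing them by hand, here is a minimal sketch. The helper name `dask_env_exports` is hypothetical and not part of dask or lpcjobqueue; it only applies the naming convention visible in the snippet above (a `DASK_` prefix, upper-case names, `__` between nesting levels) and does no shell quoting of values.

```python
def dask_env_exports(settings):
    """Turn dotted dask config keys into shell `export` lines.

    Hypothetical helper: it mirrors the DASK_<SECTION>__<KEY> naming
    used in the job_script_prologue example and nothing more.
    """
    lines = []
    for key, value in settings.items():
        # "." separates nesting levels and becomes "__"; "-" is not
        # valid in a shell variable name, so it becomes "_".
        name = "DASK_" + key.upper().replace(".", "__").replace("-", "_")
        lines.append(f"export {name}={value}")
    return lines


prologue = dask_env_exports(
    {
        "distributed.worker.profile.interval": "1d",
        "distributed.worker.profile.cycle": "2d",
    }
)
for line in prologue:
    print(line)
```

The resulting `prologue` list could then be passed as `job_script_prologue` in the same way as the hand-written list above.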
I just confirmed that this patch indeed allows us to set worker environment variables. |
I wanted to consolidate this into my own config and ended up going a bit overboard with boilerplate, so this is non-minimal. What I did is create a context manager holding all the jobqueue and dask worker settings, which is simultaneously written as a config file to a location the workers can see. This seems to ensure all the settings are respected: some, like nanny, are picked up from the context config, and some only from the file. It's possible that in writing out this config I'm overriding or missing some elements that would otherwise be picked up from the default lpcjobqueue config, but anyway... Note that I couldn't get the dask environment variables to work for the MALLOC and MKL/OMP threads settings, because the parser seems to lower-case the variable names. I was thus getting both a "MALLOC_TRIM_THRESHOLD_": 65536 and a lower-cased "malloc_trim_threshold_" key, and it seems like only the properly upper-cased one was respected. Therefore I stuck with passing them in as env variables for the "distributed.nanny.pre-spawn-environ" setting.
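The lower-casing problem described here can be reproduced with a small pure-Python model. This is a simplified sketch of the behaviour reported in this comment, not dask's actual parser: the functions `env_to_config_key` and `set_nested` are hypothetical, the override value `"0"` is made up for illustration, and the sketch writes `pre_spawn_environ` with underscores throughout (dask treats `-` and `_` in key names interchangeably).

```python
def env_to_config_key(name):
    """Model of the mapping: drop DASK_, lower-case, '__' becomes '.'."""
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"not a dask variable: {name}")
    return name[len(prefix):].lower().replace("__", ".")


def set_nested(config, dotted_key, value):
    """Insert value into a nested dict, creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


# Existing config with the upper-cased key quoted in the comment.
config = {
    "distributed": {
        "nanny": {"pre_spawn_environ": {"MALLOC_TRIM_THRESHOLD_": 65536}}
    }
}

# Attempted override through an environment variable (value is illustrative).
env_name = "DASK_DISTRIBUTED__NANNY__PRE_SPAWN_ENVIRON__MALLOC_TRIM_THRESHOLD_"
key = env_to_config_key(env_name)
set_nested(config, key, "0")

environ = config["distributed"]["nanny"]["pre_spawn_environ"]
print(key)
print(environ)
```

Because the variable name is lower-cased before it is merged, the override lands under a second, lower-case key and leaves the upper-case entry untouched, which matches the duplicate keys reported above.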
|
In https://git.rwth-aachen.de/3pia/cms_analyses/common/-/blob/master/dask.yml#L34-37 the lines
adjust default dask worker memory limits and profiler settings. The worker memory pause fraction is a source of frequent headaches as jobs on that worker stall for a long time, often slowing down the overall processing. @pfackeldey would you recommend these defaults also here?