distributed.nanny.environ.MALLOC_TRIM_THRESHOLD_ is ineffective #5971
I'm uncertain about how to solve this. The simple solution, setting the variables in the Nanny process itself before spawning the worker, pollutes the Nanny's own environment. The alternative is to have an intermediate process which sets the variables and then invokes Python again, which however is very expensive. This issue also impacts the other two variables set by the default config, OMP_NUM_THREADS and MKL_NUM_THREADS.
AFAIK, if for any reason numpy is imported in the worker process before these variables are set, they will not be picked up.
Quite possible: #5729
To be fair, all the environment variables we're currently talking about setting (malloc_trim and num_threads) basically don't have an impact unless they're set before the interpreter starts. So for these specifically, setting them in the Nanny shouldn't actually change anything in practice. I still dislike the uncleanliness of setting them in the Nanny process, though.
Could also have a process-wide lock for …
They'll be set before the worker process starts; the worker process is where it matters.
I meant having them set in the Nanny process won't really affect things for the Nanny itself. Guido and I don't like the poor hygiene of leaving them set on the Nanny, but I'm just noting that that poor hygiene shouldn't affect anything on the Nanny in practice because the particular variables we're setting only have an effect at interpreter startup/NumPy import time.
I was worried about potential user-defined variables, not the three we set. But I'm leaning towards not over-engineering this just to cover purely hypothetical use cases.
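To make the mechanism discussed above concrete, here is a minimal sketch (plain multiprocessing, not the actual Nanny code) of why setting a variable in the parent before spawning is enough: the child inherits it and sees it from interpreter startup.

```python
# Minimal sketch, not the Nanny implementation: a variable set in the parent before
# the child is spawned is inherited by the child, so it is already present when the
# child's interpreter (and its malloc / NumPy thread pools) initialises.
import multiprocessing
import os


def report():
    # Runs in the freshly spawned child process.
    print("child sees MALLOC_TRIM_THRESHOLD_ =", os.environ.get("MALLOC_TRIM_THRESHOLD_"))


if __name__ == "__main__":
    os.environ["MALLOC_TRIM_THRESHOLD_"] = "65536"  # illustrative value
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=report)
    p.start()
    p.join()
```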
I'm running a Jupyter Lab notebook on Ubuntu 22.04.2 LTS where unmanaged memory isn't being released. Dask is running using Coiled. After about 30 seconds of running my notebook, unmanaged memory appears and stays high for my long-running tasks. When I include the MALLOC_TRIM_THRESHOLD_ setting described above among my imports, it seems to set the variable correctly: after I create my client, I check the variable and get back the value I set.
But unmanaged memory still increases after about 30 seconds and stays high. That eventually causes my model to fail. How do I trim the unmanaged memory? Thanks very much.
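A side note on that kind of check (the exact snippet isn't shown above, so the call below is an illustration, not the user's code): seeing the variable from the worker only tells you it is present in the worker's os.environ, not that it was applied in time.

```python
# Illustrative check: this only confirms the variable is present in the worker's
# os.environ. glibc reads MALLOC_TRIM_THRESHOLD_ when the worker process starts,
# so a value injected afterwards still shows up here but has no effect on malloc.
import os
from dask.distributed import Client

client = Client()  # or Client("<scheduler address>")
print(client.run(os.getenv, "MALLOC_TRIM_THRESHOLD_"))
```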
@dagibbs22 the workarounds described above are very old. This issue was resolved in July 2022.

```python
import dask
import coiled

dask.config.set({"distributed.nanny.pre-spawn-environ.MALLOC_TRIM_THRESHOLD_": your_value_here})
cluster = coiled.Cluster(...)
```

This said, I'd be honestly surprised if tampering with the setting were to fix your issue, and if your unmanaged memory does disappear I would love to see your code.
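For readers not using Coiled, the same config applies to any cluster that uses Nannies; a minimal sketch, assuming a recent distributed release (one that has the pre-spawn-environ key mentioned above) and a local cluster, with 65536 as a placeholder value:

```python
# Sketch only: same fix without Coiled. The config must be set before the cluster
# (and therefore the Nanny) is created, so the Nanny can export the variable before
# spawning its worker process.
import dask
from dask.distributed import Client, LocalCluster

dask.config.set({"distributed.nanny.pre-spawn-environ.MALLOC_TRIM_THRESHOLD_": 65536})

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)
```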
Thanks, @crusaderky . I could tell the issue had been resolved but couldn't tell what the fix was. Adding …
… on the full time series. Even on 2012-2021, unmanaged memory increased over time, getting into orange and then red for all workers. Eventually, workers died, but somehow they restarted and finished the time series. The same thing happened two times with the full time series (workers in the red zone for memory and then dying), but I guess it just happened too many times and eventually the model died. So, adding dask.config.set({"distributed.nanny.pre-spawn-environ.MALLOC_TRIM_THRESHOLD_": 1}) based on my conversation at dask/distributed#5971 (comment) didn't actually reduce unmanaged memory, but it did make the model push through the accumulated unmanaged memory, at least one or two times. Of course, this isn't a viable solution overall. But it is good data: the unmanaged memory accumulation isn't due to MALLOC_TRIM_THRESHOLD_.
@dagibbs22 there are many causes of unmanaged memory, listed here: https://distributed.dask.org/en/stable/worker-memory.html#using-the-dashboard-to-monitor-memory-usage Is unmanaged memory persisting while there are no tasks running? If it goes away, it's heap memory and you have to reduce the size of your chunks/partitions.
@crusaderky The old unmanaged memory gets up to about 6 GB in each worker when I run my notebook but drops to 2 GB per worker after the notebook finishes. Does 2 GB/worker count as "memory persisting while there are no tasks running"? Why would memory persist like that? Thanks.
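One way to check whether memory really persists while the cluster is idle is to sample each worker's resident set size with no tasks running; a sketch under that assumption (the scheduler address and helper name are placeholders, not from the thread):

```python
# Sketch: sample per-worker process memory (RSS) while no tasks are running and
# compare it with the managed memory shown on the dashboard; the gap is unmanaged.
import psutil
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address


def rss_bytes():
    # Resident set size of this worker process, as reported by the OS.
    return psutil.Process().memory_info().rss


print(client.run(rss_bytes))  # {worker_address: bytes, ...}
```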
Some of it will be logs. Dask workers store log information in deques for forensic analysis. You can shorten them through the dask config:

```yaml
distributed:
  admin:
    log-length: 0
    low-level-log-length: 0
```
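The same settings can also be applied from Python; a small sketch, assuming it runs before the cluster and client are created (how config reaches remote workers depends on your deployment):

```python
# Equivalent of the YAML above, set programmatically. Run this before creating the
# cluster/client so the workers pick up the shorter log deques.
import dask

dask.config.set({
    "distributed.admin.log-length": 0,
    "distributed.admin.low-level-log-length": 0,
})
```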
Ubuntu 21.10 x86/64
distributed 2022.3.0
The MALLOC_TRIM_THRESHOLD_ env variable seems to be effective at making memory deallocation more reactive.
However, the config variable that sets it doesn't seem to do anything, which indicates that the variable is being set after the worker process has started, whereas it needs to be set before spawning it.
Result:
Managed: 0
Unmanaged: 1.16 GiB
Result:
Managed: 0
Unmanaged: 151 MiB
Production Workaround
Set the env variable in the shell, before starting dask-worker.
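When workers are started from Python rather than with dask-worker, an equivalent of this workaround (a sketch under that assumption, not taken from the issue) is to export the variable in the parent process before the cluster spawns its workers:

```python
# Sketch of the same workaround for programmatic launch: the variable is set in the
# parent process before any worker is spawned, so the spawned worker processes
# inherit it at interpreter startup. The value 65536 is only a placeholder.
import os
os.environ["MALLOC_TRIM_THRESHOLD_"] = "65536"

from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)
```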