tl;dr: Problems with Dask failing to acquire a workspace lock on a path can be solved by changing the scratch path that Dask uses.
Description of the problem
When using Dask for parallel processing with Mintpy (as described here: https://mintpy.readthedocs.io/en/latest/dask/), I have been running into problems related to file permissions.
The problem arises during the invert_network stage of Mintpy processing, where Dask is used to split the job across many CPUs. At that point I start getting the errors shown below.
Full error message
Here is an example of the error message, which repeats many times:
------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
distributed.diskutils - ERROR - Could not acquire workspace lock on path: /marmot-nobak/olstephe/InSAR/Makran/T115a/mintpy/process_stack_full_time_small_region_1cpu/dask-worker-space/worker-efk6dn5q.dirlock .Continuing without lock. This may result in workspaces not being cleaned up
Traceback (most recent call last):
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/diskutils.py", line 61, in __init__
with workspace._global_lock():
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 196, in __enter__
self.acquire()
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 190, in acquire
self._lock.acquire(self._timeout, self._retry_period)
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 119, in acquire
lock.acquire(timeout, retry_period)
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 163, in acquire
_lock_file_blocking(self._file)
File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 59, in _lock_file_blocking
fcntl.flock(file_.fileno(), fcntl.LOCK_EX)
OSError: [Errno 37] No locks available
/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/contextlib.py:120: UserWarning: Creating scratch directories is taking a surprisingly long time. This is often due to running workers on a network file system. Consider specifying a local-directory to point workers to write scratch data to a local disk.
next(self.gen)
In my case, the issue is probably related to how the specific disk I'm trying to use is mounted (the UserWarning above points to workers writing scratch data on a network file system). The issue is resolved by getting Dask to use a different location for writing scratch data.
We can do this by creating a YAML configuration file for Dask in the ~/.config/dask/ directory (e.g. ~/.config/dask/dask.yaml) and adding the following line to that file:
temporary-directory: /tmp # Directory for local disk like /tmp, /scratch, or /local
In this case we use the /tmp directory, but the right choice will depend on your system. Dask will create a dask-worker-space directory in /tmp and put a directory for each worker inside it. If others are using the same machine, they may already have created a dask-worker-space directory in /tmp that you won't have permissions for; in that case you can simply point Dask at a personal directory instead (e.g. temporary-directory: /tmp/my_dask_dir in the YAML file). This resolved the issue for me.
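If you prefer not to create a YAML file, the same setting can also be applied from within Python before a cluster is started. This is only a minimal sketch to illustrate what the option does (the path /tmp/my_dask_dir is an example, and since Mintpy normally creates the Dask cluster for you, editing the YAML file as described above is the more practical route):

import dask
from dask.distributed import Client, LocalCluster

# Point Dask's scratch space at a local disk instead of the network file system.
# /tmp/my_dask_dir is an example path; use any directory you have write access to.
dask.config.set({"temporary-directory": "/tmp/my_dask_dir"})
print(dask.config.get("temporary-directory"))  # confirm the setting was picked up

cluster = LocalCluster(n_workers=4)  # workers now write scratch data under the directory above
client = Client(cluster)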
See other relevant issues on GitHub:
dask/distributed#2113
dask/distributed#2496
System information
Thanks to @yunjunz for previous help with this.