
Issues with file permissions when using Dask #725

Closed
olliestephenson opened this issue Jan 20, 2022 · 1 comment · Fixed by #727

Comments

@olliestephenson
Contributor

tl;dr: Dask failing to acquire a workspace lock on a path can be fixed by changing the directory Dask uses for scratch data.

Description of the problem

When using Dask for parallel processing with MintPy (as described here: https://mintpy.readthedocs.io/en/latest/dask/), I have been running into problems related to file permissions.

The problem arises during the invert_network stage of MintPy processing, where Dask splits the job across many CPUs. At that point I start getting the errors shown below.

Full error message
Here is an example error message that repeats many times:

------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
distributed.diskutils - ERROR - Could not acquire workspace lock on path: /marmot-nobak/olstephe/InSAR/Makran/T115a/mintpy/process_stack_full_time_small_region_1cpu/dask-worker-space/worker-efk6dn5q.dirlock .Continuing without lock. This may result in workspaces not being cleaned up
Traceback (most recent call last):
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/diskutils.py", line 61, in __init__
    with workspace._global_lock():
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 196, in __enter__
    self.acquire()
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 190, in acquire
    self._lock.acquire(self._timeout, self._retry_period)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 119, in acquire
    lock.acquire(timeout, retry_period)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 163, in acquire
    _lock_file_blocking(self._file)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 59, in _lock_file_blocking
    fcntl.flock(file_.fileno(), fcntl.LOCK_EX)
OSError: [Errno 37] No locks available
/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/contextlib.py:120: UserWarning: Creating scratch directories is taking a surprisingly long time. This is often due to running workers on a network file system. Consider specifying a local-directory to point workers to write scratch data to a local disk.
  next(self.gen)

In my case, the issue is possibly related to how the specific disk I'm trying to use is mounted. The issue is resolved by getting Dask to use a different location for writing scratch data.
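Before reconfiguring, you can check whether a given directory supports POSIX file locking at all. The sketch below is my own diagnostic (not part of MintPy or Dask); it reproduces the same fcntl.flock call that fails in the traceback above, using only the standard library:

```python
import errno
import fcntl
import tempfile


def supports_flock(directory):
    """Return True if fcntl.flock works on files in `directory`.

    On some network file systems (e.g. certain NFS mounts), flock raises
    OSError with ENOLCK (errno 37) -- the same error Dask hits when it
    tries to take its workspace lock there.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        try:
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        except OSError as e:
            if e.errno == errno.ENOLCK:
                return False
            raise
    return True


print(supports_flock("/tmp"))  # typically True on a local disk
```

If this returns False for the directory where Dask is writing its worker space, pointing Dask at a local disk (as below) should avoid the error.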

We can point Dask at a different location by creating a YAML configuration file in the ~/.config/dask/ directory (e.g. ~/.config/dask/dask.yaml) and adding the following line to it:

temporary-directory: /tmp # Directory for local disk like /tmp, /scratch, or /local

In this case we use the /tmp directory, but the right choice will depend on your system. Dask will create a dask-worker-space directory in /tmp and put a directory for each worker inside it. If others use the same machine, they may already have created a dask-worker-space directory in /tmp that you won't have permissions for; in that case, create a personal directory for storing Dask workers (e.g. temporary-directory: /tmp/my_dask_dir in the YAML file).
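The steps above can be sketched as a few shell commands (paths are examples; adjust for your system):

```shell
# Create a personal scratch directory so we don't collide with another
# user's dask-worker-space in /tmp:
mkdir -p /tmp/my_dask_dir

# Write the Dask config. Note: this overwrites an existing dask.yaml;
# if you already have one, add the line to it by hand instead.
mkdir -p ~/.config/dask
cat > ~/.config/dask/dask.yaml <<'EOF'
temporary-directory: /tmp/my_dask_dir
EOF
```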

This resolved the issue for me.
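If you prefer not to touch the config file, the same setting can be applied per-session with the standard dask.config API (assuming dask is importable; the directory name here is just the example from above):

```python
import dask

# Equivalent to the temporary-directory line in dask.yaml, but scoped to
# this Python session: point Dask's scratch space at a local directory
# instead of the network mount where flock fails.
dask.config.set({"temporary-directory": "/tmp/my_dask_dir"})

print(dask.config.get("temporary-directory"))  # /tmp/my_dask_dir
```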

See other relevant issues on GitHub:
dask/distributed#2113
dask/distributed#2496

System information

  • Operating system: Red Hat Enterprise Linux 8.5
  • Python environment: conda
  • Version of MintPy: MintPy version v1.3.2, date 2021-11-21

Thanks to @yunjunz for previous help with this.

@yuankailiu
Contributor

Thanks for documenting and researching this issue. That helps!
