
exp run: cannot clean up temp directory runs on Linux + NFS #5641

Closed
werthen opened this issue Mar 17, 2021 · 10 comments · Fixed by #5849
Labels: A: experiments (Related to dvc exp), bug (Did we break something?), p1-important (Important, aka current backlog of things to do), research

Comments


werthen commented Mar 17, 2021

Bug Report

Description

When using the exp run --run-all feature, the command can never finish due to an error while deleting a Git directory, even though that directory appears to be empty.

Reproduce

  1. dvc exp run --queue -S training.loss.name=mse
  2. dvc exp run --run-all -j1 --verbose

Expected

I expect the same output as running dvc exp run -S training.loss.name=mse: a successful experiment with proper metrics shown in dvc exp show. Instead, when using the queueing functionality, the command errors and the metrics are not saved properly. The JSON representation of dvc exp show also states that the experiment is still queued.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.0.6 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-60-generic-x86_64-with-glibc2.10
Supports: hdfs, http, https
Cache types: hardlink, symlink
Cache directory: nfs4 on [REDACTED]
Caches: local
Remotes: None
Workspace directory: nfs4 on [REDACTED]
Repo: dvc, git

Additional Information (if any):

Please note that at the end of this output, it states that the .git/objects/pack directory is not empty, even though it can be deleted using rmdir without issue.

2021-03-17 12:02:49,159 DEBUG: state save (152491799, 1615977912613432576, 22244) 0b8958b3e558a41b046785ece234cbee
2021-03-17 12:02:49,164 DEBUG: state save (152491799, 1615977912613432576, 22244) 0b8958b3e558a41b046785ece234cbee
2021-03-17 12:02:49,179 DEBUG: state save (152487234, 1615977912709433600, 6498) 6f3ff4ce2d56a0a19e27ed2ebce075fa
2021-03-17 12:02:49,183 DEBUG: state save (152487234, 1615977912709433600, 6498) 6f3ff4ce2d56a0a19e27ed2ebce075fa
2021-03-17 12:02:49,226 DEBUG: state save (152547862, 1615978823436614400, 293776) 343f60bca0df1ae495b598bbce29b384
2021-03-17 12:02:49,231 DEBUG: state save (152547862, 1615978823436614400, 293776) 343f60bca0df1ae495b598bbce29b384
2021-03-17 12:02:49,236 DEBUG: {'MS_data/model_weights.h5': 'modified'}
2021-03-17 12:02:49,260 DEBUG: state save (152544875, 1615978968985396480, 240801) 474d305f96b1fab25155390c8e7d2e2c
2021-03-17 12:02:49,264 DEBUG: state save (152544875, 1615978968985396480, 240801) 474d305f96b1fab25155390c8e7d2e2c
2021-03-17 12:02:49,270 DEBUG: {'notebook_results/Train.ipynb': 'modified'}
2021-03-17 12:02:49,287 DEBUG: state save (152544214, 1615978823200606208, 244) c0fda638e8a2cc6192b86dab818e8954
2021-03-17 12:02:49,292 DEBUG: state save (152544214, 1615978823200606208, 244) c0fda638e8a2cc6192b86dab818e8954
2021-03-17 12:02:49,297 DEBUG: {'metrics.csv': 'modified'}
2021-03-17 12:02:49,315 DEBUG: state save (152549822, 1615978930064190208, 126) 66eb73c07b00069143c48a321af0abd7
2021-03-17 12:02:49,319 DEBUG: state save (152549822, 1615978930064190208, 126) 66eb73c07b00069143c48a321af0abd7
2021-03-17 12:02:49,325 DEBUG: {'train_metrics.json': 'modified'}
2021-03-17 12:02:49,344 DEBUG: state save (152549303, 1615978965597293312, 126) b8d9d9ec45c24e1f65474a88d22e1b37
2021-03-17 12:02:49,348 DEBUG: state save (152549303, 1615978965597293312, 126) b8d9d9ec45c24e1f65474a88d22e1b37
2021-03-17 12:02:49,353 DEBUG: {'val_metrics.json': 'modified'}
2021-03-17 12:02:49,361 DEBUG: Computed stage: 'train_model' md5: '5ca694409ddab75eb4b0ff0deddbef02'
2021-03-17 12:02:49,453 DEBUG: state save (152547862, 1615978823436614400, 293776) 343f60bca0df1ae495b598bbce29b384
2021-03-17 12:02:49,459 DEBUG: Checking out 'MS_data/model_weights.h5' with cache 'object md5: 343f60bca0df1ae495b598bbce29b384'.
2021-03-17 12:02:49,474 DEBUG: Created 'copy': ../../../cache/34/3f60bca0df1ae495b598bbce29b384 -> MS_data/model_weights.h5                    
2021-03-17 12:02:49,475 DEBUG: state save (152550395, 1615978969461410816, 293776) 343f60bca0df1ae495b598bbce29b384
2021-03-17 12:02:49,544 DEBUG: state save (152544875, 1615978968985396480, 240801) 474d305f96b1fab25155390c8e7d2e2c
2021-03-17 12:02:49,550 DEBUG: Checking out 'notebook_results/Train.ipynb' with cache 'object md5: 474d305f96b1fab25155390c8e7d2e2c'.
2021-03-17 12:02:49,563 DEBUG: Created 'copy': ../../../cache/47/4d305f96b1fab25155390c8e7d2e2c -> notebook_results/Train.ipynb                
2021-03-17 12:02:49,563 DEBUG: state save (152549836, 1615978969549413376, 240801) 474d305f96b1fab25155390c8e7d2e2c
2021-03-17 12:02:49,658 DEBUG: state save (152544214, 1615978823200606208, 244) c0fda638e8a2cc6192b86dab818e8954
2021-03-17 12:02:49,664 DEBUG: Checking out 'metrics.csv' with cache 'object md5: c0fda638e8a2cc6192b86dab818e8954'.
2021-03-17 12:02:49,670 DEBUG: Created 'copy': ../../../cache/c0/fda638e8a2cc6192b86dab818e8954 -> metrics.csv                                 
2021-03-17 12:02:49,671 DEBUG: state save (152550397, 1615978969657416704, 244) c0fda638e8a2cc6192b86dab818e8954
2021-03-17 12:02:49,739 DEBUG: state save (152549822, 1615978930064190208, 126) 66eb73c07b00069143c48a321af0abd7
2021-03-17 12:02:49,745 DEBUG: Checking out 'train_metrics.json' with cache 'object md5: 66eb73c07b00069143c48a321af0abd7'.
2021-03-17 12:02:49,752 DEBUG: Created 'copy': ../../../cache/66/eb73c07b00069143c48a321af0abd7 -> train_metrics.json                          
2021-03-17 12:02:49,753 DEBUG: state save (152548593, 1615978969741419264, 126) 66eb73c07b00069143c48a321af0abd7
2021-03-17 12:02:49,799 DEBUG: state save (152549303, 1615978965597293312, 126) b8d9d9ec45c24e1f65474a88d22e1b37
2021-03-17 12:02:49,805 DEBUG: Checking out 'val_metrics.json' with cache 'object md5: b8d9d9ec45c24e1f65474a88d22e1b37'.
2021-03-17 12:02:49,811 DEBUG: Created 'copy': ../../../cache/b8/d9d9ec45c24e1f65474a88d22e1b37 -> val_metrics.json                            
2021-03-17 12:02:49,812 DEBUG: state save (152549843, 1615978969801421056, 126) b8d9d9ec45c24e1f65474a88d22e1b37
2021-03-17 12:02:49,855 DEBUG: stage: 'train_model' was reproduced
Updating lock file 'dvc.lock'
2021-03-17 12:02:50,296 DEBUG: Commit to new experiment branch 'refs/exps/25/6b1cdc9d81d8f2feecd7b4538d526a8ddd28fc/exp-293c9'
2021-03-17 12:02:51,133 DEBUG: Collected experiment 'df26527'.                                                                                 
2021-03-17 12:02:51,135 DEBUG: Removing tmpdir '<ExpTemporaryDirectory '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpqnsuwjdk'>'
2021-03-17 12:02:51,135 DEBUG: Removing '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpqnsuwjdk'
2021-03-17 12:02:51,319 ERROR: unexpected error - [Errno 39] Directory not empty: '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpqnsuwjdk/.git/objects/pack'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 654, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'pack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/main.py", line 50, in main
    ret = cmd.run()
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/command/experiments.py", line 525, in run
    results = self.repo.experiments.run(
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 815, in run
    return run(self.repo, *args, **kwargs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 28, in run
    return repo.experiments.reproduce_queued(jobs=jobs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 376, in reproduce_queued
    results = self._reproduce_revs(**kwargs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 46, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 522, in _reproduce_revs
    exec_results = self._executors_repro(executors, **kwargs)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 648, in _executors_repro
    executor.cleanup()
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 74, in cleanup
    self._tmp_dir.cleanup()
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 26, in cleanup
    self._rmtree(self.name)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 22, in _rmtree
    remove(name)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/utils/fs.py", line 135, in remove
    shutil.rmtree(path, onerror=_chmod)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 652, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 652, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 656, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/utils/fs.py", line 120, in _chmod
    func(p)
OSError: [Errno 39] Directory not empty: '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpqnsuwjdk/.git/objects/pack'
------------------------------------------------------------
2021-03-17 12:02:52,023 DEBUG: Version info for developers:
DVC version: 2.0.6 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-60-generic-x86_64-with-glibc2.10
Supports: hdfs, http, https
Cache types: hardlink, symlink
Cache directory: nfs4 on [REDACTED]
Caches: local
Remotes: None
Workspace directory: nfs4 on [REDACTED]
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-03-17 12:02:52,033 DEBUG: Removing '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpssfyvyik'
Exception ignored in: <finalize object at 0x7f8865619de0; dead>
Traceback (most recent call last):
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/weakref.py", line 566, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/tempfile.py", line 818, in _cleanup
    cls._rmtree(name)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 22, in _rmtree
    remove(name)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/utils/fs.py", line 135, in remove
    shutil.rmtree(path, onerror=_chmod)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 652, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 652, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/shutil.py", line 656, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/home/sumo/lwerthen/miniconda3/envs/tf2.4/lib/python3.8/site-packages/dvc/utils/fs.py", line 120, in _chmod
    func(p)
OSError: [Errno 39] Directory not empty: '/home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpssfyvyik/.git/objects/pack'
2021-03-17 12:02:54,322 DEBUG: Analytics is enabled.
2021-03-17 12:02:54,501 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpyel92az0']'
2021-03-17 12:02:54,505 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpyel92az0']'
(tf2.4) lwerthen@sumo-ai:~/ms$ ls -al /home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpssfyvyik/.git/objects/pack
total 33
drwxr-xr-x  2 lwerthen sumo  2 Mär 17 12:02 .
drwxr-xr-x 31 lwerthen sumo 31 Mär 17 12:02 ..
(tf2.4) lwerthen@sumo-ai:~/ms$ rmdir /home/sumo/lwerthen/ms/.dvc/tmp/exps/tmpssfyvyik/.git/objects/pack
(tf2.4) lwerthen@sumo-ai:~/ms$ echo $?
0
@efiop efiop added the bug Did we break something? label Mar 23, 2021

efiop commented Mar 23, 2021

For the record: Another user is running into this issue https://discord.com/channels/485586884165107732/485596304961962003/823940098458517581

@efiop efiop added A: experiments Related to dvc exp p1-important Important, aka current backlog of things to do labels Mar 23, 2021
@FredericoCoelhoNunes

Having the exact same issue! Not able to use the --run-all functionality, but can run experiments individually.


pmrowla commented Apr 14, 2021

It looks like this is something specific to NFS (as in the NFS filesystem, not network-mounted storage in general) on Linux.


pmrowla commented Apr 14, 2021

It looks like rmtree will fail on NFS if any file in the directory still has an open file handle somewhere (similar to the behavior on Windows). We will have to do some investigation to see what we aren't closing before doing the tempdir cleanup.
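A minimal sketch of one possible mitigation (not DVC's actual fix, which landed separately): a retry wrapper around shutil.rmtree that tolerates the transient "Directory not empty" errors the NFS client produces while silly-renamed .nfs... placeholder files still exist.

```python
import errno
import shutil
import time


def rmtree_nfs_safe(path, retries=5, delay=0.5):
    """Remove a directory tree, retrying on transient ENOTEMPTY errors.

    On NFS, the client keeps a ".nfs..." placeholder in a directory for
    every open file handle inside it, so shutil.rmtree can fail with
    "Directory not empty" even after everything visible was deleted.
    Retrying gives straggling handles a chance to be closed elsewhere.
    """
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return
        except OSError as exc:
            if exc.errno != errno.ENOTEMPTY or attempt == retries - 1:
                raise
            time.sleep(delay)
```

This only papers over the symptom; it does nothing about the underlying handle leak.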


pmrowla commented Apr 20, 2021

The issue is specific to the combination of NFS and pygit2.

The NFS client works by creating .nfs... files in a directory any time a file handle to something in that directory is opened. Once all of the file handles are closed, the NFS client removes the .nfs... file. So if any file handles remain open at the time shutil.rmtree is used (to clean up the temp directory), shutil.rmtree will fail, since the NFS client will keep re-creating .nfs... files as long as the problematic file handle remains open.

Disabling the pygit2 backend (so that CLI git is used as a replacement) makes this issue go away, so it appears that whatever is keeping an open file handle to .git/objects/pack is somewhere in libgit2/pygit2.

We currently call pygit2's repo.free() explicitly during scm.close(), but some underlying libgit2 dealloc funcs are not called until the repo object is completely dereferenced. Adding a direct del self.repo call in our pygit2 backend does not resolve the issue either. So the problematic file handle is being leaked somewhere, probably on the internal C libgit2 side, which will take some significant amount of time for us to actually debug/fix.

The simplest solution on the DVC side is probably for us to open an issue in pygit2 and/or libgit2, and then direct users running with .dvc/tmp/ on NFS mounts to use CLI git only (this will require adding support for the git backend config option that has been discussed previously).


pmrowla commented Apr 20, 2021

Also for the record, with regard to the original issue:

Please note that at the end of this output, it states that the .git/objects/pack directory is not empty, even though it can be deleted using rmdir without issue.

.git/objects/pack is not empty during runtime because of the NFS client's .nfs... files. The directory appears empty after DVC has exited because the OS closes all of the open file handles associated with the DVC process (including any handles opened by pygit2/libgit2), at which point the NFS client removes its .nfs... files.
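The mechanism above can be confirmed from inside the running process. An illustrative, Linux-only diagnostic sketch (not part of DVC) that lists which of the current process's file descriptors still point inside a given directory, by reading the /proc/self/fd symlinks:

```python
import os


def open_fds_under(path):
    """Return (fd, target) pairs for this process's file descriptors
    that resolve to somewhere inside ``path`` (Linux-only: /proc/self/fd)."""
    path = os.path.realpath(path)
    hits = []
    fd_dir = "/proc/self/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # fd was closed between listdir() and readlink(); skip it
            continue
        if target == path or target.startswith(path + os.sep):
            hits.append((int(fd), target))
    return hits
```

Running this against the executor temp directory just before cleanup would show any leaked handle that keeps the NFS client re-creating its .nfs... files.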

@pmrowla pmrowla changed the title exp run --run-all: directory not empty .git/object/pack exp run: cannot clean up temp directory runs on Linux + NFS Apr 20, 2021

pmrowla commented Apr 20, 2021

After some more investigation, it looks like libgit2's git_odb_backend_pack file handles to pack/index files are leaked during checkout and merge in the event that some other process writes or modifies them after they were first opened in pygit2.

So it's a side effect of mixing git backends. When only using pygit/libgit, pygit2's repo.free() works as expected and does release whatever file handles it had opened (since nothing was leaked as a result of a different backend touching the pack files).

As a workaround in DVC, we can explicitly free the pygit2 repo after any operations that would potentially touch packfiles so that the file handles are released immediately when we are done with them.
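A sketch of that workaround as a context manager. The repo constructor is passed in as a parameter (open_repo) only so the sketch stays self-contained; in practice it would be pygit2.Repository, whose free() method releases the underlying libgit2 resources.

```python
import contextlib


@contextlib.contextmanager
def short_lived_repo(open_repo, path):
    """Open a repo, yield it, and free it as soon as the block exits.

    Freeing the repo immediately after any packfile-touching operation
    (checkout, merge) means its file handles are released right away,
    before the temp directory is removed, instead of lingering until
    the Python object is garbage-collected.
    """
    repo = open_repo(path)
    try:
        yield repo
    finally:
        repo.free()  # release libgit2 resources, incl. pack file handles
        del repo


# usage with pygit2 (assuming it is installed):
#   with short_lived_repo(pygit2.Repository, tmp_dir) as repo:
#       ...  # checkout / merge here
```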

@gregstarr

Hello,

I am still having this issue. When I check the pack directories, one is empty and one has files in it:

(almds_dl) [starrgw1@login01 dvctest]$ ls .dvc/tmp/exps/tmpm5ix9f4j/.git/objects/pack/
pack-1186ca2730fcc6628a18da2631e206dc5d09791b.idx  pack-1186ca2730fcc6628a18da2631e206dc5d09791b.pack
(almds_dl) [starrgw1@login01 dvctest]$ ls .dvc/tmp/exps/tmpb02b0lgm/.git/objects/pack/
(almds_dl) [starrgw1@login01 dvctest]$

Below is the debug output:

(almds_dl) [starrgw1@login01 dvctest]$ dvc exp run --run-all -j 2 -v
2022-03-10 19:21:22,130 DEBUG: Reproducing experiment revs '96d579e, 03eef69'
2022-03-10 19:21:22,195 DEBUG: Writing experiments local config '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.dvc/config.local'
2022-03-10 19:21:22,195 DEBUG: Init temp dir executor in '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm'
2022-03-10 19:21:22,227 DEBUG: Writing experiments local config '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/.dvc/config.local'
2022-03-10 19:21:22,227 DEBUG: Init temp dir executor in '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j'
2022-03-10 19:21:22,337 DEBUG: Running repro in '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j'
2022-03-10 19:21:22,338 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/.dvc/tmp/repro.dat'
2022-03-10 19:21:22,338 DEBUG: Running repro in '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm'
2022-03-10 19:21:22,338 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.dvc/tmp/repro.dat'
2022-03-10 19:21:22,502 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,502 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,503 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,503 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,506 DEBUG: Dependency 'params.yaml' of stage: 'stage1' changed because it is '{'input_text': 'modified'}'.
2022-03-10 19:21:22,506 DEBUG: Dependency 'params.yaml' of stage: 'stage1' changed because it is '{'input_text': 'modified'}'.
2022-03-10 19:21:22,506 DEBUG: stage: 'stage1' changed.
2022-03-10 19:21:22,506 DEBUG: stage: 'stage1' changed.
2022-03-10 19:21:22,508 DEBUG: Removing output 'metrics.json' of stage: 'stage1'.
2022-03-10 19:21:22,508 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/metrics.json'
2022-03-10 19:21:22,508 DEBUG: Removing output 'metrics.json' of stage: 'stage1'.
2022-03-10 19:21:22,509 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/metrics.json'
2022-03-10 19:21:22,511 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,511 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,514 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,514 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,514 DEBUG: {}
2022-03-10 19:21:22,515 DEBUG: {}
2022-03-10 19:21:22,516 DEBUG: defaultdict(<class 'dict'>, {'params.yaml': {'input_text': 'modified'}})
2022-03-10 19:21:22,516 DEBUG: defaultdict(<class 'dict'>, {'params.yaml': {'input_text': 'modified'}})
2022-03-10 19:21:22,517 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,518 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,520 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,520 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,523 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:22,523 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
Running stage 'stage1':
Running stage 'stage1':
> python submit_job.py stage1.py
> python submit_job.py stage1.py
/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/stage1.py
/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/stage1.py
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='Q', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120799.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
JobStatus(job_id='120800.vectivus.cm.cluster', name='qsub_script.sh', user='starrgw1', time_use='0', status='R', queue='short')
2022-03-10 19:21:58,158 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,162 DEBUG: state save (2286788293, 1646958103056720896, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,162 DEBUG: state save (2286788293, 1646958103056720896, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,163 DEBUG: Output 'metrics.json' doesn't use cache. Skipping saving.
2022-03-10 19:21:58,164 DEBUG: Computed stage: 'stage1' md5: '8aa9486314f0f8befb9277b8eb4e8def'
2022-03-10 19:21:58,165 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,168 DEBUG: state save (2286788291, 1646958082216025344, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,176 DEBUG: state save (2286788293, 1646958103056720896, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,178 DEBUG: Preparing to transfer data from 'memory://dvc-staging/d701057ee2ce3fdcd5408c3336f74eedf76e8d2655257ef739b3ea22a3904799' to '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,178 DEBUG: Preparing to collect status from '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,178 DEBUG: Collecting status from '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,179 DEBUG: Preparing to collect status from 'memory://dvc-staging/d701057ee2ce3fdcd5408c3336f74eedf76e8d2655257ef739b3ea22a3904799'
2022-03-10 19:21:58,182 DEBUG: state save (2286788293, 1646958103056720896, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,187 DEBUG: Uploading '/home/starrgw1/code/dvctest/.dvc/cache/.J4vv5NDtBQtjSh8ycSkWhC.tmp' to '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/.SnbjvSEZ76cQkVrwvWcLN6.tmp'
2022-03-10 19:21:58,189 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/.SnbjvSEZ76cQkVrwvWcLN6.tmp'
2022-03-10 19:21:58,189 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/cache/.J4vv5NDtBQtjSh8ycSkWhC.tmp'
2022-03-10 19:21:58,197 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/metrics.json'
2022-03-10 19:21:58,198 DEBUG: Uploading '/home/starrgw1/code/dvctest/.dvc/cache/11/66a8fbe4acb9cbfd182cfbb5fd9fdf' to '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpm5ix9f4j/metrics.json'
2022-03-10 19:21:58,200 DEBUG: state save (2286788296, 1646958118198226432, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,205 DEBUG: state save (2286788296, 1646958118198226432, 8) 1166a8fbe4acb9cbfd182cfbb5fd9fdf
2022-03-10 19:21:58,210 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,211 DEBUG: stage: 'stage1' was reproduced
2022-03-10 19:21:58,213 DEBUG: state save (1211851840, 1646958108677908736, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,214 DEBUG: state save (1211851840, 1646958108677908736, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,215 DEBUG: Output 'metrics.json' doesn't use cache. Skipping saving.
2022-03-10 19:21:58,216 DEBUG: Computed stage: 'stage1' md5: '7eb88065797b13721ef3ffa7cf7b13ed'
Updating lock file 'dvc.lock'
2022-03-10 19:21:58,217 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,220 DEBUG: state save (1211851838, 1646958082182024192, 350) 1079e31771794bac9a75210e2ac3ffda
2022-03-10 19:21:58,224 DEBUG: Staging files: {'stage1.py', 'dvc.yaml', 'params.yaml', 'dvc.lock', 'metrics.json'}
2022-03-10 19:21:58,227 DEBUG: state save (1211851840, 1646958108677908736, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,228 DEBUG: Preparing to transfer data from 'memory://dvc-staging/d701057ee2ce3fdcd5408c3336f74eedf76e8d2655257ef739b3ea22a3904799' to '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,228 DEBUG: Preparing to collect status from '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,228 DEBUG: Collecting status from '/home/starrgw1/code/dvctest/.dvc/cache'
2022-03-10 19:21:58,229 DEBUG: Preparing to collect status from 'memory://dvc-staging/d701057ee2ce3fdcd5408c3336f74eedf76e8d2655257ef739b3ea22a3904799'
2022-03-10 19:21:58,231 DEBUG: state save (1211851840, 1646958108677908736, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,236 DEBUG: Uploading '/home/starrgw1/code/dvctest/.dvc/cache/.MBmG3RvyuNGA6zEyzrYayc.tmp' to '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.nDJoTijpRqfnoaZLmjd2Bk.tmp'
2022-03-10 19:21:58,238 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.nDJoTijpRqfnoaZLmjd2Bk.tmp'
2022-03-10 19:21:58,238 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/cache/.MBmG3RvyuNGA6zEyzrYayc.tmp'
2022-03-10 19:21:58,238 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/metrics.json'
2022-03-10 19:21:58,239 DEBUG: Uploading '/home/starrgw1/code/dvctest/.dvc/cache/9b/7916dcfbccc49c18581fd80884fa56' to '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/metrics.json'
2022-03-10 19:21:58,240 DEBUG: Commit to new experiment branch 'refs/exps/96/bf9d4272ad61392a913bf8f1e7f77faf69defb/exp-bb312'
2022-03-10 19:21:58,240 DEBUG: state save (1211851845, 1646958118239227648, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,245 DEBUG: state save (1211851845, 1646958118239227648, 8) 9b7916dcfbccc49c18581fd80884fa56
2022-03-10 19:21:58,252 DEBUG: stage: 'stage1' was reproduced
Updating lock file 'dvc.lock'
2022-03-10 19:21:58,265 DEBUG: Staging files: {'stage1.py', 'dvc.yaml', 'params.yaml', 'dvc.lock', 'metrics.json'}
2022-03-10 19:21:58,268 WARNING: The following untracked files were present in the experiment directory after reproduction but will not be included in experiment commits:
        qsub_script.sh, qsub_script.sh.o120799, qsub_script.sh.e120799
2022-03-10 19:21:58,284 DEBUG: Commit to new experiment branch 'refs/exps/96/bf9d4272ad61392a913bf8f1e7f77faf69defb/exp-b936d'
2022-03-10 19:21:58,308 WARNING: The following untracked files were present in the experiment directory after reproduction but will not be included in experiment commits:
        qsub_script.sh, qsub_script.sh.o120800, qsub_script.sh.e120800
2022-03-10 19:21:58,325 DEBUG: Collected experiment '2ba86e6'.
2022-03-10 19:21:58,326 DEBUG: Removing tmpdir '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm'
2022-03-10 19:21:58,326 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm'
2022-03-10 19:21:58,337 ERROR: unexpected error - [Errno 39] Directory not empty: '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.git/objects/pack'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 657, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'pack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/cli/__init__.py", line 78, in main
    ret = cmd.do_run()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/commands/experiments/run.py", line 32, in run
    results = self.repo.experiments.run(
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 825, in run
    return run(self.repo, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 28, in run
    return repo.experiments.reproduce_queued(jobs=jobs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 457, in reproduce_queued
    results = self._reproduce_revs(**kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 53, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 644, in _reproduce_revs
    exec_results.update(self._executors_repro(manager, **kwargs))
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 64, in wrapper
    ret = f(exp, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 675, in _executors_repro
    return manager.exec_queue(self.repo, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 159, in exec_queue
    return self._exec_attached(repo, jobs=jobs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 232, in _exec_attached
    self.cleanup_executor(rev, executor)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 270, in cleanup_executor
    executor.cleanup()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 110, in cleanup
    remove(self.root_dir)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/utils/fs.py", line 135, in remove
    shutil.rmtree(path, onerror=_chmod)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 718, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 659, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/utils/fs.py", line 120, in _chmod
    func(p)
OSError: [Errno 39] Directory not empty: '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.git/objects/pack'
------------------------------------------------------------
2022-03-10 19:21:58,503 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 657, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'pack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/cli/__init__.py", line 78, in main
    ret = cmd.do_run()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/commands/experiments/run.py", line 32, in run
    results = self.repo.experiments.run(
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 825, in run
    return run(self.repo, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 28, in run
    return repo.experiments.reproduce_queued(jobs=jobs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 457, in reproduce_queued
    results = self._reproduce_revs(**kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 53, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 644, in _reproduce_revs
    exec_results.update(self._executors_repro(manager, **kwargs))
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 64, in wrapper
    ret = f(exp, *args, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 675, in _executors_repro
    return manager.exec_queue(self.repo, **kwargs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 159, in exec_queue
    return self._exec_attached(repo, jobs=jobs)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 232, in _exec_attached
    self.cleanup_executor(rev, executor)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 270, in cleanup_executor
    executor.cleanup()
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/repo/experiments/executor/local.py", line 110, in cleanup
    remove(self.root_dir)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/utils/fs.py", line 135, in remove
    shutil.rmtree(path, onerror=_chmod)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 718, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/shutil.py", line 659, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/utils/fs.py", line 120, in _chmod
    func(p)
OSError: [Errno 39] Directory not empty: '/home/starrgw1/code/dvctest/.dvc/tmp/exps/tmpb02b0lgm/.git/objects/pack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/local.py", line 144, in reflink
    System.reflink(from_info, to_info)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/starrgw1/.conda/envs/almds_dl/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-03-10 19:21:58,504 DEBUG: Removing '/home/starrgw1/code/.PHQHFmRzwbSbJrd69xLnsE.tmp'
2022-03-10 19:21:58,504 DEBUG: Removing '/home/starrgw1/code/.PHQHFmRzwbSbJrd69xLnsE.tmp'
2022-03-10 19:21:58,504 DEBUG: Removing '/home/starrgw1/code/.PHQHFmRzwbSbJrd69xLnsE.tmp'
2022-03-10 19:21:58,505 DEBUG: Removing '/home/starrgw1/code/dvctest/.dvc/cache/.C7nbLHG6hDepE3gwJDP8Jb.tmp'
2022-03-10 19:21:58,585 DEBUG: Version info for developers:
DVC version: 2.9.4 (conda)
---------------------------------
Platform: Python 3.8.12 on Linux-3.10.0-693.el7.x86_64-x86_64-with-glibc2.10
Supports:
        webhdfs (fsspec = 2022.2.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: nfs on master:/home
Caches: local
Remotes: None
Workspace directory: nfs on master:/home
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-03-10 19:21:58,586 DEBUG: Analytics is enabled.
2022-03-10 19:21:58,618 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpq2t5205u']'
2022-03-10 19:21:58,619 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpq2t5205u']'

@karajan1001

@pmrowla, sorry, I don't have two computers to simulate this condition. Any ideas about it?

@pmrowla commented Mar 14, 2022

There are new calls that have been implemented in pygit2 since this issue was originally closed; we likely just need to add the wrapper to free the handles in the new calls.

Also, you don't really need two computers to test this; you just have to mount an NFS export inside a VM or container to reproduce the issue.
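
For context on why `rmdir` fails here even though the directory looks empty: when a file on NFS is unlinked while some process still holds it open, the NFS client silly-renames it to a hidden `.nfsXXXX` entry instead of deleting it, so the parent `rmdir` fails with `ENOTEMPTY` until the last handle is released. That matches the traceback: libgit2 keeps the pack files in `.git/objects/pack` mapped while `shutil.rmtree` runs. The proper fix is freeing those handles (pygit2 exposes `Repository.free()` for this), but a retry loop is a common workaround. The sketch below is a hypothetical helper, not what DVC's `dvc.utils.fs.remove` does:

```python
import errno
import shutil
import time


def remove_with_retry(path, attempts=5, delay=0.5):
    """Remove a directory tree, retrying on ENOTEMPTY.

    On NFS, unlinking a file that some process still has open leaves a
    .nfsXXXX silly-rename entry behind, so the parent rmdir fails with
    'Directory not empty' until the last handle is released. Retrying
    after a short delay gives the NFS client time to reap those entries.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError as exc:
            # Re-raise anything that isn't the NFS symptom, and give up
            # after the final attempt.
            if exc.errno != errno.ENOTEMPTY or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A retry only papers over the race, though; releasing the libgit2 handles before cleanup removes the `.nfsXXXX` entries at the source.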
