Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] clean up routine tries to delete file with shutil.rmtree #60170

Closed
carinedengler opened this issue May 11, 2021 · 2 comments · Fixed by #61356
Closed

[BUG] clean up routine tries to delete file with shutil.rmtree #60170

carinedengler opened this issue May 11, 2021 · 2 comments · Fixed by #61356
Assignees
Labels
Bug broken, incorrect, or confusing behavior effort-small level of effort estimated in size enhancement enhancing, extending functionality, not refactor or net new Phosphorus v3005.0 Release code name and version severity-high 2nd top severity, seen by most users, causes major problems

Comments

@carinedengler
Copy link

Description
First of all, thank you for your work on this projet! We are using it for years now and are very happy with it! I would like to bring to your attention an issue we are experiencing recently though. Similarly to this issue, the clean up routine fails to clear the master's job cache, leading to all the inodes of the system being used. The bug itself is that the clean up routine tries to delete a file with shutil.rmtree

May 11 13:52:47 salt salt-master[8466]: Traceback (most recent call last):
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
May 11 13:52:47 salt salt-master[8466]:     self.run()
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/utils/process.py", line 895, in wrapped_run_func
May 11 13:52:47 salt salt-master[8466]:     return run_func()
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/master.py", line 233, in run
May 11 13:52:47 salt salt-master[8466]:     salt.daemons.masterapi.clean_old_jobs(self.opts)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/daemons/masterapi.py", line 156, in clean_old_jobs
May 11 13:52:47 salt salt-master[8466]:     mminion.returners[fstr]()
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
May 11 13:52:47 salt salt-master[8466]:     return self.loader.run(run_func, *args, **kwargs)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
May 11 13:52:47 salt salt-master[8466]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
May 11 13:52:47 salt salt-master[8466]:     return _func_or_method(*args, **kwargs)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3/dist-packages/salt/returners/local_cache.py", line 445, in clean_old_jobs
May 11 13:52:47 salt salt-master[8466]:     shutil.rmtree(f_path)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3.7/shutil.py", line 491, in rmtree
May 11 13:52:47 salt salt-master[8466]:     _rmtree_safe_fd(fd, path, onerror)
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3.7/shutil.py", line 410, in _rmtree_safe_fd
May 11 13:52:47 salt salt-master[8466]:     onerror(os.scandir, path, sys.exc_info())
May 11 13:52:47 salt salt-master[8466]:   File "/usr/lib/python3.7/shutil.py", line 406, in _rmtree_safe_fd
May 11 13:52:47 salt salt-master[8466]:     with os.scandir(topfd) as scandir_it:
May 11 13:52:47 salt salt-master[8466]: NotADirectoryError: [Errno 20] Not a directory: '/var/cache/salt/master/jobs/6d/2f80d0dcb228daab95703f632eaf0608edb51a3d223258bcc88188e0939d7e'

The path in question points to a file, not a directory

file /var/cache/salt/master/jobs/6d/2f80d0dcb228daab95703f632eaf0608edb51a3d223258bcc88188e0939d7e
/var/cache/salt/master/jobs/6d/2f80d0dcb228daab95703f632eaf0608edb51a3d223258bcc88188e0939d7e: ASCII text, with no line terminators

Using os.path.isdir instead of os.path.exists in

if not os.path.isfile(jid_file) and os.path.exists(f_path):
and adding an else-clause to treat the case when f_path is a file could fix the problem, but I didn't really have time to test this.

Setup
I'm not sure any of the config files are relevant here - if you need any, I'm happy to provide them.

Steps to Reproduce the behavior
We experienced a pretty bad hardware crash some days before, so that it is entirely possible that the file system is just in a weird state, and that normally these files would be directories.

Steps I have taken so far

Stopped and restarted salt-master. I also deleted some of the files by hand to clear up some inodes.

Expected behavior
I don't know what would be appropriate behaviour. I guess removing the files with a warning in the logs?

Versions Report

salt --versions-report
Salt Version:
          Salt: 3003
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: unknown
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 17.1.2
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.1
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-16-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

Additional context
As mentioned beforehand, the file system is probably in an unstable state. My solution in the meantime will be to remove any files in /var/cache/salt/master/jobs/ and monitor the situation to see if it is a recurring problem.

Thank you for your time!

@carinedengler carinedengler added Bug broken, incorrect, or confusing behavior needs-triage labels May 11, 2021
@welcome
Copy link

welcome bot commented May 11, 2021

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

@carinedengler carinedengler changed the title [BUG] [BUG] clean up routine tries to delete file with shutil.rmtree May 11, 2021
@sagetherage sagetherage added effort-small level of effort estimated in size enhancement enhancing, extending functionality, not refactor or net new and removed needs-triage labels May 11, 2021
@sagetherage sagetherage added this to the Approved milestone May 11, 2021
@sagetherage sagetherage added severity-high 2nd top severity, seen by most users, causes major problems Phosphorus v3005.0 Release code name and version labels Jun 16, 2021
@sagetherage sagetherage modified the milestones: Approved, Phosphorus Jun 16, 2021
@cmcmarrow
Copy link
Contributor

This looks like a bad inode. But clean_old_jobs is so mission critical we should probably be able to recover from such a failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior effort-small level of effort estimated in size enhancement enhancing, extending functionality, not refactor or net new Phosphorus v3005.0 Release code name and version severity-high 2nd top severity, seen by most users, causes major problems
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants