Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TemporaryDirectory.__exit__ sometimes raises bad PermissionError #113009

Closed
kfrydel opened this issue Dec 12, 2023 · 12 comments
Closed

TemporaryDirectory.__exit__ sometimes raises bad PermissionError #113009

kfrydel opened this issue Dec 12, 2023 · 12 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@kfrydel
Copy link

kfrydel commented Dec 12, 2023

Bug report

TemporaryDirectory.__exit__ usually raises PermissionError if another process keeping a handle to a file in the tmp directory has just finished its execution.

Bug description:

Please consider the following block of code:

import multiprocessing
import os
import tempfile
import time


def _open_func(file_path):
    with open(file_path, "w"):
        time.sleep(1000)


def test():
    with tempfile.TemporaryDirectory(suffix="used_by_another_process") as dir_path:
        file_path = os.path.join(dir_path, "file_being_used")
        proc = multiprocessing.Process(target=_open_func, args=(file_path,))
        proc.start()
        while not os.path.exists(file_path):
            time.sleep(0.1)
        proc.terminate()
        proc.join()


if __name__ == "__main__":
    test()

Despite the child process being terminated and joined, the __exit__ method of TemporaryDirectory sometimes raises PermissionError:

Traceback (most recent call last):
  File "C:\Python\python.3.12.1\tools\Lib\shutil.py", line 634, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\kamil\\AppData\\Local\\Temp\\tmpvaou_oibused_by_another_process\\file_being_used'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\kamil\tmp_test\tmpdir.py", line 24, in <module>
    test()
  File "C:\Users\kamil\tmp_test\tmpdir.py", line 13, in test
    with tempfile.TemporaryDirectory(suffix="used_by_another_process") as dir_path:
  File "C:\Python\python.3.12.1\tools\Lib\tempfile.py", line 946, in __exit__
    self.cleanup()
  File "C:\Python\python.3.12.1\tools\Lib\tempfile.py", line 950, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "C:\Python\python.3.12.1\tools\Lib\tempfile.py", line 930, in _rmtree
    _shutil.rmtree(name, onexc=onexc)
  File "C:\Python\python.3.12.1\tools\Lib\shutil.py", line 808, in rmtree
    return _rmtree_unsafe(path, onexc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\python.3.12.1\tools\Lib\shutil.py", line 636, in _rmtree_unsafe
    onexc(os.unlink, fullname, err)
  File "C:\Python\python.3.12.1\tools\Lib\tempfile.py", line 905, in onexc
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\kamil\\AppData\\Local\\Temp\\tmpvaou_oibused_by_another_process\\file_being_used'

It does not reproduce in 100% but most executions fail. I can reproduce it only using Python 3.12.1. It does not happen to me on 3.12.0 or 3.11. It seems to be a regression in the last release.
With some small sleep after the proc.join() it stops reproducing so it looks like a kind of race condition.

The Windows version I use is:
Edition: Windows 10 Enterprise
Version: 21H2
OS build: 19044.3693

CPython versions tested on:

3.12

Operating systems tested on:

Windows

Linked PRs

@kfrydel kfrydel added the type-bug An unexpected behavior, bug, or error label Dec 12, 2023
@terryjreedy terryjreedy changed the title TemporaryDirectory.__exit__ raises PermissionError if other process keeping handle to a file in the tmp directory has just finished its execution TemporaryDirectory.__exit__ raises PermissionError if other process keeping handle to a file in the tmp directory has just finished its execution Dec 12, 2023
@terryjreedy terryjreedy changed the title TemporaryDirectory.__exit__ raises PermissionError if other process keeping handle to a file in the tmp directory has just finished its execution TemporaryDirectory.__exit__ sometimes raises bad PermissionError Dec 12, 2023
@terryjreedy terryjreedy added the stdlib Python modules in the Lib dir label Dec 12, 2023
@savannahostrowski
Copy link
Member

savannahostrowski commented Dec 13, 2023

I was also able to reproduce this on Windows 11 Pro (22H2) with 3.11.7 and 3.12.1. It reproduces reliably with the code snippet above.

I also tested this on Linux (Debian) with 3.11, 3.12.0 and 3.12.1 and was unable to repro.

@serhiy-storchaka serhiy-storchaka added OS-windows 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes labels Dec 13, 2023
@serhiy-storchaka
Copy link
Member

The cause is the same as in #109608 and #66223. There is also similar issue with directories: #86962.

@kfrydel
Copy link
Author

kfrydel commented Dec 14, 2023

I have spent some time debugging another issue in the same piece of code that revealed this bug and probably I have found the cause of this as well. So consider the following block of code:

import multiprocessing
import os
import tempfile
import time

import psutil


def _open_func(file_path):
    with open(file_path, "w"):
        time.sleep(1000)


def test(i):
    with tempfile.TemporaryDirectory(suffix="used_by_another_process") as dir_path:
        file_path = os.path.join(dir_path, "file_being_used")
        proc = multiprocessing.Process(target=_open_func, args=(file_path,))
        proc.start()
        while not os.path.exists(file_path):
            time.sleep(0.1)
        proc.terminate()
        proc.join()
        try:
            process = psutil.Process(proc.pid)
            print(f'{process=} {i=}')
            assert False
        except psutil.Error:
            pass


if __name__ == "__main__":
    for i in range(1000):
        test(i)

Running psutil.Process, since it takes some time, prevents from falling into PermissionError issue but sometimes the above code prints the child process information and finally fails on the assert False.

So the real problem here is probably in fact that invoking proc.terminate(); proc.join() finishes its execution before the process is stopped.

It looks like the regression was introduced by this change: a2074911ba Please note that it is not present in 3.12.0 but in 3.12.1 it is. Previously, the terminate method was not setting self.returncode so the wait method was indeed waiting. But now, since returncode is always set by terminate, wait does nothing. Reverting the change fixes both issues.
CC: @vstinner as the author of the change.

@vstinner
Copy link
Member

Previously, the terminate method was not setting self.returncode so the wait method was indeed waiting. But now, since returncode is always set by terminate, wait does nothing.

Do you mean that in terminate(), when TerminateProcess() raises PermissionError and GetExitCodeProcess() returns an exit code different than STILL_ACTIVE, the process is in fact still running?

@vstinner
Copy link
Member

Please log the returncode. Example:

def test():
    with tempfile.TemporaryDirectory(suffix="used_by_another_process") as dir_path:
        file_path = os.path.join(dir_path, "file_being_used")
        proc = multiprocessing.Process(target=_open_func, args=(file_path,))
        print(repr(proc))
        proc.start()
        while not os.path.exists(file_path):
            time.sleep(0.1)
        proc.terminate()
        print(f"terminate: {proc._popen.returncode=}")
        proc.join()
        print(f"join: {proc._popen.returncode=}")

@kfrydel
Copy link
Author

kfrydel commented Dec 14, 2023

No, I mean that we now set returncode in the else clause:

self.returncode = -signal.SIGTERM

Thus wait skips waiting:

if self.returncode is None:
and we may end up with running process for some time after Process.join finishes.

The output of the program with additional debug logs:

terminate: proc._popen.returncode=-15
join: proc._popen.returncode=-15

@vstinner
Copy link
Member

I reproduced the bug on Python 3.13 on Windows. When the bug occurred, TerminateProcess() succeeded.

vstinner@WIN C:\victor\python\main>python bug.py
Running Release|x64 interpreter...
<Process name='Process-1' parent=5512 initial>
TerminateProcess -> ok
terminate: proc._popen.returncode=-15
join: proc._popen.returncode=-15
Traceback (most recent call last):
  File "C:\victor\python\main\Lib\shutil.py", line 635, in _rmtree_unsafe
    os.unlink(fullname)
    ~~~~~~~~~^^^^^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\vstinner\\AppData\\Local\\Temp\\tmpqyh2efopused_by_another_process\\file_being_used'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\victor\python\main\bug.py", line 25, in <module>
    test()
    ~~~~^^
  File "C:\victor\python\main\bug.py", line 12, in test
    with tempfile.TemporaryDirectory(suffix="used_by_another_process") as dir_path:
    ...<9 lines>...
        print(f"join: {proc._popen.returncode=}")
  File "C:\victor\python\main\Lib\tempfile.py", line 946, in __exit__
    self.cleanup()
    ~~~~~~~~~~~~^^
  File "C:\victor\python\main\Lib\tempfile.py", line 950, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\victor\python\main\Lib\tempfile.py", line 930, in _rmtree
    _shutil.rmtree(name, onexc=onexc)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "C:\victor\python\main\Lib\shutil.py", line 830, in rmtree
    return _rmtree_unsafe(path, onexc)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "C:\victor\python\main\Lib\shutil.py", line 639, in _rmtree_unsafe
    onexc(os.unlink, fullname, err)
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\victor\python\main\Lib\tempfile.py", line 905, in onexc
    _os.unlink(path)
    ~~~~~~~~~~^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\vstinner\\AppData\\Local\\Temp\\tmpqyh2efopused_by_another_process\\file_being_used'

@vstinner
Copy link
Member

and we may end up with running process for some time after Process.join finishes.

Do you mean that it's possible that a process is still running even if GetExitCodeProcess() returns an exit code other than STILL_ACTIVE?

@kfrydel
Copy link
Author

kfrydel commented Dec 14, 2023

I think that the code now does not call GetExitCodeProcess. It is in two places:

  1. code = _winapi.GetExitCodeProcess(self._handle)
  2. code = _winapi.GetExitCodeProcess(int(self._handle))

If TerminateProcess succeeds (which I think is my case), we do not enter except PermissionError thus we do not call it in line 129. Then when the code calls proc.join(), and it calls internally wait, self.returncode is not None so return statement is executed immediately and the code does not call GetExitCodeProcess placed in line 112.

vstinner added a commit to vstinner/cpython that referenced this issue Dec 14, 2023
On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
vstinner added a commit to vstinner/cpython that referenced this issue Dec 14, 2023
On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
@vstinner
Copy link
Member

So the difference with my recent change is that WaitForSingleObject() is no longer called. Sometimes, the child process is no longer running, and TemporaryDirectory removal is fine. Sometimes, the child process is still running, and TemporaryDirectory removal fails.

I wrote PR #113128 to restore the old behavior: always call WaitForSingleObject(), even if TerminateProcess() succeeded and GetExitCodeProcess() is not STILL_ACTIVE. As an Unix programmer, the Windows API always surprise me. I expected TerminateProcess() to... terminate the process, not to schedule an asynchonous termination.

It reminds me a similar process in asyncio with asynchronous cancellation: https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html

@kfrydel
Copy link
Author

kfrydel commented Dec 15, 2023

Thank you for the fix! It now works fine with the change from your PR. Do you think it will be included in 3.12.2?

vstinner added a commit that referenced this issue Dec 15, 2023
On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 15, 2023
…ythonGH-113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
(cherry picked from commit 4026ad5)

Co-authored-by: Victor Stinner <vstinner@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 15, 2023
…ythonGH-113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
(cherry picked from commit 4026ad5)

Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member

Thanks for the bug report, analysis and testing my fix. It should now be fixed. Oops, I was surprised by async Windows API, different than Unix way of doing things. Backports will follow.

vstinner added a commit that referenced this issue Dec 15, 2023
…H-113128) (#113178)

gh-113009: Fix multiprocessing Process.terminate() on Windows (GH-113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
(cherry picked from commit 4026ad5)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this issue Dec 15, 2023
…H-113128) (#113177)

gh-113009: Fix multiprocessing Process.terminate() on Windows (GH-113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
(cherry picked from commit 4026ad5)

Co-authored-by: Victor Stinner <vstinner@python.org>
corona10 pushed a commit to corona10/cpython that referenced this issue Dec 15, 2023
…ython#113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ython#113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…ython#113128)

On Windows, Process.terminate() no longer sets the returncode
attribute to always call WaitForSingleObject() in Process.wait().
Previously, sometimes the process was still running after
TerminateProcess() even if GetExitCodeProcess() is not STILL_ACTIVE.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

5 participants