
A race condition in mypy_test.py #9537

Closed
AlexWaygood opened this issue Jan 14, 2023 · 6 comments
Labels: project: infrastructure (typeshed build, test, documentation, or distribution related)

@AlexWaygood (Member)

#9408 (my PR) appears to have introduced a race condition in mypy_test.py.

Evidence of the race

We've had several "fluky" errors in CI over the last few days, all with the same traceback:
Testing third-party packages...
Traceback (most recent call last):
  File "./tests/mypy_test.py", line 557, in <module>
    main()
  File "./tests/mypy_test.py", line 544, in main
    code, files_checked_this_version = test_typeshed(code, args=config, tempdir=td_path)
  File "./tests/mypy_test.py", line 525, in test_typeshed
    code, third_party_files_checked = test_third_party_stubs(code, args, tempdir)
  File "./tests/mypy_test.py", line 500, in test_third_party_stubs
    setup_virtual_environments(distributions_to_check, args, tempdir)
  File "./tests/mypy_test.py", line 444, in setup_virtual_environments
    requirements_set, venv_info = venv_info_future.result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./tests/mypy_test.py", line 384, in setup_venv_for_external_requirements_set
    return requirements_set, make_venv(venv_dir)
  File "/home/runner/work/typeshed/typeshed/tests/utils.py", line 150, in make_venv
    venv.create(venv_dir, with_pip=True, clear=True)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 390, in create
    builder.create(env_dir)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 68, in create
    self._setup_pip(context)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 288, in _setup_pip
    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 26] Text file busy: '/tmp/tmp45sugieh/.venv-5354540476898/bin/python'
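
For context: errno 26 on Linux is ETXTBSY ("Text file busy"), which the kernel raises when a process tries to execute a file that is still open for writing. The mapping can be confirmed from a Python prompt:

>>> import errno, os
>>> errno.ETXTBSY
26
>>> os.strerror(errno.ETXTBSY)
'Text file busy'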

Cause of the race

The race happens in this part of the test, where multiple virtual environments are set up concurrently using a threadpool.

typeshed/tests/mypy_test.py, lines 438 to 445 at commit 0a291da:

with concurrent.futures.ThreadPoolExecutor() as executor:
    venv_info_futures = [
        executor.submit(setup_venv_for_external_requirements_set, requirements_set, tempdir)
        for requirements_set in external_requirements_to_distributions
    ]
    for venv_info_future in concurrent.futures.as_completed(venv_info_futures):
        requirements_set, venv_info = venv_info_future.result()
        requirements_sets_to_venvs[requirements_set] = venv_info

However, it's unclear exactly why the race occurs. The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time. But it's hard to see why that would be the case.

Reproducing the race condition

@JelleZijlstra has managed to reproduce the race locally on a Linux machine using Python 3.7. This rules out any GitHub Actions-specific hypotheses for why the race might be happening.

I have not been able to reproduce the failure in my local environment (Windows, Python 3.10). (I've tried creating venvs for 145 stub packages in a threadpool with 20 workers -- still no race for me.)

Note that mypy_test.py is only ever run on Ubuntu in CI.

Fixing the issue

Idea (1): Just create the venvs one at a time

You can't have a race condition if you don't try to do things concurrently. Locally, this makes the test much slower for me -- the time taken to create the venvs goes up from around 20 seconds to around 60 seconds. The specific part of creating a venv that is slow is the venv.EnvBuilder._setup_pip function in the stdlib -- the exact part that appears to be implicated in the race condition.
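
A rough sketch of what the sequential version could look like, reusing the names from the snippet quoted above (illustration only, not an actual patch):

# One venv at a time; no executor, so no concurrent calls into venv.create().
for requirements_set in external_requirements_to_distributions:
    _, venv_info = setup_venv_for_external_requirements_set(requirements_set, tempdir)
    requirements_sets_to_venvs[requirements_set] = venv_info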

However, if we did go this route, we could potentially mitigate the performance regression by caching venvs between runs of the test (both locally and in CI).

Idea (2): Do concurrency differently

Instead of using a threadpool, we could manually create our threads. Or we could move to an asyncio-based concurrency model.

Either of these might fix the race condition, but it's hard to know if they would without knowing exactly what's causing the race condition.
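
For illustration, the "manual threads" variant could look roughly like this (a sketch using the same names as the snippet above; _worker is a hypothetical helper, and whether this actually avoids the race is exactly the open question):

import threading

requirements_sets_to_venvs = {}
lock = threading.Lock()

def _worker(requirements_set):
    reqs, venv_info = setup_venv_for_external_requirements_set(requirements_set, tempdir)
    with lock:  # several threads write to the dict, so guard it
        requirements_sets_to_venvs[reqs] = venv_info

threads = [
    threading.Thread(target=_worker, args=(requirements_set,))
    for requirements_set in external_requirements_to_distributions
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()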

Idea (3): Paper over the problem

This diff would probably "fix" the problem, but it wouldn't be a particularly principled fix:

diff --git a/tests/mypy_test.py b/tests/mypy_test.py
index 087f41366..a0a66ed31 100644
--- a/tests/mypy_test.py
+++ b/tests/mypy_test.py
@@ -381,7 +381,17 @@ _DISTRIBUTION_TO_VENV_MAPPING: dict[str, VenvInfo] = {}

 def setup_venv_for_external_requirements_set(requirements_set: frozenset[str], tempdir: Path) -> tuple[frozenset[str], VenvInfo]:
     venv_dir = tempdir / f".venv-{hash(requirements_set)}"
-    return requirements_set, make_venv(venv_dir)
+    while True:
+        try:
+            venv_info = make_venv(venv_dir)
+        except OSError as e:
+            if e.errno != 26:
+                raise
+            else:
+                time.sleep(0.5)
+        else:
+            break
+    return requirements_set, venv_info
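
If we did go this route, a slightly tidier variant could use the named errno constant and cap the number of retries (make_venv_with_retries is a hypothetical helper sketched here, not part of the diff above):

import errno
import time

def make_venv_with_retries(venv_dir, attempts=10):
    # Retry make_venv, but only on ETXTBSY ("Text file busy"), and not forever.
    for attempt in range(attempts):
        try:
            return make_venv(venv_dir)
        except OSError as e:
            if e.errno != errno.ETXTBSY or attempt == attempts - 1:
                raise
            time.sleep(0.5)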
AlexWaygood added the project: infrastructure label on Jan 14, 2023
@srittau (Collaborator) commented Jan 16, 2023

We could try switching to ProcessPoolExecutor and see if that helps.
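
In terms of the snippet quoted above, that would be roughly a one-class swap -- a sketch only, and it assumes setup_venv_for_external_requirements_set and its arguments are picklable, which ProcessPoolExecutor requires:

with concurrent.futures.ProcessPoolExecutor() as executor:
    venv_info_futures = [
        executor.submit(setup_venv_for_external_requirements_set, requirements_set, tempdir)
        for requirements_set in external_requirements_to_distributions
    ]
    for venv_info_future in concurrent.futures.as_completed(venv_info_futures):
        requirements_set, venv_info = venv_info_future.result()
        requirements_sets_to_venvs[requirements_set] = venv_info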

@AlexWaygood (Member, Author)

> We could try switching to ProcessPoolExecutor and see if that helps.

Implemented in #9556. I'll hold off on trying to cache venvs between runs for a while, so we can test this change in isolation and see whether the problem is actually fixed. Caching venvs between runs would be good to do anyway, though (and not just for this test -- stubtest_third_party would benefit as well), so I'll work on that after we've waited a while.

@JelleZijlstra, could you possibly also see if you can still reproduce the race after #9556?

@JelleZijlstra (Member)

Running it again now.

@JelleZijlstra (Member)

I ran it for a few hours and saw no failures. I think the issue is fixed!

Maybe there really is a bug in the ThreadPoolExecutor.

@JelleZijlstra (Member)

> The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time.

More precisely, it indicates that we're trying to execute the Python executable while someone has the file open for writing. I looked at the code for venv and it's not clear how that is possible.
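
A standalone demonstration of that condition on Linux (nothing to do with typeshed's code; copying the true binary is just a convenient way to get a scratch executable):

import os
import shutil
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as td:
    exe = os.path.join(td, "true-copy")
    shutil.copy(shutil.which("true"), exe)  # shutil.copy keeps the executable bit
    fd = os.open(exe, os.O_WRONLY)          # hold the file open for writing
    try:
        subprocess.run([exe])               # exec fails while the fd is open
    except OSError as e:
        print(e)                            # [Errno 26] Text file busy: '...'
    finally:
        os.close(fd)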

@AlexWaygood (Member, Author) commented Jan 17, 2023

Thanks v much @JelleZijlstra for helping debug this, and thanks @srittau for suggesting the fix!

I was pretty pleased with how easy it was to swap in ProcessPoolExecutor here, I've got to say. I've had cause to use ThreadPoolExecutor before, but never ProcessPoolExecutor until now.
