
A race condition in mypy_test.py #9537

Closed
AlexWaygood opened this issue Jan 14, 2023 · 6 comments
Labels: project: infrastructure (typeshed build, test, documentation, or distribution related)

@AlexWaygood (Member)

#9408 (my PR) appears to have introduced a race condition in mypy_test.py.

Evidence of the race

We've had several "fluky" errors in CI over the last few days, all with the same traceback:
Testing third-party packages...
Traceback (most recent call last):
  File "./tests/mypy_test.py", line 557, in <module>
    main()
  File "./tests/mypy_test.py", line 544, in main
    code, files_checked_this_version = test_typeshed(code, args=config, tempdir=td_path)
  File "./tests/mypy_test.py", line 525, in test_typeshed
    code, third_party_files_checked = test_third_party_stubs(code, args, tempdir)
  File "./tests/mypy_test.py", line 500, in test_third_party_stubs
    setup_virtual_environments(distributions_to_check, args, tempdir)
  File "./tests/mypy_test.py", line 444, in setup_virtual_environments
    requirements_set, venv_info = venv_info_future.result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./tests/mypy_test.py", line 384, in setup_venv_for_external_requirements_set
    return requirements_set, make_venv(venv_dir)
  File "/home/runner/work/typeshed/typeshed/tests/utils.py", line 150, in make_venv
    venv.create(venv_dir, with_pip=True, clear=True)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 390, in create
    builder.create(env_dir)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 68, in create
    self._setup_pip(context)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/venv/__init__.py", line 288, in _setup_pip
    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 26] Text file busy: '/tmp/tmp45sugieh/.venv-5354540476898/bin/python'
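
For context: errno 26 on Linux is ETXTBSY ("Text file busy"), which the kernel raises when a process tries to execute a file that is still open for writing. The mapping can be confirmed from a Python prompt:

>>> import errno, os
>>> errno.ETXTBSY
26
>>> os.strerror(errno.ETXTBSY)
'Text file busy'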

Cause of the race

The race happens in this part of the test, where multiple virtual environments are set up concurrently using a threadpool.

typeshed/tests/mypy_test.py, lines 438 to 445 at commit 0a291da:

with concurrent.futures.ThreadPoolExecutor() as executor:
    venv_info_futures = [
        executor.submit(setup_venv_for_external_requirements_set, requirements_set, tempdir)
        for requirements_set in external_requirements_to_distributions
    ]
    for venv_info_future in concurrent.futures.as_completed(venv_info_futures):
        requirements_set, venv_info = venv_info_future.result()
        requirements_sets_to_venvs[requirements_set] = venv_info

However, it's unclear exactly why the race occurs. The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time. But it's hard to see why that would be the case.

Reproducing the race condition

@JelleZijlstra has managed to reproduce the race locally on a Linux machine using Python 3.7. This rules out any GitHub Actions-specific hypotheses for why the race might be happening.

I have not been able to reproduce the failure in my local environment (Windows, Python 3.10). (I've tried creating venvs for 145 stub packages in a threadpool with 20 workers -- still no race for me.)

Note that mypy_test.py is only ever run on Ubuntu in CI.

Fixing the issue

Idea (1): Just create the venvs one at a time

You can't have a race condition if you don't try to do things concurrently. Locally, this makes the test much slower for me -- the time taken to create the venvs goes up from around 20 seconds to around 60 seconds. The specific part of creating a venv that is slow is the venv.EnvBuilder._setup_pip function in the stdlib -- the exact part that appears to be implicated in the race condition.
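
A rough sketch of what the sequential version could look like, reusing the names from the snippet quoted above (illustration only, not an actual patch):

# One venv at a time; no executor, so no concurrent calls into venv.create().
for requirements_set in external_requirements_to_distributions:
    _, venv_info = setup_venv_for_external_requirements_set(requirements_set, tempdir)
    requirements_sets_to_venvs[requirements_set] = venv_info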

However, if we did go this route, we could potentially mitigate the performance regression by caching venvs between runs of the test (both locally and in CI).

Idea (2): Do concurrency differently

Instead of using a threadpool, we could manually create our threads. Or we could move to an asyncio-based concurrency model.

Either of these might fix the race condition, but it's hard to know if they would without knowing exactly what's causing the race condition.
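
For illustration, the "manual threads" variant could look roughly like this (a sketch using the same names as the snippet above; _worker is a hypothetical helper, and whether this actually avoids the race is exactly the open question):

import threading

requirements_sets_to_venvs = {}
lock = threading.Lock()

def _worker(requirements_set):
    reqs, venv_info = setup_venv_for_external_requirements_set(requirements_set, tempdir)
    with lock:  # several threads write to the dict, so guard it
        requirements_sets_to_venvs[reqs] = venv_info

threads = [
    threading.Thread(target=_worker, args=(requirements_set,))
    for requirements_set in external_requirements_to_distributions
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()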

Idea (3): Paper over the problem

This diff would probably "fix" the problem, but it wouldn't be a particularly principled fix:

diff --git a/tests/mypy_test.py b/tests/mypy_test.py
index 087f41366..a0a66ed31 100644
--- a/tests/mypy_test.py
+++ b/tests/mypy_test.py
@@ -381,7 +381,17 @@ _DISTRIBUTION_TO_VENV_MAPPING: dict[str, VenvInfo] = {}

 def setup_venv_for_external_requirements_set(requirements_set: frozenset[str], tempdir: Path) -> tuple[frozenset[str], VenvInfo]:
     venv_dir = tempdir / f".venv-{hash(requirements_set)}"
-    return requirements_set, make_venv(venv_dir)
+    while True:
+        try:
+            venv_info = make_venv(venv_dir)
+        except OSError as e:
+            if e.errno != 26:
+                raise
+            else:
+                time.sleep(0.5)
+        else:
+            break
+    return requirements_set, venv_info
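
If we did go this route, a slightly tidier variant could use the named errno constant and cap the number of retries (make_venv_with_retries is a hypothetical helper sketched here, not part of the diff above):

import errno
import time

def make_venv_with_retries(venv_dir, attempts=10):
    # Retry make_venv, but only on ETXTBSY ("Text file busy"), and not forever.
    for attempt in range(attempts):
        try:
            return make_venv(venv_dir)
        except OSError as e:
            if e.errno != errno.ETXTBSY or attempt == attempts - 1:
                raise
            time.sleep(0.5)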
AlexWaygood added the project: infrastructure label on Jan 14, 2023
@srittau (Collaborator) commented Jan 16, 2023

We could try switching to ProcessPoolExecutor and see if that helps.
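
In terms of the snippet quoted above, that would be roughly a one-class swap -- a sketch only, and it assumes setup_venv_for_external_requirements_set and its arguments are picklable, which ProcessPoolExecutor requires:

with concurrent.futures.ProcessPoolExecutor() as executor:
    venv_info_futures = [
        executor.submit(setup_venv_for_external_requirements_set, requirements_set, tempdir)
        for requirements_set in external_requirements_to_distributions
    ]
    for venv_info_future in concurrent.futures.as_completed(venv_info_futures):
        requirements_set, venv_info = venv_info_future.result()
        requirements_sets_to_venvs[requirements_set] = venv_info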

@AlexWaygood (Member, Author)

> We could try switching to ProcessPoolExecutor and see if that helps.

Implemented in #9556. I'll hold off on trying to cache venvs between runs for a while, so we can test this change in isolation and see whether the problem is actually fixed. Caching venvs between runs would be good to do anyway, though (and not just for this test -- stubtest_third_party would benefit as well), so I'll work on that after we've waited a while.

@JelleZijlstra, could you possibly also see if you can still reproduce the race after #9556?

@JelleZijlstra (Member)

Running it again now.

@JelleZijlstra (Member)

I ran it for a few hours and saw no failures. I think the issue is fixed!

Maybe there really is a bug in the ThreadPoolExecutor.

@JelleZijlstra (Member)

> The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time.

More precisely, it indicates that we're trying to execute the Python executable while someone has the file open for writing. I looked at the code for venv and it's not clear how that is possible.
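
A standalone demonstration of that condition on Linux (nothing to do with typeshed's code; copying the true binary is just a convenient way to get a scratch executable):

import os
import shutil
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as td:
    exe = os.path.join(td, "true-copy")
    shutil.copy(shutil.which("true"), exe)  # shutil.copy keeps the executable bit
    fd = os.open(exe, os.O_WRONLY)          # hold the file open for writing
    try:
        subprocess.run([exe])               # exec fails while the fd is open
    except OSError as e:
        print(e)                            # [Errno 26] Text file busy: '...'
    finally:
        os.close(fd)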

@AlexWaygood (Member, Author) commented Jan 17, 2023

Thanks v much @JelleZijlstra for helping debug this, and thanks @srittau for suggesting the fix!

I was pretty pleased with how easy it was to swap in ProcessPoolExecutor here, I've got to say. I've had cause to use ThreadPoolExecutor before, but never ProcessPoolExecutor until now.
