Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBG CI-trigger #395

Closed
wants to merge 12 commits into from
Closed

DBG CI-trigger #395

wants to merge 12 commits into from

Conversation

tomMoral
Copy link
Collaborator

No description provided.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

On the first trigger we get 5 failures:

FAILED tests/test_loky_backend.py::test_kill_process_tree[True] - RuntimeErro...
FAILED tests/test_loky_backend.py::test_kill_process_tree[False] - RuntimeErr...
FAILED tests/test_process_executor_loky.py::TestsProcessPoolLokyExecutor::test_max_depth[True]
FAILED tests/test_process_executor_loky.py::TestsProcessPoolLokyExecutor::test_max_depth[False]
FAILED tests/test_process_executor_spawn.py::TestsProcessPoolSpawnExecutor::test_max_depth[True]

and on the second we get a deadlock in test_max_depth after a failure in test_kill_process_tree:

tests/test_process_executor_loky.py::TestsProcessPoolLokyExecutor::test_max_depth[True]

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

I pushed a commit to display more debug into.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

I pushed some logs to make sure which error comes from where.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

So supposedly the OSError: [WinError 50] The request is not supported raised in LokyProcess-1:1:MainThread has errno 22 and should be ignorable.

Then the following traceback is the root cause of the failure:

[DEBUG:LokyProcess-1:1:1:MainThread] error in main:
Traceback (most recent call last):
  File "D:\a\1\s\loky\backend\popen_loky_win32.py", line 167, in main
    self = load(from_parent)
           ^^^^^^^^^^^^^^^^^
EOFError: Ran out of input

Note however that it is always logged after the errno 22 in its parent.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

Also note that I cannot reproduce this in a py311 env on a local windows VM... hence the need to debug with the CI...

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

It's weird that this problem only happens in the third nested subprocess:

  • MainProcess:MainThread is the pytest process
  • LokyProcess-1 is the first level worker and can successfully spawn LokyProcess-1:1,
  • LokyProcess-1:1 is the second level worker and fails to spawn LokyProcess-1:1:1 in turn.

The event is therefore never set. And this all happens before attempting to kill the process tree.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

I have the impression that the failure reduction.dump(process_obj, to_child) at line 106 of loky/backend/popen_loky_win32.py is actually the cause of the problem, rather than an ignorable consequence in this case: the subprocess cannot read from its parent because of the OSError raised by _winapi.DuplicateHandle when calling reduction.dump(process_obj, to_child) in the parent.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

For the record, is the full traceback of the ignored exception in the parent:

[DEBUG:LokyProcess-1:1:MainThread] error in Popen.__init__:
Traceback (most recent call last):
  File "D:\a\1\s\loky\backend\popen_loky_win32.py", line 106, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\a\1\s\loky\backend\reduction.py", line 208, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "D:\a\1\s\.tox\py311\Lib\site-packages\cloudpickle\cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\multiprocessing\synchronize.py", line 104, in __getstate__
    h = context.get_spawning_popen().duplicate_for_child(sl.handle)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\1\s\loky\backend\popen_loky_win32.py", line 126, in duplicate_for_child
    return duplicate(handle, self.sentinel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\multiprocessing\reduction.py", line 79, in duplicate
    return _winapi.DuplicateHandle(
           ^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 50] The request is not supported

tb = traceback.format_exception(
type(exc), exc, getattr(exc, "__traceback__", None)
)
util.debug("error in Popen.__init__:\n" + "".join(tb))
if exc.errno != 22:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in cpython main, this special casing of errno 22 does not exist: the exception is always raised.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

Ok so this is a loky-specific problem: running the same test with the default "spawn" context works. So we will have to diff the 2 backends w.r.t. semlock usage.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

I will re-push a trigger commit just to be sure.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 13, 2023

Ok so based on the last CI runs, this is really a problem that only happens with the loky context, not the spawn context.

@ogrisel
Copy link
Collaborator

ogrisel commented Apr 14, 2023

Fixed by #394.

@ogrisel ogrisel closed this Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants