Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117378: Fix multiprocessing forkserver preload sys.path inheritance. #126538

Merged

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Nov 7, 2024

Fixes the multiprocessing "forkserver" start method forkserver process to correctly inherit the parent's sys.path during the importing of multiprocessing.set_forkserver_preload modules in the same manner as sys.path is configured when executing work items in the worker processes.

This bug could cause some forkserver module preloading to silently fail to be preloaded, leading to a performance degration in child processes due to additional repeated work. It could also have led to a side effect of "" still being in sys.path during forkserver preload imports instead of the absolute path of the directory that workers see.

A workaround for the bug that this fixes was to set PYTHONPATH in the environment before the forkserver process was started.

…ritance.

`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports.  This bug has been
there since the forkserver start method was introduced in Python ~3.4.  It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.

Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method.

A workaround for the bug this fixes was to set PYTHONPATH in the environment
before the forkserver process was started.
@gpshead gpshead added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir topic-multiprocessing needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Nov 7, 2024
@gpshead gpshead self-assigned this Nov 7, 2024
@gpshead gpshead added 🔨 test-with-buildbots Test PR w/ buildbots; report in status section 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section labels Nov 9, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gpshead for commit cf07467 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gpshead for commit cf07467 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Nov 9, 2024
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides some minor suggestions.

Test for __main__ is not required if it is difficult to write it or it is too slow, but it would be better to add it.

Lib/test/_test_multiprocessing.py Show resolved Hide resolved
Lib/test/_test_multiprocessing.py Show resolved Hide resolved
Lib/test/_test_multiprocessing.py Outdated Show resolved Hide resolved
Lib/test/_test_multiprocessing.py Show resolved Hide resolved
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@gpshead
Copy link
Member Author

gpshead commented Nov 9, 2024

Test for __main__ is not required if it is difficult to write it or it is too slow, but it would be better to add it.

reading the code and old changes... It looks like there is no possible way for main_path= to be set on the multiprocessing's forkserver.main() call since 2013's 9a76735 for #64145.

the forkserver.main call is constructed in forkserver.py getting its two possible allowed named args from spawn.get_preparation_data() which was changed to set "init_main_from_name" instead of "main_path" in the above change. Effectively making the main_path= code dead in forkserver.

I'll file a separate issue to track resolving that (remove it or recreate some missing functionality with a test).

@gpshead gpshead merged commit 9d08423 into python:main Nov 9, 2024
36 checks passed
@miss-islington-app
Copy link

Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 9, 2024
…ritance. (pythonGH-126538)

pythongh-117378: Fix multiprocessing forkserver preload sys.path inheritance.

`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports.  This bug has been
there since the forkserver start method was introduced in Python 3.4.  It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.

Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method. **This will only be observable during forkserver preload imports**.

The code invoked before calling things in another process already correctly sets `sys.path`.
Which is likely why this went unnoticed for so long as a mere performance issue in
some configurations.

A workaround for the bug on impacted Pythons is to set PYTHONPATH in the
environment before multiprocessing's forkserver process was started. Not perfect
as that is then inherited by other children, etc, but likely good enough for many
people's purposes.

(cherry picked from commit 9d08423)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@miss-islington-app
Copy link

Sorry, @gpshead, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 9d08423b6e0fa89ce9cfea08e580ed72e5db8c70 3.12

@bedevere-app
Copy link

bedevere-app bot commented Nov 9, 2024

GH-126632 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 9, 2024
gpshead added a commit that referenced this pull request Nov 9, 2024
…eritance. (GH-126538) (GH-126632)

gh-117378: Fix multiprocessing forkserver preload sys.path inheritance. (GH-126538)

`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports.  This bug has been
there since the forkserver start method was introduced in Python 3.4.  It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.

Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method. **This will only be observable during forkserver preload imports**.

The code invoked before calling things in another process already correctly sets `sys.path`.
Which is likely why this went unnoticed for so long as a mere performance issue in
some configurations.

A workaround for the bug on impacted Pythons is to set PYTHONPATH in the
environment before multiprocessing's forkserver process was started. Not perfect
as that is then inherited by other children, etc, but likely good enough for many
people's purposes.

(cherry picked from commit 9d08423)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@gpshead gpshead deleted the bug/multiprocessing-forkserver-preload-sys-path branch November 9, 2024 23:48
gpshead added a commit to gpshead/cpython that referenced this pull request Nov 9, 2024
…th inheritance. (pythonGH-126538)

pythongh-117378: Fix multiprocessing forkserver preload sys.path inheritance.

`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports.  This bug has been
there since the forkserver start method was introduced in Python 3.4.  It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.

Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method. **This will only be observable during forkserver preload imports**.

The code invoked before calling things in another process already correctly sets `sys.path`.
Which is likely why this went unnoticed for so long as a mere performance issue in
some configurations.

A workaround for the bug on impacted Pythons is to set PYTHONPATH in the
environment before multiprocessing's forkserver process was started. Not perfect
as that is then inherited by other children, etc, but likely good enough for many
people's purposes.

(cherry picked from commit 9d08423)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Nov 9, 2024

GH-126633 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Nov 9, 2024
gpshead added a commit that referenced this pull request Nov 10, 2024
…eritance. (GH-126538) (GH-126633)

gh-117378: Fix multiprocessing forkserver preload sys.path inheritance.

`sys.path` was not properly being sent from the parent process when launching
the multiprocessing forkserver process to preload imports.  This bug has been
there since the forkserver start method was introduced in Python 3.4.  It was
always _supposed_ to inherit `sys.path` the same way the spawn method does.

Observable behavior change: A `''` value in `sys.path` will now be replaced in
the forkserver's `sys.path` with an absolute pathname
`os.path.abspath(os.getcwd())` saved at the time that `multiprocessing` was
imported in the parent process as it already was when using the spawn start
method. **This will only be observable during forkserver preload imports**.

The code invoked before calling things in another process already correctly sets `sys.path`.
Which is likely why this went unnoticed for so long as a mere performance issue in
some configurations.

A workaround for the bug on impacted Pythons is to set PYTHONPATH in the
environment before multiprocessing's forkserver process was started. Not perfect
as that is then inherited by other children, etc, but likely good enough for many
people's purposes.

(cherry picked from commit 9d08423)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-multiprocessing type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants