Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX support for frozen executable on all platforms #375

Open
wants to merge 52 commits into
base: master
Choose a base branch
from

Conversation

tomMoral
Copy link
Collaborator

@tomMoral tomMoral commented Jan 4, 2023

This is an attempt to fix loky for frozen support for all plateforms. (cc #236 , joblib/joblib#1002). (I got nerd snipped 😅 )

This basically adds a freeze_support function that allows starting the worker with the right executable if it is called right after if __name__ == "__main__":.
This PR also cleanups and refactor the command line generation, to avoid some duplicated code.

I used the following scripts to test and reproduce the error in pure loky:

import argparse
from loky import get_reusable_executor, freeze_support


def main():
    arguments_parser = argparse.ArgumentParser()
    arguments_parser.add_argument("-n", default=None, type=int)
    flags, _ = arguments_parser.parse_known_args()

    # safe guard otherwise new processes are spawned forever
    if flags.n is None:
        raise RuntimeError('main() was called again.')

    exc = get_reusable_executor(2)
    f = exc.submit(print, 1)
    f.result()
    print("Correct end!!!!")


if __name__ == '__main__':
    freeze_support()
    main()
pyinstaller --noconfirm --log-level=WARN --onedir --nowindow test.py
dist/test/test -n 1

this gives me the output:

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    main()
  File "test.py", line 12, in main
    raise RuntimeError('main() was called again.')
RuntimeError: main() was called again.
[551219] Failed to execute script 'test' due to unhandled exception!
1
Correct end!!!!

Note that the error is due to the fact that actually, the resource tracker from cpython is also broken. I am not totally sure why it is actually started but this is not a problem for running the code.
If the code from this PR actually fixes the issue, it could be ported to python to make the frozen support working with all platforms/start methods and with the resource tracker.

As a disclaimer, I did not test this on windows and I would be very glad to have some feedbacks from the original reporters. Maybe from @samuelstjean or @gdoras if you are still around. Sorry for the slow response....

TODO before merge:

  • add a test for frozen executable (with pyinstaller?).
  • Add some comments.

loky/backend/spawn.py Show resolved Hide resolved
loky/backend/spawn.py Outdated Show resolved Hide resolved
@tomMoral tomMoral linked an issue Jan 4, 2023 that may be closed by this pull request
@israeldi
Copy link

israeldi commented Jan 18, 2023

This sounds awesome! Hoping that freeze_support can finally be added to loky and work with pyinstaller. Thank you so much @tomMoral ☺️

@samuelstjean
Copy link

Are we still needed here? I can probably try to get it running, although I scrapped my original windows vm since then, but we now have github actions instead, so probably something to do there. Can't remember if it was also a problem on other platforms too, but that seems likely.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 21, 2023

I think we need an integration test that runs a minimal loky-based program through pyinstaller and check that it runs correctly.

And we also need a docstring for this: https://github.com/joblib/loky/pull/375/files#r1061777509.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 21, 2023

I tried this PR with pyinstaller on the following test program named frozen_loky.py):

import loky

if __name__ == "__main__":
    loky.freeze_support()
    e = loky.get_reusable_executor()
    print(sum(e.map(lambda x: x ** 2, range(1000))))
$ pyinstaller --version
5.8.0
$ pyinstaller frozen_loky.py
$ ./sdist/frozen_loky/frozen_loky
332833500
$ 332833500
332833500
332833500
332833500
332833500
332833500
[...]

and it creates many subprocesses that continuously output the result on stdout even after the parent process has completed (I had to pkill frozen_loky to terminate them all).

I tried both at 95ae81c and after the merge of the master branch at 00c8ded and got the same behavior in both cases so I think the problem does not comes.

This is running:

$ python -V
Python 3.10.2

on Linux. Note: I cannot get pyinstaller to work on my local macOS machine because of the problem with the codesign command.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 21, 2023

Hum, I re-read the original description of the issue and maybe parsing the command line args to raise the RuntimeError is actually needed? This seems not very user friendly.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 21, 2023

Hum, I re-read the original description of the issue and maybe parsing the command line args to raise the RuntimeError is actually needed? This seems not very user friendly.

I tried just that and could not get it to work either. Or maybe it's working as expected but the error messages on stderr are not expected.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 21, 2023

Thanks @tomMoral! I confirm that the latest commit make my minimal snippet from #375 (comment) work fine!

loky/backend/spawn.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few clarification comments

loky/backend/spawn.py Outdated Show resolved Hide resolved
loky/backend/spawn.py Show resolved Hide resolved
loky/backend/spawn.py Show resolved Hide resolved
loky/backend/spawn.py Show resolved Hide resolved
loky/backend/spawn.py Show resolved Hide resolved
Copy link
Collaborator Author

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linting....

loky/backend/spawn.py Outdated Show resolved Hide resolved
loky/backend/spawn.py Outdated Show resolved Hide resolved
loky/backend/spawn.py Outdated Show resolved Hide resolved
tox.ini Outdated Show resolved Hide resolved
tests/test_loky_module.py Outdated Show resolved Hide resolved
@ogrisel
Copy link
Collaborator

ogrisel commented Feb 22, 2023

Summary of the failures of my new test:

  • under linux, I get a segfault when executing the frozen_loky executable generated by pyinstaller. I can reproduce that when I execute the test in tox locally but the same test works when running the test outside of tox (with pytest directly in my conda env with all the test dependencies manually installed). I get extra warnings when running the pyinstaller command (using the -s flag via tox -e py310 -- -s -k test_freeze_support_with_pyinstaller) that seem to point that pyinstaller has trouble to generate the froze executable when executed from with the tox managed venv nested under the CI conda env. Need to investigate to know the details,
  • macos with recent Python has a similar segfault when running the frozen executable;
  • however I cannot reproduce locally, since on my macos laptop, calling the pyinstaller command with a problem with the codesign command, probably a bug of pyinstaller,
  • this problem does not happen with older version of Python on macos for some unknown reason...
  • on both Windows builds pyinstaller crashes with FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\VssAdministrator\\Application Data\\pyinstaller'
  • on PyPy, pyinstaller fails with:
   File "/home/vsts/work/1/s/.tox/pypy3/site-packages/PyInstaller/depend/bytecode.py", line 36, in _instruction_to_regex
    return re.escape(bytes([opmap[x]]))
KeyError: 'LOAD_METHOD'

I will try to see if I can link some of those problems to known issues in the pyinstaller issue tracker.

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 27, 2023

Some of the latest changes cause the resource tracker start-up to crash on all windows jobs:

Traceback (most recent call last):
  File "X:\loky\loky\backend\spawn.py", line 283, in main
    exitcode = _main(fd, parent_sentinel=parent_sentinel)
  File "X:\loky\loky\backend\spawn.py", line 301, in _main
    prepare(preparation_data)
  File "X:\loky\loky\backend\spawn.py", line 195, in prepare
    _resource_tracker._fd = msvcrt.open_osfhandle(handle, 0)
OSError: [Errno 9] Bad file descriptor

@tomMoral
Copy link
Collaborator Author

Yes this is expected. The strategy put in place by multiprocessing is to not use inheritance anymore but to use duplication.
I changed it for the main process but I also need to adapt the ressource_tracker strategy now that I understood the inheritance. hopefully I will get the time to investigate this by tomorrow.

@tomMoral
Copy link
Collaborator Author

tomMoral commented Mar 1, 2023

After (too much) investigation, I think this is actually related to pyinstaller and the way the frozen programs are created that mess around with the permission of the worker process (see pyinstaller/pyinstaller#7183 and https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/hooks/rthooks/pyi_rth_multiprocessing.py)

Not sure how to fix this yet...

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 1, 2023

It's weird, I tried to use concurrent.futures.ProcessPoolExecutor / multiprocessing.freeze_support in the pyinstaller test of this PR and it passes. There must be something specific to the spawn variant implemented in loky that causes this problem.

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 1, 2023

For information, I tried to see if the refactoring in this PR would potentially fix the SIGSEGV(-11) crash observed on macOS in #388 but unfortunately this does not solve it.

@tomMoral
Copy link
Collaborator Author

tomMoral commented Mar 1, 2023

It's weird, I tried to use concurrent.futures.ProcessPoolExecutor / multiprocessing.freeze_support in the pyinstaller test of this PR and it passes. There must be something specific to the spawn variant implemented in loky that causes this problem.

multiprocessing is monkeypatched but pyinstaller but loky is not so it is not weird that it fails on loky.
The complex thing is, I don't get what part of the monkey patching I need ^^. I tried a bit but I had to move on.
If I confirm this; then, we can xfail this test and add a hook for loky in pyinstaller to fix this and move on. But we have to make sure this is the case first...

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 1, 2023

+1 for opening an issue upstream on the pyinstaller issue tracker and xfailing the windows test case in loky in the mean time.

Please update the changelog accordingly to document that freeze support for windows is still a work in progress.

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 2, 2023

@tomMoral CI trigger

I believe you meant to merge or rebase with current master no?

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 2, 2023

Actually no need, because the latest CI config of master is used in the PR even if there are not up to date with master.

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 2, 2023

Unfortunately, this PR does not fix the TestsProcessPoolSpawnExecutor.test_max_depth[True] failure observed for Windows and Python 3.11.

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 2, 2023

There is another, maybe related failure in test_kill_process_tree. I opened #392.

@ogrisel
Copy link
Collaborator

ogrisel commented Mar 2, 2023

Actually, the test_kill_process_tree failure in this PR is different from the one on master (as reported in #392). Here is the traceback:

================================== FAILURES ===================================
________________________ test_kill_process_tree[True] _________________________

use_psutil = True

    @pytest.mark.parametrize("use_psutil", [True, False])
    def test_kill_process_tree(use_psutil):
        psutil = pytest.importorskip("psutil")
        event = ctx_loky.Event()
        p = ctx_loky.Process(target=_run_nested_delayed, args=(4, 1000, event))
        p.start()
    
        # Wait for all the processes to be launched
        if not event.wait(30):
            kill_process_tree(p, use_psutil=use_psutil)
>           raise RuntimeError(
                "test_kill_process_tree was not able to launch "
                "all nested processes."
            )
E           RuntimeError: test_kill_process_tree was not able to launch all nested processes.

event      = <Event at 0x20e3e77e9d0 unset>
p          = <LokyProcess name='LokyProcess-20' pid=1792 parent=7720 stopped exitcode=15>
psutil     = <module 'psutil' from 'D:\\a\\1\\s\\.tox\\py311\\Lib\\site-packages\\psutil\\__init__.py'>
use_psutil = True

tests\test_loky_backend.py:721: RuntimeError
---------------------------- Captured stdout call -----------------------------


--------------------------------------------------------------------------------
None failed with traceback: 
--------------------------------------------------------------------------------

Traceback (most recent call last):
  File "D:\a\1\s\loky\backend\spawn.py", line 298, in main
    exitcode = _main(fd, parent_sentinel=parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\1\s\loky\backend\spawn.py", line 317, in _main
    self = pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Frozen executable - loky (spawn method) does not work
4 participants