Using a custom executor using WorkerPlugin + ProcessPoolExecutor causes _pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/site-packages/distributed/worker_state_machine.py:3357: FutureWarning: The `Worker.nthreads` attribute has been moved to `Worker.state.nthreads`
warnings.warn(
2022-07-21 08:37:14,053 - distributed.worker - ERROR - Exception during execution of task my_process-0514e9dc6d6631bf45d18e50c3312d9a.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/site-packages/distributed/worker.py", line 2208, in execute
result = await self.loop.run_in_executor(
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
2022-07-21 08:37:14,062 - distributed.worker - ERROR - Exception during execution of task my_process-04f1de90691642d43d79802ef4ff84d0.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/site-packages/distributed/worker.py", line 2208, in execute
result = await self.loop.run_in_executor(
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
2022-07-21 08:37:14,063 - distributed.worker - ERROR - Exception during execution of task my_process-ec0c8bbbd0b9a7d11117f2690ad9a733.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/ubuntu/mambaforge/envs/pygnssr2-dev/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
"""
...
What is going on here is that the concurrent.futures ProcessPoolExecutor must have access to the __main__ in which the function is defined in order to work correctly. That's because pickle serializes functions by reference instead of by value, so if that __main__ isn't available in the subprocess, it doesn't know what function to run. In this case, the worker's __main__ is not the same as the one in which the my_process function is defined. So the error message is actually a helpful one!
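To make the by-reference behaviour concrete, here is a minimal sketch that uses only the standard library. It does not involve dask or a process pool; the `client_main` module is a hypothetical stand-in for the `__main__` where `my_process` was defined, and the copied function stands in for the new function object a worker holds after the original has been shipped to it by value. Pickling the registered function works, while pickling the look-alike copy should fail with the same kind of `PicklingError` as in the log above.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the client's __main__ module.
client_main = types.ModuleType("client_main")
sys.modules["client_main"] = client_main


def my_process(x):
    return x + 1


# Register the function as if it had been defined in client_main.
my_process.__module__ = "client_main"
my_process.__qualname__ = "my_process"
client_main.my_process = my_process

# By reference: the payload stores the module and function names, not the code.
payload = pickle.dumps(my_process)
assert b"client_main" in payload and b"my_process" in payload

# Simulate the worker's copy: a distinct function object with the same
# module and name (an assumption made for illustration).
received = types.FunctionType(
    my_process.__code__, my_process.__globals__, "my_process"
)
received.__module__ = "client_main"
received.__qualname__ = "my_process"

try:
    pickle.dumps(received)
    error = None
except pickle.PicklingError as exc:
    # pickle looked up client_main.my_process and found a different object.
    error = exc

print(error)
```

The identity check that fails here is the one pickle performs before writing a reference: it resolves the module and name, and refuses to continue if the result is not the very object it was asked to pickle.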
The way distributed projects typically get around this restriction in pickling is to use cloudpickle, which handles pickling things by value; that is particularly necessary for functions defined interactively or in __main__. This is what the dask serialization protocol does, for instance, when sending interactively defined functions to workers. But the error we are seeing here is happening within a single worker and is not going through the dask serialization protocol, so we run into the problem.
Fortunately, this is a problem that is nicely solved by the loky project. It provides its own implementation of a ProcessPoolExecutor which handles __main__ functions. If you take another look at the linked example above, it actually uses the loky ProcessPoolExecutor (presumably to get around exactly this issue).
Ideally, we would be able to identify this situation and handle it appropriately in dask, or at least provide a better error message. There are a number of reasons why it would be nice to make it easier for users to use process pools (cf. #6325). For now, I would recommend the following workarounds to users:
If you are using concurrent.futures.ProcessPoolExecutor, avoid referencing interactively defined functions, or those in __main__.
If you do want to use such functions, use loky.ProcessPoolExecutor.
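For the first workaround, one way to check a function before handing it to concurrent.futures.ProcessPoolExecutor is sketched below. The helper `picklable_by_reference` is hypothetical (it is not part of dask, distributed, or the standard library), and it only tests whether the standard pickle module can serialize the function by name; it says nothing about the function's arguments or return values.

```python
import os.path
import pickle


def picklable_by_reference(func):
    """Return True if the standard pickle module can serialize func by name.

    Hypothetical helper for illustration only.
    """
    if getattr(func, "__module__", None) == "__main__":
        # Functions in __main__ only resolve in a process that shares
        # that same __main__, which a dask worker does not.
        return False
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError):
        # Lambdas, nested functions, and mismatched objects end up here.
        return False
    return True


print(picklable_by_reference(os.path.basename))  # True
print(picklable_by_reference(lambda x: x + 1))   # False
```

A function that fails this check is a candidate for the second workaround, or for being moved into a module that is importable on the workers.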
What happened:
Using a custom executor using WorkerPlugin + ProcessPoolExecutor causes _pickle.PicklingError: Can't pickle <function my_process at 0x7f83f862ab00>: it's not the same object as __main__.my_process
(This issue was originally reported on Discourse.)
What you expected to happen:
I'd have expected this to work (see additional notes below). If this is erroring intentionally, it would be nice to raise a more informative message.
Minimal Complete Verifiable Example:
Error traceback:
Anything else we need to know?:
This code is coming from this video by Matt (+ this notebook) created in July 2021, where it seems to be working.
Environment:
@mrocklin Do you have thoughts on why this isn't working anymore?
cc @ncclementi