-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support trio-run-in-process as process spawning backend #95
Conversation
Get a few more things working: - fail reliably when remote module loading goes awry - do a real hacky job of module loading using `sys.path` stuffsies - we're still totally borked when trying to spin up and quickly cancel a bunch of subactors... It's a small move forward I guess.
This took a ton of tinkering and a rework of the actor nursery tear down logic. The main changes include: - each subprocess is now spawned from inside a trio task from one of two containing nurseries created in the body of `tractor.open_nursery()`: one for `run_in_actor()` processes and one for `start_actor()` "daemons". This is to address the need for `trio-run-in_process.open_in_process()` opening a nursery which must be closed from the same task that opened it. Using this same approach for `multiprocessing` seems to work well. The nurseries are waited in order (rip actors then daemon actors) during tear down which allows for avoiding the recursive re-entry of `ActorNursery.wait()` handled prior. - pull out all the nested functions / closures that were in `ActorNursery.wait()` and move into the `_spawn` module such that that process shutdown logic takes place in each containing task's code path. This allows for vastly simplifying `.wait()` to just contain an event trigger which initiates process waiting / result collection. Likely `.wait()` should just be removed since it can no longer be used to synchronously wait on the actor nursery. - drop `ActorNursery.__aenter__()` / `.__atexit__()` and move this "supervisor" tear down logic into the closing block of `open_nursery()`. This not only cleans makes the code more comprehensible it also makes our nursery implementation look more like the one in `trio`. Resolves #93
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat work! It looks like trip make things conceptually simpler.
tests/test_rpc.py
Outdated
@@ -60,7 +60,7 @@ def test_rpc_errors(arb_addr, to_call, testdir): | |||
if exposed_mods == ['tmp_mod']: | |||
# create an importable module with a bad import | |||
testdir.syspathinsert() | |||
# module should cause raise a ModuleNotFoundError at import | |||
# module should cause a raise of a ModuleNotFoundError at import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning, I'm pedantic. I'd have worded this as module should raise a ModuleNotFoundError at import
. Shorter and grammatically correct.
tractor/__init__.py
Outdated
# https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods | ||
start_method: str = 'forkserver', | ||
# OR `trio-run-in-process` (the new default). | ||
start_method: str = 'trip', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing, since it seems like its saying that trio-run-in-process
is the new default, but it's actually the shortened version trip
that is the default. I'd pick one or the other as the "correct" spelling in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the long or short name is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the short name better, but I think it would be best to be consistent with the package name. Maybe we'll eventually get a package with a shorter, more user friendly name. Then I'll be very happy to see us using that name instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanhiebert went with the explicit name as asked.
It seems that mixing the two backends in the test suite results in hangs due to lingering forkservers and resource managers from `multiprocessing`? Likely we'll need either 2 separate CI runs to work or someway to be sure that these lingering servers are killed in between tests.
Set `trio-run-in-process` as the default on *nix systems and `multiprocessing`'s spawn method on Windows. Enable overriding the default choice using `tractor._spawn.try_set_start_method()`. Allows for easy runs of the test suite using a user chosen backend.
Add a `--spawn-backend` option which can be set to one of {'mp', 'trio_run_in_process'} which will either run the test suite using the `multiprocessing` or `trio-run-in-process` backend respectively. Currently trying to run both in the same session can result in hangs seemingly due to a lack of cleanup of forkservers / resource trackers from `multiprocessing` which cause broken pipe errors on occasion (no idea on the details). For `test_cancellation.py::test_nested_multierrors`, use less nesting when mp is used since it breaks if we push it too hard with the whole recursive subprocess spawning thing...
aa26e9a
to
1d9d15a
Compare
If no one is going to review or complain about this I might just merge it and hope for the best. |
Fine by me. As long as you're happy and it's still running, don't let your momentum be stalled... Sometimes you're the only one looking. |
tractor/_spawn.py
Outdated
# passed through to actor main | ||
bind_addr: Tuple[str, int], | ||
parent_addr: Tuple[str, int], | ||
) -> mp.Process: | ||
begin_wait_phase: trio.Event, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh dang do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Was left over from an old approach.
Thanks to @salotz for pointing out that the first example in the docs was broken. Though it's somewhat embarrassing this might also explain the problem in #79 and certain issues in #59... The solution here is to import the target RPC module using the its unique basename and absolute filepath in the sub-actor that requires it. Special handling for `__main__` and `__mp_main__` is needed since the spawned subprocess will have no knowledge about these parent- -state-specific module variables. Solution: map the modules name to the respective module file basename in the child process since the module variables will of course have different values in children.
Big props to @salotz for running the first example in the docs and finding it was totally borked... 2a43079 fixes this and should probably fix issues others have seen on Windows to do with |
Instead of hackery trying to map modules manually from the filesystem let Python do all the work by simply copying what ``multiprocessing`` does to "fixup the __main__ module" in spawned subprocesses. The new private module ``_mp_fixup_main.py`` is simply cherry picked code from ``multiprocessing.spawn`` which does just that. We only need these "fixups" when using a backend other then ``multiprocessing``; for now just when using ``trio_run_in_process``.
Just so it's noted, I actually instead just copied the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its going to take me some more time with the code base before I really understand the whole multiprocessing module load thing, so it was mostly pig latin for me. The one thing I did comment on is not really related to this PR, and is just a request to refactor some stuff which will have an impact on APIs.
# https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods | ||
start_method: str = 'forkserver', | ||
# OR `trio_run_in_process` (the new default). | ||
start_method: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just default to the string for 'trip'. I think you set it to None here because that is what the try_set_start_method
returns for trip. I don't think that's how that function should behave, so once that gets refactored this should just be a string type with whatever the string repr for that mode is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understand the whole multiprocessing module load thing
If you're talking about the forkserver hack stuff, yeah don't worry about it. I'm starting to think we'll drop all that to write our own anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salotz so as you can see from the build triggered after fb93336 which failed, on windows we need to have a different default. If 'trio_run_in_process'
is passed to try_set_start_method()
we're going to get an error on windows (I think this is correct since we want that function to explicitly fail when an incorrect method is chosen on the platform).
This is the small dilemma, on different systems we require different default start methods. We can either leave it as I had it -> None
is default and implies the correct actual default for the system is chosen internally or we move that same logic into tractor.run()
so that's it's more clear that's what's happening?
Interested in your opinion here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and the other reason to leave that default as None
and not have the default-for-the-system logic here would be that when we get alternatives for windows then we'll want to pass through the user chosen value and validate it.
**kwargs, | ||
) -> Any: | ||
"""Run a trio-actor async function in process. | ||
|
||
This is tractor's main entry and the start point for any async actor. | ||
""" | ||
_spawn.try_set_start_method(start_method) | ||
if start_method is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relatedly wouldn't need this check here with above refactoring.
# marked by the process spawning backend at startup | ||
# will be None for the parent most process started manually | ||
# by the user (currently called the "arbiter") | ||
_spawn_method: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. Unless this has different semantics which is to just choose whichever was chosen in run
, in which case the Optional would be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salotz for the parent-most (called arbiter) right now this will actually stay as None
since there was no spawn method that created the top most process.
.travis.yml
Outdated
python: 3.7 # this works for Linux but is ignored on macOS or Windows | ||
env: SPAWN_BACKEND="trio_run_in_process" | ||
|
||
- name: "Pytron 3.8: multiprocessing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
|
||
|
||
def try_set_start_method(name: str) -> mp.context.BaseContext: | ||
def try_set_start_method(name: str) -> Optional[mp.context.BaseContext]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto with above. Optional and Union return values is a code smell anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is for trip
there is no context possible to return, nor is this function called in that case.
I'm not sure how else to specify the output if it's not this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either make a wrapper function that does produce a context or this function simply decides on the start method given the platform yadayada and then outputs a string/enum. I don't think it blocks this PR btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KK we can discuss further in #101
tractor/_actor.py
Outdated
mod = importlib.import_module(name) | ||
mods[name] = _get_mod_abspath(mod) | ||
|
||
self.rpc_module_paths = mods | ||
self._mods: dict = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably type annotate this.
tractor/_actor.py
Outdated
# XXX append the allowed module to the python path which | ||
# should allow for relative (at least downward) imports. | ||
sys.path.append(os.path.dirname(filepath)) | ||
# XXX leaving this in for now incase we decide to swap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this isn't worth keeping in for now.
I can always dig up how to do it again if someone complains about the path mutating approach.
tractor/_spawn.py
Outdated
@@ -3,7 +3,14 @@ | |||
|
|||
Mostly just wrapping around ``multiprocessing``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is wrong now..
tractor/_spawn.py
Outdated
|
||
|
||
def try_set_start_method(name: str) -> mp.context.BaseContext: | ||
def try_set_start_method(name: str) -> Optional[mp.context.BaseContext]: | ||
"""Attempt to set the start method for ``multiprocess.Process`` spawning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc is also out of date - doesn't include trip deats.
tractor/_spawn.py
Outdated
# unblock parent task | ||
task_status.started(portal) | ||
|
||
# wait for ActorNursery.wait() to be called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is this comment right still?
# an `@asynccontextmanager` which has an internal nursery *and* the | ||
# task that opens a nursery **must also close it**. | ||
errors: Dict[Tuple[str, str], Exception] = {} | ||
async with trio.open_nursery() as da_nursery: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably comment on the difference between da_ (daemon actor) and ria_ (run in actor) nurseries..
tractor/_trionics.py
Outdated
# last bit before first nursery block ends | ||
log.debug(f"Waiting on all subactors to complete") | ||
anursery._join_procs.set() | ||
# ria_nursery scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably mark the same for the da_nursery
scope
The logic in the `ActorNursery` block is critical to cancellation semantics and in particular, understanding how supervisor strategies are invoked. Stick in a bunch of explanatory comments to clear up these details and also prepare to introduce more supervisor strats besides the current one-cancels-all approach.
Resolves #93.
For those interested in support for alternatives to
multiprocessing
this PR bring in the first new choice:trio-run-in-process
. I had to rejig the actor nursery machinery but I think it's overall resulted in more trionic, easier to understand code.trip (trio-run-in-process) seems to not suffer the ailments of the spawning in
mp
namely infinitely recursive, structured concurrent, actor pools seem to be quite reliable. The spawning is definitely slower then the forkserver method still but I think this is a decent first step toward trio-izing process launching entirely. I'm interested now to see how hard it would be to roll our own forkserver method using justtrio
APIs.The test suite is likely to fail at first since I haven't integrated
trip
into the test generation system quite yet. Failure on Windows is expected here out of the box as well.I would appreciate any code reviews or criticisms from lurkers ;)
ping @ryanhiebert