
Support trio-run-in-process as process spawning backend #95

Merged · 33 commits · Jan 31, 2020

Conversation

@goodboy (Owner) commented Jan 20, 2020

Resolves #93.

For those interested in support for alternatives to multiprocessing, this PR brings in the first new choice: trio-run-in-process. I had to rejig the actor nursery machinery but I think it has overall resulted in more trionic, easier-to-understand code.

trip (trio-run-in-process) doesn't seem to suffer the ailments of spawning via mp: infinitely recursive, structured-concurrent actor pools seem to be quite reliable. Spawning is still definitely slower than the forkserver method, but I think this is a decent first step toward trio-izing process launching entirely. I'm interested now to see how hard it would be to roll our own forkserver method using just trio APIs.

The test suite is likely to fail at first since I haven't integrated trip into the test generation system quite yet. Failure on Windows is expected here out of the box as well.

I would appreciate any code reviews or criticisms from lurkers ;)
ping @ryanhiebert

Get a few more things working:
- fail reliably when remote module loading goes awry
- do a real hacky job of module loading using `sys.path` stuffsies
- we're still totally borked when trying to spin up and quickly cancel
a bunch of subactors...

It's a small move forward I guess.
This took a ton of tinkering and a rework of the actor nursery tear down
logic. The main changes include:

- each subprocess is now spawned from inside a trio task
from one of two containing nurseries created in the body of
`tractor.open_nursery()`: one for `run_in_actor()` processes and one for
`start_actor()` "daemons" (a sketch of this layout follows after this
commit message). This addresses the need for
`trio_run_in_process.open_in_process()` to have its internal nursery
closed from the same task that opened it. Using this same approach
for `multiprocessing` seems to work well. The nurseries are waited on in
order (`run_in_actor()` actors then daemon actors) during tear down,
which avoids the recursive re-entry of `ActorNursery.wait()` that had
to be handled previously.

- pull out all the nested functions / closures that were in
`ActorNursery.wait()` and move them into the `_spawn` module such that
the process shutdown logic takes place in each containing task's
code path. This vastly simplifies `.wait()`, which now just contains an
event trigger that initiates process waiting / result collection.
Likely `.wait()` should just be removed since it can no longer be used
to synchronously wait on the actor nursery.

- drop `ActorNursery.__aenter__()` / `.__aexit__()` and move this
"supervisor" tear down logic into the closing block of `open_nursery()`.
This not only makes the code more comprehensible, it also
makes our nursery implementation look more like the one in `trio`.

Resolves #93
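A minimal sketch of the two-nursery layout described above (structure and names are illustrative, not the actual `tractor` implementation):

```python
from contextlib import asynccontextmanager

import trio


@asynccontextmanager
async def open_nursery():
    # The daemon-actor nursery wraps the run-in-actor nursery so that, on
    # teardown, `run_in_actor()` processes are waited on first and daemon
    # actors second.
    async with trio.open_nursery() as da_nursery:        # `start_actor()` daemons
        async with trio.open_nursery() as ria_nursery:   # `run_in_actor()` procs
            # Each spawn request starts a task in one of these nurseries; that
            # task both enters and exits `open_in_process()`, satisfying trio's
            # rule that a nursery must be closed by the task that opened it.
            yield ria_nursery, da_nursery
        # ria_nursery scope ends: all run-in-actor tasks have completed.
    # da_nursery scope ends: daemon actors have been cancelled / joined.
```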
@ryanhiebert (Collaborator) left a comment

This is neat work! It looks like trip makes things conceptually simpler.

@@ -60,7 +60,7 @@ def test_rpc_errors(arb_addr, to_call, testdir):
if exposed_mods == ['tmp_mod']:
# create an importable module with a bad import
testdir.syspathinsert()
# module should cause raise a ModuleNotFoundError at import
# module should cause a raise of a ModuleNotFoundError at import
Collaborator

Warning, I'm pedantic. I'd have worded this as module should raise a ModuleNotFoundError at import. Shorter and grammatically correct.

# https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
start_method: str = 'forkserver',
# OR `trio-run-in-process` (the new default).
start_method: str = 'trip',
Collaborator

This comment is confusing, since it seems like it's saying that trio-run-in-process is the new default, but it's actually the shortened version trip that is the default. I'd pick one or the other as the "correct" spelling in the code.

Owner Author

Do you think the long or short name is better?

Collaborator

I like the short name better, but I think it would be best to be consistent with the package name. Maybe we'll eventually get a package with a shorter, more user-friendly name. Then I'll be very happy to see us using that name instead.

Owner Author

@ryanhiebert went with the explicit name as asked.

It seems that mixing the two backends in the test suite results in hangs
due to lingering forkservers and resource managers from
`multiprocessing`? Likely we'll need either two separate CI runs
or some way to be sure these lingering servers are killed in between
tests.
Set `trio-run-in-process` as the default on *nix systems and
`multiprocessing`'s spawn method on Windows. Enable overriding the
default choice using `tractor._spawn.try_set_start_method()`. This allows
for easy runs of the test suite using a user-chosen backend.
Add a `--spawn-backend` option which can be set to one of {'mp',
'trio_run_in_process'} and will run the test suite using the
`multiprocessing` or `trio-run-in-process` backend respectively.
Currently trying to run both in the same session can result in hangs
seemingly due to a lack of cleanup of forkservers / resource trackers
from `multiprocessing` which cause broken pipe errors on occasion (no
idea on the details).
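
A sketch of how such a pytest option might be wired up in `conftest.py` (the project's actual conftest may differ):

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    # register the --spawn-backend CLI flag for the whole test session
    parser.addoption(
        "--spawn-backend",
        action="store",
        default="trio_run_in_process",
        choices=["mp", "trio_run_in_process"],
        help="process spawning backend to use for the test session",
    )


@pytest.fixture(scope="session")
def spawn_backend(request) -> str:
    # tests (or other fixtures) read the chosen backend from this fixture
    return request.config.getoption("--spawn-backend")
```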

For `test_cancellation.py::test_nested_multierrors`, use less nesting
when mp is used since it breaks if we push it too hard with the
whole recursive subprocess spawning thing...
@goodboy force-pushed the try_trip branch 2 times, most recently from aa26e9a to 1d9d15a on January 27, 2020 21:02
@goodboy (Owner Author) commented Jan 28, 2020

If no one is going to review or complain about this I might just merge it and hope for the best.
There are only a few more things I want to get done before pushing an alpha version onto PyPI, so speak now or never 😉

@ryanhiebert (Collaborator)

Fine by me. As long as you're happy and it's still running, don't let your momentum be stalled... Sometimes you're the only one looking.

# passed through to actor main
bind_addr: Tuple[str, int],
parent_addr: Tuple[str, int],
) -> mp.Process:
begin_wait_phase: trio.Event,
Owner Author

Ahh dang do we need this?

Owner Author

Nope. Was left over from an old approach.

Thanks to @salotz for pointing out that the first example in the docs
was broken. Though it's somewhat embarrassing, this might also explain
the problem in #79 and certain issues in #59...

The solution here is to import the target RPC module using its
unique basename and absolute filepath in the sub-actor that requires it.
Special handling for `__main__` and `__mp_main__` is needed since the
spawned subprocess will have no knowledge of these parent-state-specific
module variables. Solution: map the module's name to the respective
module file basename in the child process, since the module variables
will of course have different values in children.
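
For illustration, importing a module by basename and absolute filepath in a child process can be done with just the standard library; this is a hypothetical helper, not the exact code from the PR:

```python
import importlib.util
import os
import sys


def load_module_from_path(filepath: str):
    # Derive the module name from the file's basename so the child can
    # import it regardless of how the parent referred to it.
    name = os.path.splitext(os.path.basename(filepath))[0]
    spec = importlib.util.spec_from_file_location(name, filepath)
    module = importlib.util.module_from_spec(spec)
    # register before exec so circular imports resolve to this module
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```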
@goodboy (Owner Author) commented Jan 29, 2020

Big props to @salotz for running the first example in the docs and finding it was totally borked...

2a43079 fixes this and should probably fix the `__mp_main__`-not-found errors others have seen on Windows. For now that commit introduces more manual module loading in sub-actors, something `multiprocessing` was doing for us behind the scenes before.

Instead of hackily trying to map modules manually from the filesystem,
let Python do all the work by simply copying what ``multiprocessing``
does to "fixup the __main__ module" in spawned subprocesses. The new
private module ``_mp_fixup_main.py`` is simply cherry-picked code from
``multiprocessing.spawn`` which does just that. We only need these
"fixups" when using a backend other than ``multiprocessing``; for
now just when using ``trio_run_in_process``.
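
For context, the core of that `multiprocessing.spawn` main-module fixup looks roughly like the following (paraphrased from CPython, so details may differ across versions; the PR's `_mp_fixup_main.py` cherry-picks the real code rather than this sketch):

```python
import os
import runpy
import sys
import types


def _fixup_main_from_path(main_path: str) -> None:
    # Re-run the parent's __main__ script under the alias __mp_main__ so its
    # top-level definitions are importable in the child, without re-triggering
    # the `if __name__ == '__main__':` guard.
    main_name = os.path.splitext(os.path.basename(main_path))[0]
    if main_name == 'ipython':
        # ipython's launch script historically had no __main__ guard
        return
    main_module = types.ModuleType("__mp_main__")
    contents = runpy.run_path(main_path, run_name="__mp_main__")
    main_module.__dict__.update(contents)
    sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module
```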
@goodboy (Owner Author) commented Jan 30, 2020

Just so it's noted, I actually instead just copied the `__main__` fixup stuff `multiprocessing` does into a new private module (see the commit message for details). This technique seems to be the right way to make subprocesses behave as expected in terms of module loading. We only use it when not using the mp backend (i.e. for now, when using trio_run_in_process).

@salotz (Collaborator) left a comment

It's going to take me some more time with the code base before I really understand the whole multiprocessing module load thing, so it was mostly pig latin for me. The one thing I did comment on is not really related to this PR, and is just a request to refactor some stuff which will have an impact on APIs.

# https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
start_method: str = 'forkserver',
# OR `trio_run_in_process` (the new default).
start_method: Optional[str] = None,
Collaborator

This should just default to the string 'trip'. I think you set it to `None` here because that is what `try_set_start_method()` returns for trip. I don't think that's how that function should behave, so once it gets refactored this should just be a string type with whatever the string repr for that mode is.

Owner Author

> understand the whole multiprocessing module load thing

If you're talking about the forkserver hack stuff, yeah don't worry about it. I'm starting to think we'll drop all that to write our own anyway.

Owner Author

@salotz so as you can see from the build triggered after fb93336, which failed, on Windows we need a different default. If 'trio_run_in_process' is passed to `try_set_start_method()` we're going to get an error on Windows (I think this is correct, since we want that function to explicitly fail when an incorrect method is chosen for the platform).

This is the small dilemma: on different systems we require different default start methods. We can either leave it as I had it, where `None` is the default and implies the correct default for the system is chosen internally, or we move that same logic into `tractor.run()` so that it's more clear that's what's happening.

Interested in your opinion here :)

Owner Author

Yeah, and the other reason to leave that default as `None` and not put the default-for-the-system logic here is that when we get alternatives for Windows we'll want to pass through the user-chosen value and validate it.
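
A sketch of the first option, with hypothetical helper names: keep `None` as the public default and resolve it to a per-platform backend internally:

```python
import platform
from typing import Optional


def _resolve_default(start_method: Optional[str]) -> str:
    # `None` means "pick the right default for this platform":
    # trio-run-in-process on *nix, multiprocessing's spawn on Windows.
    if start_method is not None:
        return start_method
    return 'spawn' if platform.system() == 'Windows' else 'trio_run_in_process'
```

Moving this logic into `tractor.run()` instead (the second option) would make the platform choice visible at the entry point, at the cost of leaking spawn-backend details out of `_spawn`.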

**kwargs,
) -> Any:
"""Run a trio-actor async function in process.

This is tractor's main entry and the start point for any async actor.
"""
_spawn.try_set_start_method(start_method)
if start_method is not None:
Collaborator

Relatedly, we wouldn't need this check here with the above refactoring.

# marked by the process spawning backend at startup
# will be None for the parent most process started manually
# by the user (currently called the "arbiter")
_spawn_method: Optional[str] = None
Collaborator

Ditto. Unless this has different semantics, i.e. it just records whichever method was chosen in `run()`, in which case the `Optional` would be correct.

Owner Author

@salotz for the parent-most actor (called the arbiter) this will actually stay `None` right now, since no spawn method created the top-most process.

.travis.yml Outdated
python: 3.7 # this works for Linux but is ignored on macOS or Windows
env: SPAWN_BACKEND="trio_run_in_process"

- name: "Pytron 3.8: multiprocessing"
Collaborator

typo



def try_set_start_method(name: str) -> mp.context.BaseContext:
def try_set_start_method(name: str) -> Optional[mp.context.BaseContext]:
Collaborator

Ditto with the above. `Optional` and `Union` return values are a code smell anyway.

Owner Author

The thing is, for trip there is no context to return, nor is this function even called in that case.
I'm not sure how else to specify the return type if not this?

Collaborator

Either make a wrapper function that does produce a context, or have this function simply decide on the start method given the platform and output a string/enum. I don't think it blocks this PR, btw.

Owner Author

KK we can discuss further in #101
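
For reference, a minimal sketch of the string/enum-returning refactor suggested above (hypothetical names; the actual design discussion moved to #101):

```python
import platform
from enum import Enum
from typing import Optional


class SpawnBackend(str, Enum):
    # hypothetical enum of supported backends
    TRIP = 'trio_run_in_process'
    FORKSERVER = 'forkserver'
    SPAWN = 'spawn'


def choose_start_method(requested: Optional[str] = None) -> SpawnBackend:
    # Validate the user's choice (or fall back to a platform default) and
    # return a plain enum value rather than an Optional[mp.context.BaseContext];
    # building an mp context would live in a separate wrapper for mp backends.
    if requested is not None:
        return SpawnBackend(requested)
    if platform.system() == 'Windows':
        return SpawnBackend.SPAWN
    return SpawnBackend.TRIP
```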

@goodboy mentioned this pull request Jan 30, 2020
mod = importlib.import_module(name)
mods[name] = _get_mod_abspath(mod)

self.rpc_module_paths = mods
self._mods: dict = {}
Owner Author

Should probably type annotate this.

# XXX append the allowed module to the python path which
# should allow for relative (at least downward) imports.
sys.path.append(os.path.dirname(filepath))
# XXX leaving this in for now incase we decide to swap
Owner Author

Maybe this isn't worth keeping in for now.
I can always dig up how to do it again if someone complains about the path mutating approach.

@@ -3,7 +3,14 @@

Mostly just wrapping around ``multiprocessing``.
Owner Author

comment is wrong now..



def try_set_start_method(name: str) -> mp.context.BaseContext:
def try_set_start_method(name: str) -> Optional[mp.context.BaseContext]:
"""Attempt to set the start method for ``multiprocess.Process`` spawning.
Owner Author

Docstring is also out of date: it doesn't include the trip details.

# unblock parent task
task_status.started(portal)

# wait for ActorNursery.wait() to be called
Owner Author

Hmm, is this comment still right?

# an `@asynccontextmanager` which has an internal nursery *and* the
# task that opens a nursery **must also close it**.
errors: Dict[Tuple[str, str], Exception] = {}
async with trio.open_nursery() as da_nursery:
@goodboy (Owner Author) Jan 30, 2020

should probably comment on the difference between da_ (daemon actor) and ria_ (run in actor) nurseries..

# last bit before first nursery block ends
log.debug(f"Waiting on all subactors to complete")
anursery._join_procs.set()
# ria_nursery scope
Owner Author

should probably mark the same for the da_nursery scope

The logic in the `ActorNursery` block is critical to cancellation
semantics and, in particular, to understanding how supervisor strategies
are invoked. Stick in a bunch of explanatory comments to clear up these
details and also prepare to introduce more supervisor strategies besides
the current one-cancels-all approach.
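
To make the one-cancels-all strategy concrete, here is a simplified sketch (helper and attribute names are illustrative, not the actual `ActorNursery` internals):

```python
import trio


async def wait_on_portal_result(
    portal,                  # a tractor portal; treated as opaque here
    errors: dict,
    nursery: trio.Nursery,
) -> None:
    # Wait for one `run_in_actor()` subactor's result; if it raises, record
    # the error and cancel the whole nursery so every sibling spawn-task
    # (and therefore every sibling subactor) gets torn down as well.
    try:
        await portal.result()
    except BaseException as err:
        errors[portal] = err
        nursery.cancel_scope.cancel()
```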