Add subprocess support #791
Conversation
Adds `trio.subprocess.Process` and `trio.subprocess.run`, async wrappers for the stdlib `subprocess` module. No `communicate()` yet due to discussion on python-trio#783. Note that this did not wind up using the code in `linux_waitpid`, instead running `waitid()` (which supports not consuming the status of the process) in a thread, in order to interoperate better with `subprocess.Popen` code.

Still TODO:

- [ ] Windows support (need pipe implementations and testing)
- [ ] documentation writeup, newsfragment
- [ ] consider whether this module layout is the one we want (other option would be to move `_subprocess/unix_pipes.py` to just `_unix_pipes.py`, `_subprocess/highlevel.py` to `_subprocess.py`, and get rid of `_subprocess/linux_waitpid.py`)
- [ ] expose other subprocess functions such as `call`, `check_call`, `output` (simple wrappers around `run()`)
- [ ] expose a basic `communicate()`?
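The appeal of the `waitid()`-in-a-thread approach can be sketched with the stdlib alone. This is a minimal illustration (not the PR's actual code, and Unix-only): with `WNOWAIT`, `waitid()` blocks until the child exits but leaves the exit status unconsumed, so `subprocess.Popen`'s own bookkeeping still works afterwards.

```python
import os
import subprocess
import sys

# Start a child that exits immediately.
p = subprocess.Popen([sys.executable, "-c", "pass"])

# waitid() with WEXITED | WNOWAIT blocks until the child exits but does
# NOT reap it, so the exit status is still available for Popen to consume.
os.waitid(os.P_PID, p.pid, os.WEXITED | os.WNOWAIT)

# Popen.wait() can now reap the child normally, as if we'd never peeked.
print("exit status:", p.wait())
```

This is what lets the blocking wait happen in a worker thread without stealing the status out from under `subprocess.Popen`.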
Codecov Report

```
@@            Coverage Diff             @@
##           master     #791      +/-   ##
==========================================
+ Coverage   99.01%   99.23%   +0.22%
==========================================
  Files          94      101       +7
  Lines       11631    12182     +551
  Branches      832      882      +50
==========================================
+ Hits        11516    12089     +573
+ Misses         94       72      -22
  Partials       21       21
```

Continue to review full report at Codecov.
First: thank you, this is amazing work. I'm super excited! This isn't a complete review (I'm still chewing on the high-level API design issues), but rather than hold things up I'll write down some of the simpler comments :-)
- `_subprocess/highlevel.py` becomes `_subprocess.py`
- `_subprocess/unix_pipes.py` becomes `_unix_pipes.py`
- `_subprocess/linux_waitpid.py` removed (obsoleted by use of `waitid`)
5-6. Makes sense, will make these changes.
Never mind.
Did a quick skim and made one comment, but there are several test failures and strange coverage results currently, so I figure I'll let you clear that up before I do a close read :-).
I'm wary about adding `wait_process_exiting` as a generic API... the native wait APIs are pretty quirky. On Windows users should be encouraged to use a process handle, not a PID, to avoid races. Windows and kqueue platforms let you wait on arbitrary processes, but `waitid` only works on children. Unix can wait for stop/continue, not just exiting, but Windows doesn't have anything like that. With any luck, Linux will eventually make it possible to get an fd referring to a child process and then use that to get death notifications, instead of using a PID. (So then it would work like Windows, basically. BTW, FreeBSD also has process descriptors, but only for child processes, and macOS doesn't support it at all...)

If all we expose is `Process.wait`, then it's easy to hide all this complexity from users. If we expose `wait_process_exiting`, we have to figure out which bits to abstract away and how...
My thought here was that if our model of what's possible is wrong, and there is some rare circumstance where we can get a kevent NOTE_EXIT for a process that's nowhere near exiting, it's better to use a lot of CPU than to hang. Happy to go either way on this one.
`NOTE_EXIT` actually includes the exit status in the notification, so it seems unlikely that the kernel could send a spurious note for a process that hasn't actually exited. Looking at macOS kern_exit.c and FreeBSD kern_exit.c, they both seem to do some bookkeeping immediately after sending `NOTE_EXIT`, so I guess it's just a little race waiting for this to happen.
I guess under really bizarre circumstances we could have:

- child process exits
- trio gets `NOTE_EXIT`
- some other Evil Code that's running in our process calls `wait` to reap the child (since only our process can reap the child)
- Evil Code in our process spawns another child, which happens to be assigned the same PID (since only a process we spawned is a valid argument to `waitpid`)
- trio calls `waitpid`, and it hangs waiting for the new child
This seems pretty unlikely to me. And I don't think there's really any correct way to handle this situation anyway... the Unix subprocess API has a deeply baked-in assumption that the parent will be careful to call `waitpid` exactly once on each child, no more, no less. The only real solution is for the user to remove the Evil Code.
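That "exactly once" assumption is easy to demonstrate with the stdlib (a small sketch, Unix-only): once any code in the process reaps a child via `os.waitpid`, the status is consumed and a second wait fails, which is what makes the Evil Code scenario unrecoverable.

```python
import os
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "pass"])

# The first waitpid() consumes the exit status; the kernel forgets the child.
pid, status = os.waitpid(p.pid, 0)
p.returncode = os.WEXITSTATUS(status)  # tell Popen we reaped it ourselves

# A second waitpid() on the same child fails with ECHILD: there is nothing
# left to wait for (or worse, the PID may now belong to a brand-new process).
try:
    os.waitpid(p.pid, 0)
except ChildProcessError:
    print("status already consumed")
```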
I believe I've addressed everything that was outstanding except for docs and a few remaining corners of coverage. I haven't tested the Windows support with anything more strenuous than the unit tests, but it seems to be holding up well enough so far...
We went with `capture_output` over `capture_stdout` and `capture_stderr` because we wanted the calling application to have complete control over the output display, and didn't want to add *3* new options when one would do (if you only want to capture stdout, just pass `stdout=PIPE`, the same as you always have). However, if you wanted to tweak the trio variant of the API on the basis of "Errors should never pass silently", then my recommendation would be to:

1. Make `check=True` the default, so `check=False` is explicitly opt-in (doing that in the subprocess module would have been overly inconsistent with the other APIs in that module, but trio doesn't have that problem)
2. Change `CompletedProcess.check_returncode()` to print a captured stderr to sys.stderr in the current process by default, with an explicit `print_captured_stderr=False` opt-out on that method

Edit to clarify: the `check=True` behaviour would be the default behaviour of `check_returncode(print_captured_stderr=True)`.
I don't remember any particular design reason for making the default check=False. I can see scenarios where I'd want that - e.g. run this program and then exit with its error code, without printing a python traceback in addition to the subprocess' error message. But I agree that that's not the most common case.

I stuck with the awkward capture API to keep run() more compatible with existing subprocess functions. I do still occasionally like the flexibility of connecting std streams directly to a file descriptor, so I'd argue for keeping those parameters even if there's a more convenient API for output capture.
…On Fri, 21 Dec 2018, 11:39 a.m. Nick Coghlan ***@***.*** wrote:

> We went with capture_output over capture_stdout and capture_stderr because we wanted the calling application to have complete control over the output display, and didn't want to add *3* new options when one would do (if you only want to capture stdout, just pass stdout=PIPE, the same as you always have).
>
> However, if you wanted to tweak the trio variant of the API on the basis of "Errors should never pass silently", then my recommendation would be to:
>
> 1. Make check=True the default, so check=False is explicitly opt-in (doing that in the subprocess module would have been overly inconsistent with the other APIs in that module, but trio doesn't have that problem)
> 2. Change CompletedProcess.check_returncode() to print a captured stderr to sys.stderr in the current process by default, with an explicit print_captured_stderr=False opt-out on that method
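The trade-off being debated here is visible in the stdlib `subprocess.run` that trio's `run` mirrors; a quick illustration of both defaults:

```python
import subprocess
import sys

fail = [sys.executable, "-c", "raise SystemExit(3)"]

# check=False (the stdlib default): a nonzero exit status is reported,
# not raised, which suits "exit with the child's code" use cases...
result = subprocess.run(fail)
print("returncode:", result.returncode)

# ...while check=True turns the same failure into an exception, so it
# can't pass silently.
try:
    subprocess.run(fail, check=True)
except subprocess.CalledProcessError as exc:
    print("raised with returncode:", exc.returncode)
```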
I removed As far as I can tell,
The change I just uploaded doesn't include the
Yeah, I read through the original thread and didn't see any discussion about this either way. And of course you couldn't have done a corpus survey of how people were using the API you hadn't invented yet :-).
Right, I think it makes total sense to pass an explicit file descriptor, or DEVNULL, or stderr=STDOUT.
Hmmmm interesting idea. Need to ponder that some more.
Agreed on PIPE. I can see the attraction of reusing the existing keyword arguments, but it's definitely exposing an implementation detail.

Maybe a feasible improvement would be to simply rename PIPE to CAPTURE. Still not the most beautiful API, but it would align better with how we talk about it.
…On Fri, 21 Dec 2018, 8:28 p.m. Nathaniel J. Smith ***@***.*** wrote:

> @takluyver <https://github.com/takluyver>
>
> > I don't remember any particular design reason for making the default check=False.
>
> Yeah, I read through the original thread and didn't see any discussion about this either way. And of course you couldn't have done a corpus survey of how people were using the API you hadn't invented yet :-).
>
> > I do still occasionally like the flexibility of connecting std streams directly to a file descriptor
>
> Right, I think it makes total sense to pass an explicit file descriptor, or DEVNULL, or stderr=STDOUT. To me those fit in well with the general philosophy of "here's **kwargs, for any long-tail quirky requirements". It's specifically the PIPE case that I find weird, since it's exposing an implementation detail. I bet there are lots of users who don't know that output capturing involves pipes, or even what a pipe is.
>
> @ncoghlan <https://github.com/ncoghlan>
>
> > Edit to clarify: the check=True behaviour would be the default behaviour of check_returncode(print_captured_stderr=True).
>
> Hmmmm interesting idea. Need to ponder that some more.
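For reference, the stdlib's `capture_output` is exactly sugar for passing `PIPE` twice, which is why renaming the constant (or adding capture flags) is purely a question of spelling rather than behaviour:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hello')"]

# These two calls are equivalent: capture_output=True just fills in
# stdout=PIPE and stderr=PIPE on the caller's behalf.
a = subprocess.run(cmd, capture_output=True)
b = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

assert a.stdout == b.stdout == b"hello\n"
print("captured:", a.stdout)
```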
@oremanj perhaps we should split out

Note for records: significant discussion about this in chat, starting here: https://gitter.im/python-trio/general?at=5c1d6718babbc178b2bb1235
Okay, I split that off into #813.
OK, I fixed the silly Windows test bug and I think this is ready modulo API tweaking. I'm a little reluctant to provide subprocess support without

I suggested a little ways upthread that maybe we could keep the code in this PR in
Two small comments, but otherwise yeah, I agree this is ready to go except for `run`.

I'm not a big fan of having both `trio.run_process` and `trio.subprocess.run`, which do almost the same thing with slightly different APIs. `Process` is the low-level API like `trio.socket`.

Let's split it into a separate PR. It doesn't mean we aren't going to provide a high-level API; it means we can have a focused discussion on getting that API right, without being distracted by stuff like details of how IOCP works, and without holding up the rest of this.
```python
finally:
    # Run loop is exited without unwinding running tasks, so
    # we don't get here until the main() coroutine is GC'ed
    assert left_run_yet
```
This `assert` unfortunately doesn't accomplish much, since it's called from inside a `__del__` method, so exceptions get converted into text-printed-on-stderr. And then pytest will hide that text. So even if it fails, we're unlikely to notice...
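The behaviour described here is easy to verify: an exception raised inside `__del__` is routed through `sys.unraisablehook` (which by default just prints to stderr) rather than propagated, so a failing assert there can never fail a test. A small sketch, installing our own hook purely to observe it:

```python
import sys

seen = []

class Doomed:
    def __del__(self):
        # This never propagates to the caller of `del`: Python hands it
        # to sys.unraisablehook, whose default just prints to stderr.
        raise AssertionError("only printed, never raised")

# Install a hook so we can observe what happens instead of raising.
sys.unraisablehook = lambda unraisable: seen.append(str(unraisable.exc_value))

obj = Doomed()
del obj  # __del__ runs here; no exception escapes this line

print("unraisable messages:", seen)
```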
Well, if it is in fact called from inside `__del__`, `left_run_yet` should be True, and we're fine regardless. But if the task gets unwound normally, we won't be inside a `__del__`, and the assert will fire.
trio/_core/tests/test_windows.py (Outdated)
```python
# Suppress the Nursery.__del__ assertion about dangling children,
# for both the nursery we created and the system nursery
nursery._children.clear()
nursery.parent_task.parent_nursery._children.clear()
```
Since this is just hiding some messages printed by `__del__`, I think it'd be simpler to let them be printed, and let pytest hide them? We should use `gc_collect_harder()` though to make sure that they do get printed during this test, so pytest will attach them to the right place.
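`gc_collect_harder()` is trio's internal test helper; a minimal sketch of the idea (the class and list names below are hypothetical): run several collection passes so objects trapped in reference cycles run their `__del__` during the test itself, where pytest can capture the output, instead of "sometime later".

```python
import gc

finalized = []

class Cyclic:
    def __init__(self):
        self.me = self  # reference cycle: only the cyclic GC frees this

    def __del__(self):
        finalized.append("Cyclic.__del__ ran")

def gc_collect_harder():
    # Mirrors the spirit of trio's helper: several passes, so garbage
    # (and its __del__ side effects) surfaces deterministically now.
    for _ in range(5):
        gc.collect()

obj = Cyclic()
del obj                # unreachable, but kept alive by its own cycle
gc_collect_harder()    # forces the finalizer to run inside this test
print(finalized)
```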
Excising
To minimise refactoring without committing to a public `run()` API yet, you could call the interim internal-only API `_run()`.
`run()` removed. @njsmith, does this look good for landing?
🎉 Now to open all the follow-up tickets... :-)
Followup issues:

#661 is also relevant (a subtle bug in
Just curious - are the docs on this up to date or is that still TBD? Thanks for all the work on this!
Subprocess support! Add `trio.subprocess`, an async wrapper around the stdlib `subprocess` module with a similar interface: `Process` (renamed from `Popen`) to create a process and interact with it at your leisure, `run` to run a process to completion and report the results, and reexports of all the stdlib `subprocess` exceptions and constants. There is a `subprocess.Popen` object hiding inside each `trio.subprocess.Process`, so all the stdlib options for starting processes in various wacky ways work unmodified. (Currently we only support unbuffered byte streams for communicating with subprocess pipes, so the `universal_newlines`, `encoding`, `errors`, and `bufsize` arguments to `subprocess.Popen` are not supported by the Trio version.) Instead of file objects, the `stdin`, `stdout`, and `stderr` attributes of a `trio.subprocess.Process` are Trio streams. There is no `trio.subprocess.Process.communicate` because the stdlib version has confusing cancellation semantics; use `trio.subprocess.run` or interact with the pipe streams directly.

Fixes #4. Substantially improves our IOCP support (Windows native async I/O), which is relevant to #52.

Still TODO:

- `Process` vs `Popen` naming
- `wait_child_exiting` taking a stdlib `subprocess.Popen` vs a `trio.subprocess.Process` (or Popen or whatever we call it)