
Implementation of coroutines with asyncio (possibly shortens run_command substantially) #90

Merged
merged 4 commits into jupyterhub:master
Jul 24, 2019

Conversation

krinsman
Contributor

Apparently (according to the Tornado wiki) @gen.coroutine has been deprecated by the Tornado project itself in favor of "native" coroutines implemented with the asyncio module, and as of Tornado 5.0 even Tornado itself is written using asyncio instead of tornado.gen.coroutine. Also, most of the classes in the parent JupyterHub project are written using asyncio instead of tornado.gen.coroutine.

So it seemed like a good idea to reimplement the coroutines in batchspawner using asyncio instead of tornado.gen.coroutine, to avoid problems if Tornado or JupyterHub make breaking changes to that code in the future.

The documentation for the parent Spawner class in JupyterHub (if I remember correctly) still says that the start, stop, and poll methods have to be implemented as tornado.gen.coroutines in any child class, so I didn't change those methods. For the remaining coroutines, the only method where the changes involved anything more substantial than replacing @gen.coroutine plus def with async def, and yield with await, was run_command.
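
To make the mechanical part of that change concrete, here is a minimal sketch of the pattern (the function names are invented for illustration and are not batchspawner code):

    from tornado import gen
    from tornado.ioloop import IOLoop

    # Old style: generator-based Tornado coroutine.
    @gen.coroutine
    def greet_old(name):
        yield gen.sleep(0.1)      # yield a Future
        return 'hello ' + name

    # New style: native coroutine -- the decorator goes away,
    # yield becomes await.
    async def greet_new(name):
        await gen.sleep(0.1)      # Tornado Futures can be awaited directly
        return 'hello ' + name

    print(IOLoop.current().run_sync(lambda: greet_new('world')))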

I think there are mistakes in my attempt to implement run_command using asyncio, since the testing I did was limited to dummy systems and not a real production deployment/environment. What follows is a summary of most of the changes I made (I might have forgotten some) and why I thought they were appropriate. Since you obviously understand the source code much better than I do, you will be able to determine whether my reasoning is appropriate.

  • The tornado.process.Subprocess class replaces subprocess.Popen, which is also what the asyncio.subprocess.Process class does. The easiest ways to initialize an instance of that class (i.e. the only ways analogous to calling the constructor of subprocess.Popen) are the functions asyncio.create_subprocess_exec and asyncio.create_subprocess_shell. asyncio.create_subprocess_exec has a really annoying format for accepting commands (every word has to be a separate argument in *args, whereas subprocess.Popen with shell=False accepts a list of strings). Since the point of run_command is to run a shell command, and the syntax for asyncio.create_subprocess_shell is both simpler and more analogous, I used that instead. That's also why there is no shell=True in the call to asyncio.create_subprocess_shell.
  • It says somewhere in the documentation for both asyncio.create_subprocess_shell and asyncio.subprocess.Process that any extra keyword arguments get passed to the constructor of subprocess.Popen (or a very analogous constructor), so I left env=env.
  • As far as I understand, tornado.Subprocess.STREAM is a replacement for subprocess.PIPE, which is also what asyncio.subprocess.PIPE is supposed to be, so I changed all of stdin, stdout, stderr from tornado.Subprocess.STREAM to asyncio.subprocess.PIPE in the input to asyncio.create_subprocess_shell.
  • Everything is the same until a little bit into the if input: statement block. Apparently one isn't supposed to call stdin.write() directly; what the documentation recommends instead is calling the communicate method of asyncio.subprocess.Process, which is what I do below. Since the call to the write method of proc.stdin seems unnecessary when one uses proc.communicate, I deleted the whole try-except block involving it. (The input argument of proc.communicate appears to be the part that replaces proc.stdin.write, which is why I call proc.communicate with input=inbytes. I am not 100% sure that's right, though.)
  • Also, as far as I can tell, proc.stdin will be closed automatically by the call to the communicate method of proc. But if that isn't correct, one could still re-add proc.stdin.close(), since proc.stdin is a StreamWriter, which has a close method (as opposed to stdout and stderr, which are StreamReaders and appear not to have one).
  • proc.communicate returns a tuple containing both the standard output and the standard error (the way subprocess.Popen does) so it seemed possible to merge the assignments of out and eout into one line.
  • While StreamReaders don't seem to have a close method, they can still be in a closed state, and based on my testing both proc.stdout and proc.stderr are closed automatically by the call to proc.communicate, so it seemed possible to delete the two lines explicitly closing them.
  • The last major change was merging the final try-except block with the concluding if-else block. This is the hardest part for me to justify, since I'm not entirely certain I understand what the original code was intended to accomplish, but here goes. As far as I can tell from the documentation, tornado.Subprocess.wait_for_exit is intended as a replacement for subprocess.Popen.wait; analogously, asyncio.subprocess.Process.wait is also a replacement for Popen.wait according to the documentation. So the big uncertainty for me was whether or not to leave in a line saying proc.wait(). According to the documentation, proc.wait might deadlock when using asyncio.subprocess.PIPE for stdout and stderr. Since that is what is being done above, and deadlocks are bad, it seemed like I shouldn't use proc.wait; the documentation recommends using communicate instead, which is what was already done above. Seemingly, the reason for calling tornado.Subprocess.wait_for_exit in the original code was to get the return code attribute of the process and assign it to err. It's unclear to me, though, because the Tornado documentation only says that wait_for_exit returns a Future which resolves when the process exits -- it doesn't clarify what the Future resolves to. Since the same documentation says that wait_for_exit is supposed to replace Popen.wait, which returns the return code of the process (as does asyncio.subprocess.Process.wait), since the if-else block seems to branch on whether the return code is non-zero, and since the log message in the try-except block seems intended to explain that the process had non-zero exit status, my operating assumption has been that the value of proc.returncode set by proc.communicate suffices as a replacement for err. But since proc.wait isn't being called, the corresponding try-except block no longer seemed applicable, so I removed the non-applicable parts and merged the remaining applicable parts into the if statement. Also, since raising an exception and returning a value are mutually exclusive, in the if statement I had to choose between keeping return err and raise RuntimeError(eout). The latter seemed more informative, so I kept it instead of return err. (The value of err would be in the log anyway.) A sketch combining all of these changes follows this list.
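
Putting the bullets above together, the reimplemented run_command looks roughly like the following. This is a simplified sketch of the approach described, not the exact PR diff; logging and spawner-specific details are omitted:

    import asyncio

    async def run_command(cmd, input=None, env=None):
        # create_subprocess_shell runs cmd through the shell, so no
        # shell=True; extra keyword arguments such as env are forwarded
        # to the underlying Popen-like constructor.
        proc = await asyncio.create_subprocess_shell(
            cmd,
            env=env,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        inbytes = input.encode() if input else None
        # communicate() writes inbytes to stdin, closes it, reads stdout
        # and stderr to EOF, and waits for the process to exit --
        # replacing the separate write/close/read/wait_for_exit calls.
        out, eout = await proc.communicate(input=inbytes)
        if proc.returncode != 0:
            # Non-zero exit status: raise instead of returning the code.
            raise RuntimeError(eout.decode().strip())
        return out.decode().strip()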

That was probably way too much unnecessary detail, but I just wanted to explain that, if I did butcher the code by removing too many essential parts, my intentions in doing so were benign.

@krinsman
Contributor Author

Looking at the details of the Travis CI output, it looks like the changes actually passed for the two builds using JupyterHub version 0.9, and failed only for those builds using older versions of JupyterHub, which is somewhat to be expected since apparently the implementation of asyncio coroutines is new to JupyterHub version 0.9.

Also, the changes failed fastest for the builds using Python 3.4, which is also to be expected since the async/await syntax was introduced in Python 3.5 (asyncio itself appeared in 3.4, but not native coroutines), if I recall correctly.

One thing that I noticed in my own testing: it might make sense to add a timeout (by wrapping asyncio.wait_for around the call to proc.communicate) inside a try-except block (with the except clause catching asyncio.TimeoutError and calling proc.kill), because if someone chooses something unwise like /bin/sh for the command cmd, the process won't fail, but it also will never exit. Since there wasn't such a try-except block in the original code, which I tried as much as possible to imitate, I didn't add one; a sketch of the idea follows.
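
For reference, a hedged sketch of that timeout idea (the helper name and the 10-second default are invented placeholders):

    import asyncio

    async def communicate_with_timeout(proc, inbytes, timeout=10):
        # Wrap communicate() in wait_for so a command that never exits
        # (e.g. a bare /bin/sh waiting on stdin) cannot hang forever.
        try:
            out, eout = await asyncio.wait_for(
                proc.communicate(input=inbytes), timeout=timeout)
        except asyncio.TimeoutError:
            proc.kill()
            raise RuntimeError('command timed out after %d seconds' % timeout)
        return out, eout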

@rkdarst
Contributor

rkdarst commented May 31, 2018

You could check #78 for discussion of what versions of Python should be supported, and #86 for something else that needs to add a coroutine and the difficulties with Python 3.4. I'm not a maintainer so I can't say what will work, but from what I can tell, whether or not to support Python 3.4 is the most important decision to make, and then we can go forward. With HPC stuff I think backwards compatibility is very important, but 3.5 seems like a turning point. Too bad 3.4 is the turning point for RedHat/CentOS right now.

@krinsman
Contributor Author

krinsman commented Jun 1, 2018

@rkdarst I think you're right, this seems to be an idea whose time has not yet come.

@rkdarst
Contributor

rkdarst commented Jun 18, 2018

But I still think that this could be done in a way compatible with at least 3.4 - that's what JupyterHub does. Maybe even 3.3. My knowledge of asyncio is really limited and I'm basically just guessing. Also look at the implementation of start, stop, poll, etc. - these are coroutines with syntax that works.

@minrk - sorry to bother you so much, but do you have any suggestions here? It would be nice to take this.

@willingc
Contributor

@rkdarst JupyterHub 0.9 supports 3.5+. As an FYI, we also use async_generator to provide support back to 3.5 for async generators.

I haven't looked too closely at this PR, but perhaps have the spawner support JupyterHub 0.8.1 on Python 3.4, and JupyterHub 0.9 on Python 3.5+.

@rkdarst
Contributor

rkdarst commented Jun 18, 2018

The problem with 3.4 is that async/await syntax is source-level incompatible with it...

Does this in the end do two things:

  1. use Python 3.5 async/await syntax
  2. use asyncio.create_subprocess_shell instead of tornado.process.Subprocess... which also requires 3.5.

Is there any part we can split out and use right away? Or should we postpone the whole thing until we don't want 3.4 as a requirement for batchspawner?

I'm sorry for basically going in circles; I can't do much really, and I haven't yet fully understood how JupyterHub has changed recently with regard to asyncio.

@minrk
Member

minrk commented Jun 19, 2018

For the most part, JupyterHub 0.9's switch to asyncio shouldn't have changed much. tornado and asyncio are generally inter-compatible. It's mainly the adoption of async def syntax that caused the requirement of 3.5. This was elected over @asyncio.coroutine syntax because we get better diagnostics, etc. from tracebacks.

If you want to keep Python 3.4 and jupyterhub 0.8 support, this should be doable. tornado and asyncio are generally inter-compatible, so adopting asyncio doesn't mean you have to switch from tornado.process.Subprocess.

If supporting 3.4 is enough of a challenge, though, then I'd suggest requiring Python 3.5 for the next release of BatchSpawner. This should still work with JupyterHub 0.8, as long as it's running with Python 3.5, as tornado in general accepts asyncio coroutines and Futures.
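
A minimal sketch of that interoperability, assuming Tornado 5+ on the default asyncio event loop (function names invented for illustration):

    from tornado import gen
    from tornado.ioloop import IOLoop

    async def native_part():
        return 'from asyncio-style code'

    @gen.coroutine
    def tornado_part():
        # A generator-based Tornado coroutine can yield a native coroutine...
        msg = yield native_part()
        return msg

    async def caller():
        # ...and a native coroutine can await the decorated one.
        return await tornado_part()

    print(IOLoop.current().run_sync(caller))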

@rkdarst
Contributor

rkdarst commented Jun 20, 2018

OK, thanks @minrk. I guess that confirms that this PR is good in the long run, but (also from looking through it again) there isn't much we need _right now_, except...

I can see at least one thing that would be useful already / latent bugs:

  • the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info. (A sketch of this follows the list.)
  • the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from. (but I don't know offhand the async way to fix this)
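
A hypothetical sketch of the first point, with names following the asyncio discussion above:

    def check_exit(proc, out, eout):
        # Surface stdout as well as stderr on failure; both may hold clues.
        if proc.returncode != 0:
            raise RuntimeError('exit status %d; stdout=%r; stderr=%r'
                               % (proc.returncode, out, eout))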

I can make a PR inspired by these things, but if you want to instead I'll give you first opportunity (might reduce forward-porting conflicts...).

@minrk
Member

minrk commented Jun 20, 2018

the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info.

Often it's not so useful to separate stdout and stderr, so using stdout=PIPE and stderr=STDOUT joins them into a single stream which you can re-display when the need arises. This is the only reliable way to get them correctly ordered if both stdout and stderr are producing messages that you are interested in.
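
A minimal sketch of that pattern with asyncio (illustration only; the function name is invented, and this is not a proposed change to run_command):

    import asyncio
    from asyncio.subprocess import PIPE, STDOUT

    async def run_merged(cmd):
        # stderr=STDOUT interleaves stderr into stdout in arrival order,
        # so one stream holds the complete, correctly ordered transcript.
        proc = await asyncio.create_subprocess_shell(cmd, stdout=PIPE,
                                                     stderr=STDOUT)
        out, _ = await proc.communicate()   # second element is None here
        return out.decode()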

the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from.

It's calling await proc.communicate(), no? That shouldn't block.

@rkdarst
Contributor

rkdarst commented Jun 20, 2018

the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info.

Often it's not so useful to separate stdout and stderr

I thought about combining, but the output is parsed against regexes in some spawners. The spawners I use should deal with combined streams... but I didn't want to risk breaking things I don't know about! At least not until we get spawner maintainers who can confirm it's OK.

the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from.

It's calling await proc.communicate(), no? That shouldn't block.

I should have clarified this... this PR fixes this, but the problem exists in the base code. And we won't take this PR until we drop 3.4, so... do we want something in the meantime? I doubt typical spawners would fill the buffers, since usually there isn't much output, but who knows.

@minrk
Member

minrk commented Jun 21, 2018

didn't want to risk breaking things I don't know about!

Good plan. If stdout is parsed for output, then keeping it separate is a good idea. In that case, just displaying both separately in case of error is probably right.

I should have clarified this... this PR fixes this, but the problem exists in the base code.

Before it was using yield read_until_close, so it should also have been non-blocking.

@rkdarst
Contributor

rkdarst commented Jun 21, 2018

Before it was using yield read_until_close, so it should also have been non-blocking.

What if stderr buffer fills, process blocks, and stdout can't close? (I thought this was one of the usual problems with reading these)

I checked some and this is what I thought was the way to do it:

        out, eout = yield [proc.stdout.read_until_close(),
                           proc.stderr.read_until_close()]

I'll bet this has never happened in the history of batchspawner and probably won't, but...

@minrk
Member

minrk commented Jun 22, 2018

I'm not sure you can await a list; you might need to do it in two statements, or wrap it in gen.multi:

out, eout = yield gen.multi([...])

I still don't think these should block, though. If a pipe fills up, it's the writer that should block, not the reader.
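
Filled out with the reads from above, the gen.multi version would look something like this sketch:

    from tornado import gen

    @gen.coroutine
    def read_both(proc):
        # Start both reads before waiting on either, so a full buffer on
        # one stream cannot stall the other; gen.multi resolves when both
        # futures in the list have finished.
        out, eout = yield gen.multi([proc.stdout.read_until_close(),
                                     proc.stderr.read_until_close()])
        return out, eout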

rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 25, 2018
- Reading from stdout and stderr separately can produce a deadlock.  I
  assume that the separate proc.wait_for_exit() doesn't matter here.
- Thanks to @krinsman in jupyterhub#90.
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 25, 2018
@rkdarst rkdarst mentioned this pull request Jul 31, 2018
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 31, 2018
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 31, 2018
- Reading from stdout and stderr separately can produce a deadlock.  I
  assume that the separate proc.wait_for_exit() doesn't matter here.
- Thanks to @krinsman in jupyterhub#90.
@krinsman
Contributor Author

krinsman commented Aug 16, 2018

As before, it seems that this still can't be merged until support for JupyterHub versions <0.9 is dropped. That being said, it seemed prudent to keep this up-to-date with master for now.

@mbmilligan
Member

Well it's mainly about Python 3.4 support, but yes, it is prudent to keep this updated until we are in a position to merge it. So thank you for staying on top of this!

@mbmilligan mbmilligan merged commit ad03efc into jupyterhub:master Jul 24, 2019
mbmilligan added a commit that referenced this pull request Jul 24, 2019