Implementation of coroutines with asyncio (possibly shortens run_command substantially) #90
Conversation
Looking at the details of the Travis CI output, it looks like the changes actually passed for the two builds using JupyterHub version 0.9, and failed only for the builds using older versions of JupyterHub, which is somewhat to be expected. Also, the changes failed fastest for the builds using Python 3.4, which is also to be expected. One thing that I noticed in my own testing: it might make sense to add a timeout around the subprocess call.
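One way such a timeout could look, assuming the asyncio-based `run_command` from this PR (the `run_with_timeout` name and the timeout value are just illustrative):

```python
import asyncio

async def run_with_timeout(proc, inbytes, timeout=60):
    # give up (and kill the process) if the batch command hangs
    try:
        out, eout = await asyncio.wait_for(proc.communicate(input=inbytes), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        raise
    return out, eout
```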
You could check #78 for discussion of what versions of Python should be supported, and #86 for something else that needs to add a coroutine and the difficulties with Python 3.4. I'm not a maintainer so I can't say what will work, but from what I can tell, supporting Python 3.4 or not is the most important decision to make, and then we can go forward. With HPC stuff I think backwards compatibility is very important, but 3.5 seems like a turning point. Too bad RedHat/CentOS is on 3.4 now.
@rkdarst I think you're right, this seems to be an idea whose time has not yet come.
But I still think that this could be done in a way that's compatible with at least 3.4; that's what JupyterHub does. Maybe even 3.3. My knowledge of asyncio is really limited and I'm basically just guessing. Also look at the implementation of `start`, `stop`, `poll`, etc.; these are coroutines with the syntax that works. @minrk, sorry to bother you so much, but do you have any suggestions here? It would be nice to take this.
@rkdarst JupyterHub 0.9 supports 3.5+. As an FYI, we also use `async_generator` to provide support back to 3.5 for async generators. I haven't looked too closely at this PR, but perhaps the spawner could support JupyterHub 0.8.1 with Python 3.4, and JupyterHub 0.9 with Python 3.5+.
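For reference, `async_generator` lets pre-3.6 code write async generators without native `yield`-inside-`async def` syntax; a minimal sketch (the yielded events are made up):

```python
from async_generator import async_generator, yield_

@async_generator
async def progress():
    # equivalent to `yield {...}` inside a native async generator (3.6+)
    await yield_({"progress": 50, "message": "job submitted"})
    await yield_({"progress": 100, "message": "job running"})
```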
The problem with 3.4 is that async/await is source-level incompatible... Does this in the end do three things:
Is there any part we can split out and use right away? Or should we postpone the whole thing until we no longer want 3.4 as a requirement for batchspawner? I'm sorry for basically going in circles; I can't do much really, and I haven't yet fully understood how JH has changed recently with regard to asyncio.
For the most part, JupyterHub 0.9's switch to asyncio shouldn't have changed much. tornado and asyncio are generally inter-compatible. It's mainly the adoption of async/await syntax that affects which Python versions can be supported.

If you want to keep Python 3.4 and JupyterHub 0.8 support, this should be doable. tornado and asyncio are generally inter-compatible, so adopting asyncio doesn't mean you have to switch away from `@gen.coroutine` everywhere.

If supporting 3.4 is enough of a challenge, though, then I'd suggest requiring Python 3.5 for the next release of BatchSpawner. This should still work with JupyterHub 0.8, as long as it's running with Python 3.5, as tornado in general accepts asyncio coroutines and Futures.
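A minimal sketch of what that inter-compatibility looks like in practice, assuming Tornado 4.3+ (which can yield native coroutines from `@gen.coroutine` code; the function names are made up):

```python
from tornado import gen

async def fetch_status():
    # native coroutine (Python 3.5+ syntax)
    return 0

@gen.coroutine
def legacy_caller():
    # Tornado 4.3+ lets decorated coroutines yield native coroutines,
    # so old-style and new-style code can coexist in one codebase
    status = yield fetch_status()
    return status
```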
OK, thanks @minrk. I guess that confirms that this PR is good in the long run, but (also from looking through it again) there isn't much we need _right now_, except... I can see at least one thing that would be useful already / some latent bugs:
I can make a PR inspired by these things, but if you want to instead I'll give you first opportunity (might reduce forward-porting conflicts...).
Often it's not so useful to separate stdout and stderr, so combining them could simplify this.
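For illustration, combining the two streams with asyncio might look like this (just a sketch; the thread below ends up preferring to keep them separate):

```python
import asyncio

async def run_combined(cmd, env=None):
    # route stderr into the stdout pipe, leaving one stream to read and parse
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        env=env,
    )
    out, _ = await proc.communicate()
    return out.decode()
```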
I thought about combining, but the output is parsed against regexes in some spawners. The spawners I use should deal with combined streams... but I didn't want to risk breaking things I don't know about! At least not until we get spawner maintainers who can confirm it's OK.

I should have clarified this... this PR fixes this, but the problem exists in the base code. And we won't take this PR until we drop 3.4, so... do we want something in the meantime? I doubt typical spawners would fill the buffers, since usually there isn't much output, but who knows.
Good plan. If stdout is parsed for output, then keeping it separate is a good idea. In that case, just displaying both separately in case of error is probably right.
What if the stderr buffer fills, the process blocks, and stdout can't close? (I thought this was one of the usual problems with reading these.) I checked some, and this is what I thought was the way to do it:
I'll bet this has never happened in the history of batchspawner and probably won't, but...
I'm not sure you can await a list; you might need to do it in two statements, or wrap it in `gen.multi`: `out, eout = yield gen.multi([...])`. I still don't think these should block, though. If a pipe fills up, it's the writer that should block, not the reader.
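A fuller sketch of that suggestion, assuming tornado's `Subprocess` with `STREAM` pipes as in the current `run_command` (the helper name is made up):

```python
from tornado import gen
from tornado.process import Subprocess

@gen.coroutine
def read_both(cmd):
    proc = Subprocess(cmd, shell=True,
                      stdout=Subprocess.STREAM, stderr=Subprocess.STREAM)
    # drain stdout and stderr concurrently, so one pipe filling up
    # can't block the process while we read the other
    out, eout = yield gen.multi([
        proc.stdout.read_until_close(),
        proc.stderr.read_until_close(),
    ])
    return out, eout
```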
- Reading from stdout and stderr separately can produce a deadlock. I assume that the separate `proc.wait_for_exit()` doesn't matter here.
- Thanks to @krinsman in jupyterhub#90.
As before, it seems that this still can't be merged until support for JupyterHub versions <0.9 is dropped. That being said, it seemed prudent to keep this up to date with master for now.
Well, it's mainly about Python 3.4 support, but yes, it is prudent to keep this updated until we are in a position to merge it. So thank you for staying on top of this!
Apparently (according to the Tornado wiki) `@gen.coroutine` has been deprecated by the Tornado project itself in favor of "native" coroutines implemented with the `asyncio` module, and as of Tornado v. 5.0, even `tornado` itself is written using `asyncio` instead of `tornado.gen.coroutine`. Also, most of the classes in the parent JupyterHub project are written using `asyncio` instead of `tornado.gen.coroutine`. So it seemed to me like a possibly good idea if the coroutines in `batchspawner` were reimplemented using `asyncio` instead of `tornado.gen.coroutine`, to avoid problems with Tornado or JupyterHub possibly making breaking changes to the code in the future.

The documentation for the parent `Spawner` class in JupyterHub (if I remember correctly) still says that the `start`, `stop`, and `poll` methods have to be implemented as `tornado.gen.coroutine`'s in any child class, so I didn't change those methods. For the remaining coroutines, the only method for which the changes involved anything more substantial than replacing `@gen.coroutine \n def` with `async def` and `yield` with `await` was `run_command`.

I think there are mistakes in my attempt to implement `run_command` using `asyncio`, since the testing I did was limited to dummy systems and not a real production deployment/environment. What follows is a summary of most of the changes I made (I might have forgotten some) and why I thought they were appropriate. Since you obviously understand the source code much better than I do, you will be able to determine whether my reasoning is appropriate.

- The `tornado.process.Subprocess` class replaces `subprocess.Popen`, which is also what the `asyncio.subprocess.Process` class does. The easiest ways (i.e. the only ways that seem to be analogous to calling the constructor of `subprocess.Popen`) to initialize an instance of that class are the functions `asyncio.create_subprocess_exec` and `asyncio.create_subprocess_shell`. `asyncio.create_subprocess_exec` has a really annoying format for accepting commands (every word has to be a separate argument in `*args`, whereas in `subprocess.Popen` with `shell=False` one can use a list of strings), and since the point of `run_command` is to run a shell command, and the syntax for `asyncio.create_subprocess_shell` is both simpler and more analogous, I used that instead. That's why there is no `shell=True` in the call to `asyncio.create_subprocess_shell`.
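  A quick sketch contrasting the two call forms (the commands are just illustrative):

  ```python
  import asyncio

  async def demo(env=None):
      # exec form: every argv element is a separate positional argument
      proc = await asyncio.create_subprocess_exec("echo", "hello", env=env)
      await proc.wait()

      # shell form: a single command string run through the shell; no shell=True needed
      proc = await asyncio.create_subprocess_shell("echo hello | wc -c", env=env)
      await proc.wait()

  asyncio.get_event_loop().run_until_complete(demo())
  ```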
- The documentation for `asyncio.create_subprocess_shell` and `asyncio.subprocess.Process` says that any extra keyword arguments get passed to the constructor of `subprocess.Popen` (or a very analogous constructor), so I left `env=env`.
- `tornado.Subprocess.STREAM` is a replacement for `subprocess.PIPE`, which is also what `asyncio.subprocess.PIPE` is supposed to be, so I changed all of `stdin`, `stdout`, and `stderr` from `tornado.Subprocess.STREAM` to `asyncio.subprocess.PIPE` in the input to `asyncio.create_subprocess_shell`.
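  Put together, the construction looks roughly like this (a sketch; `cmd` and `env` stand for the values already in scope in `run_command`):

  ```python
  proc = await asyncio.create_subprocess_shell(
      cmd,
      stdin=asyncio.subprocess.PIPE,
      stdout=asyncio.subprocess.PIPE,
      stderr=asyncio.subprocess.PIPE,
      env=env,  # extra keyword arguments are forwarded to the Popen-like constructor
  )
  ```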
- The `if input:` statement block. Apparently one isn't supposed to use `stdin.write()`, according to here. What the documentation recommends using instead is calling the `communicate` method of `asyncio.subprocess.Process`, which is what I do below. Since, seemingly, the call to the `write` method of `proc.stdin` is unnecessary if one uses the `communicate` method of `proc`, I deleted the whole `try`-`except` block involving it. (It seems that the `input` argument of `proc.communicate` is the part that replaces `proc.stdin.write`, so that is why I call `proc.communicate` with `input=inbytes`. I am not 100% sure that's right though.)
- `proc.stdin` will be closed automatically by the call to the `communicate` method of `proc`. But if that isn't correct, one could still re-add `proc.stdin.close()`, since `proc.stdin` is a `StreamWriter`, which has a `close` method (as opposed to `stdout` and `stderr`, which are `StreamReader`s, which appear not to have any `close` method).
- `proc.communicate` returns a tuple containing both the standard output and the standard error (the way `subprocess.Popen` does), so it seemed possible to merge the assignments of `out` and `eout` into one line.
- Although `StreamReader`s don't seem to have a `close` method, they can still be in a closed state, and based on my testing both `proc.stdout` and `proc.stderr` are closed automatically by the call of `proc.communicate`, so it seemed possible to delete the two lines explicitly closing them.
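  In other words, the manual write/close-and-read sequence collapses into a single call (a sketch, with `inbytes` as in `run_command`):

  ```python
  # communicate() writes inbytes to stdin, closes stdin, reads both pipes
  # to EOF, and waits for the process to exit
  out, eout = await proc.communicate(input=inbytes)
  ```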
- Replacing the `try`-`except` block with the concluding `if`-`else` block. This is the hardest part for me to justify, since I'm not entirely certain I understand what was intended to be accomplished by the original code, but here goes. As far as I can tell from the documentation, `tornado.Subprocess.wait_for_exit` is intended as a replacement for `subprocess.Popen.wait`. Analogously, `asyncio.subprocess.Process.wait` is also a replacement for `Popen.wait` according to the documentation. So the big uncertainty for me was whether or not I should leave in a line saying `proc.wait()`. According to the documentation, `proc.wait` can deadlock when using `asyncio.subprocess.PIPE`s for `stdout` and `stderr`. Since that is what is being done above, and deadlocks are bad, it seemed like I shouldn't use `proc.wait`. The documentation recommends using `communicate` instead, which is what was already done above. Seemingly, the reason for calling `tornado.Subprocess.wait_for_exit` in the original code was to get the return code of the process and assign it to `err`. It's unclear to me, though, because the Tornado documentation only says that `wait_for_exit` returns a `Future` which resolves when the process exits -- but it doesn't clarify what the `Future` resolves to. Since the same documentation says that `wait_for_exit` is supposed to replace `Popen.wait`, which returns the return code of the process (as does `asyncio.subprocess.Process.wait`); since the `if`-`else` block seems to be based on whether or not the return code (if the relevant `Future` was actually resolved) is non-zero; and since the log message in the `try`-`except` block seems to be intended to explain how the process had non-zero exit status, my operating assumption has been that the value of `proc.returncode` set by `proc.communicate` suffices as a replacement for `err`. But since `proc.wait` isn't being called, the corresponding `try`-`except` block no longer seemed applicable, so I removed the non-applicable parts and merged the remaining applicable parts into the `if` statement. Also, since raising an exception and returning a value are mutually exclusive, in the `if` statement I had to choose between keeping `return err` and `raise RuntimeError(eout)`. The latter seemed more informative, so that's why I kept it instead of `return err`. (Also, the value of `err` would be in the log.)

That was probably way too much unnecessary detail, but I just wanted to explain that, if I did butcher the code by removing too many essential parts, my intentions in doing so were benign.
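For concreteness, here is roughly what the reworked `run_command` described above would look like (a sketch of the approach, not the exact diff):

```python
import asyncio

async def run_command(cmd, input=None, env=None):
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env=env,
    )
    inbytes = input.encode() if input else None

    # communicate() handles writing/closing stdin and draining both pipes,
    # avoiding the wait()-with-PIPEs deadlock the asyncio docs warn about
    out, eout = await proc.communicate(input=inbytes)

    if proc.returncode != 0:
        # non-zero exit status: raising the stderr text seemed more
        # informative than `return err` (err itself ends up in the log)
        raise RuntimeError(eout.decode())
    return out.decode()
```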