
Implementation of coroutines with asyncio (possibly shortens run_command substantially) #90

Merged
merged 4 commits into jupyterhub:master
Jul 24, 2019

Conversation

krinsman
Contributor

Apparently (according to the Tornado wiki) @gen.coroutine has been deprecated by the Tornado project itself in favor of "native" coroutines implemented with the asyncio module, and as of Tornado 5.0 even Tornado itself is written using asyncio instead of tornado.gen.coroutine. Also, most of the classes in the parent JupyterHub project are written using asyncio instead of tornado.gen.coroutine.

So it seemed like a good idea to reimplement the coroutines in batchspawner using asyncio instead of tornado.gen.coroutine, to avoid problems if Tornado or JupyterHub make breaking changes to that code in the future.

The documentation for the parent Spawner class in JupyterHub (if I remember correctly) still says that the start, stop, and poll methods have to be implemented as tornado.gen.coroutines in any child class, so I didn't change those methods. For the remaining coroutines, the only method where the changes involved anything more substantial than replacing @gen.coroutine plus def with async def, and yield with await, was run_command.
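
To make the mechanical part of that change concrete, here is a minimal sketch of the pattern (the function names are invented for illustration and are not batchspawner code):

    from tornado import gen
    from tornado.ioloop import IOLoop

    # Old style: generator-based Tornado coroutine.
    @gen.coroutine
    def greet_old(name):
        yield gen.sleep(0.1)      # yield a Future
        return 'hello ' + name

    # New style: native coroutine -- the decorator goes away,
    # yield becomes await.
    async def greet_new(name):
        await gen.sleep(0.1)      # Tornado Futures can be awaited directly
        return 'hello ' + name

    print(IOLoop.current().run_sync(lambda: greet_new('world')))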

I think there are mistakes in my attempt to implement run_command using asyncio, since the testing I did was limited to dummy systems and not a real production deployment/environment. What follows is a summary of most of the changes I made (I might have forgotten some) and why I thought they were appropriate. Since you obviously understand the source code much better than I do, you will be able to determine whether my reasoning is appropriate.

  • The tornado.process.Subprocess class replaces subprocess.Popen, which is also what the asyncio.subprocess.Process class does. The easiest ways to initialize an instance of that class (i.e. the only ways analogous to calling the constructor of subprocess.Popen) are the functions asyncio.create_subprocess_exec and asyncio.create_subprocess_shell. asyncio.create_subprocess_exec has a really annoying format for accepting commands (every word has to be a separate argument in *args, whereas subprocess.Popen with shell=False accepts a list of strings). Since the point of run_command is to run a shell command, and the syntax for asyncio.create_subprocess_shell is both simpler and more analogous, I used that instead. That's also why there is no shell=True in the call to asyncio.create_subprocess_shell.
  • It says somewhere in the documentation for both asyncio.create_subprocess_shell and asyncio.subprocess.Process that any extra keyword arguments get passed to the constructor of subprocess.Popen (or a very analogous constructor), so I left env=env.
  • As far as I understand, tornado.Subprocess.STREAM is a replacement for subprocess.PIPE, which is also what asyncio.subprocess.PIPE is supposed to be, so I changed all of stdin, stdout, stderr from tornado.Subprocess.STREAM to asyncio.subprocess.PIPE in the input to asyncio.create_subprocess_shell.
  • Everything is the same until a little bit into the if input: statement block. Apparently one isn't supposed to call stdin.write() directly; what the documentation recommends instead is calling the communicate method of asyncio.subprocess.Process, which is what I do below. Since the call to the write method of proc.stdin seems unnecessary when one uses proc.communicate, I deleted the whole try-except block involving it. (The input argument of proc.communicate appears to be the part that replaces proc.stdin.write, which is why I call proc.communicate with input=inbytes. I am not 100% sure that's right, though.)
  • Also, as far as I can tell, proc.stdin will be closed automatically by the call to the communicate method of proc. But if that isn't correct, one could still re-add proc.stdin.close(), since proc.stdin is a StreamWriter, which has a close method (as opposed to stdout and stderr, which are StreamReaders and appear not to have one).
  • proc.communicate returns a tuple containing both the standard output and the standard error (the way subprocess.Popen does) so it seemed possible to merge the assignments of out and eout into one line.
  • While StreamReaders don't seem to have a close method, they can still be in a closed state, and based on my testing both proc.stdout and proc.stderr are closed automatically by the call to proc.communicate, so it seemed possible to delete the two lines explicitly closing them.
  • The last major change was merging the final try-except block with the concluding if-else block. This is the hardest part for me to justify, since I'm not entirely certain I understand what the original code was intended to accomplish, but here goes. As far as I can tell from the documentation, tornado.Subprocess.wait_for_exit is intended as a replacement for subprocess.Popen.wait; analogously, asyncio.subprocess.Process.wait is also a replacement for Popen.wait according to the documentation. So the big uncertainty for me was whether or not to leave in a line saying proc.wait(). According to the documentation, proc.wait might deadlock when using asyncio.subprocess.PIPE for stdout and stderr. Since that is what is being done above, and deadlocks are bad, it seemed like I shouldn't use proc.wait; the documentation recommends using communicate instead, which is what was already done above. Seemingly, the reason for calling tornado.Subprocess.wait_for_exit in the original code was to get the return code attribute of the process and assign it to err. It's unclear to me, though, because the Tornado documentation only says that wait_for_exit returns a Future which resolves when the process exits -- it doesn't clarify what the Future resolves to. Since the same documentation says that wait_for_exit is supposed to replace Popen.wait, which returns the return code of the process (as does asyncio.subprocess.Process.wait), since the if-else block seems to branch on whether the return code is non-zero, and since the log message in the try-except block seems intended to explain that the process had non-zero exit status, my operating assumption has been that the value of proc.returncode set by proc.communicate suffices as a replacement for err. But since proc.wait isn't being called, the corresponding try-except block no longer seemed applicable, so I removed the non-applicable parts and merged the remaining applicable parts into the if statement. Also, since raising an exception and returning a value are mutually exclusive, in the if statement I had to choose between keeping return err and raise RuntimeError(eout). The latter seemed more informative, so I kept it instead of return err. (The value of err would be in the log anyway.) A sketch combining all of these changes follows this list.
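
Putting the bullets above together, the reimplemented run_command looks roughly like the following. This is a simplified sketch of the approach described, not the exact PR diff; logging and spawner-specific details are omitted:

    import asyncio

    async def run_command(cmd, input=None, env=None):
        # create_subprocess_shell runs cmd through the shell, so no
        # shell=True; extra keyword arguments such as env are forwarded
        # to the underlying Popen-like constructor.
        proc = await asyncio.create_subprocess_shell(
            cmd,
            env=env,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        inbytes = input.encode() if input else None
        # communicate() writes inbytes to stdin, closes it, reads stdout
        # and stderr to EOF, and waits for the process to exit --
        # replacing the separate write/close/read/wait_for_exit calls.
        out, eout = await proc.communicate(input=inbytes)
        if proc.returncode != 0:
            # Non-zero exit status: raise instead of returning the code.
            raise RuntimeError(eout.decode().strip())
        return out.decode().strip()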

That was probably way too much unnecessary detail, but I just wanted to explain that, if I did butcher the code by removing too many essential parts, my intentions in doing so were benign.

@krinsman
Contributor Author

Looking at the details of the Travis CI output, it looks like the changes actually passed for the two builds using JupyterHub version 0.9, and failed only for those builds using older versions of JupyterHub, which is somewhat to be expected since apparently the implementation of asyncio coroutines is new to JupyterHub version 0.9.

Also, the changes failed fastest for the builds using Python 3.4, which is also to be expected since the async/await syntax was introduced in Python 3.5 (asyncio itself appeared in 3.4, but not native coroutines), if I recall correctly.

One thing that I noticed in my own testing: it might make sense to add a timeout (by wrapping asyncio.wait_for around the call to proc.communicate) inside a try-except block (with the except clause catching asyncio.TimeoutError and calling proc.kill), because if someone chooses something unwise like /bin/sh for the command cmd, the process won't fail, but it also will never exit. Since there wasn't such a try-except block in the original code, which I tried as much as possible to imitate, I didn't add one; a sketch of the idea follows.
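
For reference, a hedged sketch of that timeout idea (the helper name and the 10-second default are invented placeholders):

    import asyncio

    async def communicate_with_timeout(proc, inbytes, timeout=10):
        # Wrap communicate() in wait_for so a command that never exits
        # (e.g. a bare /bin/sh waiting on stdin) cannot hang forever.
        try:
            out, eout = await asyncio.wait_for(
                proc.communicate(input=inbytes), timeout=timeout)
        except asyncio.TimeoutError:
            proc.kill()
            raise RuntimeError('command timed out after %d seconds' % timeout)
        return out, eout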

@rkdarst
Contributor

rkdarst commented May 31, 2018

You could check #78 for discussion of what versions of Python should be supported, and #86 for something else that needs to add a coroutine and the difficulties with Python 3.4. I'm not a maintainer so I can't say what will work, but from what I can tell, whether or not to support Python 3.4 is the most important decision to make, and then we can go forward. With HPC stuff I think backwards compatibility is very important, but 3.5 seems like a turning point. Too bad 3.4 is the turning point for RedHat/CentOS right now.

@krinsman
Contributor Author

krinsman commented Jun 1, 2018

@rkdarst I think you're right, this seems to be an idea whose time has not yet come.

@rkdarst
Contributor

rkdarst commented Jun 18, 2018

But I still think that this could be done in a way compatible with at least 3.4 - that's what JupyterHub does. Maybe even 3.3. My knowledge of asyncio is really limited and I'm basically just guessing. Also look at the implementation of start, stop, poll, etc. - these are coroutines with syntax that works.

@minrk - sorry to bother you so much, but do you have any suggestions here? It would be nice to take this.

@willingc
Contributor

@rkdarst JupyterHub 0.9 supports 3.5+. As an FYI, we also use async_generator to provide support back to 3.5 for async generators.

I haven't looked too closely at this PR, but perhaps have the spawner support JupyterHub 0.8.1 on Python 3.4, and JupyterHub 0.9 on Python 3.5+.

@rkdarst
Contributor

rkdarst commented Jun 18, 2018

The problem with 3.4 is that async/await syntax is source-level incompatible with it...

Does this in the end do two things:

  1. use Python 3.5 async/await syntax
  2. use asyncio.create_subprocess_shell instead of tornado.process.Subprocess... which also requires 3.5.

Is there any part we can split out and use right away? Or should we postpone the whole thing until we don't want 3.4 as a requirement for batchspawner?

I'm sorry for basically going in circles; I can't do much really, and I haven't yet fully understood how JupyterHub has changed recently with regard to asyncio.

@minrk
Member

minrk commented Jun 19, 2018

For the most part, JupyterHub 0.9's switch to asyncio shouldn't have changed much. tornado and asyncio are generally inter-compatible. It's mainly the adoption of async def syntax that caused the requirement of 3.5. This was elected over @asyncio.coroutine syntax because we get better diagnostics, etc. from tracebacks.

If you want to keep Python 3.4 and jupyterhub 0.8 support, this should be doable. tornado and asyncio are generally inter-compatible, so adopting asyncio doesn't mean you have to switch from tornado.process.Subprocess.

If supporting 3.4 is enough of a challenge, though, then I'd suggest requiring Python 3.5 for the next release of BatchSpawner. This should still work with JupyterHub 0.8, as long as it's running with Python 3.5, as tornado in general accepts asyncio coroutines and Futures.
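
A minimal sketch of that interoperability, assuming Tornado 5+ on the default asyncio event loop (function names invented for illustration):

    from tornado import gen
    from tornado.ioloop import IOLoop

    async def native_part():
        return 'from asyncio-style code'

    @gen.coroutine
    def tornado_part():
        # A generator-based Tornado coroutine can yield a native coroutine...
        msg = yield native_part()
        return msg

    async def caller():
        # ...and a native coroutine can await the decorated one.
        return await tornado_part()

    print(IOLoop.current().run_sync(caller))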

@rkdarst
Contributor

rkdarst commented Jun 20, 2018

OK, thanks @minrk. I guess that confirms that this PR is good in the long run, but (also from looking through it again) there isn't much we need _right now_, except...

I can see at least one thing that would be useful already / latent bugs:

  • the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info. (A sketch of this follows the list.)
  • the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from. (but I don't know offhand the async way to fix this)
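
A hypothetical sketch of the first point, with names following the asyncio discussion above:

    def check_exit(proc, out, eout):
        # Surface stdout as well as stderr on failure; both may hold clues.
        if proc.returncode != 0:
            raise RuntimeError('exit status %d; stdout=%r; stderr=%r'
                               % (proc.returncode, out, eout))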

I can make a PR inspired by these things, but if you want to instead I'll give you first opportunity (might reduce forward-porting conflicts...).

@minrk
Member

minrk commented Jun 20, 2018

the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info.

Often it's not so useful to separate stdout and stderr, so using stdout=PIPE and stderr=STDOUT joins them into a single stream which you can re-display when the need arises. This is the only reliable way to get them correctly ordered if both stdout and stderr are producing messages that you are interested in.
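
A minimal sketch of that pattern with asyncio (illustration only; the function name is invented, and this is not a proposed change to run_command):

    import asyncio
    from asyncio.subprocess import PIPE, STDOUT

    async def run_merged(cmd):
        # stderr=STDOUT interleaves stderr into stdout in arrival order,
        # so one stream holds the complete, correctly ordered transcript.
        proc = await asyncio.create_subprocess_shell(cmd, stdout=PIPE,
                                                     stderr=STDOUT)
        out, _ = await proc.communicate()   # second element is None here
        return out.decode()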

the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from.

It's calling await proc.communicate(), no? That shouldn't block.

@rkdarst
Contributor

rkdarst commented Jun 20, 2018

the run_command RuntimeError could include or print stdout as well as stderr. No reason to hide useful debugging info.

Often it's not so useful to separate stdout and stderr

I thought about combining, but the output is parsed against regexes in some spawners. The spawners I use should deal with combined streams... but I didn't want to risk breaking things I don't know about! At least not until we get spawner maintainers who can confirm it's OK.

the use of tornado.process.Subprocess could block because of the way stdout and stderr are separately read from.

It's calling await proc.communicate(), no? That shouldn't block.

I should have clarified this... this PR fixes this, but the problem exists in the base code. And we won't take this PR until we drop 3.4, so... do we want something in the meantime? I doubt typical spawners would fill the buffers, since usually there isn't much output, but who knows.

@minrk
Member

minrk commented Jun 21, 2018

didn't want to risk breaking things I don't know about!

Good plan. If stdout is parsed for output, then keeping it separate is a good idea. In that case, just displaying both separately in case of error is probably right.

I should have clarified this... this PR fixes this, but the problem exists in the base code.

Before it was using yield read_until_close, so it should also have been non-blocking.

@rkdarst
Contributor

rkdarst commented Jun 21, 2018

Before it was using yield read_until_close, so it should also have been non-blocking.

What if stderr buffer fills, process blocks, and stdout can't close? (I thought this was one of the usual problems with reading these)

I checked some and this is what I thought was the way to do it:

        out, eout = yield [proc.stdout.read_until_close(),
                           proc.stderr.read_until_close()]

I'll bet this has never happened in the history of batchspawner and probably won't, but...

@minrk
Member

minrk commented Jun 22, 2018

I'm not sure you can await a list; you might need to do it in two statements, or wrap it in gen.multi:

out, eout = yield gen.multi([...])

I still don't think these should block, though. If a pipe fills up, it's the writer that should block, not the reader.
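
Filled out with the reads from above, the gen.multi version would look something like this sketch:

    from tornado import gen

    @gen.coroutine
    def read_both(proc):
        # Start both reads before waiting on either, so a full buffer on
        # one stream cannot stall the other; gen.multi resolves when both
        # futures in the list have finished.
        out, eout = yield gen.multi([proc.stdout.read_until_close(),
                                     proc.stderr.read_until_close()])
        return out, eout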

rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 25, 2018
- Reading from stdout and stderr separately can produce a deadlock.  I
  assume that the separate proc.wait_for_exit() doesn't matter here.
- Thanks to @krinsman in jupyterhub#90.
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 25, 2018
@rkdarst rkdarst mentioned this pull request Jul 31, 2018
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 31, 2018
rkdarst added a commit to rkdarst/batchspawner that referenced this pull request Jul 31, 2018
- Reading from stdout and stderr separately can produce a deadlock.  I
  assume that the separate proc.wait_for_exit() doesn't matter here.
- Thanks to @krinsman in jupyterhub#90.
@krinsman
Contributor Author

krinsman commented Aug 16, 2018

As before, it seems that this still can't be merged until support for JupyterHub versions <0.9 is dropped. That being said, it seemed prudent to keep this up-to-date with master for now.

@mbmilligan
Member

Well it's mainly about Python 3.4 support, but yes, it is prudent to keep this updated until we are in a position to merge it. So thank you for staying on top of this!

@mbmilligan mbmilligan merged commit ad03efc into jupyterhub:master Jul 24, 2019
mbmilligan added a commit that referenced this pull request Jul 24, 2019