Awaiting futures versus starting coroutines (How opinionated is trio?) #892
Comments
@dabeaz Probably could use your advice as well :) |
Hello! Trio does not provide a 1-1 equivalent of an For example, you might do something like:
import trio
from functools import partial
# See https://outcome.readthedocs.io
import outcome
# This is 3.7+ only; if using older Python see https://stackoverflow.com/a/48800772/1925449
from contextlib import asynccontextmanager

class ManagedTaskCancelled(Exception):
    pass

class ManagedTask:
    def __init__(self, async_fn):
        self._async_fn = async_fn
        self._finished = trio.Event()
        self._cancel_scope = trio.CancelScope()  # unbound cancel scope; requires trio v0.11.0 (out soon)
        self.outcome = None

    async def _run(self):
        # XX TODO: this is actually broken in the presence of MultiErrors... it's very difficult to do this
        # correctly until MultiError v2 lands (see #611)!
        try:
            with self._cancel_scope:
                self.outcome = outcome.Value(await self._async_fn())
        except Exception as exc:
            self.outcome = outcome.Error(exc)
        finally:
            if self.outcome is None:
                # It raised some kind of BaseException... we'll treat them all as "cancelled"
                self.outcome = outcome.Error(ManagedTaskCancelled("cancelled"))
            self._finished.set()
        # Once #611 lands, the above will reduce to something like:
        # with self._cancel_scope:
        #     try:
        #         self.outcome = await outcome.acapture(self._async_fn, filter=Exception)
        #     finally:
        #         if self.outcome is None:
        #             self.outcome = outcome.Error(ManagedTaskCancelled("cancelled"))
        #         self._finished.set()

    def cancel(self):
        self._cancel_scope.cancel()

    async def wait(self):
        await self._finished.wait()

class ManagedTaskRunner:
    def __init__(self, nursery):
        self._nursery = nursery

    def start_soon(self, async_fn, *args):
        managed_task = ManagedTask(partial(async_fn, *args))
        self._nursery.start_soon(managed_task._run)
        return managed_task

@asynccontextmanager
async def open_managed_task_runner():
    async with trio.open_nursery() as nursery:
        try:
            yield ManagedTaskRunner(nursery)
        finally:
            # If you want any remaining "managed tasks" to be automatically cancelled when the
            # request finishes:
            nursery.cancel_scope.cancel()

################################

async def handle_request(...):
    async with open_managed_task_runner() as runner:
        hobby_task = runner.start_soon(hobby)
        # Now we can do hobby_task.cancel() to request cancellation,
        # await hobby_task.wait() to wait for it,
        # and then examine hobby_task.cancelled and hobby_task.outcome to find out what happened

This is fairly elaborate – in many cases I suspect there's a simpler way to accomplish what you want without all of this machinery. But it does give you all the same power as the asyncio version, with a number of bonuses – for example, if |
My only thought is that there are often many ways to do things. Should you try this in Curio, you'll find it much less opinionated about the whole thing. For one, you can create free-floating tasks if you want. You can also use Curio |
@njsmith Thank you very much for your answer and the example code. It did provoke more thought. Indeed, trio has no concept of a task. It also forbids the reuse of coroutine objects, which one might treat as a (unstarted) future: you get To me it is natural to be able to use/reuse the task object: why not, everything is an object in python. Btw, in trio a nursery has that
async with open_nursery() as n:
    a_task = n.start_soon(c1)
    val = await n.start_soon(c2)
Those 2 lines (with and without From what I understand, |
@dabeaz Tried it, with curio everything is possible, yay! Btw, why isn't task awaitable? You can make |
Outcome is already a standalone utility library – you can use it in any python program, whether it's async or not :-). The plan with "MultiError v2" is that Yury and I will be writing a PEP to hopefully make it built-in to the interpreter in 3.8. See #611 for all the details. (It has to be builtin to really work 100% reliably, plus Yury is eager to add it to asyncio. AFAICT his plan at this point is to turn asyncio into trio as much and as fast as he can within the limits imposed by backwards compatibility etc. I'm not sure if that makes asyncio more or less attractive to you :-).)
Yes, trio has a very different design than asyncio. Not sure what to tell you there. It's not true that everything in Python is an object; for example, a synchronous function call is not an object. Trio takes the position that synchronous python works pretty well so we might as well copy it whenever possible :-). Most trio users never learn about coroutine objects at all. And unfortunately, AFAICT it's impossible to have reasonable cancellation semantics in any system where tasks have results and are cancelable. The two properties are just inherently at odds – if task objects carry results, then it means many different consumers may look at those results over time, possibly including consumers who don't know about each other. But if you want to cancel a Task/Future, you have to know that all its potential consumers aren't interested in that result any more, even if you don't know who those consumers are. Of course in your code you at least in principle know what all your different tasks could do in the future, and could somehow keep track of this and only call In particular, suppose tasks A and B both do The solution is to make sure that you never ever call
In early versions of trio, It sounds like you're very committed to the asyncio way of thinking about tasks and futures. Most people who switch from asyncio to Trio struggle for a while to wrap their head around the different approach, and find it very frustrating until suddenly it clicks and they can't imagine ever going back to asyncio. (This was my experience, in fact.) I also realize that this kind of makes Trio sound like a cult, so, uh... I totally understand if you're skeptical and prefer to stick to what makes sense for you currently. That's a very sensible position :-). If asyncio is what makes sense to you then you should definitely use asyncio; it's way better at being asyncio than anything else is. |
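The tension described above between tasks that carry results and tasks that can be cancelled can be reproduced with plain asyncio. The following is a minimal sketch, not the example from the original comment (which was not preserved); the names `shared_work`, `consumer`, and `demo` are made up for illustration. It assumes only documented asyncio behaviour: cancelling a task that is blocked on `await other_task` also cancels `other_task`.

```python
import asyncio

async def shared_work():
    # Stand-in for a long-running operation whose result several consumers want.
    await asyncio.sleep(3600)
    return 42

async def consumer(shared):
    # Each consumer simply waits for the shared task's result.
    return await shared

async def demo():
    shared = asyncio.ensure_future(shared_work())
    a = asyncio.ensure_future(consumer(shared))
    b = asyncio.ensure_future(consumer(shared))
    await asyncio.sleep(0.01)  # let both consumers block on `shared`
    a.cancel()                 # we only intend to cancel consumer A...
    results = await asyncio.gather(a, b, return_exceptions=True)
    # ...but the cancellation propagated into `shared`, so B is cancelled too.
    return shared.cancelled(), [type(r).__name__ for r in results]

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Consumer B never asked to be cancelled and has no way to know that A existed, which is the situation the comment above argues cannot be made safe in general.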
It's really kind of amazing how complex scenarios like this just "work" in Curio.
Output:
Anyways, I'll just leave it at that. Task away... if you must. Or not ;-). |
@dabeaz Yeah, curio's approach of never automatically propagating cancels across |
I guess that asyncio assumes by default that tasks were created in trio-like safe way, so it
import asyncio

class TaskGroup:
    def __init__(self):
        self._tasks = []

    @property
    def task(self):
        return asyncio.gather(*self._tasks)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    def __truediv__(self, coro):
        return self.create_task(coro)

    def __await__(self):
        return self.task.__await__()

    def create_task(self, coro):
        t = asyncio.create_task(coro)
        self._tasks.append(t)
        return t

# Inside a coroutine:
with TaskGroup() as g:
    task = g / co1()
    # `await` binds tighter than `/`, so the division needs parentheses
    val = await (g / co2())
    await g

Well, I defend asyncio, but actually like curio more. Will give it a try I guess |
@abetkin Be careful – ...though actually now that I think about it, I'm not actually sure how ...looking at the source, AFAICT canceling Anyway, sounds like you've got a plan and there's nothing to do for trio here, so I'm going to mark this closed. Feel free to keep chatting. |
You can already see from the backlink above that I've requested a very similar concept in trio-util, but I thought it was worth adding to the discussion here about cancellation semantics given @njsmith's comments above:
This is maybe a bit off topic for this thread, but I think the most common reason for wanting tasks in Trio (read: the reason I want them) is just to reduce the boilerplate needed for accessing the return value of a coroutine started in a nursery. You don't need complex cancellation semantics for that. Actually I've had situations where I definitely don't want cancellation to be propagated (starting a callback in a user supplied nursery and waiting for it to complete). Once you drop the requirement for propagating cancellation like that, it becomes much easier to implement a task class, and also easier for users to understand IMO:
Although I've requested this in trio-util, I think it would even make a lot of sense in Trio itself. |
Mmh. I assume that you start the task in the user's nursery so that if the user's nursery errors out your task gets cancelled along with it. Correct? If so, I'd just do this:
#!/usr/bin/python3
import trio
import outcome

async def some_task():
    await trio.sleep(0.5)
    return 42

class CaptureCancelled(Exception):
    pass

class Capture:
    def __init__(self, nursery, p, *a, **k):
        self._evt = trio.Event()
        nursery.start_soon(self._run, p, a, k)

    async def _run(self, p, a, k):
        self._result = await outcome.acapture(p, *a, **k)
        self._evt.set()

    def __await__(self):  # shortcut
        return self.get_result().__await__()

    async def get_result(self):
        await self._evt.wait()
        try:
            return self._result.unwrap()
        except trio.Cancelled as err:
            raise CaptureCancelled() from err

async def main():
    async with trio.open_nursery() as N:
        c = Capture(N, some_task)  # you'd use a different nursery of course
        ...
        res = await c  # or "await c.get_result()" without the shortcut
        print(res)

trio.run(main)

No task object required. |
I had a slightly simpler design, where you can't wait on the task class at all - instead the result is accessible only once the nursery has completed successfully. If it was cancelled or raised an exception, code using the result will never be reached. |
Well if it's a "guest nursery" (i.e., one supplied by the client) you can't depend on the guest terminating any time soon, and if it's cancelled your code won't notice as the cancellation won't be propagated to you. I have related code in https://gist.github.com/smurfix/0130817fa5ba6d3bb4a0f00e4d93cf86 that streams results from subtasks into an iterator as they become available. That obviously won't work if you need to wait for the nursery to end. I'd really like to have this kind of nursery extension available as a mix-in instead of bolting it on via external classes and whatnot, but the pesky |
Thanks for that example The main difference is the exception handling. This was the main point I was making in my previous comment; I think the solution I described would work better (no chance of lost exceptions, no need to special case which exceptions get wrapped and which don't, easier for users to understand). Did you have any thoughts about that? To spell it out, it would look like this (not tested so probably has silly syntax errors): (I also deliberately decoupled from nursery to make it a bit more flexible and explicit in user code.)
import trio

# Placeholder definitions (assumed; the snippet uses these names without defining them)
class TaskNotCompletedException(Exception):
    pass

class TaskWrappedException(Exception):
    pass

class Task:
    def __init__(self, routine, *args):
        self.routine = routine
        self.args = args
        self.is_started = False
        self.result = None
        self.exception = None
        self.completed_event = trio.Event()

    async def run(self):
        assert not self.is_started
        self.is_started = True
        try:
            self.result = await self.routine(*self.args)
        except BaseException as e:
            self.exception = e
            raise  # Note the exception is allowed to propagate into user nursery
        finally:
            self.completed_event.set()

    async def wait_complete(self):
        await self.completed_event.wait()

    def get_result(self):
        if not self.completed_event.is_set():
            raise TaskNotCompletedException(self)
        if self.exception is not None:
            raise TaskWrappedException(self) from self.exception  # Exception is always wrapped for task user
        return self.result
That's not the main motivation. Instead, I'm assuming that the user's nursery survives longer than this routine. If this routine gets cancelled, the callback should be allowed to continue running as long as the user nursery does. To be specific, in one use case the user nursery is "all open connections" while this routine is for creating connections and the callback is logic for handling the connection. During shutdown, we should stop initiating new connections immediately, but we should spend a reasonable grace period shutting down open connections gracefully. This is very much the same situation as |
Update I have now put my idea for a task-like class into its own package: aioresult. It's now called Original post Just for a giggle, I made a version of my toy |
Adding some more elaboration on this topic from a gitter conversation: My question: Matthias Urlichs smurfix 01:33 Matthias Urlichs smurfix 01:40 arthur-tacca arthur-tacca 05:29 https://gist.github.com/arthur-tacca/32c9b5fa81294850cabc890f4a898a4e

I think a lot of the time you do just want something like gather(). That function waits until all tasks are complete and stops early if one throws an exception. Guess what: that's a Trio nursery! (Except nurseries deal with exceptions and cancellation much better of course.) The fact that Trio nurseries exist proves that waiting for a bunch of tasks to all finish is useful. It's just missing a way to get the task return values (without intrusively changing the functions to save results elsewhere).

The key thing that I've realised is that, at least in this simple situation, the best way for exceptions to work is that they're just allowed to propagate straight out. All the task/future classes that I've seen try to be too clever: they try to perfectly forward exceptions by catching and masking them in the task's context, and then re-raising them when fetching the result (typically by using the Outcome library). The clever solution actually makes things worse IMO, because it prevents the nursery where the task is being run from being cancelled (even though that's probably what you want), and it means you need to be careful not to get the result more than once (which is annoying). Worst of all, you must get it once otherwise the exception is silently lost - that's the cardinal sin that nurseries were originally meant to prevent. Plus, the exception may not make sense in the new context; the classic example is a cancel exception that is now being raised outside its owning nursery but I think it's a sign of a general problem.

Sorry for the wall of text. In summary, my issues with trio-future are: (1) it shouldn't perfectly forward exceptions, (2) it shouldn't have a gather() function (just advice to use nurseries for that role).
Lura lura:veriny.tf [m] 05:32 Alex Grönholm agronholm 05:35 Venky Iyer indigoviolet 19:32 arthur-tacca arthur-tacca 21:15 |
Hi, I stumbled across this issue while looking for a kind of breathing nursery that yields task outcomes as they become available, basically something along the lines of the nursery wrapper of @smurfix. Inspired by that, I thought of an API to safely handle scenarios where new work may be spawned in reaction to the processing of previous results. What came out are basically three concepts, all somewhat akin and related to the discussion here, but otherwise not strictly dependent on one another. I would be very interested in the opinions of @njsmith and the other experts on which of these (and to what extent) could be valuable to have in Trio, and in discussion of the API and UX of course.
The whole thing can be found here: https://gist.github.com/bob1de/80aaefc4d5515e70ed25cd19b861af94 Don't be overwhelmed by the sheer number of lines, without having counted, ~2/3 of it are docs and comments, so Thanks for your time! EDIT: Restructured text and added more details. |
After having rethought this a bit, probably the way to go is to open an issue for the future part at least, as that's the most fundamental and -- considering past discussions -- most requested interface. In the latest version, I factored that out into an ABC to facilitate different implementations than the async callable one. Will reference this issue over there. |
Hi, @njsmith and trio folks!
As I see in trio docs, the main API is letting trio run my async functions for me.
Like this:
This is a valid approach, and probably a good style, since it passes the arguments explicitly
to each coroutine. There is another approach though.
Please tell me if I'm wrong, but a task in the trio docs usually means a coroutine, i.e., one not yet started or even created. In asyncio, a task is a Future. This approach tries to use them:
The difference is that we are using global variables: hobby_task, tasks, and the counter. Probably, an antipattern. But imagine we are handling a web request, and it can serve as a natural scope, so globally-scoped tasks are actually scoped by a web request.
In many cases, it makes sense to execute async tasks once, and let other async tasks use the result. In our example, that task is hobby(). Can't do await hobby() in every coroutine because it's a lengthy task.
Also, you can't cancel the hobby task from another coroutine: not without the additional knowledge, which other coroutines are using it.
My question is: can trio be used with this application design? Or does it have parts that can be reused for this?
Basically, tasks (started coroutines) can be viewed as functional components that form some kind of dependency tree.
Theoretically, the nodes in this tree can do supervision/nursery for the dependent nodes.
And the tree knows all the dependencies, so it knows which tasks can be cancelled safely.
Also, do you see any issues with this design? Maybe, you know frameworks that are more suited for it?
P.S.
This issue was born from me trying to adapt apistar by @tomchristie to the needs of a project at work.
(upstream version is different stuff, have to look at 0.5.x version for this). It's a dependency injection framework, and it has "components" classes backed by async tasks.