[WIP] first draft of more generic timeouts throughout the request lifecycle #2842
Conversation
Yep, the work looks pretty interesting, and it looks like we can build a complete timeout system on top of the signal flow that will give great granularity to the user! So, first of all, thanks for the first draft and keep pushing for this!

I have a general comment about the MR regarding how it implements the `send_*` methods. Why am I saying this? In this MR the `send_*` methods are generated dynamically, and my suggestion here would be to get rid of the dynamic generation and define them explicitly.

What do you think, @asvetlov? Also, keep in mind that with this change we are forcing to call the …
I agree that defining the methods more explicitly in production code and getting rid of the dynamic generation would be a good idea.
So I agree with points 3 and 4 of your list, but I'd strongly suggest keeping 1 and 2:
I'm still in favour of defining each of the send methods explicitly, but here's how I'd do that without too much redundant, duplicated code: using collectors and dispatchers.

```python
async def send_request_start(self, method, url, headers):
    return await self._send_signal(Signal.REQUEST_START,
                                   TraceRequestStartParams(method, url, headers))

async def send_request_end(...
```

Here, a normal member method does the shared work for all signals:

```python
async def sender(self, signal, params):
    # record timestamp
    self._signal_timestamps[signal] = time.time()

    # cancel all running timeouts that end with this signal
    while self._set_timeouts[signal]:
        timeout_handle = self._set_timeouts[signal].pop()
        timeout_handle.cancel()

    # send on_signal to all trace listeners
    await asyncio.gather(*(
        trace_config.dispatch_signal(signal, self._session, trace_context, params)
        for trace_config, trace_context in self._trace_configs
    ))

    # start all timeouts that begin with this signal and register their handles for the end signal
    for end, timeout in self._set_timeouts[signal]:
        assert isinstance(self.request_timer_context, TimerContext)
        at = ceil(self._loop.time() + timeout)
        handle = self._loop.call_at(at, self.request_timer_context.timeout)
        self._set_timeouts[end].append(handle)
```

Notice how only the parameters and the call to `self._send_signal` are specific to each signal; all the glue code inside `sender` is shared. On the `TraceConfig` side, the dispatcher could look like this:

```python
class TraceConfig:
    def __init__(self, trace_config_ctx_factory=SimpleNamespace):
        self._on_request_start = Signal(self)
        self._on_request_chunk_sent = Signal(self)
        ...

    async def dispatch_signal(self, signal, session, trace_context, params):
        if signal == Signal.REQUEST_START:
            await self.on_request_start.send(session, trace_context, params)
        elif signal == Signal.REQUEST_END:
            await self.on_request_end.send(session, trace_context, params)
        elif signal == ...

    @property
    def on_request_start(self):
        return self._on_request_start

    @property
    def on_request_chunk_sent(...
```

If you don't like the long `if`/`elif` chain in `dispatch_signal`, that part could of course be implemented differently. With this approach, all the important public functions would still be explicitly defined, but the slightly more complicated glue code between them would not be duplicated.
```python
# send on_signal to all trace listeners
params = params_class(*args, **kwargs)
await asyncio.gather(
```
Default gather is kinda dangerous: after the first exception, the unfinished coroutines will continue executing unparented. Is this what you want? If so, I would add a comment explaining why it's OK. I think it would be weird to return to the caller and then have the signals hit ;)
Two other issues:
- Do we need to wait for receipt of each signal, or should these just all be `ensure_future`s? If we want the receipt, does send need a timeout? ;)
- I don't think we want to raise out if `send` raises?
See my comment below, I don't have a strong opinion about how exactly the signal should be sent. I just saw it as a faster version of the old `for trace: await trace.send` loop. There's also the `return_exceptions=False` parameter, or as an alternative `asyncio.wait` with its `return_when=ALL_COMPLETED, FIRST_COMPLETED or FIRST_EXCEPTION` parameter, depending on which exact behaviour we want. If you don't want to discuss this now, we could just as well stay with the old sequential approach for this PR.
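To make the alternatives concrete, here is a small self-contained sketch; the `dispatch` helper and its callback handling are made up for illustration and are not the PR's code:

```python
import asyncio

async def dispatch(callbacks, *args):
    coros = [cb(*args) for cb in callbacks]

    # Old sequential approach: a slow or raising callback delays/aborts the rest.
    # for coro in coros:
    #     await coro

    # Parallel with gather: return_exceptions=True keeps one failing callback
    # from aborting the others; exceptions come back as result values instead.
    results = await asyncio.gather(*coros, return_exceptions=True)

    # Alternative: asyncio.wait with an explicit policy, e.g. stop waiting at
    # the first exception and deal with the still-pending tasks ourselves.
    # tasks = [asyncio.ensure_future(cb(*args)) for cb in callbacks]
    # done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    return results
```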
```diff
@@ -41,7 +42,7 @@
 # 5 Minute default read and connect timeout
-DEFAULT_TIMEOUT = 5 * 60
+DEFAULT_TIMEOUTS = RequestTimeouts(uber_timeout=5 * 60)
```
Sounds like this should be `request_timeout` instead of `uber`.
I thought the old timeout worked like the `uber_timeout` is working now, so I thought this would be best suited for keeping backwards compatibility. If changing the way things work is okay, I'd be happy to use a more sensible timeout for the default. ;)
```python
    headers,
    e
)
lifecycle.clear_timeouts()
```
Can we do this as part of a context? It's too easy to get wrong otherwise.
I'm not sure whether it is required at all; `clear_timeouts` is analogous to the old `handle.cancel`, e.g. also used in `resp.connection.add_callback(handle.cancel)`. As timeouts are automatically cancelled once the end signal is emitted, we probably don't need any clean-up any more. But I'm not 100% sure whether there were any side effects through `resp.connection.add_callback` that I'm not aware of right now.
Cool, ya, let's figure that out.
```diff
-for trace in traces:
-    await trace.send_dns_resolvehost_start(host)
+if lifecycle:
+    await lifecycle.send_dns_resolvehost_start(host)
```
I think it would be better to have a context class for stuff like this that's a no-op if `lifecycle` is `None` and otherwise does the right thing at start and end; it would make things cleaner too.
Agree, we could either make a specific `with lifecycle.send_dns_resolvehost_context():` for each pair of start/end signals or even use a generic `with lifecycle.enter_context(start="send_dns_resolvehost_start", end="send_dns_resolvehost_end", exception="send_dns_resolvehost_exception")`.
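As a minimal sketch of that idea (the `lifecycle_context` helper and its arguments are only illustrative, not PR code; since the `send_*` methods are coroutines it would probably have to be an async context manager):

```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifecycle_context(lifecycle, start, end, exception, *args):
    # no-op when no lifecycle is configured
    if lifecycle is None:
        yield
        return
    await getattr(lifecycle, start)(*args)
    try:
        yield
    except BaseException as exc:
        await getattr(lifecycle, exception)(*args, exc)
        raise
    else:
        await getattr(lifecycle, end)(*args)

# usage sketch:
# async with lifecycle_context(lifecycle,
#                              "send_dns_resolvehost_start",
#                              "send_dns_resolvehost_end",
#                              "send_dns_resolvehost_exception",
#                              host):
#     ...  # resolve the host here
```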
Added a few cmts. One thing that worries me (not directly related to this PR) is the non-trivial overhead that tracing will have because there are so many signals and they're all async functions... given each async function will trigger an iteration of the main message loop. I realize that having it async allows for more possibilities in the handler, but I'm afraid this flexibility adds a non-trivial cost... anyways, let's keep performance in mind. Let's make sure we use the fastest "time" function as well, because calling the regular one is not free.
Thanks for the PR. @thehesiod tracing signals should be async functions. If you do something slow in tracing callbacks -- aiohttp cannot help with it. If tracing is not used it is a no-op.
Strictly speaking that is not correct: if an async function doesn't block (wait for a future somewhere), there is no task switch and no new loop iteration.
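A tiny, self-contained illustration of that point (not from the PR): the callback below never awaits, so awaiting it does not give other tasks a chance to run.

```python
import asyncio

order = []

async def noop_callback():
    # no "await" inside: runs to completion without yielding to the event loop
    order.append("callback")

async def other():
    order.append("other task")

async def main():
    asyncio.ensure_future(other())   # can only run once we yield to the loop
    await noop_callback()            # no task switch happens here
    order.append("after callback")
    await asyncio.sleep(0)           # now we yield and the other task runs
    print(order)                     # ['callback', 'after callback', 'other task']

asyncio.run(main())
```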
Another disruptive comment: I would propose having a PR that does not couple the trace system with the new timeout feature. True, it helped us to get a list of the timeouts that will make sense, but IMHO it doesn't mean that both implementations have to be coupled. Why am I saying that?

To sum up, I would prefer a non-coupled implementation from the very beginning. Thoughts? @N-Coder @thehesiod @asvetlov @kxepal ?
Agree with @pfreixes
The reason why I came up with this solution was because I saw both features in the bigger context, both of them tracking the lifecycle each and every request goes through. I don't see that both things should be logically separated, as they belong to the same thing - the request lifecycle, which possibly was not as obvious when tracing was first implemented. Some of the decisions I made are indeed opinionated and make bigger steps without being fully discussed - that's exactly the reason why I created this working draft, to be able to discuss every change I saw necessary. I thought working on actual code that already does the thing might help to find the best solution faster.

Fiddling timeouts in between the already comparatively long and complex code and hoping for somebody to clean everything up at some later point did not seem like a good idea to me. This solution also looked a lot cleaner and nicer to me, especially if you see how much code it actually removed from the ClientSession.

If you don't like the discussion about tracing we're having here, that is in my opinion something we can actually defer without building technical debt. The only discussion we had here related to tracing was how the async callback methods should be called; I thought it would be clearly better to call them in parallel than in sequence (using `asyncio.gather`).

If you want to go back to the drawing board, feel free to do so or propose another way. Unfortunately, I can't see the clean solution that you are proposing without making too much mess of the code.
@N-Coder at first glance tracing and timeouts are not tightly coupled.
@asvetlov oh ya, forgot that for now async doesn't hit the message loop unless you block, thanks. @N-Coder thanks again for all this! It's worth considering the underlying reason why each is needed (IMO):

The commonality is that both want to know/control the areas that can take a lot of time in aiohttp... so in this scenario they're very much alike, they want to know/control similar areas. Also, 1) can impact 2) given it can interrupt a trace; 2) similarly can impact 1) if the tracing callbacks take too long. So I guess the question is:

I think obviously it's not 3.
This is a kind of review. I very much like … Also, this is the reason why I want to separate client timeouts from client tracing signals. Moreover, I strongly suggest not to use … At the end, thank you again for raising such important questions, not only for the timeouts implementation but for aiohttp design principles in general.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
What do these changes do?
I combined the `TimeoutHandle` class used for the current uber (read) timeout and its `timer()` context with the list of registered `Trace`s into the `RequestLifecycle` class. This class now provides a context to execute the request in and methods to notify the `Trace`s of changes of the request state, while also keeping track of the various timeouts between the various states throughout the whole request lifecycle. The main changes lie within the new lifecycle.py file:
The `attr.s` class `RequestTimeouts` is used for configuring the various timeouts. Additionally, `attr.ib` metadata is used to define the signals where a timeout should be started or ended, where applicable. This makes the attribute definitions analogous to the timeout table in my comment to the issue. Unfortunately, some of the proposed timeouts can't be defined using the currently available signals. In those cases, we could either add further signals or add code for tracking the specific timeout explicitly.
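For readers unfamiliar with `attr.ib` metadata, a definition along these lines is what is meant; the attribute names, metadata keys and signal names below are purely illustrative, not the PR's actual ones:

```python
import attr

@attr.s(frozen=True)
class RequestTimeouts:
    # overall timeout, running from request start until request end
    uber_timeout = attr.ib(
        default=None,
        metadata={"start_signal": "request_start", "end_signal": "request_end"},
    )
    # timeout covering only DNS resolution
    dns_resolve_timeout = attr.ib(
        default=None,
        metadata={"start_signal": "dns_resolvehost_start",
                  "end_signal": "dns_resolvehost_end"},
    )
```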
The `RequestLifecycle` class now uses this meta-information together with a list of all signals and their parameter classes to build the various `send_*` methods dynamically (if you don't like this dynamic generation, the methods could also be "precompiled" statically, but I didn't want to make all that writing effort right now). The new `send_*` methods now do the following:

- record the timestamp of the signal
- cancel all running timeouts that end with this signal
- notify every registered `Trace` in parallel using `asyncio.gather`
- start all timeouts that begin with this signal

This leads to the time spent tracing signals between a timeout's start and end signals counting towards the encompassing timeout, while the tracing of the start and end signals themselves won't be counted. As the trace callbacks are executed in parallel by `asyncio.gather`, the impact should be minimal when using sensible tracing functions.
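As a rough sketch of how such dynamic generation could look (illustrative only; the actual code in lifecycle.py may differ, and the signal table below is made up):

```python
import functools
from aiohttp.tracing import TraceRequestStartParams, TraceRequestEndParams

# hypothetical mapping from signal name to its parameter class
SIGNALS = {
    "request_start": TraceRequestStartParams,
    "request_end": TraceRequestEndParams,
}

class RequestLifecycle:
    def __init__(self):
        # generate send_request_start, send_request_end, ... dynamically
        for name, params_class in SIGNALS.items():
            setattr(self, "send_" + name,
                    functools.partial(self._send_signal, name, params_class))

    async def _send_signal(self, name, params_class, *args, **kwargs):
        params = params_class(*args, **kwargs)
        # here: record timestamps, cancel/start timeouts, notify the traces
        ...
```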
On the topic of merging the timeout config with the session or default config: the `attr.s` class `RequestTimeouts` is frozen, preventing any direct modification and thus requiring the user to create and assign a new instance to make changes. This can either be done by directly calling the class constructor, thus ignoring all set session and default configs, or by using `attr.evolve(session.timeouts, overwritten_timeout=12.3)` to reuse the existing config and only overwrite some specific values.
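A quick illustration of the frozen/`attr.evolve` behaviour described above (`RequestTimeouts` is simplified here and `connect_timeout` is just a made-up example attribute):

```python
import attr

@attr.s(frozen=True)
class RequestTimeouts:
    uber_timeout = attr.ib(default=None)
    connect_timeout = attr.ib(default=None)

defaults = RequestTimeouts(uber_timeout=5 * 60)

# defaults.connect_timeout = 30   # would raise attr.exceptions.FrozenInstanceError

# reuse the existing config and only overwrite one value
custom = attr.evolve(defaults, connect_timeout=30)

# ignore session/default config entirely by calling the constructor directly
fresh = RequestTimeouts(connect_timeout=30)
```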
If you deem the chance of users accidentally overwriting the default config by always creating a new instance too high, we could also try to hide the constructor and let users either use `session.defaulted_timeout_config(**kwargs)` for the `attr.evolve` version or `session.new_timeout_config(**kwargs)` for the one without defaults.

Are there changes in behavior for the user?
These changes should be 100% backward-compatible for users only using the interface defined by ClientSession. The old `read_timeout` and `conn_timeout` parameters to the `__init__` function will be translated to the new RequestTimeouts config object, and the `timeout` parameter can either be a new config object or, as previously, a number that will be merged into the config object.
timer
andtraces
I changed the parameters of__init__
method and some other request-lifecycle-related methods inClientRequest
,ClientResponse
orConnector
objects.To circumvent these breaking-changes,
lifecycle.request_timer_context
can be used as drop-in replacement of the oldtimer
(compare client_reqrep:680, where I already did this to circumvent any changes to native code) and[lifecycle]
can be used as drop-in replacement of the oldtraces
array.If you use these drop-ins starting from client:254, you can drop all the changes to client_reqrep.py and connector.py, as just renamed parameters and replaced for-loops with direct calls there.
Related issue number
See #2768 for discussion.
Checklist

Nothing done here yet, as this is just a draft intended as material for discussion. I had to make a PR to allow inline comments within the code. As soon as everything is settled, I'll make a proper PR.

- [ ] Add yourself to CONTRIBUTORS.txt
- [ ] Add a new news fragment into the CHANGES folder, named `<issue_id>.<type>`, for example (588.bugfix); if you don't have an issue_id, change it to the pr id after creating the pr
  - `.feature`: Signifying a new feature.
  - `.bugfix`: Signifying a bug fix.
  - `.doc`: Signifying a documentation improvement.
  - `.removal`: Signifying a deprecation or removal of public API.
  - `.misc`: A ticket has been closed, but it is not of interest to users.