Client signals #2313 #2429

Merged: 27 commits into aio-libs:master, Nov 18, 2017

Conversation

@pfreixes (Contributor)

What do these changes do?

Implementation of the client signals exposed by the ClientSession class. For the full list of signals, please see the documentation.

Are there changes in behavior for the user?

Yes

Related issue number

#2313

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the changes folder
    • name it <issue_id>.<type> for example (588.bug)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."


@codecov-io commented Oct 29, 2017

Codecov Report

Merging #2429 into master will increase coverage by 0.05%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #2429      +/-   ##
==========================================
+ Coverage   97.13%   97.18%   +0.05%     
==========================================
  Files          39       40       +1     
  Lines        8059     8203     +144     
  Branches     1411     1441      +30     
==========================================
+ Hits         7828     7972     +144     
  Misses         99       99              
  Partials      132      132
Impacted Files Coverage Δ
aiohttp/client_reqrep.py 97.2% <ø> (ø) ⬆️
aiohttp/tracing.py 100% <100%> (ø)
aiohttp/connector.py 97.65% <100%> (+0.16%) ⬆️
tests/autobahn/client.py 95.09% <0%> (+0.18%) ⬆️
demos/polls/aiohttpdemo_polls/__init__.py 89.47% <0%> (+0.58%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 992499d...031bb3b.

@asvetlov (Member) left a comment:

I doubt the parser code needs to be changed.

I thought these signals should be sent by higher-level code.
For example, await client.request() returns right after all response headers are received and parsed.

Working with the body is more complicated: the proper API is resp.content.read() and family.
The problem is that resp.content is a stream, and the stream class is shared between client and server (BTW, that is true for the parsers too).
We could either add on_content_received to the stream or make a new stream just for the client (derived from the base stream, sure).
Both approaches have their own drawbacks: implementing a new class, or paying for an empty-subscriber-list call on the server.
I don't know which is better. On the server every Python function call matters; for the client we can relax our striving for performance.
On the other hand, the server code has many levels of indirection, so maybe the degradation is negligible.

Anyway, I'm pretty sure the parsers should not know about signals.
I hope we'll add HTTP/2 eventually; that protocol has its own HTTP headers parser, and I don't like adding signals to it either.
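For illustration, a minimal sketch of the second option above (a client-only stream subclass with a per-chunk hook); the ClientStreamReader name and the callback wiring here are hypothetical and not part of this PR:

from aiohttp.streams import StreamReader

class ClientStreamReader(StreamReader):
    # Hypothetical client-only stream: notify subscribers about every chunk.
    def __init__(self, *args, on_content_received=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._on_content_received = list(on_content_received or [])

    def feed_data(self, data, size=0):
        for callback in self._on_content_received:
            callback(data)  # plain callables here; the real signals are coroutines
        super().feed_data(data, size)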

@jettify (Member) commented Oct 30, 2017

@asvetlov let's summarize what should be done to move this PR forward:

  1. move the _on_headers_received and _on_content_received code to a higher level
  2. as for the body: is there any point at which we can signal that the full body has been consumed? I believe this approach will cover 90% of use cases; the other 10% we can address in follow-up PRs

anything else?

@pfreixes (Contributor, Author)

I had my doubts about the parsers and how to implement the signals triggered by sending or receiving pieces of the HTTP protocol. The current implementation is just a proof of concept of how more granular signals, such as on_content_chunk_sent, could be implemented, and I have my own rationale for that (obviously I may have it wrong).

But I would prefer to get rid of this uncertainty and get all of us on the same page. My proposal is to drop the on_headers_sent, on_headers_received, on_content_sent and on_content_received signals, to be implemented later together with the chunk ones if we consider them a must. That gives us the freedom to focus our efforts right now on a set of signals that already provide value to the user.

What do you think @asvetlov and @jettify ?

@asvetlov (Member)

Sounds good

@pfreixes (Contributor, Author) commented Nov 1, 2017

@asvetlov I think your requested changes are no longer applicable and the branch is ready either to be merged or to move on with the rest of the review.

yield from self.on_request_start.send(
trace_context,
method,
url.host,
Member:

The whole URL object, maybe? The query part might be interesting for a tracer.

@pfreixes (Contributor, Author):

Yes, I will do that
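For illustration, a hedged sketch of what a subscriber could then do with the full URL (the handler signature is illustrative, not final):

async def on_request_start(trace_context, method, url):
    # url is the full yarl.URL, so the tracer can also inspect the query string
    trace_context.query = dict(url.query)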

@@ -291,6 +315,9 @@ def _request(self, method, url, *,
# redirects
if resp.status in (
301, 302, 303, 307, 308) and allow_redirects:

self.on_request_redirect.send(trace_context, resp)
Member:

What is the intended usage of this signal?
How does one figure out which initial request was redirected?
The same is true for all the other signals.
Maybe we should always pass the URL, method and headers?

@pfreixes (Contributor, Author):

The idea was to provide a way to trace when a redirect happens. I completely agree that the signal parameters are not consistent enough to give the user sufficient information; let's add the method, URL and headers.

What do you mean by "The same is true for all other signals"?

@pfreixes (Contributor, Author):

I will do the same with on_request_end and on_request_exception, allowing the user to know the source URL in case the request came from a redirect.
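For illustration, the kind of subscriber this enables once method, URL and headers are passed along (the exact signal signature was still under discussion here):

async def on_request_redirect(trace_context, method, url, headers, resp):
    # Hypothetical subscriber: log every redirect hop with enough context
    # to tie it back to the original request.
    print("redirect:", method, url, "->", resp.headers.get("Location"))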


@property
def on_request_queued_start(self):
return self._connector.on_queued_start
Member:

I think every session should have its own signals.
A connector might be shared between sessions via connector_owner=False or session.detach().
Shared subscriptions make a mess: nobody knows when a recipient is dead.

@pfreixes (Contributor, Author):

OK, I can see now the idea behind the connector_owner parameter; my fault, I hadn't checked it and that led me to a wrong implementation. If the connector does end up shared between sessions, it's absolutely necessary not to mix up the signals.

On the other hand, the connector_owner implementation worries me a bit: it leaves in the user's hands the power to set it to True or False, meaning that if the user passes an alternative connector but forgets to set connector_owner to False, closing the session will automatically close the connections of a connector that might be shared with another Session.

Should the connector_owner param be handled internally by the Session object?
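For reference, sharing a connector between sessions with the existing aiohttp API looks like this (not part of this PR's diff); connector_owner=False keeps a session from closing a pool it does not own:

import aiohttp

connector = aiohttp.TCPConnector(limit=30)

# Both sessions reuse the same pool; closing either session leaves the
# connector, and therefore the other session's connections, untouched.
session_a = aiohttp.ClientSession(connector=connector, connector_owner=False)
session_b = aiohttp.ClientSession(connector=connector, connector_owner=False)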

@@ -347,8 +354,12 @@ def closed(self):
"""
return self._closed

async def connect(self, req):
async def connect(self, req, trace_context=None):
Member:

Let's make trace_context a mandatory parameter.
I'm OK with breaking backward compatibility in the connectors API.

Member:

The same for other places with trace_context=None in signatures.

@pfreixes (Contributor, Author):

Oks.

"""Get from pool or create new connection."""

if trace_context is None:
trace_context = SimpleNamespace()
Member:

We should not create an empty context here -- just pass the parameter as is.

_, proto = await self._create_proxy_connection(req)
_, proto = await self._create_proxy_connection(
req,
trace_context=None
Member:

trace_context=trace_context, I'm pretty sure.

@pfreixes (Contributor, Author):

Okis.

_, proto = await self._create_direct_connection(req)
_, proto = await self._create_direct_connection(
req,
trace_context=None
Member:

trace_context=trace_context

@@ -218,6 +227,18 @@ def _request(self, method, url, *,
handle = tm.start()

url = URL(url)

if trace_context is None:
trace_context = SimpleNamespace()
Member:

I think the API is wrong here.
The user will never pass a trace_context into async with session.get() explicitly.
It's another level of abstraction.
What the user will do is set up the session properly at initialization time, by subscribing to signals and (optionally) providing a trace context factory for creating a new container for user data.
I even doubt we need a factory parameter, at least at the current stage.
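For illustration, a minimal sketch of that setup-at-initialization idea; the trace_context_factory parameter and the handler names are hypothetical and were never added in this form:

from types import SimpleNamespace

def trace_context_factory():
    # Hypothetical: called once per request to build the user-data container.
    return SimpleNamespace(start=None)

session = ClientSession(trace_context_factory=trace_context_factory)
session.on_request_start.append(my_on_request_start)
session.on_request_end.append(my_on_request_end)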

@pfreixes (Contributor, Author) commented Nov 3, 2017:

The rationale behind this implementation is the following:

Give the developer the freedom to have fine-grained control over their request calls, building a trace context for each request.

Perhaps

async def on_request_start(trace_context, host, port, headers):
    trace_context['start'] = loop.time()

async def on_request_end(trace_context, resp):
    await send_metrics(
        time=loop.time() - trace_context['start'],
        query=trace_context['query']
    )


session = ClientSession()
session.add_on_requests_start(on_request_start)
session.add_on_requests_end(on_request_end)

resp = session.get("http://localhost?query=foo", trace={'query':'foo'})
resp = session.get("http://localhost?query=foo", trace={'query':'bar'})

This example shows how the same ClientSession is used to make different queries that might have divergent traces.

If the user wants to share information between all requests that belong to the same ClientSession, they might use a closure pattern, perhaps:

def on_request_end(query):
    async def on_request_end(trace_context, resp):
        await send_metrics(
            time=loop.time() - trace_context['start'],
            query=query
        )
    return on_request_end

session = ClientSession()
session.add_on_requests_start(on_request_start)
session.add_on_requests_end(on_request_end(query='foo'))

I can see your point that forcing the user to populate each request call is less friendly, but in my experience having a way to pass a context with information about the current execution is a must. Also, take into account that this granularity allows the user to implement either session-level or request-level contexts.

Another solution would be to implement two different contexts, one for the session and another for the request, but IMHO that overcomplicates the implementation right now.

Thoughts ?

@asvetlov (Member) commented Nov 3, 2017

New comments.
Sorry, I'm at a conference these days and have very limited time for a careful review.
There are other issues for discussion not yet covered by my notes.

@pfreixes (Contributor, Author) commented Nov 7, 2017

Some things I don't like about the current proposal after applying the changes addressing your concerns:

  1. Making trace_context a positional arg instead of a kwarg breaks many function signatures such as connect, direct_connect and so on, and IMHO it also makes the API of at least the Connector and its derivatives dirtier. The same will happen with the other classes that are used by the ClientSession and have to use signals to trace a specific event related to a specific request.
  2. Removing the signals from the Connector and putting them all on the ClientSession forces us to add another param, the session, which will make these functions even dirtier. The same will happen with the other classes.

To fix the second issue, trying to minimize the number of parameters needed, I would be keen on creating an object that stores all of the dependencies needed to send a signal, perhaps:

class Trace:
    def __init__(self, session, trace_context):
        self._session = session
        self._trace_context = trace_context

    def on_connect_start(self, host):
        self._session.on_connect_start.send(self._trace_context, host)

def connect(self, host, trace):
    trace.on_connect_start(host)

The first issue relates to the other concern you expressed, @asvetlov, about making the trace argument mandatory. I would still be more inclined to make it optional, having as a result the following snippet:

def connect(self, host, trace=None):
    if trace:
        trace.on_connect_start(host)
    if trace:
        trace.on_connect_end(host)

The Trace helper won't be a first-class citizen of the public API; it would just be created per request inside the ClientSession._request function:

def _request(self, ...):
    trace = Trace(self, trace_context)
    self.connector.connect(trace=trace)

I'm still wondering whether having the signals attached to the ClientSession is the best implementation, or whether it's worth moving them to another object called TraceRequest that is passed to the ClientSession at constructor time, perhaps:

trace_request = TraceRequest()
trace_request.on_request_start.append(my_function)
session = ClientSession(..., trace_request=trace_request)

When the developer wants to trace a specific client session, they pass it as a parameter; the default behavior does not trace requests. It also allows us to use the freeze method of the signals: once the TraceRequest instance is given to the ClientSession, all signals belonging to that TraceRequest instance are frozen.

Worth mentioning that having the request-tracing signals decoupled from the session forces us to pass the session instance as the first parameter to all of the signals.

Thoughts ?

@pfreixes (Contributor, Author) commented Nov 8, 2017

I couldn't help it and wrote an untested POC of the last proposed changes here [1]. A few things I don't like:

  1. The repeated "if trace: trace.send(...)" pattern, but that is just a cosmetic issue. Indeed, with this pattern the performance impact when tracing is not enabled should be minimal.
  2. The trace.send('signal') function; it breaks the normal usage of signals a bit.
  3. Linked to the previous point: if in the future we add support for function signals, we will be forced to refactor the Trace object.

[1] pfreixes@4caddd9

@kxepal (Member) left a comment:

A few comments here. Plus my biggest request: please provide some visualization of the signals during the request lifetime. It would help a lot to quickly understand what gets called when.
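As a rough answer to that request, an approximate timeline of the signals discussed in this PR during a single request (names are taken from the diff and comments in this thread; the exact set, order and parameters changed before the merge):

# on_request_start          - ClientSession._request() begins
#   on_request_queued_start - the connector has no free connection, the request waits
#   (connection acquired or created, headers sent, response headers parsed)
# on_request_redirect       - fired once per redirect hop, then the cycle repeats
# on_request_end            - the response was received successfully
# on_request_exception      - fired instead of on_request_end when the request fails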

"""
Sends data to all registered receivers.
"""
yield from self._send(*args, **kwargs)
Member:

async/await

docs/client.rst Outdated
print("Starting request")

async def on_request_end(trace_context, resp):
print("Ending request")
Member:

Wording nit: does on_request_end fire just before the request ends, or once it has actually ended? The -ing suffix is confusing here.

# cleanup timer
tm.close()
if handle:
handle.cancel()
handle = None

yield from self.on_request_exception.send(
Member:

await

@@ -354,15 +386,29 @@ def _request(self, method, url, *,
handle.cancel()

resp._history = tuple(history)
yield from self.on_request_end.send(
Member:

await

if trace_context is None:
trace_context = SimpleNamespace()

yield from self.on_request_start.send(
Member:

await

def test_request_tracing_proxies_connector_signals(loop):
connector = TCPConnector(loop=loop)
session = aiohttp.ClientSession(connector=connector, loop=loop)
assert id(session.on_request_queued_start) == id(connector.on_queued_start)
Member:

Why not do a simpler `is` check?

@@ -474,3 +476,121 @@ def test_client_session_implicit_loop_warn():

asyncio.set_event_loop(None)
loop.close()


@asyncio.coroutine
Member:

I guess you know what (:

@pfreixes (Contributor, Author)

@asvetlov and @kxepal, the way the client signals are implemented has been rewritten, at least in how they are exposed to the user and how the user enables them on the ClientSession object; it is based on the rationale given in this comment [1].

A good starting point is the documentation [2].

Disclaimer: I didn't take care of the async/await work that has to be done in the ClientSession object, taking into account that it might be handled by another PR together with all of the related tests.

[1] #2429 (comment)
[2] https://github.com/aio-libs/aiohttp/pull/2429/files#diff-e5af816206d783c3bd169522ceecf99bR750

@asvetlov (Member)

The code might be a bit verbose, but at least the change leaves it more readable.

Readability beats code succinctness, at least for libraries.
It also simplifies debugging.

@asvetlov (Member)

I don't like a list of trace config objects, but I support your idea in general.
We could provide a meta config class with the TraceConfig interface:

from aws_xray_aiohttp import ClientConfigXRaySignals
from .mymodule import ClientConfigOpenTSDB

session = ClientSession(
    trace_configs=TraceConfigComposite(
        ClientConfigXRaySignals(),
        ClientConfigOpenTSDB(),
    )
)

I believe this proposal keeps the internals simple but still makes it easy to specify multiple tracers.
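For illustration, a hedged sketch of that composite idea; TraceConfigComposite was never actually added to aiohttp (the list-of-configs approach discussed below won), and the send_request_start method name is invented here:

class TraceConfigComposite:
    # Fan every signal out to each wrapped trace config.
    def __init__(self, *trace_configs):
        self._trace_configs = trace_configs

    async def send_request_start(self, *args, **kwargs):
        for config in self._trace_configs:
            await config.on_request_start.send(*args, **kwargs)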

@pfreixes (Contributor, Author)

The good thing about the list is that it follows the same idea as the middlewares, and the implementation is pretty straightforward: we only have to modify the ad-hoc Trace.send_<signal>() functions to iterate over all of the TraceConfig instances.

@asvetlov ^^

@asvetlov (Member)

Do you want to support both single value and a list?

@pfreixes (Contributor, Author)

IMHO it doesn't matter; for the sake of API clarity we can always require a list, and if not, it's something that can be handled internally in the ClientSession init:

self._trace_configs = trace_configs if isinstance(trace_configs, list) else [trace_configs]

@asvetlov (Member)

Ok

@pfreixes (Contributor, Author)

@asvetlov and @kxepal, more changes: the ClientSession now accepts a list of TraceConfig objects, allowing the user to install different trace config objects. To make that possible, the trace_context has become unique per TraceConfig. Therefore, to let the user pass custom values from the request() method, a new parameter called trace_request_context has been created, replacing the old trace_context one.

I think the documentation reflects these changes pretty well [1]; please read and review it. Sorry for the many changes, but I am almost absolutely convinced that they are necessary for the user.

[1] https://github.com/aio-libs/aiohttp/pull/2429/files#diff-e5af816206d783c3bd169522ceecf99bR750
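For illustration, a hedged usage sketch of the API as described in this comment (handler signatures and parameter names follow this revision of the PR, before the renaming discussed below, and are illustrative rather than final):

import time
import aiohttp

async def on_request_start(session, trace_context, *args):
    trace_context.start = time.monotonic()

async def on_request_end(session, trace_context, *args):
    print("request took", time.monotonic() - trace_context.start)

async def main():
    trace_config = aiohttp.TraceConfig()
    trace_config.on_request_start.append(on_request_start)
    trace_config.on_request_end.append(on_request_end)

    async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
        async with session.get("http://example.com",
                               trace_request_context={"query": "foo"}) as resp:
            await resp.text()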

@asvetlov (Member)

docs/client.rst was split up into client_quickstart and client_advanced. Most likely you need the second one or, even better, create a new client_tracing.rst for tracing usage. I suspect it is a complex topic and the document content will grow.
docs/toc.rst is gone; add tracing_reference to the client.rst toctree.

@asvetlov (Member)

Thanks for the hard work @pfreixes.
Is it possible to rename trace_request_context -> trace_context?
The name is very long, and I believe its application to the request is obvious in the context of using the feature.

The PR is really huge; I'm inclined to merge it after fixing the most obvious issues and to continue the work on client tracing in follow-up PRs, including a discussion of the parameters of every supported signal.

@@ -294,6 +294,12 @@ The client session supports the context manager protocol for self closing.

.. versionadded:: 2.3

:param trace_context: Object used to give as a param for the all signals
triggered by the ongoing request. Default uses the object returned
by the :class:`TraceConfig.trace_context()` method.
Member:

:meth:`TraceConfig.trace_context` maybe. The parentheses are not required by Sphinx.

@pfreixes (Contributor, Author):

No longer required; this documentation has been deprecated.

Property that gives access to the signals that will be executed when a
request starts, based on the :class:`~signals.Signal` implementation.

The coroutines listening will receive as a param the `session`,
Member:

Sphinx prefers double backticks, e.g. `session` should be replaced by ``session``.
On the other hand, the CPython documentation uses *session* for parameters, but it looks like in the aiohttp docs we have no pronounced agreement to use double backticks.

@pfreixes (Contributor, Author):

Fixed, using `` as you proposed.

@@ -500,7 +500,8 @@ def test_gen_netloc_no_port(make_request):
'012345678901234567890'


async def test_connection_header(loop, conn):
@asyncio.coroutine
def test_connection_header(loop, conn):
Member:

Why remove async def?

@pfreixes (Contributor, Author):

My fault, fixed.

@pfreixes (Contributor, Author)

Just to put all of us on the same page (sure, the large PR and the many changes do not help): the trace_context name is reserved for the TraceConfig objects, which use this param to pass along the different signals a place to store internal information related to the same request and the same TraceConfig.

Meanwhile, trace_request_context is the param that the user can pass at the beginning of the request and that can be used in any signal.

My proposal, for the sake of clarity, would be:

  • trace_context to trace_config_ctx
  • trace_request_context to trace_request_ctx

Or other alternatives?
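For illustration, a hedged sketch of the two contexts as distinguished above (the handler signatures are simplified, and reaching the request context through the config context is an assumption):

import time

async def on_request_start(session, trace_config_ctx, *args):
    # trace_config_ctx: private to one TraceConfig, shared by all the signals
    # fired for the same request; a good place for a start timestamp.
    trace_config_ctx.start = time.monotonic()

async def on_request_end(session, trace_config_ctx, *args):
    # trace_request_ctx: whatever the caller passed to session.request(),
    # assumed here to be reachable from the config context.
    print(trace_config_ctx.trace_request_ctx,
          time.monotonic() - trace_config_ctx.start)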

@asvetlov (Member)

trace_config_ctx and trace_request_ctx are perfect!

@pfreixes (Contributor, Author)

Documentation adapted to the new architecture: basically, tracing was added as a new section of the client_advanced chapter, and a new tracing_reference chapter was added.

More or less the PR is ready to be merged, or we can move on with the review.

Disclaimer: once the code is merged I will open a discussion about whether there is any chance to port this feature to the 2.x series, so we can later cut a new 2.4 release. The goal is to get feedback from the community and improve the API for the 3.x version, and, also important, make it usable by developers right now.

@asvetlov asvetlov merged commit ca8878a into aio-libs:master Nov 18, 2017
@kxepal (Member) commented Nov 18, 2017

🎉

@asvetlov (Member)

@pfreixes the PR has been merged.
Thanks for the hard work again.

Personally, I doubt we need a 2.4 release:

  1. We have no firm schedule for 3.0, but I hope to release it after New Year, definitely in January. Not much time is left.
  2. IMHO the proposal is not ready yet -- we need a careful review of the tracing signals and, most importantly, their signatures.
  3. Current master has dramatic changes: async/await and the documentation refactoring. Keeping backports in sync is a nightmare in this situation.

@asvetlov (Member)

I support @kxepal's emoji. Sorry, I'm not an emoji jedi, but I share the delight of merging this proposal into the master branch!

@pfreixes (Contributor, Author)

OK, January is not so far away. But my intention is to start working on AWS X-Ray support for aiohttp and also to refactor one of the middlewares that instruments calls made through the ClientSession, using the new tracing system for both.

I hope that is going to give us a bit of feedback.

@asvetlov (Member)

Checking a concept against real usage is always very useful.

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019