Implement WSGI middleware integration. #84
Conversation
Force-pushed bbd118c to 5938ea6.
@a-feld do you want reviews on this yet, or should we wait until it's out of draft?

Regarding the package naming convention, package structure, folder structure, …
Resolved review comment on integrations/opentelemetry-wsgi/src/opentelemetry_wsgi/__init__.py (outdated).
@a-feld Just saw your comments that you expect early comments. Tip: you can let us know if the PR is ready for review by removing the draft status.
```python
        self.span = None

    def _add_request_attributes(self, environ):
        self.span.add_attribute("http.method", environ["REQUEST_METHOD"])
```
We should provide constants for these well-known attributes (like `http.method`).
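For illustration, such constants might live in a small shared module. This is a hypothetical sketch; the eventual OpenTelemetry semantic-convention names and module layout may differ.

```python
# Hypothetical constants for well-known span attribute keys.
# These names are guesses, not the finalized OpenTelemetry API.
HTTP_METHOD = "http.method"
HTTP_URL = "http.url"
HTTP_STATUS_CODE = "http.status_code"
```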
```python
        # TODO: change start_response to self._add_response_attributes after
        # set_attribute API is implemented
        for yielded in self.wsgi(environ, start_response):
```
We should not add our own loop here. For example, if the underlying app uses https://www.python.org/dev/peps/pep-3333/#optional-platform-specific-file-handling, we destroy any optimizations done by that. Maybe more relevant, some servers could optimize if an object with known `len` is returned. I think `yield from` would have the same problem, but I'm not sure. We could try to instrument/add the `close` method of the result (though that might fail if a C object is returned that does not support setting that attribute).
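One possible shape for instrumenting the result's `close` method, as a rough sketch: the helper name `wrap_close` and the `span.end()` call are illustrative assumptions, not this PR's API, and the fallback covers the case the comment mentions where the object does not allow attribute assignment.

```python
def wrap_close(result, span):
    """Patch result.close so the span ends when the server closes the
    iterable; if the object does not support attribute assignment
    (e.g. some C objects), fall back to ending the span immediately."""
    original_close = getattr(result, "close", None)

    def close():
        try:
            if original_close is not None:
                original_close()
        finally:
            span.end()

    try:
        result.close = close
    except AttributeError:
        span.end()
    return result
```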
This is definitely an interesting problem that I haven't considered in depth yet. From reading the specification, I believe the optimization you're referring to is this part of the spec?

> The callable must return an iterable object, and must not perform any data transmission until and unless the server/gateway actually receives the iterable as a return value from the application. (To do otherwise would prevent middleware from being able to interpret or override the response data.)

From a functional perspective, I believe the behavior is correct since the loop is iterating over the `wsgi.file_wrapper` iterable.
From a timing perspective, I wonder what the response time should be recorded as in this case? Should it include the time the server spends streaming the file? For any other request, the answer would generally be yes.
Performance-wise, the intent was to avoid allocating a new response-like iterable object and instead pass the values through transparently, taking advantage of the context manager behavior for proper timing. In order to maintain the pass-through behavior, the application would have to be invoked and a response wrapper object that was iterable would have to be created.
```python
def __call__(self, environ, start_response):
    iterable = self.wsgi(environ, start_response)
    return ResponseTimer(iterable)  # allocation of the ResponseTimer object
```
> Maybe more relevant, some servers could optimize if an object with known len is returned.
I'll have to think more about this. If there are examples of these types of optimizations I'd love to have a look!
Also, as far as I can tell, the only way to preserve the `wsgi.file_wrapper` optimization is to add an isinstance check for the `wsgi.file_wrapper`. Using this approach implies that we cannot time the full response cycle where the file contents are streamed.

So it sounds like the trade-off is performance versus accuracy of response time values.
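That isinstance check might look roughly like this. A sketch only: the helper name is hypothetical, and it relies on the server placing its wrapper class in `environ["wsgi.file_wrapper"]` per PEP 3333.

```python
def passthrough_if_file_wrapper(result, environ):
    """Return the result untouched when it is an instance of the
    server's wsgi.file_wrapper class, preserving sendfile()-style
    optimizations; otherwise return an iterator over it. In the
    pass-through branch the span cannot cover the streaming time."""
    file_wrapper = environ.get("wsgi.file_wrapper")
    if file_wrapper is not None and isinstance(result, file_wrapper):
        return result
    return iter(result)
```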
```python
        if url is not None:
            self.span.add_attribute("http.url", url)

    def _add_response_attributes(self, status, response_headers, *args):
```
For later, we should consider processing the exception information in `args`. Maybe add a `# TODO` for now.
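For reference, PEP 3333 defines `start_response` as taking an optional third positional argument, an `exc_info` tuple. A sketch of pulling it out of `args` (the helper name is hypothetical):

```python
def extract_exc_info(args):
    """Return the (type, value, traceback) tuple passed as
    start_response's optional exc_info argument (PEP 3333),
    or None when it was not supplied."""
    if args and args[0] is not None:
        return args[0]
    return None
```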
-- (@reyang in #84 (comment)) For example, this means no …
Exactly. Personally I would highly recommend this way because:
Later, if we decide to move the package to a git submodule, we can easily copy the …
Force-pushed 8380e96 to 71a7c6e.
Force-pushed bc6fed8 to 9b27c21.
Co-Authored-By: Reiley Yang <reyang@microsoft.com>
Resolved review comment on ext/opentelemetry-ext-wsgi/src/opentelemetry/ext/wsgi/__init__.py (outdated).
```python
        start_response = self._create_start_response(span, start_response)

        for yielded in self.wsgi(environ, start_response):
            yield yielded
```
Interesting, first time I've seen `yield yielded` :)
I still vote to at least use `yield from`.
`yield from` is equivalent in this circumstance since WSGI applications are not coroutines, and unfortunately using `yield from` doesn't solve the issue with WSGI applications becoming generators / deferring the call to the WSGI callable 😞
I've been putting some thought into this, and I think `yield from` would actually be incorrect in this case. While the call to `self.wsgi` returns an iterable, that iterable is not guaranteed to be a generator, so delegating `send` and `close` calls to the result of `self.wsgi` would be incorrect.

Note that PEP 380, which introduced `yield from`, describes it as being used by a generator to delegate part of its operations to another generator.

Since `self.wsgi` does not always return a generator, using `yield from` here would be incorrect.
`yield from` supports, but does not require, the iterable to implement `close`, `throw`, or `send` (it is defined as `yield from <iterable>`, not `yield from <generator>`, even though generators seem to have been the main motivation). But the WSGI spec explicitly calls out `close` and generators; see https://www.python.org/dev/peps/pep-3333/#specification-details. So delegating `close` (if it exists) would actually be beneficial:

> If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an application error during iteration or an early disconnect of the browser. (The close() method requirement is to support resource release by the application. This protocol is intended to complement PEP 342's generator support, and other common iterables with close() methods.)
What is more, `yield from` seems to be a tiny bit faster than a manual loop:
```python
In [1]: def generate(xs): yield from xs

In [4]: def generate2(xs):
   ...:     for x in xs:
   ...:         yield x
   ...:

In [6]: xs = [b'<html>'] * 1024

In [18]: %timeit for _ in generate(xs): pass
23.5 µs ± 374 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [19]: %timeit for _ in generate2(xs): pass
30.2 µs ± 729 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: xs = xs[:1]

In [22]: %timeit for _ in generate(xs): pass
196 ns ± 1.19 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [23]: %timeit for _ in generate2(xs): pass
202 ns ± 4.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
The close call is definitely an omission on my part, great catch! I'll fix this before merging! 😄

For `yield from`, the performance improvements are pretty compelling, but the behavior does change between `yield` and `yield from`. We don't want to delegate `close` calls to an underlying iterable that's not a generator. I think the problem here is the "re-use" of the close function in the WSGI sense versus the generator sense. Here's an example where the close behavior differs:
```python
import sys

class Response:
    def __init__(self, v, exc=None):
        self.iterable = iter(v)
        self.exc = exc

    def __next__(self):
        if self.exc:
            raise self.exc
        return next(self.iterable)

    def __iter__(self):
        return self

    def close(self):
        caller = sys._getframe(1)
        print("***** close called from {} *****".format(str(caller)))

def gen_yielded(iterable):
    try:
        for yielded in iterable:
            yield yielded
    finally:
        iterable.close()

def gen_yield_from(iterable):
    try:
        yield from iterable
    finally:
        iterable.close()

def drive(g, first_value_only=False, exc=None):
    iterable = Response([1, 2, 3], exc=exc)
    g = g(iterable)
    try:
        for item in g:
            if first_value_only:
                break
    except Exception:
        return "exception"
    finally:
        g.close()
    return item

print("gen_yielded complete:")
print(drive(gen_yielded), "\n")
print("gen_yield_from complete:")
print(drive(gen_yield_from), "\n")
print("gen_yielded first_only:")
print(drive(gen_yielded, first_value_only=True), "\n")
print("gen_yield_from first_only:")
print(drive(gen_yield_from, first_value_only=True), "\n")
print("gen_yielded exc:")
print(drive(gen_yielded, exc=ValueError), "\n")
print("gen_yield_from exc:")
print(drive(gen_yield_from, exc=ValueError), "\n")
```
Result:

```
gen_yielded complete:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>*****
3

gen_yield_from complete:
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>*****
3

gen_yielded first_only:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>*****
1

gen_yield_from first_only:
***** close called from <frame at 0x10ac82250, file 'wsgi.py', line 48, code drive>*****
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>*****
1

gen_yielded exc:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>*****
exception

gen_yield_from exc:
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>*****
exception
```
Co-Authored-By: Reiley Yang <reyang@microsoft.com>
LGTM after fixing the lint issues.
Even though I'd like to see the …

Actually, I noticed one additional (probably minor) issue: By making …
Yes, this is the case, but the timing is still correct minus the time it takes between call and iterate (which should encompass only the generator allocation plus anything between these two calls in the server). The alternative is to allocate a custom response iterable and return that object:

```python
def __call__(self, environ, start_response):
    span = tracer.create_span(...)
    span.start()
    iterable = self.wsgi(environ, start_response)
    return Response(span, iterable)
```

I figured the performance hit for the allocation of the …
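The `Response` wrapper referred to here might be sketched as follows. This is a hypothetical illustration, not the PR's code; `span.end()` stands in for whatever the tracer API actually provides.

```python
class Response:
    """Iterable wrapper around a WSGI result: delegates iteration,
    propagates close() to the underlying iterable per PEP 3333, and
    ends the span when the server closes the response."""

    def __init__(self, span, iterable):
        self.span = span
        self.iterable = iterable
        self.iterator = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.iterator)

    def close(self):
        try:
            close = getattr(self.iterable, "close", None)
            if close is not None:
                close()
        finally:
            self.span.end()
```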
Force-pushed 445a794 to 6f007ed.
Force-pushed 6f007ed to 637adfe.
Thanks for the reviews everybody! 🎉 I'm going to merge this in for now. We can always follow up if changes are required!
…open-telemetry#84)
* Setup `platform` folder and add browser tests
* Add @types/webpack-env to fix TS compile error
* Add CHROME_BIN environment variable for CircleCI
* Use the Node 12 with browsers CircleCI image
This PR implements a WSGI middleware (#81) which invokes the OpenTelemetry tracer APIs to time the request/response cycle.

This PR currently uses the `trace_api.tracer()` and `tracer.start_span()` APIs, but the intent is to use the `set_attribute` APIs as well to add dimensions to the generated span.
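The overall shape of the middleware under discussion can be sketched with a plain timer standing in for the tracer APIs. Everything here is illustrative (class name, `record` callback), not the PR's final code; note that, because the middleware is a generator, the wrapped app is not called until the server first iterates the result, which is exactly the deferral issue raised in the review.

```python
import time

class TimingMiddleware:
    """Generator-based WSGI middleware that times the full
    request/response cycle, including iteration of the response body."""

    def __init__(self, wsgi, record):
        self.wsgi = wsgi
        self.record = record  # callback receiving the elapsed seconds

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            for chunk in self.wsgi(environ, start_response):
                yield chunk
        finally:
            self.record(time.monotonic() - start)
```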