
Implement WSGI middleware integration. #84

Merged Aug 19, 2019 (38 commits)

Conversation

@a-feld (Member) commented Aug 12, 2019

This PR implements a WSGI middleware (#81) which invokes the OpenTelemetry tracer APIs to time the request/response cycle.

This PR currently uses the trace_api.tracer() and tracer.start_span() APIs; the intent is to also use the set_attribute APIs to add dimensions to the generated span.
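The request/response timing the description mentions can be sketched roughly as follows. This is a hypothetical illustration, not the PR's code: the class name TimingMiddleware is invented, and the tracer is assumed to expose start_span() as a context manager.

```python
# Hypothetical sketch of timing the WSGI request/response cycle with a tracer.
# TimingMiddleware is an invented name; the tracer is assumed to expose
# start_span() as a context manager that records the enclosed duration.

class TimingMiddleware:
    def __init__(self, wsgi, tracer):
        self.wsgi = wsgi      # the wrapped WSGI application
        self.tracer = tracer  # assumed to provide start_span()

    def __call__(self, environ, start_response):
        # Everything inside the with-block, including response iteration,
        # is attributed to the span.
        with self.tracer.start_span(environ.get("PATH_INFO", "/")):
            for chunk in self.wsgi(environ, start_response):
                yield chunk
```

Note that because `__call__` is a generator here, the span only starts once the server begins iterating; that trade-off comes up later in this thread.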

@a-feld a-feld force-pushed the wsgi-integration branch 2 times, most recently from bbd118c to 5938ea6 Compare August 12, 2019 23:46
@c24t (Member) commented Aug 13, 2019

@a-feld do you want reviews on this yet, or should we wait until it's out of draft?

@a-feld (Member, Author) commented Aug 13, 2019

@c24t Comments are always welcome! 😄 I was waiting for #83 to be merged before updating the PR with set_attribute calls.

@reyang (Member) commented Aug 14, 2019

Regarding the package naming convention, package structure, folder structure, README.rst, setup.py, setup.cfg and tox, please take a look at https://github.com/census-instrumentation/opencensus-python/tree/master/contrib.

@reyang (Member) commented Aug 14, 2019

@a-feld Just saw your comment that you expect early feedback. Tip: you can let us know the PR is ready for review by removing the draft status.
Any PR that has "WIP" or "Work in Progress" in the title, or is in draft status, is treated as under construction, and I personally wouldn't comment on it until it's ready.
If the PR is ready for comments but pending on another PR before it can be merged, we can tag it as do-not-merge.

(Resolved review thread on integrations/opentelemetry-wsgi/.flake8)
        self.span = None

    def _add_request_attributes(self, environ):
        self.span.add_attribute("http.method", environ["HTTP_METHOD"])
Member:

We should provide constants for these well-known attributes (like http.method).


        # TODO: change start_response to self._add_response_attributes after
        # set_attribute API is implemented
        for yielded in self.wsgi(environ, start_response):
Member:

We should not add our own loop here. For example, if the underlying app uses https://www.python.org/dev/peps/pep-3333/#optional-platform-specific-file-handling, we destroy any optimizations done by that. Maybe more relevant, some servers could optimize if an object with known len is returned. I think yield from would have the same problem but I'm not sure. We could try to instrument/add the close method of the result (though that might fail if a C object is returned that does not support setting that attribute).
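The "instrument the close method of the result" alternative mentioned above could be sketched like this. The helper name timed_call and the span's start()/end() methods are assumed names for illustration, not the PR's API:

```python
# Sketch of returning the app's result unchanged (preserving file_wrapper
# and len() optimizations) and wrapping only its close() method.
# `timed_call` and the span's start()/end() methods are hypothetical names.

def timed_call(wsgi, environ, start_response, span):
    span.start()
    result = wsgi(environ, start_response)
    original_close = getattr(result, "close", None)

    def close():
        try:
            if original_close is not None:
                original_close()
        finally:
            span.end()

    try:
        # May raise for C objects or objects with __slots__; a real
        # implementation would need a fallback so the span still ends.
        result.close = close
    except AttributeError:
        pass
    return result
```

The downside, as noted above, is that attribute assignment can fail on some result objects, in which case the span would never be ended without extra handling.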

@a-feld (Member, Author):

This is definitely an interesting problem that I haven't considered in depth yet. From reading the specification I believe the optimization you're referring to is this part of the spec?

The callable must return an iterable object, and must not perform any data transmission until and unless the server/gateway actually receives the iterable as a return value from the application. (To do otherwise would prevent middleware from being able to interpret or override the response data.)

From a functional perspective, I believe the behavior is correct since the loop is iterating over the wsgi.file_wrapper iterable.

From a timing perspective, I wonder what the response time should be recorded as in this case? Should it include the time the server spends streaming the file? For any other request, the answer would generally be yes.

Performance-wise, the intent was to avoid allocating a new response-like iterable object and instead pass the values through transparently, taking advantage of the context manager behavior for proper timing. In order to maintain the pass-through behavior, the application would have to be invoked and a response wrapper object that was iterable would have to be created.

def __call__(self, environ, start_response):
    iterable = self.wsgi(environ, start_response)
    return ResponseTimer(iterable)  # allocation of the ResponseTimer object

@a-feld (Member, Author) Aug 14, 2019:

Maybe more relevant, some servers could optimize if an object with known len is returned.

I'll have to think more about this. If there are examples of these types of optimizations I'd love to have a look!

@a-feld (Member, Author) Aug 14, 2019:

Also, as far as I can tell, the only way to preserve the wsgi.file_wrapper optimization is to add an isinstance check for wsgi.file_wrapper. That approach implies we cannot time the full response cycle in which the file contents are streamed.

So the trade-off is performance versus accuracy of the response time values.
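The isinstance approach described here could look roughly like the following. The helper name respond is hypothetical, and a real middleware would also manage the span around this logic:

```python
# Sketch of the isinstance check: pass wsgi.file_wrapper results through
# untouched (preserving the server's file-streaming optimization, but losing
# the streaming time from the span), and iterate everything else.
# `respond` is a hypothetical helper name.

def respond(wsgi, environ, start_response):
    result = wsgi(environ, start_response)
    file_wrapper = environ.get("wsgi.file_wrapper")
    if file_wrapper is not None and isinstance(result, file_wrapper):
        # The server streams the file itself; the span would end here,
        # before streaming completes.
        return result
    return iter(result)
```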

        if url is not None:
            self.span.add_attribute("http.url", url)

    def _add_response_attributes(self, status, response_headers, *args):
Member:

For later, we should consider processing the exception information in args. Maybe add a # TODO for now.

@Oberon00 (Member) commented:

Regarding the package naming convention, package structure, folder structure, README.rst, setup.py, setup.cfg and tox, please take a look at https://github.com/census-instrumentation/opencensus-python/tree/master/contrib.

-- (@reyang in #84 (comment))

For example, this means no per-package tox.ini, but adding the package to the root tox.ini?

@reyang (Member) commented Aug 14, 2019

For example, this means no per-package tox.ini, but adding the package to the root tox.ini?

Exactly. Personally I would highly recommend this way because:

  1. It makes CI much easier (especially if the CI involves installing multiple packages across the repo).
  2. It enforces all the packages to follow the same set of rules/tools (e.g. lint, doc, coverage).
  3. Maintenance cost is lower.

Later, if we decide to move a package to a git submodule, we can easily copy the tox.ini file with small changes.
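A root tox.ini in the style described above might look like this fragment. The env and path names here are illustrative, not taken from this repo:

```ini
# Illustrative only: a single root tox.ini drives every package in the repo,
# so lint/doc/coverage rules are shared across packages.
[tox]
envlist = py37-{api,wsgi}, lint

[testenv]
deps = pytest
changedir =
    wsgi: integrations/opentelemetry-wsgi
commands = pytest
```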

@a-feld a-feld force-pushed the wsgi-integration branch 3 times, most recently from 8380e96 to 71a7c6e Compare August 14, 2019 18:16
@a-feld a-feld marked this pull request as ready for review August 14, 2019 18:18
@a-feld a-feld force-pushed the wsgi-integration branch 5 times, most recently from bc6fed8 to 9b27c21 Compare August 14, 2019 21:43
        start_response = self._create_start_response(span, start_response)

        for yielded in self.wsgi(environ, start_response):
            yield yielded
@reyang (Member) Aug 16, 2019:

Interesting, first time I've seen yield yielded :)

Member:

I still vote to at least use yield from.

@a-feld (Member, Author) Aug 16, 2019:

yield from is equivalent in this circumstance, since WSGI applications are not coroutines. Unfortunately, yield from doesn't solve the issue of the WSGI application becoming a generator / deferring the call to the WSGI callable 😞

@a-feld (Member, Author) Aug 16, 2019:

I've been putting some thought into this and I think yield from would actually be incorrect in this case. While the call to self.wsgi returns an iterable, that iterable is not guaranteed to be a generator so delegating send and close calls to the result of self.wsgi would be incorrect.

Note that PEP 380 defines yield from as a way for a generator to delegate part of its operations to another generator.

Since self.wsgi does not always return a generator, using yield from here would be incorrect.

@Oberon00 (Member) Aug 19, 2019:

yield from supports, but does not require, the iterable to implement close, throw, or send (it is defined as yield from <iterable>, not yield from <generator>, even though generators seem to have been the main motivation). But the WSGI spec explicitly calls out close and generators; see https://www.python.org/dev/peps/pep-3333/#specification-details. So delegating close (if it exists) would actually be beneficial:

If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an application error during iteration or an early disconnect of the browser. (The close() method requirement is to support resource release by the application. This protocol is intended to complement PEP 342's generator support, and other common iterables with close() methods.)

What is more, yield from seems to be a tiny bit faster than a manual loop:

In [1]: def generate(xs): yield from xs

In [4]: def generate2(xs):
   ...:     for x in xs:
   ...:         yield x
   ...:

In [6]: xs = [b'<html>'] * 1024

In [18]: %timeit for _ in generate(xs): pass
23.5 µs ± 374 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [19]: %timeit for _ in generate2(xs): pass
30.2 µs ± 729 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: xs = xs[:1]

In [22]: %timeit for _ in generate(xs): pass
196 ns ± 1.19 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [23]: %timeit for _ in generate2(xs): pass
202 ns ± 4.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

@a-feld (Member, Author):

The close call is definitely an omission on my part, great catch! I'll fix this before merging! 😄

For yield from, the performance numbers are pretty compelling, but the behavior does change between yield and yield from: we don't want to delegate close calls to an underlying iterable that's not a generator. I think the problem here is the "re-use" of the close name in the WSGI sense versus the generator sense. Here's an example where the close behavior differs:

import sys


class Response:
    def __init__(self, v, exc=None):
        self.iterable = iter(v)
        self.exc = exc

    def __next__(self):
        if self.exc:
            raise self.exc
        return next(self.iterable)

    def __iter__(self):
        return self

    def close(self):
        caller = sys._getframe(1)
        print("***** close called from {}***** ".format(str(caller)))


def gen_yielded(iterable):
    try:
        for yielded in iterable:
            yield yielded
    finally:
        iterable.close()


def gen_yield_from(iterable):
    try:
        yield from iterable
    finally:
        iterable.close()


def drive(g, first_value_only=False, exc=None):
    iterable = Response([1, 2, 3], exc=exc)
    g = g(iterable)

    try:
        for item in g:
            if first_value_only:
                break
    except Exception:
        return "exception"
    finally:
        g.close()

    return item


print("gen_yielded complete:")
print(drive(gen_yielded), "\n")
print("gen_yield_from complete:")
print(drive(gen_yield_from), "\n")

print("gen_yielded first_only:")
print(drive(gen_yielded, first_value_only=True), "\n")
print("gen_yield_from first_only:")
print(drive(gen_yield_from, first_value_only=True), "\n")

print("gen_yielded exc:")
print(drive(gen_yielded, exc=ValueError), "\n")
print("gen_yield_from exc:")
print(drive(gen_yield_from, exc=ValueError), "\n")

Result:

gen_yielded complete:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>***** 
3 

gen_yield_from complete:
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>***** 
3 

gen_yielded first_only:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>***** 
1 

gen_yield_from first_only:
***** close called from <frame at 0x10ac82250, file 'wsgi.py', line 48, code drive>***** 
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>***** 
1 

gen_yielded exc:
***** close called from <frame at 0x10ab915f0, file 'wsgi.py', line 27, code gen_yielded>***** 
exception 

gen_yield_from exc:
***** close called from <frame at 0x10ac31bf0, file 'wsgi.py', line 34, code gen_yield_from>***** 
exception 

(Resolved review threads on tox.ini)
@reyang (Member) left a comment:

LGTM after fixing the lint issues.

@c24t (Member) left a comment:

No more blocking comments from me. This PR looks great, and will be especially helpful as we write integration tests to prepare for the release.

Thanks for fielding all these comments @a-feld, and for forcing me to go read the WSGI PEPs.

Let's get @Oberon00's approval before merging.

@Oberon00 (Member) commented:

Even though I'd like to see the yield from discussion properly resolved, apart from that, this LGTM.

@Oberon00 (Member) commented Aug 19, 2019

Actually, I noticed one additional (probably minor) issue: by making __call__ a generator, we only start the span once the server starts iterating. Since we also delay all processing that the wrapped WSGI app might do directly in its __call__ to that point, the span's time should be more or less correct, though. (My LGTM still stands; if you want, this could be discussed in a separate issue.)

@a-feld (Member, Author) commented Aug 19, 2019

Actually, I noticed one additional (probably minor) issue: by making __call__ a generator, we only start the span once the server starts iterating. Since we also delay all processing that the wrapped WSGI app might do directly in its __call__ to that point, the span's time should be more or less correct, though. (My LGTM still stands; if you want, this could be discussed in a separate issue.)

Yes, this is the case, but the timing is still correct minus the time between the call and the start of iteration (which should encompass only the generator allocation plus anything the server does between those two calls).

The alternative is to allocate a custom response iterable and return that object:

def __call__(self, environ, start_response):
    span = tracer.create_span(...)
    span.start()
    iterable = self.wsgi(environ, start_response)
    return Response(span, iterable)

I figured the performance hit for the allocation of the Response object wasn't worth it, but it is an option 😄
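For illustration, that hypothetical Response wrapper might look like the following sketch. The span's end() method is an assumed name, and this class is invented here, not taken from the PR:

```python
# Sketch of the hypothetical Response wrapper: the span is started eagerly
# in __call__, and ended when the server finishes with the response.
# The span's end() method is an assumed name.

class Response:
    def __init__(self, span, iterable):
        self.span = span          # already started by the middleware
        self.iterable = iterable  # result of the wrapped WSGI app

    def __iter__(self):
        return iter(self.iterable)

    def close(self):
        # Forward close() per PEP 3333, then end the span regardless.
        try:
            close = getattr(self.iterable, "close", None)
            if close is not None:
                close()
        finally:
            self.span.end()
```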

@a-feld (Member, Author) commented Aug 19, 2019

Thanks for the reviews everybody! 🎉 I'm going to merge this in for now. We can always follow up if changes are required!

@a-feld a-feld merged commit 17afba5 into open-telemetry:master Aug 19, 2019
@a-feld a-feld deleted the wsgi-integration branch August 19, 2019 18:10
@Oberon00 Oberon00 mentioned this pull request Sep 18, 2019
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020