feat: ASGI support #1573

Merged
merged 32 commits into from
Jan 20, 2020

Member

@kgriffs kgriffs commented Oct 5, 2019

This patch adds the much-anticipated async support to Falcon by way of a
new ASGI interface, additional testing helpers, and updated internals.

Note that only the HTTP ASGI interface is implemented. WebSocket support
is planned to follow.

Docs will be fully fleshed-out in a follow-up PR.

Changelog snippets will also be added in a follow-up PR.

In order to reconcile differences between the WSGI and ASGI interfaces,
several breaking changes were made in this patch, as follows:

BREAKING CHANGE: create_environ no longer sets a default user agent header

BREAKING CHANGE: Renamed the protocol kwarg for create_environ() to
http_version. The renamed kwarg takes only the version string
(no longer prefixed with "HTTP/").

BREAKING CHANGE: Renamed the app kwarg for create_environ() to root_path.
The old name is deprecated and may be removed in a future release.

BREAKING CHANGE: get_http_status() is deprecated and no longer accepts floats

BREAKING CHANGE: BoundedStream.writeable() changed to writable() per the
standard file-like I/O interface (the old name was a misspelling).

BREAKING CHANGE: api_helpers.prepare_middleware() no longer accepts a single
object; the value that is passed must be an iterable.

BREAKING CHANGE: Removed the outer "finally" block from API and App; add an
exception handler for the base Exception type if you need to deal with
unhandled exceptions.

BREAKING CHANGE: Request.access_route will now include the value of
the remote_addr property as the last element in the route, if not already present
in one of the headers that are checked.

BREAKING CHANGE: When the 'REMOTE_ADDR' field is not present in the WSGI
environ, Falcon will assume '127.0.0.1' for the value, rather than
simply returning None for Request.remote_addr.
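
The last two breaking changes can be illustrated with a small standalone sketch. It is not Falcon's implementation: the function name access_route_sketch is hypothetical, and only the X-Forwarded-For header is consulted here, which is a simplifying assumption about "the headers that are checked".

```python
def access_route_sketch(environ):
    """Return the client route for a WSGI environ (simplified sketch)."""
    # Assume loopback when REMOTE_ADDR is missing, per the change above.
    remote_addr = environ.get('REMOTE_ADDR', '127.0.0.1')

    # ASSUMPTION: only X-Forwarded-For is parsed in this sketch.
    forwarded = environ.get('HTTP_X_FORWARDED_FOR', '')
    route = [ip.strip() for ip in forwarded.split(',') if ip.strip()]

    # Append remote_addr as the last element unless a header already had it.
    if remote_addr not in route:
        route.append(remote_addr)

    return route
```

With an empty environ the route is just the assumed loopback address; with a forwarding header, the peer address is appended only when it is not already listed.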

@kgriffs kgriffs requested review from nZac, jmvrbanac and vytas7 October 5, 2019 00:12
@kgriffs kgriffs changed the title [WIP - DO NOT MERGE] feat: ASGI support feat: ASGI support Oct 5, 2019
@kgriffs
Member Author

kgriffs commented Oct 5, 2019

OK folks, this is ready for review!

@kgriffs kgriffs force-pushed the asgi-final branch 3 times, most recently from afb1d38 to 4a314fa Compare October 6, 2019 00:50
@kgriffs kgriffs mentioned this pull request Oct 6, 2019
@kgriffs
Member Author

kgriffs commented Oct 6, 2019

Implements #1358

Member

@vytas7 vytas7 left a comment

Great job, I haven't managed to review it in depth yet so just some quick things I reacted to.
Furthermore, I'll try to just fiddle with the whole thing with semi-real apps so I get a better "feel" of it, and can hopefully provide better feedback.

One general question regarding Cython: do we need to make them [Cython and ASGI] mutually exclusive?
I'm wondering whether we could just exclude ASGI files (basically anything involving the async or await keywords) from cythonization. If async code is calling "normal" Python functions, those can well be cythonized, can't they?

elif self.text is not None:
    block += 'data: ' + self.text + '\n'
elif self.json is not None:
    block += 'data: ' + json_dumps(self.json, ensure_ascii=False) + '\n'
Member

Composing a string this way from many items (local_var += '...something...' etc.) is optimized on CPython if you keep concatenating onto a local variable, but there is no guarantee it won't perform poorly on PyPy (unless you benchmarked it?)

See also here -- String concatenation is expensive, the recommended way is

parts = []
for x in mylist:
    parts.append(foo(x))
s = "".join(parts)

Member Author

IIRC PyPy actually optimizes string concatenation pretty well, at least for small numbers of concatenations. In the past when I've benchmarked this strategy, concatenation tended to be faster for at least up to ~5 concatenations (at the expense of a tiny bit of memory overhead), and it doesn't seem likely that we would go beyond that here. That being said, I will do some specific benchmarking with this method as-is vs. using join().

Member

@vytas7 vytas7 Nov 10, 2019

5 small string concatenations would definitely compare in favour of +=.
I was just thinking that we actually have more concatenations here: block += 'data: ' + self.text + '\n' is actually 3 concatenations (the first two compile to BINARY_ADD, while the final concat onto block is INPLACE_ADD), and so on.
But I wouldn't be surprised if it still outperforms join() (at least on CPython).
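
The benchmark both reviewers mention could be set up roughly as follows. This is only a sketch: the two serializer functions are hypothetical stand-ins for the SSE serialization code under review, and the timings depend entirely on the interpreter, so no expected numbers are given.

```python
import json
import timeit


def serialize_concat(event, data, event_id):
    """Build an SSE block with += on a local variable."""
    block = ''
    if event is not None:
        block += 'event: ' + event + '\n'
    if event_id is not None:
        block += 'id: ' + event_id + '\n'
    block += 'data: ' + json.dumps(data, ensure_ascii=False) + '\n'
    return block + '\n'


def serialize_join(event, data, event_id):
    """Build the same block by collecting parts and joining once."""
    parts = []
    if event is not None:
        parts.extend(('event: ', event, '\n'))
    if event_id is not None:
        parts.extend(('id: ', event_id, '\n'))
    parts.extend(('data: ', json.dumps(data, ensure_ascii=False), '\n', '\n'))
    return ''.join(parts)


if __name__ == '__main__':
    args = ('update', {'n': 1}, '42')
    for func in (serialize_concat, serialize_join):
        elapsed = timeit.timeit(lambda: func(*args), number=100000)
        print(func.__name__, round(elapsed, 3))
```

Running the script under both CPython and PyPy would show whether the handful of concatenations per event matters in practice.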

falcon/routing/compiled.py (outdated, resolved)
falcon/testing/client.py (outdated, resolved)
@@ -305,3 +311,89 @@ def get_http_status(status_code, default_reason='Unknown'):
    except AttributeError:
        # not found
        return str(code) + ' ' + default_reason


@functools.lru_cache(maxsize=64)
Member

@vytas7 vytas7 Nov 10, 2019

As I've commented on #1135 (comment), functools.lru_cache had thread-safety issues which were fixed after the CPython 3.6 release.
Maybe the issue is less serious here, with async being mixed and matched with threads less often and us mandating Python 3.6+, but I still thought it might be worth bringing attention to the issue. Maybe we should check which CPython 3.5.x and 3.6.x versions are susceptible to the issue, and make a decision based on the findings. And check PyPy 3.6 if we are going to support ASGI on it.

Member Author

Good catch. I think we will have to do something about this.

Member Author

@kgriffs kgriffs Dec 17, 2019

I opted to just use a no-op for older Python versions. The performance decrease probably won't be noticeable for most people, and over time those platforms will be used less and less.

@vytas7
Member

vytas7 commented Nov 11, 2019

I started the whole thing (testing in progress), issues found so far:

Serious issues:
None so far!

Cosmetic issues/desired improvements etc:

  • Trying to use a WSGI resource with asgi.App results in a fatal warning explaining what went wrong. The opposite case, however, happily proceeds until a request is handled, and then the results are unpredictable (calling such a "responder" produces a coroutine object, which is basically a no-op in the sense of request handling, but then strange warnings are produced).
  • Could we consider streamlining the implementation of custom media handlers somewhat? Right now, serialize and deserialize are defined as abstract methods, so one needs to take care of them even if one is only interested in the async variants for an ASGI app.
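
The first point concerns detecting a mismatch between responder type and app type up front. A minimal sketch of such a check for the ASGI side is shown below; the helper name check_responders and its default method list are hypothetical, and the approach relies on inspect.iscoroutinefunction(), which this thread later notes is unreliable for cythonized functions.

```python
import inspect


def check_responders(resource, methods=('get', 'post', 'put', 'patch', 'delete')):
    """Raise TypeError if any on_* responder is not a coroutine function."""
    for method in methods:
        responder = getattr(resource, 'on_' + method, None)
        if responder is None:
            continue

        # Fail at route-registration time rather than on the first request.
        if not inspect.iscoroutinefunction(responder):
            raise TypeError(
                'on_' + method + ' must be declared with "async def" '
                'to be used with an ASGI app'
            )
```

The mirror-image check for a WSGI app would reject responders for which the same test returns True.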

"""

data = await stream.read()
return self.deserialize(io.BytesIO(data), content_type, content_length)
Member

Would it make sense to override content_length here ➡️ len(data)?

Member

@vytas7 vytas7 Jan 4, 2020

To explain a bit more how I was reasoning here:
Although we'll be changing the media API to not always mandate the content_length, this might be helpful to legacy handlers that may be expecting an accurate content_length.

OTOH multipart form part media handling has the same requirement even on the WSGI side, so maybe there is no point in shimming this way if we cannot consistently guarantee it anyway.

Member Author

Can't hurt. I went ahead and made the change.
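
The agreed change can be sketched as follows. The JSONHandlerSketch and FakeAsyncStream classes are hypothetical stand-ins written for this example, not Falcon's media handler classes; only the idea of passing len(data) to the synchronous deserialize() comes from the discussion.

```python
import asyncio
import io
import json


class JSONHandlerSketch:
    """Hypothetical handler whose sync method trusts content_length."""

    def deserialize(self, stream, content_type, content_length):
        return json.loads(stream.read(content_length).decode('utf-8'))

    async def deserialize_async(self, stream, content_type, content_length):
        data = await stream.read()
        # Pass the actual number of bytes read, not the declared length,
        # so a legacy deserialize() sees a value it can rely on.
        return self.deserialize(io.BytesIO(data), content_type, len(data))


class FakeAsyncStream:
    """Minimal async stream used only to exercise the sketch."""

    def __init__(self, data):
        self._data = data

    async def read(self):
        return self._data
```

Even if the declared content_length is missing or wrong, the synchronous method then receives the size of the buffered body.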

@kgriffs
Member Author

kgriffs commented Dec 11, 2019

I've rebased this on master (non-trivial!) and made a few tweaks. I still need to take care of the following:

  • Address the remainder of @vytas7's comments
  • Add an async_to_sync() helper
  • Require the use of awaitables for responders, middleware, and error handlers
  • Make sure that recent changes on the WSGI side are all ported over as necessary
  • Add all the docstrings (with full docs probably waiting for a followup PR)
  • Add towncrier fragments.
  • Resolve cython+asgi issue
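
For the async_to_sync() item in the list above, one possible shape of such a helper is sketched here. Only the name comes from the list; the signature and the use of a private event loop are assumptions, not the implementation that landed.

```python
import asyncio


def async_to_sync(coroutine_func, *args, **kwargs):
    """Run a coroutine function to completion and return its result.

    ASSUMPTION: a fresh event loop is created per call, so this must not
    be invoked from code that is already running inside an event loop.
    """
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coroutine_func(*args, **kwargs))
    finally:
        loop.close()
```

A helper like this lets synchronous test code or WSGI-side utilities call into coroutine functions without managing a loop themselves.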

import falcon.testing as testing

_SERVER_HOST = '127.0.0.1'
_SERVER_PORT = 9000 + os.getpid() % 100 # Facilitates parallel test execution
Member

@vytas7 vytas7 Dec 14, 2019

Note that while OS PIDs are normally assigned sequentially on Unix-like systems, the allocator needs to skip PIDs already in use by other programs. Furthermore, allocation restarts at some point (often at PID 300) to recycle IDs. The approach presented here is probably good enough for practical use, but could it be worth adding a disclaimer note?

Member Author

I went ahead and just switched to using a simple RNG with a fairly large range. That should be sufficient and be easier to reason about.

@kgriffs kgriffs force-pushed the asgi-final branch 6 times, most recently from a45ae55 to 2331955 Compare December 18, 2019 01:42
@kgriffs kgriffs force-pushed the asgi-final branch 5 times, most recently from 374b655 to c7b5cf9 Compare January 4, 2020 22:33
@falconry falconry deleted a comment from codecov bot Jan 4, 2020
@kgriffs
Member Author

kgriffs commented Jan 4, 2020

@vytas7 OK, I think I've addressed all your feedback so far.

@kgriffs
Member Author

kgriffs commented Jan 4, 2020

I'm going to start working on docstrings and towncrier fragments. Everyone please let me know if you see any other implementation/design issues.

@codecov

codecov bot commented Jan 5, 2020

Codecov Report

Merging #1573 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #1573   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          45      45           
  Lines        3102    3102           
  Branches      479     479           
======================================
  Hits         3102    3102


vytas7 previously approved these changes Jan 5, 2020
Member

@vytas7 vytas7 left a comment

LGTM for the alpha 🎉

One area that may need improvement is recipes for testing SSE events in a test suite employing simulated requests, but I don't perceive this as a blocker for the alpha. I also need to experiment more with SSE to provide meaningful suggestions.
To be more specific, it is quite common for SSE events to be emitted due to server state changes caused by other requests. Is it possible to simulate parallel ASGI requests towards an asgi.App? I.e., can I keep the simulated SSE response stream open while simulating PATCH, POST, DELETE, etc., and validate that I'm receiving the corresponding SSE messages?

-def cors_client():
-    app = falcon.App(cors_enable=True)
+def cors_client(asgi):
+    app = create_app(asgi, cors_enable=True)

I get CompatibilityError when using cors_enable=True with asgi App.

Member Author

Good catch!

Member Author

fixed

@kgriffs
Member Author

kgriffs commented Jan 8, 2020

@vytas7 re SSE testing that may be worth creating a standalone issue so we don't forget about it?

@vytas7
Member

vytas7 commented Jan 8, 2020

> @vytas7 re SSE testing that may be worth creating a standalone issue so we don't forget about it?

Created #1634.

Member

@vytas7 vytas7 left a comment

👍

@kgriffs
Member Author

kgriffs commented Jan 8, 2020

Due to the cython bug re iscoroutinefunction(), we may have to add some warnings to the docs about cythonizing ASGI apps, since that would hit the same issues we've been experiencing in the framework itself.

In some cases we could simply not do the check (when it is just to highlight a developer mistake), but in other cases it is less clear how we would avoid using iscoroutinefunction() (e.g., jsonschema.validate, hooks) without creating explicitly named functions (e.g., validate_async()). I suppose we could do that, but it puts more burden on the developer to remember to use the right function name.

See also:

@vytas7
Member

vytas7 commented Jan 8, 2020

Yeah, I guess we'll just have to be transparent in the docs and make sure the warnings are easily discoverable.

OTOH we could make everything explicit, and always expect coroutines where applicable, i.e.

  • Do not perform developer mistake checks
  • Do jsonschema the validate_async() way
  • Hooks could be applied with falcon.asgi.after / falcon.asgi.before instead of falcon.after / falcon.before

Going down this path would make porting clumsier and mistakes easier to make due to the lack of checks, though...

FWIW I was reading more on the subject, and although Cython maintainers state on cython/cython#2273 they would accept a PR exposing a faux _is_coroutine attribute, I'm not sure if this is a sustainable solution given the whole asyncio.coroutine thing is deprecated and slated for removal in Python 3.10.

@kgriffs
Member Author

kgriffs commented Jan 9, 2020

> I'm not sure if this is a sustainable solution given the whole asyncio.coroutine thing is deprecated and slated for removal in Python 3.10.

Agreed. It seems that the proposed solution will only work for asyncio.iscoroutinefunction() and not for inspect.iscoroutinefunction(), with the former possibly going away in Python 3.10. The alternative is setting co_flags bits, but apparently that will not work in Python 3.8+ because of the new inspect._has_code_flag() function that ignores flags when the function is not a Python function (but a Cython function). So then you would have to check it directly à la hug, but that isn't a general enough solution, so I can understand why scoder is reluctant to go there...

...hence the proposal to try and create an attribute-based protocol for it in core python. If a PR does land that makes asyncio.iscoroutinefunction() work, I suppose we could use that in the interim while we wait on https://bugs.python.org/issue38225 .

Regardless, I think it does make sense to provide an escape hatch for anyone using cythonized async functions in their app. I could provide alternative functions that just assume the thing is a coroutine. I would then sprinkle the docs with warnings re the issue and the workaround (using the alternative function names). I could even catch any TypeErrors that are raised while attempting to await the result of calling a non-coroutine function, check for cython in the environment, and finally raise an error with a message that talks about the workaround.
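
The escape hatch could be sketched along these lines. Note one deliberate swap: instead of catching TypeError around the await, which would also trap TypeErrors raised inside the responder itself, this sketch inspects the object returned by the call. The names call_assuming_coroutine and ResponderNotAwaitableError are hypothetical, and the error message wording is only an example.

```python
import asyncio
import inspect


class ResponderNotAwaitableError(TypeError):
    """Raised when a callable assumed to be a coroutine function is not."""


async def call_assuming_coroutine(func, *args, **kwargs):
    """Call func without an iscoroutinefunction() check, then await it."""
    result = func(*args, **kwargs)

    # Check the returned object rather than the function, since the
    # function-level check is what fails for cythonized code.
    if not inspect.isawaitable(result):
        raise ResponderNotAwaitableError(
            repr(func) + ' did not return an awaitable; it must be a '
            'coroutine function (declared with "async def")'
        )

    return await result
```

A caveat of this approach is that a synchronous function has already run by the time the mistake is detected, so any side effects it has cannot be undone.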

If we are able to come to a rough consensus on this, I can tackle it following 3.0.0a1.

@nZac nZac requested a review from vytas7 January 17, 2020 01:46
# hasn't cleaned up yet.
# NOTE(kgriffs): Use our own Random instance because we don't want
# pytest messing with the seed.
server_port = _random.randint(50000, 60000)
Member

@vytas7 vytas7 Jan 20, 2020

We'll probably need to make this more robust in the future, because it may randomly clash with other stuff on the system; although the probability is low, it may clash between tests as well.

As evidenced in CI (I have now rerun the job in question):

[Server process start succeeded]
---------------------------- Captured stderr setup -----------------------------
2020-01-20 08:12:03,049 INFO     Starting server at tcp:port=53348:interface=127.0.0.1
2020-01-20 08:12:03,049 INFO     HTTP/2 support not enabled (install the http2 and tls Twisted extras)
2020-01-20 08:12:03,050 INFO     Configuring endpoint tcp:port=53348:interface=127.0.0.1
2020-01-20 08:12:03,050 CRITICAL Listen failure: Couldn't listen on 127.0.0.1:53348: [Errno 98] Address already in use.
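
One common way to make port selection more robust is to let the operating system choose a free port by binding to port 0. The sketch below shows that technique; it is not what the patch does, and the helper name find_free_port is hypothetical.

```python
import socket


def find_free_port(host='127.0.0.1'):
    """Ask the OS for a currently unused TCP port on the given interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any available port.
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

There is still a small race window between closing this socket and the test server binding to the returned port, so a retry on "Address already in use" may be worth keeping as well.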
