HTTP interface updates (part 2) #131

jonemo · 2023-02-04T00:49:03Z

Updates the HTTP client related interfaces and implementations to match the most recent spec.

A few high level notes for orientation:

fields & fieldlists interfaces and implementation #122 was a precurser PR covering only HTTP headers/fields. This PR starts using the classes added in fields & fieldlists interfaces and implementation #122.
Much of the line count of the diff is due to trivial renaming, for example request.url to request.destination.
The spec defines the request.body of requests and responses as an iterable of byte chunks. Because we currently focus on async clients, I chose to make this an AsyncIterable. This caused me a lot of head scratching and led to smithy_python.async_utils.async_list which can be used to turn a plain list into a value for request.body.
The sync awscrt client has been removed and everything in smithy_python._private.http.crt got consolidated accordingly. Updating crt.py to conform to the new interface is a big part of the diff.
Some scattered use of exceptions in the http module is now consolidated to use smithy_python.exceptions.SmithyHTTPException.
I removed any test code that builds on integration tests pytest fixture for running local http server #128 from this PR.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

python-packages/smithy-python/smithy_python/interfaces/http.py

dlm6693 · 2023-02-07T15:06:51Z

python-packages/smithy-python/smithy_python/interfaces/http.py

    """An asynchronous HTTP client interface."""

+    def __init__(self, *, client_config: HttpRequestConfiguration | None) -> None:


I don't think we need the return type for __init__ functions

We should just run a test that does AST crawling and validates these kinds of things for us so it can be automated. It doesn't matter which way we go, it should just be consistent.

@dlm6693 If this matters to you, please update the repo's config so that mypy doesn't flag it up as an error. Once that's working I will gladly drop all the return type annotations for __init__s. Currently, if I remove these everywhere, I get this:

➜ smithy-python ✗ ./pants check :: 11:38:14.74 [ERROR] Completed: Typecheck using MyPy - mypy failed (exit code 1). python-packages/smithy-python/smithy_python/_private/http/crt.py:31: error: Function is missing a return type annotation python-packages/smithy-python/smithy_python/_private/http/crt.py:31: note: Use "-> None" if function does not return a value python-packages/smithy-python/smithy_python/_private/http/crt.py:41: error: Function is missing a return type annotation python-packages/smithy-python/smithy_python/_private/http/crt.py:41: note: Use "-> None" if function does not return a value python-packages/smithy-python/smithy_python/_private/http/crt.py:185: error: Call to untyped function "_AWSCRTEventLoop" in typed context python-packages/smithy-python/smithy_python/_private/http/crt.py:205: error: Call to untyped function "AwsCrtHttpResponse" in typed context Found 4 errors in 1 file (checked 30 source files)

Ah ok I'd seen a comment about this before. I don't really care one way or the other.

Previous discussion was here: #122 (comment)

There and here, I initially dropped it in response to the comment just to get hit with a mypy error and bringing it back.

python-packages/smithy-python/smithy_python/_private/http/crt.py

python-packages/smithy-python/smithy_python/_private/http/__init__.py

nateprewitt · 2023-02-07T17:55:10Z

python-packages/smithy-python/smithy_python/interfaces/http.py

+    first byte over an established, open connection before timing out.
+    """
+
+    force_http_2: bool = False


I'd bias against having booleans like this for configuration. It will likely be more scalable in the long term to have a single protocol enum that supports a subset like HTTP/1.1, HTTP/2, Auto at the start. That will enable us to integrate QUIC(H/3) without a new configuration option and having to worry about how they all interact.

What you are describing approximately matches the spec for HttpVersionPolicy in the "Connections and Connection Pooling" doc of the reference architecture specification. For this PR I have set the boundary of what's included at the "Smithy Requests and Responses" doc (and still ended up with this huge diff).

This HttpClientConfiguration only exists in this PR because I wanted to keep the client-level config concept that already existed in smithy_python._private.crt.AwsCrtHttpSessionConfig. It matches conceptually to the ConnectConfiguration in "Connections and Connection Pooling", and I would prefer if we defer the details of this to a future task where that doc gets implemented.

That's fine. If we're going to wait on doing the implementation properly, I'd almost prefer we keep this as an empty class then with a TODO comment rather than doing something we'd consider wrong.

nateprewitt · 2023-02-07T17:56:39Z

python-packages/smithy-python/smithy_python/interfaces/http.py

    """An asynchronous HTTP client interface."""

+    def __init__(self, *, client_config: HttpRequestConfiguration | None) -> None:


We should just run a test that does AST crawling and validates these kinds of things for us so it can be automated. It doesn't matter which way we go, it should just be consistent.

nateprewitt · 2023-02-07T17:58:44Z

python-packages/smithy-python/tests/integration/http_clients/conftest.py

+
+
+@pytest.fixture
+def aws_request() -> HTTPRequest:


We shouldn't have anything aws_* related in the smithy_python package. Let's try to keep naming/example domains generic unless we're in aws-smithy-python.

I'm happy to change the name if you think that's necessary.

Note that the only thing AWS specific about this is that the request gets made to an AWS owned domain. I think it's generally a good idea to use hosts that we trust to remain available and to not do weird things with request logs in tests.

I also proposed an alternative that would do away with the need to make external requests at all in #128.

Let's leave the URL and change the fixture to something like sample_request then. While I agree with the domain safety concern, I'd like to keep as many references out of the smithy-python client as possible. This is mostly to prevent muddying the barrier between both packages.

We've already had a couple PRs that struggled with that and I'd prefer to avoid any visual queues that might perpetuate the problems if possible.

Done in 518cd6a

nateprewitt · 2023-02-07T17:59:24Z

python-packages/smithy-python/tests/integration/http_clients/test_aiohttp.py

+    config = AwsCrtHttpClientConfig()
+    session = AioHttpClient(client_config=config)


🤔 Why are we passing a CRT config to an AioHttpClient?

Same note about decoupling this package from AWS.

Why are we passing a CRT config to an AioHttpClient?

That's a mistake. However, it highlights that there currently isn't much of a point in having different [HTTPClientName]Config classes for each http client. Maybe that will start being important in future, or maybe I should just consolidate this into a single HTTPClientConfig.

Same note about decoupling this package from AWS.

Is this a general statement about moving the AWS Common Runtime HTTP client from smithy-python to aws-smithy-python? I didn't consider that because the CRT http client was already here. But is the CRT's http client truly AWS specific? Or is it just another generic HTTP client that can exist next to aiohttp and any others we will implement?

Since writing the comment above I read the "Connections and Connection Pooling" doc again and got reminded that all the details of the various config objects are covered there. I wrote up a separate task to implement all these config objects.

nateprewitt · 2023-02-07T18:03:02Z

python-packages/smithy-python/tests/unit/test_http.py

 def test_uri_password_but_no_username() -> None:
    uri = URI(host="test.aws.dev", password="def")
    assert uri.password == "def"
-    # the password is ignore if no username is present
+    # the password is ignored if no username is present
    assert uri.netloc == "test.aws.dev"


Is this right? I don't think there's anything precluding this, I know Basic auth specifically allows for a password without an explicit username.

The reference architecture doc mentions RFC 3987 as the guideline for URI. Section 3.2.1 declares the concept of any part of the URI being a "password" deprecated:

Use of the format "user:password" in the userinfo field is deprecated.

While not explicitly prohibited, my interpretation of this sentence is that something other than the empty string needs to appear before the first (optional) colon character:

The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource.

Of course, that's the RFC and then there's reality. I erred towards staying close to the RFC since that's what the RA doc references.

So I know we've had this issue even recently in regards to proxies that still use userinfo for auth. Corporations consider it "safe" because it never leaves their internal firewall. There's also libraries that will strip this from the visible URL and encode it in the Authorization header.

If we don't support it, it should be all or nothing. This specific case of not allowing :password is an odd middle ground. I'm in favor of making it possible (but difficult to do wrong) rather than completely removing the functionality. I'll concede on that though if there's a strong opinion to the opposite.

… the Response implementation

JordonPhillips · 2023-02-07T13:58:12Z

python-packages/smithy-python/smithy_python/interfaces/http.py

+@dataclass(kw_only=True)
+class HttpClientConfiguration:
+    """Client-level HTTP configuration.
+
+    :param force_http_2: How long, in seconds, the client will attempt to read the
+    first byte over an established, open connection before timing out.
+    """
+
+    force_http_2: bool = False


The docs for this are wrong. But also, configuration should ideally be per-request. For libraries that have http version as session configuration (such as aiohttp), the implementation of HttpClient could just construct and cache the necessary underlying sessions as needed. That being said, do we even need this yet?

Fixed in d113c5e

You fixed the docs but didn't address the remainder of the comment.

python-packages/smithy-python/smithy_python/interfaces/http.py

python-packages/smithy-python/smithy_python/_private/http/__init__.py

JordonPhillips · 2023-02-07T15:12:35Z

python-packages/smithy-python/smithy_python/_private/http/__init__.py

 @dataclass
 class Endpoint(interfaces.http.Endpoint):
-    url: interfaces.http.URI
-    headers: interfaces.http.HeadersList = field(default_factory=list)
+    url: interfaces.URI


May as well fix this while we're updating all of this anyway

Suggested change

url: interfaces.URI

uri: interfaces.URI

Replaced "url" with "uri" everywhere (where it's possible) in 3db2f53

The reason this was originally specified as URL and not URI is because an endpoint in the context of HTTP is explicitly a location. It constrains the usage of the broader URI concept which is an arbitrary identifier.

We can move to uri here if we feel it's less confusing but we're leaking some context from outside the transport stack into the HTTP module.

We can move to uri here if we feel it's less confusing but we're leaking some context from outside the transport stack into the HTTP module.

In this context we could also have a discussion about what should be inside vs outside the HTTP module. I made a half-hearted attempt to sort things into the right place by moving interfaces.http.URI to interfaces.URI but then stopped to not make the diff even bigger. If I understand correctly, all of the following should be outside the HTTP module:

URI

Fields, Field, FieldPosition

Endpoint, EndpointParam, EndpointResolver

I can take care of (1) and (2) in this PR or in a follow-up PR if you want. (3) should probably be left for whoever implements the Endpoints component of the reference architecture, I assume there will be a separate Endpoints module.

Yeah, I think that all makes sense. The concepts of URI/Fields can definitely live externally from the HTTP module. Otherwise we'll be re-implementing them for things like WebSockets, MQTT, etc.

I'm also fine creating a rough shape for Endpoint right now and deferring proper integration until implementation starts.

I have the change to move URI and everything related to Fields out of the http modules. If I pushed this to the branch here it would increase the PR diff to 1,110 additions and 829 deletions. I don't think that's what we want, so I'll leave that for an immediate follow-on PR after this one is merged.

(Note to self: The commit with those changes is ff6a879.)

JordonPhillips · 2023-02-07T15:37:39Z

python-packages/smithy-python/smithy_python/_private/http/aiohttp_client.py

+            url=self._serialize_url_without_query(request.destination),
+            params=parse_qs(request.destination.query),
+            headers=headers_list,
+            data=await request.consume_body(),


You can't assume that the body is going to fit into memory.

aiohttp's documentation for this parameter is actually wrong because it says:

This can be a FormData object or anything that can be passed into FormData

But that would be an absurd api if true because it would mean that aiohttp only supports bodies with the content types multipart/form-data or application/x-www-form-urlencoded. If you look at the code that handles the body you can see it actually delegates to this registry thing that supports delegating to a bunch of different body handlers depending on the type passed in. Fortunately for us, one of those handlers takes an AsyncIterable, so we can just hand it off directly.

Suggested change

data=await request.consume_body(),

data=request.body,

Thank you @JordonPhillips for doing the detective work on this. Your suggestion is included in 3ae2bea

python-packages/smithy-python/smithy_python/_private/http/aiohttp_client.py

JordonPhillips · 2023-02-07T15:50:59Z

python-packages/smithy-python/smithy_python/async_utils.py

+async def async_list(lst: Iterable[_ListEl]) -> AsyncIterable[_ListEl]:
+    """Turn an Iterable into an AsyncIterable"""
+    for x in lst:
+        await sleep(0)


Why is this necessary?

I've actually found this function useful in writing tests. Its not exactly straightforward to write an async iterable from scratch.

Assuming that this question is specifically about the sleep: I am not certain it is necessary in any realistic case where this function is used. The general idea is that it makes sure that control is handed back to the event loop at some point while iterating.

python-packages/smithy-python/smithy_python/exceptions.py

# Conflicts: # codegen/smithy-python-codegen/src/main/java/software/amazon/smithy/python/codegen/HttpProtocolTestGenerator.java # codegen/smithy-python-codegen/src/main/java/software/amazon/smithy/python/codegen/integration/HttpBindingProtocolGenerator.java

dlm6693 · 2023-02-09T14:02:11Z

...ithy-python-codegen/src/main/java/software/amazon/smithy/python/codegen/ClientGenerator.java

-                        endpoint.url.host = context.transport_request.url.host + endpoint.url.host
-                        context._transport_request.url = endpoint.url
-                        context._transport_request.headers.extend(endpoint.headers)
+                        if not endpoint.uri.path:


If we're setting a falsy value here, I think this should check if its None instead.

dummy comment to request changes. accidentally hit approve button 🙃

We may be able to handle that in a future PR but since we're not functionally changing the behavior here, we should try to avoid additional scope creep.

designs/shapes.md

python-packages/smithy-python/smithy_python/_private/http/__init__.py

python-packages/smithy-python/smithy_python/_private/http/aiohttp_client.py

python-packages/smithy-python/smithy_python/_private/http/crt.py

dlm6693 · 2023-02-09T15:00:50Z

...ithy-python-codegen/src/main/java/software/amazon/smithy/python/codegen/ClientGenerator.java

-                        endpoint.url.host = context.transport_request.url.host + endpoint.url.host
-                        context._transport_request.url = endpoint.url
-                        context._transport_request.headers.extend(endpoint.headers)
+                        if not endpoint.uri.path:


dummy comment to request changes. accidentally hit approve button 🙃

This reverts commit 82a42a0.

dlm6693

A couple of questions and one minor formatting nit, but I think this is looking pretty good!

dlm6693 · 2023-02-14T18:35:23Z

python-packages/smithy-python/smithy_python/_private/http/aiohttp_client.py

+    def _serialize_uri_without_query(self, uri: interfaces.URI) -> str:
+        """Serialize all parts of the URI up to and including the path."""
+        components = (uri.scheme, uri.host, uri.path or "", "", "", "")
        return urlunparse(components)


Is there a particular advantage to doing it this way vs. constructing a new URI with no query and calling .build on it? Wondering if doing it that way may be 'better' but it probably doesn't matter.

dlm6693 · 2023-02-14T18:37:40Z

python-packages/smithy-python/smithy_python/_private/http/crt.py

+    def reason(self) -> str | None:
+        """Optional string provided by the server explaining the status."""
+        # TODO: See how CRT exposes reason.
+        return None


Apologies if this was explained before, but could you provide additional context on why this is being emptied out? Will this be a part of the follow-up PR you're working on?

What do you mean by "emptied out"? The interface prescribes having a reason field. My understanding is that this is the string that a server optionally returns in the same line as the numeric HTTP status code. The awscrt Python package doesn't return this reason string, at least not in an obvious place. I also don't see it getting dealt with in the awscrt C code. I left this TODO comment here to signal that this is something that maybe could be dug into. I did not make a followup task specifically for this.

Will this be a part of the follow-up PR you're working on?

The only followup PR I am currently committed to is the one mentioned in this comment. Apart from that, I am not currently assigned to any tasks in this repo.

python-packages/smithy-python/smithy_python/_private/http/crt.py

JordonPhillips

There's still one bit un-addressed but I'm fine deferring that to a follow-up.

jonemo requested review from JordonPhillips, nateprewitt and dlm6693 February 4, 2023 00:49

jonemo force-pushed the http-interface-update-2 branch 2 times, most recently from 0986894 to cb87f50 Compare February 7, 2023 07:24

dlm6693 suggested changes Feb 7, 2023

View reviewed changes

dlm6693 reviewed Feb 7, 2023

View reviewed changes

python-packages/smithy-python/smithy_python/_private/http/__init__.py Show resolved Hide resolved

nateprewitt reviewed Feb 7, 2023

View reviewed changes

jonemo added 6 commits February 7, 2023 16:51

requests & responses

40a2102

http client integration tests

bf962de

move HTTPException to exceptions.py

fe6253c

update usage of interfaces in Java code

f3544c2

correctly implement HTTPResponse in crt client

b406bc2

work around Python throwing AttributeErrors when explicitly declaring…

745204c

… the Response implementation

jonemo force-pushed the http-interface-update-2 branch from cb87f50 to fd2a1c2 Compare February 8, 2023 00:47

jonemo added 2 commits February 7, 2023 17:48

docstrings polishing

d113c5e

Fields.extend

4646525

jonemo force-pushed the http-interface-update-2 branch from fd2a1c2 to 264671b Compare February 8, 2023 00:49

update protocolutils to new http interfaces

76d57fb

jonemo force-pushed the http-interface-update-2 branch from 264671b to 76d57fb Compare February 8, 2023 01:06

JordonPhillips requested changes Feb 8, 2023

View reviewed changes

jonemo added 6 commits February 8, 2023 10:35

naming: HTTP* -> Http*

69e4a43

remove HttpResponse.done

f859f16

AioHttpClient fixes suggested by @JordonPhillips

3ae2bea

replace "url" with "uri" everywhere

3db2f53

get CI to green after merge from develop

a75555b

dlm6693 approved these changes Feb 9, 2023

View reviewed changes

dlm6693 self-requested a review February 9, 2023 14:59

dlm6693 suggested changes Feb 9, 2023

View reviewed changes

jonemo added 6 commits February 9, 2023 14:46

HTTPResponse.reason is optional

17e60f7

rename tuples_list_to_fields to tuples_to_fields

42b8da0

fix naming: AwsCrtHttpClientConfig -> AioHttpClientConfig

7ebe202

ConsumeBodyMixin

82a42a0

rename fixture aws_request --> sample_request

518cd6a

Revert "ConsumeBodyMixin"

636051e

This reverts commit 82a42a0.

jonemo requested review from dlm6693 and JordonPhillips February 14, 2023 16:49

dlm6693 suggested changes Feb 14, 2023

View reviewed changes

dlm6693 mentioned this pull request Feb 15, 2023

sigv4 implementation without using CRT #136

Closed

jonemo added 2 commits February 15, 2023 13:03

Revert: naming: HTTP* -> Http*

448bb4f

black

6c4844b

jonemo requested a review from dlm6693 February 15, 2023 20:09

dlm6693 approved these changes Feb 15, 2023

View reviewed changes

JordonPhillips approved these changes Feb 16, 2023

View reviewed changes

nateprewitt approved these changes Feb 16, 2023

View reviewed changes

jonemo merged commit 21a9e01 into develop Feb 16, 2023

jonemo deleted the http-interface-update-2 branch February 16, 2023 17:13

jonemo mentioned this pull request Feb 16, 2023

Move URI, Field, Fields outside of http module #138

Merged

		"""An asynchronous HTTP client interface."""

		def __init__(self, *, client_config: HttpRequestConfiguration \| None) -> None:

		config = AwsCrtHttpClientConfig()
		session = AioHttpClient(client_config=config)

HTTP interface updates (part 2) #131

HTTP interface updates (part 2) #131

Conversation

jonemo commented Feb 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonemo Feb 8, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JordonPhillips Feb 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonemo Feb 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dlm6693 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JordonPhillips left a comment

Choose a reason for hiding this comment

jonemo Feb 8, 2023 •

edited

Loading

JordonPhillips Feb 7, 2023 •

edited

Loading

jonemo Feb 10, 2023 •

edited

Loading