Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local address support. #100

Closed
wants to merge 18 commits into from
Closed

Local address support. #100

wants to merge 18 commits into from

Conversation

bwelling
Copy link

Based on the discussion in #88 .

This isn't really complete yet. The open issues are (at least).

  • There's no testing. I haven't even done any manual testing; I mostly wanted to verify that the changes look like a good start.

  • The raised exceptions should be improved.

  • mypy reports a bogus error. See socket.create_connection source_address should be optional python/typeshed#4116

  • It would be nice to support passing a raw string as local_addr (if you just want to specify an address, and don't care about port). I haven't done that yet.

@bwelling
Copy link
Author

The typing issue has been fixed upstream (python/typeshed#4133).

@bwelling
Copy link
Author

Any comments on this approach?

@florimondmanca
Copy link
Member

I'm a bit far away from discussions that happened in #88 but this looks sensible enough to me. :-)

@florimondmanca
Copy link
Member

florimondmanca commented Jun 11, 2020

It would be nice to support passing a raw string as local_addr

Another option while keeping the tuple form could be to mark the port there as Optional[int], like we did in URL via #92.

@bwelling
Copy link
Author

Making the port an Optional[int] would mean having to pass an (<address>, None) tuple, which isn't really any better than (<address>, 0). I think using something like Union[SocketAddress, str, bytes, bytearray] would work, but I was running into typing issues when I first tried it.

@florimondmanca
Copy link
Member

Hmm, a general design philosophy I've seen in httpcore is simplicity/minimalism in terms of interfaces and usage.

Eg I'm not sure what the meaning of Union[SocketAddress, str, bytes, bytearray] would be; I'd assume this would mostly be for the sake of convenience. But from my perspective httpcore is not designed to be convenient (eg look at the kind of tuple one needs to pass for URLs, it looks pretty unusual). We should require users pass only what is required and nothing more - any smarts around encoding/decoding of such an address should be up to higher-level clients like HTTPX.

Which is why I was suggesting the Optional approach as a way to literally transcribe from "pass a tuple containing the hostname and an optional port", while following the lead of #92. :-)

@bwelling
Copy link
Author

Fair enough. If passing in a tuple from higher layers is ok, there's no reason for the port to be optional; callers can pass in 0.

@bwelling bwelling changed the title WIP: Local address support. Local address support. Jun 15, 2020
@bwelling
Copy link
Author

There are some tests now, although it's hard to test too much in an automated fashion, since proper testing requires the system to have multiple routable addresses and working IPv6, and testing explicit source port numbers requires that the ports are unused. I've done manual testing for all of these.

Setting local_addr requires setting family, as asyncio requires that. Maybe it would be better to use something like a (family, address, port) tuple, instead of separate fields for family and address. This could allow address=None, or require that address be set to the appropriate wildcard address for the family if the address isn't important.

The bogus error flagged by mypy is now suppressed; hopefully the typeshed fix will be picked up at some point.

I think this is ready for a more formal review.

Copy link
Contributor

@yeraydiazdiaz yeraydiazdiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bwelling! 💯

Left some minor comments on the current implementation.

Setting local_addr requires setting family, as asyncio requires that. Maybe it would be better to use something like a (family, address, port) tuple, instead of separate fields for family and address. This could allow address=None, or require that address be set to the appropriate wildcard address for the family if the address isn't important.

I think it might be best to stay with the current approach, follow httpcore's "you know what you're doing" philosophy and leave ergonomics to HTTPX. I'd love to hear other's thoughts on this though.

@@ -76,6 +76,9 @@ class AsyncConnectionPool(AsyncHTTPTransport):
* **keepalive_expiry** - `Optional[float]` - The maximum time to allow
before closing a keep-alive connection.
* **http2** - `bool` - Enable HTTP/2 support.
* **family** - `Optional[int]` - Address family to use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
* **family** - `Optional[int]` - Address family to use
* **family** - `int` - Address family to use, defaults to 0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the type of the family parameter be socket.AddressFamily rather than just int?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all of occurrences of 0 were replaced with socket.AF_UNSPEC, then sure. I don't want to make this change yet, though, since if there's consensus to combine family and address into a single field, then there wouldn't be a need for either one, because the entire field would be None.

async def test_http_request_local_addr(local_addr: str) -> None:
family = socket.AF_INET6 if ':' in local_addr else socket.AF_INET
async with httpcore.AsyncConnectionPool(family=family,
local_addr=(local_addr, 0)) as http:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I would've expected Black to reformat this 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have forgotten to run the reformatting script before my last commit; it's done now.

@pytest.mark.asyncio
# This doesn't run with trio, since trio doesn't support local_addr.
async def test_http_request_local_addr(local_addr: str) -> None:
family = socket.AF_INET6 if ':' in local_addr else socket.AF_INET
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems leftover from when there was more than one value in the parametrization, might be worth removing it altogether?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test really should be trying more than one case, but when I tried testing IPv6, it failed due to the test system not supporting IPv6. I'd rather leave this to make it more clear that it should be parametrized, but if it's important, it can be removed.

@@ -204,6 +205,46 @@ async def test_http_proxy(
assert reason == b"OK"


@pytest.mark.parametrize("family", [socket.AF_INET])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth just setting it below?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this originally tested AF_INET6 also, but the test machine doesn't support IPv6.

@bwelling
Copy link
Author

Thanks @bwelling! 💯

Left some minor comments on the current implementation.

Setting local_addr requires setting family, as asyncio requires that. Maybe it would be better to use something like a (family, address, port) tuple, instead of separate fields for family and address. This could allow address=None, or require that address be set to the appropriate wildcard address for the family if the address isn't important.

I think it might be best to stay with the current approach, follow httpcore's "you know what you're doing" philosophy and leave ergonomics to HTTPX. I'd love to hear other's thoughts on this though.

I agree with this, but I'm not sure which approach matches "you know what you're doing" more; having 2 linked parameters or 1 more complicated parameter? That is:

a) Have separate family (int) and local_addr (tuple of address and port) parameters (as the current patch does)

b) Have a single optional parameter (maybe called source) containing (family, address, port), where address could be None.

c) Have a single optional parameter containing (family, address, port), where all 3 fields must not be None (that is, address would be set to the appropriate wildcard address for the family if the specific address was unimportant but family was)

I don't think there's a reason to support more than 1 of those choices, but I don't know which of them best matches the current philosophy.

@njsmith
Copy link

njsmith commented Jun 24, 2020

Is specifying the source port useful? I wasn't planning to support that in Trio, since it's kind of inherently incompatible with any kind of happy-eyeballs support... (And similarly, Twisted's version of this feature only lets you specify the source host, not the source port.)

@bwelling
Copy link
Author

bwelling commented Jun 24, 2020

Is specifying the source port useful? I wasn't planning to support that in Trio, since it's kind of inherently incompatible with any kind of happy-eyeballs support... (And similarly, Twisted's version of this feature only lets you specify the source host, not the source port.)

In the case where you care about happy-eyeballs support, probably not. But if you're specifying a single IPv4 or IPv6 source address (which this patch does), you explicitly don't, and being able to specify the source port is a pretty reasonable thing to do.

While Twisted might not support a port, the stdlib socket and asyncio routines do, and curio at least appears to.

@njsmith
Copy link

njsmith commented Jun 24, 2020

In the case where you care about happy-eyeballs support, probably not. But if you're specifying a single IPv4 or IPv6 source address (which this patch does), you explicitly don't

Happy eyeballs is still useful if you're restricting to just IPv4 or just IPv6. Lots of website DNS names resolve to multiple IPv4/IPv6 addresses. E.g.:

❯ host debian.org
debian.org has address 149.20.4.15
debian.org has address 128.31.0.62
debian.org has address 130.89.148.77
debian.org has IPv6 address 2603:400a:ffff:bb8::801f:3e
debian.org has IPv6 address 2001:4f8:1:c::15
debian.org has IPv6 address 2001:67c:2564:a119::77

being able to specify the source port is a pretty reasonable thing to do.

I didn't ask if it was reasonable, I asked if it was useful :-). Supporting source ports isn't trivial: the exact desired semantics aren't obvious, and writing good tests is tricky. It can be done, obviously, but if you want us to do the work then it'd be nice to have more of a justification. I've looked for use cases, and I haven't found any yet that care about source ports. (See: python-trio/trio#275 (comment))

An interesting case is the requests-toolbelt SourceAddressAdapter. It does support setting the source port, but has a big warning saying that if you try to actually use it then it will break stuff, so please don't: https://toolbelt.readthedocs.io/en/latest/adapters.html#sourceaddressadapter

@bwelling
Copy link
Author

I didn't ask if it was reasonable, I asked if it was useful :-). Supporting source ports isn't trivial: the exact desired semantics aren't obvious, and writing good tests is tricky. It can be done, obviously, but if you want us to do the work then it'd be nice to have more of a justification. I've looked for use cases, and I haven't found any yet that care about source ports. (See: python-trio/trio#275 (comment))

After thinking about this some more, you're probably right. My use case here is DoH (dns-over-https) queries from dnspython, and dnspython does allow specifying source port. That's mostly historical in this case, though; while it's important for UDP in some cases, it's only really important in archaic situations for TCP, and probably not at all useful for HTTPS.

Would it be better to remove source-port support from this PR altogether, or have the trio backend raise an exception if a nonzero port is specified?

@njsmith
Copy link

njsmith commented Jun 24, 2020

Would it be better to remove source-port support from this PR altogether, or have the trio backend raise an exception if a nonzero port is specified?

I guess that's a judgement call for Tom to make, but if it were my project I would definitely remove the support entirely until a compelling use case comes along. For this kind of infrastructure project, it's way easier to add features later than it is to claw back features you added in a burst of enthusiasm and then later ended up regretting.

@bwelling
Copy link
Author

That sounds reasonable to me.

If someone who's willing to merge these changes could make a decision on what the new parameters to AsyncConnectionPool should be, I'll make those changes, and fix all of the underlying code to do the same thing.

@ntninja
Copy link

ntninja commented Jun 25, 2020

Just to document the current state of things in trio on this: @njsmith merged a commit in lightning speed that (python-trio/trio#1644) that added local_address parameter. Restricting the address family during name resolution is not implemented and njs thinks it's better to resolve both names types and just restrict the source address to "0.0.0.0" or "::" instead – that way trio can give a better error message that explains that name resolution actually succeeded, just not for the request address type. (No strong opinion on my side whether this restriction is for better or worse.)

I suppose that this means that the family parameter (if it is kept) should map to default local_address of "0.0.0.0"/"::" for AF_INET/AF_INET6 respectively.

This also creates an interesting edge-case: If both an address family and a local address are given and their kinds do not agree, what should happen. For backends that support both parameters, such a case will reliably cause an error down the line (as none of the resolved names will be compatible with the given socket type), but with previously mentioned implementation for trio the given local address would win instead. Maybe there should be some explicit error checking in place to catch this condition if both parameters are supplied? Or maybe its better to leave this edge-case explicitly undefined (ie: documented, but no special handling)? Some dev should probably decide this for completeness sake, although its just a minor thing…

@bwelling
Copy link
Author

bwelling commented Jul 8, 2020

Just following up, is there anything else I need to do here? We need this functionality in dnspython for asynchronous DNS-over-HTTPS and HTTP/2 support.

@@ -9,3 +9,4 @@
URL = Tuple[bytes, bytes, Optional[int], bytes]
Headers = List[Tuple[bytes, bytes]]
TimeoutDict = Dict[str, Optional[float]]
SocketAddress = Tuple[Union[str, bytes, bytearray], int]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to keep this as tightly constrained as possible?
Presumably Tuple[bytes, int] is sufficient right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If source-port is removed, this type isn't needed at all, as the address could just be a bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yup.

@tomchristie
Copy link
Member

Just getting back into this...

I would definitely remove the [source-port] support entirely until a compelling use case comes along. For this kind of infrastructure project, it's way easier to add features later than it is to claw back features you added in a burst of enthusiasm and then later ended up regretting.

100% yup.

Having separate arguments seems to make sense to me, but it might make sense to keep this PR as tightly scoped as possible and just start with adding the local address support?

@bwelling
Copy link
Author

bwelling commented Jul 9, 2020

I just pushed an update that removes address family and source-port support; this now just allows setting a local address, and passes it through.

Note that this still doesn't support trio, as the trio change that @ntninja mentioned earlier hasn't been released yet.

@bwelling
Copy link
Author

It's been about 3 weeks since I addressed the last set of comments, and nothing's happened since then. Is there anything I can do to get this merged?

Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic stuff, and thanks for the reminder yup.
We'll make sure this is landing for our imminent httpcore 0.10 release, alongside httpx 0.14.

I've got some tiny suggestions, but can probably also pick them up myself if needed.

Thank you so much for this - it's a really great addition! 🎉

@@ -23,11 +23,13 @@ def __init__(
http2: bool = False,
ssl_context: SSLContext = None,
socket: AsyncSocketStream = None,
local_addr: bytes = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check - Do we want local_addr or source_addr here?
What are trio, asyncio, and the sync stdlib using for their naming?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asyncio uses local_addr, trio uses local_address, and the stdlib socket module uses source_address (and curio uses source_addr). There's really no consistency.

@@ -76,6 +76,7 @@ class AsyncConnectionPool(AsyncHTTPTransport):
* **keepalive_expiry** - `Optional[float]` - The maximum time to allow
before closing a keep-alive connection.
* **http2** - `bool` - Enable HTTP/2 support.
* **local_addr** - `Optional[bytes]` - Local address to connect from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miniature nitpick. Let's use a full stop to match the other cases. "Local address to connect from." 😃

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

) -> SocketStream:
host = hostname.decode("ascii")
connect_timeout = timeout.get("connect")
exc_map = {asyncio.TimeoutError: ConnectTimeout, OSError: ConnectError}
with map_exceptions(exc_map):
local_addrport = None
if local_addr:
local_addrport = (local_addr, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_addrport = None if local_addr is None else (local_addr, 0)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

) -> AsyncSocketStream:
return await self.backend.open_tcp_stream(hostname, port, ssl_context, timeout)
return await self.backend.open_tcp_stream(
hostname, port, ssl_context, timeout, local_addr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon we ought to use .open_tcp_stream(..., local_addr=local_addr) here.
Just makes it super obvious visually that it's an optional extra.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

) -> AsyncSocketStream:
if local_addr:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment that it is currently supported in trio master, and should be expected from version 0.16.1 onwards?

Also, what will our implementation change look like once trio support does land here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a comment.

I believe the code would just add something like local_address=local_addr to the open_tcp_stream call, although it's possible that there would be a type mismatch, as the local_addr parameter is None or a bytes, and trio claims to want None or a str (although it just passes it into socket.bind, which should accept a bytes).

@tomchristie tomchristie added this to the 0.10.0 milestone Jul 31, 2020
@tomchristie tomchristie added the enhancement New feature or request label Jul 31, 2020
@tomchristie
Copy link
Member

Referencing the commit in trio that adds their support for this, mostly just because they do a great job on docs, and we might want to follow their lead here. https://github.com/python-trio/trio/pull/1644/files

@tomchristie tomchristie mentioned this pull request Aug 3, 2020
@tomchristie tomchristie mentioned this pull request Aug 5, 2020
@tomchristie
Copy link
Member

Great stuff - I've made some tiny amendments in #134. As soon as that's reviewed we'll get it in, and look at rolling our 0.10 release.

@tomchristie tomchristie closed this Aug 5, 2020
@bwelling
Copy link
Author

I might be missing something, but this was linked to by the httpx 0.14 release PR (encode/httpx#1083). These changes were merged into #134 and pulled up from there, but nothing was actually added to httpx 0.14, and there's nothing in that release PR or any other PR/issue indicating why not.

@tomchristie
Copy link
Member

@bwelling Indeed - that's because we don't actually need any code changes in httpx to support this.

For "local_address=...", and for our planned "uds=..." support, we'll require users to drop down to the low-level httpcore API, and install the transport using Client(transport=...)

I've drafted a PR with some docs notes that should hopefully clear this up... encode/httpx#1165

@bwelling
Copy link
Author

The doc referenced explains how to use it, but that seems like an especially fragile interface. If httpcore isn't designed to be user-friendly (which is fine), then requiring users to use it seems suboptimal. Having to create a separate transport adapter to set custom and somewhat uncommon options is a perfectly reasonable API design, but requiring users to also pass in a bunch of unrelated configuration options with no defaults is a recipe for disaster, as it means that users who want the standard client behavior, except for one thing, need to poll httpx over time to see if its defaults change, or if new fields are added or removed from the default set.

Providing wrappers in httpx around httpcore.SyncConnectionPool and httpcore.AsyncConnectionPool that merge a set of user-provided configuration options with the defaults would go a long way toward solving this.

@tomchristie
Copy link
Member

We could certainly think about switching our policy on httpcore defaults, yup.

It's not really been a very exposed issue until just now.

@bwelling
Copy link
Author

Should I open an issue for that?

@tomchristie
Copy link
Member

Welcome to, yup.

There's a bunch of stuff to think through. (Eg. What should our default SSLConfig be & how do we plan to keep things in sync between httpcore/httpx)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants