How are requests with differing `Host:` header field values mapped to outbound connections? #415

briansmith · 2018-02-22T05:48:54Z

This is a follow-up to PR #397, in particular #397 (comment).

In the case where we send the request to the SO_ORIGINAL_DST destination, because we didn't discover the destination using the Destinations service, we're not making anything w.r.t. whether requests with different Host request header field values are sent on the same connection; the original service is making the choice whether or not to do that. We should, at least, document this in the code with a comment.

In the case where HTTP/1.x requests with differing host values are inbound to the proxy from an external source:

What do we currently do?
What should we do? In particular, should we segregate such requests by Host header field value, and have (at least) one connection to the proxied service per Host field value?

In the case where HTTP/1.x requests with different host values are outbound from the proxy to an external source:

What do we currently do?
What should we do? Again, should we segregate such requests by Host header field value, and have (at least) one connection to the external service per Host header field value?

Note in particular that in the outbound case we don't know whether the destination supports HTTP/1.1 or only HTTP/1.0; i.e. we don't know whether it might ignore the Host header. Also, even in the case where the destination claims to support HTTP/1.1, I'm not sure we can rely on this to mean that we can safely send it requests with differing Host header field values on the same connection. We should do what web browsers like Chrome do here.

All of the above is regarding plaintext (http://) connections. In the case of TLS-protected (https://) connections, I already know that at least empirical results from browsers require that the Host header field match the SNI (Server Name Indication) TLS extension. That is, HTTPS rules require us to avoid sending requests with different Host header field values on the same connection. See https://bugs.chromium.org/p/chromium/issues/detail?id=615413#c2.

Unless there's important reasons to do otherwise, I'd prefer to do the same for HTTP as is required for HTTPS; i.e. have distinct connections per Host request field value.

The text was updated successfully, but these errors were encountered:

briansmith · 2018-02-22T06:00:16Z

Similarly, we need to decide how to deal with the possibility that we might have requests claiming to be HTTP/1.0 and others claiming to be HTTP/1.1 that might be ending up on the same outbound (to the proxied service, or to an external destination). It seems like we should be segregating these by protocol version.

briansmith · 2018-02-26T19:45:27Z

@seanmonstar @carllerche Do you have thoughts about this? Could you explain how things currently work and whether you think my suggestion above is good or bad and, if bad, why? Also, how much work would it be to change things, if we were to make these changes? Would we need to bump hyper, tokio, and/or tower?

carllerche · 2018-02-26T20:14:24Z

It looks like @seanmonstar would know more about the impl details as it looks like the original PR came from him. It sounds like you want any connection pool to be keyed on the host header and not the socket addr?

As for the last question, I would highly doubt this would relate to Tokio and probably wouldn't touch Hyper either. It might relate to some code in Tower, but that is not on crates.io yet and we are tracking git, so if there are any required fixes, it should be easy to do.

briansmith · 2018-02-26T20:19:38Z

It sounds like you want any connection pool to be keyed on the host header and not the socket addr?

Basically, for HTTP/1 + TLS, we need to ensure we're not mixing (outbound) traffic for different hosts on the same (outbound) connection, and I propose we do the same for non-TLS HTTP/1 traffic. That's partially because I want to minimize differences between non-TLS and TLS traffic, and partially because I"m concerned that not all destinations will correctly support mixing different Hosts on the same connection--especially since the destination might really be HTTP/1.0 even though we're sending HTTP/1.1 requests.

carllerche · 2018-02-26T20:22:02Z

This seems fine to me.

seanmonstar · 2018-02-26T21:22:10Z

Currently the HTTP/1 proxy part looks for the Host (or the authority of the URL if the request-target is in absolute-form), and uses that as the "key" for our Recognize implementation. After returning that key, the proxy asks the controller for a socket address for that Host, and once received sets up a hyper client that connects solely to that socket address.

The code that creates the hyper client does not check that the outgoing request host matches previous requests sent on the same connection.

To make sure the connections are keyed not only on the socket address but the host as well, we could probably adjust the Protocol part of our recognize key to enum Protocol { Http1(Host), Http2 }.

hawkw · 2018-02-27T00:04:50Z

To make sure the connections are keyed not only on the socket address but the host as well, we could probably adjust the Protocol part of our recognize key to enum Protocol { Http1(Host), Http2 }.

What seems like an appropriate type to use to represent the Host in the Protocol key? Is it a Destination, the Host type in connect.rs, or its' own type in this instance?

seanmonstar · 2018-02-27T00:17:15Z

The Host probably needs to be some Either<FullyQualifiedAuthority, Authority>. This would probably be needed in both inbound and outbound.

hawkw · 2018-02-27T01:06:40Z

Okay, so I’m beginning to think that the most correct thing to do is just to reuse the Destination in outbound.rs for Host here.

I was going to have an enum that’s either FullyQualifiedAuthority or Authority, but this doesn’t handle the case where there’s no authority part in the request URI which is when we use SO_ORIG_DST to route on. GetOrigDst gives us back a SocketAddr rather than an Authority, which is what's in the External case of Destination.

Therefore, I think it's best to use Destination, and just factor out the logic for extracting Destinations from Requests from the impl of Recognize for Outbound. Does that seem reasonable to you all?

seanmonstar · 2018-02-27T18:44:57Z

That seems fine, however there is a difference in how Destination is used currently, versus what is needed in this issue. Current Destination uses either FullyQualifiedAuthority, or SocketAddr. The second case doesn't mean the request didn't have a host header, it just means that the authority didn't look like a local service, and so we aren't looking up a new address, just using the original.

Even with a FullyQualifiedAuthority, we ask the controller for a socket address, and if it matches an existing connection we already have, I believe it is reused. So, specifically in HTTP/1, we'd need the key to be Destination plus the Host.

hawkw · 2018-02-27T18:48:10Z

@seanmonstar Okay, that makes sense. Thanks for clearing that up for me, I think I understand what's necessary now.

hawkw · 2018-02-27T20:42:01Z

Here's my understanding of how to determine a Host component for the Protocol key thus far:

if I have a HTTP/1.x request:
- if the request URI has an authority:
  - if we can normalise that authority as a Fully Qualified Authority, the Host key is that FQA
    else the Host key component is the authority
- else if if the request has a Host: header:
  - if we can normalise the host header value as a Fully Qualified Authority, the Host key is that FQA
    else the Host key component is the authority
- else* ???

The question I've arrived at now is, what do we do if the request lacks both an authority in the URI and a Host: header? For routing, we use get_orig_dst in that case. Should the host key just be set to the socket addr in that case?

seanmonstar · 2018-02-27T20:52:12Z

I wonder if to be on the safe side, an HTTP/1.0 request with absolutely no authority should just always have its own connection. It makes me sad to waste resources like that, but perhaps it's expected?

hawkw · 2018-02-27T20:55:17Z

@seanmonstar that seems like a safe bet for now, at least.

briansmith · 2018-02-27T21:08:30Z

The idea of normalizing to FullyQualifiedAuthorityis to help with TLS; we need to decide which form of the name that we're going to validate the TLS certificate for. It is basically the canonical name we want to use to refer to the host. Right now we FullyQualifiedAuthority::normalize() returns Some(FullyQualifiedAuthority) for DNS names that we should resolve using the Destination service, None for DNS names that we shouldn't try to resolve using the Destination services, and None for IP addresses. However, really it should return something like this:

pub enum NamedAddress {
    DnsName {
        name: FullyQualifiedAuthority,
        use_destination_service: bool,
    }
    // IP addresses.
    Ip(IpAddr),
}

Then we can have something like this:

pub struct NamedDestination {
    addr: NamedAddress,
    port: Port,
}

Importantly, there's never a reason to have Either<Authority, FullyQualifiedAuthority> because FullyQualifiedAuthority is really just a type that indicates that normalize() has been called; i.e. that we're not dealing with unnormalized names.

Also note that we have to decide what to do in the case that FullyQualifiedAuthority::normalize() actually normalizes the name, e.g. changes name to name.namespace.svc.cluster.local. In that case, the Host header field of the outbound connection also needs to be changed from Host: name to Host: name.namespace.svc.cluster.local.

briansmith · 2018-02-27T21:12:03Z

I also think we don't have to optimize the case where there is no Host header field in the request, at least for now. It's not clear yet if it would be safe to optimize, and almost every modern HTTP library is going to be formatting requests in HTTP/1.1 form or at least HTTP/1.0 form with a Host header field.

briansmith · 2018-02-27T21:27:41Z

In fact, FullyQualifiedAuthority is already defined as FullyQualifiedAuthority(Authority), and Authority handles both DNS names and IP addresses, so we really seem to just need something like this:

pub enum NamedAddress {
    name: FullyQualifiedAuthority,
    use_destination_service: bool,
}

And then use FullyQualifiedAuthority::without_trailing_dot() wherever we need the value of the authority. We could change without_trailing_dot() return Authority too.

hawkw · 2018-02-27T21:29:50Z

@briansmith Thanks for all of the additional information. Do you think all of the changes you're describing here are in scope for a branch to address ensuring that each host has its' own connection, or should some of these changes be done in additional PRs?

briansmith · 2018-02-27T21:36:53Z

It would be good to first do the refactoring such that normalize() always returns a FullyQualifiedAuthority and an indication of whether the Destination service should be used. Then a PR for this issue could be built on top of it.

It would be bad to have a PR that uses Either<FullyQualifiedAuthority, Authority> or similar, since that's just perpetuating the confusion about what FullyQualifiedAuthority is for.

hawkw · 2018-02-27T21:40:35Z

@briansmith okay, that sounds good to me. Should there be a separate issue to track the FullyQualifiedAuthority changes?

briansmith · 2018-02-27T22:04:59Z

@briansmith okay, that sounds good to me. Should there be a separate issue to track the FullyQualifiedAuthority changes?

I don't care if there is a separate issue. A separate PR would be nice, though not strictly required.

hawkw · 2018-02-27T22:28:32Z

@briansmith I've opened PR #476 to make the changes you described here.

hawkw · 2018-02-27T23:44:11Z

A quick update: I think I've made the necessary changes here. Since this is hard to validate in a unit test, I'm talking to the release confidence team about end-to-end testing for this behaviour.

briansmith · 2018-02-28T00:04:45Z

A quick update: I think I've made the necessary changes here. Since this is hard to validate in a unit test, I'm talking to the release confidence team about end-to-end testing for this behaviour.

I am pretty sure we can use (and/or extend) the proxy's own integration test framework to test this. I think the end-to-end tests should be reserved for Kubernetes- or environment- dependent behavior whenever possible.

seanmonstar · 2018-02-28T00:21:07Z

If wanting to validate in a unit test, I would edit the proxy/tests/support/server.rs to allow the route functions to be able to access the Service, and since a Service is created to handle a single connection, a test can set some flag on the service, and error if it was already set.

hawkw · 2018-02-28T00:22:11Z

Okay, thanks @briansmith and @seanmonstar, I'll see what I can do.

@briansmith

#476) As requested by @briansmith in #415 (comment) and #415 (comment), I've refactored `FullyQualifiedAuthority::normalize` to _always_ return a `FullyQualifiedAuthority`, along with a boolean value indicating whether or not the Destination service should be used for that authority. This is in contrast to returning an `Option<FullyQualifiedAuthority>` where `None` indicated that the Destination service should not be used, which is what this function did previously. This is required for further progress on #415. Signed-off-by: Eliza Weisman <eliza@buoyant.io>

hawkw · 2018-02-28T22:56:34Z

I have these changes ready, but writing unit tests for them is currently blocked by a Hyper bug that's causing multiple connections to be opened and then immediately closed. I believe @seanmonstar is looking into that.

Should I go ahead and open a PR for this without the unit tests, and then add the tests in a subsequent PR?

hawkw · 2018-03-01T00:05:37Z

Okay, so while trying to test the branch I wrote to add the Host header to Recognize keys, @seanmonstar and I determined that Hyper's connection pool already implements the behaviour we want --- even though we're using a custom connector, the pool in Hyper is still keying connections off of the Host header. This appears to be the behaviour we want, and I can open a PR to add a unit test to ensure that we continue to have this behaviour.

If we're still interested in merging the code I've written to modify our Recognize impls to also key off the host header, I believe that would ensure we still have this behaviour without relying on Hyper to provide it? This might be valuable if e.g. we were to replace Hyper's default connection pooling with our own connection management later on.

briansmith · 2018-03-01T00:10:01Z

This appears to be the behaviour we want, and I can open a PR to add a unit test to ensure that we continue to have this behaviour.

Yes, please.

If we're still interested in merging the code I've written to modify our Recognize impls to also key off the host header, I believe that would ensure we still have this behaviour without relying on Hyper to provide it? This might be valuable if e.g. we were to replace Hyper's default connection pooling with our own connection management later on.

We don't need to merge the new code as long as we have a good test. If we change how connection pooling works in the future, then the test will catch any breakage, which is one of the main points of having a test. The even more important reason to have a test is to inform all of us how our product actually works. :)

@briansmith

#476) As requested by @briansmith in #415 (comment) and #415 (comment), I've refactored `FullyQualifiedAuthority::normalize` to _always_ return a `FullyQualifiedAuthority`, along with a boolean value indicating whether or not the Destination service should be used for that authority. This is in contrast to returning an `Option<FullyQualifiedAuthority>` where `None` indicated that the Destination service should not be used, which is what this function did previously. This is required for further progress on #415. Signed-off-by: Eliza Weisman <eliza@buoyant.io>

hawkw · 2018-03-01T00:15:45Z

Okay, opened PR #489 for the tests.

seanmonstar · 2018-03-01T00:42:31Z

I suspect your code may still be needed to handle the case of there being no host header at all, since that just assumes the ORIG_DST as the host.

hawkw · 2018-03-01T01:45:12Z

Ah, that's a good point, thanks @seanmonstar for pointing that out.

olix0r · 2018-03-07T00:01:00Z

@briansmith

Basically, for HTTP/1 + TLS, we need to ensure we're not mixing (outbound) traffic for different hosts on the same (outbound) connection, and I propose we do the same for non-TLS HTTP/1 traffic.

We're going to have to revisit all of this with logical routing, fwiw.

I"m concerned that not all destinations will correctly support mixing different Hosts on the same connection--especially since the destination might really be HTTP/1.0 even though we're sending HTTP/1.1 requests.

I'm very skeptical that this is a problem we should be concerned about without a concrete use case.

… values (#492) This PR ensures that the mapping of requests to outbound connections is segregated by `Host:` header values. In most cases, the desired behavior is provided by Hyper's connection pooling. However, Hyper does not handle the case where a request had no `Host:` header and the request URI had no authority part, and the request was routed based on the SO_ORIGINAL_DST in the desired manner. We would like these requests to each have their own outbound connection, but Hyper will reuse the same connection for such requests. Therefore, I have modified `conduit_proxy_router::Recognize` to allow implementations of `Recognize` to indicate whether the service for a given key can be cached, and to only cache the service when it is marked as cachable. I've also changed the `reconstruct_uri` function, which rewrites HTTP/1 requests, to mark when a request had no authority and no `Host:` header, and the authority was rewritten to be the request's ORIGINAL_DST. When this is the case, the `Recognize` implementations for `Inbound` and `Outbound` will mark these requests as non-cachable. I've also added unit tests ensuring that A, connections are created per `Host:` header, and B, that requests with no `Host:` header each create a new connection. The first test passes without any additional changes, but the second only passes on this branch. The tests were added in PR #489, but this branch supersedes that branch. Fixes #415. Closes #489.

@briansmith

#476) As requested by @briansmith in linkerd/linkerd2#415 (comment) and linkerd/linkerd2#415 (comment), I've refactored `FullyQualifiedAuthority::normalize` to _always_ return a `FullyQualifiedAuthority`, along with a boolean value indicating whether or not the Destination service should be used for that authority. This is in contrast to returning an `Option<FullyQualifiedAuthority>` where `None` indicated that the Destination service should not be used, which is what this function did previously. This is required for further progress on #415. Signed-off-by: Eliza Weisman <eliza@buoyant.io>

… values (linkerd#492) This PR ensures that the mapping of requests to outbound connections is segregated by `Host:` header values. In most cases, the desired behavior is provided by Hyper's connection pooling. However, Hyper does not handle the case where a request had no `Host:` header and the request URI had no authority part, and the request was routed based on the SO_ORIGINAL_DST in the desired manner. We would like these requests to each have their own outbound connection, but Hyper will reuse the same connection for such requests. Therefore, I have modified `conduit_proxy_router::Recognize` to allow implementations of `Recognize` to indicate whether the service for a given key can be cached, and to only cache the service when it is marked as cachable. I've also changed the `reconstruct_uri` function, which rewrites HTTP/1 requests, to mark when a request had no authority and no `Host:` header, and the authority was rewritten to be the request's ORIGINAL_DST. When this is the case, the `Recognize` implementations for `Inbound` and `Outbound` will mark these requests as non-cachable. I've also added unit tests ensuring that A, connections are created per `Host:` header, and B, that requests with no `Host:` header each create a new connection. The first test passes without any additional changes, but the second only passes on this branch. The tests were added in PR linkerd#489, but this branch supersedes that branch. Fixes linkerd#415. Closes linkerd#489.

briansmith added area/proxy area/security labels Feb 22, 2018

briansmith added this to the Conduit 0.3.1 milestone Feb 22, 2018

olix0r added the priority/P0 Release Blocker label Feb 23, 2018

olix0r modified the milestone: Conduit 0.3.1 Feb 23, 2018

olix0r added priority/P1 Planned for Release and removed priority/P0 Release Blocker labels Feb 23, 2018

hawkw self-assigned this Feb 26, 2018

hawkw mentioned this issue Feb 27, 2018

Refactor FullyQualifiedAuthority::normalize to always return authority #476

Merged

hawkw added a commit that referenced this issue Feb 28, 2018

WIP test for #415

1fdbc9d

hawkw added a commit that referenced this issue Mar 1, 2018

WIP test for #415

de20203

hawkw mentioned this issue Mar 1, 2018

Add unit test ensuring connections are created per-host #489

Closed

hawkw added the review/ready Issue has a reviewable PR label Mar 1, 2018

hawkw mentioned this issue Mar 1, 2018

Enforce that requests are mapped to connections for each Host: header values #492

Merged

hawkw added a commit that referenced this issue Mar 2, 2018

WIP test for #415

23b3814

hawkw added a commit that referenced this issue Mar 6, 2018

WIP test for #415

4ec13f7

hawkw closed this as completed in #492 Mar 7, 2018

hawkw removed the review/ready Issue has a reviewable PR label Mar 7, 2018

github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How are requests with differing `Host:` header field values mapped to outbound connections? #415

How are requests with differing `Host:` header field values mapped to outbound connections? #415

briansmith commented Feb 22, 2018 •

edited

Loading

briansmith commented Feb 22, 2018

briansmith commented Feb 26, 2018

carllerche commented Feb 26, 2018

briansmith commented Feb 26, 2018

carllerche commented Feb 26, 2018

seanmonstar commented Feb 26, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018 •

edited

Loading

briansmith commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 28, 2018

seanmonstar commented Feb 28, 2018

hawkw commented Feb 28, 2018

hawkw commented Feb 28, 2018

hawkw commented Mar 1, 2018

briansmith commented Mar 1, 2018

hawkw commented Mar 1, 2018

seanmonstar commented Mar 1, 2018

hawkw commented Mar 1, 2018

olix0r commented Mar 7, 2018 •

edited

Loading

How are requests with differing Host: header field values mapped to outbound connections? #415

How are requests with differing Host: header field values mapped to outbound connections? #415

Comments

briansmith commented Feb 22, 2018 • edited Loading

briansmith commented Feb 22, 2018

briansmith commented Feb 26, 2018

carllerche commented Feb 26, 2018

briansmith commented Feb 26, 2018

carllerche commented Feb 26, 2018

seanmonstar commented Feb 26, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

hawkw commented Feb 27, 2018

seanmonstar commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018 • edited Loading

briansmith commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 27, 2018

hawkw commented Feb 27, 2018

hawkw commented Feb 27, 2018

briansmith commented Feb 28, 2018

seanmonstar commented Feb 28, 2018

hawkw commented Feb 28, 2018

hawkw commented Feb 28, 2018

hawkw commented Mar 1, 2018

briansmith commented Mar 1, 2018

hawkw commented Mar 1, 2018

seanmonstar commented Mar 1, 2018

hawkw commented Mar 1, 2018

olix0r commented Mar 7, 2018 • edited Loading

How are requests with differing `Host:` header field values mapped to outbound connections? #415

How are requests with differing `Host:` header field values mapped to outbound connections? #415

briansmith commented Feb 22, 2018 •

edited

Loading

briansmith commented Feb 27, 2018 •

edited

Loading

olix0r commented Mar 7, 2018 •

edited

Loading