Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quic: HTTP/3 upstream initial checkin #15027

Merged
merged 8 commits into from
Mar 3, 2021

Conversation

alyssawilk
Copy link
Contributor

This is working end to end for a number of the protocol integration tests, but still needs a bunch of work before it's production ready (I triaged some of the excludes but not all, watermarks not working etc).
I think given this works for the happy path it's worth checking in, and I can iterate on getting individual tests working and doing the TODOs.

Risk Level: low (minor core refactors, major changes are all behind hidden config)
Testing: integration tests, some unit tests
Docs Changes: n/a
Release Notes: n/a
#14829

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

cc @DavidSchinazi

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working out this framework!

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

comments should be addressed, PTAL

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. I have some thoughts on the EnvoyQuicClientSessionWithExtra, happy to discuss. Looks good over all, but I would prefer someone who is familiar with allocateConnPool() implementation and cluster_manager_impl to review that part.

Event::Dispatcher& dispatcher, TimeSource& time_source) {
// TODO(#14829): reject the config if a raw buffer socket is configured.
auto* ssl_socket_factory =
dynamic_cast<Extensions::TransportSockets::Tls::ClientSslSocketFactory*>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding getter for config_ in ClientSslSocketFactory(), can we use QuicClientTransportSocketFactory instead? Maybe worth to reject the config if it is configured to use anything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, have a TODO to reject the config already. Using the quic transport factory involves a bunch of extra workarounds due to the release assert on creating transport sockets. It means I have to fix the TODO for not creating the connection (which used the TLS config) and tossing it, but isn't too much extra cruft to get that TODO done :-)

source/extensions/transport_sockets/tls/ssl_socket.h Outdated Show resolved Hide resolved
source/common/http/http2/conn_pool.h Show resolved Hide resolved
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, alone aside a few nits

namespace Http3 {

// TODO(#14829) the constructor of Http2::ActiveClient sets max requests per
// connection based on HTTP/2 config. Sort out the HTTP/3 config story.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quic listener configures this field in envoy.config.listener.v3.QuicProtocolOptions. I guess it's a question about where to place this config and how to plumb thru.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but given some recent discussions around whether QUIC should be an extension vs built-in (potentially with the option to compile it out) may influence our decisions here. I would suggest a short document on QUIC build and configuration so we can discuss the options?

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: use Extensions::TransportSockets::TransportSocketNames::get().Quic instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're moving away from wkn files per #7238
I used hard-coded strings to avoid further use of wkn. will let Matt comment on his preference on these ones since I do lean towards using predefined values myself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine either way for now, but again related to why I think we need to undo all the extension stuff.

envoy::config::listener::v3::QuicProtocolOptions config;
auto& config_factory =
Config::Utility::getAndCheckFactoryByName<Server::ActiveUdpListenerConfigFactory>(
"quiche_quic_listener");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: use Envoy.Quic.QuicListenerName instead

config_helper_.addConfigModifier(
[&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) -> void {
// Docker doesn't allow writing to the v6 address returned by
// Network::Utility::getLocalAddress.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a docker bug that leads us to hardcode address here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test passes locally for me without that change, but failed on CI. I as far as I could tell the docker container was resolving a local v6 address but was cranky at sending to its non-loopback v6 address, where it was fine doing so with v4.
AFIK we have similar issues in-house with v6 docker behaving differently so I'm not terribly surprised.

namespace Envoy {
namespace Http {

// A factory to create EnvoyQuicClientConnection instance for QUIC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: s/EnvoyQuicClientConnection/EnvoyQuicClientSession and EnvoyQuicClientConnection

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: pass in QuicClientTransportSocketFactory instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a core http3 API and that's a QUIC class in the extension directory so I think leaving as-is makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That makes sense! Maybe I should move Quic TransportSocketFactory into core code instead. It doesn't depends on QUICHE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it all at this point. See my other comments.

source/common/http/http2/conn_pool.h Show resolved Hide resolved
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 you up for a pass on the conn pool side of things?

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a core http3 API and that's a QUIC class in the extension directory so I think leaving as-is makes sense?

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're moving away from wkn files per #7238
I used hard-coded strings to avoid further use of wkn. will let Matt comment on his preference on these ones since I do lean towards using predefined values myself.

config_helper_.addConfigModifier(
[&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) -> void {
// Docker doesn't allow writing to the v6 address returned by
// Network::Utility::getLocalAddress.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test passes locally for me without that change, but failed on CI. I as far as I could tell the docker container was resolving a local v6 address but was cranky at sending to its non-loopback v6 address, where it was fine doing so with v4.
AFIK we have similar issues in-house with v6 docker behaving differently so I'm not terribly surprised.

@alyssawilk
Copy link
Contributor Author

ping!

@alyssawilk
Copy link
Contributor Author

(mac failure is the known timeout issue)

danzh2010
danzh2010 previously approved these changes Mar 2, 2021
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. LGTM with some comments and one line that I think was meant to be deleted.

/wait

namespace Http3 {

// TODO(#14829) the constructor of Http2::ActiveClient sets max requests per
// connection based on HTTP/2 config. Sort out the HTTP/3 config story.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but given some recent discussions around whether QUIC should be an extension vs built-in (potentially with the option to compile it out) may influence our decisions here. I would suggest a short document on QUIC build and configuration so we can discuss the options?

data.host_description_->transportSocketFactory();
data.connection_ =
Config::Utility::getAndCheckFactoryByName<Http::QuicClientConnectionFactory>(
Http::QuicCodecNames::get().Quiche)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another example of an extension leaking into core code which I don't think makes sense.

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it all at this point. See my other comments.

@@ -678,6 +721,7 @@ testing::AssertionResult FakeUpstream::waitForUdpDatagram(Network::UdpRecvData&
}

void FakeUpstream::onRecvDatagram(Network::UdpRecvData& data) {
std::cerr << "Got datagrams. Need to dispatch\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine either way for now, but again related to why I think we need to undo all the extension stuff.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (let's discuss/track how to deal with the extension status in #12829)

@alyssawilk alyssawilk merged commit 0a58f84 into envoyproxy:main Mar 3, 2021
@alyssawilk alyssawilk mentioned this pull request Mar 3, 2021
@alyssawilk alyssawilk deleted the upstream_quic branch August 18, 2021 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants