Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quic: remove ability to have pluggable implementation #12829

Closed
ggreenway opened this issue Aug 26, 2020 · 18 comments · Fixed by #15752
Closed

quic: remove ability to have pluggable implementation #12829

ggreenway opened this issue Aug 26, 2020 · 18 comments · Fixed by #15752
Assignees
Labels

Comments

@ggreenway
Copy link
Contributor

The implementation of QUIC is complicated, and the integration of a QUIC library with Envoy is also complicated. There is no sign that anyone will implement a different QUIC implementation in Envoy.

To simplify the implementation, we should consider removing the pluggability of QUIC implementations, and just use QUICHE (but with a build setting to remove it entirely if needed).

If we decide in the future to have an alternate implementation, I think it is sufficient to make it a build-time decision; supporting multiple QUIC implementations in the same binary, by filter configuration, seems unneccessary.

@ggreenway
Copy link
Contributor Author

cc @danzh2010 @mattklein123

@moderation
Copy link
Contributor

On a recent Envoy Community Call @florincoras mentioned that Cisco had or were developing a QUIC implementation that might be integrated with Envoy. Copying him for comment. Also copying @alyssawilk as I recall her being interested in alternative implementations when this was mentioned. No mention of this in the minutes but pretty sure I'm not making this up 😄

@ggreenway
Copy link
Contributor Author

Yep, one of my reasons for posting this here is to solicit opinions on the topic.

@moderation
Copy link
Contributor

@mattklein123
Copy link
Member

My vote is to remove pluggability. I don't think it's realistic that we will have a 2nd implementation, both due to the work involved, but also because of being realistic about library supply chain issues.

@florincoras
Copy link
Member

@moderation you're definitely not making this up :-)

The implementation I was talking about is part of fd.io vpp's host stack and is based on/wraps Quicly. All of the transports, i.e., tcp, udp, tls, quic, are consumable by external applications (wrt vpp) by means of posix-like apis exposed through a library (vcl) which applications must link against. That's to say that if Envoy wants to use quic in vpp, it could do so through vcl and apis that are similar to what we have today for IoHandle.

Regarding the extendibility, we could at one point add/extend a SocketInterface that instantiates quic specificIoHandles and then extend IoHandle to support all of quic's apis. At that point, we can hide quic implementations (QUICHE, vpp or others) "behind" the SocketInterfaces, but that might end up needing a decent amount of work :-)

@ggreenway
Copy link
Contributor Author

I think QUIC ends up being harder to plugin than other protocols, because it's not just a transport. It's a transport plus an HTTP codec integrated together, so it's a much larger implementation than just the SocketInterface. For example, there is QUIC-specific code in the HCM, and a setting in udp_listener_config, in addition to the transport socket settings.

@florincoras
Copy link
Member

Unfortunately, I'm not familiar with the existing code, but wouldn't it be possible to separate the transport for the other parts? The approach used in vpp for tls/quic transports was to have them explicitly manage the creation/interaction with the underlying tcp/udp connections and to expose byte stream oriented apis towards the apps.

@ggreenway
Copy link
Contributor Author

QUIC couples several things that were in separate layers previously. It's a UDP protocol, but it builds lossless bytestreams like HTTP/2 streams on top of that, and couples encryption into the transport layer. So the QUIC transport layer has API interactions with the HTTP layer for things like creating new streams (over an existing connection), etc.

It may be possible to de-couple some of these concerns. I'm honestly not sure. It would depend on the libraries used for QUIC supporting this split, and I haven't looked to see if it's possible.

Note that none of this precludes using VPP for the UDP transport beneath QUIC; that is still a separate layer.

@florincoras
Copy link
Member

florincoras commented Aug 26, 2020

QUIC couples several things that were in separate layers previously. It's a UDP protocol, but it builds lossless bytestreams like HTTP/2 streams on top of that, and couples encryption into the transport layer. So the QUIC transport layer has API interactions with the HTTP layer for things like creating new streams (over an existing connection), etc.

It may be possible to de-couple some of these concerns. I'm honestly not sure. It would depend on the libraries used for QUIC supporting this split, and I haven't looked to see if it's possible.

Understood. I guess we'll have to wait and see if people are interested in doing that separation.

Note that none of this precludes using VPP for the UDP transport beneath QUIC; that is still a separate layer.

Agreed! However, from my limited interactions with the existing implementation, it does seem to want access to the underlying fd in the UdpGsoBatchWriter. In fact, together with tls, it's one of the last places that needs access to the IoHandle's fd (see here). If you happen to know a way around that, it would be very helpful!

@ggreenway
Copy link
Contributor Author

The UdpGsoBatchWriter is an optimization, using a specific set of linux kernel APIs, to write packets faster (less CPU; only works on linux 4.18 and newer). I think something like that will always need the fd. But it is optional; if it isn't used, normal UDP socket writes are used. If VPP were in use, my guess is that there would be a similar, special implementation to write into VPP as fast as possible, which would be highly VPP-specific.

@ggreenway
Copy link
Contributor Author

Actually, now that I think about it, is there a clear benefit to using the QUIC support in VPP, vs the QUICHE QUIC support built into Envoy but using VPP for the UDP socket layer? I don't know a lot about VPP, but my guess is that, given that it is an alternative to the kernel stack, it made sense for many applications to include all the needed pieces for reliable transports, like TCP; so it made sense to also include a QUIC implementation in VPP.

But for the specific case of using VPP as the base of Envoy, where there's already a QUIC implementation, is anything gained by using VPP's QUIC implementation vs the one already in Envoy?

@florincoras
Copy link
Member

I should've been clearer above that I'm by no means advocating for changing/replacing the existing QUIC implementation in Envoy. I was only trying to point out that we could have factories for QUIC specific IoHandles, so anybody could in theory insert a custom implementation. Therefore, if we followed this approach, what you're proposing with this issue should not have an impact on QUIC pluggability. Given that this is flawed, as you pointed out, the only options are to give up on pluggability or do the necessary work to have custom QUIC IoHandles. I'm fine with the former 🙂.

Regarding performance, it's hard to say. VPP does all the transport and network specific work on dedicated, cpu bound workers and exchanges data with the application over shared memory. That means that Envoy workers do only L7 work and vpp workers take care of the rest of the layers. So, ignoring the details of the QUIC implementations, it all depends on cache locality benefits, i.e., impact of having to do/not do QUIC specific work. Probably hardware acceleration would be another dimension to explore.

@danzh2010
Copy link
Contributor

Another reason why we made QUICHE pluggable to begin with is that QUICHE cross platform compatibility is not as good as envoy core code. In some platforms, if QUICHE breaks the build, we still want the rest of the envoy to build.

@mattklein123 mattklein123 added the help wanted Needs help! label Sep 1, 2020
@alyssawilk
Copy link
Contributor

yeah, in other areas of the code base we compile things out with bazel flags, like the hot restart code, so I think the theory is we can remove some of the extension complexity but we'll still preserve compiling without QUIC.

@mattklein123
Copy link
Member

I will work on this cc @alyssawilk

alyssawilk added a commit that referenced this issue Mar 23, 2021
Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
Much of #12829

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
mattklein123 added a commit that referenced this issue Mar 26, 2021
Part of #12829. The transport
extension will stay a built-in extension since it fits well. UDP
listener and UDP writer extension points have been removed. GSO is
still only enabled for QUIC because it currently depends on QUICHE, has
some obvious perf issues, and is failing non-QUIC integration tests.
Futher work is needed to remove codec extension factories.

Part of #12829

Signed-off-by: Matt Klein <mklein@lyft.com>
@CameronNemo
Copy link

CameronNemo commented Mar 27, 2021

I would like to point out that there are two QUIC implementations named quiche. Apparently the pun was two too clever to pass up.

https://github.com/cloudflare/quiche (Rust)

https://quiche.googlesource.com/quiche/ (C++)

@mattklein123
Copy link
Member

note: follow up on #15720 (comment) before this is closed.

mattklein123 added a commit that referenced this issue Mar 29, 2021
Part of #12829. The transport
extension will stay a built-in extension since it fits well. UDP
listener and UDP writer extension points have been removed. GSO is
still only enabled for QUIC because it currently depends on QUICHE, has
some obvious perf issues, and is failing non-QUIC integration tests.
Futher work is needed to remove codec extension factories.

Part of #12829

Signed-off-by: Matt Klein <mklein@lyft.com>
lizan pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 29, 2021
Part of envoyproxy/envoy#12829. The transport
extension will stay a built-in extension since it fits well. UDP
listener and UDP writer extension points have been removed. GSO is
still only enabled for QUIC because it currently depends on QUICHE, has
some obvious perf issues, and is failing non-QUIC integration tests.
Futher work is needed to remove codec extension factories.

Part of envoyproxy/envoy#12829

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 8e8a21d20ef0e90ef31ea16f5b5c85ac08d922ae
mattklein123 added a commit that referenced this issue Mar 29, 2021
Fixes #12829

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 31, 2021
Fixes #12829

Signed-off-by: Matt Klein <mklein@lyft.com>
lizan pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 31, 2021
Fixes envoyproxy/envoy#12829

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 1134a1602c37d6fec8d5dd9bccd2855f32b2a34a
alyssawilk added a commit that referenced this issue Apr 1, 2021
Risk Level: n/a (moving files)
Part of #12829.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021
Part of envoyproxy/envoy#12829. The transport
extension will stay a built-in extension since it fits well. UDP
listener and UDP writer extension points have been removed. GSO is
still only enabled for QUIC because it currently depends on QUICHE, has
some obvious perf issues, and is failing non-QUIC integration tests.
Futher work is needed to remove codec extension factories.

Part of envoyproxy/envoy#12829

Signed-off-by: Matt Klein <mklein@lyft.com>
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021
Fixes envoyproxy/envoy#12829

Signed-off-by: Matt Klein <mklein@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants