-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quic: fix parsing of UDP_GRO and ECN socket options to also work in big endian platforms #37217
base: main
Are you sure you want to change the base?
quic: fix parsing of UDP_GRO and ECN socket options to also work in big endian platforms #37217
Conversation
…ig endian platforms Fixes: * //test/common/quic:active_quic_listener_test * //test/integration:quic_http_integration_test on big endian platforms. Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
One test failed on arm64 in Envoy/Prechecks but I think it might just be a flake. I ran the test on x64 under the debugger and it didn't hit either of the methods I modified. Is it possible to re-run the check? Failing test on arm64:
|
/assgin @danzh2010 @RyanTheOptimist |
Thanks for fixing this issue! Any chance you can add a unit test? |
/assign @danzh2010 @RyanTheOptimist |
@danzh2010 I think the ECN case is well covered by the //test/common/quic:active_quic_listener_test mentioned in the description. I'll try to see if I can add a unit test specifically for UDP_GRO which was causing the integration test //test/integration:quic_http_integration_test to fail if that makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- What big-endian platforms are being targeted here? I thought that these days virtually everything was little-endian?
- Can you explain what part of the PR changes behavior on big-endian systems but not little-endian?
- The typical way of transfroming integers from the network byte order into the host byte order is via htons and htonl. Should we use that here?
case CMSG_LEN(sizeof(uint32_t)): | ||
return static_cast<T>(*reinterpret_cast<const uint32_t*>(CMSG_DATA(&cmsg))); | ||
case CMSG_LEN(sizeof(uint64_t)): | ||
return static_cast<T>(*reinterpret_cast<const uint64_t*>(CMSG_DATA(&cmsg))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that unless the CMSG data happens to be correctly aligned, this may cause an unaligned access error, so we should use memcpy instead. Could we do:
T value;
if (cmsg.cmsg_len != CMSG_LEN(sizeof(value)) {
IS_ENVOY_BUG(
fmt::format("unexpected cmsg_len value for unsigned integer payload: {}", cmsg.cmsg_len));
return absl::nullopt;
}
memcpy(&value, CMSG_DATA(&cmsg), sizeof(value));
return value;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditional wouldn't work because (cmsg.cmsg_len != CMSG_LEN(sizeof(value))
is usually true because cmsg.cmsg_len
is usually greater than CMSG_LEN(sizeof(value))
. The memcpy
would work on little endian like it does now but it would still have the same problems on big endian. See #37217 (comment) for details.
I don't know enough about data alignment to know if its a problem here but I couldn't find any other places in envoy that used memcpy
on CMSG_DATA(&cmsg)
so maybe that should be addressed separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually a colleague of mine found a reference to the alignment issue:
https://man7.org/linux/man-pages/man3/cmsg.3.html
CMSG_DATA()
returns a pointer to the data portion of a cmsghdr. The
pointer returned cannot be assumed to be suitably aligned
for accessing arbitrary payload data types. Applications
should not cast it to a pointer type matching the payload,
but should instead use memcpy(3) to copy data to or from a
suitably declared object.
So let's go ahead and memcpy() into a temporary object of the right type based on the length and then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll make that change and also try to add a unit test for the GRO case as mentioned above.
which takes the 2 bytes starting at
so that the full length of
so for the case
|
Oh, fascinating. Thanks! I'm concerned, however, that the QUIC code in general will not work on big endian systems. From talking with colleagues who work on BoringSSL, I believe it assumes little endian, though I could be wrong. But the QUICHE QUIC code which Envoy uses definitely assumes little endian:
I'm happy to fix this GRO/ECN code to do the right thing with these fields, but I'm rather confused by how QUIC is working on these systems?
Ah! Thanks. I misunderstood the problem. I assumed the issue was a byte order conversion problem, but as you point out it's really a truncation problem. I see. Thanks for the explanation. |
I'm also working on endian issues in envoy's dependencies as needed so no QUIC doesn't work out of the box on s390x. We're not always able to get all of our changes upstream but if the community is able to accept these kind of changes we try to upstream them to help with future porting efforts. |
Oh, I see. So in the PR description when it talks about fixing:
|
/wait |
Commit Message: quic: fix parsing of UDP_GRO and ECN socket options to also work in big endian platforms
Additional Description:
Check the actual cms_len so that the entire value will be parsed.
Fixes:
//test/common/quic:active_quic_listener_test
//test/integration:quic_http_integration_test
on big endian platforms.
Risk Level: low
Testing: unit tests
Docs Changes: no
Release Notes: no
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]