quic: fix parsing of UDP_GRO and ECN socket options to also work in big endian platforms #37217

Open
wants to merge 1 commit into
base: main

jonathan-albrecht-ibm
Contributor

Commit Message: quic: fix parsing of UDP_GRO and ECN socket options to also work in big endian platforms
Additional Description:
Check the actual cmsg_len so that the entire value will be parsed.

Fixes:

//test/common/quic:active_quic_listener_test
//test/integration:quic_http_integration_test

on big endian platforms.
Risk Level: low
Testing: unit tests
Docs Changes: no
Release Notes: no
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

…ig endian platforms

Fixes:

* //test/common/quic:active_quic_listener_test
* //test/integration:quic_http_integration_test

on big endian platforms.

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
@jonathan-albrecht-ibm
Contributor Author

One test failed on arm64 in Envoy/Prechecks but I think it might just be a flake. I ran the test on x64 under the debugger and it didn't hit either of the methods I modified. Is it possible to re-run the check?

Failing test on arm64:

[ RUN      ] GetAddrInfoDnsImplTest.NoName
test/extensions/network/dns_resolver/getaddrinfo/getaddrinfo_test.cc:239: Failure
Value of: traces
Expected: has 4 elements where
element #0 has trace 1-byte object <00>,
element #1 has trace 1-byte object <01>,
element #2 has trace 1-byte object <04>,
element #3 has trace 1-byte object <08>
  Actual: { "1=908346291054", "4=908346295734", "0=908346301294", "8=908346302174" }, whose element #0 doesn't match
Stack trace:
  0x518ce0: absl::lts_20240722::internal_any_invocable::RemoteInvoker<>()
  0x5c2e18: Envoy::Event::DispatcherImpl::runPostCallbacks()
  0x5c2cd4: Envoy::Event::DispatcherImpl::run()
  0x4f072c: Envoy::Network::(anonymous namespace)::GetAddrInfoDnsImplTest_NoName_Test::TestBody()
  0x9d5ae0: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x9d597c: testing::Test::Run()
  0x9d68d8: testing::TestInfo::Run()
... Google Test internal frames ...

[  FAILED  ] GetAddrInfoDnsImplTest.NoName (7 ms)

@soulxu
Member

soulxu commented Nov 19, 2024

/assgin @danzh2010 @RyanTheOptimist

@danzh2010
Contributor

Thanks for fixing this issue! Any chance you can add a unit test?

@soulxu
Member

soulxu commented Nov 20, 2024

/assign @danzh2010 @RyanTheOptimist

@jonathan-albrecht-ibm
Contributor Author

@danzh2010 I think the ECN case is well covered by the //test/common/quic:active_quic_listener_test mentioned in the description. I'll try to see if I can add a unit test specifically for UDP_GRO which was causing the integration test //test/integration:quic_http_integration_test to fail if that makes sense.

Contributor

@RyanTheOptimist RyanTheOptimist left a comment

A few things:

  1. What big-endian platforms are being targeted here? I thought that these days virtually everything was little-endian?
  2. Can you explain what part of the PR changes behavior on big-endian systems but not little-endian?
  3. The typical way of transforming integers from the network byte order into the host byte order is via htons and htonl. Should we use that here?

case CMSG_LEN(sizeof(uint32_t)):
return static_cast<T>(*reinterpret_cast<const uint32_t*>(CMSG_DATA(&cmsg)));
case CMSG_LEN(sizeof(uint64_t)):
return static_cast<T>(*reinterpret_cast<const uint64_t*>(CMSG_DATA(&cmsg)));
Contributor

I think that unless the CMSG data happens to be correctly aligned, this may cause an unaligned access error, so we should use memcpy instead. Could we do:

  T value;
  if (cmsg.cmsg_len != CMSG_LEN(sizeof(value))) {
    IS_ENVOY_BUG(
        fmt::format("unexpected cmsg_len value for unsigned integer payload: {}", cmsg.cmsg_len));
    return absl::nullopt;
  }
  memcpy(&value, CMSG_DATA(&cmsg), sizeof(value));
  return value;

Contributor Author

The conditional wouldn't work because (cmsg.cmsg_len != CMSG_LEN(sizeof(value))) is usually true: cmsg.cmsg_len is usually greater than CMSG_LEN(sizeof(value)). The memcpy would work on little endian like it does now, but it would still have the same problems on big endian. See #37217 (comment) for details.

I don't know enough about data alignment to know if it's a problem here, but I couldn't find any other places in Envoy that use memcpy on CMSG_DATA(&cmsg), so maybe that should be addressed separately?

Contributor

Oh, actually a colleague of mine found a reference to the alignment issue:
https://man7.org/linux/man-pages/man3/cmsg.3.html

CMSG_DATA()
              returns a pointer to the data portion of a cmsghdr.  The
              pointer returned cannot be assumed to be suitably aligned
              for accessing arbitrary payload data types.  Applications
              should not cast it to a pointer type matching the payload,
              but should instead use memcpy(3) to copy data to or from a
              suitably declared object.

So let's go ahead and memcpy() into a temporary object of the right type based on the length and then

Contributor Author

Ok, I'll make that change and also try to add a unit test for the GRO case as mentioned above.
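
For illustration, the approach discussed in this thread (pick a temporary of the right width from cmsg_len, memcpy into it, then narrow) could look roughly like the sketch below. It is not the code in this PR: readUnsignedCmsgPayload and copyAndNarrow are hypothetical names, std::optional stands in for absl::optional, and the sketch assumes a POSIX platform where the payload is an unsigned integer in native byte order.

```cpp
#include <sys/socket.h>

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: copies the payload into a suitably aligned temporary of
// type Wide (as the cmsg(3) man page recommends) and narrows it to T.
template <typename T, typename Wide>
T copyAndNarrow(const unsigned char* data) {
  Wide wide;
  std::memcpy(&wide, data, sizeof(wide));
  return static_cast<T>(wide);
}

// Hypothetical helper: reads an unsigned integer control-message payload whose
// width is taken from cmsg_len rather than from sizeof(T).
template <typename T>
std::optional<T> readUnsignedCmsgPayload(const cmsghdr& cmsg) {
  const size_t header_len = CMSG_LEN(0);
  if (cmsg.cmsg_len < header_len) {
    return std::nullopt; // Header claims less than its own size.
  }
  const size_t payload_len = cmsg.cmsg_len - header_len;
  const unsigned char* data = CMSG_DATA(&cmsg);
  switch (payload_len) {
  case sizeof(uint8_t):
    return copyAndNarrow<T, uint8_t>(data);
  case sizeof(uint16_t):
    return copyAndNarrow<T, uint16_t>(data);
  case sizeof(uint32_t):
    return copyAndNarrow<T, uint32_t>(data);
  case sizeof(uint64_t):
    return copyAndNarrow<T, uint64_t>(data);
  default:
    return std::nullopt; // Unexpected payload width.
  }
}
```

Because the whole payload is copied before narrowing, a 4-byte payload read as uint16_t yields the same value on little and big endian hosts, and no unaligned access through a cast pointer takes place.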

@jonathan-albrecht-ibm
Contributor Author

A few things:

  1. What big-endian platforms are being targeted here? I thought that these days virtually everything was little-endian?
  2. Can you explain what part of the PR changes behavior on big-endian systems but not little-endian?
  3. The typical way of transforming integers from the network byte order into the host byte order is via htons and htonl. Should we use that here?

  1. The big-endian platform I'm working with is s390x which is used on mainframes running Linux on IBM Z. It's still widely and actively used especially in the financial sector.
  2. This PR changes Envoy to read the gso_size_ and tos_ values using the full length of the CMSG_DATA(cmsg) value. I think I can explain it best with an example. For gso_size_, the value is currently read like this:
output.msg_[0].gso_size_ = *reinterpret_cast<uint16_t*>(CMSG_DATA(cmsg))

which takes the 2 bytes starting at CMSG_DATA(cmsg) and reinterprets them as uint16_t. But the actual length of the data stored at CMSG_DATA(cmsg) is given by cmsg.cmsg_len - CMSG_LEN(0) and is 4 bytes. Note that the value in CMSG_DATA(cmsg) is little endian on little endian platforms. So it happens to work on little endian platforms because the byte order puts the little end of the value first at CMSG_DATA(cmsg). On big endian platforms, the value at CMSG_DATA(cmsg) is in big endian byte order, so the first two bytes contain only 0s because the actual value is in the last two bytes.
This PR effectively changes the gso_size_ value to be read like:

output.msg_[0].gso_size_ = static_cast<uint16_t>(*reinterpret_cast<uint32_t*>(CMSG_DATA(cmsg)))

so that the full length of CMSG_DATA(cmsg) is used. This is correct regardless of endianness.
From checking the cmsg.cmsg_len value for all gso_size_ and tos_ cases, I found that the actual sizes in bytes of CMSG_DATA(cmsg) on both x86_64 and s390x are:

value (type)          IPV4  IPV6
gso_size_ (uint16_t)  4     4
tos_ (uint8_t)        1     4

So for the case (IPV4, tos_), the size of tos_ and its CMSG_DATA(cmsg) are both 1, so both little endian and big endian were able to read the value correctly. In the other cases, gso_size_ or tos_ was smaller than its CMSG_DATA(cmsg) payload, which was OK on little endian but not on big endian.
I couldn't find any info on why the sizes of CMSG_DATA(cmsg) are what they are, so in the PR I handle 1, 2, 4 and 8 byte lengths in case it's different on other platforms.
Sorry I didn't put more information in the description. I was really surprised by all of this.

  3. No, we don't need to use htons and htonl because the values in CMSG_DATA(cmsg) are already in native byte order.
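
The truncation described in this comment can be reproduced on any host by laying out the bytes explicitly. The following is a minimal sketch, not code from this PR: ByteOrder, store32, loadFirst16 and load32 are hypothetical names used only for illustration, and 1200 is an arbitrary example value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical names for illustration only; none of this is Envoy code.
enum class ByteOrder { Little, Big };

// Lays out a 32-bit value the way a host with the given byte order stores it.
std::array<uint8_t, 4> store32(uint32_t value, ByteOrder order) {
  std::array<uint8_t, 4> bytes{};
  for (int i = 0; i < 4; ++i) {
    // i-th least significant byte of the value.
    const uint8_t byte = static_cast<uint8_t>(value >> (8 * i));
    bytes[order == ByteOrder::Little ? i : 3 - i] = byte;
  }
  return bytes;
}

// Reads only the first two bytes, mirroring a uint16_t reinterpret_cast of a
// 4-byte payload on a host with the given byte order.
uint16_t loadFirst16(const std::array<uint8_t, 4>& bytes, ByteOrder order) {
  return order == ByteOrder::Little
             ? static_cast<uint16_t>(bytes[0] | (bytes[1] << 8))
             : static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Reads all four bytes; the caller can then narrow the result safely.
uint32_t load32(const std::array<uint8_t, 4>& bytes, ByteOrder order) {
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    const uint8_t byte = bytes[order == ByteOrder::Little ? i : 3 - i];
    value |= static_cast<uint32_t>(byte) << (8 * i);
  }
  return value;
}
```

For the value 1200 (0x04B0), the little endian layout is B0 04 00 00, so loadFirst16 returns 1200. The big endian layout is 00 00 04 B0, so loadFirst16 returns 0. load32 returns 1200 for both layouts, which is why reading the full payload and then narrowing works regardless of endianness.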

@RyanTheOptimist
Contributor

  1. The big-endian platform I'm working with is s390x which is used on mainframes running Linux on IBM Z. It's still widely and actively used especially in the financial sector.

Oh, fascinating. Thanks!

I'm concerned, however, that the QUIC code in general will not work on big endian systems. From talking with colleagues who work on BoringSSL, I believe it assumes little endian, though I could be wrong. But the QUICHE QUIC code which Envoy uses definitely assumes little endian:

I'm happy to fix this GRO/ECN code to do the right thing with these fields, but I'm rather confused by how QUIC is working on these systems?

  2. This PR changes Envoy to read the gso_size_ and tos_ values using the full length of the CMSG_DATA(cmsg) value. I think I can explain it best with an example. For gso_size_, the value is currently read like this:
output.msg_[0].gso_size_ = *reinterpret_cast<uint16_t*>(CMSG_DATA(cmsg))

which takes the 2 bytes starting at CMSG_DATA(cmsg) and reinterprets them as uint16_t. But the actual length of the data stored at CMSG_DATA(cmsg) is given by cmsg.cmsg_len - CMSG_LEN(0) and is 4 bytes. Note that the value in CMSG_DATA(cmsg) is little endian on little endian platforms. So it happens to work on little endian platforms because the byte order puts the little end of the value first at CMSG_DATA(cmsg). On big endian platforms, the value at CMSG_DATA(cmsg) is in big endian byte order, so the first two bytes contain only 0s because the actual value is in the last two bytes. This PR effectively changes the gso_size_ value to be read like:

output.msg_[0].gso_size_ = static_cast<uint16_t>(*reinterpret_cast<uint32_t*>(CMSG_DATA(cmsg)))

so that the full length of CMSG_DATA(cmsg) is used. This is correct regardless of endianness. From checking the cmsg.cmsg_len value for all gso_size_ and tos_ cases, I found that the actual sizes in bytes of CMSG_DATA(cmsg) on both x86_64 and s390x are:

value (type)          IPV4  IPV6
gso_size_ (uint16_t)  4     4
tos_ (uint8_t)        1     4

So for the case (IPV4, tos_), the size of tos_ and its CMSG_DATA(cmsg) are both 1, so both little endian and big endian were able to read the value correctly. In the other cases, gso_size_ or tos_ was smaller than its CMSG_DATA(cmsg) payload, which was OK on little endian but not on big endian. I couldn't find any info on why the sizes of CMSG_DATA(cmsg) are what they are, so in the PR I handle 1, 2, 4 and 8 byte lengths in case it's different on other platforms. Sorry I didn't put more information in the description. I was really surprised by all of this.

  3. No, we don't need to use htons and htonl because the values in CMSG_DATA(cmsg) are already in native byte order.

Ah! Thanks. I misunderstood the problem. I assumed the issue was a byte order conversion problem, but as you point out it's really a truncation problem. I see. Thanks for the explanation.

@jonathan-albrecht-ibm
Contributor Author

I'm happy to fix this GRO/ECN code to do the right thing with these fields, but I'm rather confused by how QUIC is working on these systems?

I'm also working on endian issues in Envoy's dependencies as needed, so no, QUIC doesn't work out of the box on s390x. We're not always able to get all of our changes upstream, but if the community is able to accept these kinds of changes, we try to upstream them to help with future porting efforts.

@RyanTheOptimist
Contributor

I'm happy to fix this GRO/ECN code to do the right thing with these fields, but I'm rather confused by how QUIC is working on these systems?

I'm also working on endian issues in Envoy's dependencies as needed, so no, QUIC doesn't work out of the box on s390x. We're not always able to get all of our changes upstream, but if the community is able to accept these kinds of changes, we try to upstream them to help with future porting efforts.

Oh, I see. So in the PR description when it talks about fixing:

//test/common/quic:active_quic_listener_test //test/integration:quic_http_integration_test
that's including some patches to Envoy dependencies? I see. I ask about QUIC specifically because I also work on the QUICHE codebase. I'd be interested to learn more about the work required to get that working on your systems, though that's orthogonal to this PR, of course.

@RyanTheOptimist
Contributor

/wait
