Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions for reading and writing length-prefixed data with customizable encodings for the length #2867

Merged
merged 18 commits into from
Oct 2, 2024

Conversation

hamzahrmalik
Copy link
Contributor

Add functions for reading and writing length-prefixed data with customizable encodings for the length, particularly for quic variable-length integers (RFC 9000)

Motivation:

Many protocols require us to write data and then prefix that data with its length. But each protocol has a different way of encoding the length. This PR introduces general purpose functions which can be extended for different encoding strategies

Modifications:

Create a new protocol which defines how to encode an integer
Implement this protocol for QUIC

Provide functions on bytebuffer for writing length-prefixed buffers, strings or bytes

@hamzahrmalik hamzahrmalik added the semver/minor Adds new public API. label Sep 4, 2024
Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat! I've not reviewed it all but left some initial feedback and questions to make sure I understood how it all fits together.

Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so
/// That way, you waste the reserved bytes, but improve writing performance
func writeIntegerWithReservedSpace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API 'stutters' a little bit: could we use reservedCapacity to overload writeInteger? i.e.

func writeInteger(_ value: Int, to buffer: inout ByteBuffer, reservedCapacity: Int) -> Int

to buffer: inout ByteBuffer
) -> Int

/// When writing an integer using this strategy, how many bytes we should reserve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this and writeIntegerWithReservedSpace quite unintuitive without looking at how they were used. It became clearer when I started reading "Integer" as "Length". Is that the right way to think about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i agree this is confusing

The idea here is that reservedSpaceForInteger is the amount of space that someone should reserve if theyre intending to write an integer using your strategy
And the writeIntegerWithReservedSpace is for when someone has already reserved some space, and now wants you to write an integer into that space. You can then make the decision on whether you want to try and use exactly that much space (this is what the quic impl does) even if you could've used less. Ie you choose whether you optimize for the smallest possible encoding of the int, or for trying to avoid moving bytes around

This is in practice currently only used by the writePrefixed* functions so yes, there the integer is a length. But i don't think it has to be a length

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be used to write an integer, but in the context of this protocol should it be used for anything other than the length?

IMO we should make the intent of the requirement clear, if you wanted to write an integer then you should just call the writeInteger(_:to:) method.

Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments. The bulk of this looks great but still unsure around some of the naming.

/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible.
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so.
/// That way, you waste the reserved bytes, but improve writing performance.
func writeIntegerWithReservedCapacity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: repeating "reserved" in the name and in the parameter name. Is that necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

George's note here is good, we can probably remove the note in the name.

Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
Sources/NIOCore/ByteBuffer-binaryEncodedLengthPrefix.swift Outdated Show resolved Hide resolved
#elseif canImport(Bionic)
import Bionic
#else
#error("The Byte Buffer module was unable to identify your C library.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxDesiatov don't we need WASIlibc here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for catching this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need this, then it's missing across all of NIO

//===----------------------------------------------------------------------===//

#if os(Windows)
import ucrt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamzahrmalik why do we need the libcs here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the memmove function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in this case we should remove the imports here.

  • we shouldn't use *Unsafe* here, the most minimal subset of ByteBuffer-core.swift needs to provide enough
  • memmove isn't necessary to be called directly. UnsafeRawBufferPointer can move memory with a higher-level API (I think it's bufferPointer[0..<100] = bufferPointer[2..<102] or something like that; but again, no Unsafe in here please)

#elseif canImport(Musl)
import Musl
#elseif canImport(Bionic)
import Bionic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import Bionic
import Bionic
#elseif canImport(WASILibc)
import WASILibc

// Add the required number of bytes to the end first
self.writeRepeatingByte(0, count: requiredSpace)
// Move the data forward by that many bytes, to make space at the front
self.withVeryUnsafeMutableBytes { pointer in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we avoid *Unsafe* here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really: this is the safe wrapper around the access.

Ultimately this is implementing memmove on ByteBuffer. Implementing it safely is somewhere between very hard and impossible using the APIs we currently have. We can run through the options:

  1. Bytewise copy using writeInteger. This is going to be very costly for large moves, which is the concern we have in this method.
  2. Manually-autovectorised copy using writeMultipleInteger and wider integers. This can work, but it's still going to be very slow compared to memmove and it will be much more complex code as we have to handle a wide range of edge-cases.
  3. Move using writeImmutableBuffer. This will, unfortunately, trigger a CoW.
  4. Move using ByteBufferView. This will also trigger a CoW.

I fundamentally can't see an implementation here that performs well enough to be used on larger buffers, which is also the circumstance in which it is most likely that we'll need a move. But I'm open to other options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, if this is necessary functionality, can we add it to ByteBuffer-core.swift and use in other places too? If my memory serves me right, then @glbrntt added some memory-moving API already a few years back, might be wrong about that though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close enough, I added copyBytes(at:to:length:) 5 years ago as part of making ByteBufferView mutable. Good memory!

https://github.com/apple/swift-nio/blame/282f5935cf3352b3d026c35eb57cb3619dd9536f/Sources/NIOCore/ByteBuffer-core.swift#L1077

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamzahrmalik isn't that exactly what we need? memmove == copying bytes too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks like we could use it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this works perfectly

/// If you use write more or less than this many bytes, we wll need to move bytes around to make that possible.
/// Therefore, you may decide to encode differently to use more bytes than you actually would need, if it is possible for you to do so.
/// That way, you waste the reserved bytes, but improve writing performance.
func writeIntegerWithReservedCapacity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

George's note here is good, we can probably remove the note in the name.

to buffer: inout ByteBuffer
) -> Int

/// An estimate of the bytes required to write integers using ths strategy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// An estimate of the bytes required to write integers using ths strategy.
/// An estimate of the bytes required to write integers using this strategy.

var requiredBytesHint: Int { get }

/// Write an integer to a buffer. Move the writer index to after the written integer.
/// Call this function if you have already reserved some capacity for an integer to be written.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tone of this sentence seems wrong: it's an instruction to the caller, but the caller is us.


/// Write an integer to a buffer. Move the writer index to after the written integer.
/// Call this function if you have already reserved some capacity for an integer to be written.
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.

/// Call this function if you have already reserved some capacity for an integer to be written.
/// Implementors should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.
/// Otherwise, callers may need to shift bytes to reconcile the difference.
/// It is up to the implementor to find the balance between performance and size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// It is up to the implementor to find the balance between performance and size.
/// It is up to the implementer to find the balance between performance and size.

/// Read a binary encoded integer, moving the `readerIndex` appropriately.
/// If there are not enough bytes, nil is returned.
@inlinable
public mutating func readEncodedInteger<Strategy: NIOBinaryIntegerEncodingStrategy>(_ strategy: Strategy) -> Int? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be generic over FixedWidthInteger, the way the existing readInteger and writeInteger functions are.

//===----------------------------------------------------------------------===//

extension ByteBuffer {
public struct QUICBinaryEncodingStrategy: NIOBinaryIntegerEncodingStrategy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have a doc comment here.


/// - Parameter requiredBytesHint: An estimate of the bytes required to write integers using this strategy
@inlinable
public init(requiredBytesHint: Int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think it's useful to make this configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases, a user might have a rough idea of how many bytes they're expecting to write. In such cases, allowing them to use a smaller number would result in saved bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be, make this not configurable, but then add a parameter to ByteBuffer/writeLengthPrefixed(strategy:writeData:)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaning towards the alternative
Having the requiredBytesHint parameter here is quite confusing

Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks really good. Most of my comments are doc nits but a couple are minor API tweaks that I think we should fix.

Comment on lines 15 to 16
/// Describes a way to encode and decode an integer as bytes
/// For more information, see <doc:ByteBuffer-lengthPrefix>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this once sentence or two? We need some punctuation between the two otherwise it'll be rendered as "... bytes For more information, ..."


/// Write an integer to a buffer. Move the writer index to after the written integer.
/// - Parameters:
/// - integer: The type of the integer to write.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// - integer: The type of the integer to write.
/// - integer: The integer to write.


/// Write an integer to a buffer. Move the writer index to after the written integer.
/// This function will be called when an integer needs to be written, and some capacity has already been reserved for it.
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Implementers should consider using a less efficient encoding, if possible,to fit exactly within the reserved capacity.
/// Implementers should consider using a less efficient encoding, if possible, to fit exactly within the reserved capacity.

Comment on lines 9 to 10
integer encodings, in which smaller numbers can be encoded in fewer bytes. The first 2 bits of the first byte indicate
how many further bytes should be read.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the QUIC example is great but don't think we need the detail about how the QUIC strategy works.

Suggested change
integer encodings, in which smaller numbers can be encoded in fewer bytes. The first 2 bits of the first byte indicate
how many further bytes should be read.
integer encodings, in which smaller numbers can be encoded in fewer bytes.


- Encode the length of the integer into the integer itself when writing, so it knows how many bytes to read when
reading. This is what QUIC does.
- Always use the same length, e.g. a simple strategy which always writes the integer as a UInt64.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Always use the same length, e.g. a simple strategy which always writes the integer as a UInt64.
- Always use the same length, e.g. a simple strategy which always writes the integer as a `UInt64`.

- If the length ends up needing more bytes than we had reserved, shuffle the data forward to make space

This code will be most performant when the `requiredBytesHint` is exactly correct, because it will avoid needing to
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the writeInteger function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the writeInteger function
shuffle any bytes. With that in mind, we can actually make one more optimisation: when we call the `writeInteger` function

) -> Int
```

Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For e.g., in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "e.g." is the abbreviation of "exempli gratia" which means "for the sake of example", so no need to write "For e.g.", Just use "e.g." or "for example"

Suggested change
Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For e.g., in
Many strategies will not be able to do anything useful with the additional `reservedCapacity` parameter. For example, in

Comment on lines 83 to 86
public mutating func readEncodedInteger<Strategy: NIOBinaryIntegerEncodingStrategy, Integer: FixedWidthInteger>(
_ strategy: Strategy,
as: Integer.Type = Integer.self
) -> Integer? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readEncodedInteger(.quic) is a bit odd when using type inference (and is different to writeEncodedInteger(42, strategy: .quic)).

I think this should be spelled as readEncodedInteger(as:strategy:) which works well with or without type inference and mirrors the write API nicely.

}

/// Prefixes bytes written by `writeData` with the number of bytes written.
/// The number of bytes written is encoded using `strategy`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The number of bytes written is encoded using `strategy`
/// The number of bytes written is encoded using `strategy`.

/// If there are not enough bytes to read the full slice, the readerIndex will stay unchanged.
@inlinable
public mutating func readLengthPrefixedSlice<Strategy: NIOBinaryIntegerEncodingStrategy>(
_ strategy: Strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about dropping the label on the strategy here, I think we should keep the strategy label.

Copy link
Contributor

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good modulo one doc nit!

@@ -129,10 +131,14 @@ extension ByteBuffer {

extension NIOBinaryIntegerEncodingStrategy where Self == ByteBuffer.QUICBinaryEncodingStrategy {
@inlinable
/// Encodes bytes as defined in RFC 9000 § 16
/// - Parameter requiredBytesHint: An estimate of the bytes required to write integers using this strategy
/// - Returns: <#description#>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one needs filling out, <#description#>

}

@usableFromInline
func bytesNeededForInteger<IntegerType: FixedWidthInteger>(_ integer: IntegerType) -> Int {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think exposing this logic publicly would be useful. Consider a case where I want to write multiple variable length integers to a buffer - how can I appropriately size the buffer's capacity ahead of time? I can assume they'll all need 8 bytes, but then I'm going to over allocate.

var buffer = allocator.buffer(capacity: enoughRoomForTwoVarints)
buffer.writeInteger(myInt, strategy: .quic)
buffer.writeInteger(myOtherInt, strategy: .quic)

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, let's get this landed.

@hamzahrmalik hamzahrmalik enabled auto-merge (squash) October 2, 2024 09:27
@hamzahrmalik hamzahrmalik merged commit fb87d5e into apple:main Oct 2, 2024
28 of 29 checks passed
var buffer = ByteBuffer()
let strategy = ByteBuffer.QUICBinaryEncodingStrategy.quic
let bytesWritten = strategy.writeInteger(
0b00000010_00011001_01111100_01011110_11111111_00010100_11101000_10001100,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@finagolfin @Lukasa @FranzBusch can we integrate the Android & 32 bit CIs into NIO's own github actions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Franz and I discussed getting Android tested on this CI, I'm aiming to have something for that in the coming months. However, that just builds the source for armv7 but wouldn't run the tests.

If you have a 32-bit CI like watchOS or embedded that you could add quickly, that may be a better first step to get 32-bit platforms tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised a PR to fix this: #2904

hamzahrmalik added a commit that referenced this pull request Oct 7, 2024
… systems (#2904)

This is a followup to #2867

Some of the literals used in the test cases were too big to fit in an
Int32
This is fine on 64-bit systems, because the literals are considered as
`Int`, which is Int64 on those systems
However, on 32-bit systems, those literals are considered as Int64

Change: Add `as Int64` where needed, to tell the compiler we want these
literals to be treated as Int64, which should allow these tests to run
on 32 bit systems too
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants