Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement LEB128 stdlib #2098

Closed
furesoft opened this issue Apr 22, 2024 · 2 comments
Closed

Implement LEB128 stdlib #2098

furesoft opened this issue Apr 22, 2024 · 2 comments
Labels

Comments

@furesoft
Copy link

Add Buffer.addNumber(Number) that appends the bytes of the number based on the varint algorithm.

Some file formats or protocols rely on that feature so it could be helpful to have this function

@ospencer
Copy link
Member

Hey @furesoft, thanks for raising this.

Grain's Number type is a composition of many number subtypes: integers, bigints, floats, rationals. As such, they all have different byte layouts and we wouldn't be able to generically implement an setNumber as it wouldn't be meaningful or have some special unknown encoding that also wouldn't be very helpful. We could of course do it for the integer-specific types.

That said though, there are various varint algorithms and I don't think it makes sense to lock Buffer to just one. For the ones we'd want to implement, they should just be their own libraries. The one that makes the most sense for Grain (in the context of WebAssembly and such) is LEB128. It could have functions that encode to Bytes or directly into a Buffer. We would likely keep it to the fixed-width ints for now, as that would cover most people's use cases and explore an implementation for BigInt later, mostly because Grain bigints are not two's complement.

While not necessary for a basic implementation, we should really consider SIMD for this because naive LEB128 encoding/decoding is slow.

The algorithm to consider would be Vectorized VByte Decoding. It uses 16 SIMD instructions. 15 of them are directly implemented by WebAssembly, with pmovsxbd able to be implemented by a combination of i16x8.extend_low_i8x16_u and i32x4.extend_low_i16x8_u.

We can probably implement the naive version first, as it will take some time to get SIMD instructions into Grain.

@ospencer ospencer changed the title Add VarInt Algorithm to store numbers more efficient in data transfer Implement LEB128 stdlib Apr 22, 2024
@spotandjake
Copy link
Member

spotandjake commented Aug 16, 2024

Instead of a specific LEB128 library given it would really only have 4 encoding/decoding libraries for uleb and leb. what if we add a Number.encode(number, format=LEB127) -> Bytes on the number library and to the bytes and buffer libries we add a getLeb127(position) and the respective unsighned versions. alternatively we could have a getNumber(position, format=LEB127) which leaves this api open for more number formats in the future?

We can also add Buffer.encodeNumber(x, format=LEB127) if we wanted as an option to skip the allocation from allocating a byte region, though I think that would be a pretty minimal optimization in most cases.

@furesoft furesoft closed this as completed Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants