Implement LEB128 stdlib #2098

furesoft · 2024-04-22T12:13:20Z

Add Buffer.addNumber(Number) that appends the bytes of the number based on the varint algorithm.

Some file formats or protocols rely on that feature so it could be helpful to have this function

ospencer · 2024-04-22T14:24:05Z

Hey @furesoft, thanks for raising this.

Grain's Number type is a composition of many number subtypes: integers, bigints, floats, rationals. As such, they all have different byte layouts and we wouldn't be able to generically implement an setNumber as it wouldn't be meaningful or have some special unknown encoding that also wouldn't be very helpful. We could of course do it for the integer-specific types.

That said though, there are various varint algorithms and I don't think it makes sense to lock Buffer to just one. For the ones we'd want to implement, they should just be their own libraries. The one that makes the most sense for Grain (in the context of WebAssembly and such) is LEB128. It could have functions that encode to Bytes or directly into a Buffer. We would likely keep it to the fixed-width ints for now, as that would cover most people's use cases and explore an implementation for BigInt later, mostly because Grain bigints are not two's complement.

While not necessary for a basic implementation, we should really consider SIMD for this because naive LEB128 encoding/decoding is slow.

The algorithm to consider would be Vectorized VByte Decoding. It uses 16 SIMD instructions. 15 of them are directly implemented by WebAssembly, with pmovsxbd able to be implemented by a combination of i16x8.extend_low_i8x16_u and i32x4.extend_low_i16x8_u.

We can probably implement the naive version first, as it will take some time to get SIMD instructions into Grain.

spotandjake · 2024-08-16T18:19:24Z

Instead of a specific LEB128 library given it would really only have 4 encoding/decoding libraries for uleb and leb. what if we add a Number.encode(number, format=LEB127) -> Bytes on the number library and to the bytes and buffer libries we add a getLeb127(position) and the respective unsighned versions. alternatively we could have a getNumber(position, format=LEB127) which leaves this api open for more number formats in the future?

We can also add Buffer.encodeNumber(x, format=LEB127) if we wanted as an option to skip the allocation from allocating a byte region, though I think that would be a pretty minimal optimization in most cases.

ospencer changed the title ~~Add VarInt Algorithm to store numbers more efficient in data transfer~~ Implement LEB128 stdlib Apr 22, 2024

spotandjake added the stdlib label Jun 10, 2024

furesoft closed this as completed Nov 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement LEB128 stdlib #2098

Implement LEB128 stdlib #2098

furesoft commented Apr 22, 2024

ospencer commented Apr 22, 2024

spotandjake commented Aug 16, 2024 •

edited

Loading

Implement LEB128 stdlib #2098

Implement LEB128 stdlib #2098

Comments

furesoft commented Apr 22, 2024

ospencer commented Apr 22, 2024

spotandjake commented Aug 16, 2024 • edited Loading

spotandjake commented Aug 16, 2024 •

edited

Loading