add support for variable ints #26

Open
sydp opened this issue Oct 16, 2021 · 4 comments

Comments

@sydp

sydp commented Oct 16, 2021

Some data formats such as sqlite and leveldb use varints to efficiently serialise integer values.

@joachimmetz
Member

joachimmetz commented Oct 16, 2021

I assume different data formats use different varint formats/specifications. Do you know which ones are used by sqlite and leveldb?

Also, Python is not the most efficient language for doing many bit operations.

Context: https://wiki.vg/Data_types#VarInt_and_VarLong

@sydp
Author

sydp commented Oct 17, 2021

From what I can tell, sqlite3 supports 64-bit big-endian signed (two's complement) varints [0], whilst leveldb uses unsigned little-endian varints of size 32-bit [1] and 64-bit [2].

[0] https://sqlite.org/fileformat2.html
A variable-length integer or "varint" is a static Huffman encoding of 64-bit twos-complement integers that uses less space for small positive values. A varint is between 1 and 9 bytes in length. The varint consists of either zero or more bytes which have the high-order bit set followed by a single byte with the high-order bit clear, or nine bytes, whichever is shorter. The lower seven bits of each of the first eight bytes and all 8 bits of the ninth byte are used to reconstruct the 64-bit twos-complement integer. Varints are big-endian: bits taken from the earlier byte of the varint are more significant than bits taken from the later bytes.

[1] https://github.com/google/leveldb/blob/c5d5174a66f02e66d8e30c21ff4761214d8e4d6d/util/coding.cc#L21
[2] https://github.com/google/leveldb/blob/c5d5174a66f02e66d8e30c21ff4761214d8e4d6d/util/coding.cc#L55
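
To make the difference between the two layouts concrete, here is a minimal Python sketch of both decoders. It follows the sqlite description quoted under [0] and the usual 7-bits-per-byte, least-significant-group-first scheme that the leveldb links point to. The function names, the `(value, size)` return shape and the `max_bits` parameter are made up for illustration and are not part of any existing API.

```python
def read_sqlite_varint(data, offset=0):
    """Decode a sqlite-style varint; returns (value, number_of_bytes).

    Hypothetical helper: big-endian 7-bit groups, with a ninth byte
    (if present) contributing all 8 of its bits.
    """
    value = 0
    for index in range(8):
        byte = data[offset + index]
        # Earlier bytes are more significant, so shift before merging.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            # Fewer than 9 bytes: at most 56 bits, so never negative.
            return value, index + 1
    # Ninth byte: all 8 bits are payload, giving 64 bits in total.
    value = (value << 8) | data[offset + 8]
    if value >= 1 << 63:
        # Reinterpret as a 64-bit two's complement integer.
        value -= 1 << 64
    return value, 9


def read_le_varint(data, offset=0, max_bits=64):
    """Decode an unsigned little-endian varint; returns (value, size).

    Hypothetical helper: max_bits=32 allows up to 5 bytes and
    max_bits=64 up to 10 bytes.
    """
    value = 0
    shift = 0
    index = offset
    while shift < max_bits:
        byte = data[index]
        index += 1
        # Later bytes are more significant.
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            # Assumption: truncate to max_bits, as a fixed-width
            # C integer would.
            return value & ((1 << max_bits) - 1), index - offset
        shift += 7
    raise ValueError("varint is longer than max_bits allows")


print(read_sqlite_varint(b"\x81\x00"))  # (128, 2)
print(read_le_varint(b"\xac\x02"))      # (300, 2)
```

Note that the same value needs different byte sequences in the two schemes, so a single "varint" data type definition would at least need a byte-order setting and a way to express the sqlite ninth-byte special case.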

@sydp
Author

sydp commented Oct 17, 2021

Furthermore, note ZigZag encoding for signed varints in protobufs:

https://developers.google.com/protocol-buffers/docs/encoding#signed_integers
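
For reference, a minimal Python sketch of the ZigZag mapping described at that link, which interleaves negative and non-negative values (0, -1, 1, -2, ... become 0, 1, 2, 3, ...) so that small magnitudes stay small once varint-encoded. The function names and the `bits` parameter are made up for illustration.

```python
def zigzag_encode(value, bits=64):
    """Map a signed integer to an unsigned one (hypothetical helper)."""
    # Python's >> on a negative int is an arithmetic shift, so
    # value >> (bits - 1) is -1 for negatives and 0 otherwise.
    return ((value << 1) ^ (value >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(value):
    """Inverse of zigzag_encode (hypothetical helper)."""
    # The lowest bit carries the sign, the remaining bits the magnitude.
    return (value >> 1) ^ -(value & 1)


print([zigzag_encode(number) for number in (0, -1, 1, -2, 2)])
# [0, 1, 2, 3, 4]
```

This step would sit on top of an unsigned varint decoder rather than replace it: the raw varint is read as unsigned first and then passed through the ZigZag decode.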

@joachimmetz
Member

Would need to think a bit on how to best integrate these into the definitions.
