Storing decimals in minimal bytes #105684
-
For historical reasons, aligning data at whole powers of 2 can bring some benefit when processing in memory. For storage, compressing makes sense. The compression method in this post seems to leverage the fact that only a small number of significant bits within the value are typically used. If you have clear precision requirements in your system, you can also look at the IEEE 754 Decimal32/64 types (planned for .NET 10). They do not carry excessive unused precision and are more widely accepted by other systems.
-
Background:
Decimals are great for representing human-made data. Humans enter data in decimal format and like to read data in decimal format (ignoring historical alternatives for now). The decimal type in C# is perfect for this, but it takes up 16 bytes: 12 bytes store the mantissa and the remaining 4 contain a sign flag and the scale (0-28).
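To make the 16-byte layout concrete, here is a minimal sketch that uses `decimal.GetBits` to split a value into its three 32-bit mantissa parts, the sign, and the scale. The `DecimalLayout` class and its `Split` method are hypothetical names used only for illustration.

```csharp
using System;

static class DecimalLayout
{
    // Splits a decimal into its 96-bit mantissa (as three uint parts),
    // its sign flag, and its scale (the number of decimal places, 0-28).
    public static (uint Lo, uint Mid, uint Hi, bool Negative, byte Scale) Split(decimal value)
    {
        // GetBits returns four ints: lo, mid, hi, and a flags word.
        int[] bits = decimal.GetBits(value);
        int flags = bits[3];

        bool negative = flags < 0;                  // bit 31 is the sign
        byte scale = (byte)((flags >> 16) & 0xFF);  // bits 16-23 hold the scale

        return (unchecked((uint)bits[0]),
                unchecked((uint)bits[1]),
                unchecked((uint)bits[2]),
                negative,
                scale);
    }
}
```

For example, `DecimalLayout.Split(-12.34m)` yields a mantissa of 1234 in the low part, a set sign flag, and a scale of 2. The other two mantissa parts are zero, which is exactly the unused space a compact encoding can skip.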
In a real-life use case, I am storing logs with decimal data. These logs get big. Compression algorithms such as Brotli are great for reducing the size. Still, I wondered if the decimals could be stored more efficiently. So I came up with a system that stores decimals in 1-14 bytes.
It works by treating a decimal as a 96-bit unsigned integer (just like decimal itself) that is multiplied by a power of 10.
The power of 10 is between -28 and 28. The uint96 is split into 3x uint32. These are stored in their smallest representation. Zeroes are left out. The number of bytes used to store the uint96 is stored in a header byte. The header byte also has bit flags for the sign and for whether or not there is a power of 10/scale byte.
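As a rough illustration of the scheme described above (not the code from this post), the following sketch encodes a decimal as a header byte, the non-zero low-order bytes of the 96-bit integer, and an optional scale byte. The header layout is an assumption: the low 4 bits hold the mantissa byte count (0-12), bit 4 is the sign, and bit 5 marks the presence of a scale byte. The sketch also simplifies two things: it trims the integer as one 12-byte little-endian value rather than as three separate uint32 parts, and it stores only decimal's own scale (0-28), so positive powers of 10 are not handled. `CompactDecimal`, `Encode`, and `Decode` are hypothetical names.

```csharp
using System;

static class CompactDecimal
{
    const byte SignFlag = 0x10;   // assumed position of the sign bit
    const byte ScaleFlag = 0x20;  // assumed position of the "scale byte follows" bit

    public static byte[] Encode(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        bool negative = bits[3] < 0;
        byte scale = (byte)((bits[3] >> 16) & 0xFF);

        // Lay the 96-bit integer out as 12 little-endian bytes.
        byte[] mantissa = new byte[12];
        for (int i = 0; i < 12; i++)
            mantissa[i] = (byte)((bits[i / 4] >> (8 * (i % 4))) & 0xFF);

        // Leave out the high-order zero bytes.
        int count = 12;
        while (count > 0 && mantissa[count - 1] == 0) count--;

        byte header = (byte)count;
        if (negative) header |= SignFlag;
        if (scale != 0) header |= ScaleFlag;

        byte[] result = new byte[1 + count + (scale != 0 ? 1 : 0)];
        result[0] = header;
        Array.Copy(mantissa, 0, result, 1, count);
        if (scale != 0) result[1 + count] = scale;
        return result;
    }

    public static decimal Decode(byte[] data)
    {
        byte header = data[0];
        int count = header & 0x0F;

        // Rebuild the three 32-bit parts from the stored bytes.
        int[] parts = new int[3];
        for (int i = 0; i < count; i++)
            parts[i / 4] |= data[1 + i] << (8 * (i % 4));

        byte scale = (header & ScaleFlag) != 0 ? data[1 + count] : (byte)0;
        return new decimal(parts[0], parts[1], parts[2], (header & SignFlag) != 0, scale);
    }
}
```

Under these assumptions `0m` encodes to a single byte, `-12.34m` to 4 bytes (header, two mantissa bytes, scale), and a value that uses all 96 bits together with a scale to 14 bytes, which matches the 1-14 byte range mentioned earlier.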
The result is:
What do you think of this?
Code:
Tests:
Edit: I ran some tests where the decimals were first compressed using the code below and then Brotli-compressed, versus Brotli compression alone.
The files (real world data) contain other data types as well. I ran 3 tests:
Based on this I conclude that the decimal values take up 420858 - 352697 = 68161 bytes in the baseline and 399007 - 352697 = 46310 bytes in the test. This means the decimal compression reduced the bytes required for decimals by 68161 - 46310 = 21851, or ~32%, a result that I am happy with.