-
Notifications
You must be signed in to change notification settings - Fork 30
Encoded Histogram format
The Encoded Histogram format is a binary format of a histogram header and its count values. The format is used as part of the V2-Log-Encoding to allow histograms to be persisted or shared.
[Cookie][LengthOfCompressedContents][CompressedHeader]
At a high level, the format is just the cookie to identify the type of histogram, the length of the compressed contents and then the compressed contents.
The cookie is a 32bit integer used to identify the characteristics of the subsequent data. The cookie is used to identify the version of the encoding format.
The current format uses a constant value of 478450452
(or 0x1c849314
in hex) for the cookie to represent a v2 encoding.
As a side note cookie values have been chosen so that when they are base64Encoded the output string will start with "HIST"
This 32 bit integer value simply defines how long the [CompressedHeader]
content is.
DEFALTE(
[Cookie][payloadLengthInBytes][normalizingIndexOffset][numberOfSignificantValueDigits][lowestTrackableUnitValue][highestTrackableValue][integerToDoubleValueConversionRatio][CompressedContents]
)
The Zlib compressed payload of an HdrHistogram. The uncompressed payload is stored in binary in the following format
01234567890123456789012345678901234567890
[A ][B ][C ][D ][E ][F ][G ][H....>
- A. Cookie value. 32bit integer (4bytes) that stores a code for the type of Histogram.
- B. Payload length as a 32bit integer (4bytes).
- C. NomalizingIndex offset as an 32bit integer (4bytes). Currently not implemented in HdrHistogram.NET
- D. Number of significant value digits (4bytes)
- E. Lowest Trackable Unit value as a 64bit integer (8bytes).
- F. Highest Trackable Unit value as a 64bit integer (8bytes).
- G. Integer to double conversion ratio as a double floating point number (8bytes)
- H. Compressed contents
ZigZaggedEncoding(
[count]*
)
A ZigZag encoded (LEB128-64b9B-variant format) compression of the counts for an HdrHistogram. The LEB128-64b9B-variant format is
- Little Endian Base128 Encoding
- 64bit values store as a maximum of 9Bytes. The standard LEB128 can use 10Bytes.
- Only supports values up to 2^63 (not 2^64) i.e. HdrHistogram only support positive signed long values for counts.
- Zig Zagged so that common/small positive and negative values use the least amount of space. As counts are only allowed to be 0 or a positive 64bit number, we can use the negative range of numbers to represent consecutive zero values. As it can be very common for there to be numerous consecutive buckets with counts of zero, this can provide significant space saving.
Consecutive 0 counts will be compressed into a single negative number indicating the span of the zero count sequence. Positive counts will be represented as the actual number. Counts are recorded as 64bit integers.
LEB128's variable length encoding provides for using a smaller number of bytes for smaller values, and the use of ZigZag encoding allows small(closer to zero) negative values to use fewer bytes.
The LEB128-64b9B-variant encoding used here diverges from the "original" LEB128 as it extends to 64 bit values. In the original LEB128, a 64 bit value can take up to 10 bytes in the stream, where this variant's encoding of a 64 bit values will max out at 9 bytes. As such, this encoder/decoder should NOT be used for encoding or decoding "standard" LEB128 formats (e.g.Google Protocol Buffers).
The follow sample will deconstruct the following encoded histogram.
HISTFAAAAEV42pNpmSzMwMCgyAABTBDKT4GBgdnNYMcCBvsPEBEJISEuATEZMQ4uASkhIR4nrxg9v2lMaxhvMekILGZkKmcCAEf2CsI=
First we Base64 decode the string to a byte array.
[28,132,147,20,0,0,0,69,120,218,147,105,153,44,204,192,192,160,200,0,1,76,16,
202,79,129,129,129,217,205,96,199,2,6,251,15,16,17,9,33,33,46,1,49,25,49,14,
46,1,41,33,33,30,39,175,24,61,191,105,76,107,24,111,49,233,8,44,102,100,42,
103,2,0,71,246,10,194]
From this array we can read out the first 4 bytes as the Cookie
, and the next 4 bytes as the CompressedPayloadLength
Name | Bytes | Type | Value |
---|---|---|---|
Cookie | [28,132,147,20] | 32 bit integer | 478450452 |
CompressedPayloadLength | [0,0,0,69] | 32 bit integer | 69 |
The remainder of the decoded byte array is the compressed contents. As the cookie value equals the expected value for a V2 encoding, we know the header of the compressed data will be 40 uncompressed bytes long.
So we deflate the our compressed contents to get the following byte array
[28,132,147,19,0,0,0,33,0,0,0,0,0,0,0,2,0,0,0,0,0,0,78,32,0,0,3,70,48,184,
160,0,63,240,0,0,0,0,0,0,24,18,18,10,16,22,28,22,8,10,16,26,18,18,12,66,74,
92,46,78,150,2,172,1,218,2,44,16,163,1,2,119,2]
Of this byte array the first 40 bytes is the header
Name | Bytes | Type | Value |
---|---|---|---|
Cookie | [28,132,147,19] | 32 bit integer | 478450451 |
Payload length | [0,0,0,33] | 32 bit integer | 33 |
Normalizing index offset | [0,0,0,0] | 32 bit integer | 0 |
Number of significant value digits | [0,0,0,2] | 32 bit integer | 2 |
Lowest trackable value | [0,0,0,0,0,0,78,32] | 64 bit integer | 20,000 |
Highest trackable value | [0,0,3,70,48,184,160,0] | 64 bit integer | 3,600,000,000,000 |
Integer to double conversion ratio | [63,240,0,0,0,0,0,0] | double precision float | 1.0 |
The remaining bytes of the deflated contents represent the ZigZag encoded histogram values.
[24,18,18,10,16,22,28,22,8,10,16,26,18,18,12,66,74,
92,46,78,150,2,172,1,218,2,44,16,163,1,2,119,2]
Decoded value |
---|
12 |
9 |
9 |
5 |
8 |
11 |
14 |
11 |
4 |
5 |
8 |
13 |
9 |
9 |
6 |
33 |
37 |
46 |
23 |
39 |
139 |
86 |
173 |
22 |
8 |
-82 |
1 |
-60 |
1 |
Note that the negative value represent consecutive 0 value counts. With this in mind the values for the Histogram translate to
Index | Value |
---|---|
0 | 12 |
1 | 9 |
2 | 9 |
3 | 5 |
4 | 8 |
5 | 11 |
6 | 14 |
7 | 11 |
8 | 4 |
9 | 5 |
10 | 8 |
11 | 13 |
12 | 9 |
13 | 9 |
14 | 6 |
15 | 33 |
16 | 37 |
17 | 46 |
18 | 23 |
19 | 39 |
20 | 139 |
21 | 86 |
22 | 173 |
23 | 22 |
24 | 8 |
107 | 1 |
168 | 1 |
A LinqPad code sample in C# - SampleDecoding.linq
- ZLIB RCF-1950 compression - [https://en.wikipedia.org/wiki/Zlib], [https://www.ietf.org/rfc/rfc1950.txt]
- Base64 Encoding - [https://en.wikipedia.org/wiki/Base64], [https://www.base64encode.org/]
- ZigZag LEB128 compression - [https://en.wikipedia.org/wiki/LEB128], [https://gist.github.com/mfuerstenau/ba870a29e16536fdbaba]