Instead of varlen encoding, which requires up to 8 conditional branches
in a loop, we use a simpler encoding which is just 4 instructions
without branching.
New encoding:
* length<=254: <1-byte length>
* otherwise: <0xFF> <usize length> <0xFF>
Improves throughputs by 10-80%, with the biggest improvement in
resolve().
get_or_intern/fill-empty/new/BufferBackend
time: [6.4562 ms 6.5068 ms 6.5684 ms]
thrpt: [15.224 Melem/s 15.369 Melem/s 15.489 Melem/s]
change:
time: [-8.4289% -7.3761% -6.1844%] (p = 0.00 < 0.05)
thrpt: [+6.5920% +7.9635% +9.2048%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
get_or_intern/fill-empty/with_capacity/BufferBackend
time: [3.8182 ms 3.8467 ms 3.8796 ms]
thrpt: [25.776 Melem/s 25.996 Melem/s 26.190 Melem/s]
change:
time: [-18.750% -18.024% -17.246%] (p = 0.00 < 0.05)
thrpt: [+20.840% +21.986% +23.077%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
get_or_intern/already-filled/BufferBackend
time: [2.7638 ms 2.8026 ms 2.8477 ms]
thrpt: [35.116 Melem/s 35.681 Melem/s 36.182 Melem/s]
change:
time: [-7.9910% -6.4121% -4.8432%] (p = 0.00 < 0.05)
thrpt: [+5.0897% +6.8514% +8.6851%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
get_or_intern_static/BufferBackend/get_or_intern
time: [2.8467 µs 2.8632 µs 2.8807 µs]
thrpt: [18.051 Melem/s 18.162 Melem/s 18.267 Melem/s]
change:
time: [-1.7629% +0.8260% +4.7452%] (p = 0.71 > 0.05)
thrpt: [-4.5302% -0.8193% +1.7945%]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
get_or_intern_static/BufferBackend/get_or_intern_static
time: [2.8259 µs 2.8456 µs 2.8634 µs]
thrpt: [18.160 Melem/s 18.274 Melem/s 18.401 Melem/s]
change:
time: [-6.7386% -3.1051% -0.1965%] (p = 0.06 > 0.05)
thrpt: [+0.1969% +3.2046% +7.2255%]
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
resolve/already-filled/BufferBackend
time: [239.71 µs 242.17 µs 245.19 µs]
thrpt: [407.85 Melem/s 412.93 Melem/s 417.16 Melem/s]
change:
time: [-46.443% -45.470% -44.561%] (p = 0.00 < 0.05)
thrpt: [+80.377% +83.386% +86.719%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
get/already-filled/BufferBackend
time: [2.3035 ms 2.3222 ms 2.3427 ms]
thrpt: [42.686 Melem/s 43.062 Melem/s 43.411 Melem/s]
change:
time: [-2.6436% -1.3829% -0.0703%] (p = 0.03 < 0.05)
thrpt: [+0.0703% +1.4023% +2.7154%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
iter/already-filled/BufferBackend
time: [278.13 µs 279.55 µs 281.19 µs]
thrpt: [355.64 Melem/s 357.72 Melem/s 359.54 Melem/s]
change:
time: [-46.368% -42.429% -39.796%] (p = 0.00 < 0.05)
thrpt: [+66.102% +73.699% +86.454%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe