
Use const-generics #19

Closed
gendx opened this issue Dec 10, 2019 · 4 comments · Fixed by #79
Labels
blocked performance Make lzma-rs faster!

Comments


gendx commented Dec 10, 2019

For now, the DecoderState and BitTree structs use memory allocations (in Vec) to store their state, despite having sizes known at compile time. This is mostly to avoid code duplication.

Once const generics are stable enough, they should be used instead to remove the need for dynamic allocation, and make the whole state struct more compact and contiguous (therefore potentially more cache-friendly).

This should hopefully improve performance a bit.
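The difference between the two layouts can be sketched as follows (illustrative names only, not lzma-rs's actual API): today's Vec-backed tree keeps its probabilities behind a heap pointer, while a const-generic version stores them inline, so the whole state can live in one contiguous block.

```rust
// Current approach: probabilities live in a separate heap allocation,
// with the size decided at runtime.
struct BitTreeVec {
    num_bits: usize,
    probs: Vec<u16>, // len == 1 << num_bits
}

impl BitTreeVec {
    fn new(num_bits: usize) -> Self {
        BitTreeVec { num_bits, probs: vec![0x400; 1 << num_bits] }
    }
}

// Const-generic approach: the size is part of the type, so the
// probabilities are stored inline with no indirection.
struct BitTreeArray<const PROBS: usize> {
    probs: [u16; PROBS], // caller passes PROBS = 1 << num_bits
}

impl<const PROBS: usize> BitTreeArray<PROBS> {
    fn new() -> Self {
        BitTreeArray { probs: [0x400; PROBS] }
    }
}

fn main() {
    let v = BitTreeVec::new(3);
    let a = BitTreeArray::<8>::new();
    assert_eq!(v.probs.len(), 1 << v.num_bits);
    assert_eq!(v.probs.len(), a.probs.len());
}
```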

@gendx gendx added performance Make lzma-rs faster! blocked labels Dec 10, 2019
@npmccallum

Const generics are now stable in 1.51.


gendx commented May 3, 2021

I had a brief look: the min_const_generics feature was indeed stabilized in 1.51, but we still need the more complex const_evaluatable_checked feature because of the 1 << num_bits expression here.

In the meantime, it might be worth experimenting with it on nightly to see if benchmarks would improve with const generics.
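To illustrate the limitation (with hypothetical names): writing the array length as an expression of a const parameter is exactly what stable min_const_generics forbids, so the ideal signature only compiles behind the unstable feature.

```rust
// What we'd like to write: derive the array length from the number of
// bits directly in the type. On stable Rust this does NOT compile --
// expressions over const parameters need the unstable
// `generic_const_exprs` feature (formerly `const_evaluatable_checked`):
//
//     struct BitTree<const NUM_BITS: usize> {
//         probs: [u16; 1 << NUM_BITS], // error on stable
//     }
//
// Stable min_const_generics only allows a bare parameter as the length:
struct BitTree<const PROBS: usize> {
    probs: [u16; PROBS],
}

fn main() {
    // The caller must compute `1 << num_bits` itself.
    let t: BitTree<{ 1 << 3 }> = BitTree { probs: [0x400; 1 << 3] };
    assert_eq!(t.probs.len(), 8);
}
```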


chyyran commented Aug 6, 2022

I've been doing some experiments in https://github.com/chyyran/lzma-rs/commits/feature-perf-experiments and found that const generics in BitTree and RangeCoder don't actually contribute much of a performance benefit, except for large dictionaries.

Without const generics (but with some additional optimizations) SnowflakePowered@ecc71a0

running 8 tests
test compress_65536                  ... bench:   1,372,700 ns/iter (+/- 32,501)
test compress_empty                  ... bench:         713 ns/iter (+/- 35)
test compress_hello                  ... bench:       1,021 ns/iter (+/- 50)
test decompress_after_compress_65536 ... bench:   2,463,569 ns/iter (+/- 77,148)
test decompress_after_compress_empty ... bench:       1,720 ns/iter (+/- 146)
test decompress_after_compress_hello ... bench:       2,219 ns/iter (+/- 50)
test decompress_big_file             ... bench:   3,959,745 ns/iter (+/- 3,155,824)
test decompress_huge_dict            ... bench:       2,221 ns/iter (+/- 136)

With const generics SnowflakePowered@ad7d096

running 8 tests
test compress_65536                  ... bench:   1,379,341 ns/iter (+/- 160,163)
test compress_empty                  ... bench:         711 ns/iter (+/- 57)
test compress_hello                  ... bench:       1,016 ns/iter (+/- 30)
test decompress_after_compress_65536 ... bench:   2,442,160 ns/iter (+/- 125,797)
test decompress_after_compress_empty ... bench:         544 ns/iter (+/- 110)
test decompress_after_compress_hello ... bench:       1,003 ns/iter (+/- 47)
test decompress_big_file             ... bench:   3,939,209 ns/iter (+/- 177,650)
test decompress_huge_dict            ... bench:       1,065 ns/iter (+/- 40)

Probably worth pursuing in the future because of that benefit, but SnowflakePowered@ad7d096 uses kind of a nasty workaround for the lack of #[feature(generic_const_exprs)]: it puts 1 << N into a const fn and passes the result in at the constructor site.
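The workaround described above can be sketched like this (illustrative names, not the actual patch): the shift is evaluated in a const fn, and its result becomes the const parameter at each call site, which the caller must keep in sync with the intended number of bits.

```rust
// Since `[u16; 1 << N]` is rejected on stable, compute `1 << N` in a
// const fn and pass the result in as the const parameter instead.
const fn probs_len(num_bits: usize) -> usize {
    1 << num_bits
}

struct BitTree<const PROBS: usize> {
    probs: [u16; PROBS],
}

impl<const PROBS: usize> BitTree<PROBS> {
    const fn new() -> Self {
        BitTree { probs: [0x400; PROBS] }
    }
}

fn main() {
    // Each constructor site repeats the computation; nothing ties
    // PROBS back to a "number of bits" in the type system, which is
    // why this is "kind of a nasty workaround".
    let tree: BitTree<{ probs_len(6) }> = BitTree::new();
    assert_eq!(tree.probs.len(), 64);
}
```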


chyyran commented Aug 9, 2022

Vec2D from #77 lets us easily experiment with different backing stores for the literal_probs array, which is the main allocation.
I used arrayvec, which gives us a nice API over a raw array.

Since DecoderState needs to be able to update lclppb at runtime, we can't just parameterize it with const generics like BitTree; instead we have to size it for the theoretical maximum parameters, which is sizeof(u16) * 0x300 * { 1 << (4 + 8) } = 3145728 * 2 bytes. A 6-megabyte array is far too large to fit on the stack, and I was unable to actually decompress anything without a stack overflow.

For LZMA2 streams, the theoretical maximum size of the literal_probs array is sizeof(u16) * 0x300 * { 1 << 4 } = 12288 * 2 bytes. While this isn't great, 24KB is far smaller than 6MB, so I was able to actually run some benchmarks, which are not great.
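The size arithmetic above can be double-checked with a one-liner (std only; 0x300 is the number of probability slots per literal state):

```rust
// Worst-case size in bytes of the literal_probs array for a given
// total of lc + lp bits.
fn literal_probs_bytes(lc_lp_bits: u32) -> usize {
    std::mem::size_of::<u16>() * 0x300 * (1usize << lc_lp_bits)
}

fn main() {
    // LZMA: up to 4 + 8 = 12 bits -> ~6 MB, far too big for the stack.
    assert_eq!(literal_probs_bytes(12), 6_291_456);
    // LZMA2: at most 4 bits -> 24 KB, small enough to experiment with.
    assert_eq!(literal_probs_bytes(4), 24_576);
}
```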

These benchmarks are with an allocating BitTree, i.e. without const-generics.

ArrayVec<u16, 0x300 * { 1 << 4 }>

running 8 tests
test compress_65536                  ... bench:   1,323,468 ns/iter (+/- 174,348)
test compress_empty                  ... bench:         695 ns/iter (+/- 20)
test compress_hello                  ... bench:       1,035 ns/iter (+/- 90)
test decompress_after_compress_65536 ... bench:   1,336,165 ns/iter (+/- 57,379)
test decompress_after_compress_empty ... bench:       6,166 ns/iter (+/- 359)
test decompress_after_compress_hello ... bench:       6,531 ns/iter (+/- 224)
test decompress_big_file             ... bench:   3,716,763 ns/iter (+/- 121,865)
test decompress_huge_dict            ... bench:       6,572 ns/iter (+/- 315)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 13.64s

Box<[u16]>

running 8 tests
test compress_65536                  ... bench:   1,334,964 ns/iter (+/- 1,371,984)
test compress_empty                  ... bench:         709 ns/iter (+/- 436)
test compress_hello                  ... bench:       1,033 ns/iter (+/- 64)
test decompress_after_compress_65536 ... bench:   1,145,697 ns/iter (+/- 82,369)
test decompress_after_compress_empty ... bench:       1,890 ns/iter (+/- 92)
test decompress_after_compress_hello ... bench:       2,231 ns/iter (+/- 295)
test decompress_big_file             ... bench:   3,620,380 ns/iter (+/- 152,660)
test decompress_huge_dict            ... bench:       2,271 ns/iter (+/- 75)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 7.76s

The huge array on the stack probably causes a lot of cache misses and substantially reduces performance. It seems the only benefit to using const generics here is in BitTree, where the benchmarks above do show an improvement.
