Commit
* fse: Optimize table building

Skipping the loop body when v == 0 helps endzerobits and normcount2.
Not writing to s.symbolLen in every iteration helps the other
benchmarks.

name                     old speed      new speed      delta
Compress/gettysburg-8     181MB/s ± 1%   183MB/s ± 0%   +1.15%  (p=0.002 n=10+8)
Compress/digits-8         241MB/s ± 0%   241MB/s ± 1%        ~  (p=0.434 n=9+10)
Compress/twain-8          218MB/s ± 0%   218MB/s ± 0%        ~  (p=0.755 n=10+10)
Compress/low-ent-8        239MB/s ± 0%   239MB/s ± 1%        ~  (p=0.853 n=10+10)
Compress/superlow-ent-8   208MB/s ± 1%   208MB/s ± 0%        ~  (p=0.408 n=9+7)
Compress/endzerobits-8   11.5MB/s ± 1%  13.3MB/s ± 1%  +16.35%  (p=0.000 n=10+9)
Compress/pngdata.001-8    224MB/s ± 0%   224MB/s ± 1%   +0.38%  (p=0.004 n=8+10)
Compress/normcount2-8    35.7MB/s ± 1%  36.6MB/s ± 1%   +2.66%  (p=0.000 n=10+9)

* fse: Skip bounds checks

Each occurrence of

    v3, v2, v1, v0 := src[len(src)-4], src[len(src)-3], src[len(src)-2], src[len(src)-1]

now incurs three bounds checks instead of four. I haven't found a way
to eliminate the remaining three.

name                     old speed      new speed      delta
Compress/gettysburg-8     183MB/s ± 0%   189MB/s ± 0%   +3.32%  (p=0.000 n=8+9)
Compress/digits-8         241MB/s ± 1%   251MB/s ± 1%   +4.14%  (p=0.000 n=10+9)
Compress/twain-8          218MB/s ± 0%   228MB/s ± 0%   +4.36%  (p=0.000 n=10+10)
Compress/low-ent-8        239MB/s ± 1%   244MB/s ± 1%   +1.90%  (p=0.000 n=10+10)
Compress/superlow-ent-8   208MB/s ± 0%   210MB/s ± 0%   +0.89%  (p=0.000 n=7+8)
Compress/endzerobits-8   13.3MB/s ± 1%  13.4MB/s ± 1%   +0.40%  (p=0.019 n=9+10)
Compress/pngdata.001-8    224MB/s ± 1%   225MB/s ± 1%   +0.41%  (p=0.006 n=10+9)
Compress/normcount2-8    36.6MB/s ± 1%  36.4MB/s ± 1%   -0.62%  (p=0.012 n=9+10)
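
For readers unfamiliar with the expression quoted in the bounds-check note, here is a minimal sketch in Go. The diff itself is not shown here, so this is not the commit's actual change. `lastFourIndexed` mirrors the quoted expression. `lastFourResliced` is a hypothetical variant using a commonly seen reslice-and-hint idiom. Both function names are made up for illustration. Whether the compiler drops any checks for either form depends on the Go version, and can be inspected with `go build -gcflags=-d=ssa/check_bce/debug=1`.

```go
package main

import "fmt"

// lastFourIndexed reads the final four bytes by indexing src directly,
// mirroring the expression quoted in the commit message.
// It panics if len(src) < 4.
func lastFourIndexed(src []byte) (v3, v2, v1, v0 byte) {
	return src[len(src)-4], src[len(src)-3], src[len(src)-2], src[len(src)-1]
}

// lastFourResliced is a hypothetical variant (an assumption, not the
// commit's code): it takes the four-byte tail once, then indexes that
// tail with constants. It also panics if len(src) < 4.
func lastFourResliced(src []byte) (v3, v2, v1, v0 byte) {
	tail := src[len(src)-4:]
	_ = tail[3] // hint to the compiler that indices 0..3 are in range
	return tail[0], tail[1], tail[2], tail[3]
}

func main() {
	src := []byte("fse-data")

	a3, a2, a1, a0 := lastFourIndexed(src)
	b3, b2, b1, b0 := lastFourResliced(src)

	// Both forms return the same four bytes; only the generated
	// bounds checks may differ.
	fmt.Println(a3 == b3 && a2 == b2 && a1 == b1 && a0 == b0)
}
```

The two functions are interchangeable in behavior, so any choice between them would rest on measurements like the benchmark tables above rather than on the source alone.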