Decoder improvements #172

Merged
merged 3 commits into from
Mar 22, 2022

Conversation

greatroar
Contributor

  • Handles a previously undetected corner case in the decoders: the last token declares a match, but the input ends exactly where the match offset would be.
  • amd64 decoder now checks for zero offsets in its shortcut code, making it consistent with the other decoders.
  • arm64 decoder has been made a few instructions shorter and it can use its fast match copy loop even after copying from a dictionary, just like the other decoders.
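
For readers who don't have the LZ4 block format in their head, here is a rough Go sketch of the two checks from the first two bullets. It is not the code in `internal/lz4block`; `decodeBlock`, `readLen` and `errCorrupt` are made-up names, and it assumes that "declared a match" means a non-zero match-length nibble in the final token. It also treats a block that ends right after a match as corrupt, which is a simplification chosen for this sketch.

```go
package main

import "errors"

// errCorrupt is a hypothetical error value for this sketch only.
var errCorrupt = errors.New("lz4 sketch: corrupt block")

// readLen extends a 4-bit length nibble n using LZ4's encoding:
// a nibble of 15 is followed by extra bytes that are summed until
// a byte other than 255 is read. It returns the length and the new
// source index.
func readLen(src []byte, si, n int) (int, int, error) {
	if n < 15 {
		return n, si, nil
	}
	for {
		if si >= len(src) {
			return 0, si, errCorrupt
		}
		b := int(src[si])
		si++
		n += b
		if b != 255 {
			return n, si, nil
		}
	}
}

// decodeBlock decodes one LZ4 block without a dictionary.
// Simple and slow on purpose; it only illustrates the checks.
func decodeBlock(src []byte) ([]byte, error) {
	var dst []byte
	si := 0
	for {
		if si >= len(src) {
			// Empty input, or input that ended right after a match.
			return nil, errCorrupt
		}
		token := src[si]
		si++
		lLen, mNib := int(token>>4), int(token&0xF)

		var err error
		lLen, si, err = readLen(src, si, lLen)
		if err != nil {
			return nil, err
		}
		if lLen > len(src)-si {
			return nil, errCorrupt
		}
		dst = append(dst, src[si:si+lLen]...)
		si += lLen

		if si == len(src) {
			// Corner case (assumed interpretation): the last token
			// declares a match, but the input stops exactly where
			// the two-byte offset would start.
			if mNib != 0 {
				return nil, errCorrupt
			}
			return dst, nil
		}

		if len(src)-si < 2 {
			return nil, errCorrupt
		}
		offset := int(src[si]) | int(src[si+1])<<8
		si += 2
		// A zero offset is never valid, and an offset cannot reach
		// back past the start of the output.
		if offset == 0 || offset > len(dst) {
			return nil, errCorrupt
		}

		var mLen int
		mLen, si, err = readLen(src, si, mNib)
		if err != nil {
			return nil, err
		}
		mLen += 4 // minimum match length in LZ4

		// Byte-by-byte copy so that overlapping matches work.
		for i := 0; i < mLen; i++ {
			dst = append(dst, dst[len(dst)-offset])
		}
	}
}
```

With this sketch, a block consisting of the token `0x50` followed by `hello` decodes to `hello`, while the same literals behind the token `0x51` are rejected, as is any match whose offset bytes are both zero.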

Use fast loop after dict copy. Checking for its possibility costs as
many instructions as jumping over it.

Move SUBS close to conditional branches for CPUs that fuse these
instructions.

Shave one instruction off the remainder handling code after this loop.
A load from register base+register offset has the same latency and
throughput as a load from register+constant offset, at least on
Cortex-A72.

This corner case wasn't detected by any of the decoders.

The other decoders handle this correctly.
@greatroar
Contributor Author

I suppose the test failures are the ones on v4; locally, all tests pass with this branch.

@pierrec pierrec merged commit b8cae7c into pierrec:v4 Mar 22, 2022
@greatroar greatroar deleted the decoder-fixes branch April 3, 2022 12:13
pierrec added a commit that referenced this pull request May 17, 2022
cherrypick (#172): internal/lz4block: arm64 decoder improvements