Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dead code in fwht #275

Merged

Conversation

AndersTrier
Copy link
Contributor

When reworking my Rust implementation of leopard16, I realized that the last loop in fwht is unreachable.

On the last iteration of the outermost loop, dist4 == order, after which we update the value of dist to dist4, making us unable to enter the body of the last loop.

order is always either 256 or 16384, and

In [2]: 4<<2<<2<<2
Out[2]: 256
In [5]: 4<<2<<2<<2<<2<<2<<2
Out[5]: 16384

On a related note: I tried to optimize fwht further by using 8 values at a time (fwht_8). It was slightly faster on x86, but slower on ARM.
fwht_16 was slower on both architectures.

@AndersTrier AndersTrier marked this pull request as draft March 11, 2024 11:00
as we're always working on an array of size `order`
@AndersTrier AndersTrier force-pushed the AndersTrier/leopard_fwht_del_dead_code branch from 4b07afa to b32e610 Compare March 11, 2024 11:10
@AndersTrier AndersTrier marked this pull request as ready for review March 11, 2024 11:14
@klauspost
Copy link
Owner

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

@klauspost
Copy link
Owner

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

@AndersTrier
Copy link
Contributor Author

Looks like a nice cleanup! btw, added your repo to the README since I assume they are compatible.

Thanks!

I would have to test wrt. compatibility, as I don't think high rate and low rate are compatible, and sometimes you can choose either.

@AndersTrier
Copy link
Contributor Author

wrt unrolling, yeah I settled on 4x since 8x didn't give a significant speedup and didn't want to choke old platforms with limited registers more than needed.

Did you consider implementing it with SIMD?

I don't know if it's worth it, but I found some code to get you started if you want to give it a try ;)
https://github.com/catid/leopard/blob/master/docs/vector_fwht_4.txt
https://github.com/paritytech/reed-solomon-novelpoly/blob/df906e1ca27dc6c0c1b663f1653f57d4620f03dd/reed-solomon-novelpoly/src/field/inc_log_mul.rs#L118
https://stackoverflow.com/a/54133143

@klauspost
Copy link
Owner

Yeah, it could be done... It could actually be rather effective AFAICT. But it's been too long since I looked at this to judge if/when it would provide a speedup.

@klauspost klauspost merged commit 85a5e93 into klauspost:master Mar 11, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants