Save memory by skipping the shuffle map from Radix4 and Radix3 #81

ejmahler · 2021-10-19T02:03:50Z

I was looking into how to make the bit reversal in Radix4 and Radix3 more friendly to SIMD. I was working under the assumption that the bit reversals were too expensive to do in the outer loop of bitreversed_transpose(), but during my experiments, I stumbled across something that made me challenge that assumption.

I discovered that there was little or no performance difference between:

- Using the shuffle map as-is
- Unrolling one step of the shuffle map, so that it only stores some of the values, with the rest being reconstructed via simple arithmetic
- Entirely eliminating the shuffle map, and computing one bit reversal per outer loop, with the rest of the bit reversals being reconstructed
- Just computing all the bit reversals in the outer loop, with no fancy reconstruction

As a result, this PR changes Radix4 and Radix3 to the last option, completely eliminating the shuffle map. This makes Radix4 and Radix3 simpler, and creates a much more obvious path for SIMD-ification of the bit reversal algorithm. That said, after my experiments here, I'm not too confident that SIMD bit reversal will make much of a difference.
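For readers unfamiliar with the idea, here is a minimal sketch of what "computing the bit reversal in the outer loop" can look like for radix 4. It is not the code from this PR: the function name `reverse_base4_digits` and its signature are hypothetical, and it assumes the index is treated as a sequence of 2-bit (base-4) digits whose order is reversed.

```rust
/// Reverse the order of the lowest `num_digits` base-4 digits of `value`.
/// Hypothetical helper for illustration only; not taken from the PR.
fn reverse_base4_digits(mut value: usize, num_digits: u32) -> usize {
    let mut result = 0;
    for _ in 0..num_digits {
        // Move the lowest 2-bit digit of `value` onto the low end of `result`.
        result = (result << 2) | (value & 0b11);
        value >>= 2;
    }
    result
}
```

Computing each index this way costs a short loop per call, but needs no precomputed table, which is where the memory saving comes from.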

ejmahler · 2021-10-19T02:05:10Z

@HEnquist may find this interesting

HEnquist · 2021-10-19T07:12:05Z

src/algorithm/radix4.rs

        let x0 = 4 * x;
        let x1 = 4 * x + 1;
        let x2 = 4 * x + 2;
        let x3 = 4 * x + 3;

+        let x_rev = [


It could make sense to make a "reverse_bits_of_four" function that reverses four numbers in the same loop. I'm guessing that would make it good for the auto-vectorizer.

Nope, not a good idea. That actually runs a tiny bit slower for some odd reason.

Update on that. I can't measure any difference so it's not slower. And also not faster. Let's not bother.
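For reference, the "reverse four numbers in the same loop" idea could be sketched roughly as follows. This is an assumption about its shape, not the code that was benchmarked: the name `reverse_base4_digits_x4` and the array-based signature are hypothetical.

```rust
/// Reverse the lowest `num_digits` base-4 digits of four values at once.
/// Hypothetical sketch of the idea discussed above; not taken from the PR.
fn reverse_base4_digits_x4(mut values: [usize; 4], num_digits: u32) -> [usize; 4] {
    let mut results = [0usize; 4];
    for _ in 0..num_digits {
        // The same shift/mask step is applied to all four lanes, which is
        // the kind of loop an auto-vectorizer might be able to pick up.
        for i in 0..4 {
            results[i] = (results[i] << 2) | (values[i] & 0b11);
            values[i] >>= 2;
        }
    }
    results
}
```

For example, with `x = 1` the four inputs would be `[4, 5, 6, 7]`, matching the `x0`..`x3` values in the snippet above.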

HEnquist · 2021-10-19T07:17:57Z

This is very interesting! I didn't consider this approach since I just assumed it would be slower. How does the speed compare to the map version?

ejmahler · 2021-10-19T17:33:04Z

I'm not sure what you mean by map version.

HEnquist · 2021-10-19T17:42:29Z

> I'm not sure what you mean by map version.

Oh just the previous version, before this change.

ejmahler · 2021-10-19T17:49:52Z

Ah. The speed difference is within the noise range of the benchmarker. So there may be a difference, but it's too small to see.

HEnquist · 2021-10-19T21:29:19Z

I can confirm that it compiles and passes the tests just fine on an aarch64 machine.

Save memory by skipping the shuffle map

1ad8055

Removed an accidentally-duplicated transpose call

d16fcb7

HEnquist reviewed Oct 19, 2021

Ported shuffle map removal to the neon radix4

ba9a374

ejmahler added 2 commits October 19, 2021 10:50

Another neon fix, flying blind since i don't have access to an arm64 …

51221c3

…machine

format

1543e60

ejmahler merged commit b0d9bd0 into master Oct 20, 2021

ejmahler deleted the radix4-noshuffle branch October 20, 2021 03:26
