Use Popcnt hardware intrinsic in System.Reflection.Metadata BitArithmetic.CountBits. #26190

eerhardt · 2018-01-05T19:50:43Z

I did some Benchmark tests on my machine, and got the following results:

Method	Mean	Error	StdDev
PopCountInt32	2.265 ns	0.0836 ns	0.0996 ns
PopCountInt64	2.673 ns	0.0939 ns	0.2080 ns
CountBitsInt32	27.530 ns	0.7307 ns	1.1158 ns
CountBitsInt64	31.071 ns	0.5751 ns	0.5098 ns

See https://github.com/dotnet/coreclr/issues/15506#issuecomment-351494808 for this suggestion.

/cc @fiigii @tannergooding @benaadams
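For context, the CountBits rows above measure a portable, bit-twiddling population count, while the PopCount rows use the hardware POPCNT instruction. The sketch below shows the classic SWAR ("SIMD within a register") technique such a portable routine typically uses. The class name `PortableBitCount` is hypothetical, and this is an illustration of the technique, not a verbatim copy of SRM's internal `BitArithmetic.CountBits`.

```csharp
// Hypothetical helper: portable population count using the SWAR technique.
static class PortableBitCount
{
    public static int CountBits(uint v)
    {
        unchecked
        {
            // Each 2-bit field now holds the number of set bits it originally contained.
            v -= (v >> 1) & 0x55555555u;
            // Add adjacent 2-bit fields into 4-bit fields.
            v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
            // Add adjacent 4-bit fields into bytes (each byte holds 0..8).
            v = (v + (v >> 4)) & 0x0F0F0F0Fu;
            // The multiply sums all four bytes into the top byte; shift it down.
            return (int)((v * 0x01010101u) >> 24);
        }
    }

    public static int CountBits(ulong v)
    {
        unchecked
        {
            // Same steps on 64 bits; the final sum (0..64) lands in the top byte.
            v -= (v >> 1) & 0x5555555555555555UL;
            v = (v & 0x3333333333333333UL) + ((v >> 2) & 0x3333333333333333UL);
            v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
            return (int)((v * 0x0101010101010101UL) >> 56);
        }
    }
}
```

For example, `PortableBitCount.CountBits(0xF0u)` returns 4. The routine is branch-free and uses only shifts, masks, and one multiply, which is why it is a reasonable default; a single POPCNT instruction is still considerably faster, as the table shows.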

danmoseley · 2018-01-05T20:36:59Z

Nice!

fiigii · 2018-01-05T21:08:23Z

Thank you for the work!

tmat · 2018-01-05T21:14:51Z

This is overkill. The method is called only once or twice when opening a metadata file. The extra complexity is not worth it.

tmat · 2018-01-05T21:16:44Z

If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

tmat · review (suggested changes)

Too complex

benaadams · 2018-01-05T21:19:03Z

> This is overkill.

It does serve as a good example of how to use the intrinsics in corefx (with a netcoreapp file split)?
I assume other use cases will be more complicated.
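The pattern benaadams refers to (hardware intrinsic with a portable fallback) can be sketched as follows. This assumes the intrinsics API as it eventually shipped in .NET Core 3.0, where `System.Runtime.Intrinsics.X86.Popcnt.PopCount(uint)` returns `uint`. It also uses the `NETCOREAPP3_0_OR_GREATER` symbol from newer SDKs in place of the separate netcoreapp source file the PR used; `BitCountDispatch` is a hypothetical name.

```csharp
#if NETCOREAPP3_0_OR_GREATER
using System.Runtime.Intrinsics.X86;
#endif

// Hypothetical helper illustrating "intrinsic when available, portable code otherwise".
static class BitCountDispatch
{
    public static int CountBits(uint v)
    {
#if NETCOREAPP3_0_OR_GREATER
        // The JIT treats Popcnt.IsSupported as a constant, so only one
        // branch survives when this method is compiled for the current CPU.
        if (Popcnt.IsSupported)
        {
            return (int)Popcnt.PopCount(v);
        }
#endif
        // Older targets, or CPUs without POPCNT, fall back to portable code.
        return CountBitsPortable(v);
    }

    private static int CountBitsPortable(uint v)
    {
        unchecked
        {
            v -= (v >> 1) & 0x55555555u;
            v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
            v = (v + (v >> 4)) & 0x0F0F0F0Fu;
            return (int)((v * 0x01010101u) >> 24);
        }
    }
}
```

The extra `#if` block plus the `IsSupported` check is exactly the added complexity the reviewers weigh against the gain in this thread.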

tannergooding · 2018-01-05T21:26:57Z

> If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

I think that is the eventual goal for some of the HWIntrinsic APIs (the bit manipulation instructions, at the very least).

eerhardt · 2018-01-05T21:31:14Z

> If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

One thought is that since this code is open source, and in corefx, people see it as "the right way" to do something.

For example, here is a place this code was copied: https://github.com/aspnet/KestrelHttpServer/blob/2b54b2fc91629a96b57af9dddb18f44dff53a70f/src/Kestrel.Core/Internal/Http/HttpHeaders.cs#L115-L129

// see https://github.com/dotnet/corefx/blob/5965fd3756bc9dd9c89a27621eb10c6931126de2/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs

tmat · 2018-01-05T21:54:52Z

There was nothing wrong with the original implementation though. It was the right way.

tannergooding · 2018-01-05T22:02:56Z

> It was the right way.

Yes, but it is also 15x slower than the new APIs which are being exposed. It may not matter in this case, but it will in others (Kestrel, for example).

I agree that this is overly complex if it is only being called once or twice in the entire library.

@eerhardt, perhaps we could fast track the System.BitManipulation proposal (I think it is https://github.com/dotnet/corefx/issues/12425) and update it to use the hardware intrinsics. We could then update this to call the CoreFX API?

eerhardt · 2018-01-05T22:55:46Z

> It was the right way.

Sorry, I should have said "the best way".

> perhaps we could fast track the System.BitManipulation proposal (I think it is #12425) and update it to use the hardware intrinsics. We could then update this to call the CoreFX API?

Most of the complexity would still be here. The API would only exist on netcoreapp2.x, so you would still need to #if the code.

tmat · 2018-01-05T23:05:46Z

> Most of the complexity would still be here.

Not necessarily. If the BitManipulation type had all the functionality we currently have in BitArithmetic, we could simply use BitArithmetic as is on targets other than netcoreapp2.x and BitManipulation otherwise.
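tmat's suggestion (keep the existing code on older targets and delegate to a public framework API elsewhere) might look like the sketch below. At the time, `BitManipulation` was only a proposal; this sketch instead assumes `System.Numerics.BitOperations.PopCount(ulong)`, a public API that later shipped in .NET Core 3.0, and uses the `NETCOREAPP3_0_OR_GREATER` symbol from newer SDKs. `BitArithmeticSketch` is a hypothetical name, not SRM's actual class.

```csharp
#if NETCOREAPP3_0_OR_GREATER
using System.Numerics;
#endif

// Hypothetical shape of the suggestion: framework API where it exists,
// unchanged portable code everywhere else.
static class BitArithmeticSketch
{
    public static int CountBits(ulong v)
    {
#if NETCOREAPP3_0_OR_GREATER
        // Public API; the framework picks a hardware popcount instruction when available.
        return BitOperations.PopCount(v);
#else
        unchecked
        {
            // Portable SWAR population count, as on older targets.
            v -= (v >> 1) & 0x5555555555555555UL;
            v = (v & 0x3333333333333333UL) + ((v >> 2) & 0x3333333333333333UL);
            v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
            return (int)((v * 0x0101010101010101UL) >> 56);
        }
#endif
    }
}
```

Compared with calling the intrinsic directly, the library no longer carries its own `IsSupported` dispatch; only the `#if` remains, which matches eerhardt's point that some conditional compilation is unavoidable while the API exists on only some targets.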

4creators · 2018-01-06T00:34:04Z

> There was nothing wrong with the original implementation though. It was the right way.

With any performance optimization there is always a tradeoff between the simplicity of the base case and the complexity of the optimized code. I think one of the best examples of what extreme optimization means is the comparison between a simple 100-line Fast Fourier Transform implementation in portable C++ and the size of the fastest FFT library, FFTW3.

stephentoub · 2018-01-06T02:31:43Z

I agree with @tmat here; the improvement on the function itself doesn't matter if there's no measurable improvement in the all-up usage. If it didn't make the code more complex, then "why not", but as it does, it's not worth it.

jkotas · 2018-01-06T07:18:07Z

> I did some Benchmark tests on my machine

This needs to be benchmarking the public API. Are there any visible improvements on the public System.Reflection.Metadata APIs?

jkotas · 2018-01-06T07:20:07Z

As far as I can tell, this is only used as a tiny part of very heavy operations like constructing PEHeaderBuilder. I do not think the extra complexity is worth it for this.

danmoseley · 2018-01-08T19:01:01Z

As much as I like the perf, I have to agree it does not seem worth it in this case and the right thing seems to push #12425 along. @tannergooding do you wish to champion that?

tannergooding · 2018-01-08T20:02:53Z

@danmosemsft, I've updated the PR with a request for the original post to be cleaned up a bit first.

eerhardt requested review from tmat and nguerrera January 5, 2018 19:50

Commit 1631b11: Use Popcnt hardware intrinsic in System.Reflection.Metadata BitArithmetic.CountBits.

eerhardt force-pushed the PopCnt branch from 11d7aeb to 1631b11 Compare January 5, 2018 20:34

tmat suggested changes Jan 5, 2018

jkotas added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Jan 6, 2018

karelz added the area-System.Reflection.Metadata label Jan 6, 2018

karelz assigned eerhardt Jan 6, 2018

eerhardt closed this Jan 8, 2018

karelz added this to the 2.1.0 milestone Jan 20, 2018
