Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Use Popcnt hardware intrinsic in System.Reflection.Metadata BitArithmetic.CountBits. #26190

Closed
wants to merge 1 commit into from

Conversation

eerhardt
Copy link
Member

@eerhardt eerhardt commented Jan 5, 2018

I did some Benchmark tests on my machine, and got the following results:

Method Mean Error StdDev
PopCountInt32 2.265 ns 0.0836 ns 0.0996 ns
PopCountInt64 2.673 ns 0.0939 ns 0.2080 ns
CountBitsInt32 27.530 ns 0.7307 ns 1.1158 ns
CountBitsInt64 31.071 ns 0.5751 ns 0.5098 ns

See https://github.com/dotnet/coreclr/issues/15506#issuecomment-351494808 for this suggestion.

/cc @fiigii @tannergooding @benaadams

@eerhardt eerhardt requested review from tmat and nguerrera January 5, 2018 19:50
@danmoseley
Copy link
Member

Nice!

@fiigii
Copy link
Contributor

fiigii commented Jan 5, 2018

Thank you for the work!

@tmat
Copy link
Member

tmat commented Jan 5, 2018

This is an overkill. The method is called once or twice when opening metadata file. The extra complexity is not worth it.

@tmat
Copy link
Member

tmat commented Jan 5, 2018

If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

Copy link
Member

@tmat tmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too complex

@benaadams
Copy link
Member

benaadams commented Jan 5, 2018

This is an overkill.

It does serve a good example on how to use the intrinsics in corefx (with a netcoreapp file split)?
I assume other usecases will be more complicated

@tannergooding
Copy link
Member

If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

I think that is the eventual goal for some of the HWIntrinsic APIs (the bit manipulation instructions, at the very least).

@eerhardt
Copy link
Member Author

eerhardt commented Jan 5, 2018

If this was a public API somewhere in CoreFX that we could call from SRM then it would make sense to optimize.

One thought is that since this code is open source, and in corefx, people see it as "the right way" to do something.

For example, here is a place this code was copied: https://github.com/aspnet/KestrelHttpServer/blob/2b54b2fc91629a96b57af9dddb18f44dff53a70f/src/Kestrel.Core/Internal/Http/HttpHeaders.cs#L115-L129

// see https://github.com/dotnet/corefx/blob/5965fd3756bc9dd9c89a27621eb10c6931126de2/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs

@tmat
Copy link
Member

tmat commented Jan 5, 2018

There was nothing wrong with the original implementation though. It was the right way.

@tannergooding
Copy link
Member

It was the right way.

Yes, but it is also 15x slower than the new APIs which are being exposed. It may not matter in this case, but it will in others (Kestrel, for example).

I agree that this is overly complex if it is only being called once or twice in the entire library.

@eerhardt, perhaps we could fast track the System.BitManipulation proposal (I think it is https://github.com/dotnet/corefx/issues/12425) and update it to use the hardware intrinsics. We could then update this to call the CoreFX API?

@eerhardt
Copy link
Member Author

eerhardt commented Jan 5, 2018

It was the right way.

Sorry, I should have said "the best way".

perhaps we could fast track the System.BitManipulation proposal (I think it is #12425) and update it to use the hardware intrinsics. We could then update this to call the CoreFX API?

Most of the complexity would still be here. The API would only exist on netcoreapp2.x, so you would still need to #if the code.

@tmat
Copy link
Member

tmat commented Jan 5, 2018

Most of the complexity would still be here.

Not necessarily. If the BitManipulation type had all the functionality we have currently in BitArithmetics, we could just simply use BitArithmetics as is on !=netcoreapp2.x and BitManipulation otherwise.

@4creators
Copy link
Contributor

There was nothing wrong with the original implementation though. It was the right way.

With any performance optimization there is always tradoff between simplicity provided by base case and complexity of optimized code. I think that one of the best examples what extreme optimization means is comparison between simple 100 line Fast Fourier Transform implementation in portable C++ and the size of fastest FFT library FFTW3.

@stephentoub
Copy link
Member

I agree with @tmat here; the improvement on the function itself doesn't matter if there's no measurable improvement in the all-up usage. If it didn't make the code more complex, then "why not", but as it does, it's not worth it.

@jkotas
Copy link
Member

jkotas commented Jan 6, 2018

I did some Benchmark tests on my machine

This needs to be benchmarking the public API. Are there any visible improvements on the public System.Reflection.Metadata APIs?

@jkotas
Copy link
Member

jkotas commented Jan 6, 2018

As far as I can tell this is only used as tiny part of very heavy operations like constructing PEHeaderBuilder. I do not think the extra complexity is worth it for this.

@jkotas jkotas added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Jan 6, 2018
@danmoseley
Copy link
Member

As much as I like the perf, I have to agree it does not seem worth it in this case and the right thing seems to push #12425 along. @tannergooding do you wish to champion that?

@eerhardt eerhardt closed this Jan 8, 2018
@tannergooding
Copy link
Member

@danmosemsft, I've updated the PR with a request for the original post to be cleaned up a bit first.

@karelz karelz added this to the 2.1.0 milestone Jan 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Reflection.Metadata * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants