Use Popcnt hardware intrinsic in System.Reflection.Metadata BitArithmetic.CountBits. #26190
Nice!
Thank you for the work!
This is overkill. The method is called once or twice when opening a metadata file. The extra complexity is not worth it.
If this were a public API somewhere in CoreFX that we could call from SRM, then it would make sense to optimize.
Too complex
Doesn't it serve as a good example of how to use the intrinsics in corefx (with a netcoreapp file split)?
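For illustration, here is a minimal sketch of what such a split might look like. It assumes a .NET Core 3.0+ target where `System.Runtime.Intrinsics.X86.Popcnt` is available and an SDK that defines `NETCOREAPP3_0_OR_GREATER`. The names `BitArithmeticSketch` and `SoftwareCountBits` are hypothetical, and this is not the code from the PR. The intrinsic path is guarded by `Popcnt.IsSupported`, while a portable SWAR (SIMD-within-a-register) popcount covers both older targets and hardware without POPCNT.

```csharp
#if NETCOREAPP3_0_OR_GREATER
using System.Runtime.Intrinsics.X86;
#endif

// Hypothetical stand-in for SRM's internal BitArithmetic; not the PR's actual code.
internal static class BitArithmeticSketch
{
    internal static int CountBits(uint v)
    {
#if NETCOREAPP3_0_OR_GREATER
        // Use the single POPCNT instruction when the CPU supports it.
        if (Popcnt.IsSupported)
        {
            return (int)Popcnt.PopCount(v);
        }
#endif
        // Older targets, or hardware without POPCNT.
        return SoftwareCountBits(v);
    }

    // Portable SWAR popcount: sum bits in 2-, 4-, then 8-bit fields,
    // then add the four byte counts with a multiply.
    internal static int SoftwareCountBits(uint v)
    {
        unchecked
        {
            v -= (v >> 1) & 0x55555555u;                      // 2-bit field counts
            v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit field counts
            v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit field counts
            return (int)((v * 0x01010101u) >> 24);            // top byte = total
        }
    }
}
```

The JIT is expected to treat `Popcnt.IsSupported` as a constant, so the untaken branch should be dropped when the method is compiled. That is why this pattern is usually considered cheap on hot paths, although the extra complexity in the source remains.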
I think that is the eventual goal for some of the HWIntrinsic APIs (the bit manipulation instructions, at the very least).
One thought is that since this code is open source, and in corefx, people see it as "the right way" to do something. For example, here is a place this code was copied: https://github.com/aspnet/KestrelHttpServer/blob/2b54b2fc91629a96b57af9dddb18f44dff53a70f/src/Kestrel.Core/Internal/Http/HttpHeaders.cs#L115-L129
There was nothing wrong with the original implementation, though. It was the right way.
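For reference, here is a sketch of the classic portable 64-bit popcount (a mask-and-add tree), the kind of code an implementation like the original `CountBits` is assumed to resemble. `PortableBitArithmetic` is a hypothetical name, and the actual original code may differ. Because it uses no intrinsics, it compiles and behaves identically on every target framework.

```csharp
// Hypothetical name; a portable popcount with no intrinsics or framework dependencies.
internal static class PortableBitArithmetic
{
    internal static int CountBits(ulong v)
    {
        unchecked
        {
            // Each step adds adjacent fields, doubling the field width.
            v = (v & 0x5555555555555555UL) + ((v >> 1) & 0x5555555555555555UL);   // 2-bit
            v = (v & 0x3333333333333333UL) + ((v >> 2) & 0x3333333333333333UL);   // 4-bit
            v = (v & 0x0F0F0F0F0F0F0F0FUL) + ((v >> 4) & 0x0F0F0F0F0F0F0F0FUL);   // 8-bit
            v = (v & 0x00FF00FF00FF00FFUL) + ((v >> 8) & 0x00FF00FF00FF00FFUL);   // 16-bit
            v = (v & 0x0000FFFF0000FFFFUL) + ((v >> 16) & 0x0000FFFF0000FFFFUL);  // 32-bit
            v = (v & 0x00000000FFFFFFFFUL) + (v >> 32);                           // 64-bit
            return (int)v; // at most 64, so the narrowing cast is lossless
        }
    }
}
```

The code runs a fixed six steps, has no branches, and produces no overflow, since no field can exceed its width. That makes it hard to beat for simplicity when the method is called only a handful of times.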
Yes, but it is also 15x slower than the new APIs that are being exposed. It may not matter in this case, but it will in others (Kestrel, for example). I agree that this is overly complex if it is only called once or twice in the entire library. @eerhardt, perhaps we could fast-track the …
Sorry, I should have said "the best way".
Most of the complexity would still be here. The API would only exist on …
Not necessarily. If the BitManipulation type had all the functionality we currently have in BitArithmetic, we could simply use BitArithmetic as is on != netcoreapp2.x and BitManipulation otherwise.
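Here is a sketch of that arrangement. It uses `System.Numerics.BitOperations.PopCount`, which became public in .NET Core 3.0, in place of the proposed `BitManipulation` type, whose final shape is not given in this thread. `BitCountShim` is a hypothetical name. The `#else` fallback is Kernighan's clear-lowest-set-bit loop, chosen here for brevity rather than speed; in practice, the portable code already in SRM would presumably go in that branch instead.

```csharp
#if NETCOREAPP3_0_OR_GREATER
using System.Numerics;
#endif

// Hypothetical shim: the public framework API where it exists, portable code elsewhere.
internal static class BitCountShim
{
    internal static int CountBits(ulong v)
    {
#if NETCOREAPP3_0_OR_GREATER
        // Public since .NET Core 3.0; uses hardware popcount when available.
        return BitOperations.PopCount(v);
#else
        // Portable fallback: each iteration clears the lowest set bit.
        int count = 0;
        while (v != 0)
        {
            v &= v - 1;
            count++;
        }
        return count;
#endif
    }
}
```

With this shape, the only platform-specific part is the `#if`, and the intrinsic details stay inside the framework.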
With any performance optimization, there is always a tradeoff between the simplicity of the base case and the complexity of the optimized code. One of the best illustrations of what extreme optimization means is the comparison between a simple 100-line Fast Fourier Transform implementation in portable C++ and the size of the fastest FFT library, FFTW3.
I agree with @tmat here: the improvement to the function itself doesn't matter if there's no measurable improvement in overall usage. If it didn't make the code more complex, then "why not", but since it does, it's not worth it.
This needs to be benchmarking the public API. Are there any visible improvements on the public System.Reflection.Metadata APIs? |
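Here is a minimal sketch of such a benchmark, assuming the BenchmarkDotNet package is referenced. Instead of calling `CountBits` directly, it measures opening metadata through the public `PEReader`/`GetMetadataReader` path. Whether this path actually reaches `CountBits` (for example, when counting the present metadata tables) is an assumption, not something established in this thread. The class name and the choice of the core library as input are arbitrary.

```csharp
using System.Collections.Immutable;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;
using BenchmarkDotNet.Attributes;

// Hypothetical benchmark of a public SRM entry point rather than the internal helper.
public class MetadataOpenBenchmark
{
    private ImmutableArray<byte> _image;

    [GlobalSetup]
    public void Setup()
    {
        // Any managed assembly works; the core library is simply a large, always-present one.
        _image = ImmutableArray.Create(File.ReadAllBytes(typeof(object).Assembly.Location));
    }

    [Benchmark]
    public int OpenMetadata()
    {
        // Parse the PE headers and metadata root, then touch one table so the work is observable.
        using var peReader = new PEReader(_image);
        MetadataReader reader = peReader.GetMetadataReader();
        return reader.TypeDefinitions.Count;
    }
}
```

To run it, call `BenchmarkDotNet.Running.BenchmarkRunner.Run<MetadataOpenBenchmark>();` from a Release build, and compare results against SRM builds with and without the change. A difference that falls within the noise would support keeping the simpler code.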
As far as I can tell, this is only used as a tiny part of very heavy operations, like constructing a PEHeaderBuilder. I do not think the extra complexity is worth it for this.
As much as I like the perf, I have to agree it does not seem worth it in this case, and the right thing seems to be to push #12425 along. @tannergooding, do you wish to champion that?
@danmosemsft, I've updated the PR with a request for the original post to be cleaned up a bit first. |
I ran some benchmarks on my machine and got the following results:
See https://github.com/dotnet/coreclr/issues/15506#issuecomment-351494808 for this suggestion.
/cc @fiigii @tannergooding @benaadams