add explicit non-zero context to `_Countr_zero` #2298

AlexGuteniev · 2021-10-23T14:31:46Z

x86 diff:
https://godbolt.org/z/v7v9zv3Gr
x64 diff:
https://godbolt.org/z/b8h86MrMo


StephanTLavavej · 2021-11-17T04:19:48Z

@AlexGuteniev I have pushed a merge with `main` containing major changes to `<limits>`. I believe it preserves the intent of your changes while applying them to the post-#2333 codebase, refactors the logic to be cleaner (particularly the widen-8/16-bits logic), and further enhances the nonzero optimization (in particular, we can avoid inspecting BSF's return value).

Please let me know if anything looks weird/wrong!
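For readers following along, here is a minimal, portable sketch of the "nonzero context" idea discussed above. It is not the code from this PR: `zero_hint`, `scan_lowest_set_bit`, and `countr_zero_sketch` are hypothetical names, and the bit scan is a plain loop standing in for the BSF instruction, whose result is only meaningful for nonzero input.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical enumerators, loosely modeled on the names mentioned in this PR.
enum class zero_hint { possibly_zero, nonzero };

// Stand-in for BSF: the result is only meaningful when value != 0.
// (For value == 0 this bounded loop returns 31, which callers must not rely on.)
inline int scan_lowest_set_bit(std::uint32_t value) {
    int index = 0;
    while (index < 31 && ((value >> index) & 1u) == 0) {
        ++index;
    }
    return index;
}

// When the caller promises a nonzero input, the zero check is compiled out.
template <zero_hint Hint = zero_hint::possibly_zero>
int countr_zero_sketch(std::uint32_t value) {
    if constexpr (Hint == zero_hint::possibly_zero) {
        if (value == 0) {
            return std::numeric_limits<std::uint32_t>::digits; // 32
        }
    }
    return scan_lowest_set_bit(value);
}
```

A caller that has already ruled out zero would write `countr_zero_sketch<zero_hint::nonzero>(x)`, while everyone else keeps the default and pays for the check.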

AlexGuteniev · 2021-11-17T07:16:21Z

> Please let me know if anything looks weird/wrong!

I think it is correct, but it looks like, without attempting to use `tzcnt` as `bsf`, the intent may need reconsideration.

As the change was originally written, it avoided branches. The branches were there only for zero input.
Now there is a branch between tzcnt and bsf, regardless of zero/nonzero. And bsf itself is branchless (it uses cmov).
And bsf is a fallback.

So currently the change appears to be a near-zero improvement that isn't even worth complicating the code.

We need to benchmark and inspect the codegen to figure out what should be done. Maybe implement the same thing as for vector<👻>. Maybe just implementing #2133 will do (then again, this change may or may not still be useful).
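The branch structure described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `cpu_has_tzcnt` is a hypothetical flag standing in for a real CPU feature check, and both scan functions are portable loops standing in for the `tzcnt` and `bsf` instructions.

```cpp
#include <cstdint>

// Hypothetical flag; real code would query the CPU once instead.
bool cpu_has_tzcnt = true;

// tzcnt-like: well defined for zero input, returns the bit width (32).
inline int tzcnt_like(std::uint32_t value) {
    int count = 0;
    while (count < 32 && ((value >> count) & 1u) == 0) {
        ++count;
    }
    return count;
}

// bsf-like: the result is only meaningful for nonzero input.
inline int bsf_like(std::uint32_t value) {
    int index = 0;
    while (index < 31 && ((value >> index) & 1u) == 0) {
        ++index;
    }
    return index;
}

inline int countr_zero_dispatch(std::uint32_t value, bool known_nonzero) {
    if (cpu_has_tzcnt) {
        // This branch is evaluated for every input, zero or not.
        return tzcnt_like(value);
    }
    const int index = bsf_like(value);
    if (known_nonzero) {
        return index; // the nonzero hint only helps on this fallback path
    }
    // A select like this can typically be compiled to cmov rather than a jump.
    return value == 0 ? 32 : index;
}
```

In this shape the nonzero hint removes only the final select on the fallback path, which is consistent with the concern that the gain is small.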

AlexGuteniev · 2021-11-17T19:51:46Z

This optimization makes gcd 1 to 2 percent faster.
I've tried an alternative approach (similar to _Select_popcount_impl); it turned out to be slightly faster than this one, but still between 1 and 2 percent faster than the current code.
Do we want to proceed?
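To show why gcd is a natural beneficiary, here is a sketch of a binary GCD in which every trailing-zero count is taken on a value already known to be nonzero. The names `ctz_nonzero` and `binary_gcd` are hypothetical, and this is not presented as the library's actual gcd implementation.

```cpp
#include <cstdint>
#include <utility>

// Precondition: value != 0 (the loop would not terminate otherwise).
inline int ctz_nonzero(std::uint32_t value) {
    int count = 0;
    while ((value & 1u) == 0) {
        value >>= 1;
        ++count;
    }
    return count;
}

inline std::uint32_t binary_gcd(std::uint32_t a, std::uint32_t b) {
    if (a == 0) return b;
    if (b == 0) return a;

    // Both inputs are nonzero from here on, so a | b is nonzero too.
    const int common_shift = ctz_nonzero(a | b);
    a >>= ctz_nonzero(a); // a stays odd for the rest of the loop

    do {
        b >>= ctz_nonzero(b); // b != 0 is guaranteed by the checks and loop condition
        if (a > b) {
            std::swap(a, b);
        }
        b -= a;
    } while (b != 0);

    return a << common_shift;
}
```

Since zero is excluded before any count is taken, a zero check inside the counting routine would be redundant work on every iteration.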

StephanTLavavej · 2021-11-17T22:28:19Z

1-2% is worth it, I think (it's pretty small on the scale of libs performance improvements, but it's easily something that the backend team would celebrate as their codegen is already highly optimized, and improving program perf without user-programmer intervention is always nice). @barcharcraz agrees - let's proceed!

AlexGuteniev · 2021-11-18T10:37:42Z

Please consider submitting #2343 instead of this one.
It:

- Gives slightly better results
- Makes the optimization easier to explain
- Is consistent with the similar optimization for _Popcount

StephanTLavavej · 2021-11-23T01:37:28Z

Looking at #2343, I agree that it appears to be superior. Closing this PR.

add explicit non-zero context to _Countr_zero

c94a229

fixes microsoft#2292

AlexGuteniev marked this pull request as ready for review October 23, 2021 14:49

AlexGuteniev requested a review from a team as a code owner October 23, 2021 14:49

for smallish vals can also optimize

38287a2

miscco reviewed Oct 23, 2021

CaseyCarter added the performance Must go faster label Oct 23, 2021

AlexGuteniev added 3 commits October 24, 2021 13:28

@miscco assumes this is clearer

efe7434

arm build

6148e10

Merge remote-tracking branch 'upstream/main' into gcd

af386c0

# Conflicts: # stl/inc/limits

StephanTLavavej assigned barcharcraz Nov 3, 2021

StephanTLavavej added 5 commits November 16, 2021 19:34

Rework <limits>, merge branch 'main' into gcd

14d4b07

Rename enumerators.

498d80b

Fix "assmuption" typo. Simplify _Non_zero_input to _Nonzero, consistent with function name. Avoid negation in _No_assumption; _Possibly_zero focuses on what we're concerned with.

_STL_INTERNAL_CHECK is non-core, oops.

9ffbf06

Tests directly call _Countr_zero_bsf.

64a9ecb

Suggestion: Avoid one-line helper function.

00e0c62

StephanTLavavej approved these changes Nov 17, 2021

AlexGuteniev marked this pull request as draft November 17, 2021 07:24

AlexGuteniev marked this pull request as ready for review November 17, 2021 19:48

AlexGuteniev mentioned this pull request Nov 18, 2021

Alternative way to optimize _Countr_zero #2343

Merged

StephanTLavavej closed this Nov 23, 2021

StephanTLavavej unassigned barcharcraz Nov 23, 2021

AlexGuteniev deleted the gcd branch November 23, 2021 11:45
