Skip to content

Conversation

@AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented Apr 3, 2020

Keep wait for review #593 , easy parts can be made available separately earlier

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this pull request Apr 3, 2020
@AlexGuteniev AlexGuteniev marked this pull request as ready for review April 3, 2020 21:26
@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 3, 2020 21:26
@AlexGuteniev AlexGuteniev changed the title Extract simple part from #593 to have it earlier Extract simple part from #593 to have them earlier Apr 3, 2020
@AlexGuteniev AlexGuteniev changed the title Extract simple part from #593 to have them earlier atomic_flag_test, lock free type aliases Apr 3, 2020
@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Apr 4, 2020

I want to stress that there's a design decision on atomic_signed_lock_free and atomic_unsigned_lock_free should be made, and this has huge, well, potentially huge impact on future ABI.

There are 3 sensible options:

  1. Make these types 64-bit on both 32-bit and 64-bit systems
  2. Make these types 64-bit on 64-bit systems and 32-bit on 32-bit systems
  3. Make these types 32 bit on both 32-bit and 64-bit systems

First option will make good counters that are unlikely to wrap around, which makes some algorithm more reliable. Also it makes good interprocess types, since they can be shared between 32-bit and 64-bit processes. Standard mentions that "lock-free types should also be address-free for interprocess shared memory". Though it violates what standard tells that "operations should be most effective". 64-bit type on x86 is not most effective, though #651 could make the situation way better.

Second option looks natural, but it is not different bitness interprocess friendly

Third option is good for efficiency, for interprocess, in line with Linux, but does not make effort to make the best of x64: prone to wrap around, and cannot even fit pointer or size on x64.

@CaseyCarter CaseyCarter added the cxx20 C++20 feature label Apr 6, 2020
@BillyONeal
Copy link
Member

I think u?intptr_t is probably the right choice here.

@BillyONeal
Copy link
Member

Can you add smoke tests for these?

# Conflicts:
#	stl/inc/yvals_core.h
#	tests/std/tests/VSO_0157762_feature_test_macros/test.cpp
@StephanTLavavej StephanTLavavej added the decision needed We need to choose something before working on this label Apr 22, 2020
@StephanTLavavej
Copy link
Member

Marking as decision needed because we need to determine whether 64-bit atomic operations on x86 (with upcoming dedicated intrinsics) are fast enough that we should consider selecting 64-bit types here.

@cbezault says that on ARM32, 64-bit atomic operations aren't super efficient, so we are essentially deciding whether to special-case x86 or not.

@BillyONeal
Copy link
Member

I vote to not special case x86. The only thing that was changed recently was load and store, exchange / cas / arithmetic etc. are still relatively slow.

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Apr 23, 2020

exchange / cas / arithmetic etc. are still relatively slow.

Are they way slower? Specifically, is lock cmpxchg8b ways slower than lock xadd or xchg?
I think both end up in full fence.

Sure, lock cmpxchg8b is slow for bad case, where it fails, and needs to be repeated, but I expect good case to be approximately the same.

@AlexGuteniev
Copy link
Contributor Author

Anyway, is there end of 32-bit x86 era in sight? What about 32-bit ARM?
(this probably would not help in determining the decision, but may make the decision easier)

@BillyONeal
Copy link
Member

exchange / cas / arithmetic etc. are still relatively slow.

Are they way slower? Specifically, is lock cmpxchg8b ways slower than lock xadd or xchg?
I think both end up in full fence.

Still a full fence but one needs a loop and the other does not. And cmpxchg8b clobbers all the registers.

Sure, lock cmpxchg8b is slow for bad case, where it fails, and needs to be repeated, but I expect good case to be approximately the same.

I haven't bothered to benchmark but the loop case alone I think makes it worth not special casing in absence of data indicating that such a special case would be useful for anything. Being able to stuff a pointer in here is important. Being able to stuff more than a pointer has limited utility when still interpreting the result as an integral type.

AlexGuteniev and others added 2 commits April 25, 2020 06:05
Co-Authored-By: Casey Carter <cartec69@gmail.com>
Co-Authored-By: Casey Carter <cartec69@gmail.com>
@AlexGuteniev
Copy link
Contributor Author

Being able to stuff more than a pointer has limited utility when still interpreting the result as an integral type.

I was thinking about having 64-bit value that is less likely to wrap around is useful for patterns similar to SeqLock. std::barrier::arrival_token is a possible example.

But I missed registers clobbering and that loop overhead is present even if there's just one iteration, so I agree now that having 64-bit atomic lock free type on x86 contradicts the Standard requirement of "most efficient operations".

This reverts commit 3ae2c0a.

# Conflicts:
#	tests/tr1/tests/atomic/test.cpp
…into atomic_flag_test

# Conflicts:
#	tests/tr1/tests/atomic/test.cpp
@AlexGuteniev
Copy link
Contributor Author

Now new test pass.
And I think the decision can be made.

@cbezault cbezault removed the decision needed We need to choose something before working on this label Apr 27, 2020
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I have very minor suggestions, so I'll just go ahead and push changes.

Copy link
Contributor

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just a couple of style nitpicks that probably aren't worth resetting testing.

Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub testing is so fast, resetting it isn't a big deal. 😸 I just pushed changes to address your comments. I also added a reserve() call because @AlexGuteniev had already computed total and it seemed reasonable.

@CaseyCarter CaseyCarter self-assigned this May 12, 2020
@CaseyCarter CaseyCarter merged commit 7353cd1 into microsoft:master May 12, 2020
@CaseyCarter
Copy link
Contributor

Thanks for yet another contribution, @AlexGuteniev!

@AlexGuteniev AlexGuteniev deleted the atomic_flag_test branch May 13, 2020 01:53
@CaseyCarter CaseyCarter removed their assignment Jun 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cxx20 C++20 feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants