Skip to content

Conversation

@AdamBucior
Copy link
Contributor

@AdamBucior AdamBucior commented Apr 6, 2021

Enables optimizations for contiguous iterator + unreachable_sentinel_t ranges.

The only thing I'm concerned about is users overloading operator==(custom contiguous iterator, unreachable_sentinel_t) which doesn't always return false. Such code should definitely be forbidden by the standard but I'm not sure if it is.

@AdamBucior AdamBucior requested a review from a team as a code owner April 6, 2021 15:57
@StephanTLavavej StephanTLavavej added performance Must go faster ranges C++20/23 ranges labels Apr 6, 2021
Copy link
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned about the additional derivations of _Copy_memcpy

Also I believe the _STL_ASSERTs should bee _STATIC_ASSERT(_Always_false, ...)

&& (_Is_sized1 || same_as<_Se1, unreachable_sentinel_t>) //
#pragma warning(suppress : 6287) // Redundant code: the left and right subexpressions are identical
&&(_Is_sized2 || same_as<_Se2, unreachable_sentinel_t>) //
&&same_as<_Pj1, identity> && same_as<_Pj2, identity>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above case you put those constraints on top. I would prefer to keep them there but downgrade to is_same_v or _Same_impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the ranges algorithm optimizations I've written already use same_as.

stl/inc/xmemory Outdated
Comment on lines 1579 to 1604
template <class _InIt, class _OutIt, typename _DistIt>
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance(
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept {
const auto _IFirstPtr = _To_address(_IFirst);
const auto _OFirstPtr = _To_address(_OFirst);
const auto _DFirstPtr = _To_address(_DFirst);
const auto _DLastPtr = _To_address(_DLast);
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr));
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr));
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr));
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr));
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch);
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count);
if constexpr (is_pointer_v<_InIt>) {
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count);
} else {
_IFirst += _Count / sizeof(iter_value_t<_InIt>);
}

if constexpr (is_pointer_v<_OutIt>) {
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count);
} else {
_OFirst += _Count / sizeof(iter_value_t<_OutIt>);
}
return {_STD move(_IFirst), _STD move(_OFirst)};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that overload is superfluous. The only difference is how we get to _Count

I guess we can get to that easier that duplicating all that machinery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might just use the _count version but that might have worse codegen. #1433 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about determining _Count via the right method and passing that around. That should give the optimal codegen all the time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.

@CaseyCarter CaseyCarter self-assigned this Apr 28, 2021
stl/inc/xmemory Outdated
Comment on lines 1579 to 1604
template <class _InIt, class _OutIt, typename _DistIt>
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance(
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept {
const auto _IFirstPtr = _To_address(_IFirst);
const auto _OFirstPtr = _To_address(_OFirst);
const auto _DFirstPtr = _To_address(_DFirst);
const auto _DLastPtr = _To_address(_DLast);
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr));
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr));
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr));
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr));
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch);
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count);
if constexpr (is_pointer_v<_InIt>) {
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count);
} else {
_IFirst += _Count / sizeof(iter_value_t<_InIt>);
}

if constexpr (is_pointer_v<_OutIt>) {
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count);
} else {
_OFirst += _Count / sizeof(iter_value_t<_OutIt>);
}
return {_STD move(_IFirst), _STD move(_OFirst)};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.

@CaseyCarter CaseyCarter removed their assignment Jun 2, 2021
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I'll validate and push changes for the very minor things I noticed.

@StephanTLavavej
Copy link
Member

@AdamBucior @CaseyCarter FYI I added _STD move occurrences and verified that the tests are passing, please hiss if I damaged something (I'm 99.999% sure they're good).

@StephanTLavavej
Copy link
Member

Also had to merge main and fix a stealth merge conflict - now that views are guarded by /std:c++latest after GH-1929, the tests being modified to use views::take_while need to use the concepts_latest_matrix.

@StephanTLavavej StephanTLavavej self-assigned this Jun 11, 2021
@StephanTLavavej StephanTLavavej merged commit e326ec1 into microsoft:main Jun 12, 2021
@StephanTLavavej
Copy link
Member

Thanks for increasing the STL's performance to previously unreachable levels! 😹 🚀 📈

@AdamBucior AdamBucior deleted the unreachable-sentinel-optimizations branch June 12, 2021 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster ranges C++20/23 ranges

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants