Skip to content

[libc++] Add some _LIBCPP_ASSUMEs for bounded iterators #109033

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

davidben
Copy link
Contributor

Playing around, this seems to address #101370 for std::vector<char>, but not std::vector<int>. std::vector<int> I believe also needs a solution to #101372, which is an alignment issue.

The root problem is that vector uses end_cap instead of end as the hardening fencepost. But user code (be it an actual iter != vec.end() check, or one synthesized by the language in a range-for loop) uses the container end as the fencepost.

We would like the user fencepost to delete the hardening fencepost. For that to happen, the compiler must know that if you take your iterator and then steadily ++iter, stopping at iter == end, you won't hit iter == end_cap along the way. To fgire this out, the compiler needs to know a few things:

  1. iter <= end <= end_cap at the start

  2. iter, end, and end_cap are all compatibly aligned, such that ++iter cannot skip over end and then get to end_cap.

The first of these is not obvious in std::vector for because std::vector stores three pointers, rather than one pointer and then sizes. That means the compiler never sees end (or end_cap) computed as begin + size (or begin + capacity). Without type invariants, the compiler does not know that the three pointers have any relation at all.

This PR addresses it by putting assumes in __bounded_iter itself. We could also place it in std::vector::__make_iter, but this invariant is important enough for reasoning about bounds that it seemed worth establishing it across the board. (Note this means we trust container implementations to use the bounded iterators correctly, which we already do. We're interested in catching bugs in user code, not the STL itself.)

That alone is actually enough to handle this because constructing vector::end() is enough to tell the compiler that begin <= end, and loops usually start at begin. But since __make_iter is sometimes called on non-endpoint iterators, I added one extra invariant to __make_iter.

The second issue is #101372. This PR does not address it but will (hopefully) take advantage of it once available.

In working on this, I noticed that _LIBCPP_ASSUME silences -Wassume. Without that warning, I ended up spending a lot of time debugging silently no-op assumes. This seems to be a remnant of when _LIBCPP_ASSUME was part of _LIBCPP_ASSERT. Now that it's standalone, I think we shouldn't disable the warning by default. If we ever need to silence the warning, let's do it explicitly.

@davidben davidben requested a review from a team as a code owner September 17, 2024 19:29
@davidben
Copy link
Contributor Author

CC @danakj

@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Sep 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2024

@llvm/pr-subscribers-libcxx

Author: David Benjamin (davidben)

Changes

Playing around, this seems to address #101370 for std::vector&lt;char&gt;, but not std::vector&lt;int&gt;. std::vector&lt;int&gt; I believe also needs a solution to #101372, which is an alignment issue.

The root problem is that vector uses end_cap instead of end as the hardening fencepost. But user code (be it an actual iter != vec.end() check, or one synthesized by the language in a range-for loop) uses the container end as the fencepost.

We would like the user fencepost to delete the hardening fencepost. For that to happen, the compiler must know that if you take your iterator and then steadily ++iter, stopping at iter == end, you won't hit iter == end_cap along the way. To fgire this out, the compiler needs to know a few things:

  1. iter &lt;= end &lt;= end_cap at the start

  2. iter, end, and end_cap are all compatibly aligned, such that ++iter cannot skip over end and then get to end_cap.

The first of these is not obvious in std::vector for because std::vector stores three pointers, rather than one pointer and then sizes. That means the compiler never sees end (or end_cap) computed as begin + size (or begin + capacity). Without type invariants, the compiler does not know that the three pointers have any relation at all.

This PR addresses it by putting assumes in __bounded_iter itself. We could also place it in std::vector::__make_iter, but this invariant is important enough for reasoning about bounds that it seemed worth establishing it across the board. (Note this means we trust container implementations to use the bounded iterators correctly, which we already do. We're interested in catching bugs in user code, not the STL itself.)

That alone is actually enough to handle this because constructing vector::end() is enough to tell the compiler that begin &lt;= end, and loops usually start at begin. But since __make_iter is sometimes called on non-endpoint iterators, I added one extra invariant to __make_iter.

The second issue is #101372. This PR does not address it but will (hopefully) take advantage of it once available.

In working on this, I noticed that _LIBCPP_ASSUME silences -Wassume. Without that warning, I ended up spending a lot of time debugging silently no-op assumes. This seems to be a remnant of when _LIBCPP_ASSUME was part of _LIBCPP_ASSERT. Now that it's standalone, I think we shouldn't disable the warning by default. If we ever need to silence the warning, let's do it explicitly.


Full diff: https://github.com/llvm/llvm-project/pull/109033.diff

3 Files Affected:

  • (modified) libcxx/include/__assert (+1-3)
  • (modified) libcxx/include/__iterator/bounded_iter.h (+12)
  • (modified) libcxx/include/vector (+9-1)
diff --git a/libcxx/include/__assert b/libcxx/include/__assert
index 90eaa6023587b9..5a5cad472425c5 100644
--- a/libcxx/include/__assert
+++ b/libcxx/include/__assert
@@ -27,9 +27,7 @@
 // optimization intent. See https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609 for a
 // discussion.
 #if __has_builtin(__builtin_assume)
-#  define _LIBCPP_ASSUME(expression)                                                                                   \
-    (_LIBCPP_DIAGNOSTIC_PUSH _LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Wassume")                                              \
-         __builtin_assume(static_cast<bool>(expression)) _LIBCPP_DIAGNOSTIC_POP)
+#  define _LIBCPP_ASSUME(expression) __builtin_assume(static_cast<bool>(expression))
 #else
 #  define _LIBCPP_ASSUME(expression) ((void)0)
 #endif
diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index 5a86bd98e71940..bc0a4d0bc56930 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -89,10 +89,22 @@ struct __bounded_iter {
   _LIBCPP_HIDE_FROM_ABI
   _LIBCPP_CONSTEXPR_SINCE_CXX14 explicit __bounded_iter(_Iterator __current, _Iterator __begin, _Iterator __end)
       : __current_(__current), __begin_(__begin), __end_(__end) {
+    // These are internal checks rather than hardening checks because the STL container is expected to ensure they are
+    // in order.
     _LIBCPP_ASSERT_INTERNAL(
         __begin <= __current, "__bounded_iter(current, begin, end): current and begin are inconsistent");
     _LIBCPP_ASSERT_INTERNAL(
         __current <= __end, "__bounded_iter(current, begin, end): current and end are inconsistent");
+
+    // However, this order is important to help the compiler reason about bounds checks. For example, `std::vector` sets
+    // `__end_ptr` to the capacity, not the true container end. To translate container-end fenceposts into hardening-end
+    // fenceposts, we must know that container-end <= hardening-end. `std::__to_address` is needed because `_Iterator`
+    // may be wrapped type, such that `operator<=` has side effects.
+    pointer __begin_ptr   = std::__to_address(__begin);
+    pointer __current_ptr = std::__to_address(__current);
+    pointer __end_ptr = std::__to_address(__end);
+    _LIBCPP_ASSUME(__begin_ptr <= __current_ptr);
+    _LIBCPP_ASSUME(__current_ptr <= __end_ptr);
   }
 
   template <class _It>
diff --git a/libcxx/include/vector b/libcxx/include/vector
index 7d3aac5989a48c..7e39b53d420654 100644
--- a/libcxx/include/vector
+++ b/libcxx/include/vector
@@ -850,6 +850,10 @@ private:
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator __make_iter(pointer __p) _NOEXCEPT {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
+    // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
+    // relates `__p` to `__end_`.
+    _LIBCPP_ASSUME(__p <= this->__end_);
+
     // Bound the iterator according to the capacity, rather than the size.
     //
     // Vector guarantees that iterators stay valid as long as no reallocation occurs even if new elements are inserted
@@ -870,7 +874,11 @@ private:
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_iterator __make_iter(const_pointer __p) const _NOEXCEPT {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
-    // Bound the iterator according to the capacity, rather than the size.
+    // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
+    // relates `__p` to `__end_`.
+    _LIBCPP_ASSUME(__p <= this->__end_);
+
+    // Bound the iterator according to the capacity, rather than the size. See above.
     return std::__make_bounded_iter(
         std::__wrap_iter<const_pointer>(__p),
         std::__wrap_iter<const_pointer>(this->__begin_),

@@ -850,6 +850,10 @@ private:

_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator __make_iter(pointer __p) _NOEXCEPT {
#ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
// `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
// relates `__p` to `__end_`.
_LIBCPP_ASSUME(__p <= this->__end_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is strictly needed. I did this for the __make_iter calls that don't pass one of begin or end. Thoughts?

Copy link

github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@davidben
Copy link
Contributor Author

Not sure what's going on with the failing checks. When I open them, it just says "This job failed" with no log output.
https://github.com/llvm/llvm-project/actions/runs/10909918134/job/30279195273?pr=109033

@ldionne
Copy link
Member

ldionne commented Sep 25, 2024

Please rebase onto main to re-trigger CI. The CI instability should be resolved now.

@davidben davidben force-pushed the libcxx-bounded-iter-assume branch from 84493f9 to 847d636 Compare September 25, 2024 17:35
@davidben
Copy link
Contributor Author

Rebased

@davidben
Copy link
Contributor Author

Anything left to be done here?

@danakj
Copy link
Contributor

danakj commented Nov 20, 2024

Will need to rebase again 😅

Playing around, this seems to address llvm#101370 for `std::vector<char>`,
but not `std::vector<int>`. `std::vector<int>` I believe also needs a
solution to llvm#101372, which is an alignment issue.

The root problem is that vector uses end_cap instead of end as the
hardening fencepost. But user code (be it an actual `iter != vec.end()`
check, or one synthesized by the language in a range-for loop) uses the
container end as the fencepost.

We would like the user fencepost to delete the hardening fencepost. For
that to happen, the compiler must know that if you take your iterator
and then steadily `++iter`, stopping at `iter == end`, you won't hit
`iter == end_cap` along the way. To fgire this out, the compiler needs
to know a few things:

1. `iter <= end <= end_cap` at the start

2. `iter`, `end`, and `end_cap` are all compatibly aligned, such that
   `++iter` cannot skip over `end` and then get to `end_cap`.

The first of these is not obvious in `std::vector` for because
`std::vector` stores three pointers, rather than one pointer and then
sizes. That means the compiler never sees `end` (or `end_cap`) computed
as `begin + size` (or `begin + capacity`). Without type invariants, the
compiler does not know that the three pointers have any relation at all.

This PR addresses it by putting assumes in `__bounded_iter` itself. We
could also place it in `std::vector::__make_iter`, but this invariant is
important enough for reasoning about bounds that it seemed worth
establishing it across the board. (Note this means we trust container
implementations to use the bounded iterators correctly, which we already
do. We're interested in catching bugs in user code, not the STL itself.)

That alone is actually enough to handle this because constructing
`vector::end()` is enough to tell the compiler that `begin <= end`, and
loops usually start at `begin`. But since `__make_iter` is sometimes
called on non-endpoint iterators, I added one extra invariant to
`__make_iter`.

The second issue is llvm#101372. This PR does not address it but will
(hopefully) take advantage of it once available.

In working on this, I noticed that _LIBCPP_ASSUME silences -Wassume.
Without that warning, I ended up spending a lot of time debugging
silently no-op assumes. This seems to be a remnant of when
_LIBCPP_ASSUME was part of _LIBCPP_ASSERT. Now that it's standalone, I
think we shouldn't disable the warning by default. If we ever need to
silence the warning, let's do it explicitly.
@davidben davidben force-pushed the libcxx-bounded-iter-assume branch from 24a659a to bc319a5 Compare November 20, 2024 18:26
@davidben
Copy link
Contributor Author

Rebased once again.

@ldionne Friendly ping. I would prefer to not have to keep rebasing these. :)

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, for some reason I thought someone else was looking at this.

@var-const var-const added the hardening Issues related to the hardening effort label Dec 21, 2024
@ldionne
Copy link
Member

ldionne commented Jan 13, 2025

Gentle ping, looks like there are minor things to address but it would be great to hit the LLVM 20 release with those.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I configure my local tree with

-DLIBCXX_HARDENING_MODE=fast
-DLIBCXX_ABI_DEFINES="_LIBCPP_ABI_BOUNDED_ITERATORS;_LIBCPP_ABI_BOUNDED_ITERATORS_IN_STRING;_LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR"

and then run this benchmark with and without the patch:

$ libcxx-lit <build> -sv --param optimization=speed --time-tests --show-all --param enable_benchmarks=run -j1 libcxx/test/benchmarks/containers/sequence/vector.bench.cpp

I get those results:

$ libcxx/utils/libcxx-compare-benchmarks build/{default,candidate} libcxx/test/benchmarks/containers/sequence/vector.bench.cpp
Comparing build/default/libcxx/test/benchmarks/containers/sequence/Output/vector.bench.cpp.dir/benchmark-result.json to build/candidate/libcxx/test/benchmarks/containers/sequence/Output/vector.bench.cpp.dir/benchmark-result.json
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
std::vector<int>::iterate-whole-container/32                          -0.0273         -0.0271            11            11            11            11
std::vector<int>::iterate-whole-container/1024                        -0.0257         -0.0256           336           328           336           328
std::vector<int>::iterate-whole-container/8192                        -0.0375         -0.0374          2657          2558          2657          2558
std::vector<std::string>::iterate-whole-container/32                  +0.3467         +0.3473            11            15            11            15
std::vector<std::string>::iterate-whole-container/1024                -0.0394         -0.0387           341           328           341           328
std::vector<std::string>::iterate-whole-container/8192                -0.0041         -0.0040          2571          2560          2570          2560
OVERALL_GEOMEAN                                                       +0.0272         +0.0276             0             0             0             0

I think the result for std::vector<std::string>::iterate-whole-container/32 is a fluke due to the small size of the container. But I also don't see any improvement, which is a bit surprising to me. Any ideas?

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this patch assuming it does provide a benefit, but @davidben I'd be really curious to understand where it does provide benefits, since my naive attempt with a benchmark was not successful.

@ldionne
Copy link
Member

ldionne commented May 12, 2025

Gentle ping @davidben

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hardening Issues related to the hardening effort libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants