-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intrusive shamap inner final #5152
base: develop
Are you sure you want to change the base?
Conversation
This branch has a long history. About two years ago I wrote a patch to remove the mutex from shamap inner nodes (ref: https://github.com/seelabs/rippled/tree/lockfree-tagged-cache). At the time I measured a large memory savings of about 2 gig. Unfortunately, the code required using the `folly` library, and I was hesitant to introduce such a large dependency into rippled (especially one that was so hard to build). This branch resurrects that old work and removes the `folly` dependency. The old branch used a lockless atomic shared pointer. This new branch introduces a intrusive pointer type. Unlike boost's intrusive pointer, this intrusive pointer can handle both strong and weak pointers (needed for the tagged cache). Since this is an intrusive pointer type, in order to support weak pointers, the object is not destroyed when the strong count goes to zero. Instead, it is "partially destroyed" (for example, inner nodes will reset their children). This intrusive pointer takes 16-bits for the strong count and 14-bits for the weak count, and takes one 64-bit pointer to point at the object. This is much smaller than a std::shared_pointer, which needs a control block to hold the strong and weak counts (and potentially other objects), as well as an extra pointer to point at the control block. The intrusive shared pointer can be modified to support for atomic operations (there is a branch that adds this support). These atomic operations can be used instead of the lock when changing inner node pointers in the shamap. Note: The space savings is independent from removing the locks from shamap inner node. Therefor this work is divided into two phases. In the first phase a non-atomic intrusive pointer is introduced and the locks are kept. In a second phases the atomic intrusive pointer could be introduced and the locks will be removed. Some of the code in this patch is written with the upcoming atomic work in mind (for example, using exchange in places). The atomic intrusive pointer also requires the C++ library to support `atomic_ref`. Both gcc and msvc support this, but at the time of this writing clang's library does not. Note: Intrusive pointer will be 12 bytes. The shared_ptr will be around 40 bytes, depending on implementation. When measuring memory usage on a validator, this patch resulted in between a 10 and 15% memory savings.
Analysis of Reference Count Ranges for Intrusive Smart PointersBackgroundFollowing the conversation in the original PR (Intrusive shamap inner (SHAMapTreeNode memory reduction) by seelabs · Pull Request #4815 · XRPLF/rippled ), it was raised that unlike the standard library
Questions
Code auditStrong referencesFrom analyzing the code: Worst-case scenario:
Theoretical Maximum value: 210 = 2 X (15 / 5) X 35 Weak referencesClass TestsTemporary code changes
Test runs
Test results
The test result of 387 references being observed is much higher than the theoretical maximum. This suggests: Conclusion
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The theoretical maximal value for string references is calculated to be 210. Experimental evidence also from the readme detects a value above the theoretical maximum: 387.
I ran a server for about an hour and detected a max of 1908.
These are all well below the limits of 65535, so this limit is probably safe. But it wouldn't hurt to revisit the theoretical maximum and discover why it is incorrect.
I did additional digging following a comment from @HowardHinnant. What I didn't take into account is that
Two of those routines are executed from the Based on those findings, we should update the Theoretical Maximum value.
Giving overall Theoretical Maximum value as: This is still significantly lower than the allocated 65535 range. |
See the original PR and review comments here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a complete review, mostly nits so far.
// shared pointer class for tree pointers | ||
// The ref counts are kept on the tree pointers themselves | ||
// I.e. this is an intrusive pointer type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think these comments are not in the right location
void | ||
adopt(T* ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the need to have adopt for a weak pointer? What would be the use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, we can have a case where we are transitioning from standard smart pointers to intr_ptr
in stages, as we plan to continue integrating intr_ptr
into other parts of rippled. During this transition, it is possible that some parts of the logic will already be using intr_ptr
, while other parts will still rely on std::shared_ptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense for shared ptrs, but not for weak pointer. I'm just struggling to understand the use case of adopt for weak ptr (std weak_ptr has not such functionality either)
assert(0); // only a strong pointer should case a | ||
// partialDestruction | ||
ptr_->partialDestructor(); | ||
partialDestructorFinished(&ptr_); | ||
// ptr_ is null and may no longer be used | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's asserting but then still has some logic, maybe this logic should not be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the rest of the codebase where UNREACHABLE
is used, it appears that we prefer to future-proof the implementation. In this case, the inclusion of partialDestructor
provides a safeguard should a situation arise where partial destruction becomes legitimate for a weak pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with this statement most of the time, but the whole intent and idea of intrusive pointer here is that weak pointer must not call partial destruction. So even if such need suddenly arises, this whole idea and implementation will have to be revamped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by introducing separate ReleaseStrongRefAction and ReleaseWeakRefAction.
if (!ptr_) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is redundant since unsafeReleaseNoStore() does the same
// We just added a weak ref. How could we destroy? | ||
assert(0); | ||
delete p; | ||
unsafeSetRawPtr(nullptr); | ||
return true; // Should never happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, what's the purpose of this assert(0) and then logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, the pattern is to future-proof the implementation, but I don't have strong feelings about this approach and happy to remove the dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by introducing separate ReleaseStrongRefAction and ReleaseWeakRefAction.
{ | ||
using enum ReleaseRefAction; | ||
|
||
static_assert(weakDelta > strongDelta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe it would be better to place this assert near the declaration of these variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that Scott's idea was to perform the check where it truly matters. In the future, the nature of those constants might change, and they could become variables.
// 3) Test assignment from null union pointer | ||
union1 = SharedWeakUnion<TIBase>(); | ||
BEAST_EXPECT(union1.get() == nullptr); | ||
BEAST_EXPECT(TIBase::getState(id1) == TrackedState::alive); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this check is redundant since union1 was assigned strong2 in test 1) Normal assignment
src/xrpld/shamap/SHAMapInnerNode.h
Outdated
void | ||
shareChild(int m, std::shared_ptr<SHAMapTreeNode> const& child); | ||
shareChild(int m, SharedIntrusive<T> const& child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be intr_ptr::SharedPtr here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Well spotted!
src/xrpld/shamap/TreeNodeCache.h
Outdated
SHAMapTreeNode, | ||
/*IsKeyCache*/ false, | ||
SharedWeakUnion<SHAMapTreeNode>, | ||
SharedIntrusive<SHAMapTreeNode>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be intr_ptr::SharedPtr here as well?
Also I see there's no SharedWeakUnion exposed the same way. I wonder if that should?
SHAMapInnerNode::partialDestructor() | ||
{ | ||
intr_ptr::SharedPtr<SHAMapTreeNode>* children; | ||
// structured bindings can't be captured in c++ 17; use tie instead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I suppose we can use it now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here + 3 other places to use modern structured bindings.
This reverts commit d610533.
operator=(SharedIntrusive&& rhs); | ||
|
||
template <class TT> | ||
// clang-format off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i tried to remove clang-format off from here (and place above). It doesn't seem to change the way it's formatted. Do you know if the previous version of clang-format was messing this up? It doesn't seem like this is required now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does one minor thing: requires
stays aligned with the rest of the template definition.
This is how it looks with formatting enabled for me:
template <class T>
template <class TT>
requires std::convertible_to<TT*, T*>
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive<TT>&& rhs)
{
static_assert(
!std::is_same_v<T, TT>,
"This overload should not be instantiated for T == TT");
unsafeReleaseAndStore(rhs.unsafeExchange(nullptr));
return *this;
}
We seem to use this approach a lot throughout a project primarily for the same reason - to keep the beginning of the line aligned with the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, whatever approach we take that should be uniform.
lines 103 and 109 do not use clang format off, so please add it there or remove it here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit more digging into the question of leading space since we are setting an example here for consistent style. At least two reputable C++ codebases Boost and LLVM are using space at the beginning of new line for multiline definitions:
ie:
template<
format FromFormat,
format ToFormat,
std::input_iterator I,
std::sentinel_for<I> S = I,
transcoding_error_handler ErrorHandler = use_replacement_character>
requires std::convertible_to<std::iter_value_t<I>, detail::format_to_type_t<FromFormat>>
and
template <class T>
requires(!std::convertible_to<T, int>)
void requires_init_is_convertible_to_decayed() {
static_assert(!requires(std::ranges::subrange<int*, int*> r, T init) {
std::ranges::fold_left_with_iter(r.begin(), r.end(), init, std::plus());
});
This is what our clang-format config suggests as well. In my view, overriding this style with clang-format off/on only makes the code less readable. Removing clang format overrides ...
static_assert( | ||
alignof(T) >= 2, | ||
"Bad alignment: Combo pointer requires low bit to be zero"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would suggest adding comment from line 430 here. That could improve readability
High Level Overview of Change
This PR finalises the work authored by Scott Determan (https://github.com/seelabs) and is based on the original PR (#4815).
Context of Change
There are two goals:
develop
branch.Type of Change
.gitignore
, formatting, dropping support for older tooling)