
Intrusive shamap inner final #5152

Open · wants to merge 42 commits into develop

Conversation


@vlntb vlntb commented Oct 3, 2024

High Level Overview of Change

This PR finalises the work authored by Scott Determan (https://github.com/seelabs) and is based on the original PR (#4815).

Context of Change

There are two goals:

  • Synchronise this change with the most recent develop branch.
  • Address outstanding questions raised in the original PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

Scott Determan added 4 commits October 2, 2024 17:25
This branch has a long history. About two years ago I wrote a patch to
remove the mutex from shamap inner nodes (ref:
https://github.com/seelabs/rippled/tree/lockfree-tagged-cache). At the
time I measured a large memory savings of about 2 GB. Unfortunately,
the code required the `folly` library, and I was hesitant to
introduce such a large dependency into rippled (especially one that is
so hard to build). This branch resurrects that old work and removes the
`folly` dependency.

The old branch used a lockless atomic shared pointer. This new branch
introduces an intrusive pointer type. Unlike boost's intrusive pointer,
this intrusive pointer can handle both strong and weak pointers (needed
for the tagged cache). Since this is an intrusive pointer type, in order
to support weak pointers the object is not destroyed when the strong
count goes to zero. Instead, it is "partially destroyed" (for example,
inner nodes reset their children). This intrusive pointer uses
16 bits for the strong count and 14 bits for the weak count, plus
one 64-bit pointer to point at the object. This is much smaller than a
std::shared_ptr, which needs a control block to hold the strong and
weak counts (and potentially other objects), as well as an extra pointer
to point at the control block.
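The packed-count layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the struct name, masks, and member functions are all hypothetical, and real code would also have to handle count overflow and the weak-release path.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: 16 bits of strong count and 14 bits of weak count
// packed into a single 32-bit atomic word, as described in the text.
struct PackedRefCounts
{
    static constexpr std::uint32_t strongMask = 0x0000FFFF;  // low 16 bits
    static constexpr std::uint32_t weakMask = 0x3FFF0000;    // next 14 bits
    static constexpr std::uint32_t weakOne = 0x00010000;     // +1 weak ref

    std::atomic<std::uint32_t> bits{1};  // constructed with one strong ref

    void addStrong() { bits.fetch_add(1, std::memory_order_acq_rel); }
    void addWeak() { bits.fetch_add(weakOne, std::memory_order_acq_rel); }

    // Returns true when the last strong ref is released, i.e. the object
    // should now be "partially destroyed" (e.g. an inner node resetting
    // its children) while weak refs may still exist.
    bool releaseStrong()
    {
        auto prev = bits.fetch_sub(1, std::memory_order_acq_rel);
        return (prev & strongMask) == 1;
    }

    std::uint32_t strongCount() const { return bits.load() & strongMask; }
    std::uint32_t weakCount() const { return (bits.load() & weakMask) >> 16; }
};
```

With the pointer itself, this gives the 12-byte footprint mentioned below (4 bytes of counts plus an 8-byte pointer), versus roughly 40 bytes for a shared_ptr and its control block.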

The intrusive shared pointer can be extended to support atomic
operations (there is a branch that adds this support). These atomic
operations can then be used instead of the lock when changing inner node
pointers in the shamap.

Note: The space savings are independent of removing the locks from the
shamap inner nodes. Therefore this work is divided into two phases. In the
first phase a non-atomic intrusive pointer is introduced and the locks
are kept. In the second phase the atomic intrusive pointer could be
introduced and the locks removed. Some of the code in this patch
is written with the upcoming atomic work in mind (for example, using
exchange in places). The atomic intrusive pointer also requires the C++
library to support `atomic_ref`. Both gcc and msvc support this, but at
the time of this writing clang's library does not.

Note: The intrusive pointer will be 12 bytes. A std::shared_ptr will be
around 40 bytes, depending on the implementation.

When measuring memory usage on a validator, this patch resulted in
between a 10 and 15% memory savings.

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 86.05769% with 116 lines in your changes missing coverage. Please review.

Project coverage is 77.9%. Comparing base (0324764) to head (da54c14).

Files with missing lines Patch % Lines
include/xrpl/basics/TaggedCache.ipp 85.2% 39 Missing ⚠️
include/xrpl/basics/IntrusivePointer.ipp 86.8% 38 Missing ⚠️
include/xrpl/basics/SharedWeakCachePointer.ipp 75.9% 13 Missing ⚠️
include/xrpl/basics/IntrusiveRefCounts.h 91.8% 8 Missing ⚠️
src/xrpld/shamap/detail/SHAMapInnerNode.cpp 58.3% 5 Missing ⚠️
src/xrpld/shamap/detail/SHAMapTreeNode.cpp 62.5% 3 Missing ⚠️
src/xrpld/shamap/detail/TaggedPointer.ipp 80.0% 3 Missing ⚠️
src/xrpld/shamap/SHAMapTxPlusMetaLeafNode.h 0.0% 2 Missing ⚠️
src/xrpld/shamap/detail/SHAMap.cpp 96.1% 2 Missing ⚠️
include/xrpl/basics/IntrusivePointer.h 91.7% 1 Missing ⚠️
... and 2 more
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop   #5152    +/-   ##
========================================
  Coverage     77.9%   77.9%            
========================================
  Files          783     788     +5     
  Lines        66707   67166   +459     
  Branches      8118    8161    +43     
========================================
+ Hits         51953   52326   +373     
- Misses       14754   14840    +86     
Files with missing lines Coverage Δ
include/xrpl/basics/TaggedCache.h 100.0% <100.0%> (+13.1%) ⬆️
src/xrpld/app/ledger/detail/LedgerMaster.cpp 43.9% <ø> (ø)
src/xrpld/app/ledger/detail/TransactionMaster.cpp 73.8% <ø> (ø)
src/xrpld/app/main/Application.h 100.0% <ø> (ø)
src/xrpld/app/misc/SHAMapStoreImp.h 96.6% <ø> (ø)
src/xrpld/ledger/detail/CachedView.cpp 94.4% <ø> (ø)
src/xrpld/nodestore/Database.h 69.2% <ø> (ø)
src/xrpld/shamap/SHAMap.h 100.0% <ø> (ø)
src/xrpld/shamap/SHAMapAccountStateLeafNode.h 100.0% <100.0%> (ø)
src/xrpld/shamap/SHAMapInnerNode.h 88.2% <ø> (ø)
... and 17 more

... and 4 files with indirect coverage changes



vlntb commented Nov 6, 2024

Analysis of Reference Count Ranges for Intrusive Smart Pointers

Background

Following the conversation in the original PR (Intrusive shamap inner (SHAMapTreeNode memory reduction) by seelabs · Pull Request #4815 · XRPLF/rippled), it was noted that, unlike the standard library shared_ptr and weak_ptr, the newly introduced intrusive versions have narrower ranges for storing reference counts. The proposed change sets the maximum counts as:

  • For strong references: 65535
  • For weak references: 16383

Questions

  • The task is to audit the code and prepare tests to check the maximum reference counts that can occur in the current version of rippled.
  • Decide whether the proposed ranges are enough for the current version and the near future. It is possible to increase the ranges later, and the move to intrusive smart pointers would still be beneficial.

Code audit

Strong references

From analyzing the code:
Theoretical Maximum = (shareChild calls) × (number of ledgers containing the same node)

where:

  • shareChild calls — the shareChild calls made during tree traversal (walkSubTree).
  • number of ledgers containing the same node — while generating a ledger, the same transaction might be added to several versions of the ledger until one of them is accepted by consensus. Therefore, the same node may be referenced from multiple trees representing different ledger versions.

Worst-case scenario:

  • shareChild calls during tree traversal = 2
  • Given network of 35 validators
  • 5-second interval to reach a consensus
  • 15-second interval deadline before network reset

Theoretical Maximum value: 210 = 2 × (15 / 5) × 35

Weak references

The class WeakIntrusive is not used explicitly or implicitly at the moment. The only place a weak pointer is used is in the conversion from strong to weak when sweeping the TaggedCache. This means the weak reference count can never be higher than the strong reference count.
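The sweep pattern described above can be sketched with standard smart pointers for illustration. This is not the rippled implementation (which uses the new intrusive types); the struct and function names are hypothetical. The point is structural: every weak reference is produced by downgrading an existing strong reference, so the weak count cannot exceed the strong count.

```cpp
#include <memory>
#include <vector>

// Illustrative cache entry: a stand-in for a cached SHAMap node.
struct CacheEntry
{
    std::shared_ptr<int> strong;  // held while the entry is "hot"
    std::weak_ptr<int> weak;      // left behind after a sweep
};

// Hypothetical sweep: downgrade every strong reference to a weak one,
// mirroring the strong-to-weak conversion done when sweeping TaggedCache.
void sweep(std::vector<CacheEntry>& cache)
{
    for (auto& e : cache)
    {
        if (e.strong)
        {
            e.weak = e.strong;  // the only way a weak ref is created
            e.strong.reset();
        }
    }
}
```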

Tests

Temporary code changes

Test runs

  • 12 rippled sessions ranging in duration from 1 hr to 24 hrs
  • Network: livenet
  • State: proposing

Test results

  • Maximum number of strong references: 387
  • Maximum number of weak references: 1

The observed maximum of 387 references is much higher than the theoretical maximum. This suggests:

  • There may be excessive copying of nodes during the initialization phase or traversal.
  • Certain caching mechanisms (such as TaggedCache) can affect the reference count.

Conclusion

  • Strong Reference Count Range: The proposed limit of 65535 is more than adequate, and the logic for calculating the theoretical maximum (210) is sound. The observed discrepancy (387) highlights a need to investigate potential inefficiencies in node copying or caching.
  • Weak Reference Count Range: The proposed limit of 16383 is also sufficient, and the observed maximum (1) confirms that weak references are minimal under current usage patterns.
  • Actionable Insight: The excessive copying or caching logic leading to 387 references warrants further investigation to improve efficiency.

@vlntb vlntb requested a review from HowardHinnant November 6, 2024 12:30

@HowardHinnant HowardHinnant left a comment


The theoretical maximal value for strong references is calculated to be 210. Experimental evidence, also from the readme, detects a value above the theoretical maximum: 387.
I ran a server for about an hour and detected a max of 1908.

These are all well below the limits of 65535, so this limit is probably safe. But it wouldn't hurt to revisit the theoretical maximum and discover why it is incorrect.


vlntb commented Nov 13, 2024

The theoretical maximal value for strong references is calculated to be 210. Experimental evidence, also from the readme, detects a value above the theoretical maximum: 387. I ran a server for about an hour and detected a max of 1908.

These are all well below the limits of 65535, so this limit is probably safe. But it wouldn't hurt to revisit the theoretical maximum and discover why it is incorrect.

I did additional digging following a comment from @HowardHinnant. What I didn't take into account is that rippled processes transaction and ledger data in a concurrent environment. I identified four types of routines that can happen in parallel:

  • InboundLedgersImp::gotLedgerData
  • SHAMap::walkTowardsKey
  • SHAMap::flushDirty 
  • LedgerMaster::gotFetchPack

Two of those routines are executed from the JobQueue and can be parallelized further based on the node_size configuration parameter. The difference in this parameter explains the difference between the results that Howard and I received. Howard had node_size set to huge, resulting in 8 threads in the JobQueue pool. In my setup, it was set to medium, resulting in 4 threads.

Based on those findings, we should update the Theoretical Maximum value.

  1. Theoretical Maximum value for a single thread: 210 = 2 × (15 / 5) × 35.
  2. InboundLedgersImp::gotLedgerData can be executed across a maximum of 8 threads.
  3. SHAMap::walkTowardsKey - 1 thread
  4. SHAMap::flushDirty - 1 thread
  5. LedgerMaster::gotFetchPack - can be executed across a maximum of 8 threads.

Giving an overall Theoretical Maximum value of:
8 × 210 + 210 + 210 + 8 × 210 = 3780.

This is still significantly lower than the allocated 65535 range.
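The arithmetic above can be reproduced as a small compile-time check. All figures are taken from the analysis in this thread; nothing here is measured independently, and the constant names are illustrative.

```cpp
// Values from the analysis above.
constexpr int shareChildCalls = 2;       // shareChild calls per traversal
constexpr int consensusRounds = 15 / 5;  // 15 s deadline / 5 s consensus round
constexpr int validators = 35;

// Theoretical maximum for a single thread: 2 * 3 * 35 = 210.
constexpr int singleThreadMax = shareChildCalls * consensusRounds * validators;

// gotLedgerData and gotFetchPack each fan out to up to 8 JobQueue threads
// (node_size = huge); walkTowardsKey and flushDirty run on one thread each.
constexpr int overallMax = 8 * singleThreadMax + singleThreadMax +
    singleThreadMax + 8 * singleThreadMax;

static_assert(singleThreadMax == 210);
static_assert(overallMax == 3780);
```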

@vlntb vlntb self-assigned this Nov 18, 2024

vlntb commented Nov 21, 2024

See the original PR and review comments here:
#4815

@vlntb vlntb marked this pull request as ready for review November 21, 2024 13:56
@vlntb vlntb changed the title [WIP ]Intrusive shamap inner final Intrusive shamap inner final Nov 22, 2024

@vvysokikh1 vvysokikh1 left a comment


not a complete review, mostly nits so far.

Comment on lines 23 to 25
// shared pointer class for tree pointers
// The ref counts are kept on the tree pointers themselves
// I.e. this is an intrusive pointer type.

nit: I think these comments are not in the right location

Comment on lines +278 to +279
void
adopt(T* ptr);

I'm not sure I understand the need to have adopt for a weak pointer? What would be the use case?

@vlntb vlntb Dec 16, 2024

Theoretically, we can have a case where we are transitioning from standard smart pointers to intr_ptr in stages, as we plan to continue integrating intr_ptr into other parts of rippled. During this transition, it is possible that some parts of the logic will already be using intr_ptr, while other parts will still rely on std::shared_ptr.


This makes sense for shared pointers, but not for weak pointers. I'm just struggling to understand the use case of adopt for a weak ptr (std::weak_ptr has no such functionality either).

Comment on lines 408 to 413
assert(0); // only a strong pointer should cause a
// partialDestruction
ptr_->partialDestructor();
partialDestructorFinished(&ptr_);
// ptr_ is null and may no longer be used
break;

It's asserting but then still has some logic, maybe this logic should not be here?


Looking through the rest of the codebase where UNREACHABLE is used, it appears that we prefer to future-proof the implementation. In this case, the inclusion of partialDestructor provides a safeguard should a situation arise where partial destruction becomes legitimate for a weak pointer.


I would agree with this statement most of the time, but the whole intent and idea of intrusive pointer here is that weak pointer must not call partial destruction. So even if such need suddenly arises, this whole idea and implementation will have to be revamped.


Resolved by introducing separate ReleaseStrongRefAction and ReleaseWeakRefAction.
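Splitting the release action into two enums, as the resolution above describes, moves the invariant into the type system: the weak-release path simply cannot express a partial destruction. The following is a hedged sketch of that idea; the enum names mirror the comment, but the enumerators and helper function are illustrative, not the PR's actual definitions.

```cpp
// Strong releases may trigger partial destruction (reset children) or full
// destruction; weak releases can only be a no-op or a full destruction.
enum class ReleaseStrongRefAction { noop, partialDestroy, destroy };
enum class ReleaseWeakRefAction { noop, destroy };

// Hypothetical weak-release decision: the return type makes it impossible
// to request partialDestroy from this path, so the assert(0) branches
// discussed above disappear entirely.
ReleaseWeakRefAction
releaseWeakRef(int strongCount, int weakCount)
{
    // Fully destroy only once both counts have reached zero.
    return (strongCount == 0 && weakCount == 0)
        ? ReleaseWeakRefAction::destroy
        : ReleaseWeakRefAction::noop;
}
```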

Comment on lines 384 to 385
if (!ptr_)
return;

I think this is redundant since unsafeReleaseNoStore() does the same

Comment on lines 651 to 655
// We just added a weak ref. How could we destroy?
assert(0);
delete p;
unsafeSetRawPtr(nullptr);
return true; // Should never happen

same here — what's the purpose of this assert(0) followed by more logic?


Same as above, the pattern is to future-proof the implementation, but I don't have strong feelings about this approach and happy to remove the dead code.


Resolved by introducing separate ReleaseStrongRefAction and ReleaseWeakRefAction.

{
using enum ReleaseRefAction;

static_assert(weakDelta > strongDelta);

nit: maybe it would be better to place this assert near the declaration of these variables?


I understand that Scott's idea was to perform the check where it truly matters. In the future, the nature of those constants might change, and they could become variables.

// 3) Test assignment from null union pointer
union1 = SharedWeakUnion<TIBase>();
BEAST_EXPECT(union1.get() == nullptr);
BEAST_EXPECT(TIBase::getState(id1) == TrackedState::alive);

nit: this check is redundant since union1 was assigned strong2 in test 1) Normal assignment

void
shareChild(int m, std::shared_ptr<SHAMapTreeNode> const& child);
shareChild(int m, SharedIntrusive<T> const& child);

nit: should be intr_ptr::SharedPtr here?


I agree. Well spotted!

SHAMapTreeNode,
/*IsKeyCache*/ false,
SharedWeakUnion<SHAMapTreeNode>,
SharedIntrusive<SHAMapTreeNode>>;

nit: should be intr_ptr::SharedPtr here as well?

Also I see there's no SharedWeakUnion exposed the same way. I wonder if it should be?

SHAMapInnerNode::partialDestructor()
{
intr_ptr::SharedPtr<SHAMapTreeNode>* children;
// structured bindings can't be captured in c++ 17; use tie instead

nit: I suppose we can use it now?


Updated here + 3 other places to use modern structured bindings.

operator=(SharedIntrusive&& rhs);

template <class TT>
// clang-format off

nit: I tried to remove clang-format off from here (and placed it above). It doesn't seem to change the way it's formatted. Do you know if a previous version of clang-format was messing this up? It doesn't seem like this is required now.

@vlntb vlntb Jan 2, 2025

It does one minor thing: requires stays aligned with the rest of the template definition.

This is how it looks with formatting enabled for me:

template <class T>
template <class TT>
    requires std::convertible_to<TT*, T*>
SharedIntrusive<T>&
SharedIntrusive<T>::operator=(SharedIntrusive<TT>&& rhs)
{
    static_assert(
        !std::is_same_v<T, TT>,
        "This overload should not be instantiated for T == TT");

    unsafeReleaseAndStore(rhs.unsafeExchange(nullptr));
    return *this;
}

We seem to use this approach a lot throughout the project, primarily for the same reason: to keep the beginning of the line aligned with the rest.


ok, whatever approach we take, it should be uniform.

lines 103 and 109 do not use clang format off, so please add it there or remove it here :)


I did a bit more digging into the question of leading space, since we are setting an example here for consistent style. At least two reputable C++ codebases, Boost and LLVM, use a leading space at the beginning of a new line for multiline definitions, e.g.:

template<
        format FromFormat,
        format ToFormat,
        std::input_iterator I,
        std::sentinel_for<I> S = I,
        transcoding_error_handler ErrorHandler = use_replacement_character>
        requires std::convertible_to<std::iter_value_t<I>, detail::format_to_type_t<FromFormat>>

and

template <class T>
  requires(!std::convertible_to<T, int>)
void requires_init_is_convertible_to_decayed() {
  static_assert(!requires(std::ranges::subrange<int*, int*> r, T init) {
    std::ranges::fold_left_with_iter(r.begin(), r.end(), init, std::plus());
 });

This is what our clang-format config suggests as well. In my view, overriding this style with clang-format off/on only makes the code less readable. Removing clang format overrides ...

Comment on lines +330 to +333
static_assert(
alignof(T) >= 2,
"Bad alignment: Combo pointer requires low bit to be zero");


nit: I would suggest adding the comment from line 430 here. That could improve readability.
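The static_assert quoted in this thread exists because a "combo" (tagged) pointer borrows the low bit of the pointer for its own use, which is only safe if every valid T* has that bit clear — i.e. alignof(T) >= 2. A minimal sketch of the idea, with hypothetical names (this is not the PR's TaggedPointer implementation):

```cpp
#include <cstdint>

// Illustrative tagged pointer: steals the low bit of an aligned pointer.
// Requires alignof(T) >= 2 so the bit is guaranteed to be zero in any
// valid pointer value — exactly what the static_assert above enforces.
template <class T>
struct TaggedPtrSketch
{
    static_assert(
        alignof(T) >= 2,
        "Bad alignment: combo pointer requires the low bit to be zero");

    std::uintptr_t bits = 0;

    void set(T* p, bool tag)
    {
        bits = reinterpret_cast<std::uintptr_t>(p) | (tag ? 1u : 0u);
    }
    T* get() const
    {
        return reinterpret_cast<T*>(bits & ~std::uintptr_t{1});
    }
    bool tag() const { return bits & 1; }
};
```

A type with alignof(T) == 1 (e.g. char) would make the masked-off bit part of the real address, which is why the assertion rejects it at compile time.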
