BucketListDB in-memory Buckets #4630

SirTyson · 2025-01-28T04:38:20Z

Description

Partially resolves #3696

This change refactors the BucketIndex parts of BucketListDB to be friendlier to the Hot Archive BucketList. This includes metrics cleanups: previously, metrics were recorded only for main-thread access to the BucketList. Now, both BucketList types and background threads record metrics properly.

Additionally, this change removes the IndividualIndex and instead caches small Buckets entirely in memory, so lookups against them never read from disk. RangeIndex is largely unchanged but has been renamed to DiskIndex.

A follow-up PR will add a random eviction cache to the DiskIndex. I tried to break this up as much as possible, but it was easiest to do the refactor and the in-memory buckets at the same time so I didn't have to refactor IndividualIndex.

The BUCKETLIST_DB_INDEX_CUTOFF config setting determines the maximum size at which we keep a Bucket in memory. I've set this to 250 MB, which covers approximately the first 4-5 levels of the BucketList. This increases total memory consumption of stellar-core from 2.2 GB to 3 GB. This seems reasonable, and we could probably go even higher, but I'm holding off for now as the random eviction cache will further increase memory requirements.
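To make the cutoff behavior concrete, here is a minimal sketch of how a size cutoff could select between an in-memory Bucket and a disk index. It is not the code in this PR: `BucketIndexSketch`, `IndexKind`, `fitsInMemory`, and `buildIndexSketch` are hypothetical names, and the sketch assumes the cutoff is expressed in MB, that 1 MB is 1024 * 1024 bytes, and that a Bucket exactly at the cutoff stays in memory.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not stellar-core's classes.
enum class IndexKind { InMemory, Disk };

struct BucketIndexSketch
{
    IndexKind kind;
    // Used when kind == InMemory: full entries, so lookups never touch disk.
    std::map<std::string, std::string> inMemoryEntries;
    // Used when kind == Disk: key -> byte offset of the entry in the file.
    std::map<std::string, std::uint64_t> diskOffsets;
};

// Assumption: the cutoff is given in MB and 1 MB == 1024 * 1024 bytes.
const std::uint64_t BYTES_PER_MB = 1024ULL * 1024ULL;

// Assumption: a Bucket whose size equals the cutoff is still kept in memory.
bool
fitsInMemory(std::uint64_t bucketSizeBytes, std::uint64_t cutoffMB)
{
    return bucketSizeBytes <= cutoffMB * BYTES_PER_MB;
}

// Builds either an in-memory copy of the Bucket or an offset-only index,
// depending on the Bucket's size relative to the cutoff.
BucketIndexSketch
buildIndexSketch(
    std::vector<std::pair<std::string, std::string>> const& entries,
    std::uint64_t bucketSizeBytes, std::uint64_t cutoffMB)
{
    BucketIndexSketch index;
    if (fitsInMemory(bucketSizeBytes, cutoffMB))
    {
        index.kind = IndexKind::InMemory;
        for (auto const& e : entries)
        {
            index.inMemoryEntries.emplace(e.first, e.second);
        }
        return index;
    }

    index.kind = IndexKind::Disk;
    std::uint64_t offset = 0;
    for (auto const& e : entries)
    {
        index.diskOffsets.emplace(e.first, offset);
        // Toy on-disk layout: key bytes immediately followed by value bytes.
        offset += e.first.size() + e.second.size();
    }
    return index;
}
```

With a cutoff of 250, a 10 MB Bucket would produce an `InMemory` index in this sketch, while a 300 MB Bucket would produce a `Disk` index that stores only offsets.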

Checklist

Reviewed the contributing document
Rebased on top of master (no merge commits)
Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
Compiles
Ran all tests
If change impacts performance, include supporting evidence per the performance document

src/bucket/SearchableBucketList.cpp

src/bucket/BucketManager.cpp

src/bucket/HotArchiveBucketIndex.cpp

src/bucket/BucketIndexUtils.h

src/bucket/InMemoryIndex.h

# Description

Resolves #4633

This PR indexes bucket files during `VerifyBucketsWork`. Previously we would download Buckets, iterate all the buckets to check their hash, then iterate through all the buckets again to index them. Since startup is primarily disk bound, iterating through the entire BucketList twice is expensive. This change does the hash verification and indexing step in the same pass so we only have to read the BucketList once.

This introduces no new DOS vectors. Indexing unverified buckets could lead to an OOM based DOS attack, where a malicious History Archive provider hosts malicious buckets that are very large. However, such OOM attacks are already possible via a zip bomb, and History Archive providers are fairly trusted, so this is not a significant concern. To mitigate this I've added an INFO level log message saying what history archive a given file is being downloaded from. In the event of a DOS attack, these logs would give us enough info to quickly assign blame to the attacker and remove them from quorum sets.

On my laptop, this decreases startup time from `new-db` by about 16%.

Rebased on top of #4630.

# Checklist

- [x] Reviewed the [contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes) document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the [performance document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)
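The single-pass idea in the description above (verify the hash and build the index while reading the file once) can be sketched roughly as follows. This is an illustration under stated assumptions, not the `VerifyBucketsWork` implementation: `verifyAndIndexSketch` and `VerifyAndIndexResult` are hypothetical names, the record format is a toy `key=value` line format, and 64-bit FNV-1a stands in for the real content hash.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type for illustration only.
struct VerifyAndIndexResult
{
    std::uint64_t hash; // stand-in content hash (FNV-1a, not a secure hash)
    std::map<std::string, std::uint64_t> offsets; // key -> record offset
};

// Reads the stream once. Each record feeds the running hash and is added
// to the index in the same loop iteration, so no second pass is needed.
// Assumption: records are "key=value" lines terminated by '\n'.
VerifyAndIndexResult
verifyAndIndexSketch(std::istream& in)
{
    VerifyAndIndexResult result;
    result.hash = 14695981039346656037ULL;          // FNV-1a offset basis
    const std::uint64_t prime = 1099511628211ULL;   // FNV-1a prime

    std::string line;
    std::uint64_t offset = 0;
    while (std::getline(in, line))
    {
        // Hash step: fold in the record bytes, then the '\n' delimiter.
        for (unsigned char c : line)
        {
            result.hash ^= c;
            result.hash *= prime;
        }
        result.hash ^= static_cast<unsigned char>('\n');
        result.hash *= prime;

        // Index step: remember where this record started.
        std::string key = line.substr(0, line.find('='));
        result.offsets.emplace(key, offset);
        offset += line.size() + 1; // record bytes plus the delimiter
    }
    return result;
}

// The caller compares the computed hash with the expected one and discards
// the index if they differ.
bool
hashMatches(VerifyAndIndexResult const& r, std::uint64_t expectedHash)
{
    return r.hash == expectedHash;
}
```

The trade-off noted in the description shows up here as well: the index grows while the content is still unverified, so an oversized malicious file consumes memory before the hash check can reject it.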

SirTyson added 3 commits January 28, 2025 11:06

Templated BucketBase

4a7753c

Refactor BucketIndex

a58d9ca

BucketListDB metrics for both BucketList types

36adbfd

SirTyson force-pushed the bl-cache-2 branch from fc0b66b to 3424a6b January 28, 2025 19:36

SirTyson requested review from dmkozh and sisuresh January 28, 2025 22:11

This was referenced Jan 29, 2025

BucketListDB Random Eviction Cache #4632

Merged

Apply buckets optimization #4634

Merged

BucketList cache #4565

Closed

dmkozh reviewed Jan 30, 2025

dmkozh approved these changes Jan 31, 2025

SirTyson added this pull request to the merge queue Jan 31, 2025

SirTyson removed this pull request from the merge queue due to a manual request Jan 31, 2025

In-memory index

3fc04c1

SirTyson force-pushed the bl-cache-2 branch from 353f8da to 3fc04c1 January 31, 2025 22:16

dmkozh approved these changes Jan 31, 2025

SirTyson enabled auto-merge January 31, 2025 22:17

SirTyson added this pull request to the merge queue Jan 31, 2025

Merged via the queue into stellar:master with commit 96a822e Feb 1, 2025
13 checks passed

SirTyson deleted the bl-cache-2 branch February 1, 2025 00:58
