In memory testing #4562

Merged · 2 commits · Dec 13, 2024
docs/integration.md (4 changes: 2 additions & 2 deletions)

@@ -17,9 +17,9 @@ stellar-core generates several types of data that can be used by applications, d

## Ledger State

-Full [Ledger](ledger.md) snapshots are available in both:
+Full [Ledger](ledger.md) snapshots are available via both:
* [history archives](history.md) (checkpoints, every 64 ledgers, updated every 5 minutes)
-* in the case of captive-core (enabled via the `--in-memory` command line option) the ledger is maintained within the stellar-core process and ledger-state need to be tracked as it changes via "meta" updates.
+* a stellar-core instance, where the ledger is maintained within the stellar-core process and ledger state needs to be tracked as it changes via "meta" updates.

## Ledger State transition information (transactions, etc)

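Note on the second bullet: the "meta" updates it mentions are carried by the METADATA_OUTPUT_STREAM setting that appears in docs/stellar-core_example.cfg further down this diff. A minimal sketch of the wiring (the "fd:3" file-descriptor form is an assumption for illustration; a filesystem path should also work):

    METADATA_OUTPUT_STREAM="fd:3"

A downstream ingester holds the other end of that descriptor open and decodes ledger-close meta as stellar-core writes it.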
docs/quick-reference.md (40 changes: 0 additions & 40 deletions)

@@ -146,46 +146,6 @@ some time, as the entire sequence of ledger _headers_ in the archive (though none of the
transactions or ledger states) must be downloaded and verified sequentially. It may therefore be
worthwhile to save and reuse such a trusted reference file multiple times before regenerating it.

-##### Experimental fast "meta data generation"
-`catchup` has a command line flag `--in-memory` that when combined with the
-`METADATA_OUTPUT_STREAM` allows a stellar-core instance to stream meta data instead
-of using a database as intermediate store.
-
-This has been tested as being orders of magnitude faster for replaying large sections
-of history.
-
-If you don't specify any value for stream the command will just replay transactions
-in memory and throw away all meta. This can be useful for performance testing the transaction processing subsystem.
-
-The `--in-memory` flag is also supported by the `run` command, which can be used to
-run a lightweight, stateless validator or watcher node, and this can be combined with
-`METADATA_OUTPUT_STREAM` to stream network activity to another process.
-
-By default, such a stateless node in `run` mode will catch up to the network starting from the
-network's most recent checkpoint, but this behaviour can be further modified using two flags
-(that must be used together) called `--start-at-ledger <N>` and `--start-at-hash <HEXHASH>`. These
-cause the node to start with a fast in-memory catchup to ledger `N` with hash `HEXHASH`, and then
-replay ledgers forward to the current state of the network.
-
-A stateless and meta-streaming node can additionally be configured with
-`EXPERIMENTAL_PRECAUTION_DELAY_META=true` (if unspecified, the default is
-`false`). If `EXPERIMENTAL_PRECAUTION_DELAY_META` is `true`, then the node will
-delay emitting meta for a ledger `<N>` until the _next_ ledger, `<N+1>`, closes.
-The idea is that if a node suffers local corruption in a ledger because of a
-software bug or hardware fault, it will be unable to close the _next_ ledger
-because it won't be able to reach consensus with other nodes on the input state
-of the next ledger. Therefore, the meta for the corrupted ledger will never be
-emitted. With `EXPERIMENTAL_PRECAUTION_DELAY_META` set to `false`, a local
-corruption bug could cause a node to emit meta that is inconsistent with that of
-other nodes on the network. Setting `EXPERIMENTAL_PRECAUTION_DELAY_META` to
-`true` does have a cost, though: clients waiting for the meta to determine the
-result of a transaction will have to wait for an extra ledger close duration.
-
-During catchup from history archives, a stateless node will emit meta for any
-historical ledger without delay, even if `EXPERIMENTAL_PRECAUTION_DELAY_META` is
-`true`, because the ledger's results are already part of the validated consensus
-history.
-
#### Publish backlog
There is a command `publish` that allows to flush the publish backlog without starting
core. This can be useful to run to guarantee that certain tasks are done before moving
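The EXPERIMENTAL_PRECAUTION_DELAY_META behaviour documented in the section removed above boils down to a one-ledger buffer. A minimal C++ sketch of that idea (all types and the emit callback are hypothetical illustrations, not stellar-core APIs):

    #include <cstdint>
    #include <optional>
    #include <string>
    #include <utility>

    // Buffer meta for ledger N until ledger N+1 closes. A node whose copy of
    // ledger N is corrupt cannot reach consensus on N+1, never closes it, and
    // therefore never emits the corrupted meta.
    struct DelayedMetaEmitter
    {
        struct LedgerMeta
        {
            uint32_t ledgerSeq{0};
            std::string xdrBytes; // serialized ledger-close meta, schematically
        };

        std::optional<LedgerMeta> mPending;

        template <typename EmitFn>
        void
        onLedgerClosed(LedgerMeta meta, EmitFn&& emit)
        {
            if (mPending)
            {
                // The next ledger has closed, so the buffered meta is safe.
                emit(*mPending);
            }
            mPending = std::move(meta);
        }
    };

The cost mentioned in the removed text falls out directly: a client watching the stream sees meta for ledger N one ledger-close duration later than it otherwise would.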
docs/software/commands.md (6 changes: 0 additions & 6 deletions)

@@ -160,12 +160,6 @@ apply.
* **run**: Runs stellar-core service.<br>
Option **--wait-for-consensus** lets validators wait to hear from the network
before participating in consensus.<br>
-(deprecated) Option **--in-memory** stores the current ledger in memory rather than a
-database.<br>
-(deprecated) Option **--start-at-ledger <N>** starts **--in-memory** mode with a catchup to
-ledger **N** then replays to the current state of the network.<br>
-(deprecated) Option **--start-at-hash <HASH>** provides a (mandatory) hash for the ledger
-**N** specified by the **--start-at-ledger** option.
* **sec-to-pub**: Reads a secret key on standard input and outputs the
corresponding public key. Both keys are in Stellar's standard
base-32 ASCII format.
docs/stellar-core_example.cfg (13 changes: 0 additions & 13 deletions)

@@ -601,19 +601,6 @@ MAX_SLOTS_TO_REMEMBER=12
# only a passive "watcher" node.
METADATA_OUTPUT_STREAM=""

-# Setting EXPERIMENTAL_PRECAUTION_DELAY_META to true causes a stateless node
-# which is streaming meta to delay streaming the meta for a given ledger until
-# it closes the next ledger. This ensures that if a local bug had corrupted the
-# given ledger, then the meta for the corrupted ledger will never be emitted, as
-# the node will not be able to reach consensus with the network on the next
-# ledger.
-#
-# Setting EXPERIMENTAL_PRECAUTION_DELAY_META to true in combination with a
-# non-empty METADATA_OUTPUT_STREAM (which can be configured on the command line
-# as well as in the config file) requires an in-memory database (specified by
-# using --in-memory on the command line).
-EXPERIMENTAL_PRECAUTION_DELAY_META=false
-
# Number of ledgers worth of transaction metadata to preserve on disk for
# debugging purposes. These records are automatically maintained and rotated
# during processing, and are helpful for recovery in case of a serious error;
src/bucket/BucketApplicator.cpp (2 changes: 2 additions & 0 deletions)

@@ -110,11 +110,13 @@ BucketApplicator::advance(BucketApplicator::Counters& counters)
// directly instead of creating a temporary inner LedgerTxn
// as "advance" commits changes during each step this does not introduce any
// new failure mode
+#ifdef BUILD_TESTS
if (mApp.getConfig().MODE_USES_IN_MEMORY_LEDGER)
{
ltx = static_cast<AbstractLedgerTxn*>(&root);
}
else
+#endif
{
innerLtx = std::make_unique<LedgerTxn>(root, false);
ltx = innerLtx.get();
src/bucket/BucketManager.cpp (15 changes: 2 additions & 13 deletions)

@@ -164,10 +164,6 @@ BucketManager::BucketManager(Application& app)
app.getMetrics().NewCounter({"bucketlist-archive", "size", "bytes"}))
, mBucketListEvictionCounters(app)
, mEvictionStatistics(std::make_shared<EvictionStatistics>())
-// Minimal DB is stored in the buckets dir, so delete it only when
-// mode does not use minimal DB
-, mDeleteEntireBucketDirInDtor(
-app.getConfig().isInMemoryModeWithoutMinimalDB())
, mConfig(app.getConfig())
{
for (uint32_t t =
@@ -259,15 +255,8 @@

BucketManager::~BucketManager()
{
-ZoneScoped;
-if (mDeleteEntireBucketDirInDtor)
-{
-deleteEntireBucketDir();
-}
-else
-{
-deleteTmpDirAndUnlockBucketDir();
-}
+
+deleteTmpDirAndUnlockBucketDir();
}

void
src/bucket/BucketManager.h (1 change: 0 additions & 1 deletion)

@@ -106,7 +106,6 @@ class BucketManager : NonMovableOrCopyable

std::future<EvictionResult> mEvictionFuture{};

-bool const mDeleteEntireBucketDirInDtor;
// Copy app's config for thread-safe access
Config const mConfig;

src/bucket/test/BucketListTests.cpp (2 changes: 1 addition & 1 deletion)

@@ -861,7 +861,7 @@ TEST_CASE("BucketList check bucket sizes", "[bucket][bucketlist][count]")
TEST_CASE_VERSIONS("network config snapshots BucketList size", "[bucketlist]")
{
VirtualClock clock;
-Config cfg(getTestConfig(0, Config::TESTDB_IN_MEMORY_NO_OFFERS));
+Config cfg(getTestConfig(0, Config::TESTDB_IN_MEMORY));
cfg.USE_CONFIG_FOR_GENESIS = true;

auto app = createTestApplication<BucketTestApplication>(clock, cfg);
src/bucket/test/BucketManagerTests.cpp (2 changes: 1 addition & 1 deletion)

@@ -501,7 +501,7 @@ TEST_CASE("bucketmanager do not leak empty-merge futures",
// are thereby not leaking. Disable BucketListDB so that snapshots do not
// hold persist buckets, complicating bucket counting.
VirtualClock clock;
-Config cfg(getTestConfig(0, Config::TESTDB_IN_MEMORY_NO_OFFERS));
+Config cfg(getTestConfig(0, Config::TESTDB_IN_MEMORY));
cfg.ARTIFICIALLY_PESSIMIZE_MERGES_FOR_TESTING = true;
cfg.TESTING_UPGRADE_LEDGER_PROTOCOL_VERSION =
static_cast<uint32_t>(
src/bucket/test/BucketTests.cpp (76 changes: 0 additions & 76 deletions)

@@ -1011,79 +1011,3 @@ TEST_CASE_VERSIONS("merging bucket entries with initentry with shadows",
}
});
}
-
-TEST_CASE_VERSIONS("legacy bucket apply", "[bucket]")
-{
-VirtualClock clock;
-Config cfg(getTestConfig(0, Config::TESTDB_IN_MEMORY_OFFERS));
-for_versions_with_differing_bucket_logic(cfg, [&](Config const& cfg) {
-Application::pointer app = createTestApplication(clock, cfg);
-
-std::vector<LedgerEntry> live(10), noLive;
-std::vector<LedgerKey> dead, noDead;
-
-for (auto& e : live)
-{
-e.data.type(ACCOUNT);
-auto& a = e.data.account();
-a = LedgerTestUtils::generateValidAccountEntry(5);
-a.balance = 1000000000;
-dead.emplace_back(LedgerEntryKey(e));
-}
-
-std::shared_ptr<LiveBucket> birth = LiveBucket::fresh(
-app->getBucketManager(), getAppLedgerVersion(app), {}, live, noDead,
-/*countMergeEvents=*/true, clock.getIOContext(),
-/*doFsync=*/true);
-
-std::shared_ptr<LiveBucket> death = LiveBucket::fresh(
-app->getBucketManager(), getAppLedgerVersion(app), {}, noLive, dead,
-/*countMergeEvents=*/true, clock.getIOContext(),
-/*doFsync=*/true);
-
-CLOG_INFO(Bucket, "Applying bucket with {} live entries", live.size());
-birth->apply(*app);
-{
-auto count = app->getLedgerTxnRoot().countObjects(ACCOUNT);
-REQUIRE(count == live.size() + 1 /* root account */);
-}
-
-CLOG_INFO(Bucket, "Applying bucket with {} dead entries", dead.size());
-death->apply(*app);
-{
-auto count = app->getLedgerTxnRoot().countObjects(ACCOUNT);
-REQUIRE(count == 1 /* root account */);
-}
-});
-}
-
-TEST_CASE("bucket apply bench", "[bucketbench][!hide]")
-{
-auto runtest = [](Config::TestDbMode mode) {
-VirtualClock clock;
-Config cfg(getTestConfig(0, mode));
-Application::pointer app = createTestApplication(clock, cfg);
-
-std::vector<LedgerEntry> live(100000);
-std::vector<LedgerKey> noDead;
-
-for (auto& l : live)
-{
-l.data.type(ACCOUNT);
-auto& a = l.data.account();
-a = LedgerTestUtils::generateValidAccountEntry(5);
-}
-
-std::shared_ptr<LiveBucket> birth = LiveBucket::fresh(
-app->getBucketManager(), getAppLedgerVersion(app), {}, live, noDead,
-/*countMergeEvents=*/true, clock.getIOContext(),
-/*doFsync=*/true);
-
-CLOG_INFO(Bucket, "Applying bucket with {} live entries", live.size());
-// note: we do not wrap the `apply` call inside a transaction
-// as bucket applicator commits to the database incrementally
-birth->apply(*app);
-};
-
-runtest(Config::TESTDB_BUCKET_DB_PERSISTENT);
-}
src/catchup/ApplyBucketsWork.cpp (14 changes: 13 additions & 1 deletion)

@@ -72,6 +72,8 @@ ApplyBucketsWork::ApplyBucketsWork(
, mLevel(startingLevel())
, mMaxProtocolVersion(maxProtocolVersion)
, mCounters(app.getClock().now())
+, mIsApplyInvariantEnabled(
+app.getInvariantManager().isBucketApplyInvariantEnabled())
{
}

@@ -111,6 +113,7 @@
mLastPos = 0;
mBucketToApplyIndex = 0;
mMinProtocolVersionSeen = UINT32_MAX;
+mSeenKeysBeforeApply.clear();
mSeenKeys.clear();
mBucketsToApply.clear();
mBucketApplicator.reset();
@@ -201,6 +204,14 @@
auto bucket = mBucketsToApply.at(mBucketToApplyIndex);
mMinProtocolVersionSeen =
std::min(mMinProtocolVersionSeen, bucket->getBucketVersion());
+
+// Take a snapshot of seen keys before applying the bucket, only if
+// invariants are enabled since this is expensive.
+if (mIsApplyInvariantEnabled)
+{
+mSeenKeysBeforeApply = mSeenKeys;
+}
+
// Create a new applicator for the bucket.
mBucketApplicator = std::make_unique<BucketApplicator>(
mApp, mMaxProtocolVersion, mMinProtocolVersionSeen, mLevel, bucket,
@@ -297,7 +308,8 @@
// bucket.
mApp.getInvariantManager().checkOnBucketApply(
mBucketsToApply.at(mBucketToApplyIndex),
-mApplyState.currentLedger, mLevel, isCurr, mEntryTypeFilter);
+mApplyState.currentLedger, mLevel, isCurr,
+mSeenKeysBeforeApply);
prepareForNextBucket();
}
if (!appliedAllBuckets())
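The mSeenKeysBeforeApply changes above follow a simple pattern: take the O(n) copy of the seen-key set only when the invariant that consumes it is enabled. A self-contained sketch of that pattern (simplified key type, not the actual stellar-core classes):

    #include <string>
    #include <unordered_set>

    // Snapshot-before-apply: the bucket-apply invariant compares the keys seen
    // before a bucket against the keys seen after it. The copy is linear in
    // the size of the set, so it is skipped when no invariant will read it.
    struct ApplyTracker
    {
        bool mIsApplyInvariantEnabled{false};
        std::unordered_set<std::string> mSeenKeys;
        std::unordered_set<std::string> mSeenKeysBeforeApply;

        void
        startBucket()
        {
            if (mIsApplyInvariantEnabled)
            {
                mSeenKeysBeforeApply = mSeenKeys; // O(n) copy, gated
            }
        }

        void
        onEntryApplied(std::string const& key)
        {
            mSeenKeys.insert(key);
        }
    };

This mirrors the new checkOnBucketApply argument in doWork above, which receives the pre-apply snapshot instead of the old mEntryTypeFilter.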
src/catchup/ApplyBucketsWork.h (2 changes: 2 additions & 0 deletions)

@@ -38,12 +38,14 @@ class ApplyBucketsWork : public Work
uint32_t mLevel{0};
uint32_t mMaxProtocolVersion{0};
uint32_t mMinProtocolVersionSeen{UINT32_MAX};
+std::unordered_set<LedgerKey> mSeenKeysBeforeApply;
std::unordered_set<LedgerKey> mSeenKeys;
std::vector<std::shared_ptr<LiveBucket>> mBucketsToApply;
std::unique_ptr<BucketApplicator> mBucketApplicator;
bool mDelayChecked{false};

BucketApplicator::Counters mCounters;
+bool const mIsApplyInvariantEnabled;

void advance(std::string const& name, BucketApplicator& applicator);
std::shared_ptr<LiveBucket> getBucket(std::string const& bucketHash);
src/database/test/DatabaseTests.cpp (8 changes: 4 additions & 4 deletions)

@@ -72,7 +72,7 @@ transactionTest(Application::pointer app)

TEST_CASE("database smoketest", "[db]")
{
-Config const& cfg = getTestConfig(0, Config::TESTDB_IN_MEMORY_OFFERS);
+Config const& cfg = getTestConfig(0, Config::TESTDB_IN_MEMORY);

VirtualClock clock;
Application::pointer app = createTestApplication(clock, cfg, true, false);
@@ -81,7 +81,7 @@ TEST_CASE("database smoketest", "[db]")

TEST_CASE("database on-disk smoketest", "[db]")
{
-Config const& cfg = getTestConfig(0, Config::TESTDB_ON_DISK_SQLITE);
+Config const& cfg = getTestConfig(0, Config::TESTDB_BUCKET_DB_PERSISTENT);

VirtualClock clock;
Application::pointer app = createTestApplication(clock, cfg, true, false);
@@ -201,7 +201,7 @@ checkMVCCIsolation(Application::pointer app)

TEST_CASE("sqlite MVCC test", "[db]")
{
-Config const& cfg = getTestConfig(0, Config::TESTDB_ON_DISK_SQLITE);
+Config const& cfg = getTestConfig(0, Config::TESTDB_BUCKET_DB_PERSISTENT);
VirtualClock clock;
Application::pointer app = createTestApplication(clock, cfg, true, false);
checkMVCCIsolation(app);
@@ -349,7 +349,7 @@ TEST_CASE("postgres performance", "[db][pgperf][!hide]")

TEST_CASE("schema test", "[db]")
{
-Config const& cfg = getTestConfig(0, Config::TESTDB_IN_MEMORY_OFFERS);
+Config const& cfg = getTestConfig(0, Config::TESTDB_IN_MEMORY);

VirtualClock clock;
Application::pointer app = createTestApplication(clock, cfg);