Captive core fast startup #2994
1300480 to af73b1b
a few suggestions to make the code flow easier to understand
src/main/ApplicationUtils.cpp
Outdated
@@ -131,6 +155,63 @@ runWithConfig(Config cfg, optional<CatchupConfiguration> cc)
    return 0;
}

int
rebuildInMemoryLedger(Application& app)
this looks more or less like rebuildLedgerFromBuckets, should be refactored
(NB: there are bugs in the other one)
src/main/ApplicationUtils.cpp
Outdated
auto checkBucket = [&](std::string bucketStr) {
    auto bucketHash = hexToBin256(bucketStr);
    auto bucket = app.getBucketManager().getBucketByHash(bucketHash);
    if (!bucket || (isZero(bucket->getHash()) && !isZero(bucketHash)))
why || (isZero(bucket->getHash()) && !isZero(bucketHash)))?
that doesn't seem to be related to on disk/not on disk (covered by !bucket)
Did you look at the implementation of getBucketByHash? It creates an empty bucket pointer if it can't find the bucket, so checking !bucket is not enough.
That being said, looking at the last known ledger loading code, we already check for missing buckets there, so I think I don't even need this check.
> Did you look at the implementation of getBucketByHash? It creates an empty bucket pointer if it can't find the bucket, so checking !bucket is not enough.
yes I did, when it doesn't find the file, it does return std::shared_ptr<Bucket>(); which is nullptr, not an empty bucket.
other places in the code are only checking for nullptr
But yeah maybe we don't care anyways
my bad, for some reason I was convinced that function uses std::make_shared. Sorry about that. But yeah, I removed this code now.
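For anyone following the nullptr question in this thread, here is a small standalone sketch of the standard-library behaviour being discussed. It does not use stellar-core's Bucket class: FakeBucket, findMissing, findMissingAsEmpty and nullCheckDetectsMissing are hypothetical names made up for illustration. A default-constructed std::shared_ptr is null, so a !ptr check catches it, whereas std::make_shared always yields a non-null pointer and would need a second check.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a bucket type; not stellar-core's Bucket.
struct FakeBucket
{
    std::string hash; // an empty string plays the role of a "zero" hash
};

// Lookup that reports "not found" with a default-constructed shared_ptr.
// Such a pointer compares equal to nullptr.
std::shared_ptr<FakeBucket>
findMissing()
{
    return std::shared_ptr<FakeBucket>();
}

// Lookup that reports "not found" with an empty object instead.
// std::make_shared never returns a null pointer.
std::shared_ptr<FakeBucket>
findMissingAsEmpty()
{
    return std::make_shared<FakeBucket>();
}

// True when a plain null check is enough to detect the missing bucket.
bool
nullCheckDetectsMissing(std::shared_ptr<FakeBucket> const& b)
{
    return !b;
}
```

With the first convention a single !bucket test suffices; only the second would call for an extra "is the hash zero" test like the one removed here.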
src/main/CommandLine.cpp
Outdated
cc->toLedger() - lcl > RESTORE_STATE_LEDGER_WINDOW)
{
    LOG_INFO(DEFAULT_LOG, "Cannot restore the in-memory state, "
                          "rebuilding the state from scratch");
I think it would be better to not pass an extra flag to runWithConfig and just reset the app to genesis right here
src/main/CommandLine.cpp
Outdated
"In-memory database not found, creating one...");
app = Application::create(clock, cfg, /* newDB */ true);
}
app->start();
this code should be in the try block: if we don't find the mini database we should proceed like normal as we're starting up at genesis
src/main/CommandLine.cpp
Outdated
optional<CatchupConfiguration>
maybeEnableInMemoryLedgerMode(Config& config, bool inMemory,
uint32_t startAtLedger,
std::string const& startAtHash)
std::string const& startAtHash,
bool persistPartially = false)
do not use default value here
src/main/CommandLine.cpp
Outdated
    return 0;
});
auto resetParser = [](bool& resetInMemoryState) {
    return clara::Opt{resetInMemoryState}["--reset-for-in-memory-mode"](
should use inMemoryParser: it was added so that we use the same option across subcommands
The call new-db --in-memory is confusing and contradicts your other comment about calling the database "an in-memory database". The flag is called "in-memory", however the command sets a persistent state, that the next commands also depend on. I'd like to be as clear to the users as possible.
I suggest reusing the term you're using above: minimal. I.e. stellar-core new-db --minimal-for-in-memory-mode
(I was especially concerned by the word "reset" here because this suggests a non-minimal DB can be changed / reset into a minimal one, and I don't think that's possible?)
src/main/CommandLine.cpp
Outdated
// aggressively so that we only store a few ledgers worth of data
config.AUTOMATIC_MAINTENANCE_PERIOD = std::chrono::seconds(30);
config.AUTOMATIC_MAINTENANCE_COUNT = 100;
config.DISABLE_XDR_FSYNC = false;
you should not touch DISABLE_XDR_FSYNC here
Picked up the latest master, which already addresses this problem by not forcing DISABLE_XDR_FSYNC, so I believe this is resolved now.
@graydon may have better insight than me
af73b1b to 6446e50
src/main/CommandLine.cpp
Outdated
auto resetParser = [](bool& resetInMemoryState) {
    return clara::Opt{resetInMemoryState}["--reset-for-in-memory-mode"](
        "Reset the special database used only for in-memory mode (see "
        "--replay-in-memory flag");
the --replay-in-memory option is itself marked as deprecated; should reference --in-memory here.
src/main/ApplicationUtils.cpp
Outdated
}

void
setupMinimalDB(Config const& cfg, uint32_t startAtLedger)
I like the term "minimal DB" you're using here -- I would repeat / reuse / surface this term to the user. It is clear and can be extended into the phrase "minimal DB for in-memory mode".
(Maybe also extend the name of this function so any reader immediately knows it's about in-memory mode: setupMinimalDBForInMemoryMode)
src/main/CommandLine.cpp
Outdated
uint32_t startAtLedger,
std::string const& startAtHash)
std::string
partialDBForInMemoryMode(Config const& cfg)
rename "partial" here to "minimal" for consistency.
src/main/CommandLine.cpp
Outdated
std::string
partialDBForInMemoryMode(Config const& cfg)
{
    return fmt::format("sqlite3://{}/partial.db", cfg.BUCKET_DIR_PATH);
and partial.db here to minimal.db
src/main/CommandLine.cpp
Outdated
@@ -957,11 +970,26 @@ int
runNewDB(CommandLineArgs const& args)
{
    CommandLine::ConfigOption configOption;
    bool resetInMemoryState = false;
rename variable: minimalForInMemoryMode
src/main/ApplicationUtils.cpp
Outdated
@@ -130,6 +182,35 @@ runWithConfig(Config cfg, std::optional<CatchupConfiguration> cc)
    return 0;
}

bool
rebuildLatestState(Application& app)
rename function for specificity: applyBucketsOfLastClosedLedger
auto has = app.getLedgerManager().getLastClosedLedgerHAS();
auto lclHash =
    app.getPersistentState().getState(PersistentState::kLastClosedLedger);
Add a releaseAssertOrThrow here that the ledger tables are all empty (using LedgerTxn::countObjects). We never want this code to run on a nonempty ledger.
this sounds like a useful check, but the problem is that calling countObjects is not allowed on non-root LedgerTxn, which is what we use in in-memory mode (that is, the never committing LedgerTxn). There doesn't seem to be any other machinery to verify LedgerTxn's state (we could add some, but I don't think we should at this time). I will add a comment about this though.
I agree with Marta, I think it's fine as is for now
@@ -392,6 +392,24 @@ ApplicationImpl::~ApplicationImpl()
    LOG_INFO(DEFAULT_LOG, "Application destroyed");
}

void
ApplicationImpl::resetDBForInMemoryMode()
This worries me / reminds me a bit of a "schema migration", or just .. an alternative path for creating a DB (albeit a minimal one) that does not necessarily arrive at the same place we'd arrive if we reinitialized. I gather you're trying to preserve the overlay state here (but nothing else); would it be possible to do that explicitly? That is:
- load all the overlay state you want to preserve into a temporary data structure in memory.
- reinitialize the database using the normal code path.
- reinsert just the overlay state you wanted to preserve (data which we're a bit lax about anyways)
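The three steps above can be sketched with a toy model. Everything here is hypothetical and for illustration only: ToyDB, initializeFresh and reinitializePreserving are invented names, and a map of string rows stands in for the real database and its tables. The point is only that the reset goes through the same initialization path as a brand-new database, so the two cannot drift apart.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy database: table name -> rows. Not stellar-core's Database.
using Row = std::string;
using ToyDB = std::map<std::string, std::vector<Row>>;

// Stand-in for the normal initialization path: every table starts empty.
ToyDB
initializeFresh()
{
    return ToyDB{{"peers", {}}, {"ledgerheaders", {}}, {"storestate", {}}};
}

// Reset the database while keeping only the rows of one table.
ToyDB
reinitializePreserving(ToyDB const& old, std::string const& table)
{
    // 1. Load the state to preserve into a temporary in-memory structure.
    std::vector<Row> saved;
    auto it = old.find(table);
    if (it != old.end())
    {
        saved = it->second;
    }
    // 2. Reinitialize through the same code path a brand-new database uses.
    ToyDB fresh = initializeFresh();
    // 3. Reinsert just the preserved rows.
    fresh[table] = saved;
    return fresh;
}
```

Calling reinitializePreserving(db, "peers") on a populated toy database returns one whose "peers" rows are intact and whose other tables are empty.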
src/main/ApplicationUtils.cpp
Outdated
int
runWithConfig(Config cfg, std::optional<CatchupConfiguration> cc)
bool
canRebuildFromBuckets(uint32_t startAtLedger, uint32_t lcl)
rename for specificity: canRestartInMemoryLedgerFromOldBuckets
void
setupMinimalDB(Config const& cfg, uint32_t startAtLedger)
{
    VirtualClock clock;
Add a releaseAssert that we're in in-memory mode.
src/main/ApplicationUtils.cpp
Outdated
setupApp(Config& cfg, VirtualClock& clock, bool inMemory,
         uint32_t startAtLedger, std::string const& startAtHash)
{
    if (inMemory)
separate flag here seems redundant -- isn't this just cfg.isInMemoryMode()?
auto work = app.getWorkScheduler().scheduleWork<ApplyBucketsWork>(
    buckets, has, maxProtocolVersion);

while (app.getClock().crank(true) && !work->isDone())
Aside (not critical): we do this pattern in quite a few places, I wonder if it'd be worth adding a VirtualClock::crankUntilWorkDone(BasicWork&) or something.
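A rough sketch of what such a helper might look like, using toy types rather than the real classes. ToyClock and ToyWork are hypothetical; the actual VirtualClock and BasicWork have far larger interfaces, and crankUntilWorkDone does not exist in the code under review. This version checks completion before cranking, which differs slightly from the loop in the hunk above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal work item: done once all steps have run.
struct ToyWork
{
    int stepsLeft;

    bool isDone() const { return stepsLeft == 0; }

    void step()
    {
        if (stepsLeft > 0)
        {
            --stepsLeft;
        }
    }
};

// Hypothetical minimal clock that drives a single scheduled work item.
struct ToyClock
{
    ToyWork* scheduled = nullptr;
    bool stopped = false;

    // Runs one unit of pending work and returns the number of events
    // processed (0 once the clock is stopped or nothing is pending).
    std::size_t crank(bool /* block */)
    {
        if (stopped || !scheduled || scheduled->isDone())
        {
            return 0;
        }
        scheduled->step();
        return 1;
    }

    // The helper under discussion: wraps the repeated crank loop.
    // Returns true if the work finished, false if the clock ran dry first.
    bool crankUntilWorkDone(ToyWork const& work)
    {
        while (!work.isDone() && crank(true) > 0)
        {
        }
        return work.isDone();
    }
};
```

Returning a bool lets call sites distinguish "work finished" from "clock stopped early", which the bare while loop leaves to a separate check afterwards.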
src/bucket/BucketManagerImpl.cpp
Outdated
, mDeleteEntireBucketDirInDtor(app.getConfig().isInMemoryMode())
, mDeleteEntireBucketDirInDtor(
      app.getConfig().isInMemoryMode() &&
      !app.getConfig().MODE_STORES_HISTORY_LEDGERHEADERS)
A comment would be nice here explaining why !MODE_STORES_HISTORY_LEDGERHEADERS relates to the deletion of entries in the bucketlist; or perhaps a helper method like bool Config::isInMemoryModeWithoutMinimalDB() const { return isInMemoryMode() && !MODE_STORES_HISTORY_LEDGERHEADERS; } or something.
+1 on adding a helper function for clarity
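To make the suggested helper concrete, a minimal sketch on a cut-down config type. ToyConfig and shouldDeleteEntireBucketDirInDtor are hypothetical names; the real Config has many more fields, and its isInMemoryMode() may be defined differently. The explanatory comment reflects one reading of this PR, not a confirmed rationale.

```cpp
#include <cassert>

// Hypothetical cut-down Config, for illustration only.
struct ToyConfig
{
    bool MODE_USES_IN_MEMORY_LEDGER = false;
    bool MODE_STORES_HISTORY_LEDGERHEADERS = true;

    // Assumption: in-memory mode is driven by a single flag here.
    bool isInMemoryMode() const { return MODE_USES_IN_MEMORY_LEDGER; }

    // In-memory mode with no minimal DB kept on disk. Assumed reading:
    // when ledger headers are stored, a later restart may want to reuse
    // the buckets, so only this sub-mode can safely discard them.
    bool isInMemoryModeWithoutMinimalDB() const
    {
        return isInMemoryMode() && !MODE_STORES_HISTORY_LEDGERHEADERS;
    }
};

// The call site then reads as a single named condition.
bool
shouldDeleteEntireBucketDirInDtor(ToyConfig const& cfg)
{
    return cfg.isInMemoryModeWithoutMinimalDB();
}
```

Naming the condition keeps the relationship between the two flags in one place instead of repeating the && expression at each call site.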
@@ -212,6 +212,9 @@ class Config : public std::enable_shared_from_this<Config>
    // fees, and scp history in the database
    bool MODE_STORES_HISTORY;

    // A config parameter that stores ledger headers in the database
Extend this comment to include the fact that this exists in order to support a sub-mode of in-memory mode, where MODE_STORES_HISTORY is false but MODE_STORES_HISTORY_LEDGERHEADERS remains true. That is: that there is no legal configuration where MODE_STORES_HISTORY_LEDGERHEADERS is false but MODE_STORES_HISTORY is true. Or in yet other words, that only 3 of the 4 states of these 2 flags are legal.
(Also option to discuss here: I know we've gone back and forth on orthogonal flags vs. enumerated modes, but maybe this is as good a time as any to ask whether to reconsider merging MODE_USES_IN_MEMORY_LEDGER, MODE_STORES_HISTORY and MODE_STORES_HISTORY_LEDGERHEADERS back into a single enum. I'm not sure which of the 8 possible combinations is legal off the top of my head, which is not a great sign.)
mmm, I don't know @graydon: we used to have those "uber flags" used in the code base and it was actually very hard to understand which parts of the code had to change when touching things.
What we may want to do in this commit is probably change MODE_STORES_HISTORY to something that doesn't overlap with MODE_STORES_HISTORY_LEDGERHEADERS (right now it's hard to tell how they relate to each other: are they mutually exclusive?!).
So if we rename MODE_STORES_HISTORY to something like MODE_STORES_HISTORY_MISC (and introduce MODE_STORES_HISTORY_LEDGERHEADERS before) we can make it clear that they complement each other.
We can add helper functions to help check if any historical data is enabled (that seems to be the common case), or maybe to enable/disable "persist all historical data" (to flip both at the same time).
I like the idea to rename MODE_STORES_HISTORY. I ended up implementing modeStoresAllHistory and modeStoresAnyHistory, and that seems to improve clarity quite a bit.
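For readers without the diff at hand, a minimal sketch of how two such helpers could be written. ToyHistoryConfig and isLegal are hypothetical, and the helper bodies are an assumption based on their names and on this thread, not a copy of the merged code. The legality rule encodes the "only 3 of the 4 states" point raised earlier in the thread.

```cpp
#include <cassert>

// Hypothetical cut-down config holding just the two history flags.
struct ToyHistoryConfig
{
    bool MODE_STORES_HISTORY_MISC = true;
    bool MODE_STORES_HISTORY_LEDGERHEADERS = true;

    // Assumed meaning: both kinds of historical data are persisted.
    bool modeStoresAllHistory() const
    {
        return MODE_STORES_HISTORY_MISC && MODE_STORES_HISTORY_LEDGERHEADERS;
    }

    // Assumed meaning: at least one kind of historical data is persisted.
    bool modeStoresAnyHistory() const
    {
        return MODE_STORES_HISTORY_MISC || MODE_STORES_HISTORY_LEDGERHEADERS;
    }

    // Per the review discussion, storing the "misc" history without
    // ledger headers is the single combination that is not legal.
    bool isLegal() const
    {
        return !(MODE_STORES_HISTORY_MISC &&
                 !MODE_STORES_HISTORY_LEDGERHEADERS);
    }
};
```

Other comments in this review map onto these helpers: a check that publishing requires complete history would use modeStoresAllHistory(), while maintenance endpoints that should work with any stored data would use modeStoresAnyHistory().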
looking good!
Overall looks promising! Fairly subtle changes though -- I've marked places where I think things could maybe be made a little more obvious for our future selves when we revisit these treacherous paths (as we surely will). Mostly just cosmetic though, I think there's only one structural / logic change around the question of how to most-safely "reset" vs. "reinitialize + reinsert" a minimal DB that can't be reused.
src/main/ApplicationImpl.cpp
Outdated
@@ -439,10 +439,13 @@ ApplicationImpl::validateAndLogConfig()

if (getHistoryArchiveManager().hasAnyWritableHistoryArchive())
{
    if (!mConfig.MODE_STORES_HISTORY)
    if (!mConfig.MODE_STORES_HISTORY ||
see my other comment: what we're looking for here is to check that if publish is enabled we better store all historical data. So a helper function here would help
src/main/CommandHandler.cpp
Outdated
@@ -75,7 +75,8 @@ CommandHandler::CommandHandler(Application& app) : mApp(app)

mServer->add404(std::bind(&CommandHandler::fileNotFound, this, _1, _2));

if (mApp.getConfig().MODE_STORES_HISTORY)
if (mApp.getConfig().MODE_STORES_HISTORY &&
    mApp.getConfig().MODE_STORES_HISTORY_LEDGERHEADERS)
this should be an || as maintenance related endpoints should work regardless of which historical data is in there
6446e50 to dc88ce5
@graydon @MonsieurNicolas thanks for the feedback! I believe I addressed your comments in several new commits (I haven't squashed the changes yet, so that it's easy to review).
Looks very close to something we can merge. Good job. I added a couple suggestions/questions related to potentially bad things that we're carrying over from master
src/main/ApplicationImpl.cpp
Outdated
throw std::invalid_argument(
    "Core is not configured to store history, but "
    "some history archives are writable (see "
    "MODE_STORES_HISTORY_MISC "
Actually this error message is weird (and got carried over from before): end users do not have control over this, so the error should hint more about something they can fix.
I think the error message should just be something like "Core is not configured to store history, but some history archives are writable"
// As the local HAS might have merges in progress, let
// `prepareForPublish` convert it into a "valid historical HAS".
has.prepareForPublish(app);
Another "carry over".
Is this has.prepareForPublish(app); really needed?
The current ledger state should never depend on merges in progress (when we close a ledger we have already computed the current ledger) - I think the only thing that this call is going to do is potentially cause a stall when restarting because it will perform merges that it doesn't need to do.
This is currently needed since ApplyBucketsWork is designed to apply buckets from an untrusted source, so it verifies validity of the HAS. That being said, I think the check could be moved out of ApplyBucketsWork into catchup. I opened #3044 to track this.
src/main/CommandLine.cpp
Outdated
{
    LOG_WARNING(DEFAULT_LOG,
                "Using MANUAL_CLOSE and RUN_STANDALONE "
                "together "
this string got chopped-formatted into too many bits. Maybe you need to merge it back into a single line and re-run clang-format?
Ah yes, you're right, looks much better now.
src/overlay/PeerManager.cpp
Outdated
{
    ZoneScoped;
    std::vector<std::pair<PeerBareAddress, PeerRecord>> result;
    std::string sql = "SELECT * FROM peers";
avoid using SELECT * as it's a footgun, just list the column names that you expect
good call, updated.
dc88ce5 to 7cde809
I believe this is ready now @MonsieurNicolas, I've squashed the commits as well.
r+ 7cde809
@latobarita: retry
@latobarita: retry
Implementation based on the discussion in #2960
Resolves #2960
Resolves #2993