
Ensure captive core LCL is not ahead of horizon last ingested ledger during restarts #5123

Open · tamirms opened this issue Nov 21, 2023 · 15 comments
Labels: cdp-horizon-scrum, horizon, ingest, performance

Comments

tamirms (Contributor) commented Nov 21, 2023

If Horizon is running captive core with BucketsDB enabled, Horizon should be able to restart quickly as long as we run captive core so that it resumes from the LCL (last closed ledger) recorded in the core DB. We have code in Horizon's ingestion library to detect this case:

https://github.com/stellar/go/blob/horizon-v2.27.0-rc2/ingest/ledgerbackend/stellar_core_runner.go#L376-L379
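For reference, here is a minimal sketch of the invariant that check is protecting, using hypothetical names rather than the actual stellar_core_runner.go code:

```go
// lclCreatesGap reports whether resuming captive core from its own LCL would
// leave a hole in Horizon's ingestion. A resumed core only emits ledgers
// strictly after its LCL, while Horizon needs every ledger starting at
// horizonLastIngested+1.
func lclCreatesGap(coreLCL, horizonLastIngested uint32) bool {
	return coreLCL > horizonLastIngested
}
```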

But, there is a scenario where the LCL can be ahead of the last ingested ledger in Horizon's DB:

When captive core writes a ledger to the file pipe, the captive core process (which is single-threaded) blocks until the ledger is read out of the pipe. Horizon mitigates the possibility of core blocking on these writes by continuously reading ledgers out of the file pipe until an in-memory buffer fills up:

https://github.com/stellar/go/blob/horizon-v2.27.0-rc2/ingest/ledgerbackend/buffered_meta_pipe_reader.go

If Horizon ingests faster than the rate at which captive core emits ledgers, the in-memory ledger buffer will not have a chance to accrue many ledgers. But there is always the possibility that, when Horizon receives the shutdown signal, the buffer will have accumulated some ledgers. In that scenario, the ledgers stored in the buffer are discarded, which means the LCL in the captive core DB ends up ahead of Horizon's last ingested ledger.
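A stripped-down sketch of that read-ahead pattern and of where the ledgers get dropped (hypothetical types and framing; the real reader in buffered_meta_pipe_reader.go decodes XDR LedgerCloseMeta frames):

```go
package main

import (
	"encoding/binary"
	"io"
)

// readFrame is a stand-in for the real XDR decoding: here a frame is just a
// uint32 length prefix followed by that many bytes.
func readFrame(r io.Reader) ([]byte, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	frame := make([]byte, n)
	if _, err := io.ReadFull(r, frame); err != nil {
		return nil, err
	}
	return frame, nil
}

// readAhead drains the meta pipe into a bounded channel so that captive core
// is rarely blocked on its pipe write.
func readAhead(pipe io.Reader, buf chan<- []byte, done <-chan struct{}) {
	defer close(buf)
	for {
		frame, err := readFrame(pipe)
		if err != nil {
			return
		}
		select {
		case buf <- frame:
			// Core may already consider this ledger closed (it is reflected
			// in core's LCL), but Horizon has not ingested it yet.
		case <-done:
			// Shutdown: this frame and everything still queued in buf is
			// dropped, leaving core's LCL ahead of Horizon's last ingested
			// ledger.
			return
		}
	}
}
```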

We should mitigate this scenario where the LCL is ahead of Horizon's DB so that Horizon can be restarted quickly.

tamirms added the horizon, ingest, and performance labels Nov 21, 2023
tamirms (Contributor, Author) commented Nov 21, 2023

We could modify Horizon ingestion's termination logic so that we ingest whatever ledgers are left in the in-memory buffer before shutting down. But I think that will be tricky to implement because of concurrency issues and the coupling between the Horizon ingestion state machine and the captive core process manager.

Another approach, which I think could be promising, is to save the ledgers from the in-memory buffer into a file before shutting down. Then, when Horizon is restarted, it can read the ledgers from the file and insert them into the in-memory buffer before starting up captive core.
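A rough sketch of the spill-to-file half of that idea, with made-up helpers and framing (the real buffer holds XDR LedgerCloseMeta frames):

```go
package main

import (
	"encoding/binary"
	"os"
)

// spillBuffer writes every frame still queued in the (already closed)
// in-memory buffer to path as length-prefixed records, so a restarting
// Horizon could replay them before launching captive core again.
func spillBuffer(buf <-chan []byte, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	for frame := range buf {
		if err := binary.Write(f, binary.BigEndian, uint32(len(frame))); err != nil {
			return err
		}
		if _, err := f.Write(frame); err != nil {
			return err
		}
	}
	// The spill must hit disk before we report a clean shutdown.
	return f.Sync()
}
```

On startup, Horizon would read the same length-prefixed records back into the buffer before starting the captive core process.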

mollykarcher (Contributor) commented Nov 21, 2023

Horizon mitigates the possibility that core is blocked on writing ledgers over the pipe by continuously reading ledgers out of the file pipe until an in-memory buffer is filled up

I wonder if we capture metrics on how often this in-memory buffer actually gets used. Presumably/theoretically, if Horizon is ingesting faster than captive core is generating ledgers, it shouldn't happen often (or at all?), but obviously we added this for a reason.
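Something along these lines would answer that question (a sketch only; the metric name is made up and this is not an existing Horizon metric):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical gauge tracking how many ledgers sit in the read-ahead buffer.
var bufferedLedgers = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "horizon",
	Subsystem: "ingest",
	Name:      "captive_core_buffered_ledgers",
	Help:      "Ledgers read off the meta pipe but not yet ingested.",
})

func init() { prometheus.MustRegister(bufferedLedgers) }

// recordBufferDepth samples the read-ahead channel's occupancy; len() on a
// buffered channel is a cheap snapshot of how many frames are queued.
func recordBufferDepth(buf chan []byte) {
	bufferedLedgers.Set(float64(len(buf)))
}
```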

I ask because maybe we should explore whether it has any utility to us at all and, if not, whether we should even have it. There was a recent Slack conversation where @graydon pointed out that core has/had attempted to design captive core so as to remove the possibility of this happening (in the case of core itself crashing), and it seems like we re-introduced that possibility by adding this in-memory queue. Context:

...architecturally we also really want to have a moment where we're sure we've handed off a given txmeta (and flushed the stream, and know the reader got it) before we issue a durable commit to disk. because if we don't, and we commit first and then crash while flushing to horizon, if we restart we'll be a ledger ahead and horizon won't be able to recover that ledger from us.

graydon (Contributor) commented Nov 21, 2023

Yeah the metadata pipe was made synchronous and contains a synchronous flush embedded in core's commit path specifically to allow horizon to make a durability guarantee inside the captive's commit cycle, such that this never happens. Horizon "mitigating" this feature by reading ahead into a non-durable (in-memory) buffer is actually defeating that design.

(Thinking about this a bit more, I can see that we might even need a subsequent durability-ack signal from horizon in the form of, say, an additional synchronous dummy frame we write-and-flush and horizon reads in order to ack the previously-flushed non-dummy txmeta; but the idea is absolutely to do-what-is-necessary to prevent this scenario!)
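A very rough sketch of how the Horizon side of that ack idea could look, assuming no read-ahead buffer and entirely hypothetical helpers (no such protocol exists today): Horizon would commit each txmeta frame to its own database before consuming the sentinel frame, so the sentinel read doubles as the durability ack that unblocks core's own commit.

```go
// ingestWithAck consumes one txmeta frame at a time via next(), durably
// commits it on Horizon's side, and only then reads the sentinel frame;
// because the pipe writes are synchronous, consuming the sentinel is what
// tells core it may proceed with its durable commit.
func ingestWithAck(next func() ([]byte, error), commit func([]byte) error) error {
	for {
		meta, err := next() // real txmeta frame
		if err != nil {
			return err
		}
		if err := commit(meta); err != nil { // Horizon's DB commit happens first
			return err
		}
		if _, err := next(); err != nil { // sentinel frame: reading it is the ack
			return err
		}
	}
}
```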

graydon (Contributor) commented Nov 21, 2023

(If it's a problem to have a captive block while Horizon commits, that's a different kettle of fish. We assumed this is not a problem: since captives are guaranteed not to be validators, it's ok for them to block a bit / fall behind the network a bit; nobody's waiting for their vote.)

mollykarcher (Contributor) commented:

Point posed by @tamirms at grooming just now: it seems like core commits as soon as Horizon reads from the pipe; however, at that point, Horizon hasn't committed to its own database. So there's still a possibility that Horizon could crash or be shut down while processing that ledger, before it commits to its own database.

The durability-ack idea you propose could potentially solve this. Of course, Horizon would need to remove the in-memory queue on its side for that to be workable. Either way, it would definitely be valuable for us to understand the history of why that queue was introduced.

tamirms (Contributor, Author) commented Nov 21, 2023

I found the issue which motivated adding the queue: #3132

Another way to solve this issue is if core stored the n most recently applied ledgers in its db and allowed catching up from any ledger in the range of [LCL - n, LCL]
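If core offered that, Horizon's restart check could loosen into something like this hypothetical sketch:

```go
// canFastRestart would hold if core kept the last n applied ledgers and could
// resume from any of them: Horizon's next needed ledger just has to fall
// inside core's replayable window [LCL-n, LCL], or be the next ledger core
// will close anyway.
func canFastRestart(coreLCL, horizonLastIngested, n uint32) bool {
	next := horizonLastIngested + 1
	low := uint32(1)
	if coreLCL > n {
		low = coreLCL - n
	}
	return next >= low && next <= coreLCL+1
}
```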

sreuland (Contributor) commented:

If we're considering an extended custom serial transport protocol over core's meta o/s pipe with ack messaging, it is worth looking at OSS network libraries that run a framed transport underneath to accomplish the same thing, like a 0mq REQ/REP logical socket pair applied in a reliable messaging pattern. In the context of captive core running on the same host, the REQ/REP pair could be bound to ipc or socket transports (the former translates to o/s pipes, the latter to tcp sockets on the host machine). Horizon would be the REQ and the captive core process would be the REP, both using the 0mq sdk to do send/recvs in a non-blocking way.
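A minimal sketch of what the Horizon (REQ) side could look like, assuming the github.com/pebbe/zmq4 binding and a made-up ipc endpoint and message format (nothing like this exists in the codebase today):

```go
package main

import (
	"log"

	zmq "github.com/pebbe/zmq4"
)

func main() {
	// Horizon is the REQ socket; the captive core process would be the REP side.
	req, err := zmq.NewSocket(zmq.REQ)
	if err != nil {
		log.Fatal(err)
	}
	defer req.Close()

	// ipc:// keeps the transport on the same host, matching how captive core
	// is run today; the endpoint path here is made up.
	if err := req.Connect("ipc:///tmp/captive-core-meta.ipc"); err != nil {
		log.Fatal(err)
	}

	for {
		// Ask for one ledger; issuing the next request implicitly tells core
		// that the previous reply was received and processed.
		if _, err := req.Send("NEXT", 0); err != nil {
			log.Fatal(err)
		}
		meta, err := req.RecvBytes(0) // e.g. XDR-encoded LedgerCloseMeta
		if err != nil {
			log.Fatal(err)
		}
		ingest(meta)
	}
}

// ingest stands in for committing the ledger to Horizon's database.
func ingest(meta []byte) { _ = meta }
```

A REQ socket cannot send a second request until it has received a reply, so each new request acts as an acknowledgement that the previous ledger was delivered.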

leighmcculloch (Member) commented:

Another way to solve this issue is if core stored the n most recently applied ledgers in its db

Core has the MAX_SLOTS_TO_REMEMBER config option, which defaults to 12 and indicates the number of recent ledgers that core remembers.
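For reference, it is set directly in the stellar-core configuration file (illustrative excerpt; 12 is the default noted above):

```toml
# stellar-core.cfg
MAX_SLOTS_TO_REMEMBER=12
```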

mollykarcher moved this from Backlog to Next Sprint Proposal in Platform Scrum Dec 5, 2023
tamirms (Contributor, Author) commented Feb 13, 2024

We discussed this issue during the onsite last week with the stellar core team. Core cannot restart from a ledger earlier than the LCL because we cannot rewind the state of the bucket DB to a previous ledger. So it looks like we need to implement a solution on the Horizon side (perhaps one of the two options mentioned in #5123 (comment)).

mollykarcher moved this from Next Sprint Proposal to Backlog in Platform Scrum Feb 13, 2024
2opremio (Contributor) commented:

Note that, even if we remove the explicit queue, we are still using a pipe to read the data (with an associated buffer).

2opremio (Contributor) commented:

As @sreuland says, replacing the pipe with a two-way communication protocol would solve the problem but that's probably a lot of work to implement and propagate downstream.

jacekn (Contributor) commented May 10, 2024

As @sreuland says, replacing the pipe with a two-way communication protocol would solve the problem but that's probably a lot of work to implement and propagate downstream.

A two-way protocol could possibly allow us to decouple core and Horizon or soroban-rpc in the future. This would be a big win, as currently we have to bundle core in Docker images and there is no way to manage versions independently.

2opremio (Contributor) commented:

Uhm, you still need to manage the captive core process (restart it on error etc ...)

jacekn (Contributor) commented May 10, 2024

Uhm, you still need to manage the captive core process (restart it on error etc ...)

Yes, absolutely - this would have to be folded into the protocol. Core might also need to grow some additional functionality to allow that. I appreciate that this is hard, of course, but IMO this is the best long-term strategy.
Right now, because of this tight coupling, we have to produce large-ish Docker images that go against the best practice of one PID per container. It also complicates our release process because core has to be baked into multiple images. We may be OK leaving it as is, but I'm worried that one day we'll have to address this problem at an inconvenient time. Doing it proactively would be nice.
