
Ensure captive core LCL is not ahead of horizon last ingested ledger during restarts #5123

Open · tamirms opened this issue Nov 21, 2023 · 15 comments
Labels: cdp-horizon-scrum, horizon, ingest, performance

Comments

tamirms (Contributor) commented Nov 21, 2023

If Horizon is running captive core with BucketsDB enabled, Horizon should be able to restart quickly as long as we run captive core so that it resumes from the LCL (last closed ledger) recorded in the core DB. We have code in Horizon's ingestion library to detect this case:

https://github.com/stellar/go/blob/horizon-v2.27.0-rc2/ingest/ledgerbackend/stellar_core_runner.go#L376-L379
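For reference, here is a minimal sketch of the invariant that check is protecting, using hypothetical names rather than the actual stellar_core_runner.go code:

```go
// lclCreatesGap reports whether resuming captive core from its own LCL would
// leave a hole in Horizon's ingestion. A resumed core only emits ledgers
// strictly after its LCL, while Horizon needs every ledger starting at
// horizonLastIngested+1.
func lclCreatesGap(coreLCL, horizonLastIngested uint32) bool {
	return coreLCL > horizonLastIngested
}
```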

But, there is a scenario where the LCL can be ahead of the last ingested ledger in Horizon's DB:

When captive core writes a ledger to the file pipe, the captive core process (which is single-threaded) blocks until the ledger is read out of the pipe. Horizon mitigates the possibility of core blocking on these writes by continuously reading ledgers out of the file pipe until an in-memory buffer fills up:

https://github.com/stellar/go/blob/horizon-v2.27.0-rc2/ingest/ledgerbackend/buffered_meta_pipe_reader.go

If Horizon ingests faster than the rate at which captive core emits ledgers, the in-memory ledger buffer will not have a chance to accrue many ledgers. But there is always the possibility that, when Horizon receives the shutdown signal, the buffer will have accumulated some ledgers. In that scenario, the ledgers stored in the buffer are discarded, which means the LCL in the captive core DB ends up ahead of Horizon's last ingested ledger.
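A stripped-down sketch of that read-ahead pattern and of where the ledgers get dropped (hypothetical types and framing; the real reader in buffered_meta_pipe_reader.go decodes XDR LedgerCloseMeta frames):

```go
package main

import (
	"encoding/binary"
	"io"
)

// readFrame is a stand-in for the real XDR decoding: here a frame is just a
// uint32 length prefix followed by that many bytes.
func readFrame(r io.Reader) ([]byte, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	frame := make([]byte, n)
	if _, err := io.ReadFull(r, frame); err != nil {
		return nil, err
	}
	return frame, nil
}

// readAhead drains the meta pipe into a bounded channel so that captive core
// is rarely blocked on its pipe write.
func readAhead(pipe io.Reader, buf chan<- []byte, done <-chan struct{}) {
	defer close(buf)
	for {
		frame, err := readFrame(pipe)
		if err != nil {
			return
		}
		select {
		case buf <- frame:
			// Core may already consider this ledger closed (it is reflected
			// in core's LCL), but Horizon has not ingested it yet.
		case <-done:
			// Shutdown: this frame and everything still queued in buf is
			// dropped, leaving core's LCL ahead of Horizon's last ingested
			// ledger.
			return
		}
	}
}
```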

We should mitigate this scenario where the LCL is ahead of Horizon's DB so that Horizon can be restarted quickly.

tamirms added the horizon, ingest, and performance labels Nov 21, 2023
tamirms (Contributor, Author) commented Nov 21, 2023

We could modify Horizon ingestion's termination logic so that we ingest whatever ledgers are left in the in-memory buffer before shutting down. But I think that will be tricky to implement because of concurrency issues and the coupling between the Horizon ingestion state machine and the captive core process manager.

Another approach, which I think could be promising, is to save the ledgers from the in-memory buffer into a file before shutting down. Then, when Horizon is restarted, it can read the ledgers from the file and insert them into the in-memory buffer before starting up captive core.
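A rough sketch of the spill-to-file half of that idea, with made-up helpers and framing (the real buffer holds XDR LedgerCloseMeta frames):

```go
package main

import (
	"encoding/binary"
	"os"
)

// spillBuffer writes every frame still queued in the (already closed)
// in-memory buffer to path as length-prefixed records, so a restarting
// Horizon could replay them before launching captive core again.
func spillBuffer(buf <-chan []byte, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	for frame := range buf {
		if err := binary.Write(f, binary.BigEndian, uint32(len(frame))); err != nil {
			return err
		}
		if _, err := f.Write(frame); err != nil {
			return err
		}
	}
	// The spill must hit disk before we report a clean shutdown.
	return f.Sync()
}
```

On startup, Horizon would read the same length-prefixed records back into the buffer before starting the captive core process.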

mollykarcher (Contributor) commented Nov 21, 2023

Horizon mitigates the possibility that core is blocked on writing ledgers over the pipe by continuously reading ledgers out of the file pipe until an in-memory buffer is filled up

I wonder if we capture metrics on how often this in-memory buffer actually gets used. Presumably/theoretically, if Horizon is ingesting faster than captive core is generating ledgers, it shouldn't happen often (or at all?), but obviously we added this for a reason.
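Something along these lines would answer that question (a sketch only; the metric name is made up and this is not an existing Horizon metric):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical gauge tracking how many ledgers sit in the read-ahead buffer.
var bufferedLedgers = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "horizon",
	Subsystem: "ingest",
	Name:      "captive_core_buffered_ledgers",
	Help:      "Ledgers read off the meta pipe but not yet ingested.",
})

func init() { prometheus.MustRegister(bufferedLedgers) }

// recordBufferDepth samples the read-ahead channel's occupancy; len() on a
// buffered channel is a cheap snapshot of how many frames are queued.
func recordBufferDepth(buf chan []byte) {
	bufferedLedgers.Set(float64(len(buf)))
}
```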

I ask because maybe we should explore whether it has any utility to us at all and, if not, whether we should even have it. There was a recent Slack conversation where @graydon pointed out that core has/had attempted to design captive core so as to remove the possibility of this happening (in the case of core itself crashing), and it seems like we re-introduced that possibility by adding this in-memory queue. Context:

...architecturally we also really want to have a moment where we're sure we've handed off a given txmeta (and flushed the stream, and know the reader got it) before we issue a durable commit to disk. because if we don't, and we commit first and then crash while flushing to horizon, if we restart we'll be a ledger ahead and horizon won't be able to recover that ledger from us.

graydon (Contributor) commented Nov 21, 2023

Yeah the metadata pipe was made synchronous and contains a synchronous flush embedded in core's commit path specifically to allow horizon to make a durability guarantee inside the captive's commit cycle, such that this never happens. Horizon "mitigating" this feature by reading ahead into a non-durable (in-memory) buffer is actually defeating that design.

(Thinking about this a bit more, I can see that we might even need a subsequent durability-ack signal from horizon in the form of, say, an additional synchronous dummy frame we write-and-flush and horizon reads in order to ack the previously-flushed non-dummy txmeta; but the idea is absolutely to do-what-is-necessary to prevent this scenario!)
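A very rough sketch of how the Horizon side of that ack idea could look, assuming no read-ahead buffer and entirely hypothetical helpers (no such protocol exists today): Horizon would commit each txmeta frame to its own database before consuming the sentinel frame, so the sentinel read doubles as the durability ack that unblocks core's own commit.

```go
// ingestWithAck consumes one txmeta frame at a time via next(), durably
// commits it on Horizon's side, and only then reads the sentinel frame;
// because the pipe writes are synchronous, consuming the sentinel is what
// tells core it may proceed with its durable commit.
func ingestWithAck(next func() ([]byte, error), commit func([]byte) error) error {
	for {
		meta, err := next() // real txmeta frame
		if err != nil {
			return err
		}
		if err := commit(meta); err != nil { // Horizon's DB commit happens first
			return err
		}
		if _, err := next(); err != nil { // sentinel frame: reading it is the ack
			return err
		}
	}
}
```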

graydon (Contributor) commented Nov 21, 2023

(If it's a problem to have a captive block while Horizon commits, that's a different kettle of fish. We assumed this is not a problem: since captives are guaranteed not to be validators, it's ok for them to block a bit / fall behind the network a bit; nobody's waiting for their vote.)

mollykarcher (Contributor) commented:

Point posed by @tamirms at grooming just now: it seems like core commits as soon as Horizon reads from the pipe; however, at that point, Horizon hasn't committed to its own database. So there's still a possibility that Horizon could crash or be shut down while processing that ledger, before it commits to its own database.

The durability-ack idea you propose could potentially solve this. Of course, Horizon would need to remove the in-memory queue on its side for that to be workable. Either way, it would definitely be valuable for us to understand the history of why that queue was introduced.

tamirms (Contributor, Author) commented Nov 21, 2023

I found the issue which motivated adding the queue: #3132

Another way to solve this issue is if core stored the n most recently applied ledgers in its db and allowed catching up from any ledger in the range of [LCL - n, LCL]
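If core offered that, Horizon's restart check could loosen into something like this hypothetical sketch:

```go
// canFastRestart would hold if core kept the last n applied ledgers and could
// resume from any of them: Horizon's next needed ledger just has to fall
// inside core's replayable window [LCL-n, LCL], or be the next ledger core
// will close anyway.
func canFastRestart(coreLCL, horizonLastIngested, n uint32) bool {
	next := horizonLastIngested + 1
	low := uint32(1)
	if coreLCL > n {
		low = coreLCL - n
	}
	return next >= low && next <= coreLCL+1
}
```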

sreuland (Contributor) commented:

If we're considering an extended custom serial transport protocol over core's meta o/s pipe with ack messaging, it is worth looking at OSS network libraries that run a framed transport underneath to accomplish the same thing, like a 0mq REQ/REP logical socket pair applied in a reliable messaging pattern. In the context of captive core running on the same host, the REQ/REP pair could be bound to ipc or socket transports (the former translates to o/s pipes, the latter to tcp sockets on the host machine). Horizon would be the REQ and the captive core process would be the REP, both using the 0mq sdk to do send/recvs in a non-blocking way.
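A minimal sketch of what the Horizon (REQ) side could look like, assuming the github.com/pebbe/zmq4 binding and a made-up ipc endpoint and message format (nothing like this exists in the codebase today):

```go
package main

import (
	"log"

	zmq "github.com/pebbe/zmq4"
)

func main() {
	// Horizon is the REQ socket; the captive core process would be the REP side.
	req, err := zmq.NewSocket(zmq.REQ)
	if err != nil {
		log.Fatal(err)
	}
	defer req.Close()

	// ipc:// keeps the transport on the same host, matching how captive core
	// is run today; the endpoint path here is made up.
	if err := req.Connect("ipc:///tmp/captive-core-meta.ipc"); err != nil {
		log.Fatal(err)
	}

	for {
		// Ask for one ledger; issuing the next request implicitly tells core
		// that the previous reply was received and processed.
		if _, err := req.Send("NEXT", 0); err != nil {
			log.Fatal(err)
		}
		meta, err := req.RecvBytes(0) // e.g. XDR-encoded LedgerCloseMeta
		if err != nil {
			log.Fatal(err)
		}
		ingest(meta)
	}
}

// ingest stands in for committing the ledger to Horizon's database.
func ingest(meta []byte) { _ = meta }
```

A REQ socket cannot send a second request until it has received a reply, so each new request acts as an acknowledgement that the previous ledger was delivered.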

leighmcculloch (Member) commented:

Another way to solve this issue is if core stored the n most recently applied ledgers in its db

Core has the MAX_SLOTS_TO_REMEMBER config option, which defaults to 12 and indicates the number of recent ledgers that core remembers.
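For reference, it is set directly in the stellar-core configuration file (illustrative excerpt; 12 is the default noted above):

```toml
# stellar-core.cfg
MAX_SLOTS_TO_REMEMBER=12
```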

mollykarcher moved this from Backlog to Next Sprint Proposal in Platform Scrum Dec 5, 2023
tamirms (Contributor, Author) commented Feb 13, 2024

We discussed this issue during the onsite last week with the stellar core team. Core cannot restart from a ledger earlier than the LCL because we cannot rewind the state of the bucket DB to a previous ledger. So it looks like we need to implement a solution on the Horizon side (perhaps one of the two options mentioned in #5123 (comment)).

mollykarcher moved this from Next Sprint Proposal to Backlog in Platform Scrum Feb 13, 2024
2opremio (Contributor) commented:

Note that, even if we remove the explicit queue, we are still using a pipe to read the data (with an associated buffer).

2opremio (Contributor) commented:

As @sreuland says, replacing the pipe with a two-way communication protocol would solve the problem but that's probably a lot of work to implement and propagate downstream.

jacekn (Contributor) commented May 10, 2024

As @sreuland says, replacing the pipe with a two-way communication protocol would solve the problem but that's probably a lot of work to implement and propagate downstream.

A two-way protocol could possibly allow us to decouple core and Horizon or soroban-rpc in the future. This would be a big win, as currently we have to bundle core in Docker images and there is no way to manage versions independently.

2opremio (Contributor) commented:

Uhm, you still need to manage the captive core process (restart it on error etc ...)

jacekn (Contributor) commented May 10, 2024

Uhm, you still need to manage the captive core process (restart it on error etc ...)

Yes, absolutely - this would have to be folded into the protocol. Core might also need to grow some additional functionality to allow that. I appreciate that this is hard, of course, but IMO this is the best long-term strategy.
Right now, because of this tight coupling, we have to produce large-ish Docker images that go against the best practice of one PID per container. It also complicates our release process because core has to be baked into multiple images. We may be OK leaving it as is, but I'm worried that one day we'll have to address this problem at an inconvenient time. Doing it proactively would be nice.
