services/horizon/internal/ingest: reap lookup tables without blocking ingestion #5405
Conversation
One edge case I wanted to check on: if a user reingests an older range which goes further back than the retention period cutoff, and reaping for data and lookup tables has already completed for that retention period, will the next iteration of the lookup reaper notice those and delete the qualified (orphaned) lookup ids in that case? I ask because of the offsets for reapers that are stored in key-value; it seems like once those advance, the reaper won't inspect that older id range anymore?
No, in that scenario those rows will not be deleted in the next iteration. However, eventually the reaper will traverse all rows in the history lookup tables. Once it does, the reaper will start from 0 again. So, eventually the reaper will wrap around and pick up those orphaned rows (though it might take a long time to do so for very large tables like history_claimable_balances).
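To illustrate the wrap-around behavior described above, here is a minimal sketch of one reap iteration; the interface, method names, and the key-value offset scheme are assumptions for the sketch, not the actual Horizon implementation:

```go
package reaper

import "context"

// lookupReaper is an illustrative stand-in for the reaper's dependencies.
type lookupReaper interface {
	// loadOffset / saveOffset persist the per-table cursor in the key-value store.
	loadOffset(ctx context.Context, table string) (int64, error)
	saveOffset(ctx context.Context, table string, offset int64) error
	// deleteOrphanedRows deletes lookup rows in [from, to) that are no longer
	// referenced by any history table and reports the table's current max id.
	deleteOrphanedRows(ctx context.Context, table string, from, to int64) (maxID int64, err error)
}

// reapOnce processes one window of ids and advances the stored offset,
// wrapping back to 0 once the window has passed the highest id. Orphaned rows
// left behind by reingesting an older range are therefore revisited on a
// later pass rather than in the very next iteration.
func reapOnce(ctx context.Context, r lookupReaper, table string, batchSize int64) error {
	offset, err := r.loadOffset(ctx, table)
	if err != nil {
		return err
	}

	maxID, err := r.deleteOrphanedRows(ctx, table, offset, offset+batchSize)
	if err != nil {
		return err
	}

	next := offset + batchSize
	if next > maxID {
		// Wrapped around: restart from the beginning on the next iteration.
		next = 0
	}
	return r.saveOffset(ctx, table, next)
}
```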
@sreuland I believe I have addressed your feedback. PTAL, thanks!
looks great, nice work!
PR Checklist
PR Structure
- This PR has reasonably narrow scope (if not, break it down into smaller PRs).
- This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
- This PR's title starts with the name of the package that is most changed in the PR, e.g. services/friendbot, or all or doc if the changes are broad or impact many packages.
Thoroughness
- This PR adds tests for the most critical parts of the new functionality or fixes.
- I've updated any docs (.md files, etc.) affected by this change. Take a look in the docs folder for a given service, like this one.
Release planning
- I've updated the relevant CHANGELOG if needed with deprecations, added features, breaking changes, and DB schema changes.
- I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.
What
Close #4870
This PR improves reaping of history lookup tables (e.g. history_accounts, history_claimable_balances) so that it can run safely in parallel with ingestion. Currently, reaping of history lookup tables is a blocking operation for ingestion, so if the queries to reap history lookup tables take too long, that can result in ingestion lag. With this PR, reaping of history lookup tables runs concurrently with ingestion with minimal contention. It is also important to note that this PR does not introduce any performance degradation for either reingestion or live ingestion.
When reviewing this PR it would be helpful to read this design doc:
https://docs.google.com/document/d/1CGfBCS99MTEZDP4mMhV1o6Z5NE_Tlg7ENCcWTwzhlio/edit
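As a rough illustration of the concurrency model (the function name and ticker interval are assumptions for this sketch, not the actual implementation; the design doc above describes the real batching and locking strategy), the reaper can run in its own goroutine so that a slow reap query only delays the next reap pass instead of ledger ingestion:

```go
package reaper

import (
	"context"
	"time"
)

// runLookupReaper periodically invokes reap in its own loop, independent of
// the ingestion state machine. A slow reap pass only delays subsequent reap
// passes; it never blocks ledger ingestion.
func runLookupReaper(ctx context.Context, reap func(context.Context) error, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := reap(ctx); err != nil {
				// Reaping is best effort: log (omitted here) and retry on the
				// next tick rather than failing ingestion.
				continue
			}
		}
	}
}
```

A call site would have the shape `go runLookupReaper(ctx, reapLookupTables, interval)`, with both the reap function and the interval being placeholders here.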
Known limitations
After running a full vacuum on history_accounts, the reaping query sped up dramatically. Previously, the duration of reaping the history_accounts table peaked at ~1.9 seconds:
https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1722295775773&to=1722400061302&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531
After the vacuum, the average duration for reaping history_accounts is ~20 ms and the peak duration was ~400 ms:
https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1724782666959&to=1724869066959&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531
This means the risk of reaping history lookup tables taking so long that it introduces ingestion lag is much less of a concern.
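For reference, a minimal sketch of issuing the full vacuum mentioned above via database/sql (the DSN and driver are placeholders; note that VACUUM FULL takes an exclusive lock on the table, so it is normally run during a maintenance window):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver, assumed for this sketch
)

func main() {
	// Placeholder DSN; point this at the Horizon database.
	db, err := sql.Open("postgres", "postgres://localhost/horizon?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// VACUUM FULL rewrites the table and reclaims dead space, which is what
	// made the history_accounts reaping query so much faster. It holds an
	// ACCESS EXCLUSIVE lock on the table for the duration.
	if _, err := db.Exec("VACUUM FULL history_accounts"); err != nil {
		log.Fatal(err)
	}
}
```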
Update:
After running reaping of history lookup tables on staging for 24 hours, I observed that the peak duration actually reaches ~600 ms:
https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1724866821793&to=1724953221793&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531