
Improve Startup Indexing Time #4633

Closed
SirTyson opened this issue Jan 29, 2025 · 2 comments · Fixed by #4634

@SirTyson
Contributor

When starting fresh after `new-db`, core first downloads Buckets, reads them once to verify their hashes, then reads them all again to construct the BucketIndex. Since startup is mostly disk bound, we should combine the verify and index steps. This imposes no additional DOS risk: if a History Archive provider were malicious, it could already zip bomb us as an OOM attack vector.
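
A minimal sketch of the single-pass idea, not stellar-core's actual implementation: the raw bytes of a downloaded bucket file feed the hash while the same read records each entry's file offset for an in-memory index. The length-prefixed record framing, the `VerifiedIndex` type, and the `verifyAndIndex` helper are all simplifications invented for illustration; real buckets are XDR streams indexed by `BucketIndex`.

```cpp
// Sketch: verify a bucket file's SHA-256 and build a simple offset index
// in a single pass over the file, so the file is only read from disk once.
#include <openssl/sha.h> // SHA256_* (link with -lcrypto)

#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct VerifiedIndex
{
    std::vector<unsigned char> hash;          // SHA-256 of the whole file
    std::vector<std::streamoff> entryOffsets; // offset of each record
};

VerifiedIndex
verifyAndIndex(std::string const& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        throw std::runtime_error("cannot open " + path);
    }

    SHA256_CTX ctx;
    SHA256_Init(&ctx);

    VerifiedIndex result;
    std::streamoff offset = 0;

    for (;;)
    {
        // Assumed record framing: 4-byte big-endian length prefix followed
        // by the payload (a simplification of the real XDR framing).
        unsigned char lenBuf[4];
        if (!in.read(reinterpret_cast<char*>(lenBuf), sizeof(lenBuf)))
        {
            break; // end of file
        }
        uint32_t len = (uint32_t(lenBuf[0]) << 24) | (uint32_t(lenBuf[1]) << 16) |
                       (uint32_t(lenBuf[2]) << 8) | uint32_t(lenBuf[3]);

        std::vector<char> payload(len);
        if (len > 0 && !in.read(payload.data(), len))
        {
            throw std::runtime_error("truncated record in " + path);
        }

        // Feed the same bytes to the hash that we just parsed for the
        // index, instead of re-reading the file later to index it.
        SHA256_Update(&ctx, lenBuf, sizeof(lenBuf));
        SHA256_Update(&ctx, payload.data(), payload.size());

        result.entryOffsets.push_back(offset);
        offset += std::streamoff(sizeof(lenBuf)) + len;
    }

    result.hash.resize(SHA256_DIGEST_LENGTH);
    SHA256_Final(result.hash.data(), &ctx);
    return result;
}
```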

@SirTyson SirTyson self-assigned this Jan 29, 2025
@SirTyson SirTyson changed the title Improve Startup Time Improve Startup Indexing Time Jan 29, 2025
@MonsieurNicolas
Contributor

A couple of comments:

  • A zip bomb today does not cause an OOM, but a temporary (potentially full) disk space issue that gets resolved on retry.
  • On some systems, a process eating up all RAM may take the whole system down; the OOM killer may not be fast enough to catch the issue, and other processes will fail in not-so-deterministic ways (because virtual memory is at capacity). I think I've seen systems lock up entirely (and need to be rebooted via the AWS console).

The net is: a proper analysis is probably needed, plus some sort of enforced upper bound "just in case".

@SirTyson
Contributor Author

I'm leaving the full analysis out here because it's a bit of a DOS angle, but I ran the numbers and the worst-case index attack is as follows:

100 GB worst case bucket = 2.04 GB index
150 GB worst case bucket = 4.6 GB index
200 GB worst case bucket = 8.18 GB index

I think it might be reasonable to put a 100 GB hard limit on unzipped buckets. If an unzipped Bucket is over the limit, we throw as invalid before we start the hashing or indexing process.
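
A minimal sketch of what such a guard might look like; the function name and the `MAX_UNZIPPED_BUCKET_BYTES` constant are hypothetical, with only the 100 GB figure taken from the analysis above.

```cpp
// Sketch: reject an unzipped bucket that exceeds a hard size cap before
// any hashing or indexing work begins.
#include <cstdint>
#include <filesystem>
#include <stdexcept>
#include <string>

constexpr std::uintmax_t MAX_UNZIPPED_BUCKET_BYTES =
    100ull * 1024 * 1024 * 1024; // 100 GiB hard limit (hypothetical constant)

void
checkBucketSizeOrThrow(std::filesystem::path const& bucketFile)
{
    auto size = std::filesystem::file_size(bucketFile);
    if (size > MAX_UNZIPPED_BUCKET_BYTES)
    {
        throw std::runtime_error(
            "bucket " + bucketFile.string() + " is " + std::to_string(size) +
            " bytes, exceeding the hard limit; treating it as invalid");
    }
}
```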

github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2025
# Description

Resolves #4633

This PR indexes bucket files during `VerifyBucketsWork`. Previously we
would download Buckets, iterate through all of them to check their
hashes, then iterate through all of them again to index them. Since
startup is primarily disk bound, iterating through the entire
BucketList twice is expensive. This change performs hash verification
and indexing in the same pass, so we only have to read the BucketList
once.

This introduces no new DOS vectors. Indexing unverified buckets could
lead to an OOM-based DOS attack, where a malicious History Archive
provider hosts very large buckets. However, such OOM attacks are
already possible via a zip bomb, and History Archive providers are
fairly trusted, so this is not a significant concern. To mitigate it,
I've added an INFO-level log message stating which history archive a
given file is being downloaded from. In the event of a DOS attack,
these logs would give us enough information to quickly assign blame to
the attacker and remove them from quorum sets.
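
For illustration only, a hedged sketch of the kind of attribution log described above; it uses plain `std::cout` rather than stellar-core's own logging macros, and the function name is made up.

```cpp
// Sketch: record which history archive served each bucket, so an
// oversized or malicious file can be traced back to its source archive.
#include <iostream>
#include <string>

void
logBucketSource(std::string const& bucketHash, std::string const& archiveName)
{
    std::cout << "INFO: downloading bucket " << bucketHash
              << " from history archive " << archiveName << std::endl;
}
```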

On my laptop, this decreases startup time from `new-db` by about 16%.

Rebased on top of #4630.

# Checklist
- [x] Reviewed the
[contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes)
document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio
extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the
[performance
document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)