
Improve Startup Indexing Time #4633

Closed
SirTyson opened this issue Jan 29, 2025 · 2 comments · Fixed by #4634

@SirTyson
Contributor

When starting fresh after `new-db`, core first downloads Buckets, reads them once to verify their hashes, then reads them all again to construct the BucketIndex. Since startup is mostly disk bound, we should combine the verify and index steps. This imposes no additional DOS risk: if a History Archive provider were malicious, it could already zip bomb us as an OOM attack vector.
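
A minimal sketch of the single-pass idea, not stellar-core's actual implementation: the raw bytes of a downloaded bucket file feed the hash while the same read records each entry's file offset for an in-memory index. The length-prefixed record framing, the `VerifiedIndex` type, and the `verifyAndIndex` helper are all simplifications invented for illustration; real buckets are XDR streams indexed by `BucketIndex`.

```cpp
// Sketch: verify a bucket file's SHA-256 and build a simple offset index
// in a single pass over the file, so the file is only read from disk once.
#include <openssl/sha.h> // SHA256_* (link with -lcrypto)

#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct VerifiedIndex
{
    std::vector<unsigned char> hash;          // SHA-256 of the whole file
    std::vector<std::streamoff> entryOffsets; // offset of each record
};

VerifiedIndex
verifyAndIndex(std::string const& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        throw std::runtime_error("cannot open " + path);
    }

    SHA256_CTX ctx;
    SHA256_Init(&ctx);

    VerifiedIndex result;
    std::streamoff offset = 0;

    for (;;)
    {
        // Assumed record framing: 4-byte big-endian length prefix followed
        // by the payload (a simplification of the real XDR framing).
        unsigned char lenBuf[4];
        if (!in.read(reinterpret_cast<char*>(lenBuf), sizeof(lenBuf)))
        {
            break; // end of file
        }
        uint32_t len = (uint32_t(lenBuf[0]) << 24) | (uint32_t(lenBuf[1]) << 16) |
                       (uint32_t(lenBuf[2]) << 8) | uint32_t(lenBuf[3]);

        std::vector<char> payload(len);
        if (len > 0 && !in.read(payload.data(), len))
        {
            throw std::runtime_error("truncated record in " + path);
        }

        // Feed the same bytes to the hash that we just parsed for the
        // index, instead of re-reading the file later to index it.
        SHA256_Update(&ctx, lenBuf, sizeof(lenBuf));
        SHA256_Update(&ctx, payload.data(), payload.size());

        result.entryOffsets.push_back(offset);
        offset += std::streamoff(sizeof(lenBuf)) + len;
    }

    result.hash.resize(SHA256_DIGEST_LENGTH);
    SHA256_Final(result.hash.data(), &ctx);
    return result;
}
```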

@SirTyson SirTyson self-assigned this Jan 29, 2025
@SirTyson SirTyson changed the title Improve Startup Time Improve Startup Indexing Time Jan 29, 2025
@MonsieurNicolas
Contributor

A couple of comments:

  • A zip bomb today does not cause an OOM, but a temporary (potentially full) disk space issue that gets resolved on retry.
  • On some systems, a process eating up all RAM may take the whole system down; the OOM killer may not be fast enough to catch the issue, and other processes will fail in not-so-deterministic ways (because virtual memory is at capacity). I think I've seen systems lock up entirely (and need to be rebooted via the AWS console).

The net is: a proper analysis is probably needed, plus some sort of enforced upper bound "just in case".

@SirTyson
Contributor Author

I'm leaving the full analysis out here because it's a bit of a DOS angle, but I ran the numbers and the worst-case index attack is as follows:

100 GB worst case bucket = 2.04 GB index
150 GB worst case bucket = 4.6 GB index
200 GB worst case bucket = 8.18 GB index

I think it might be reasonable to put a 100 GB hard limit on unzipped buckets. If an unzipped Bucket is over the limit, we throw as invalid before we start the hashing or indexing process.
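
A minimal sketch of what such a guard might look like; the function name and the `MAX_UNZIPPED_BUCKET_BYTES` constant are hypothetical, with only the 100 GB figure taken from the analysis above.

```cpp
// Sketch: reject an unzipped bucket that exceeds a hard size cap before
// any hashing or indexing work begins.
#include <cstdint>
#include <filesystem>
#include <stdexcept>
#include <string>

constexpr std::uintmax_t MAX_UNZIPPED_BUCKET_BYTES =
    100ull * 1024 * 1024 * 1024; // 100 GiB hard limit (hypothetical constant)

void
checkBucketSizeOrThrow(std::filesystem::path const& bucketFile)
{
    auto size = std::filesystem::file_size(bucketFile);
    if (size > MAX_UNZIPPED_BUCKET_BYTES)
    {
        throw std::runtime_error(
            "bucket " + bucketFile.string() + " is " + std::to_string(size) +
            " bytes, exceeding the hard limit; treating it as invalid");
    }
}
```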

github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2025
# Description

Resolves #4633

This PR indexes bucket files during `VerifyBucketsWork`. Previously we
would download Buckets, iterate through all of them to check their
hashes, then iterate through all of them again to index them. Since
startup is primarily disk bound, iterating through the entire
BucketList twice is expensive. This change performs hash verification
and indexing in the same pass, so we only have to read the BucketList
once.

This introduces no new DOS vectors. Indexing unverified buckets could
lead to an OOM-based DOS attack, where a malicious History Archive
provider hosts very large buckets. However, such OOM attacks are
already possible via a zip bomb, and History Archive providers are
fairly trusted, so this is not a significant concern. To mitigate it,
I've added an INFO-level log message stating which history archive a
given file is being downloaded from. In the event of a DOS attack,
these logs would give us enough information to quickly assign blame to
the attacker and remove them from quorum sets.
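
For illustration only, a hedged sketch of the kind of attribution log described above; it uses plain `std::cout` rather than stellar-core's own logging macros, and the function name is made up.

```cpp
// Sketch: record which history archive served each bucket, so an
// oversized or malicious file can be traced back to its source archive.
#include <iostream>
#include <string>

void
logBucketSource(std::string const& bucketHash, std::string const& archiveName)
{
    std::cout << "INFO: downloading bucket " << bucketHash
              << " from history archive " << archiveName << std::endl;
}
```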

On my laptop, this decreases startup time from `new-db` by about 16%.

Rebased on top of #4630.

# Checklist
- [x] Reviewed the
[contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes)
document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio
extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the
[performance
document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)