Improve Startup Indexing Time #4633
Comments
A couple of comments:
Net: a proper analysis is probably needed, plus enforcing some sort of upper bound "just in case".
Leaving the full analysis off here because it's a bit of a DOS angle, but I ran the numbers and the worst-case index attack is as follows: a 100 GB worst-case bucket produces a 2.04 GB index. I think it might be reasonable to put a 100 GB hard limit on unzipped buckets. If an unzipped bucket is over the limit, we throw as invalid before we start the hashing or indexing process.
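For illustration only, a minimal sketch of such a guard, assuming a hypothetical `MAX_UNZIPPED_BUCKET_BYTES` constant and `checkBucketSize` helper (neither exists in stellar-core as written):

```cpp
#include <cstdint>
#include <filesystem>
#include <stdexcept>

// Hypothetical hard cap on unzipped bucket size (100 GB), per the
// worst-case analysis above. Name and value are illustrative only.
constexpr std::uintmax_t MAX_UNZIPPED_BUCKET_BYTES =
    100ull * 1024 * 1024 * 1024;

// Throw before any hashing or indexing work is done on an oversized file.
void
checkBucketSize(std::filesystem::path const& bucketPath)
{
    if (std::filesystem::file_size(bucketPath) > MAX_UNZIPPED_BUCKET_BYTES)
    {
        throw std::runtime_error("Unzipped bucket " + bucketPath.string() +
                                 " exceeds hard size limit; invalid");
    }
}
```

The point is that the check only inspects the raw file size, so an oversized bucket is rejected at near-zero cost before any bytes are read for hashing or indexing.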
# Description

Resolves #4633

This PR indexes bucket files during `VerifyBucketsWork`. Previously we would download Buckets, iterate through all the buckets to check their hashes, then iterate through all the buckets again to index them. Since startup is primarily disk bound, iterating through the entire BucketList twice is expensive. This change does the hash verification and indexing step in the same pass so we only have to read the BucketList once (a rough sketch follows the checklist below).

This introduces no new DOS vectors. Indexing unverified buckets could lead to an OOM-based DOS attack, where a malicious History Archive provider hosts very large malicious buckets. However, such OOM attacks are already possible via a zip bomb, and History Archive providers are fairly trusted, so this is not a significant concern. To mitigate it, I've added an INFO-level log message saying which history archive a given file is being downloaded from. In the event of a DOS attack, these logs would give us enough info to quickly assign blame to the attacker and remove them from quorum sets.

On my laptop, this decreases startup time from `new-db` by about 16%.

Rebased on top of #4630.

# Checklist

- [x] Reviewed the [contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes) document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the [performance document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)
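As a rough illustration of the single-pass approach described above: `Sha256Hasher`, `IndexBuilder`, and `verifyAndIndexBucket` below are hypothetical stand-ins, not stellar-core's actual `VerifyBucketsWork` API.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a streaming SHA-256 hasher; the actual digest
// computation is omitted, only the interface matters for this sketch.
struct Sha256Hasher
{
    void
    add(char const* data, std::size_t len)
    {
        // update running digest with [data, data + len) (omitted)
    }
    std::string
    finish()
    {
        return "<hex digest>";
    }
};

// Hypothetical stand-in for the bucket index builder.
struct IndexBuilder
{
    void
    feed(char const* data, std::size_t len)
    {
        // parse entries out of the chunk and record offsets (omitted)
    }
};

// Read the bucket file once, feeding each chunk to both the hasher and
// the index builder, instead of making two separate passes over disk.
std::string
verifyAndIndexBucket(std::string const& path, Sha256Hasher& hasher,
                     IndexBuilder& indexer)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20); // 1 MiB read chunks
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
    {
        auto n = static_cast<std::size_t>(in.gcount());
        hasher.add(buf.data(), n);
        indexer.feed(buf.data(), n);
    }
    // Caller compares the returned digest against the archive's expected
    // hash and discards the freshly built index if verification fails.
    return hasher.finish();
}
```

Streaming both consumers from the same chunks is what keeps the work down to a single disk pass; the hash check still gates whether the resulting index is kept.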
When starting fresh after `new-db`, core first downloads Buckets, reads them once to verify the hash, then reads them all again to construct `BucketIndex`. We should combine the index and verify steps since startup is mostly disk bound. There is no additional DOS risk that this imposes: if a History Archive provider is malicious, they could zip bomb us anyway as an OOM attack vector.