feat: index cache in SQLite3 #13584
❗ Help wanted: benchmark on Windows and HDD ❗
Benchmark result
Cache write
Result: the naive SQLite cache is significantly slower than the fs cache on macOS and Linux. This might be because the index cache blobs are too big. There is a post on the SQLite website about storing blobs inside the DB file versus in separate files: https://www.sqlite.org/intern-v-extern-blob.html. The rule of thumb from that experiment is that blobs larger than about 100k are better stored separately. I've picked some popular crates, and most of them are over 50k.
See the list of randomly-picked popular crates
benchmark script
hyperfine --warmup 2 \
  --min-runs 100 \
  -L ZFLAG -Zindex-cache-sqlite, \
  --setup "CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' fetch" \
  --prepare 'rm -rf "$MY_CARGO_HOME/registry/index/<git-index-url-and-hash>/.cache"' \
  "CARGO_REGISTRIES_CRATES_IO_PROTOCOL=git CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
Result on Linux with SSD
Result on macOS with SSD
Cache read
Result: no significant difference between Linux and macOS on cache read.
benchmark script
hyperfine --warmup 2 \
  --min-runs 100 \
  -L ZFLAG -Zindex-cache-sqlite, \
  --setup "CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile {ZFLAG}" \
  "CARGO_HOME='$MY_CARGO_HOME' '$MY_CARGO' generate-lockfile --offline {ZFLAG}"
Result on Linux with SSD
Result on Linux with SSD
Disk usage
One of the things that was important for the last-use performance was to batch the inserts in a single transaction. From what I can tell, this is using a separate transaction for each insert, which can be expensive. That might be something to experiment with.
Another thought I had was that the schema could be
Have you also tried adjusting the page size?
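To make the batching suggestion concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not Cargo's code: the `cache` table, its columns, and the helper names are hypothetical and only illustrate the difference between committing after every insert and committing once for the whole batch.

```python
import sqlite3

# Hypothetical schema for illustration only; not the schema used in this PR.
INSERT_SQL = "INSERT OR REPLACE INTO cache (name, data) VALUES (?, ?)"


def open_cache(path=":memory:"):
    """Open a database and create the illustrative cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (name TEXT PRIMARY KEY, data BLOB)"
    )
    conn.commit()
    return conn


def insert_one_by_one(conn, rows):
    """One transaction per insert: a commit happens after every row."""
    for row in rows:
        with conn:  # commits when the block exits
            conn.execute(INSERT_SQL, row)


def insert_batched(conn, rows):
    """One transaction for the whole batch: a single commit at the end."""
    with conn:
        conn.executemany(INSERT_SQL, rows)
```

Both helpers leave the same rows in the table; only the number of commits differs. To compare them, point `open_cache` at an on-disk file and time each helper with `time.perf_counter()`, since an in-memory database hides most of the commit cost.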
Yes, both are good ideas, though they require more than trivial changes, and it is hard to share the same interface with the old cache mechanism 😞.
Yes. I also tried various combinations of WAL journal mode, cache size/limit, synchronous=normal, and other pragmas. The difference was insignificant.
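For readers unfamiliar with these knobs, the sketch below shows how such pragmas can be applied to a connection, again with Python's standard `sqlite3` module. The function name and the specific values (8192-byte pages, roughly 8 MB of page cache) are assumptions chosen for illustration, not the settings tried in this PR.

```python
import sqlite3


def open_tuned(path, page_size=8192, cache_kib=8000):
    """Open an on-disk database with a few commonly tuned pragmas.

    The values are illustrative defaults, not recommendations.
    """
    conn = sqlite3.connect(path)
    # page_size only affects a database that has not been created yet
    # (or one that is VACUUMed afterwards), so set it before anything else.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    # journal_mode returns the mode actually in effect, e.g. "wal".
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    # NORMAL syncs less often than the default FULL.
    conn.execute("PRAGMA synchronous = NORMAL")
    # A negative cache_size is interpreted as a size in KiB, not pages.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    return conn, mode
```

Whether any of these settings help depends on the workload, so each combination would still need to be measured with the benchmark scripts from the PR description.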
☔ The latest upstream changes (presumably #13632) made this pull request unmergeable. Please resolve the merge conflicts.
__CARGO_TEST_FORCE_SQLITE_INDEX_CACHE to force enable it.
I've pushed some variants of this:
On macOS, both the read and write performance of these two variants are on par with the original filesystem cache.
On Linux, however, we've got a 5% performance hit on write.
@ehuss do you have time to do a simple benchmark on Windows? I also wonder if it is not worthwhile due to
Going to close this as it is not going anywhere in the near future. We've collected some interesting data points and people can look into them when needed :)
What does this PR try to resolve?
#6908
Add an unstable feature to store index cache in SQLite3 database.
The flag is undocumented since the schema is just a naive dump.
The implementation can be used as benchmark baseline while we explore other cache/SQL schema design.
How should we test and review this PR?
Run this to force using SQLite for index cache.
You'll find the SQLite3 db file at registry/index/<index-url>/.cache/index-cache.db.
Additional information
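One rough way to look inside the resulting database file without assuming anything about its schema is to list the tables recorded in `sqlite_master`. The sketch below uses Python's standard `sqlite3` module; the helper name is hypothetical, and the path passed to it must be the real location of `index-cache.db` under your Cargo home.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return (name, CREATE statement) pairs for every table in the file.

    The file is opened read-only so that a wrong path raises an error
    instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
```

Calling `list_tables` on the cache file returns whatever tables the naive dump created, which is a quick sanity check that the SQLite backend was actually used.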