internal: Compress file text using LZ4 #16335

Merged 4 commits on Mar 11, 2024

Conversation

lnicola
Member

@lnicola lnicola commented Jan 10, 2024

I haven't tested properly, but this roughly looks like:

1246 MB
    59mb   4899 FileTextQuery

1008 MB
    20mb   4899 CompressedFileTextQuery
   555kb   1790 FileTextQuery

We might want to test on something more interesting, like bevy.
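
For readers unfamiliar with the idea, here is a minimal sketch of the storage pattern that the two query names above suggest: the compressed bytes are what stays resident, and the plain text is produced on demand. Everything in it is an assumption made for illustration. `FileTexts`, `set_file_text`, `file_text`, and the `u32` file id are hypothetical names, there is no salsa involved, and the codec is a toy run-length encoder standing in for LZ4 so the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Stand-in codec: byte-wise run-length encoding as (count, byte) pairs.
/// This is NOT LZ4; it only marks where a real compressor would be called.
fn compress(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    let mut bytes = text.bytes().peekable();
    while let Some(b) = bytes.next() {
        let mut run: u8 = 1;
        // Extend the run while the next byte repeats, capped at 255.
        while run < u8::MAX && bytes.peek() == Some(&b) {
            bytes.next();
            run += 1;
        }
        out.push(run);
        out.push(b);
    }
    out
}

/// Inverse of `compress`: expands each (count, byte) pair.
fn decompress(data: &[u8]) -> String {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    String::from_utf8(out).expect("round trip of valid UTF-8 stays valid")
}

/// Hypothetical store: only compressed bytes are kept per file.
#[derive(Default)]
struct FileTexts {
    compressed: HashMap<u32, Vec<u8>>,
}

impl FileTexts {
    /// Every update compresses the full text again.
    fn set_file_text(&mut self, file_id: u32, text: &str) {
        self.compressed.insert(file_id, compress(text));
    }

    /// Reading decompresses on demand; a cache in front of this call
    /// would play the role of the LRU-limited text query.
    fn file_text(&self, file_id: u32) -> Option<String> {
        self.compressed.get(&file_id).map(|data| decompress(data))
    }
}
```

The trade-off is the one visible in the numbers above: less memory held by the stored text, in exchange for a decompression step whenever the text is needed and not already cached.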

@rustbot added the S-waiting-on-review label Jan 10, 2024
crates/base-db/src/lib.rs (Outdated)
Arc::from(text)
}

pub trait SourceDatabaseExt2 {
Member Author

Can we do this without a new trait?

@lnicola
Member Author

lnicola commented Jan 10, 2024

555 KB for 16 files seems a bit much, but I guess these measurements are pretty rough.

But why 1790 entries if the LRU size is set to 16?

@Veykril
Member

Veykril commented Jan 10, 2024

> But why 1790 entries if the LRU size is set to 16?

The entries persist, but the values get evicted.
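
A rough way to picture this is a memo table where every key keeps its slot forever, while only the most recently used slots keep a value. The sketch below is an illustration under that assumption and not salsa's actual data structure; `LruMemo`, its methods, and the `u32` key are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy memo table: slots persist, values are evicted past `capacity`.
struct LruMemo {
    capacity: usize,
    /// One slot per key ever queried; `None` means the value was evicted.
    slots: HashMap<u32, Option<String>>,
    /// Keys ordered by use, most recently used at the back.
    recent: VecDeque<u32>,
}

impl LruMemo {
    fn new(capacity: usize) -> Self {
        LruMemo { capacity, slots: HashMap::new(), recent: VecDeque::new() }
    }

    /// Returns the cached value, recomputing it if it was evicted.
    fn get(&mut self, key: u32, compute: impl FnOnce() -> String) -> String {
        // Mark the key as most recently used.
        self.recent.retain(|k| *k != key);
        self.recent.push_back(key);

        let slot = self.slots.entry(key).or_insert(None);
        if slot.is_none() {
            *slot = Some(compute());
        }
        let value = slot.clone().unwrap();

        // Evict values (not slots) that fall outside the LRU window.
        while self.recent.len() > self.capacity {
            if let Some(old) = self.recent.pop_front() {
                if let Some(evicted) = self.slots.get_mut(&old) {
                    *evicted = None;
                }
            }
        }
        value
    }

    /// Number of slots, including those whose value was evicted.
    fn entries(&self) -> usize {
        self.slots.len()
    }

    /// Number of slots that currently hold a value.
    fn live_values(&self) -> usize {
        self.slots.values().filter(|v| v.is_some()).count()
    }
}
```

With a capacity of 16, querying 1790 distinct keys leaves `entries()` at 1790 and `live_values()` at 16, which matches the shape of the numbers being discussed.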

@lnicola
Member Author

lnicola commented Jan 10, 2024

Right, I guess we just have a lot of code (crates/hir/src/lib.rs is 180 KB).

@lnicola
Member Author

lnicola commented Jan 10, 2024

Doesn't make much of a difference in analysis-stats self except 62 -> 25 MB on "Database loaded":

# baseline
Database loaded:     436.21ms, 302minstr, 62mb (metadata 238.03ms, 27minstr, 596kb; build 97.65ms, 9891kinstr, 42kb)
  item trees: 1175
Item Tree Collection: 794.18ms, 9285minstr, 23mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.36s, 80ginstr, 412mb
Body lowering:       4.20s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           42.09s, 305ginstr, 538mb
MIR lowering:        7.88s, 54ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        72.00ms, 503minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    479.39ms, 5086minstr, 8mb
Failed const evals: 0 (0%)
Total:               64.88s, 492ginstr, 1535mb

Database loaded:     421.74ms, 297minstr, 62mb (metadata 237.11ms, 27minstr, 596kb; build 97.03ms, 9918kinstr, 41kb)
  item trees: 1175
Item Tree Collection: 795.65ms, 9285minstr, 23mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.42s, 80ginstr, 412mb
Body lowering:       4.17s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           44.97s, 328ginstr, 538mb
MIR lowering:        8.36s, 58ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        72.50ms, 504minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    489.52ms, 5170minstr, 9mb
Failed const evals: 0 (0%)
Total:               68.30s, 520ginstr, 1535mb

Database loaded:     435.20ms, 297minstr, 62mb (metadata 238.48ms, 27minstr, 596kb; build 97.62ms, 9892kinstr, 42kb)
  item trees: 1175
Item Tree Collection: 799.75ms, 9285minstr, 23mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.40s, 80ginstr, 412mb
Body lowering:       4.20s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           44.07s, 321ginstr, 538mb
MIR lowering:        8.26s, 57ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        73.24ms, 503minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    512.87ms, 5327minstr, 9mb
Failed const evals: 0 (0%)
Total:               67.33s, 512ginstr, 1535mb

# pr
Database loaded:     492.48ms, 967minstr, 25mb (metadata 235.37ms, 27minstr, 596kb; build 100.76ms, 9927kinstr, 42kb)
  item trees: 1175
Item Tree Collection: 782.93ms, 9331minstr, 24mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.16s, 80ginstr, 413mb
Body lowering:       4.16s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           42.10s, 307ginstr, 538mb
MIR lowering:        7.90s, 54ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        75.66ms, 504minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    478.94ms, 5021minstr, 9mb
Failed const evals: 0 (0%)
Total:               64.68s, 494ginstr, 1536mb

Database loaded:     503.51ms, 968minstr, 25mb (metadata 240.30ms, 27minstr, 596kb; build 95.45ms, 9927kinstr, 41kb)
  item trees: 1175
Item Tree Collection: 809.47ms, 9331minstr, 24mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.58s, 80ginstr, 413mb
Body lowering:       4.19s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           41.57s, 304ginstr, 537mb
MIR lowering:        7.79s, 54ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        74.69ms, 502minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    491.05ms, 5145minstr, 9mb
Failed const evals: 0 (0%)
Total:               64.51s, 491ginstr, 1536mb

Database loaded:     499.02ms, 967minstr, 25mb (metadata 235.03ms, 27minstr, 596kb; build 100.37ms, 9879kinstr, 42kb)
  item trees: 1175
Item Tree Collection: 769.98ms, 9331minstr, 24mb
  crates: 47, mods: 1001, decls: 26139, bodies: 24215, adts: 1860, consts: 1126
Item Collection:     9.02s, 80ginstr, 413mb
Body lowering:       4.08s, 37ginstr, 275mb                                                                                                                                       
  exprs: 686572, ??ty: 38 (0%), ?ty: 123 (0%), !ty: 3                                                                                                                             
  pats: 161608, ??ty: 4 (0%), ?ty: 4 (0%), !ty: 0
Inference:           41.39s, 308ginstr, 538mb
MIR lowering:        7.71s, 54ginstr, 265mb
Mir failed bodies: 19 (0%)
Data layouts:        71.71ms, 505minstr, 10mb
Failed data layouts: 130 (7%)
Const evaluation:    503.42ms, 5300minstr, 9mb
Failed const evals: 0 (0%)
Total:               63.56s, 496ginstr, 1536mb
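
To compare the two configurations without eyeballing six logs, the "Database loaded" figures can be averaged. The snippet below is a small sketch that does only this arithmetic; the sample values are copied from the runs above, while the function and constant names are made up for the example.

```rust
/// Arithmetic mean of a slice of samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// "Database loaded" wall-clock times in milliseconds, three runs each.
const BASELINE_MS: [f64; 3] = [436.21, 421.74, 435.20];
const PR_MS: [f64; 3] = [492.48, 503.51, 499.02];

/// "Database loaded" memory in MB (identical across runs in each group).
const BASELINE_MB: f64 = 62.0;
const PR_MB: f64 = 25.0;

/// Returns (baseline mean ms, PR mean ms, time change %, memory change %).
fn summarize() -> (f64, f64, f64, f64) {
    let base = mean(&BASELINE_MS);
    let pr = mean(&PR_MS);
    (
        base,
        pr,
        percent_change(base, pr),
        percent_change(BASELINE_MB, PR_MB),
    )
}
```

On these samples the load step averages about 431 ms on the baseline and about 498 ms with the PR, so loading is roughly 16% slower while the memory reported at that point drops by roughly 60%. With only three runs per side, this is a rough indication rather than a proper benchmark.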

@lnicola lnicola marked this pull request as draft January 10, 2024 12:42
@lnicola
Member Author

lnicola commented Jan 10, 2024

What's less great about this is that it recompresses the file on every change.

Can we store them in something like a HashMap<FileId, String> (kept in the database) and use salsa::transparent when accessing it without losing change tracking? It also needs to work when updating files.
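
One way to picture the storage half of this idea is a map that keeps a file as plain text while it is being edited and only compresses it once editing stops. The sketch below is purely illustrative and deliberately ignores the salsa change-tracking question raised here; `Store`, `Stored`, `edit`, `close`, and the `u32` file id are hypothetical, and the "codec" just copies bytes so that compression calls can be counted without an external crate.

```rust
use std::collections::HashMap;

/// Hypothetical storage states for one file's text.
enum Stored {
    /// File under active editing: kept as plain text so edits skip the codec.
    Plain(String),
    /// Any other file: kept as compressed bytes.
    Compressed(Vec<u8>),
}

#[derive(Default)]
struct Store {
    files: HashMap<u32, Stored>,
    /// Counts codec invocations, to show what the deferral saves.
    compress_calls: usize,
}

impl Store {
    /// Placeholder codec: copies the bytes. A real implementation would
    /// call an LZ4 compressor here.
    fn compress(&mut self, text: &str) -> Vec<u8> {
        self.compress_calls += 1;
        text.as_bytes().to_vec()
    }

    /// Records an edit without compressing anything.
    fn edit(&mut self, id: u32, text: &str) {
        self.files.insert(id, Stored::Plain(text.to_owned()));
    }

    /// Compresses the file once editing is over; no-op if already compressed.
    fn close(&mut self, id: u32) {
        match self.files.remove(&id) {
            Some(Stored::Plain(text)) => {
                let bytes = self.compress(&text);
                self.files.insert(id, Stored::Compressed(bytes));
            }
            Some(other) => {
                self.files.insert(id, other);
            }
            None => {}
        }
    }

    /// Returns the current text, whichever form it is stored in.
    fn text(&self, id: u32) -> Option<String> {
        match self.files.get(&id)? {
            Stored::Plain(text) => Some(text.clone()),
            Stored::Compressed(bytes) => String::from_utf8(bytes.clone()).ok(),
        }
    }
}
```

In this model, a hundred consecutive edits to one file cost zero compression calls, and a single call happens when the file is closed. Whether something like this can be wired into the database without losing change tracking is exactly the open question in this comment.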

@Veykril
Member

Veykril commented Jan 10, 2024

Maybe? I know you can report synthetic reads in salsa so you could have a query that fakes reading from the file text input for example. But the question is whether that's worth the complication. I do wonder how often we'll end up decompressing files though, as a lot of IDE features will re-parse files pretty frequently currently.

@lnicola
Member Author

lnicola commented Jan 10, 2024

https://crates.io/crates/lz4_flex has some benchmarks. I hope we're not reparsing more than 1 MB at once.

And we can increase the LRU size if necessary (not sure how we could tell though).

@Veykril
Member

Veykril commented Jan 10, 2024

The right LRU size will, as usual, depend on the project in question, so it's tough to figure out a good value for it. Regarding the synthetic read, I would link a salsa test, but GitHub doesn't let me search the repo right now.

@lnicola
Member Author

lnicola commented Jan 10, 2024

I think we also need synthetic writes? Or a way to update an input without invalidating it. I don't know 😅

@Veykril
Member

Veykril commented Jan 10, 2024

Yes, that's called invalidate, apparently: https://github.com/rust-analyzer/salsa/blob/d9ff03481c48466f896a03f2f293b6e37a1a899e/book/src/common_patterns/on_demand_inputs.md?plain=1

@bors
Contributor

bors commented Jan 10, 2024

☔ The latest upstream changes (presumably #16339) made this pull request unmergeable. Please resolve the merge conflicts.

@Veykril added the S-waiting-on-author label and removed the S-waiting-on-review label Jan 16, 2024
@lnicola lnicola force-pushed the salsa-lz4-file-text branch 2 times, most recently from 6f557ab to cbc2bde Compare February 14, 2024 11:43
@lnicola
Member Author

lnicola commented Feb 14, 2024

@Veykril think we could merge this in the current form, without the optimization that would avoid compressing the file being edited?

It still looks a bit messy, not sure if that extension trait is needed.

@Veykril
Member

Veykril commented Feb 14, 2024

Should probably be fine.

And yes, I think we need the trait.

@lnicola lnicola marked this pull request as ready for review February 26, 2024 18:14
@bors
Contributor

bors commented Mar 4, 2024

☔ The latest upstream changes (presumably #16747) made this pull request unmergeable. Please resolve the merge conflicts.

@Veykril
Member

Veykril commented Mar 11, 2024

Time to try this out I guess?
@bors r+

@bors
Contributor

bors commented Mar 11, 2024

📌 Commit 717ba1d has been approved by Veykril

It is now in the queue for this repository.

@bors
Contributor

bors commented Mar 11, 2024

⌛ Testing commit 717ba1d with merge 8f8bcfc...

@bors
Contributor

bors commented Mar 11, 2024

☀️ Test successful - checks-actions
Approved by: Veykril
Pushing 8f8bcfc to master...

@bors bors merged commit 8f8bcfc into rust-lang:master Mar 11, 2024
11 checks passed
@lnicola lnicola deleted the salsa-lz4-file-text branch March 11, 2024 13:58
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
4 participants