internal: Build source map for `hir_def::TypeRef`s #18074

ChayimFriedman2 · 2024-09-08T19:28:31Z

So that given a TypeRef we will be able to trace it back to source code.

This is necessary to be able to provide diagnostics for lowering to chalk tys, since the input to that is TypeRef. This is the first step towards diagnostics for ty lowering. The next steps will be smaller :P

This means that TypeRefs now have an identity, which means storing them in arena and not interning them, which is an unfortunate (but necessary) loss but also a pretty massive change. Luckily, because of the separation layer we have for IDE and HIR, this change never crosses the IDE boundary.

This gives an unfortunate regression of ~170mb in analysis-stats ..

Sorry for the size of the PR; I did some extra things but they are trivial - most of the code is necessary and impossible to split.

bors · 2024-09-11T12:23:13Z

☔ The latest upstream changes (presumably #18075) made this pull request unmergeable. Please resolve the merge conflicts.

ChayimFriedman2 · 2024-09-12T21:44:55Z

Okay, this is ready for review.

I intend to submit a follow-up PR with few cleanups that I don't want to insert here to not increase the size of this already-large PR.

Veykril · 2024-09-25T09:46:34Z

This gives an unfortunate regression of ~170mb in analysis-stats ..

Whew that's quite a bit 😞 Wonder if we can shrink TypeRef some more (its 40 bytes on master)

crates/hir-def/src/db.rs

Veykril · 2024-09-25T10:47:43Z

crates/hir-def/src/path.rs

 }

 /// A single generic argument.
 #[derive(Debug, Clone, PartialEq, Eq, Hash)]
 pub enum GenericArg {
-    Type(TypeRef),
+    Type(TypeRefId),
    Lifetime(LifetimeRef),


Aside: I imagine we might wanna give lifetimes the same treatment (given they also only appear in type position anyways)

Veykril · 2024-09-25T11:02:53Z

I guess part of the memory increase comes from the new source map queries not being LRU'd. Gonna think a bit aabout whether we can reduce the memory impact of this change to some degree (aside from LRUing those queries)

lnicola · 2024-09-25T11:16:51Z

Can we make block_item_tree etc. transparent to save some memory? Because block_item_tree_query just calls block_item_tree_with_source_map now.

Veykril · 2024-09-25T11:22:28Z

The split exists for incrementality purposes, and making that one transparent won't really save much given it just clones the Arc

ChayimFriedman2 · 2024-09-25T13:14:05Z

If anything, it's the opposite: we can make the source maps query transparent (not just this one), since everyone that wants the source map is not perf-sensitive (IDE layer).

ChayimFriedman2 · 2024-09-25T13:17:43Z

I guess part of the memory increase comes from the new source map queries not being LRU'd

What is the difference between LRU'd and not LRU'd queries?

ChayimFriedman2 · 2024-09-25T19:32:17Z

I'll try to find how each type consumes memory, and as a result of this analysis what can we do to shrink it.

Veykril · 2024-09-25T20:08:50Z

I guess part of the memory increase comes from the new source map queries not being LRU'd

What is the difference between LRU'd and not LRU'd queries?

LRU'd ones will just have their caches evicted if they overflow the LRU limit (up to the limit).

ChayimFriedman2 · 2024-10-08T20:49:11Z

@Veykril Adding #[salsa::lru] around the source map queries does not help analysis-stats . at all (neither it makes it worse, though). I'll explore other ways. Should I still LRU them?

bors · 2024-10-14T14:36:12Z

☔ The latest upstream changes (presumably #18239) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2024-10-15T08:38:56Z

☔ The latest upstream changes (presumably #18278) made this pull request unmergeable. Please resolve the merge conflicts.

ChayimFriedman2 · 2024-10-19T23:33:59Z

@Veykril After long time of trying to reduce the memory effect of this, I was able to reduce it down to 84mb. The first commit remains untouched (minus rebases), follow-up commits reduce memory usage. The two last specifically have somewhat bad ratio for code they touch versus their memory usage impact (and they also use unsafe code), so you may not want to include them. There is one commit (the one about attributes) that does an unrelated improvement, but all others are because of this PR.

I believe I pretty much hit the end of the ability to optimize this (and introspection shows at least some amount is inherent to this, e.g. lack of interning and having to store source maps). I can optimize further, but the changes requires will be pervasive and the compensation will be very low (I estimate up to 3 megabytes for each change). I don't think this is worth it. I think that at this point, we need to bite the bullet and accept that this is a necessary evil.

Also, do we have a way to run Miri on CI? ThinVecWithHeader will benefit from that.

bors · 2024-10-20T14:36:48Z

☔ The latest upstream changes (presumably #18350) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2024-10-22T08:04:31Z

☔ The latest upstream changes (presumably #18362) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2024-10-23T12:49:04Z

☔ The latest upstream changes (presumably #18264) made this pull request unmergeable. Please resolve the merge conflicts.

crates/stdx/src/lib.rs

crates/hir-expand/src/attrs.rs

crates/hir-def/src/path.rs

Veykril · 2024-10-24T12:18:37Z

crates/hir-def/src/data.rs

@@ -532,10 +538,11 @@ impl ExternCrateDeclData {
 pub struct ConstData {
    /// `None` for `const _: () = ();`
    pub name: Option<Name>,
-    pub type_ref: Interned<TypeRef>,
+    pub type_ref: TypeRefId,


Question for this, and related structs here, but shouldn't this type_ref always be the 0 ID? (if not, we could probably make it so) in which case we can omit storing it here

It is not (it is always the last actually because typerefs are allocated recursively), but the last is also easy to tell. I do wonder if we should make it so dependent on implicit behavior however, or maybe reserve the first to it somehow.

So I tried this and this doesn't reduce memory usage at all. On some types this doesn't even shrink them because of padding. And this depends on subtle implementation detail, and, if we take this to the extreme (i.e. apply this for as many types as possible, e.g. impls) it introduces an order dependency between the type ref lowering which isn't clear from a glance and needs to be documented. As such, I'm inclined to not do this.

Veykril · 2024-10-24T12:25:50Z

crates/hir-def/src/hir/type_ref.rs

    Macro(AstId<ast::MacroCall>),
    Error,
 }

+#[cfg(target_arch = "x86_64")]
+const _: () = assert!(size_of::<TypeRef>() == 16);


I wonder if asking for 24 might be better, 16 seems to requires us to be very aggressive with boxing things which might spent equal amount of memory on allocator overhead (we can check that as a follow up to the PR though)

I benchmarked memory usage using jemalloc, which I think includes the allocator overhead. Speed could be affected, but I don't think it will (nor did my benchmarks showed), since boxed types account for less than 10% of all types. If we allow 24 bytes we are able to unbox Reference but not Array, but most of the 10% is references anyway. I will try to benchmark memory usage of each option to see what is better.

For context: here are the statistics of the r-a repo:

total_types: 656222 (100%)
tuples: 11492 (1.75%)
path_types: 552122 (84.14%)
raw_ptrs: 13167 (2.01%)
references: 58745 (8.95%)
references_with_lifetimes: 10529 (1.60%)
arrays: 2792 (0.43%)
slices: 6691 (1.02%)
fns: 821 (0.13%)
impl_traits: 1523 (0.23%)
dyn_traits: 4723 (0.72%)
macros: 1050 (0.16%)

This shows that if we are really concerned by the boxing, we can split Reference into ReferenceWithLifetime and ReferenceWithoutLifetime and box only the latter, and then we'll have a bit more than 2% boxed.

Veykril · 2024-10-24T12:51:30Z

crates/hir-ty/src/display.rs

@@ -1897,100 +1899,150 @@ pub fn write_visibility(
    }
 }

-impl HirDisplay for TypeRef {
+pub trait HirDisplayWithTypesMap {


Can we just add &TypesMap to the HirFormatter, default that to the &'static EMPTY and add a new method to HirDisplay for hir_fmt_with_typemap or so? Up to you if you think that would turn out "cleaner" or not, just throwing ideas

Veykril · 2024-10-24T12:53:51Z

crates/hir-ty/src/lower.rs

+    /// If this is set, that means we're in a context of a freshly expanded macro, and that means
+    /// we should not use `TypeRefId` in diagnostics because the caller won't have the `TypesMap`,
+    /// instead we need to put `TypeSource` from the source map.
+    types_source_map: Option<&'a TypesSourceMap>,


I take it this is due to type source stuff being pre-expansion? Would be nice if we can figure out a way to make this work at some point

Yes. This does work, but is a bit ugly. It means every diagnostic will have to be Either<TypeRefId, TypesSource>. This isn't an incrementality problem because we touch the AST anyway when there are types macros, but it is not nice. OTOH I don't see a good way to avoid this, type refs are in the item tree so they cannot be post expansion, and making another lowering layer will be more expensive performance wise. And the situation isn't that terrible, it's just not nice.

Veykril · 2024-10-24T12:55:50Z

crates/hir-ty/src/lower.rs

+                            // FIXME: That may be better served by mutating `self` then restoring, but this requires
+                            // making it `&mut self`.


Gotta say TyLoweringContext is getting more and more cursed with all of this stuff and the interior mutability 😬

Yeah I intend to clean that up in a follow-up PR, didn't wanted to do it now because it's already too big.

So that given a `TypeRef` we will be able to trace it back to source code. This is necessary to be able to provide diagnostics for lowering to chalk tys, since the input to that is `TypeRef`. This means that `TypeRef`s now have an identity, which means storing them in arena and not interning them, which is an unfortunate (but necessary) loss but also a pretty massive change. Luckily, because of the separation layer we have for IDE and HIR, this change never crosses the IDE boundary.

This saves back 15mb that went for typeref source maps.

This saves 16mb on `analysis-stats .`.

This saves 8mb.

Thanks to the observation (supported by counting) that the vast majority paths have neither generics no type anchors, and thanks to a new datastructure `ThinVecWithHeader` that is essentially `(T, Box<[U]>)` but with the size of a single pointer, we are able to reach this feat. This (together with `ThinVecWithHeader`) makes the possibility to shrink `TypeRef`, because most types are paths.

Only references and arrays need to be boxed, and they comprise only 9.4% of the types (according to counting on r-a's code). This saves 17mb.

Veykril · 2024-10-25T05:21:22Z

I'll merge this on monday just in case so we have time for fixing stuff before the next release

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 8, 2024

ChayimFriedman2 marked this pull request as draft September 8, 2024 19:28

ChayimFriedman2 mentioned this pull request Sep 8, 2024

feat: Implement diagnostics for E0038 #18000

Open

ChayimFriedman2 force-pushed the typeref-source-map branch 2 times, most recently from 9af921a to 3c34665 Compare September 12, 2024 21:44

ChayimFriedman2 marked this pull request as ready for review September 12, 2024 21:44

ChayimFriedman2 changed the title ~~Build source map for hir_def::TypeRefs~~ internal: Build source map for hir_def::TypeRefs Sep 12, 2024

ChayimFriedman2 mentioned this pull request Sep 18, 2024

Provide assistance for writing an impl block #17590

Open

Veykril reviewed Sep 25, 2024

View reviewed changes

Veykril mentioned this pull request Sep 30, 2024

Implement assist to add missing lifetime to function #17265

Closed

ChayimFriedman2 force-pushed the typeref-source-map branch from 3c34665 to 9a2a0a2 Compare October 13, 2024 06:06

ChayimFriedman2 force-pushed the typeref-source-map branch from 9a2a0a2 to e8d7a19 Compare October 14, 2024 14:44

ChayimFriedman2 force-pushed the typeref-source-map branch from e8d7a19 to 9a4b995 Compare October 15, 2024 12:16

ChayimFriedman2 mentioned this pull request Oct 18, 2024

Store token trees in contiguous Vec instead of as a tree #18327

Open

ChayimFriedman2 force-pushed the typeref-source-map branch 2 times, most recently from be029b9 to a4b350f Compare October 19, 2024 23:29

ChayimFriedman2 force-pushed the typeref-source-map branch 2 times, most recently from 9deffe3 to b2b6e4d Compare October 19, 2024 23:56

ChayimFriedman2 force-pushed the typeref-source-map branch from b2b6e4d to f8b674a Compare October 20, 2024 15:42

ChayimFriedman2 force-pushed the typeref-source-map branch from f8b674a to ce82b01 Compare October 22, 2024 22:09

Veykril reviewed Oct 24, 2024

View reviewed changes

ChayimFriedman2 added 7 commits October 25, 2024 06:15

Fix memory usage calculation's queries list

1fae57f

Reuse empty GenericParams

39747cb

This saves back 15mb that went for typeref source maps.

Shrink ItemTreeSourceMaps

d0933cc

This saves 16mb on `analysis-stats .`.

Do not allocate attributes entry if there are no attributes

bccf006

This saves 8mb.

Shrink TypeRef from 16 from 32 bytes

bf7edd3

Only references and arrays need to be boxed, and they comprise only 9.4% of the types (according to counting on r-a's code). This saves 17mb.

ChayimFriedman2 force-pushed the typeref-source-map branch from ce82b01 to bf7edd3 Compare October 25, 2024 04:28

Veykril added this pull request to the merge queue Oct 28, 2024

Merged via the queue into rust-lang:master with commit 80e9d01 Oct 28, 2024
9 checks passed

ChayimFriedman2 deleted the typeref-source-map branch October 28, 2024 14:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal: Build source map for `hir_def::TypeRef`s #18074

internal: Build source map for `hir_def::TypeRef`s #18074

ChayimFriedman2 commented Sep 8, 2024 •

edited

Loading

bors commented Sep 11, 2024

ChayimFriedman2 commented Sep 12, 2024

Veykril commented Sep 25, 2024 •

edited

Loading

Veykril Sep 25, 2024

Veykril commented Sep 25, 2024

lnicola commented Sep 25, 2024 •

edited

Loading

Veykril commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

Veykril commented Sep 25, 2024

ChayimFriedman2 commented Oct 8, 2024

bors commented Oct 14, 2024

bors commented Oct 15, 2024

ChayimFriedman2 commented Oct 19, 2024 •

edited

Loading

bors commented Oct 20, 2024

bors commented Oct 22, 2024

bors commented Oct 23, 2024

Veykril Oct 24, 2024

ChayimFriedman2 Oct 25, 2024

ChayimFriedman2 Oct 25, 2024

Veykril Oct 24, 2024

ChayimFriedman2 Oct 25, 2024

ChayimFriedman2 Oct 25, 2024

Veykril Oct 24, 2024

Veykril Oct 24, 2024

ChayimFriedman2 Oct 25, 2024

Veykril Oct 24, 2024

ChayimFriedman2 Oct 24, 2024

Veykril commented Oct 25, 2024

		// FIXME: That may be better served by mutating `self` then restoring, but this requires
		// making it `&mut self`.

internal: Build source map for hir_def::TypeRefs #18074

internal: Build source map for hir_def::TypeRefs #18074

Conversation

ChayimFriedman2 commented Sep 8, 2024 • edited Loading

bors commented Sep 11, 2024

ChayimFriedman2 commented Sep 12, 2024

Veykril commented Sep 25, 2024 • edited Loading

Choose a reason for hiding this comment

Veykril commented Sep 25, 2024

lnicola commented Sep 25, 2024 • edited Loading

Veykril commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

ChayimFriedman2 commented Sep 25, 2024

Veykril commented Sep 25, 2024

ChayimFriedman2 commented Oct 8, 2024

bors commented Oct 14, 2024

bors commented Oct 15, 2024

ChayimFriedman2 commented Oct 19, 2024 • edited Loading

bors commented Oct 20, 2024

bors commented Oct 22, 2024

bors commented Oct 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Veykril commented Oct 25, 2024

internal: Build source map for `hir_def::TypeRef`s #18074

internal: Build source map for `hir_def::TypeRef`s #18074

ChayimFriedman2 commented Sep 8, 2024 •

edited

Loading

Veykril commented Sep 25, 2024 •

edited

Loading

lnicola commented Sep 25, 2024 •

edited

Loading

ChayimFriedman2 commented Oct 19, 2024 •

edited

Loading