[ty] Add caching for submodule completion suggestions #19408

BurntSushi · 2025-07-17T18:07:26Z

This change makes it so we aren't doing a directory traversal every time
we ask for completions from a module. Specifically, submodules that
aren't attributes of their parent module can only be discovered by
looking at the directory tree. But we want to avoid doing a directory
scan unless we think there are changes.

To make this work, this change does a little bit of surgery to
FileRoot. Previously, a FileRoot was only used for library search
paths. Its revision was bumped whenever a file in that tree was added,
deleted or even modified (to support the discovery of pth files and
changes to their contents). This generally seems fine since these are
presumably dependency paths that shouldn't change frequently.

In this change, we add a FileRoot for the project. But if the
FileRoot's revision were bumped for every change in the project,
caching based on that FileRoot would be rather ineffective: cache
invalidation would occur so aggressively that there would be little
point in adding caching in the first place. To mitigate this, a
FileRoot's revision is only bumped on a change to a child file's
contents when the FileRoot is a LibrarySearchPath. Otherwise, we
only bump the revision when a file is created or deleted.

The effect is that, at least in VS Code, when a new module is added or
removed, this change is picked up and the cache is properly invalidated.
Other LSP clients with worse support for file watching (which seems to
be the case for the CoC vim plugin that I use) don't work as well. Here,
the cache is less likely to be invalidated which might cause completions
to have stale results. Unless there's an obvious way to fix or improve
this, I propose punting on improvements here for now.
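To make the revision policy concrete, here is a minimal sketch in plain Rust. It is not the actual ruff_db code: `FileRootKind`, `ChangeKind`, `should_bump` and the integer revision counter are illustrative stand-ins (the real code stamps a new revision with `FileRevision::now()`).

```rust
/// Simplified model of the two kinds of file roots (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileRootKind {
    Project,
    LibrarySearchPath,
}

/// The file-watcher changes this sketch distinguishes (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChangeKind {
    Created,
    Deleted,
    ContentsModified,
}

/// A root whose revision acts as the cache key for anything derived from
/// the directory tree beneath it, such as the list of submodules.
#[derive(Debug)]
pub struct FileRoot {
    pub kind: FileRootKind,
    // A plain counter stands in for the real timestamp-based revision.
    pub revision: u64,
}

impl FileRoot {
    pub fn new(kind: FileRootKind) -> Self {
        FileRoot { kind, revision: 0 }
    }

    /// Returns `true` if `change` should invalidate caches keyed on this root.
    pub fn should_bump(&self, change: ChangeKind) -> bool {
        match change {
            // Adding or removing a file can add or remove a submodule,
            // so every kind of root is invalidated.
            ChangeKind::Created | ChangeKind::Deleted => true,
            // Content edits only matter for library search paths (e.g. a
            // `.pth` file whose contents changed). Project files are edited
            // constantly, so bumping here would defeat the cache.
            ChangeKind::ContentsModified => self.kind == FileRootKind::LibrarySearchPath,
        }
    }

    /// Applies a change, bumping the revision only when the policy says so.
    pub fn on_change(&mut self, change: ChangeKind) {
        if self.should_bump(change) {
            self.revision += 1;
        }
    }
}
```

With this policy, ordinary edits inside the project leave the project root's revision (and any cache keyed on it) untouched, while creating or deleting a module still invalidates it.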

BurntSushi · 2025-07-17T18:08:15Z

Demo (although I guess this is technically indistinguishable from what's on main):

completions-vscode-submodule-caching.mp4

github-actions · 2025-07-17T18:11:27Z

`mypy_primer` results

No ecosystem changes detected ✅
No memory usage changes detected ✅

BurntSushi · 2025-07-17T18:51:35Z

The test failure on Windows looks interesting. Maybe the file watcher isn't picking up the deleted file? If it doesn't, then the cache doesn't get invalidated and you end up with wazoo in the completions (which should be gone).
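A hypothetical sketch of why a missed delete event leaves wazoo in the completions: if the submodule listing is memoized on the root revision, the directory scan only reruns when the revision changes. `SubmoduleCache` and the closure-based `scan` are illustrative assumptions, not the real ty query.

```rust
use std::collections::BTreeSet;

/// Hypothetical cache for the submodules of one package. The listing is
/// recomputed only when the root revision differs from the one it was
/// computed at; otherwise the cached listing is returned unchanged.
pub struct SubmoduleCache {
    cached: Option<(u64, BTreeSet<String>)>,
    /// Number of directory scans actually performed.
    pub scans: usize,
}

impl SubmoduleCache {
    pub fn new() -> Self {
        SubmoduleCache { cached: None, scans: 0 }
    }

    /// `scan` stands in for the directory traversal we want to avoid.
    pub fn submodules<F>(&mut self, revision: u64, scan: F) -> BTreeSet<String>
    where
        F: FnOnce() -> BTreeSet<String>,
    {
        if let Some((rev, names)) = &self.cached {
            if *rev == revision {
                // Cache hit: no scan, even if the disk has changed since.
                return names.clone();
            }
        }
        self.scans += 1;
        let names = scan();
        self.cached = Some((revision, names.clone()));
        names
    }
}
```

If the watcher never reports the deletion, the revision stays the same and the stale listing (still containing wazoo) keeps being served until some later event bumps the revision.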

MichaReiser

This looks great. I left a few comments around the file watching test. I hope they help resolve the Windows issue.

The only thing that needs addressing before landing is that we properly handle the case where a user creates or deletes a ty.toml or pyproject.toml (see inline comment).

I also think it's worth considering making Module a salsa tracked ingredient to remove the () pseudo-argument hack, but that's something for a separate PR and only worth doing if the fallout isn't too big.

MichaReiser · 2025-07-18T11:32:16Z

crates/ty/tests/file_watching.rs

+    );
+
+    std::fs::write(case.project_path("bar/wazoo.py").as_std_path(), "")?;
+    let changes = case.stop_watch(|_: &ChangeEvent| true);


I assume we're waiting here on the event for bar/wazoo.py. If so, I suggest using event_for_file("bar/wazoo.py"). That makes the test less flaky because the runner explicitly waits for an event for this file (or a Rescan) rather than for any event.

Suggested change:

-    let changes = case.stop_watch(|_: &ChangeEvent| true);
+    let changes = case.stop_watch(event_for_file("wazoo.py"));

Ah okay, I didn't realize that providing a specific file path would help avoid flakes. That makes sense. I noticed the other tests did that, but wondered about it. I added some comments on stop_watch explaining this.

Thank you. I guess there are two benefits:

It prevents "stopping" if there was a spurious file watcher event

It gives better error messages because we can now say: We waited for an event for wazoo.py but we only saw ...
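Both benefits can be sketched with a plain `std::sync::mpsc` channel. This is a hypothetical, simplified model of the test harness: `ChangeEvent` here has only two variants, and `stop_watch` and `event_for_file` mirror the names used in the test but not their real signatures.

```rust
use std::path::PathBuf;
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Simplified watcher event (hypothetical; the real type has more variants).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ChangeEvent {
    Changed(PathBuf),
    Rescan,
}

/// Matches an event for a file with the given name, or a rescan
/// (which may cover any file).
pub fn event_for_file(name: &str) -> impl Fn(&ChangeEvent) -> bool {
    let name = name.to_owned();
    move |event: &ChangeEvent| match event {
        ChangeEvent::Changed(path) => path.file_name().map_or(false, |n| n == name.as_str()),
        ChangeEvent::Rescan => true,
    }
}

/// Collects events until one satisfies `matches` or the timeout elapses.
/// A spurious event for some other file does not end the wait early, and
/// on failure the error lists everything that was actually observed.
pub fn stop_watch<F>(
    rx: &Receiver<ChangeEvent>,
    timeout: Duration,
    matches: F,
) -> Result<Vec<ChangeEvent>, String>
where
    F: Fn(&ChangeEvent) -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut seen = Vec::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) => {
                let done = matches(&event);
                seen.push(event);
                if done {
                    return Ok(seen);
                }
            }
            // Timed out, or the sender is gone: report what we did see.
            Err(_) => return Err(format!("expected event not observed; saw {seen:?}")),
        }
    }
}
```

A catch-all predicate like `|_| true` would have stopped at the first event, even one for an unrelated file.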


MichaReiser · 2025-07-18T11:39:48Z

crates/ty_project/src/db/changes.rs

+                            if let Some(root) = self.files().root(self, &absolute) {
+                                root.set_revision(self).to(FileRevision::now());
+                            }


Isn't sync_recursively already updating all roots?

It looks like it doesn't. It only seems to sync roots that are beneath the given path, which is maybe not intended? Specifically, if this condition is flipped around:

ruff/crates/ruff_db/src/files.rs, line 235 in ba7ed3a:

if root.path(db).starts_with(&path) {

then I think it works as you describe. However, the intent of sync_recursively seems to be to sync only the files and directories beneath the given path, whereas here we want to sync the root for the given path, which sits above it.

Uhm, this looks like the wrong way round... It doesn't seem very useful to bump a root just because its parent directory changed. The idea of sync_recursively is to resync the entire subtree.

Ah okay. So for file roots you want to go up, but for the actual files you want to go down. So I think this part is correct:

ruff/crates/ruff_db/src/files.rs, lines 226 to 230 in ba7ed3a:

for entry in inner.system_by_path.iter_mut() {
    if entry.key().starts_with(&path) {
        File::sync_system_path(db, entry.key(), Some(*entry.value()));
    }
}

I can flip this condition around.
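A small sketch of the two prefix checks, using only `std::path::Path`. The function names are hypothetical; the point is the direction of `starts_with` in each case: files are synced going down from the changed path, roots are bumped going up from it.

```rust
use std::path::Path;

/// Files to resync after a change at `changed`: everything *beneath* it
/// (the same direction as `entry.key().starts_with(&path)`).
pub fn files_to_sync<'a>(files: &'a [&'a Path], changed: &Path) -> Vec<&'a Path> {
    files.iter().copied().filter(|f| f.starts_with(changed)).collect()
}

/// Roots to bump after a change at `changed`: every root *containing* it,
/// i.e. the root check flipped to `changed.starts_with(root)`.
pub fn roots_to_bump<'a>(roots: &'a [&'a Path], changed: &Path) -> Vec<&'a Path> {
    roots.iter().copied().filter(|r| changed.starts_with(*r)).collect()
}
```

Note that `Path::starts_with` compares whole components, so `/proj/pkg2` does not count as being under `/proj/pkg`.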

MichaReiser · 2025-07-18T11:43:15Z

crates/ty_project/src/db/changes.rs

-                        sync_path(self, &path);
+                        if synced_files.insert(path.to_path_buf()) {
+                            File::sync_path(self, &path);
+                        }


On line 284: We'll need to re-create the FileRoot. The case handled there is when a user deletes a pyproject.toml (or creates a new ty.toml). Doing so can change the root path of the project. You might want to move the FileRoot creation into the from_metadata method.

Good catch. I actually had this at an even higher level, but I agree that putting it in from_metadata is even better.

Adding a regression test for this seems a little tricky. I think it's because the project is created initially and a file root is added. And even when the Project is re-created, I guess by that point the file root has already been added so everything still works?

Anyway, this is done. And I did add a test.
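A simplified model of why registering the root in `from_metadata` covers the pyproject.toml/ty.toml case: every time the project is (re)built from its metadata, its root gets a matching FileRoot. `Files`, `ProjectMetadata`, `try_add_root` and the longest-prefix `root` lookup are assumptions for illustration, not ty's actual API.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileRootKind {
    Project,
    LibrarySearchPath,
}

/// Hypothetical registry of file roots.
#[derive(Default)]
pub struct Files {
    roots: BTreeMap<PathBuf, FileRootKind>,
}

impl Files {
    /// Idempotent: registering an already-known root is a no-op.
    pub fn try_add_root(&mut self, path: &Path, kind: FileRootKind) {
        self.roots.entry(path.to_path_buf()).or_insert(kind);
    }

    /// Returns the most specific (longest) registered root containing `path`.
    pub fn root(&self, path: &Path) -> Option<&Path> {
        self.roots
            .keys()
            .filter(|r| path.starts_with(r))
            .max_by_key(|r| r.components().count())
            .map(|r| r.as_path())
    }
}

pub struct ProjectMetadata {
    pub root: PathBuf,
}

pub struct Project {
    pub root: PathBuf,
}

impl Project {
    /// Registering the root here means any code path that re-creates the
    /// project (e.g. after a pyproject.toml is deleted) also gets a root.
    pub fn from_metadata(files: &mut Files, metadata: ProjectMetadata) -> Project {
        files.try_add_root(&metadata.root, FileRootKind::Project);
        Project { root: metadata.root }
    }
}
```

Because registration is idempotent, a root added for the original project keeps working after the project is re-created, which may be part of why a regression test is tricky to write.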

MichaReiser · 2025-07-18T11:54:42Z

crates/ty_python_semantic/src/module_resolver/module.rs

    inner: Arc<ModuleInner>,
 }

+#[salsa::tracked]


It's probably best to do this in a separate PR, but I think it could make sense to explore making Module a salsa::tracked struct and removing the inner Arc.

Whether to make something a salsa ingredient rather than, e.g., wrapping it in an Arc depends mostly on whether any salsa queries take the struct as an argument. Module uses an Arc because, up to now, no query took Module as an argument, so there was no need for it to be a salsa tracked struct. This changes with your PR. That's why I think it's worth giving it a short try to see how big the fallout is if we make it a salsa tracked struct.

I would probably design it like this:

#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct FileModule {
    #[returns(ref)]
    name: ModuleName,
    kind: ModuleKind,
    #[returns(ref)]
    search_path: SearchPath,
    file: File,
    known: Option<KnownModule>,
}

#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct NamespacePackage {
    #[returns(ref)]
    name: ModuleName,
}

#[derive(salsa::Supertype, Eq, PartialEq, Debug)]
pub(crate) enum Module {
    File(FileModule),
    Namespace(NamespacePackage),
}

Thank you! I'll give this a try in a follow-up.


We want to write queries that depend on `Module` for caching. While it seems this can be done without making `Module` an ingredient, doing so appears to be best practice (see #19408 (comment)).

BurntSushi requested review from AlexWaygood, MichaReiser, carljm, dcreager and sharkdp as code owners July 17, 2025 18:07

BurntSushi requested review from dhruvmanila and removed request for AlexWaygood, carljm, dcreager and sharkdp July 17, 2025 18:07

BurntSushi added the internal (An internal refactor or improvement), server (Related to the LSP server) and ty (Multi-file analysis & type inference) labels Jul 17, 2025

BurntSushi force-pushed the ag/submodule-caching branch from fdd0de7 to 8371355 July 17, 2025 18:25

MichaReiser approved these changes Jul 18, 2025


BurntSushi force-pushed the ag/submodule-caching branch from 8371355 to 615ad55 July 18, 2025 14:42

BurntSushi force-pushed the ag/submodule-caching branch 3 times, most recently from 5f7b543 to ddae85e July 18, 2025 15:18

BurntSushi merged commit 64f9481 into main Jul 18, 2025
37 checks passed

BurntSushi deleted the ag/submodule-caching branch July 18, 2025 15:54

BurntSushi mentioned this pull request Jul 22, 2025

[ty] Make Module a Salsa ingredient #19495 (Merged)


3 participants
