Skip to content

Conversation

@BurntSushi
Copy link
Member

This change makes it so we aren't doing a directory traversal every time
we ask for completions from a module. Specifically, submodules that
aren't attributes of their parent module can only be discovered by
looking at the directory tree. But we want to avoid doing a directory
scan unless we think there are changes.

To make this work, this change does a little bit of surgery to
FileRoot. Previously, a FileRoot was only used for library search
paths. Its revision was bumped whenever a file in that tree was added,
deleted or even modified (to support the discovery of pth files and
changes to its contents). This generally seems fine since these are
presumably dependency paths that shouldn't change frequently.

In this change, we add a FileRoot for the project. But having the
FileRoot's revision bumped for every change in the project makes
caching based on that FileRoot rather ineffective. That is, cache
invalidation will occur too aggressively. To the point that there is
little point in adding caching in the first place. To mitigate this, a
FileRoot's revision is only bumped on a change to a child file's
contents when the FileRoot is a LibrarySearchPath. Otherwise, we
only bump the revision when a file is created or added.

The effect is that, at least in VS Code, when a new module is added or
removed, this change is picked up and the cache is properly invalidated.
Other LSP clients with worse support for file watching (which seems to
be the case for the CoC vim plugin that I use) don't work as well. Here,
the cache is less likely to be invalidated which might cause completions
to have stale results. Unless there's an obvious way to fix or improve
this, I propose punting on improvements here for now.

@BurntSushi BurntSushi requested review from dhruvmanila and removed request for AlexWaygood, carljm, dcreager and sharkdp July 17, 2025 18:07
@BurntSushi BurntSushi added internal An internal refactor or improvement server Related to the LSP server ty Multi-file analysis & type inference labels Jul 17, 2025
@BurntSushi
Copy link
Member Author

Demo (although I guess this is technically indistinguishable from what's on main):

completions-vscode-submodule-caching.mp4

@github-actions
Copy link
Contributor

github-actions bot commented Jul 17, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

@BurntSushi BurntSushi force-pushed the ag/submodule-caching branch from fdd0de7 to 8371355 Compare July 17, 2025 18:25
@BurntSushi
Copy link
Member Author

The test failure on Windows looks interesting. Maybe the file watcher isn't picking up the deleted file? If it doesn't, then the cache doesn't get invalidated and you end up with wazoo in the completions (which should be gone).

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I left a few comments around the file watching test. I hope it helps resolve the windows issue.

The only thing that needs addressing before landing is that we properly handle the case where a user creates or deletes a new ty.toml or pyproject.toml (see inline comment).

I also think it's worth considering making Module a salsa tracked ingredient to remove the () pseudo argument hack but that's something for a separate PR and only worth doing if the fallout isn't too big

);

std::fs::write(case.project_path("bar/wazoo.py").as_std_path(), "")?;
let changes = case.stop_watch(|_: &ChangeEvent| true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we're waiting here on the event for bar/wazoo.py. If so, I suggest using event_for_file("bar/wazoo.py"). That makes the test less flaky because the runner explicitly waits for an event for this file (or Rescan) rather than any event

Suggested change
let changes = case.stop_watch(|_: &ChangeEvent| true);
let changes = case.stop_watch(event_for_file("wazoo.py"));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, I didn't realize that providing a specific file path would help avoid flakes. That makes sense. I noticed the other tests did that, but wondered about it. I added some comments on stop_watch explaining this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I guess there are two benefits:

  • It prevents "stopping" if there was a spurious file watcher event
  • It gives better error messages because we can now say: We waited for an event for wazoo.py but we only saw ...

Comment on lines 136 to 138
if let Some(root) = self.files().root(self, &absolute) {
root.set_revision(self).to(FileRevision::now());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't sync_recursively already updating all roots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it doesn't. It only seems to sync roots that are beneath the path given. Which is maybe not intended? Specifically, if this is flipped around

if root.path(db).starts_with(&path) {

then I think it works how you describe. But it seems like the intent of sync_recursively is only to sync files/directories beneath the path given. But here, we want to sync the root for the path given, which should be above it.

Copy link
Member

@MichaReiser MichaReiser Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, this looks the wrong way round... It doesn't seem very useful to bump a root if its parent directory changed? The idea of sync_recursively is to resync the entire subtree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. So for file roots you want to go up, but for the actual files you want to go down. So I think this part is correct:

for entry in inner.system_by_path.iter_mut() {
if entry.key().starts_with(&path) {
File::sync_system_path(db, entry.key(), Some(*entry.value()));
}
}

I can flip this condition around.

sync_path(self, &path);
if synced_files.insert(path.to_path_buf()) {
File::sync_path(self, &path);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 284: We'll need to re-create the FileRoot. The case handled there is if a user deletes a pyproject.toml (or creates a new ty.toml). Doing so can change the root path of the project. You might want to move the FileRoot creation into the from_metadata method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I actually had this at an even higher level, but I agree that putting it in from_metadata is even better.

Adding a regression test for this seems a little tricky. I think it's because the project is created initially and a file root is added. And even when the Project is re-created, I guess by that point the file root has already been added so everything still works?

Anyway, this is done. And I did add a test.

inner: Arc<ModuleInner>,
}

#[salsa::tracked]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably best to do this in a separate PR but I think it could make sense to explore making Module a salsa::tracked and removing the inner Arc.

The decision for making something a salsa ingredient vs e.g. using an Arc is mostly on whether there are any salsa queries that take the struct as an argument. Module is an Arc because, up to now, no query took Module as an argument so it was simply unnecessary for it to be a sala tracked struct. This changes with your PR. That's why I think it's worth to give it a short try to see how big the fallout is if we make it a salsa::struct.

I would probably design it like this:

#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct FileModule {
	#[returns(ref)]
	name: ModuleName, 
	kind: ModuleKind,
	#[returns(ref)]
  search_path: SearchPath,
  file: File,
  known: Option<KnownModule>,
}

#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct NamespacePackage {
	#[returns(ref)]
	name: ModuleName
}

#[derive(salsa::Supertype, Eq, PartialEq, Debug)]
pub(crate) enum Module {
	File(FileModule),
	Namespace(NamespacePackage)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll give this a try in a follow-up.

@BurntSushi BurntSushi force-pushed the ag/submodule-caching branch from 8371355 to 615ad55 Compare July 18, 2025 14:42
This change makes it so we aren't doing a directory traversal every time
we ask for completions from a module. Specifically, submodules that
aren't attributes of their parent module can only be discovered by
looking at the directory tree. But we want to avoid doing a directory
scan unless we think there are changes.

To make this work, this change does a little bit of surgery to
`FileRoot`. Previously, a `FileRoot` was only used for library search
paths. Its revision was bumped whenever a file in that tree was added,
deleted or even modified (to support the discovery of `pth` files and
changes to its contents). This generally seems fine since these are
presumably dependency paths that shouldn't change frequently.

In this change, we add a `FileRoot` for the project. But having the
`FileRoot`'s revision bumped for every change in the project makes
caching based on that `FileRoot` rather ineffective. That is, cache
invalidation will occur too aggressively. To the point that there is
little point in adding caching in the first place. To mitigate this, a
`FileRoot`'s revision is only bumped on a change to a child file's
contents when the `FileRoot` is a `LibrarySearchPath`. Otherwise, we
only bump the revision when a file is created or added.

The effect is that, at least in VS Code, when a new module is added or
removed, this change is picked up and the cache is properly invalidated.
Other LSP clients with worse support for file watching (which seems to
be the case for the CoC vim plugin that I use) don't work as well. Here,
the cache is less likely to be invalidated which might cause completions
to have stale results. Unless there's an obvious way to fix or improve
this, I propose punting on improvements here for now.
@BurntSushi BurntSushi force-pushed the ag/submodule-caching branch 3 times, most recently from 5f7b543 to ddae85e Compare July 18, 2025 15:18
@BurntSushi BurntSushi merged commit 64f9481 into main Jul 18, 2025
37 checks passed
@BurntSushi BurntSushi deleted the ag/submodule-caching branch July 18, 2025 15:54
BurntSushi added a commit that referenced this pull request Jul 22, 2025
We want to write queries that depend on `Module` for caching. While it
seems it can be done without making `Module` an ingredient, it seems it
is best practice to do so.

[best practice to do so]: #19408 (comment)
BurntSushi added a commit that referenced this pull request Jul 23, 2025
We want to write queries that depend on `Module` for caching. While it
seems it can be done without making `Module` an ingredient, it seems it
is best practice to do so.

[best practice to do so]: #19408 (comment)
BurntSushi added a commit that referenced this pull request Jul 23, 2025
We want to write queries that depend on `Module` for caching. While it
seems it can be done without making `Module` an ingredient, it seems it
is best practice to do so.

[best practice to do so]: #19408 (comment)
AlexWaygood pushed a commit that referenced this pull request Jul 25, 2025
We want to write queries that depend on `Module` for caching. While it
seems it can be done without making `Module` an ingredient, it seems it
is best practice to do so.

[best practice to do so]: #19408 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement server Related to the LSP server ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants