-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ty] Add caching for submodule completion suggestions #19408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Demo (although I guess this is technically indistinguishable from what's on completions-vscode-submodule-caching.mp4 |
|
fdd0de7 to
8371355
Compare
|
The test failure on Windows looks interesting. Maybe the file watcher isn't picking up the deleted file? If it doesn't, then the cache doesn't get invalidated and you end up with |
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I left a few comments around the file watching test. I hope it helps resolve the windows issue.
The only thing that needs addressing before landing is that we properly handle the case where a user creates or deletes a new ty.toml or pyproject.toml (see inline comment).
I also think it's worth considering making Module a salsa tracked ingredient to remove the () pseudo argument hack but that's something for a separate PR and only worth doing if the fallout isn't too big
crates/ty/tests/file_watching.rs
Outdated
| ); | ||
|
|
||
| std::fs::write(case.project_path("bar/wazoo.py").as_std_path(), "")?; | ||
| let changes = case.stop_watch(|_: &ChangeEvent| true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we're waiting here on the event for bar/wazoo.py. If so, I suggest using event_for_file("bar/wazoo.py"). That makes the test less flaky because the runner explicitly waits for an event for this file (or Rescan) rather than any event
| let changes = case.stop_watch(|_: &ChangeEvent| true); | |
| let changes = case.stop_watch(event_for_file("wazoo.py")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, I didn't realize that providing a specific file path would help avoid flakes. That makes sense. I noticed the other tests did that, but wondered about it. I added some comments on stop_watch explaining this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I guess there are two benefits:
- It prevents "stopping" if there was a spurious file watcher event
- It gives better error messages because we can now say: We waited for an event for
wazoo.pybut we only saw ...
crates/ty_project/src/db/changes.rs
Outdated
| if let Some(root) = self.files().root(self, &absolute) { | ||
| root.set_revision(self).to(FileRevision::now()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't sync_recursively already updating all roots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it doesn't. It only seems to sync roots that are beneath the path given. Which is maybe not intended? Specifically, if this is flipped around
ruff/crates/ruff_db/src/files.rs
Line 235 in ba7ed3a
| if root.path(db).starts_with(&path) { |
then I think it works how you describe. But it seems like the intent of sync_recursively is only to sync files/directories beneath the path given. But here, we want to sync the root for the path given, which should be above it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, this looks the wrong way round... It doesn't seem very useful to bump a root if its parent directory changed? The idea of sync_recursively is to resync the entire subtree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay. So for file roots you want to go up, but for the actual files you want to go down. So I think this part is correct:
ruff/crates/ruff_db/src/files.rs
Lines 226 to 230 in ba7ed3a
| for entry in inner.system_by_path.iter_mut() { | |
| if entry.key().starts_with(&path) { | |
| File::sync_system_path(db, entry.key(), Some(*entry.value())); | |
| } | |
| } |
I can flip this condition around.
| sync_path(self, &path); | ||
| if synced_files.insert(path.to_path_buf()) { | ||
| File::sync_path(self, &path); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 284: We'll need to re-create the FileRoot. The case handled there is if a user deletes a pyproject.toml (or creates a new ty.toml). Doing so can change the root path of the project. You might want to move the FileRoot creation into the from_metadata method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I actually had this at an even higher level, but I agree that putting it in from_metadata is even better.
Adding a regression test for this seems a little tricky. I think it's because the project is created initially and a file root is added. And even when the Project is re-created, I guess by that point the file root has already been added so everything still works?
Anyway, this is done. And I did add a test.
| inner: Arc<ModuleInner>, | ||
| } | ||
|
|
||
| #[salsa::tracked] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably best to do this in a separate PR but I think it could make sense to explore making Module a salsa::tracked and removing the inner Arc.
The decision for making something a salsa ingredient vs e.g. using an Arc is mostly on whether there are any salsa queries that take the struct as an argument. Module is an Arc because, up to now, no query took Module as an argument so it was simply unnecessary for it to be a sala tracked struct. This changes with your PR. That's why I think it's worth to give it a short try to see how big the fallout is if we make it a salsa::struct.
I would probably design it like this:
#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct FileModule {
#[returns(ref)]
name: ModuleName,
kind: ModuleKind,
#[returns(ref)]
search_path: SearchPath,
file: File,
known: Option<KnownModule>,
}
#[salsa::tracked(debug)]
#[derive(PartialEq, Eq)]
pub(crate) struct NamespacePackage {
#[returns(ref)]
name: ModuleName
}
#[derive(salsa::Supertype, Eq, PartialEq, Debug)]
pub(crate) enum Module {
File(FileModule),
Namespace(NamespacePackage)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll give this a try in a follow-up.
8371355 to
615ad55
Compare
This change makes it so we aren't doing a directory traversal every time we ask for completions from a module. Specifically, submodules that aren't attributes of their parent module can only be discovered by looking at the directory tree. But we want to avoid doing a directory scan unless we think there are changes. To make this work, this change does a little bit of surgery to `FileRoot`. Previously, a `FileRoot` was only used for library search paths. Its revision was bumped whenever a file in that tree was added, deleted or even modified (to support the discovery of `pth` files and changes to its contents). This generally seems fine since these are presumably dependency paths that shouldn't change frequently. In this change, we add a `FileRoot` for the project. But having the `FileRoot`'s revision bumped for every change in the project makes caching based on that `FileRoot` rather ineffective. That is, cache invalidation will occur too aggressively. To the point that there is little point in adding caching in the first place. To mitigate this, a `FileRoot`'s revision is only bumped on a change to a child file's contents when the `FileRoot` is a `LibrarySearchPath`. Otherwise, we only bump the revision when a file is created or added. The effect is that, at least in VS Code, when a new module is added or removed, this change is picked up and the cache is properly invalidated. Other LSP clients with worse support for file watching (which seems to be the case for the CoC vim plugin that I use) don't work as well. Here, the cache is less likely to be invalidated which might cause completions to have stale results. Unless there's an obvious way to fix or improve this, I propose punting on improvements here for now.
5f7b543 to
ddae85e
Compare
We want to write queries that depend on `Module` for caching. While it seems it can be done without making `Module` an ingredient, it seems it is best practice to do so. [best practice to do so]: #19408 (comment)
We want to write queries that depend on `Module` for caching. While it seems it can be done without making `Module` an ingredient, it seems it is best practice to do so. [best practice to do so]: #19408 (comment)
We want to write queries that depend on `Module` for caching. While it seems it can be done without making `Module` an ingredient, it seems it is best practice to do so. [best practice to do so]: #19408 (comment)
We want to write queries that depend on `Module` for caching. While it seems it can be done without making `Module` an ingredient, it seems it is best practice to do so. [best practice to do so]: #19408 (comment)
This change makes it so we aren't doing a directory traversal every time
we ask for completions from a module. Specifically, submodules that
aren't attributes of their parent module can only be discovered by
looking at the directory tree. But we want to avoid doing a directory
scan unless we think there are changes.
To make this work, this change does a little bit of surgery to
FileRoot. Previously, aFileRootwas only used for library searchpaths. Its revision was bumped whenever a file in that tree was added,
deleted or even modified (to support the discovery of
pthfiles andchanges to its contents). This generally seems fine since these are
presumably dependency paths that shouldn't change frequently.
In this change, we add a
FileRootfor the project. But having theFileRoot's revision bumped for every change in the project makescaching based on that
FileRootrather ineffective. That is, cacheinvalidation will occur too aggressively. To the point that there is
little point in adding caching in the first place. To mitigate this, a
FileRoot's revision is only bumped on a change to a child file'scontents when the
FileRootis aLibrarySearchPath. Otherwise, weonly bump the revision when a file is created or added.
The effect is that, at least in VS Code, when a new module is added or
removed, this change is picked up and the cache is properly invalidated.
Other LSP clients with worse support for file watching (which seems to
be the case for the CoC vim plugin that I use) don't work as well. Here,
the cache is less likely to be invalidated which might cause completions
to have stale results. Unless there's an obvious way to fix or improve
this, I propose punting on improvements here for now.