-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Fix cargo locate-project --workspace performance issue #16423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
af4174a to
e6bd4cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also want to add some extra failing cases:
- Run from a nested directory of a package. The package specifies
package.workspaceto the other workspace but the workspace in its parent directory - Run from a nested directory of a package. The outer workspace manifest doesn't have that package as a member.
The current implementation may forget to take this into account #15107 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
e6bd4cb to
6667ca2
Compare
tests/testsuite/locate_project.rs
Outdated
| name = "nested" | ||
| version = "0.0.0" | ||
| [package.workspace] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test setup is wrong.
package.workspace should be a valid workspace path
https://doc.rust-lang.org/cargo/reference/manifest.html#the-workspace-field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in what way?
I expected that point to a valid workspace rootz and that affects what this command returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m sorry if I’m missing something, but I don’t understand why this is not considered valid.
nested/Cargo.toml has workspace = "..", which resolves to the parent directory.
The parent Cargo.toml contains a [workspace] section with members = ["nested"], which is a valid workspace root.
Therefore, workspace = ".." points to a valid workspace path according to the documentation.
Is there something I’m misunderstanding?
In this test:
workspace = ".." points to the parent directory
The parent workspace lists ["nested"] in members, which includes this package
Given this, the cargo metadata command returns [ROOT]/foo/Cargo.toml (the parent workspace root).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reading the wrong diff. Sorry.
In that comment I was looking for failure cases, where package.workspace points to some other workspaces, not the parent one. With that we can validate when workspace manifest and package manifest disagree with each other, and should return the workspace the package manifest belongs to.
This test itself is nice to have to validate the package.workspace working, but not the case I was looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, by failing I didn't meant they really need to fail. They are edge cases where Cargo might want to point to the right workspace rather than the wrong workspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these tests, plus one more to validate cases like nested/*. If the tests and solution look good, I’ll add them along with the commits made before the fix, as usual.
| .with_stdout_data( | ||
| str![[r#" | ||
| { | ||
| "root": "[ROOT]/foo/Cargo.toml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this point to the non-member package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With find_workspace_root, it returns the parent workspace root, even though the package is not a member
before change it should fails with error "current package believes it's in a workspace when it's not"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With find_workspace_root, it returns the parent workspace root, even though the package is not a member
Is this the desired behavior?
Please see #15107 (comment) that we still want a minimal verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f04c198 to
f928e9e
Compare
tests/testsuite/locate_project.rs
Outdated
| .with_stdout_data( | ||
| str![[r#" | ||
| { | ||
| "root": "[ROOT]/foo/inner-workspace/Cargo.toml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't make much sense to me. The inner-workspace/pkg/Cargo.toml will always point to the inner-workspace/Cargo.toml because it is the immediate parent directory that contain a workspace manifest, regardless of having a outer workspace manifest or not.
To avoid the parent directory probiding in your code path, you might want to change workspace = ".." to point to a workspace manifest that is in a sibiling directory of the package manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/cargo/core/workspace.rs
Outdated
| } | ||
| match self.members { | ||
| Some(ref members) => members.iter().any(|mem| { | ||
| if mem.contains('*') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the correct way to handle glob syntax, see this function for the right approach (and we might want reuse it or refactor to reuse)
cargo/src/cargo/core/workspace.rs
Lines 2066 to 2101 in 1f5feec
| /// Returns expanded paths along with the glob that they were expanded from. | |
| /// The glob is `None` if the path matched exactly. | |
| #[tracing::instrument(skip_all)] | |
| fn members_paths<'g>( | |
| &self, | |
| globs: &'g [String], | |
| ) -> CargoResult<Vec<(PathBuf, Option<&'g str>)>> { | |
| let mut expanded_list = Vec::new(); | |
| for glob in globs { | |
| let pathbuf = self.root_dir.join(glob); | |
| let expanded_paths = Self::expand_member_path(&pathbuf)?; | |
| // If glob does not find any valid paths, then put the original | |
| // path in the expanded list to maintain backwards compatibility. | |
| if expanded_paths.is_empty() { | |
| expanded_list.push((pathbuf, None)); | |
| } else { | |
| let used_glob_pattern = expanded_paths.len() > 1 || expanded_paths[0] != pathbuf; | |
| let glob = used_glob_pattern.then_some(glob.as_str()); | |
| // Some OS can create system support files anywhere. | |
| // (e.g. macOS creates `.DS_Store` file if you visit a directory using Finder.) | |
| // Such files can be reported as a member path unexpectedly. | |
| // Check and filter out non-directory paths to prevent pushing such accidental unwanted path | |
| // as a member. | |
| for expanded_path in expanded_paths { | |
| if expanded_path.is_dir() { | |
| expanded_list.push((expanded_path, glob)); | |
| } | |
| } | |
| } | |
| } | |
| Ok(expanded_list) | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/cargo/core/workspace.rs
Outdated
| /// - The path is the workspace root manifest itself, or | ||
| /// - No explicit `members` list is specified (implicit membership), or | ||
| /// - The path matches one of the `members` patterns | ||
| fn is_path_member(&self, manifest_path: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing an important check: path dependencies.
@epage what is your though on this, as it may provide a false result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has there been any update or thoughts regarding the path dependencies check ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Briefly discussed this with epage. We would lean towards not having Cargo produce false results.
Path dependencies are implicit workspaces dependency if reside in the workspace directory (see the doc for complete definition). To determine whether a package is a path dependencies of any workspace, we need to load the manifest file, which undermines the performance improvement we want here for #15107
Ed has proposed a solution: If a package is explicitly listed as a member (in workspace.members), it is fine to skip loading manifest as Cargo knows it belongs to the workspace,. For package that is unknown whether it belongs to any workspace in its ancestor directories, Cargo needs to fallback to the all way (load everything) to deteremine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that and tested it to check the dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one downside to this change is the error reporting will be conditioned on how the manifest is loaded.
Not saying we shouldn't do my idea but we need to be cognizant of that downside when deciding to move forward with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I’ve applied the discussed changes and tested them.
Let me know if there’s anything else you’d like me to update.
f928e9e to
ae37cf6
Compare
ae37cf6 to
e051ddc
Compare
| // If loading fails (e.g., package is not a member of any workspace), | ||
| // treat the package as its own workspace root. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the fallback for loading failure? This sounds like a behavior change than an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think it should be like the old one:
workspace = args.workspace(gctx)?;
workspace.root_manifest()This way, we keep the fast path and the slow path for full validation, following the old approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/testsuite/locate_project.rs
Outdated
| "sibling-workspace/Cargo.toml", | ||
| r#" | ||
| [workspace] | ||
| members = ["../pkg"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is redundant, no? We already have package.workspace = "../sibling-workspace". What do we want to test here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could we add these test before the fix like other tests, so the test snapshot diff will show the behavior change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to add all the tests in one commit before the fix commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I forgot to remove it before. I’ll removed it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/testsuite/locate_project.rs
Outdated
| p.cargo("locate-project --workspace") | ||
| .cwd("not-member") | ||
| .with_stderr_data(str![[r#" | ||
| [ERROR] current package believes it's in a workspace when it's not: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid this error, we should make not-member a workspace, which requires putting [workspace] in the manifest`. I think we can have both tests:
- failed with this error
- succeeded if in subdirector but having
[workspace].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/testsuite/locate_project.rs
Outdated
| .build(); | ||
|
|
||
| p.cargo("locate-project --workspace") | ||
| .cwd("not-member/src") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference between this and workspace_not_a_member is that the cwd is in not-member/src for? Why testing this? And if we need it, we may want to just adding another p.cargo("locate-project --workspace") call in workspace_not_a_member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test that it doesn't get confused by being one level deeper and still correctly identifies.
I`ll add it to workspace_not_a_member and remove this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| "Cargo.toml", | ||
| r#" | ||
| [workspace] | ||
| members = ["nested"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have package.workspace = ".." already in nested/Cargo.toml, do we still need workspace.members in the root Cargo.toml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/cargo/core/workspace.rs
Outdated
| /// package and workspace agree on each other: | ||
| /// - If the package has an explicit `package.workspace` pointer, it is trusted | ||
| /// - Otherwise, the workspace must include the package in its `members` list | ||
| pub fn find_workspace_root_with_metadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reimplementing most of the probing logic, I guess it might be possible to reuse find_workspace_root_with_loader with a provided loader?
It might look like
pub fn find_workspace_root_with_metadata(
…
) -> CargoResult<Option<PathBuf>> {
find_workspace_root_with_loader(manifest_path, gctx, |self_path| {
let source_id = SourceId::for_manifest_path(self_path)?;
let manifest = read_manifest(self_path, source_id, gctx)?;
match manifest.workspace_config() {
// check explicit membership...
}
})
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/cargo/core/workspace.rs
Outdated
| /// package and workspace agree on each other: | ||
| /// - If the package has an explicit `package.workspace` pointer, it is trusted | ||
| /// - Otherwise, the workspace must include the package in its `members` list | ||
| pub fn find_workspace_root_with_metadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name find_workspace_root_with_metadata is not ideal. with_metadata is less meaningful to API user. find_workspace_root_with_membership_check might be a better name for the exposed API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
e051ddc to
7990081
Compare
tests/testsuite/locate_project.rs
Outdated
| #[cargo_test] | ||
| fn workspace_nested_with_explicit_pointer() { | ||
| let p = project() | ||
| .file("Cargo.toml", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workspace manifest is not a valid manifest. I would assume this to fail. Having an explicit pointer in member manifest doesnt mean workspace manifest agrees with the membership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll do it, but what do you mean by fail?
I mean, the test has already passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This #16423 (comment).
Invalid workspace manifest should fail the invocation of cargo locate-project but I believed it wasn't the intent of this test.
tests/testsuite/locate_project.rs
Outdated
| .with_status(101) | ||
| .with_stderr_data(str![[r#" | ||
| [ERROR] current package believes it's in a workspace when it's not: | ||
| ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we omitting here? I feel like it is fine snapshotting the entire stderr, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/testsuite/locate_project.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a valid workspace manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you want to mention here, since the code isn’t shown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I meant .file("sibling-workspace/Cargo.toml", ""). If the workspace manifest isn't valid, the cargo locate-project should fail.
| WorkspaceConfig::Member { | ||
| root: Some(path_to_root), | ||
| } => { | ||
| // Has explicit `package.workspace` pointer - trust it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We should not. We should check if the workspace manifest agrees with it.
So basically we'll always have to read at least two manifests for inspecting a member manifest.
| }; | ||
| expanded_members | ||
| .iter() | ||
| .any(|(member_path, _)| manifest_path.starts_with(member_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of starts_with we probably want to compare the path explicitly. Something like
| .any(|(member_path, _)| manifest_path.starts_with(member_path)) | |
| .any(|(member_path, _)| manifest_path.parent() == Some(member_path)) |
Also, you might want to check whether the member_path is not normalized (e.g., ../crates/* expand to ../crates/foo/Cargo.toml without removing ..). There are a couple of paths::normalize_path call in the module I am aware of.
| /// A `false` return does NOT mean the package is definitely not a member - | ||
| /// it could still be a member via path dependencies. Callers should fallback | ||
| /// to full workspace loading when this returns `false`. | ||
| fn is_explicitly_listed_member(&self, manifest_path: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cargo/src/cargo/core/workspace.rs
Lines 841 to 846 in 3861f60
| // default-members are allowed to be excluded, but they | |
| // still must be referred to by the original (unfiltered) | |
| // members list. Note that we aren't testing against the | |
| // manifest path, both because `members_paths` doesn't | |
| // include `/Cargo.toml`, and because excluded paths may not | |
| // be crates. |
I found this interesting comment. Wonder what would happen with the current implementation in these situations:
- A member is listed in
workspace.default-membersbut not inworkspace.members. - A member is listed in both
workspace.default-membersandworkspace.exclude - A member is listed in both
workspace.membersandworkspace.exclude
The new implementation should be aligned with the default args.workspace() behavior IMO.
7990081 to
2f54485
Compare
Fix cargo locate-project --workspace performance issue
cargo locate-project --workspace was unnecessarily loading and validating all workspace members, causing slowdowns in large workspaces (>1s for 1500+ members).
Changed to use find_workspace_root() instead of args.workspace(), which only parses the minimal manifests needed to locate the workspace root path without loading all members.
Fixes #15107