Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(package): check dirtiness of symlinks source files #14981

Merged
merged 6 commits into from
Dec 31, 2024

Conversation

weihanglo
Copy link
Member

@weihanglo weihanglo commented Dec 24, 2024

What does this PR try to resolve?

This adds a special case for checking source files are symlinks
and have been modified when under a VCS control

This is required because those paths may link to a file outside the
current package root, but still under the git workdir, affecting the
final packaged .crate file.

How should we test and review this PR?

Pretty similar to #14966, as a part of #14967.

This may have potential performance issue. If a package contains thousands of symlinks, Cargo will fire git status for each of them. Not sure if we want to do anything proactively now.

The introduction of the PathEntry struct gives us more room for storing file metadata to satisfiy use cases in the future. For instances,

@rustbot
Copy link
Collaborator

rustbot commented Dec 24, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added Command-package Command-vendor S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 24, 2024
// TODO: Should we warn users if there are like thousands of symlinks,
// which may hurt the performance?
if repo.status_file(&rel_path)? != git2::Status::CURRENT {
dirty_symlinks.insert(workdir.join(rel_path));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially cache git status for each repo then. That leaves to future improvement if this one has got merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #14985.

github-merge-queue bot pushed a commit that referenced this pull request Dec 25, 2024
### What does this PR try to resolve?

Do nothing but move code around

* `cargo_package.rs` -> `cargo_package/mod.rs`
* Extract vcs info helpers to `cargo_package/vcs.rs`
* Extract verification helpers to `cargo_package/verify.rs`

Doing so because I realized how big `cargo_package.rs` has grown.
Like, the vcs helpers will continue growing with #14981 and potentially
other optimizations.

### How should we test and review this PR?

All tests pass.
@rustbot rustbot added the A-git Area: anything dealing with git label Dec 25, 2024
weihanglo added a commit to weihanglo/cargo that referenced this pull request Dec 26, 2024
It may not be worthy for the majority.
Single file `git status` is generally fast.

On some slow file systems or lower-end machines,
skipping file system access might be helpful.

However, slow file system access usually happen on Window.
And we'll only have large amount of `git status` when
rust-lang#14981 merges and the repo contains lots of symlinks.
Given symlink handling story is crazy in real world Windows,
I doubt anybody will hit the performance issue without this.
@weihanglo weihanglo marked this pull request as draft December 27, 2024 02:20
@weihanglo

This comment was marked as resolved.

@weihanglo weihanglo marked this pull request as ready for review December 29, 2024 16:10
@weihanglo
Copy link
Member Author

This doesn't handle symlink directories correctly with git. Fixing…

This has been fixed.

src/cargo/sources/path.rs Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
src/cargo/sources/path.rs Outdated Show resolved Hide resolved
@ehuss ehuss assigned epage and unassigned ehuss Dec 31, 2024
@weihanglo weihanglo force-pushed the package-symlink branch 2 times, most recently from 73a57f4 to 95bb4ed Compare December 31, 2024 20:48
github-merge-queue bot pushed a commit that referenced this pull request Dec 31, 2024
### What does this PR try to resolve?

`cargo package` will warn users when git `core.symlinks` is `false`
and some symlinks were checked out as plain files during packaging.

Git config [`core.symlinks`] defaults to true when unset.
In git-for-windows (and git as well),
the config should be set to false explicitly when the repo was created,
if symlink support wasn't detected [^1].

We assume the config was always set at creation time and never changed.
So, if it is true, we don't bother users with any warning.

[^1]:
https://github.com/git-for-windows/git/blob/f1241afcc7956918d5da33ef74abd9cbba369247/setup.c#L2394-L2403

[`core.symlinks`]:
https://git-scm.com/docs/git-config#Documentation/git-config.txt-coresymlinks

### How should we test and review this PR?

CI passes.

This shares two commits 42dc4ef and
c8c8223 with #14981.

I didn't commit to fix all symlink issues all at once.
This PR demonstrates how we could leverage metadata in `PathEntry`.
Maybe later we can really follow plain-text symlinks and resolve the
issue.

### Additional information

cc #5664
So `path` and `base` are able to accept different types
`abs_path` and `workdir.join(rel_path)` are the same at that point.
@rustbot

This comment has been minimized.

This is helpful for VCS status check.
Paths emitted by PathSource are always under package root,
We lose the track of file type info of paths under symlink dirs,
so we need this extra bit of information.
This show that a regular file under a symlink directory
is also not tarcked by the current vcs dirtiness check.
This adds a special case for checking source files are symlinks
and have been modified when under a VCS control

This is required because those paths may link to a file outside the
current package root, but still under the git workdir, affecting the
final packaged `.crate` file.

This may have potential performance issue. If a package contains
thousands of symlinks, Cargo will fire `git status` for each of them.
metdata path fields may point to a dirty symlilnk and cause
duplicate report. This commit combines `dirty_metadata_paths`
and `dirty_symlinks` into one so is able to de-duplicate them.
weihanglo added a commit to weihanglo/cargo that referenced this pull request Dec 31, 2024
It may not be worthy for the majority.
Single file `git status` is generally fast.

On some slow file systems or lower-end machines,
skipping file system access might be helpful.

However, slow file system access usually happen on Window.
And we'll only have large amount of `git status` when
rust-lang#14981 merges and the repo contains lots of symlinks.
Given symlink handling story is crazy in real world Windows,
I doubt anybody will hit the performance issue without this.
@epage epage added this pull request to the merge queue Dec 31, 2024
Merged via the queue into rust-lang:master with commit d85d761 Dec 31, 2024
20 checks passed
@weihanglo weihanglo deleted the package-symlink branch December 31, 2024 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git Command-package Command-vendor S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants