Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent .gitignore handling of symlinks in cargo package #10032

Closed
jonhoo opened this issue Nov 4, 2021 · 5 comments
Closed

Inconsistent .gitignore handling of symlinks in cargo package #10032

jonhoo opened this issue Nov 4, 2021 · 5 comments
Labels
A-git Area: anything dealing with git C-bug Category: bug

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Nov 4, 2021

Problem

cargo package does not handle .gitignore rules the same as git for symlinks. Specifically, in .gitignore, /symlink will match a symlink whether even if it is a symlink to a directory, whereas Cargo will not match symlink in that case, causing it to fail to ignore the symlink and its target.

Similarly, /symlink/ in .gitignore will ignore the files under symlink in git, but not the symlink itself, whereas Cargo just dereferences the symlink and includes the referent directory contents, and essentially ends up ignoring the /symlink/ rule.

Furthermore, Cargo's behavior changes if --allow-dirty is passed. See below.

Steps

$ cargo new cargo-symlink-ignore
$ cd cargo-symlink-ignore
$ ln -s src/ src1
$ echo '/src1' >> .gitignore
$ ln -s src/ src2
$ echo '/src2/' >> .gitignore

Now, cargo --allow-dirty:

$ cargo package -l --allow-dirty | grep -v Cargo
src/main.rs
src1/main.rs
src2/main.rs

Notice that the files under both symlinks are included, and no actual symlinks are included.

Now, commit and try again without --allow-dirty:

$ git add .
$ git commit -m "x"
$ cargo package -l | grep -v Cargo
.cargo_vcs_info.json
.gitignore
src/main.rs
src2/main.rs

Ignoring the .cargo_vcs_info.json file, and the fact that .gitignore is now included(?), notice that this now (correctly) does not include src1 or the contents of its referent directory. But it still includes src2 as a directory even though /src2/ was specified as an ignore.

Constrast that with git ls-files:

$ git ls-files
.gitignore
Cargo.toml
src/main.rs
src2

Which represents the "true" behavior: src1 is not included (it matches /src1), and src2 is included, but only as a symlink.

Possible Solution(s)

Assuming we want to continue to not include symlinks in .crate files, the solution here is to make the walk implementation for --allow-dirty to make /symlink match symlink regardless of whether it points to a directory or not, and to make the git-based walk correctly handle /symlink/ exclusions. My first instinct was that for the former,

let is_dir = path.is_dir();

should be

let is_dir = path.symlink_metadata()?.is_dir();

but unfortunately I don't think that'll work since we should continue to treat the symlink as a directory for recursion purposes. I'm not sure how best to augment the git walking.

Notes

Here is a failing test for --allow-dirty:

fn gitignore_symlink_dir() {
    if !symlink_supported() {
        return;
    }

    project()
        .file("src/main.rs", r#"fn main() { println!("hello"); }"#)
        .symlink_dir("src", "src1")
        .symlink_dir("src", "src2")
        .file(".gitignore", "/src1\n/src2/")
        .build()
        .cargo("package -l")
        .with_stderr("")
        .with_stdout(
            "\
            Cargo.lock\n\
            Cargo.toml\n\
            Cargo.toml.orig\n\
            src/main.rs\n\
         ",
        )
        .run();
}

I'm not sure how to best trigger the git-based walk inside the test suite.

Version

cargo 1.56.0 (4ed5d137b 2021-10-04)
release: 1.56.0
commit-hash: 4ed5d137baff5eccf1bae5a7b2ae4b57efad4a7d
commit-date: 2021-10-04

Though also occurs on master @ 94ca096

@jonhoo jonhoo added the C-bug Category: bug label Nov 4, 2021
@ehuss ehuss added A-git Area: anything dealing with git E-help-wanted labels Nov 4, 2021
@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 4, 2021

I suspect this may partially be a bug in the ignore library, or in the way we invoke it. I'll try digging into that and see if I can reproduce outside of Cargo.

@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 5, 2021

Okay, a couple of things from digging deeper:

  1. Cargo only respects .gitignore when in a git-managed directory. The test above did not create a git repository, so the .gitignore was ignored.
  2. When in a git-managed directory, Cargo currently does the right thing (🎉) — see Add tests for ignoring symlinks #10047
  3. (the source of my observed bug) Cargo gets confused about whether it's in a git-managed directory if Cargo.toml is not in the index (e.g., if you run git rm --cached Cargo.toml or simply haven't git added it yet). Test (and hopefully fix) coming shortly.

@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 5, 2021

Failing test:

#[cargo_test]
/// Test that .gitignore is respected even if Cargo.toml isn't in git.
fn gitignore_with_untracked_cargo_toml() {
    if !symlink_supported() {
        return;
    }

    let (p, repo) = git::new_repo("foo", |p| {
        p.file("src/main.rs", r#"fn main() { println!("hello"); }"#)
            .file("foo.txt", "hello")
            .file(".gitignore", "foo.txt")
    });

    let mut index = repo.index().unwrap();
    index.remove_path(&Path::new("Cargo.toml")).unwrap();
    index.write().unwrap();

    p.cargo("package -l --no-metadata --allow-dirty")
        .with_stderr("")
        .with_stdout(
            "\
.gitignore
Cargo.lock
Cargo.toml
Cargo.toml.orig
src/main.rs
",
        )
        .run();
}

Fails with

test failed running `/Users/jongje/dev/cargo/target/debug/cargo package -l --no-metadata --allow-dirty`
error: stdout did not match:
1        -.gitignore
2   1     Cargo.lock
3   2     Cargo.toml
4   3     Cargo.toml.orig
    4    +foo.txt
5   5     src/main.rs

@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 5, 2021

The rabbithole continues. This appears to be intentional:

/// Returns `Some(git2::Repository)` if found sibling `Cargo.toml` and `.git`
/// directory; otherwise, caller should fall back on full file list.

let manifest_path = repo_relative_path.join("Cargo.toml");
if index.get_path(&manifest_path, 0).is_some() {
return Ok(Some(repo));
}
// Package Cargo.toml is not in git, don't use git to guide our selection.
Ok(None)

In fact, the commit that originally introduced this logic states

// If we find a git repository next to this Cargo.toml, we still
// check to see if we are indeed part of the index. If not, then
// this is likely an unrelated git repo, so keep going.

But I think maybe there's a case missing here, which is that if no other git repositories are found, then we should go with the "topmost" one that is at or above the found Cargo.toml.

Ultimately though, I'll leave it up to you all to decide whether that should be considered a bug or more like a lack of a feature. In the meantime I'll close this issue, since the issue as reported is not actually present.

@jonhoo jonhoo closed this as completed Nov 5, 2021
@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 5, 2021

I'll add that one place where this comes up is if someone tries to do the following:

$ cargo new project
$ cd project
$ $EDITOR # implement the first alpha of the project
$ $EDITOR .gitignore # add some huge log files to gitignore
$ cargo publish --allow-dirty

Then they'll accidentally include all the logfiles in their published artifact. It's fixed if they git add ., and they do have to explicitly choose to --allow-dirty, but it's probably still somewhat unexpected.

bors added a commit that referenced this issue Mar 23, 2022
Add tests for ignoring symlinks

This adds tests for the expected behavior in #10032. Interestingly, these tests pass (🎉). Will update that issue with more details shortly, but figured these tests were worthwhile to add to the testsuite anyway now that I've written them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-bug Category: bug
Projects
None yet
Development

No branches or pull requests

2 participants