-
-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gix clean -xde
deletes whole repo if .gitignore
lists *
or /
#1458
Comments
I've edited the description to note that the code I quoted is not necessarily where a change has to be made, since With us being in different time zones, it would have been useful had I contributed some part of the tests or fix between the initial report and now. Unfortunately I was quite distracted and, while I was able to make a bit of progress on other gitoxide-related things, I didn't really come up with anything for this. |
Thanks a lot for the incredibly detailed analysis of the problem. I also feel that ideally, a solution would make such kind of mistakes impossible, but I am also not sure this can happen. More will be known once I dive into the code.
I'd hope you will be able to submit a PR with the documentation improvements you would like to see - I cannot imagine anyone better suited to make them. Besides that, I hope that the default messaging around the removal of ignored directories works:
Even though the example is taken from the buggy version that is unaware this is the root, it does warn that ignored repositories would be removed. Overall, I like the way it guides the user towards adding flags to change the set of would-be-cleaned files. Regarding Regarding the
Agreed.
The intended behaviour is that it allows itself to remove ignored git repositories which are contained in ignored directories. That's easy to do as it may not even look deeper unless the The root cause is probably that it's possible to declare the top-level directory as ignored, which allows contained repositories to be removed as well.
This invocation shows the behaviour quite nicely, as it now found the 'hidden' repository itself, and flagged it for deletion as requested. Of course, that shouldn't be possible here. So I'd think this is a very 'local' bug as it really can only apply to the root of the repository, or so I'd think. |
Yes. Assuming it is intended that
This makes it seem not only that I'll think about how to improve that wording and open a pull request to do so.
As far as how But
That makes sense, because if I ignore
I would not consider this to disprove that, but I have found that when the current repository is a submodule with a That pertains to "Case 5" in the Expected behavior section of the issue description. I had not tried it at that time. I have tried it out now, by doing a recursive clone of has-submodule and experimenting with
Yes, I do think case 6 is already not a problem, and the experiments I've just done seem to confirm this. |
Thank you 🙏
Thanks for researching this. I will add this to the baseline tests (comparing Git results with
I mean, this is how
Thank you, that sounds wrong, too, and I will take a look. |
To clarify the part about the This is likely what you took me to mean, but I just realized that was extremely ambiguous, so I figured I'd clarify just in case. |
This is quite typical in the wild where there ignore files are used as include lists, instead of exclude-lists.
If they were, they would more easily be deleted by tooling like `gix clean`.
If they were, they would more easily be deleted by tooling like `gix clean`.
No problem! I'm holding off for a bit in doing so, since #1464 might affect what ought to be said. |
…deLabs#1458) This is quite typical in the wild where there ignore files are used as include lists, instead of exclude-lists.
…eLabs#1458). If they were, they would more easily be deleted by tooling like `gix clean`.
Current behavior 😯
gix clean -xde
will delete the entire top-level repository it is operating on, including tracked files and the.git
directory itself--thus the whole local history--if a.gitignore
file contains*
or/
. This seems to happen because the repository itself is identified as an untracked nested repository.Illustration with a real-world example
One approach to writing
.gitignore
is to list*
followed by!
exclusions. In such a repository, runninggix clean -xde
deletes the entire contents of the repository directory, causing inconvenience and the loss of any unpushed data. For example, on thecargo-update
repository:That is on Ubuntu 22.04 LTS.
Windows is affected when outside the directory
Although Windows superficially appears unaffected because open files and directories cannot usually be deleted, the entire repository directory will still be deleted if one runs
gix -r cargo-update clean -xde
from the parent directory. (This-r
is agix
option and should not be confused withgix clean -r
.) So really all systems are affected, though to different degrees.The
/
means the repositoryThe
/
is in each case referring to the top-level directory of the repository. It fortunately does not refer to the actual root of the filesystem. Likewise:..
components in.gitignore
do not cause files outside of the repository to be deleted..gitignore
patterns that cause only some of the contents of.git
to be deleted. Although this may seem like cold comfort, it is actually a major mitigating factor, because partial deletion of.git
would not necessarily be noticed and could result in situations that could be much harder to recover from, since local refs could be silently removed, recreated with different referents, and then force-pushed to a remote without awareness that they were replacing preexisting conceptually unrelated tags or branches.However, all local branches and their history (except the tip of any branches checked out in separate worktrees and unmodified), the index, any stashes, and other objects available through the reflog, can all be deleted.
Some relevant code
It looks like, in traversals, paths that are not conceptually part of the repository are pruned, except that when a
.git
directory would be pruned for this reason but also matches a.gitignore
entry, then it is instead retained and given theIgnored
status:https://github.com/Byron/gitoxide/blob/15f1cf76764221d14afa66d03a6528b19b9c30c9/gix-dir/src/entry.rs#L43-L49
But this is not necessarily to say that this aspect of the classification has to be changed. It may be suitable for most repositories found during traversal, just not for the top-level working tree or
.git
directory, nor for any submodules or their.git
files.Expected behavior 🤔
One never expects cleaning to remove tracked files or cause data loss in the repository's local history, much less loss of the whole history.
Although the behavior described here was not intended, and exposition may not be required to demonstrate that this is a bug, I've nonetheless detailed what I think is the expected behavior below. This is because I think it may be useful identifying current or future cases of the bug, some of which are less obvious than others.
A
*
entry in a.gitignore
has real-world uses (as shown in the example of thecargo-update
repository) and should be taken as a pattern that matches all files, which subsequent!
exclusions can then make exceptions to. A/
might likewise be used deliberately, though I'm not sure I have seen it outside of testing. Furthermore, the ability to delete nested untracked repositories is a deliberate feature ofgix clean
when passed some combinations of options, and a very useful feature at that. However:gix -r ... clean -xde
. This would apply even if the directory were actually empty due to a nonstandard worktree. (This-r
is agix
option and should not be confused with-r
as agix clean
option as presented below in case 4.).git
..git
directory nor anything inside it should ever be deleted in a clean, even if intentionally cleaning ignored files and even if intentionally including ignored subdirectories that are themselves repositories. Although the current repository's.git
directory is "ignored" in the sense that commands likegit add .
do not stage anything from it, this behavior is separate from, and supersedes, the effect of.gitignore
. The.git
directory is effectively a void in the working tree..git
is specified explicitly in.gitignore
and-r
is included in the options passed togix clean
, the.git
directory should not be deleted. Currently adding-r
will make this happen, covered in "Steps to reproduce" below. (This-r
is agix clean
option and should not be confused with-r
as agix
option as presented above in case 1.).git
file instead of a.git
directory. This likewise should not be deleted when runninggix clean
in the submodule, irrespective of the options togix clean
or the contents of any.gitignore
file. I have not tested this case yet..gitmodules
--those submodules should not be deleted bygix clean
, since submodule directories are likewise voids in the superproject's working tree, rather than being ignored due to.gitignore
. I have not tested this case yet either.Cases 2 and 3 are the most important, because they are the most common and they are the most likely to cause data loss, especially case 3.
Although I have not tested cases 5 and 6 (yet), I mention them because it seems like incorrect behavior for them might be easy to bring about by accident when fixing this bug for the other cases.
Notes on
-p
and-r
One possible implementation approach comes to mind for a fix that I want to recommend against. While
gix clean
recognizes precious files, I recommend against allowing any of the above to occur even when-p
is passed. I think regarding.git
as typically ineligible for deletion by automatically considering it a precious directory would still be far from strong enough protection. It also doesn't really fit conceptually: I believe that precious files are conceptually those that should usually not be deleted due to their status as being important for reasons independent of source control.In addition to the above, the effect of
-x
and-d
on actual nested repositories, especially if they are to continue to delete them under some conditions even in the absence of-r
, should be documented explicitly, including in the output ofgix help clean
. But that could be done separately from the fix for this bug.Git behavior
No one-to-one comparison...
There is no exact comparison to
git clean
behavior, becausegix clean
deliberately deletes ignored nested repositories when-x
and-d
are passed (provided that-e
is passed to allow it do anything at all). It furthermore seems to do so intentionally even without-r
, though as examined below in Steps to reproduce, perhaps this is unintentional in the absence of-r
.More broadly,
gix clean
is not intended to behave exactly the same asgit clean
, as detailed in #1308....but
git
behavior is relevantHowever, it's true that
git clean
does set strong expectations for what kinds of deletions are within the ambit of cleaning, and no way of usinggit clean
produces this effect.For example, running
git clean -xdf
in thecargo-update
repository used as an example above does not delete any tracked files or the.git
directory.Steps to reproduce 🕹
To reproduce this, one can carry out the example shown above with the non-toy
cargo-update
repository, which confirms the practical significance.Simplified reproducer
One can alternatively run the following commands to reproduce it with a simple repository. These and all commands shown in this section were tested in Ubuntu 22.04 LTS.
With full output:
Demonstration that this relates to nested repository handling
Some of the above commands are not necessary to confirm the bug but illustrate relevant aspects of it. For example, consider this part of the output of the dry-run
gix clean -xdn
command run before doing the real clean:This strongly suggests that the problem relates to the way code in
gix::dirwalk
identifies nested repositories.Variation with
/
The above commands can be repeated with a
/
entry instead of*
to confirm that it also happens with that. This works both with and without the second line,!.gitignore
, sincegit
itself treats a/
entry in.gitignore
not to cover.gitignore
. Actually I am not sure what a/
in.gitignore
is supposed to do.Variation with tracked files that are not special
Although the real-world example with
cargo-update
, as well as the minimal example where.gitignore
lists*
and!.gitingore
, illustrate that files excluded from being ignored by matching a!
pattern that comes after*
are not spared, here's a minimal example focused on that:That produces:
The key seems to be that the top-level directory is taken to match the entry
*
, causing its contents all to be deleted even if some of them match!
exclusions.Variation listing
.git
and passing-r
Making the
.gitignore
file contain.git
does not causegix clean -xde
to delete.git
, but it does causegix clean -xdre
to delete.git
. (Note that this-r
is an option togix clean
and should not be confused with the-r
option togix
before a subcommand, which specifies a repository to operate on.) This can be seen by running the commands:Here's what that looks like:
When should
-r
affect what happens?I believe this should not happen even with
-r
.In addition, combined with the above results, this raises the question of whether
gix clean -xde
without-r
is actually intended to delete any actually nested repositories:gix clean -xde
without-r
is currently a good command to delete both build output and generated archives when run ingitoxide
's own repository--might be considered part of this bug, or its own separate related bug.Other variations with
.git
and-r
Listing
/.git
has the same effect as.git
at least when the current directory is the top-level directory of the repository.Listing paths under
.git
in.gitignore
, such as with the line.git/config
or/.git/config
, fortunately has no effect. It does not appear that a.gitignore
entry can causegix clean
with any combination of options to attempt to delete only some files inside.git
.The text was updated successfully, but these errors were encountered: