Auto-add of untracked files screws me up every time #323

Open
durin42 opened this issue May 16, 2022 · 131 comments

@durin42
Collaborator

durin42 commented May 16, 2022

Description

Initially I thought the auto-add of files was a neat idea, but in practice I just leave untracked files in my repo all the time, and tools like patch(1) assume they can drop .orig or similar files in the WC without it being a problem. I think every time I've used jj I've ended up getting grumpy at auto-adds and having to rip something out of a commit, sometimes after doing a push (when it was effectively emulating hg import for example).

Steps to Reproduce the Problem

  1. patch -p1 < some.diff (or similar)
  2. jj describe && jj close && jj git push
  3. Look at web view of diff, notice you committed a .orig file again (or similar)

Expected Behavior

I (still) don't expect auto-adds, and it really surprises me every time, plus it's super frustrating to have to ignore every file spec I might create as a temporary scratch file or what have you.

Actual Behavior

As above.

Specifications

@arxanas
Collaborator

arxanas commented May 17, 2022

Agreed on that — I think automatically committing tracked files is fine, but untracked is probably bad:

  • They could be very big
  • They could contain secrets
  • Querying the working copy for only tracked files is probably more efficient in practice (like with git status -uno)

@martinvonz
Owner

I mostly find the feature useful, but I also agree that it can be annoying and confusing. The worst case I've noticed is when you - perhaps accidentally - check out the root commit, where there's no .gitignore file containing target/. Then any command you run will try to commit GBs of data. I ran into that again just a few days ago, and was actually thinking of adding a config for the auto-add behavior.

When looking at an old version of the repo, you'll not see untracked files (e.g. jj --at-op=<some old operation> status), but that seems fine. I think the most annoying bit will be to add a way of providing that information to users who want to commit the working copy in the background (like we will probably do at Google). I'll probably skip that bit to start with when I implement this.

martinvonz added a commit that referenced this issue Jun 10, 2022
I plan to use this matcher for some future `jj add` command (for
#323). The idea is that we'll do a path-restricted walk of the working
copy based on the intersection of the sparse patterns and any patterns
specified by the user. However, I think it will be useful before that,
for @arxanas's fsmonitor feature (#362).
@tp-woven
Collaborator

Just to throw my 2c in: With the exception of new files that are also added to .gitignore, I actually really like the automatic tracking. I also think the same "repro" from the first comment can be used as a reason to auto-track - if you don't jj st before pushing, you're just as likely to miss a file that you should have added as you are a file that you should have ignored. So my preference is that this would be configurable if possible, rather than just removed.

@elasticdog
Collaborator

Just wanted to say that I really like the auto-tracking feature as well. After using jj for a while, going back to manually having to think about adding files to be tracked seems like a lot of extra work and possibly more error prone. I'm already used to checking jj status to make sure that I'm committing the expected files... That said, I can definitely see how not thinking about the appropriate ignores right away could be troublesome if the files are large, and I also acknowledge the root commit situation (although I don't know how often that would realistically come up in day-to-day usage).

@necauqua
Collaborator

necauqua commented May 8, 2023

The thing with accidentally committing GBs of target when you forgot to ignore it or checkout some old commit is that there's no GC at the moment, so those GBs are forever in the history.

This is especially likely if something such as direnv creates the directory automatically the moment you check out a commit from before your .gitignore update. That has literally happened to me with the jj repo more than once.

The only way is to recreate the repo (losing the oplog) this time not forgetting to ignore stuff/not editing old commits - which is kind of meh as well.

Full GC of course means basically the same thing, only automated by a single command, but something like jj undo --harder [--at-op OP] [--and-immediate-gc-too] (are you sure [y/N]) to delete the last op/OP unrecoverably and gc things it referenced could be cool.

@ony

ony commented Jun 11, 2023

Maybe it is possible to track ignores in jj too. If you switch away from a commit in which something was ignored to one where it is not, that information can probably be used. E.g. some sort of in-flight ignores that are visible in jj st, prompting the user either to extend the explicit ignores or to confirm the addition of the new paths to the current change.

P.S. Can think of how git checkout reacts when your untracked file is about to be overwritten by checked out files. In case of jj all files are tracked by default and thus even absence of the file is tracked.

@necauqua
Collaborator

necauqua commented Jun 11, 2023

Hm, so auto-tracking when you are doing changes in a working commit is the main killer feature.

But for my issue with it, how about this:
What if jj considered .gitignore not just from the current commit, but from the entire history - or actually just descendant of the current commit?
And if you needed to actually add some file in the past you'd have to explicitly track it (instead of explicitly untracking it in the common case you don't need that).

I know this is kind of magical, but the more I think about it the more it makes sense, idk - and the autorebase is magical on its own, so this is somehow kind of even consistent in my head

@martinvonz
Owner

martinvonz commented Jun 11, 2023

It would be expensive to find the gitignores from all commits, but we could probably index that information. I'm more concerned that it would be unexpected behavior. For example, if you check out a sibling commit where target/ is not ignored, then it would still not be ignored if we only consider descendants.

Maybe it's better to check if the gitignores changed between the old and the new commit and if any untracked files according to the new patterns match the old patterns. If that happens, we could just print a warning about it. We could additionally add the ignores to a per-workspace set of ignores (which we don't support yet). (EDIT: I think this is what @ony suggested.)
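
A minimal sketch of that comparison, in Python. It assumes a greatly simplified pattern language (`fnmatch` globs matched against the whole relative path, plus a trailing `/` meaning "everything under this directory") rather than real .gitignore semantics. The function names `is_ignored` and `newly_unignored` are hypothetical and not part of jj.

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """Return True if path matches any pattern (simplified, not real gitignore rules)."""
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: everything below that directory is ignored.
            if path.startswith(pattern):
                return True
        elif fnmatch(path, pattern):
            return True
    return False


def newly_unignored(untracked_paths, old_patterns, new_patterns):
    """Paths that the old commit's patterns ignored but the new commit's do not."""
    return sorted(
        p
        for p in untracked_paths
        if is_ignored(p, old_patterns) and not is_ignored(p, new_patterns)
    )


old = ["target/", "*.orig"]
new = ["*.orig"]  # e.g. a commit whose .gitignore does not list target/ yet
files = ["target/debug/app", "src/main.rs", "patch.orig"]
for path in newly_unignored(files, old, new):
    print(f"warning: {path} was ignored before this checkout and no longer is")
```

Only paths that flip from ignored to not ignored are reported, so an ordinary new source file such as `src/main.rs` does not trigger the warning.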

@necauqua
Collaborator

If that happens, we could just print a warning about it

The warning being "those differences are implicitly untracked for this WC, in case your direnv caused GBs of files to generate in target and/or .direnv - add them to gitignore here or explicitly track them, moving to another commit without them ignored (e.g. jj new) will cause them to be tracked in that commit"

^ this is a loose idea, could be refined, for example jj st will have that information ofc

@ilyagr
Collaborator

ilyagr commented Aug 13, 2023

Following @ony's suggestion, perhaps when the working copy moves to a new commit, we could track "newly unignored" files by comparing the old .gitignore and the new .gitignore. For example, we could store an extra tree of "newly unignored" files.

Then, the UI could provide ways for dealing with these files. E.g. there could be a command like jj ignored --previously that lists them and jj ignored --previously --restore that gets rid of them. We'd also have to decide whether a modification to a "previously ignored" file makes it no longer "previously ignored".

@kevincliao
Collaborator

kevincliao commented Jan 13, 2024

I ran into this today when trying to checkout to a different branch that doesn't contain node_modules. As people have mentioned above at that point it's not possible to run any jj commands. I ended up creating a temporary .gitignore file before being able to jj op restore to a previous checkpoint. Is there a better way to recover when running into this? I wonder if it's possible to have jj op commands still work in this scenario.

@kevincliao
Collaborator

Ahh ignore my comment, I think I got confused - once there is an option to not auto-add untracked files, jj op commands will work again.

@yuja
Collaborator

yuja commented Jan 14, 2024

Is there a better way to recover when running into this? I wonder if it's possible to have jj op commands still work in this scenario.

You can pass --ignore-working-copy to these commands, but we don't have the last bit to reset the working copy without snapshotting yet.

jj op log --ignore-working-copy  # "jj op log" also works with the current main branch
jj op restore --ignore-working-copy @-
jj workspace update-stale --some-option-to-not-snapshot-before-resetting

@HadrienG2

HadrienG2 commented Jan 25, 2024

Overall, it seems there is no good automated way to handle untracked files when creating commits. On one hand, not tracking them leads to incomplete commits. On the other hand, auto-tracking them leads to committing unwanted files. So how would you feel about some variation of the following semi-automated design?

  1. By default, interactively prompt before auto-adding files, something like This command will add <list of files> to the current commit, proceed? [y/N].
    • Saying yes follows the current behavior.
    • Saying no aborts the command with an error return value and lets you use jj track and .gitignore as appropriate.
  2. Have a way to whitelist sets of files (e.g. source files) so that they are auto-added without a prompt, and not mentioned in prompts when they do occur.
    • This could take the form of a .jjadd file that uses the same glob syntax as .gitignore.
    • The aforementioned prompt would mention the possibility of configuring jj for auto-adding and ignoring.

I think this might strike a good balance between the following concerns:

  • Files which we want to track (like source files) eventually get auto-added silently as desired, avoiding incomplete commits and replicating the good parts of the current jj auto-add UX.
  • Files which we do not want to track (like object files, target/ directories...) eventually get ignored silently as desired, without undesirable creation of commits that will keep them in the history forever.
  • After a short initial configuration period, seeing the prompt becomes an exceptional event and thus leads the user to pause and think, as desired in this situation.

For scripted operation, there should be a way to provide a default answer to the prompt via CLI arguments.
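
A rough sketch of how that decision could be structured, in Python. Everything here is hypothetical: the names `classify`, `plan_snapshot`, and `SnapshotAborted` are invented for illustration, patterns are plain `fnmatch` globs rather than real .gitignore syntax, and `default_answer` stands in for whatever CLI argument a scripted mode would use.

```python
from fnmatch import fnmatch


class SnapshotAborted(Exception):
    """Raised when the user (or the scripted default) declines the auto-add."""


def classify(path, add_patterns, ignore_patterns):
    """Sort a new file into 'ignore', 'add', or 'prompt'."""
    if any(fnmatch(path, pat) for pat in ignore_patterns):
        return "ignore"
    if any(fnmatch(path, pat) for pat in add_patterns):
        return "add"
    return "prompt"


def plan_snapshot(new_files, add_patterns, ignore_patterns, default_answer=None):
    """Return the files to add, prompting only for files no pattern covers."""
    kinds = {f: classify(f, add_patterns, ignore_patterns) for f in new_files}
    to_add = [f for f in new_files if kinds[f] == "add"]
    to_ask = [f for f in new_files if kinds[f] == "prompt"]
    if to_ask:
        if default_answer is None:
            reply = input(
                f"This command will add {to_ask} to the current commit, proceed? [y/N] "
            )
            default_answer = reply.strip().lower() == "y"
        if not default_answer:
            raise SnapshotAborted("use track or the ignore file, then retry")
        to_add += to_ask
    return to_add


# Scripted use: answer "yes" up front so nothing blocks on input().
print(plan_snapshot(["src/a.rs", "build/a.o", "notes.txt"], ["src/*"], ["build/*"],
                    default_answer=True))
```

Once the allowlist and ignore list cover the usual files, `to_ask` is empty and the prompt never appears, which is the "exceptional event" property described above.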

@martinvonz
Owner

I'm personally quite happy with the current behavior (except for the behavior when updating to a commit with different .gitignore). It can be a bit annoying in the beginning, but once you've added the appropriate paths, I find that it works pretty well. Maybe others feel differently. But even if they don't, we may want to make it less annoying for new users by doing something like you suggest.

@ilyagr
Collaborator

ilyagr commented Jan 25, 2024

I'm not very happy with the idea of the interactive prompt.

I think that if you edit a .gitignore, any subsequent jj command could trigger this prompt, including jj log. I tend to run an analogue of watch jj log in a tmux pane permanently, and I think this would work very badly with the interactive prompt. Firstly, I'll need to adjust the command to use the "scripted mode". In the "scripted" mode, if the default answer to the prompt is "yes", this goes back to users experiencing auto-add of untracked files. If it's "no", jj's view of the workspace could be out of sync with reality for a while (but, if we go with a prompt, I think this is the better option).

Other UIs will also do an analogue of jj log regularly. Every jj UI (e.g. VS Code plugin) would probably need to have a way of giving this prompt to the user, if we made this interactive.

@HadrienG2

Ah, yes, there's that. I knew that this design decision of having status commands modify the repository was fishy and going to cause problems someday...

@lf-

lf- commented Apr 12, 2024

This feature has sadly made me bounce off of jj immediately every time I try it, which is really unfortunate because I keep hearing such good things about it, and want to give it a genuine try.

Every single repository I work on regularly has various testing/strace-log/whatever files in its root. I actually don't mind auto tracking in src/, but in the root it just is not compatible with my workflow.

fwiw a workaround for this that I've not yet checked works on jj might be some kind of terrible thing like so in the .git/info/exclude:

/*
!src/
!Cargo.*

This workaround is quite bad indeed, and I would rather not have to reimplement the git index in .git/info/exclude to be able to use jj, though admittedly it would be with wildcards at least.

@ilyagr
Collaborator

ilyagr commented Apr 12, 2024

I have no idea whether this would be helpful to you, but here's something that helped me a lot. I can't remember who had suggested it originally; it might be in the FAQ.

  • I added _ignore/* to ~/.config/git/ignore (~/.gitignore should also work).

    This is possibly not absolutely optimal (I have been wondering whether /_ignore/ would be better), but works well enough. I actually use _ilyaignore to make the name more unique.

  • Create an _ignore subdir in my repo

  • Save all weird logs and traces to it

@lf-

lf- commented Apr 12, 2024

Yup it is in the FAQ or something; I've seen it given as advice before. I just don't like it and it doesn't vibe with how I work, since it would be a whole bunch of extra typing. I could have it be i/ or something, I guess, to reduce typing, but I would still have to remember to do it every time, which feels kind of bad?

@ilyagr
Collaborator

ilyagr commented Apr 18, 2024

Inspired by more feedback (#3528 (comment)), perhaps @dpc 's suggestion from that post might work. Perhaps we could have a notion of "untracked" files, like Git, and default files to "untracked", while also auto-updating all the tracked files on each command?

jj status would certainly show any untracked files. jj log could too. It's not quite in the spirit of "everything is a commit" (untracked files would show up as a fake commit in some places), but might work.

One question is what jj diff would do. My first instinct would be to have it act on tracked files only, but complain loudly when there are untracked files. Perhaps each command would do that, I'm unsure.

This would be a huge change, so I almost certainly missed some important considerations.

@dpc

dpc commented Apr 18, 2024

This would be a huge change

Speaking out of ignorance, I'm guessing jj already needs to compare all worktree files against .gitignore. Right after that it could just compare them against files already tracked in the current change and ignore ones that are not. Plus a command to track a file. And that's kind of it, no? Showing untracked files, etc. seems like a nice-to-have. Deleting a tracked file could work as an "untrack", just like it already does.

Hmm... I guess mv <sometrackedfile> <newlocation> now requires explicit calling "track" on a new location, which is a bit breaking the "immersion", but I think it's fine. And again - this behavior would be optional (but I would suggest making it the default for the sake of newcomers). People that figured out everything could just opt-in into current seamless behavior, which I find elegant and I'm sure I would eventually settle into it just fine, after making sure given repo doesn't produce untracked trash, adding some ./tmp/ to .gitignore and remembering to create my debugging stuff inside it.

@martinvonz
Owner

jj status would certainly show any untracked files. jj log could too. It's not quite in the spirit of "everything is a commit" (untracked files would show up as a fake commit in some places), but might work.

One question is what jj diff would do. My first instinct would be to have it act on tracked files only, but complain loudly when there are untracked files. Perhaps each command would do that, I'm unsure.

If we add support for untracked files, I think it should be pretty much only jj status that shows them. They would just be invisible to every other command. Would that work for the untracked-files proponents?

I guess mv <sometrackedfile> <newlocation> now requires explicit calling "track" on a new location

I don't think so. Almost all commands, and probably also the future jj mv, work on commits and just update the working copy to match afterwards.

@sheremetyev
Collaborator

Another issue that confuses me at the moment: suppose jj made changes to .gitignore after jj new. What should jj restore do, either to the .gitignore or to the newly ignored files listed in it?

This is tricky... also abandoning commit that adjusted .gitignore. Another problem - .gitignore file itself might have merge conflict markers.

Here is a different direction: make ignored files/directories "first class" citizens and maintain as a property of the working copy. In this case .gitignore files become "instruction what to do when a new file is just created": either start tracking new file or add it to the list of untracked files. After decision to track or not track is made (next snapshotting), user can override it (track or untrack manually). When user switches working copy to another commit, the list of untracked files for the working copy is preserved and nothing is added to the commit automatically. When file is deleted from disk it is automatically removed from the list of untracked files (if recreated then .gitignore kicks in again).

Would something like that be a reasonable behaviour?

@ilyagr
Collaborator

ilyagr commented Oct 10, 2024

This is tricky... also abandoning commit that adjusted .gitignore.

I discussed this a bit in my "train of thought" comment, see also below.

Another problem - .gitignore file itself might have merge conflict markers.

I think that's a separate issue that jj is already facing. I'm not actually sure what jj does in this case (and I'm getting curious!), but it hasn't been a serious problem in practice.


I'll ignore the rest of @sheremetyev 's interesting comment for now (need to think about it more) and give a potential outline of a design for the "automatically update gitignore for newly untracked files" idea. I think it makes sense and might be viable, though I'm not sure it's the best option. All the details (esp. the names) are preliminary.

We could give almost every jj command a --newly-unignored-policy option with a few possible values:

  • record: update .gitignore with any files that are present in the working copy, were ignored before the command, and are not ignored after the command (e.g. because we switched to a different commit with a different .gitignore)
  • delete: do not touch the .gitignore and delete any files that were ignored previously and now aren't
  • track: do not touch the .gitignore and track those files (the current behavior)
  • error: abort if the command causes some files present in the working copy to become unignored.

For most commands including new and abandon, the default would be --newly-unignored-policy=record. This is the behavior I described in #323 (comment). In particular, just as in that comment, 1) jj new might create a non-empty commit (with changes to .gitignore) and 2) jj abandon would be a no-op after jj new, even in that case.

If you are in that situation, and you wanted to get rid of both the .gitignore changes and the files it references, you'd do jj abandon --newly-unignored-policy=delete. More confusingly (but more flexibly), you could do jj abandon --newly-unignored-policy=track and then either delete some files manually or do another jj abandon.

For a few commands, e.g. edit and maybe restore, the default would be --newly-unignored-policy=error. This prevents accidentally modifying commits with jj edit. We'd have to write really nice and actionable messages for these cases.

(Strictly speaking, if we considered snapshotting an "operation", it would follow the "track" behavior)

We'd also want detailed warnings if the record action actually happens after the result of some command.
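
To make the four proposed values concrete, here is a small Python sketch of how a command might dispatch on them. It is only an illustration of the proposal: `--newly-unignored-policy` does not exist in jj, and `NewlyUnignoredPolicy`, `apply_policy`, and `DEFAULT_POLICY` are hypothetical names. Recording a path as a root-anchored `/path` line is also an assumption about what "update .gitignore" would write.

```python
from enum import Enum


class NewlyUnignoredPolicy(Enum):
    RECORD = "record"
    DELETE = "delete"
    TRACK = "track"
    ERROR = "error"


# Per-command defaults as proposed above (restore is only a "maybe" there).
DEFAULT_POLICY = {
    "new": NewlyUnignoredPolicy.RECORD,
    "abandon": NewlyUnignoredPolicy.RECORD,
    "edit": NewlyUnignoredPolicy.ERROR,
}


def apply_policy(policy, newly_unignored, gitignore_lines, tracked):
    """Return (gitignore_lines, tracked, paths_to_delete) after applying policy."""
    gitignore_lines = list(gitignore_lines)
    tracked = set(tracked)
    to_delete = []
    if not newly_unignored:
        return gitignore_lines, tracked, to_delete  # nothing became unignored
    if policy is NewlyUnignoredPolicy.RECORD:
        # Keep the files ignored by writing them into .gitignore.
        gitignore_lines += [f"/{p}" for p in newly_unignored]
    elif policy is NewlyUnignoredPolicy.DELETE:
        to_delete = list(newly_unignored)
    elif policy is NewlyUnignoredPolicy.TRACK:
        tracked |= set(newly_unignored)  # today's behavior
    else:
        raise RuntimeError(f"command would unignore: {', '.join(newly_unignored)}")
    return gitignore_lines, tracked, to_delete


print(apply_policy(DEFAULT_POLICY["new"], ["target/debug/app"], ["*.orig"], {"src/main.rs"}))
```

With the `record` default, the `.gitignore` in the result differs from the committed one, which is why jj new could end up creating a non-empty commit in this design.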


As you can see, this is complex. I'm not sure we can explain it simply; we'd have to try. I'm also not sure any other design we choose, once we work out its details, would be simpler.

These behaviors could occasionally be annoying, but mostly (hopefully only) in the cases where jj currently does something bad silently. Hopefully, that is bearable.

Implementing this might take some work, but the main question in my mind is how frustrating and confusing (or clear and painless) this would seem to users that run into these problems unexpectedly. Whatever design we choose, the issue of un-ignored files is almost always going to be an unexpected obstacle that distracts the user from what they are actually trying to do.

@sheremetyev
Collaborator

Agreed @ilyagr! With all the edge cases automatically modifying .gitignore looks too complex - it wasn't a good idea :)

Hopefully "first-class ignores" is simpler? I tried to think through various cases and don't see major issues. Short way to describe the behaviour:

.gitignore is edge-triggered rather than level-triggered

@yuja
Collaborator

yuja commented Oct 10, 2024

iirc, there's another feature request to add some overlay that contains "volatile" changes (e.g. debug prints, environment-specific modification, etc.) that aren't meant to be included in the final commit, but should be carried over to new checkout. If we had such layer, newly un-ignored files could be snapshotted to that layer, not to the working-copy commit.

Just an idea. It wouldn't solve the big "un-ignored node_modules problem" if the overlay is backed by a commit object.

@scott2000
Collaborator

What if the working copy contained a list of prefixes of untracked/ignored paths (meaning any paths that aren't included in the snapshot.auto-track fileset and any paths included in a .gitignore file)? For efficiency, if all files in a directory are ignored, the directory path could be stored instead.

The idea is that snapshot.auto-track and .gitignore only apply when a file is first detected by a snapshotting operation, and then the working copy remembers the state (either tracked/untracked) until the user manually tracks/untracks/deletes the file. I think this behavior for untracked files is more consistent actually, because it matches how it works for tracked files (i.e. if you add a file and it gets tracked, it remains tracked even if it gets added to .gitignore, unless the user manually untracks it).

This would mean that if the user switches to a new commit without a .gitignore, the ignored files remain untracked since the paths are still marked as untracked in the working copy. It could also be possible to turn the file size limit error into a warning, and just add it as an ignored path in the working copy to prevent snapshotting the file until the user explicitly tracks it.

This would probably also require a new section in jj status to list all untracked paths which aren't currently part of a .gitignore so that the user could easily see which files they might need to track or add to .gitignore.
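
A toy model of this "decide once, then remember" behavior, in Python. It is a sketch under simplifying assumptions: patterns are `fnmatch` globs instead of real .gitignore rules, the snapshot.auto-track fileset is left out, state is stored per file rather than per directory prefix, and the `WorkingCopyState` class is hypothetical rather than anything in jj.

```python
from fnmatch import fnmatch


class WorkingCopyState:
    """Remembers the tracked/untracked decision made when a path is first seen."""

    def __init__(self):
        self.tracked = set()
        self.untracked = set()

    def snapshot(self, paths_on_disk, ignore_patterns):
        on_disk = set(paths_on_disk)
        # Paths that vanished are forgotten, so patterns apply again if they return.
        self.tracked &= on_disk
        self.untracked &= on_disk
        # Patterns are consulted only for paths with no remembered decision.
        for path in sorted(on_disk - self.tracked - self.untracked):
            if any(fnmatch(path, pat) for pat in ignore_patterns):
                self.untracked.add(path)
            else:
                self.tracked.add(path)

    def track(self, path):
        """Manual override: start tracking a remembered-untracked path."""
        self.untracked.discard(path)
        self.tracked.add(path)

    def untrack(self, path):
        """Manual override: stop tracking a path without deleting it."""
        self.tracked.discard(path)
        self.untracked.add(path)


state = WorkingCopyState()
state.snapshot(["src/main.rs", "target/app"], ["target/*"])
# Switch to a commit with no ignore patterns: target/app stays untracked.
state.snapshot(["src/main.rs", "target/app", "new.txt"], [])
print(sorted(state.tracked), sorted(state.untracked))
```

The second `snapshot` call is the interesting one: the ignore patterns are gone, yet `target/app` is not picked up, because the earlier decision is kept until the user overrides it or the file is deleted.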

This sounds similar to what @ilyagr was talking about in #323 (comment).

@joyously

The user should be able to reason easily about whether the file is tracked or not, but it seems to be getting too complicated to explain.
There should be one source of truth for ignored or not. I assume this to be the ignore file. To me, this means that track and untrack should involve a modification of the ignore file. My vision of this is an interactive edit, with a suggested line containing the path.
Perhaps the best option is to make that change always rebased to the root commit, separate from the working copy.

@PhilipMetzger
Collaborator

@marc-h38 said:

Just like a function in your code base that exists, it will get used by someone down the road. It will get more mindshare and more tooling around it.

Yes, and? In this particular case, it also means removing this deal breaker will get jj many more users overall. Isn't that great?

Some parties in this conversation consider it a broken window which enables a "bad workflow" or a workflow a new vcs shouldn't have, while the other party really depends on that behavior.

I think my opinion on that is known, so I've said enough.

My opinion on the other conversation is that we're actually converging on a solution which @joyously brought up here: #4338 (comment), which definitely can work. Ideally this could be just a single binary state instead of the trinary behavior with the snapshotting option, but we have it now and must work with it.

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

There's a lot going on in this thread now. (Good!) I'll just reply to a few things, but I'm still actively thinking.

This will be inconclusive. I did, in fact, try to write down a conclusion or at least a summary, but I just don't have one in mind at this point.

Re @sheremetyev

Agreed @ilyagr! With all the edge cases automatically modifying .gitignore looks too complex - it wasn't a good idea :)

I haven't given up on it at all, I think it's very much worth considering. It seems likely to me that the reason it seems more complicated than the other ideas is merely that I explained it in more detail.

Hopefully "first-class ignores" is simpler?

make ignored files/directories "first class" citizens and maintain as a property of the working copy. In this case .gitignore files become "instruction what to do when a new file is just created": either start tracking new file or add it to the list of untracked files. [ See also #323 (comment) - Ilya ]

I am not sure about treating .gitignore as "instruction what to do when a new file is just created". This would seem to imply that when the user edits .gitignore, we don't update the actual list of ignored files.

I don't think this is necessarily fatal to the idea. For example, after editing .gitignore, some files could go into a weird middle state and jj could start warning people about it. But it's not entirely clear to me what UI we'd have to help the user keep track of the difference between the .gitignore and the actual list of tracked files, or how we'd make this less surprising (when exactly do we warn the user about these kinds of files?).

@scott2000 's idea seems very close (or very compatible) with this one.

.gitignore is edge-triggered rather than level-triggered

I didn't follow this part.

Re @yuja

iirc, there's another feature request to add some overlay that contains "volatile" changes

Good point! I think this is worth considering as well. From a certain perspective, it's very similar to what I was suggesting in #323 (comment).

From another perspective, this overlay could be a way to implement something like Fedor's idea or even some extension of jj (un)track. So, I think the main question with this idea is what the UI would look like, exactly.

It wouldn't solve the big "un-ignored node_modules problem" if the overlay is backed by a commit object.

If this is a problem, we could try to keep it in an easily garbage-collectible commit, though that might come at the price of not having jj undo affect the overlay. I'm thinking we would not remember the overlay in operation log and not have a jj/heads ref pointing to the old versions of the overlay, and only keep track of the newest version.

This does come at some cost to usability for the "scratch file" usecase, but maybe that's OK? We don't have to tell people there was ever an option of tracking their scratch files in the operation log. 😇

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

Re @joyously

To me, this means that track and untrack should involve a modification of the ignore file.

Aside: For other people, different aspects of this approach are discussed in #4338 (comment).

The main question we seem to be discussing at the moment is what to do when the user switches to a commit with a different .gitignore and some of the files that were previously ignored are still in their working dir. I think this is more or less solvable, as @sheremetyev suggested in #323 (comment), but when I tried to flesh it out, the result is complicated (but possibly not too complicated, I'm not sure): #323 (comment).

To be clear, I'm still interested in this approach, and @PhilipMetzger 's comment suggests he is too. It's not clear to me yet whether it will work.

Perhaps the best option is to make that change always rebased to the root commit, separate from the working copy.

I didn't understand this idea.

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

I realized that, for better or worse, @yuja 's "overlay commit" is pretty much equivalent to Git's staging area. We would use it for a slightly different purpose, though, it wouldn't be as front-and-center as in Git.

@joyously

Perhaps the best option is to make that change always rebased to the root commit, separate from the working copy.

I didn't understand this idea.

What I meant was that the ignore file is a special case since it is a control file, and whenever it is modified, that should be separated from other changes into its own commit, which is then applied as far back in the history as possible so the user can't switch to before and after that ignore. As for the ignored files, a true ignore should not delete them, but not snapshot them either.

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

which is then applied as far back in the history as possible so the user can't switch to before and after that ignore

I think that regardless of its merits, this idea (if I understood it correctly) is impractical for any repo where people are not allowed to amend a commit that's been shared with other people ("immutable commits" in jj parlance). This prohibition would include rebasing a commit on top of the new .gitignore (which affects the commit id), and this applies to pretty much every git repo that's shared between multiple people.

Perhaps once jj takes over the world and people can refer to commits by their change id, this will become possible to consider. Even then, there would be security implications and other assumptions people are currently used to being broken by it.

@yuja
Collaborator

yuja commented Oct 12, 2024

I realized that, for better or worse, @yuja 's "overlay commit" is pretty much equivalent to Git's staging area.

Kind of? New working-copy changes would be squashed into the overlay's parent, so it's conceptually quite different from the staging area. Oh, but that means changes on un-ignored files wouldn't be ignored, so it's not what the user wants.

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

I am liking the idea of editing .gitignore, especially after Yuya's point as I understand it (which deserves a longer reply/explanation I won't write right now).

One moderate problem with editing .gitignore, however, is how it deals with multiple workspaces. I wonder if anybody will have any ideas about that.

I think that what I wrote in #323 (comment) is actually OK as written if we require the user to specify --newly-unignored-policy on every jj workspace update-stale that might create newly unignored files (possibly due to files that are only present in one of the workspaces). In other words, we default jj workspace update-stale to --newly-unignored-policy=error. However, this is not great UI; new people who have to do jj workspace update-stale are probably already confused enough as it is, and the choice of a good --newly-unignored-policy might not be obvious in this case.

The other possible solution would be to use a workspace-local ignore file for these, as I pondered in #4338 (comment). However, that blocks solving the problem of unignored files on solving the problem of how to do per-workspace gitignores in jj. I don't know an obvious solution to that problem, and that second problem is not otherwise as urgent, so blocking the original problem on it is also not great.

Aside: In principle, we could also consider changing how jj workspace update-stale works (as a third option), but I think it's already a testament to its good design that it can mostly substitute for the support of per-workspace gitignores. In any case, I don't have any concrete ideas in this direction at the moment.

@yuja
Collaborator

yuja commented Oct 12, 2024

I'm skeptical about auto-updating .gitignore because of the complexity of ignore patterns and sub-directory layering. (I don't remember the exact rule, so I might be wrong.)

I think the idea described here can be a good compromise.

What if the working copy contained a list of prefixes of untracked/ignored paths (meaning any paths that aren't included in the snapshot.auto-track fileset and any paths included in a .gitignore file)? For efficiency, if all files in a directory are ignored, the directory path could be stored instead.
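A minimal sketch of the "store the directory path instead" idea, assuming the working copy already knows the full set of files on disk and which of them are ignored. The function name `collapse_ignored` and its inputs are hypothetical and are not part of jj. It only collapses one directory level at a time, so nested directories are reported separately.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def collapse_ignored(all_files, ignored):
    """Return a compact, sorted list of ignored paths.

    A directory is stored (with a trailing "/") in place of its files only
    when every file directly inside it is ignored. Files at the top level
    are always listed individually.
    """
    ignored = set(ignored)
    by_dir = defaultdict(list)
    for path in all_files:
        by_dir[str(PurePosixPath(path).parent)].append(path)

    result = []
    for directory, files in by_dir.items():
        if directory != "." and all(f in ignored for f in files):
            # Every file in this directory is ignored: keep one entry.
            result.append(directory + "/")
        else:
            # Mixed directory: list only the ignored files themselves.
            result.extend(f for f in files if f in ignored)
    return sorted(result)
```

A list stored this way stays short for directories such as build output, but the stored directory entry then also covers files created there later, which is a different meaning from listing each file.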

#323 (comment)

(FWIW, if watchman is enabled, the current jj would behave in that way thanks to a bug #2613.)

@ilyagr
Collaborator

ilyagr commented Oct 12, 2024

I'm skeptical about auto-updating .gitignore because of the complexity of ignore patterns and sub-directory layering. (I don't remember the exact rule, so I might be wrong.)

I was thinking of keeping it simple and listing each newly unignored file individually at the end of the .gitignore. The "newly unignored file" logic would only ever add new lines at the end when needed; it would never delete or edit them. (Though, the user could remove them en masse with jj abandon --newly-ignored-policy=delete).
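As a rough illustration of the append-only behavior, here is a sketch that operates on the text of a root-level .gitignore. The helper `append_unignored` is hypothetical. It assumes paths are relative to the directory holding the .gitignore and contain no characters that need escaping in gitignore syntax (such as `#`, `!`, glob characters, or trailing spaces).

```python
def append_unignored(gitignore_text, new_paths):
    """Append one anchored line per path; never edit or delete existing lines."""
    existing = set(gitignore_text.splitlines())
    additions = []
    for path in new_paths:
        # A leading "/" anchors the pattern to the .gitignore's own directory.
        line = "/" + path
        if line not in existing and line not in additions:
            additions.append(line)
    if not additions:
        return gitignore_text  # nothing new: leave the file byte-for-byte unchanged

    prefix = gitignore_text
    if prefix and not prefix.endswith("\n"):
        prefix += "\n"
    return prefix + "\n".join(additions) + "\n"
```

Because the new lines go last, they are evaluated after every hand-written rule in the same file, which is what lets this work without parsing the existing patterns.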

@yuja
Collaborator

yuja commented Oct 12, 2024

I was thinking of keeping it simple and listing each newly unignored file individually at the end of the .gitignore.

You mean "newly unignored file or directory"? If it were only files, tons of node_modules paths would be added to .gitignore. Since .gitignore is a file managed by humans, the user would expect a reasonably compact modification to be made.

@sheremetyev
Collaborator

I am not sure about treating .gitignore as "instruction what to do when a new file is just created". This would seem to imply that when the user edits .gitignore, we don't update the actual list of ignored files.

I don't think this is necessarily fatal to the idea. For example, after editing .gitignore, some files could go into a weird middle state and jj could start warning people about it. But it's not entirely clear to me what UI we'd have to help the user keep track of the difference between the .gitignore and the actual list of tracked files, and how we'd make this less surprising (when exactly do we warn the user about these kinds of files?).

@ilyagr, I think the mental model here is quite simple, and "jj status" is all the UI we need:

  • Jujutsu knows about all files in the working copy
  • any file on disk in the working copy is either tracked or untracked by Jujutsu
  • combination of .gitignore files in various directories defines what user wants to be untracked
  • list of files that exist on disk but are not tracked is a property of the working copy
  • most of the time there is no discrepancy: existing untracked files all match .gitignore patterns; existing tracked files don't match .gitignore patterns
  • "jj status" command lists all discrepancies: 1) untracked files that are outside of .gitignore, 2) tracked files matching .gitignore patterns
  • user looks at discrepancies in "jj status" and either a) updates .gitignore files to express what they want to track, b) manually track/untrack files that are in the wrong state, c) lives happily with dirty status
  • when user changes .gitignore files nothing is automatically tracked/untracked - only output of "jj status" changes
  • when new file is created, the patterns from .gitignore are applied and the new file gets tracked or added to the list of untracked files (hopefully correctly - if .gitignore files define what user wants)
  • when switching to another commit in the tree, the content of .gitignore files may change and there may be discrepancy with the list of untracked files on disk, but nothing gets automatically tracked (discrepancies are visible in "jj status") - user can safely switch back to previous commit and status will be clean
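The two kinds of discrepancy in the list above can be sketched as follows. This is only an illustration of the model, not jj code: `status_discrepancies` is a hypothetical name, and `fnmatchcase` is a crude stand-in for real .gitignore matching (its `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def matches_ignore(path, patterns):
    """Simplified stand-in for .gitignore matching."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def status_discrepancies(tracked, untracked, patterns):
    """Return (untracked files outside the patterns, tracked files matching them)."""
    untracked_not_ignored = sorted(
        p for p in untracked if not matches_ignore(p, patterns)
    )
    tracked_but_ignored = sorted(
        p for p in tracked if matches_ignore(p, patterns)
    )
    return untracked_not_ignored, tracked_but_ignored
```

Note that editing `patterns` changes only what this function reports. The `tracked` and `untracked` sets are inputs owned by the working copy and are never modified here, which mirrors the "nothing is automatically tracked/untracked" rule.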

@sheremetyev
Collaborator

sheremetyev commented Oct 12, 2024

This would seem to imply that when the user edits .gitignore, we don't update the actual list of ignored files.

That may be a good thing IMHO. If I make a mistake while editing .gitignore files, I'd rather not have files automatically tracked or untracked - to not complicate cleanup of the mess after my mistake.

@scott2000
Collaborator

scott2000 commented Oct 12, 2024

I think another issue with modifying .gitignore is that it's unclear which ignore file should be modified. Often when I ignore files, I only want them to be ignored for me and I don't want to modify it for my coworkers as well (because they're just notes or some files specific to me), so probably 90% of the time they go into .git/info/exclude instead of .gitignore. I also occasionally might want to add it to my global .gitignore instead of the repo-specific one. In the future, it also seems like it would be good to transition away from .gitignore and support .ignore as well, especially for the native backend.

If we were to go with the approach of modifying .gitignore, I think the safest option would be to have a workspace-specific, local ignore file that automatically gets modified when you track/untrack files, since it feels safest not to automatically add a change to a commit (since you might not intend to share that change with coworkers). But then at that point, I feel like it's basically equivalent to what I was describing with adding a list of untracked files/paths to the working copy itself.

@ilyagr
Collaborator

ilyagr commented Oct 14, 2024

@sheremetyev

  • when new file is created, the patterns from .gitignore are applied and the new file gets tracked or added to the list of untracked files (hopefully correctly - if .gitignore files define what user wants)

It is an excellent point that we can consider this. This is such a big change to the UI that I didn't consider it seriously, but I agree that it might be a good thing.

A lot of the complication of some of the ideas I described comes from trying to keep up with potential changes to .gitignore.

We already have the concept of .gitignore and a separate list of tracked files. If we can make the use cases work with just commands to manipulate these, that'd be great, and I think there's a chance a UI could uniformly treat different categories of tracked/untracked files.

when switching to another commit in the tree, the content of .gitignore files may change and there may be discrepancy with the list of untracked files on disk, but nothing gets automatically tracked (discrepancies are visible in "jj status") - user can safely switch back to previous commit and status will be clean

This is currently my biggest worry about this idea -- I'm not sure this will work (and I'm not sure that I follow your thinking).

For one, what if the new commit has a tracked file where the working copy had an ignored file? This situation would already be a problem today, but I think it'll become far more common with your setup. E.g. it would be potentially problematic if you start on the main branch of jj, then jj new gh-pages, and then jj new main-.

Update: This usecase would be problematic with other solutions as well. So, on second thought, I'm not sure whether this problem is worse for Fedor's approach than for any other. It might be worth thinking about anyway.

It seems likely to me that, if we want the user to manually resolve cases when .gitignore does not match the actual ignore files (or a subset of them that is not marked as scratch files), we would be forced to create a jj state where it's not OK to jj new to another commit.

@ilyagr
Collaborator

ilyagr commented Oct 14, 2024

@yuja, @scott2000

I want to keep this reply brief to give some space to Fedor's idea, but if we do not go with that idea:

  • For the list of newly unignored files, I'd try to put some energy into making it work with .gitignore first, for reasons Joy described, but if there are insurmountable problems, we could store it in the working copy. I think that later rules always override earlier ones in .gitignore, which is the main reason I hope that storing things in .gitignore might work without the ability to parse .gitignore syntax.

    If we are OK with the idea of "For efficiency, if all files in a directory are ignored, the directory path could be stored instead.", that would solve the problem of the .gitignore becoming a mile long as in #323 (comment). If we are not OK with it, that might be insurmountable, I'm not sure.

  • For which gitignore, a lot of the issues can be postponed if we just focus on the "newly unignored files" use case for now. I'm not sure what to do about workspace update-stale as I mentioned in #323 (comment).

  • Note also that currently, jj only supports .git/info/exclude in co-located repos and probably by accident. Addressing this is closely related in my mind to what we do about workspace-local ignores. I wish we had a better solution for both, but I'm not sure what it would be; see the comment I linked above for more links.

@ilyagr
Collaborator

ilyagr commented Oct 15, 2024

For @sheremetyev 's idea, I'm wondering what to do in co-located repos where Git will get the list of ignored files from .gitignore. It seems like, in a co-located repo, an ignored file that's not in .gitignore is OK, but a not-ignored file that is in .gitignore could be a problem.

I guess jj could/should silently git add --intent-to-add such files.

@scott2000
Collaborator

@ilyagr

If we are OK with the idea of "For efficiency, if all files in a directory are ignored, the directory path could be stored instead.", that would solve the problem of the .gitignore becoming a mile long

I think for .gitignore, this definitely isn't ok actually because they have different semantics. If I ignore dir/a.txt and it adds dir/ to the ignore file, then when I add "dir/b.txt" it would be surprising if it were ignored even though I didn't specify it as ignored. This same issue might also apply to storing the untracked file list in the working copy in some cases, so we'd have to be careful there as well.
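To make the difference concrete, here is a small sketch that compares an exact-file entry with a collapsed directory entry. The `is_ignored` helper is hypothetical and uses deliberately simple rules (exact match for files, prefix match for entries ending in `/`), not full .gitignore semantics.

```python
def is_ignored(path, entries):
    """Entries ending in "/" ignore everything beneath them; others match exactly."""
    for entry in entries:
        if entry.endswith("/"):
            if path.startswith(entry):
                return True
        elif path == entry:
            return True
    return False


exact = ["dir/a.txt"]   # what the user asked for
collapsed = ["dir/"]    # what a "store the directory instead" shortcut would write

# Both forms agree on the file that was explicitly ignored.
a_exact = is_ignored("dir/a.txt", exact)
a_collapsed = is_ignored("dir/a.txt", collapsed)

# They disagree on a file created later in the same directory.
b_exact = is_ignored("dir/b.txt", exact)
b_collapsed = is_ignored("dir/b.txt", collapsed)
```

Here `b_exact` is `False` while `b_collapsed` is `True`, which is exactly the surprising case: dir/b.txt ends up ignored even though nobody asked for that.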

Either way, I personally still don't feel very comfortable with jj automatically adding paths to the committed .gitignore file in general, because I like to keep my committed .gitignore file as clean as possible. For instance, I sometimes use comments to create section headers for files produced by different tools, and I like to add the simplest rule that can capture all of the necessary files, even if a longer list of files would have the same effect in practice.

I do see the value of such a command, but I would at least prefer for it to be something the user has to ask for explicitly (e.g. jj untrack --ignore <file> or jj ignore <file>) rather than something that could happen automatically during another operation.

@yuja
Collaborator

yuja commented Oct 15, 2024

  • For the list of newly unignored files, I'd try to put some energy into making it work with .gitignore first, for reasons Joy described, but if there are insurmountable problems, we could store it in the working copy. I think that later rules always override earlier ones in .gitignore, which is the main reason I hope that storing things in .gitignore might work without the ability to parse .gitignore syntax.

There may be a negative pattern in sub directory's .gitignore. I don't remember the rule, but I think sub-dir rule is prioritized? That's another complexity I had in mind.

If we are OK with the idea of "For efficiency, if all files in a directory are ignored, the directory path could be stored instead.", that would solve the problem of the .gitignore becoming a mile long as in #323 (comment).

It would probably become simpler if the working copy tracked ignored directory/file paths. If the whole directory was previously ignored, and if it is now un-ignored, record it as new ignore path. However, this means the working-copy tracks ignored paths, so I think it's easier to accumulate ignored paths instead of updating user-managed .gitignore files. We'll need jj status support and some other commands to update/reset the cache of ignored paths, but we can instead get away from --ignore-policy options, which seems equally unintuitive.

@martinvonz
Owner

There may be a negative pattern in sub directory's .gitignore. I don't remember the rule, but I think sub-dir rule is prioritized? That's another complexity I had in mind.

Yes, I think the rules are basically appended to the list from the parent and processed in reverse order, so later rules in a subdirectory or in an individual .gitignore file take priority.
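A simplified sketch of that "append, then scan in reverse" evaluation, under loud assumptions: `is_ignored` and the `layers` structure are hypothetical, `fnmatchcase` stands in for real .gitignore globbing, and Git's extra rule that a file cannot be re-included when its parent directory is excluded is not modeled.

```python
from fnmatch import fnmatchcase


def is_ignored(path, layers):
    """Decide whether `path` is ignored.

    `layers` is a list of (directory, patterns) pairs ordered from the
    repository root ("") down to deeper directories. A leading "!" negates
    a pattern. The last matching rule wins.
    """
    rules = []
    for directory, patterns in layers:
        prefix = directory + "/" if directory else ""
        if not path.startswith(prefix):
            continue  # this .gitignore does not govern the path
        rules.extend((prefix, pattern) for pattern in patterns)

    for prefix, pattern in reversed(rules):
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        relative = path[len(prefix):]
        basename = relative.rsplit("/", 1)[-1]
        if fnmatchcase(relative, body) or fnmatchcase(basename, body):
            return not negated
    return False


layers = [("", ["*.log", "build/*"]), ("docs", ["!keep.log"])]
```

With these layers, `docs/keep.log` is not ignored because the subdirectory's negative pattern is seen first in the reverse scan, while `docs/other.log` and `app.log` are still caught by the root `*.log` rule. This is why appending lines to the root .gitignore alone cannot override a negative pattern in a subdirectory.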

@sheremetyev
Collaborator

For one, what if the new commit has a tracked file where the working copy had an ignored file? This situation would already be a problem today, but I think it'll become far more common with your setup. E.g. it would be potentially problematic if you start on the main branch of jj, then jj new gh-pages, and then jj new main-.

@ilyagr that's a great example! I think it's reasonable to expect such a file to remain untracked, because it's explicitly listed in the working copy's list of untracked files. Switching back to the main branch would put the working copy in a good state (the ignored file is still on disk). The intermediate situation on the gh-pages branch is somewhat similar to the case where a file was committed previously, then the user decided to stop tracking it but wants to keep the file on disk.

IIUC, to achieve such behaviour, the list of untracked files in the working copy should take precedence over the list of tracked files in the current commit - when determining tracked/untracked status for a file.
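The precedence described here could look roughly like the sketch below. All names are hypothetical; the only point is the order of the two checks.

```python
def file_state(path, working_copy_untracked, commit_tracked):
    """Classify a file on disk.

    The working copy's untracked list is consulted first, so a file listed
    there stays untracked even if the checked-out commit tracks the same path.
    """
    if path in working_copy_untracked:
        return "untracked"
    if path in commit_tracked:
        return "tracked"
    # Neither list knows the file: the ignore patterns would decide next.
    return "undecided"


# A generated file that was left untracked while working on main ...
untracked_list = {"index.html"}
# ... and a commit (as on a gh-pages-style branch) that tracks the same path.
pages_commit_files = {"index.html", "style.css"}

state_on_pages = file_state("index.html", untracked_list, pages_commit_files)
```

In this example `state_on_pages` is `"untracked"`, so moving to the other commit and back again would not change what is recorded for the file.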

It seems likely to me that, if we want the user to manually resolve cases when .gitignore does not match the actual ignore files (or a subset of them that is not marked as scratch files), we would be forced to create a jj state where it's not OK to jj new to another commit.

IMHO it's an important advantage of UX in Jujutsu that user can always switch to another commit - would be nice to preserve it

@robinst
Collaborator

robinst commented Oct 18, 2024

Tip

Summary of current state for anyone landing here

If you use jj with git and want to not automatically track any files, since #4338 you can set this configuration:

jj config set --user 'snapshot.auto-track' 'none()'

Then to see untracked files:

git status

To add an untracked file:

jj file track myfile.txt

(The work remaining for this IIUC is to not require a git status to see untracked files.)

tmeijn pushed a commit to tmeijn/dotfiles that referenced this issue Oct 18, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [martinvonz/jj](https://github.com/martinvonz/jj) | minor | `v0.21.0` -> `v0.22.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>martinvonz/jj (martinvonz/jj)</summary>

### [`v0.22.0`](https://github.com/martinvonz/jj/releases/tag/v0.22.0)

[Compare Source](martinvonz/jj@v0.21.0...v0.22.0)

##### Breaking changes

-   Fixing [#4239](martinvonz/jj#4239) means the
    ordering of some messages has changed.

-   Invalid `ui.graph.style` configuration is now an error.

-   The builtin template `branch_list` has been renamed to `bookmark_list` as part
    of the `jj branch` deprecation.

##### Deprecations

-   `jj branch` has been deprecated in favor of `jj bookmark`.

    **Rationale:** Jujutsu's branches don't behave like Git branches, which
    confused many newcomers, as they expected a similar behavior given the name.
    We've renamed them to "bookmarks" to match the actual behavior, as we think
    that describes them better, and they also behave similar to Mercurial's
    bookmarks.

-   `jj obslog` is now called `jj evolution-log`/`jj evolog`. `jj obslog` remains
    as an alias.

-   `jj unsquash` has been deprecated in favor of `jj squash` and
    `jj diffedit --restore-descendants`.

    **Rationale:** `jj squash` can be used in interactive mode to pull
    changes from one commit to another, including from a parent commit
    to a child commit. For fine-grained dependent diffs, such as when
    the parent and the child commits must successively modify the same
    location in a file, `jj diffedit --restore-descendants` can be used
    to set the parent commit to the desired content without altering the
    content of the child commit.

-   The `git.push-branch-prefix` config has been deprecated in favor of
    `git.push-bookmark-prefix`.

-   `conflict()` and `file()` revsets have been renamed to `conflicts()` and `files()`
    respectively. The old names are still around and will be removed in a future
    release.

##### New features

-   The new config option `snapshot.auto-track` lets you automatically track only
    the specified paths (all paths by default). Use the new `jj file track`
    command to manually track paths that were not automatically tracked. There is
    no way to list untracked files yet. Use `git status` in a colocated workspace
    as a workaround.
    [#323](martinvonz/jj#323)

-   `jj fix` now allows fixing unchanged files with the `--include-unchanged-files` flag. This
    can be used to more easily introduce automatic formatting changes in a new
    commit separate from other changes.

-   `jj workspace add` now accepts a `--sparse-patterns=<MODE>` option, which
    allows control of the sparse patterns for a newly created workspace: `copy`
    (inherit from parent; default), `full` (full working copy), or `empty` (the
    empty working copy).

-   New command `jj workspace rename` that can rename the current workspace.

-   `jj op log` gained an option to include operation diffs.

-   `jj git clone` now accepts a `--remote <REMOTE NAME>` option, which
    allows setting a name for the remote instead of using the default
    `origin`.

-   `jj op undo` now reports information on the operation that has been undone.

-   `jj squash`: the `-k` flag can be used as a shorthand for `--keep-emptied`.

-   CommitId / ChangeId template types now support `.normal_hex()`.

-   `jj commit` and `jj describe` now accept an `--author` option to quickly change
    the author of the given commit.

-   `jj diffedit`, `jj abandon`, and `jj restore` now accept a `--restore-descendants`
    flag. When used, descendants of the edited or deleted commits will keep their original
    content.

-   `jj git fetch -b <remote-git-branch-name>` will now warn if the branch(es)
    cannot be found in any of the specified/configured remotes.

-   `jj split` now lets the user select all changes in interactive mode. This may be used
    to keep all changes in the first commit while keeping the current commit
    description for the second commit (the newly created empty one).

-   Author and committer names are now yellow by default.

##### Fixed bugs

-   Update working copy before reporting changes. This prevents errors during reporting
    from leaving the working copy in a stale state.

-   Fixed panic when parsing invalid conflict markers of a particular form.
    ([#2611](martinvonz/jj#2611))

-   Editing a hidden commit now makes it visible.

-   The `present()` revset now suppresses missing working copy error. For example,
    `present(@)` evaluates to `none()` if the current workspace has no
    working-copy commit.

##### Contributors

Thanks to the people who made this release happen!

-   Austin Seipp ([@thoughtpolice](https://github.com/thoughtpolice))
-   Danny Hooper ([@hooper](https://github.com/hooper))
-   Emily Shaffer ([@nasamuffin](https://github.com/nasamuffin))
-   Essien Ita Essien ([@essiene](https://github.com/essiene))
-   Ethan Brierley ([@eopb](https://github.com/eopb))
-   Ilya Grigoriev ([@ilyagr](https://github.com/ilyagr))
-   Kevin Liao ([@kevincliao](https://github.com/kevincliao))
-   Lukas Wirth ([@Veykril](https://github.com/Veykril))
-   Martin von Zweigbergk ([@martinvonz](https://github.com/martinvonz))
-   Mateusz Mikuła ([@mati865](https://github.com/mati865))
-   mlcui ([@mlcui-corp](https://github.com/mlcui-corp))
-   Philip Metzger ([@PhilipMetzger](https://github.com/PhilipMetzger))
-   Samuel Tardieu ([@samueltardieu](https://github.com/samueltardieu))
-   Stephen Jennings ([@jennings](https://github.com/jennings))
-   Tyler Goffinet ([@qubitz](https://github.com/qubitz))
-   Vamsi Avula ([@avamsi](https://github.com/avamsi))
-   Yuya Nishihara ([@yuja](https://github.com/yuja))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
@simonmichael

simonmichael commented Oct 23, 2024

Thanks to @martinvonz and all the discussers for working on this. I'm able to use jj now because of the new setting.
