Auto-add of untracked files screws me up every time #323
Comments
Agreed on that — I think automatically committing tracked files is fine, but untracked is probably bad:
|
I mostly find the feature useful, but I also agree that it can be annoying and confusing. The worst case I've noticed is when you - perhaps accidentally - check out the root commit, where there's no When looking at an old version of the repo, you'll not see untracked files (e.g. |
I plan to use this matcher for some future `jj add` command (for #323). The idea is that we'll do a path-restricted walk of the working copy based on the intersection of the sparse patterns and any patterns specified by the user. However, I think it will be useful before that, for @arxanas's fsmonitor feature (#362).
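To make the "intersection of the sparse patterns and any patterns specified by the user" idea concrete, here is a minimal sketch in Python. It is not jj's actual matcher code: `PrefixMatcher`, `IntersectionMatcher`, and `restricted_walk` are hypothetical names, and patterns are assumed to be plain directory prefixes rather than real sparse patterns or filesets.

```python
from typing import Iterable, List


class PrefixMatcher:
    """Matches paths equal to, or nested under, any of the given prefixes.

    Hypothetical stand-in for a sparse-pattern or user-pattern matcher.
    """

    def __init__(self, prefixes: Iterable[str]) -> None:
        self.prefixes = [p.strip("/") for p in prefixes]

    def matches(self, path: str) -> bool:
        return any(
            prefix == "" or path == prefix or path.startswith(prefix + "/")
            for prefix in self.prefixes
        )


class IntersectionMatcher:
    """Matches only the paths that both underlying matchers accept."""

    def __init__(self, left, right) -> None:
        self.left = left
        self.right = right

    def matches(self, path: str) -> bool:
        return self.left.matches(path) and self.right.matches(path)


def restricted_walk(paths: Iterable[str], matcher) -> List[str]:
    """Keep only the working-copy paths the matcher accepts.

    A real walk would also prune whole directories early; this sketch
    just filters a flat list of file paths.
    """
    return [p for p in paths if matcher.matches(p)]


sparse = PrefixMatcher(["src", "docs"])      # assumed sparse patterns
user = PrefixMatcher(["src/lib", "tests"])   # assumed user-specified patterns
both = IntersectionMatcher(sparse, user)
files = ["src/lib/a.rs", "src/main.rs", "tests/t.rs", "docs/x.md"]
print(restricted_walk(files, both))
```

Only paths accepted by both matchers survive, so a user pattern outside the sparse checkout (here `tests`) selects nothing.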
Just to throw my 2c in: With the exception of new files that are also added to |
Just wanted to say that I really like the auto-tracking feature as well. After using jj for a while, going back to manually having to think about adding files to be tracked seems like a lot of extra work and possibly more error prone. I'm already used to checking |
The thing with accidentally committing GBs of Especially if you have something automatically creating it, like The only way is to recreate the repo (losing the oplog) this time not forgetting to ignore stuff/not editing old commits - which is kind of meh as well. Full GC of course means basically the same thing, only automated by a single command, but something like |
Maybe it is possible to track ignores in P.S. Can think of how |
Hm, so auto-tracking when you are doing changes in a working commit is the main killer feature. But for my issue with it, how about this: I know this is kind of magical, but the more I think about it the more it makes sense, idk - and the autorebase is magical on its own, this somehow is kind of even consistent in my head |
It would be expensive to find the gitignores from all commits, but we could probably index that information. I'm more concerned that it would be unexpected behavior. For example, if you check out a sibling commit where Maybe it's better to check if the gitignores changed between the old and the new commit and if any untracked files according to the new patterns match the old patterns. If that happens, we could just print a warning about it. We could additionally add the ignores to a per-workspace set of ignores (which we don't support yet). (EDIT: I think this is what @ony suggested.) |
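The check described above could look roughly like the following Python sketch. It is only an illustration under loud assumptions: `is_ignored` and `newly_unignored` are hypothetical helpers, and matching uses Python's `fnmatch` on whole paths and single path components, which is much simpler than real gitignore semantics (no negation, no anchoring, no nested ignore files).

```python
from fnmatch import fnmatch
from typing import Iterable, List


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Simplified stand-in for gitignore matching (assumption, not jj's logic)."""
    parts = path.split("/")
    for pattern in patterns:
        pat = pattern.rstrip("/")
        # Accept a match on the whole path or on any single path component.
        if fnmatch(path, pat) or any(fnmatch(part, pat) for part in parts):
            return True
    return False


def newly_unignored(
    untracked: Iterable[str],
    old_patterns: Iterable[str],
    new_patterns: Iterable[str],
) -> List[str]:
    """Files the old commit's ignores covered but the new commit's do not."""
    old_patterns = list(old_patterns)
    new_patterns = list(new_patterns)
    return [
        path
        for path in untracked
        if is_ignored(path, old_patterns) and not is_ignored(path, new_patterns)
    ]


old = ["target/", "*.log"]   # ignores in the commit we are leaving
new = ["*.log"]              # ignores in the commit we are moving to
on_disk = ["target/debug/app", "build.log", "notes.txt"]

for path in newly_unignored(on_disk, old, new):
    print(f"warning: {path} was ignored before and would now be auto-tracked")
```

The returned list is what a warning, or a per-workspace ignore set, would be built from.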
The warning being "those differences are implicitly untracked for this WC, in case your direnv caused GBs of files to generate in target and/or .direnv - add them to gitignore here or explicitly track them, moving to another commit without them ignored (e.g. ^ this is a loose idea, could be refined, for example |
Following @ony's suggestion, perhaps when the working copy moves to a new commit, we could track "newly unignored" files by comparing the old Then, the UI could provide ways for dealing with these files. E.g. there could be a command like |
I ran into this today when trying to checkout to a different branch that doesn't contain |
Ahh ignore my comment, I think I got confused - once there is an option to not auto-add untracked files |
You can pass
jj op log --ignore-working-copy # "jj op log" also works with the current main branch
jj op restore --ignore-working-copy @-
jj workspace update-stale --some-option-to-not-snapshot-before-resetting |
Overall, it seems there is no good automated way to handle untracked files when creating commits. On one hand, not tracking them leads to incomplete commits. On the other hand, auto-tracking them leads to committing unwanted files. So how would you feel about some variation of the following semi-automated design?
I think this might strike a good balance between the following concerns:
For scripted operation, there should be a way to provide a default answer to the prompt via CLI arguments. |
I'm personally quite happy with the current behavior (except for the behavior when updating to a commit with different .gitignore). It can be a bit annoying in the beginning, but once you've added the appropriate paths, I find that it works pretty well. Maybe others feel differently. But even if they don't, we may want to make it less annoying for new users by doing something like you suggest. |
I'm not very happy with the idea of the interactive prompt. I think that if you edit a Other UIs will also do an analogue of |
Ah, yes, there's that. I knew that this design decision of having status commands modify the repository was fishy and going to cause problems someday... |
This feature has sadly made me bounce off of jj immediately every time I try it, which is really unfortunate because I keep hearing such good things about it, and want to give it a genuine try. Every single repository I work on regularly has various testing/strace-log/whatever files in its root. I actually don't mind auto tracking in fwiw a workaround for this that I've not yet checked works on jj might be some kind of terrible thing like so in the
This workaround is quite bad indeed, and I would rather not have to reimplement the git index in |
I have no idea whether this would be helpful to you, but here's something that helped me a lot. I can't remember who had suggested it originally; it might be in the FAQ.
|
Yup it is in the FAQ or something; I've seen it given as advice before. I just don't like it and it doesn't vibe with how I work, since it would be a whole bunch of extra typing. I could have it be i/ or something, I guess, to reduce typing, but I would still have to remember to do it every time, which feels kind of bad? |
Inspired by more feedback (#3528 (comment)), perhaps @dpc 's suggestion from that post might work. Perhaps we could have a notion of "untracked" files, like Git, and default files to "untracked", while also auto-updating all the tracked files on each command?
One question is what This would be a huge change, so I almost certainly missed some important considerations. |
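As a rough illustration of "default files to untracked, while also auto-updating all the tracked files on each command", here is a small Python sketch. The `snapshot` function and the dict-based working copy are hypothetical simplifications, not how jj stores trees, and ignore rules are left out entirely.

```python
from typing import Dict, Set, Tuple


def snapshot(
    tracked: Set[str], working_copy: Dict[str, str]
) -> Tuple[Dict[str, str], Set[str]]:
    """Snapshot only tracked paths; report everything else as untracked.

    `working_copy` maps path -> file contents (an assumption for the sketch).
    Tracked files that were deleted on disk simply drop out of the new tree.
    """
    tree = {path: data for path, data in working_copy.items() if path in tracked}
    untracked = set(working_copy) - tracked
    return tree, untracked


tracked_paths = {"a.txt", "gone.txt"}
on_disk = {"a.txt": "edited contents", "scratch.orig": "leftover from patch"}

tree, untracked = snapshot(tracked_paths, on_disk)
print(tree)       # edits to tracked files are picked up automatically
print(untracked)  # new files stay out until explicitly tracked
```

Under this model a stray `.orig` file never enters a commit on its own, while edits to tracked files still need no explicit add step.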
Speaking out of ignorance, I'm guessing Hmm... I guess |
If we add support for untracked files, I think it should be pretty much only
I don't think so. Almost all commands, and probably also the future |
For what it's worth, let me add a vote for changing the current behavior. While I understand that a lot of current users might really enjoy it, I agree with @durin42 that jujutsu's current default behavior is problematic in several ways (and in fact it tripped me the very first time I tried jj). However, if you often create private files that you want to keep in your working directory without tracking them (i.e. files that you never want to share and which you don't want to disappear as you move to another changeset), the current default and the |
I'm afraid you have not read the whole discussion. You should also read #4338. This mega issue has now transitioned to discussing #5138. If you really want to start a "vote" to change the default (you do not), then create a new, specific issue. This issue has gone in too many directions and become too big and unreadable.
The consistent answer from some members of the I'm not a fan of this either but getting |
With #5138 |
Revision of previous stance
I originally commented two years ago arguing against snapshotting untracked files. I no longer think that, for two reasons:
Design considerations
I skimmed through the thread and tried to collect the important design considerations:
Let me know if I missed anything.
Unifying designs
While I'm fine with the current set of workarounds (
Example alternative design
To demonstrate similarity for a general design, I'll claim that untracked files aren't that different from LFS files:
Here's one example of a way to generalize the design (but note that it's not a fully-fleshed out design). Suppose we added object types for each of the following (among possibly others from the previously-linked listing):
Evaluation
If untracked/ignored/precious files are represented similarly to normal files in the snapshots, then you also resolve most of the above design considerations:
Versus other proposed designs
In contrast, I am less favorable towards a solution where we automatically update I'll also just remark that "untracked files" are pretty similar to "files which aren't checked out due to the sparse checkout configuration":
|
@arxanas are you arguing for snapshotting of untracked files in the current commit or in the state of working copy? I think snapshotting in working copy would give a reasonable behaviour (#323 (comment)) |
Does this representation require different data than the existing backends save? If so, does that mean that data is only stored locally? |
Deep and interesting thoughts, except maybe for this:
Errr... no: "not shared/published yet" is definitely not the same as "losing work". That's a serious misrepresentation. The funny thing is: this type of accident is absolutely nothing new. Whether it's the better or worse design choice, most (all?) VC systems have allowed it to happen so far. The only reason we're discussing it is because
Very good point: with the (welcome) rise of markup languages and treating everything "as code", VC is becoming more popular than ever and reaching new types of users.
Agreed: you could for instance publish a book accidentally missing a section. This requires a lack of automated checking of cross-references, a poor proof-reading process not even noticing that table of contents don't match, etc. but none of these is hard to imagine. For developers on the other hand, catching a forgotten source file does not even require any CI system in practice. If that new file serves some actual purpose (other than hijacking a shared repo to backup private resources...), then the miss is immediately noticed the moment ANYone or anything (not just CI) tries to compile or run tests etc[*]. Many developers have first hand experience with this (= exactly why So looking back, making this behavior configurable was practically necessary for There is also a profile of [*] I'm not considering one-person projects that don't even have any sort of automated CI either. These have much, much bigger test issues than autotracking. |
So, since our discussions I've been using jj a lot (for everything), and I only recently - like a week or two recently - finally set the And I've found the experience worse so far, I forgot to track files several times. I probably got used to autotrack and stopped having random files in my repos?. My personal issues were things like tracking a massive target/ folder or not having junk in oplog - for the first one I was saved multiple times by the 1mb heuristic (which I'm pretty sure I inspired here in gh comments somewhere), and the second one is just annoying ocd type of thing that I'm learning to ignore - same with jj snapshots of unfinished/messy work in general. Also for massive target/ or other "incomplete gitignore"-type issues - they usually occur once and at the very worst you can nuke and reclone/recreate the repo. For the secrets that people keep bringing up - bruh I never in my life had secrets in source files, what (at least not those I'm ready to leak because they are for localhost env or smth) I was a |
To reiterate, in my previous comment, I was mainly proposing that we should establish a single unified design with a minimal number of additional concepts, rather than special-casing behavior for untracked files, as we've done so far; and then we should reuse the design for many other kinds of files. In the comment below, I also point out what I consider to be remaining UX issues, which I should have raised first.
I'm not in favor of specializing the working copy to handle untracked files. I think it would be better to embed the concepts into commits generally. Some of the semantics (like when switching commits) will probably be the same regardless of whether we literally embed untracked files/handles into commits.
I didn't flesh out the design too much, so there's probably multiple implementation routes:
Assuming that —
then I don't see a way for the VCS to avoid worrying about precious files.
I'll admit that "untracked" is perhaps the weakest-justified variant of the kinds of files in the working copy:
I agree that a "not shared/published yet" state is definitely not the same as "losing work", but untracked files (neither in Git nor jj) do not represent a "not shared/published yet" state. They certainly should. Actually I don't recall seeing a UI affordance in this thread that would help mitigate this issue? Maybe somebody already proposed this in the thread? I think the combination of
would help both users who prefer to auto-track and those who prefer not to auto-track. I believe the current set of solutions let users opt into or out of auto-tracking entirely, but it seems like a local maximum to unblock today's users, rather than the ideal UX. The combination of the above is actually independent of the design and implementation, but in my previous post, I focused on the design and implementation rather than what I consider to be a remaining UX issue, which was probably a rhetorical mistake. (There are some UX reasons to slightly prefer embedding untracked files in commits, rather than strictly as a property of the working copy. For example: when you later discover that you meant to track a file, and you've already created a stack of commits, then you can automatically add the file to the commit which logically introduced the file, rather than having to figure out which commit should contain it later.)
Perhaps you didn't intend it this way, but the wording here seems to underplay the severity of failing to push work.
|
I do intend to correct your dramatization.
No one disputes that this is an inconvenience. The real questions are "how often?" and "how bad?" I worked for a few decades with many different engineers and none complained like it was a serious issue. To be fair, none were exposed to autotracking yet and I bet some will like it (especially the
I wasn't going to reply except for this other, new misrepresentation. When you carelessly delete git clones, you lose much more than uncommitted files: you ALSO lose unpushed branches and stashes! These are 1) more likely to be present than uncommitted files 2) likely losing more work than uncommitted files. git clones MUST always be carefully inspected before deletion no matter what. Autotracking makes a very small difference to that. If you carelessly delete git clones then you get what you deserve - autotracking or not. |
I said this before, @marc-h38, but I really don’t think your tone here is appropriate. You’ve now accused multiple long‐term contributors and users who have engaged at length with the issues surrounding file tracking of arguing in bad faith and lacking experience. We’ve had extensive and fruitful discussions with people strongly opposed to automatic file tracking like @AngelEzquerra, and this is one of the most extensively‐discussed issues in Jujutsu, but I am disappointed that you have repeatedly chosen to turn up the temperature on the discussion. To be clear, I don’t at all mind people having strong opinions on this matter; it’s comments like this I find unacceptable (and I am not saying that I think that none of these could ever be reasonably said – but I do think that they are unwarranted by the discussions here):
I have no position of power in the project beyond the merge bit and am speaking solely for myself here; I just think it is harmful for the project if this kind of thing goes by unremarked upon, and stifles productive discussion of the options and trade‐offs. I think you’ve made plenty of useful and constructive contributions to the discussion here, but if you’re going to continue, please make a greater effort to be civil and not treat conversations as fights. FWIW, @arxanas, I really appreciate your detailed exploration of the design space as always, and think that explicitly representing untracked files may be a really nice way forward that gets much closer to satisfying everyone. |
Thanks @emilazy , I do not believe @arxanas 's presentation of this particular issue was in bad faith, sorry if a bad choice of words gave that impression. I admit I'm getting tired of it being repeatedly presented as a scary and critical issue when it has actually been the (good or bad) routine for the last few decades. This tiredness has negatively affected my tone, apologies. |
"it has been the routine for the last few decades" is not the best argument imo. jj explores new things without the burden of having decades of said legacy and so far the autotrack - while being a contentious point - has been overall a net positive in my opinion and opinions of lots of people, and as you might remember I wasn't the biggest fan. I do agree that some people (cough, @PhilipMetzger, cough - if I'm not super-misremembering) did present no-autotrack as electronic satan at times, but this whole argument often devolves into philosophical battles that are unwinnable, because autotracking and not autotracking are both valid strategies with pros and cons, and I think in the context of jj autotracking makes more sense. People who have established workflows that depend on it, or are not ready for it (like I was I feel like) got the config option (that I argued strongly for). I agree that some person who have never heard of a VCS before, starting with jj from scratch, would be better off with autotrack being the default state of things, and they'll learn the pitfalls - the only major one being accidental big files imo, and that's covered by the heuristic thing. And if they committed and pushed some secret they'd do the same thing in git, I think in discord I said that I see the chances of that happening as equal between jj and git. |
To the questions of "how often?" and "how bad?", I propose the answers are "often" and "it concretely and noticeably impedes productivity". Empirical evidence from Gap Analysis of the Scholarly Git Experience (Nguyen 2021) suggests that novices are constantly doing this, specifically that "all" participants (n = 44?) had trouble with
It's clear from the available evidence that the default UX is not good, and simply doesn't match user expectations, and that these issues arise quite regularly in practice.
In addition: Many experienced engineers leverage the existing Git behavior to implement 'secret'/'ignored'/'precious' workflows.
In terms of how to resolve the UX problem, we can look at What’s Wrong with Git? A Conceptual Design Analysis (De Rosso, Jackson 2013), which provides ideas on how to construct a framework to evaluate potential solutions. To give an example:
[technical note] This specific quote is referring to collapsing "assume-unchanged" and "untracked" into the same state. I'm actually arguing that we don't do exactly that, but my preferred design still tries to preserve 'generality' by generalizing across various file states and 'propriety' by deferring categorization/state transitions until later. [general note] We could adopt the same principles and evaluate our solutions under them, or establish our own. I proposed design principles and concrete workflows in my earlier comment, although at a lower level of abstraction than in this paper. [specific note] Regarding auto-tracking in its current form (including
[micro-critique] I have many quibbles about the argumentation here but ended up removing them during editing of this comment. [macro-critique] Basically, this viewpoint seems to boil down to "the operator should exercise more care". I don't think this is a productive way to approach improving operational safety. I don't know if there's a name for this philosophy, but typically, I don't expect processes to improve simply by telling the operators that they should be more careful. Instead, I rely on improvements to the processes themselves, which is primarily what we're discussing in this thread. Here I am proposing: "this is a specific way that I could avoid losing work under these situations that happen to me personally, and I believe others".
It's not that it's necessarily incorrect (well, I have quibbles) to say something like "you are being careless", but it's not useful, either.
From an argumentation perspective, I think there are two problems here:
EDIT: By the way, statements like "I do intend to correct your dramatization." seem pretty hostile to me. As an example, you could instead say something like "I believe you're overstating the severity of this problem in practice." to communicate the same (?) information. Since I don't know you or your communication style personally, it's quite difficult for me to ascertain if you meant that statement to be less emphatic than I interpreted it to be. |
Thanks for highlighting cases such as this. I work in a repo which, for legacy reasons, has a galactic number of untracked files during/after a build (to the point of basically requiring Having to set that in the config is not an unreasonable burden, IMO. In the longer term, bringing the associated machinery up to par with git (forcing the untracked files back into visibility with |
Thanks for pointing out your use-case. It sounds relevant for the "Performance: handle scanning many files" criterion. Some questions:
If I were to implement some "first-class untracked-files" solution like I was proposing, the necessity of |
According to
Yeah, they are woven into the whole file structure.
Nope. I'm pretty new to
That'd be correct, in my case at least. The untracked files are build artifacts, and including them in commits is wholly undesirable, and would result in pretty bad bloat of the repository.
For the record, I accept that repositories like this are out of the ordinary, and require extra work/care when it comes to the VCS tool. Forgetting to |
I unfortunately don't have the time to read and understand the whole paper, so thanks for quoting that bit. It shows that tracking new files is just a part of a much bigger issue: a general lack of UI design [*]
I've been the local git help desk for many years and I couldn't agree more that its UI is awful. Some other good references come to mind:
The git UI is also why I got curious about However, I've trained and supported many more than 44 engineers over the years and this specific question of tracking brand new files has never been anywhere near the top issues. It simply never hurts much in the greater git UI picture and was very rarely discussed. Yes it does happen from time to time but it's simply not a big deal, that's all. I quickly skimmed your reference and I could not find any mention of this very specific issue either; the paper tries to stay high-level. If
No, it boils down to: "the operator should exercise more care WHEN DELETING A WORKSPACE". This is similar to "the operator should exercise more care when driving a vehicle" or "the operator should exercise more care when handling a gun". The more important word here was not "careful" but "delete" and you dropped it, why?
Yes, even for dangerous actions like deleting (or driving, or shooting,...) it is still possible to improve processes. For instance: moving to the "Trash" instead and waiting 30 days or something. Or: using backups and testing them regularly. But even with the best processes, dangerous actions remain dangerous and dangerous = careful! Users should never, ever be misled into thinking that casually deleting a WORKspace is fine because this can hurt really bad for many other reasons than autotracking new files or not. Deleting workspaces and autotracking new files are two topics completely unrelated to each other.
There is because people don't just sit and watch recurring disasters for decades; this is just not human nature. Version control is very old now and many people have been hard at work to make it safer and easier to use - both in closed-source and open-source tools. Even git has made some efforts on that front, both git itself and its many front-ends. So, if forgetting to track brand NEW files were still scary, critical and frequent, then some solutions would have been at least discussed and proposed. I've kept an interest and eye on that field and I have not seen that very specific question raise interest before. For most users, From @necauqua
This is admirable and exciting but such situations always have some risk of getting carried away and wrongly assuming that past tools and approaches have been completely myopic and static and dismissing "critical" issues, not even discussing them much. Generally not the case.
True, but I seem to have more git support experience than the authors of your reference. So I think it's worth sharing.
Looks very much like "unilaterally establishing" to me; especially for "losing work" which is really not the same as temporarily forgetting to share new files.
Considering how new Also, it's just one of the problems. [*] This is a bit off-topic sorry but besides a poor UI there are other, high-level issues why git is hard:
|
Actually: accidentally sharing wrong files is exactly how you spot git users who regularly (mis)use
|
Now that we have |
Just for the record, #5138 doesn't handle untracked files in untracked directories. I don't think we'll need a separate issue for that, but the |
Actually not in tracked directories either ¯\_(ツ)_/¯ |
Description
Initially I thought the auto-add of files was a neat idea, but in practice I just leave untracked files in my repo all the time, and tools like `patch`(1) assume they can drop `.orig` or similar files in the WC without it being a problem. I think every time I've used `jj` I've ended up getting grumpy at auto-adds and having to rip something out of a commit, sometimes after doing a push (when it was effectively emulating `hg import` for example).
Steps to Reproduce the Problem
`patch -p1 < some.diff` (or similar)
`jj describe && jj close && jj git push`
Expected Behavior
I (still) don't expect auto-adds, and it really surprises me every time, plus it's super frustrating to have to ignore every file spec I might create as a temporary scratch file or what have you.
Actual Behavior
As above.
Specifications