-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rebase -i: fix re-reading the todo list when newly created objects are referenced #2121
Conversation
Cc @peff (because there might very well be more use cases where the loose object cache goes stale and would need to either be blown away, or simply disabled). |
Can you clarify the symptoms of the bug a little more? Specifically, which use of the cache is having problems? Since the cache is otherwise only used for |
By the way, if the symptom is in the abbrev code, I suspect the problem you are seeing would pre-date the recent loose-object-cache work. That had its own cache since cc817ca (which couldn't even be invalidated!). I'm guessing it's coming up now because of moving |
This is the symptom after appending some lines to the todo list:
I did single-step through one of those, and it fails at https://github.com/git-for-windows/git/blob/v2.21.0.windows.1/sequencer.c#L2132:
I had not actually debugged further, but instead I came up with the fix in this PR, on the hunch that it might be the loose object cache. And it seems to have fixed the problem... I guess I really should add a regression test ;-) |
Hmm. That |
But it would... this is the backtrace:
... and a patch to demonstrate the problem (which I will submit to the Git mailing list Real Soon Now): diff --git a/t/t3429-rebase-edit-todo.sh b/t/t3429-rebase-edit-todo.sh
index b9292dfc2a..d78f057f35 100755
--- a/t/t3429-rebase-edit-todo.sh
+++ b/t/t3429-rebase-edit-todo.sh
@@ -11,4 +11,28 @@ test_expect_success 'rebase exec modifies rebase-todo' '
test -e F
'
+test_expect_success 'loose cache does not interfere with re-reading todo list' '
+ GIT_REBASE_TODO=.git/rebase-merge/git-rebase-todo &&
+ export GIT_REBASE_TODO &&
+ write_script append-todo.sh <<-\EOS &&
+ # It was determined manually that the following fast-import generates
+ # SHA-1s with the same first two digits for the command-line arguments
+ # 5 and 6:
+
+ git fast-import --quiet <<-EOF
+ commit refs/tags/ref$1
+ committer A U Thor <author@example.org> $1 +0000
+ data <<EOM
+ $1
+ EOM
+ EOF
+
+ echo "pick $(git rev-parse --short ref$1)" >>$GIT_REBASE_TODO
+ test $1 -gt 6 ||
+ echo "exec $0 $(($1+1))" >>$GIT_REBASE_TODO
+ EOS
+
+ git rebase HEAD -x "./append-todo.sh 5"
+'
+
test_done |
Ah, right, of course. It triggers the abbrev code for a lookup. So I do think this was probably broken for some time due to the old cache. And has probably been broken forever in a racy way if somebody was simultaneously repacking. We'd probably want something like this: diff --git a/sha1-name.c b/sha1-name.c
index 6dda2c16df..cfe5c874b6 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
find_short_packed_object(&ds);
status = finish_object_disambiguation(&ds, oid);
+ /*
+ * If we didn't find it, do the usual reprepare() slow-path,
+ * since the object may have recently been added to the repository
+ * or migrated from loose to packed.
+ */
+ if (status == MISSING_OBJECT) {
+ reprepare_packed_git(the_repository);
+ find_short_object_filename(&ds);
+ find_short_packed_object(&ds);
+ status = finish_object_disambiguation(&ds, oid);
+ }
+
if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
struct oid_array collect = OID_ARRAY_INIT;
That fixes your test, though I'm not sure if it's entirely safe to call those |
But wasn't part of the idea behind the loose object cache to accelerate the awful performance on NFS when trying to find the same object over and over again, even if we already know that it does not exist? That fix would regress on that, unless I am mistaken. |
FYI my current (unsubmitted) iteration is here: gitgitgadget#161 |
I am fairly certain that it is safe. But I still think that we should do this only if we also introduce a flag that is set by any loose object-writing (or -deleting) code path, and we should only then blow away the loose object cache. |
No, the caller has to opt into the "quick" mode explicitly, with the intent that we would not introduce these kinds of cache-invalidation bugs. So And I'd say the same thing should apply here. We do not currently have a |
Fair enough! |
That would actually not work in the regression test that I added, as the objects are written in a different process. D'oh. |
FWIW I updated that PR, and the actual fix is this one: gitgitgadget@3be9170 |
We specifically support `exec` commands in `git rebase -i`'s todo lists to rewrite the very same todo list. Of course, we need to validate that todo list when re-reading it. It is also totally legitimate to extend the todo list by `pick` lines using short names of commits that were created only after the rebase started. And this is where the loose object cache interferes with this feature: if *some* loose object was read whose hash shares the same first two digits with a commit that was not yet created when that loose object was created, then we fail to find that new commit by its short name in `get_oid()`, and the interactive rebase fails with an obscure error message like: error: invalid line 1: pick 6568fef error: please fix this using 'git rebase --edit-todo'. Let's first demonstrate that this is actually a bug in a new regression test, in a separate commit so that other developers who do not believe me can cherry-pick it to confirm the problem. This new regression test generates two commits whose hashes share the first two hex digits (so that their corresponding loose objects live in the same subdirectory of .git/objects/, and are therefore supposed to be in the same loose object cache bin). It then picks the first, to make sure that the loose object cache is initialized and cached that object directory, then generates the second commit and picks it, too. Since the commit was generated in a different process than the sequencer that wants to pick it, the loose object cache had no chance of being updated in the meantime. Technically, we would need only one `exec` command in this regression test case, but for ease of implementation, it uses a pseudo-recursive call to the same script. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The interactive rebase simply complains about an "invalid line" when the object hash of, say, a `pick` line could not be parsed. Let's tell the user what happened in a little more detail. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is quite possible that the loose object cache gets stale when new objects are written. In that case, get_oid() would potentially say that it cannot find a given object, even if it should find it. Let's blow away the loose object cache as well as the read packs and try again in that case. Note: this does *not* affect the code path that was introduced to help avoid looking for the same non-existing objects (which made some operations really expensive via NFS): that code path is handled by the `OBJECT_INFO_QUICK` flag (which does not even apply to `get_oid()`, which has no equivalent flag, at least at the time this patch was written). This incidentally fixes the problem identified earlier where an interactive rebase wanted to re-read (and validate) the todo list after an `exec` command modified it. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
34d9f62
to
119c983
Compare
This was accepted into |
rebase -i: fix re-reading the todo list when newly created objects are referenced
`git rebase -i` used to get confused when an `exec` command created new commits and then appended `pick` lines for them. This [has been fixed](git-for-windows/git#2121). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
…-list rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
rebase -i: fix re-reading the todo list when newly created objects are referenced
In the
ever-green.sh
script, we make use of an obscure, yet powerful feature of the interactive rebase: we use anexec
command to augment the todo list (to allow for a 2-phase rebase to be executed in a single one, so that the second phase will be executed even if a merge conflict had to be resolved in the first phase).However, this poses a challenge with the new loose object cache, as we might reference newly-created commits in that new todo list part, and the loose object cache might pretend that those commits do not exist.
Let's fix that.