Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse-checkout: take en/sparse-checkout early #267

Merged
merged 21 commits into from
May 14, 2020

Conversation

derrickstolee
Copy link
Collaborator

These commits are in next and are valuable to us. In particular, it does the right thing when presented with a dirty status.

Resolves #234.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 08402b0 ("merge-recursive: distinguish "removed" and
"overwritten" messages", 2010-08-11) split
    ERROR_WOULD_LOSE_UNTRACKED
into both
    ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN
    ERROR_WOULD_LOSE_UNTRACKED_REMOVED
and also split
    ERROR_WOULD_LOSE_ORPHANED
into both
    ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN
    ERROR_WOULD_LOSE_ORPHANED_REMOVED

However, despite the split only three of these four types were used.
ERROR_WOULD_LOSE_ORPHANED_REMOVED was not put into use when it was
introduced and nothing else has used it in the intervening decade
either.  Remove it.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify_absent_sparse() was introduced in commit 08402b0
("merge-recursive: distinguish "removed" and "overwritten" messages",
2010-08-11), and has always had exactly one caller which always passes
error_type == ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN.  This function
then checks whether error_type is this value, and if so, sets it instead
to ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN.  It has been nearly a decade
and no other caller has been created, and no other value has ever been
passed, so just pass the expected value to begin with.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit e091228 ("sparse-checkout: update working directory
in-process", 2019-11-21) allowed passing a pre-defined set of patterns
to unpack_trees().  However, if o->pl was NULL, it would still read the
existing patterns and use those.  If those patterns were read into a
data structure that was allocated, naturally they needed to be free'd.
However, despite the same function being responsible for knowing about
both the allocation and the free'ing, the logic for tracking whether to
free the pattern_list was hoisted to an outer function with an
additional flag in unpack_trees_options.  Put the logic back in the
relevant function and discard the now unnecessary flag.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
check_updates() previously assumed it was working on o->result.  We want
to use this function in combination with a different index_state, so
take the intended index_state as a parameter.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a path is dirty, removing from the working tree risks losing data.
As such, we want to make sure any such path is not marked with
SKIP_WORKTREE.  While the current callers of this code detect this case
and re-populate with a previous set of sparsity patterns, we want to
allow some paths to be marked with SKIP_WORKTREE while others are left
unmarked without it being considered an error.  The reason this
shouldn't be considered an error is that SKIP_WORKTREE has always been
an advisory-only setting; merge and rebase for example were free to
materialize paths and clear the SKIP_WORKTREE bit in order to accomplish
their work even though they kept the SKIP_WORKTREE bit set for other
paths.  Leaving dirty working files in the working tree is thus a
natural extension of what we have already been doing.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a populate_from_existing_patterns() function for reading the
path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use
it elsewhere.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the only way to update the SKIP_WORKTREE bits for various
paths was invoking `git read-tree -mu HEAD` or calling the same code
that this codepath invoked.  This however had a number of problems if
the index or working directory were not clean.  First, let's consider
the case:

  Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files)

If the working tree was clean this was fine, but if there were files or
directories or symlinks or whatever already present at the given path
then the operation would abort with an error.  Let's label this case
for later discussion:

    A) There is an untracked path in the way

Now let's consider the opposite case:

  Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files)

If the index and working tree was clean this was fine, but if there were
any unclean paths we would run into problems.  There are three different
cases to consider:

    B) The path is unmerged
    C) The path has unstaged changes
    D) The path has staged changes (differs from HEAD)

If any path fell into case B or C, then the whole operation would be
aborted with an error.  With sparse-checkout, the whole operation would
be aborted for case D as well, but for its predecessor of using `git
read-tree -mu HEAD` directly, any paths that fell into case D would be
removed from the working copy and the index entry for that path would be
reset to match HEAD -- which looks and feels like data loss to users
(only a few are even aware to ask whether it can be recovered, and even
then it requires walking through loose objects trying to match up the
right ones).

Refusing to remove files that have unsaved user changes is good, but
refusing to work on any other paths is very problematic for users.  If
the user is in the middle of a rebase or has made modifications to files
that bring in more dependencies, then for their build to work they need
to update the sparse paths.  This logic has been preventing them from
doing so.  Sometimes in response, the user will stage the files and
re-try, to no avail with sparse-checkout or to the horror of losing
their changes if they are using its predecessor of `git read-tree -mu
HEAD`.

Add a new update_sparsity() function which will not error out in any of
these cases but behaves as follows for the special cases:
    A) Leave the file in the working copy alone, clear the SKIP_WORKTREE
       bit, and print a warning (thus leaving the path in a state where
       status will report the file as modified, which seems logical).
    B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged.
    C) Do NOT mark this path as SKIP_WORKTREE and print a warning about
       the dirty path.
    D) Mark the path as SKIP_WORKTREE, but do not revert the version
       stored in the index to match HEAD; leave the contents alone.

I tried a different behavior for A (leave the SKIP_WORKTREE bit set),
but found it very surprising and counter-intuitive (e.g. the user sees
it is present along with all the other files in that directory, tries to
stage it, but git add ignores it since the SKIP_WORKTREE bit is set).  A
& C seem like optimal behavior to me.  B may be as well, though I wonder
if printing a warning would be an improvement.  Some might be slightly
surprised by D at first, but given that it does the right thing with
`git commit` and even `git commit -a` (`git add` ignores entries that
are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a`
is similar), it seems logical to me.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the equivalent of 'git read-tree -mu HEAD' in the sparse-checkout
codepaths for setting the SKIP_WORKTREE bits and instead use the new
update_sparsity() function.

Note that when an issue is hit, the error message splits 'error' and
'Cannot update sparse checkout' on separate lines.  For now, we use two
greps to find both pieces of the error message but subsequent commits
will clean up the messages reported to the user.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup_unpack_trees_porcelain() provides much improved error/warning
messages; instead of a message that assumes that there is only one path
with a given problem despite being used by code that intentionally is
grouping and showing errors together, it uses a message designed to be
used with groups of paths.  For example, this transforms

    error: Entry '	folder1/a
	folder2/a
    ' not uptodate. Cannot update sparse checkout.

into

    error: Cannot update sparse checkout: the following entries are not up to date:
	folder1/a
	folder2/a

In the past the suboptimal messages were never actually triggered
because we would error out if the working directory wasn't clean before
we even called unpack_trees().  The previous commit changed that,
though, so let's use the better error messages.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A minor change, but we want to convert the sparse messages to warnings
and this allows us to group warnings and errors.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to treat issues with setting the SKIP_WORKTREE bit as a warning
rather than an error; rename the enum values to reflect this intent as
a simple step towards that goal.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
display_error_msgs() is never called to show messages of both ERROR_*
and WARNING_* types at the same time; it is instead called multiple
times, separately for each type.  Since we want to display these types
differently, make two slightly different versions of this function.

A subsequent commit will further modify unpack_trees() and how it calls
the new display_warning_msgs().

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The messages for problems with sparse paths are phrased as errors that
cause the operation to abort, even though we are not making the
operation abort.  Reword the messages to make sense in their new
context.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When sparse-checkout runs to update the list of sparsity patterns, it
gives warnings if it can't remove paths from the working tree because
those files have dirty changes.  Add a similar warning for unmerged
paths as well.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting and clearing of the SKIP_WORKTREE bit is not only done when
users run 'sparse-checkout'; other commands such as 'checkout' also run
through unpack_trees() which has logic for handling this special bit.
As such, we need to consider how they handle special cases.  A couple
comparison points should help explain the rationale for changing how
unpack_trees() handles these bits:

    Ignoring sparse checkouts for a moment, if you are switching
    branches and have dirty changes, it is only considered an error that
    will prevent the branch switching from being successful if the dirty
    file happens to be one of the paths with different contents.

    SKIP_WORKTREE has always been considered advisory; for example, if
    rebase or merge need or even want to materialize a path as part of
    their work, they have always been allowed to do so regardless of the
    SKIP_WORKTREE setting.  This has been used for unmerged paths, but
    it was often used for paths it wasn't needed just because it made
    the code simpler.  It was a best-effort consideration, and when it
    materialized paths contrary to the SKIP_WORKTREE setting, it was
    never required to even print a warning message.

In the past if you trying to run e.g. 'git checkout' and:
  1) you had a path that was materialized and had some dirty changes
  2) the path was listed in $GITDIR/info/sparse-checkout
  3) this path did not different between the current and target branches
then despite the comparison points above, the inability to set
SKIP_WORKTREE was treated as a *hard* error that would abort the
checkout operation.  This is completely inconsistent with how
SKIP_WORKTREE is handled elsewhere, and rather annoying for users as
leaving the paths materialized in the working copy (with a simple
warning) should present no problem at all.

Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a
warning and allow the operations to continue.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If commands like merge or rebase materialize files as part of their work,
or a previous sparse-checkout command failed to update individual files
due to dirty changes, users may want a command to simply 'reapply' the
sparsity rules.  Provide one.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@@ -12,6 +12,7 @@
#include "lockfile.h"
#include "resolve-undo.h"
#include "unpack-trees.h"
#include "wt-status.h"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In case anyone is paying attention, this commit is actually a revert of the commit Revert "sparse-checkout: check for dirty status" which will get obliterated in the next rebase with this fixup.)

Copy link

@jeffhostetler jeffhostetler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a rubber stamp ok here?

Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to use range-diff to make sure that the patches did not change dramatically, but then I realized that you merged the commits verbatim from Junio's branch 😄

The loop in warn_conflicted_path() that checks for the count of
entries with the same path uses "i+count" for the array
entry. However, the loop only verifies that the value of count is
below the array size. Fix this by adding i to the condition.

I hit this condition during a test of the in-tree sparse-checkout
feature, so it is exercised by the end of the series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
[jc: readability fix]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@derrickstolee
Copy link
Collaborator Author

Is a rubber stamp ok here?

Thank you both for the approvals. I need to figure out the errors this introduces in the Scalar and VFS for Git test suites. So far, it seems like it's a test bug in microsoft/scalar#374. Perhaps the same bug is at fault for microsoft/VFSForGit#1661, but I think it's more likely to be a merge issue that I need to resolve here.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Includes en/sparse-checkout with a small fix on top.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee merged commit 5570d4a into microsoft:vfs-2.26.2 May 14, 2020
derrickstolee added a commit to microsoft/scalar that referenced this pull request May 15, 2020
Also fixes a bug when running `scalar clone` followed by `git sparse-checkout disable`.

See microsoft/git#267.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sparse-checkout: correct behavior on dirty status
4 participants