unpack-trees: enable fscache for sparse-checkout #2224
On my iPad so I can’t dig too deep right now, but this looks good. You might confirm (in other commands) that have historically used fscache Can you check whether you have “preload index” turned on and
Minor suggestions...
unpack-trees.c
@@ -1437,7 +1437,9 @@ static void mark_new_skip_worktree(struct exclude_list *el,
* 2. Widen worktree according to sparse-checkout file.
* Matched entries will have skip_wt_flag cleared (i.e. "in")
*/
enable_fscache(0);
It might make sense to initialize the hashmap size of the FSCache with istate->cache_nr? That way, the hashmap does not have to grow a large number of times before reaching its final size.
This appears to have improved the performance to ~6 seconds!
Wow! 👍
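For readers following this thread, here is a rough sketch of why presizing helps. It is a toy growth model, not Git's hashmap or the FSCache code: the `demo_*` names, the starting size of 64 buckets, and the 80% load factor are all assumptions made for illustration. It only counts how many times a table would have to grow while entries are added.

```c
#include <assert.h>
#include <stddef.h>

/* Toy table that tracks sizes only; hypothetical, not Git's struct hashmap. */
struct demo_table {
	size_t buckets;  /* current bucket count */
	size_t entries;  /* entries added so far */
	unsigned grows;  /* how many times the table had to grow */
};

/* Smallest power-of-two bucket count (>= 64) keeping the load under 80%. */
static size_t demo_buckets_for(size_t expected)
{
	size_t n = 64;

	while (expected * 5 > n * 4)
		n *= 2;
	return n;
}

void demo_init(struct demo_table *t, size_t expected)
{
	t->buckets = demo_buckets_for(expected);
	t->entries = 0;
	t->grows = 0;
}

void demo_add(struct demo_table *t)
{
	assert(t->buckets > 0);
	t->entries++;
	/* Double the table once the assumed 80% load factor is exceeded. */
	if (t->entries * 5 > t->buckets * 4) {
		t->buckets *= 2;
		t->grows++;
	}
}

/* Add n entries to a table presized for `expected`; return the grow count. */
unsigned demo_fill(size_t expected, size_t n)
{
	struct demo_table t;
	size_t i;

	demo_init(&t, expected);
	for (i = 0; i < n; i++)
		demo_add(&t);
	return t.grows;
}
```

In this model, `demo_fill(0, n)` grows repeatedly for large `n`, while `demo_fill(n, n)` never grows. In a real hashmap each grow step also rehashes the existing entries, which is the cost the suggestion aims to avoid.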
clear_ce_flags(istate, select_flag, skip_wt_flag, el);
disable_fscache();
You may want to mention in the commit message that disable_fscache() decrements a counter, and only actually discards the FSCache when that counter reaches 0. (To address one of @jeffhostetler's concerns.)
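The nesting behavior described here can be sketched roughly as follows. This is a simplified stand-in, not the actual FSCache implementation: `demo_enable()`, `demo_disable()`, and `struct demo_cache` are hypothetical names, and the real code also has to deal with details such as the cached data itself, which this sketch reduces to a flag.

```c
#include <assert.h>

/* Hypothetical stand-in for the FSCache state; not Git's real structure. */
struct demo_cache {
	int depth;      /* number of enable calls not yet matched by a disable */
	int populated;  /* 1 while cached data is being held */
};

static struct demo_cache demo;

/* The first enable sets up the cache; nested enables only bump the counter. */
void demo_enable(void)
{
	if (demo.depth++ == 0)
		demo.populated = 1;
}

/* Only the disable that brings the counter back to zero discards the cache. */
void demo_disable(void)
{
	assert(demo.depth > 0);
	if (--demo.depth == 0)
		demo.populated = 0;
}

int demo_is_populated(void)
{
	return demo.populated;
}
```

With this shape, an inner enable/disable pair inside an already enabled region leaves the cache intact, so a caller that enabled the cache earlier does not lose it when a nested section finishes.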
The most common route for this is the Meaning: the FSCache is discarded unless the In general, it is not a safe thing to keep the FSCache, as there are many possibilities for the cached values to become stale.
Wow, that's impressive!
Indeed. Don't we have code in the FSCache to detect that already?
Oh, you know, may I ask to include that piece of information in the commit message?
@dscho Yes, I added some code to fscache to know about not-found directories back in:
Apparently not, as I still see the deep dir calls (/my/really/long/path) even
When updating the skip-worktree bits in the index to align with new values in a sparse-checkout file, Git scans the entire working directory with lstat() calls. In a sparse-checkout, many of these lstat() calls are for paths that do not exist.
Enable the fscache feature during this scan. Since enable_fscache() calls nest, the disable_fscache() method decrements a counter and would only clear the cache if that counter reaches zero.
In a local test of a repo with ~2.2 million paths, updating the index with git read-tree -m -u HEAD with a sparse-checkout file containing only /.gitattributes improved from 2-3 minutes to ~6 seconds.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
143bfe9 to f392fab
Ah, I think I remember enough details now: when Line 374 in d003d72
and here: Lines 570 to 571 in d003d72
I guess what we need is code that looks harder for non-existent directories, e.g. by calling
I'm hoping that we can duplicate the logic from this section of
If we can do a similar "one level up" lookup, then we should get it for free without recursing up all parents since the index file should list the parent first.
Do you still want to do that in the context of this here PR? Or would you rather have me merge it and we'll try to look at reducing the cache misses later?
This is something to try independently. Thanks!
The FSCache feature [is now used with `git checkout` and `git reset` in sparse checkouts](git-for-windows/git#2224). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When updating the skip-worktree bits in the index to align with new
values in a sparse-checkout file, Git scans the entire working
directory with lstat() calls. In a sparse-checkout, many of these
lstat() calls are for paths that do not exist.
Enable the fscache feature during this scan.
In a local test of a repo with ~2.2 million paths, updating the index
with
git read-tree -m -u HEAD
with a sparse-checkout file containing only
/.gitattributes
improved from 2-3 minutes to 15-20 seconds.
More work could be done to stop running lstat() calls when recursing
into directories that are known to not exist.
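One possible shape for that follow-up work, sketched under stated assumptions: paths arrive in sorted order (as in the index), so once a directory is found to be missing, every following path beneath it can be skipped without another lstat(). This is not code from Git or the FSCache. `demo_count_lstat()` and `demo_dir_exists()` are hypothetical, and the filesystem is simulated by a plain list of existing directory names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated directory lookup: is the first len bytes of dir in `existing`? */
static int demo_dir_exists(const char *dir, size_t len,
			   const char *const *existing, size_t nr_existing)
{
	size_t i;

	for (i = 0; i < nr_existing; i++)
		if (strlen(existing[i]) == len && !memcmp(existing[i], dir, len))
			return 1;
	return 0;
}

/*
 * Walk sorted paths and count how many would still need an lstat() call,
 * skipping every path beneath the most recently seen missing directory.
 */
size_t demo_count_lstat(const char *const *paths, size_t nr,
			const char *const *existing, size_t nr_existing)
{
	const char *missing = NULL;  /* path whose prefix names a missing dir */
	size_t missing_len = 0, calls = 0, i;

	assert(paths || !nr);
	for (i = 0; i < nr; i++) {
		const char *path = paths[i];
		const char *slash = strrchr(path, '/');
		size_t dir_len = slash ? (size_t)(slash - path) : 0;

		/* Parent is at or below a directory known to be missing. */
		if (missing && dir_len >= missing_len &&
		    !memcmp(path, missing, missing_len) &&
		    path[missing_len] == '/')
			continue;

		/* One "level up" check: remember the parent if it is missing. */
		if (dir_len && !demo_dir_exists(path, dir_len, existing, nr_existing)) {
			missing = path;
			missing_len = dir_len;
			continue;
		}
		calls++;  /* this path would still be checked with lstat() */
	}
	return calls;
}
```

The sketch keeps only the most recent missing directory, which is enough when paths are sorted because all entries under one directory are adjacent. It checks only the immediate parent, so sibling directories under a missing grandparent each cost one extra lookup.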