Improve performance of fscache #1926

Merged: 5 commits, Nov 16, 2018

benpeart

This patch series makes a set of changes to the fscache that underlies git's lstat() and opendir(). Besides adding some tracing, it reduces thread contention in the fscache and takes advantage of the new mem_pool heap manager. The net result is a ~25% reduction in time spent in preload_index() compared with the old fscache.

* splitting up the cache entries across multiple threads so there isn't
* any overlap between threads anyway.
*/
struct fscache {

This comment was marked as off-topic.

dscho (Member) commented Nov 14, 2018

@benpeart have you seen these failures?

 expecting success: 

	git init fscache-test &&
	cd fscache-test &&
	git config core.fscache 1 &&
	echo A > test.txt &&
	git add test.txt &&
	git commit -m A &&
	echo B >> test.txt &&
	git checkout . &&
	test -z "$(git status -s)" &&
	echo A > expect.txt &&
	test_cmp expect.txt test.txt &&
	cd .. &&
	rm -rf fscache-test

++ git init fscache-test
Initialized empty Git repository in D:/a/1/s/t/trash directory.t7201-co/fscache-test/.git/
++ cd fscache-test
++ git config core.fscache 1
++ echo A
++ git add test.txt
++ git commit -m A
[master (root-commit) 13452ee] A
 Author: A U Thor <author@example.com>
 1 file changed, 1 insertion(+)
 create mode 100644 test.txt
++ echo B
++ git checkout .
./test-lib.sh: line 707:  5544 Segmentation fault      git checkout .
error: last command exited with $?=139

I also see a build failure e.g. with linux-gcc:

2018-11-12T21:56:01.5215485Z preload-index.c: In function ‘preload_index’:
2018-11-12T21:56:01.5219128Z preload-index.c:88:30: error: expected expression before ‘;’ token
2018-11-12T21:56:01.5219628Z   fscache = getcache_fscache();
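
An error of this shape usually means that a macro expanded to nothing, leaving `fscache = ;` for the compiler. The sketch below assumes (the thread does not confirm it) that `getcache_fscache()` is a macro that needs a fallback definition on platforms without an fscache. The `(NULL)` fallback and the helper `sketch_current_cache()` are illustrative only.

```c
#include <stddef.h>

struct fscache; /* opaque here; only pointers are used */

/*
 * Assumed fallback for platforms that do not provide an fscache.
 * An empty definition ("#define getcache_fscache()") would turn the
 * assignment below into "fscache = ;" and fail to compile.
 */
#ifndef getcache_fscache
#define getcache_fscache() (NULL)
#endif

/* Hypothetical caller mirroring the failing line in preload_index(). */
struct fscache *sketch_current_cache(void)
{
	struct fscache *fscache = getcache_fscache();
	return fscache;
}
```

With an expression-valued fallback, callers compile unchanged on every platform and simply see no cache.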

dscho (Member) left a comment

If you don't beat me to it, I will investigate further why the mem_pool is already discarded when we try to use it to allocate memory.

Resolved review threads: git-compat-util.h (outdated), compat/win32/fscache.c

dscho (Member) left a comment

Two more things, to let Linux32 build the code. The failure in the Windows job is a bogus one: t3305's scratch directory could not be removed, for whatever reason. The tests passed, though.

Resolved review threads: mem-pool.c (outdated, two threads)

Add tracing around initializing and discarding mempools. On discard, report
the amount of unused memory in the current block to help tune the
initial_size setting.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
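
To illustrate why the unused byte count at discard time helps tune the initial size, here is a minimal sketch of a single-block bump allocator. It is not git's mem_pool API: the `sketch_pool` type, its functions, and the trace format are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical single-block bump allocator; not git's mem_pool. */
struct sketch_pool {
	char *block;  /* start of the current block */
	size_t size;  /* total bytes in the block */
	size_t used;  /* bytes handed out so far */
};

/* Returns 0 on success, -1 if the block cannot be allocated. */
int sketch_pool_init(struct sketch_pool *pool, size_t initial_size)
{
	pool->block = malloc(initial_size);
	pool->size = pool->block ? initial_size : 0;
	pool->used = 0;
	fprintf(stderr, "trace: pool init, initial_size=%zu\n", initial_size);
	return pool->block ? 0 : -1;
}

/* Bump-pointer allocation; returns NULL when the block is exhausted. */
void *sketch_pool_alloc(struct sketch_pool *pool, size_t len)
{
	size_t align = sizeof(void *);
	void *p;

	len = (len + align - 1) & ~(align - 1); /* round up to pointer size */
	if (len > pool->size - pool->used)
		return NULL;
	p = pool->block + pool->used;
	pool->used += len;
	return p;
}

/* Frees everything at once and returns the bytes that were never used. */
size_t sketch_pool_discard(struct sketch_pool *pool)
{
	size_t unused = pool->size - pool->used;

	fprintf(stderr, "trace: pool discard, unused=%zu of %zu\n",
		unused, pool->size);
	free(pool->block);
	pool->block = NULL;
	pool->size = pool->used = 0;
	return unused;
}
```

A consistently large unused count suggests the initial size is too generous; a count near zero suggests the pool probably had to grow.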

Add cache hit/miss statistics to the fscache for lstat() and opendir().
The statistics are printed when the cache is disabled and cleared, and
only if GIT_TRACE_FSCACHE is set.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
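
The counting and reporting described above could look roughly like the following sketch. The struct, the function names, and the output format are hypothetical, and the environment check is a simplification of how git evaluates trace keys.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters; the names are not taken from fscache.c. */
struct sketch_stats {
	unsigned lstat_requests, lstat_misses;
	unsigned opendir_requests, opendir_misses;
};

/* Simplified check: any value other than empty or "0" enables tracing. */
int sketch_trace_enabled(void)
{
	const char *v = getenv("GIT_TRACE_FSCACHE");
	return v && *v && strcmp(v, "0") != 0;
}

void sketch_count_lstat(struct sketch_stats *s, int hit)
{
	s->lstat_requests++;
	if (!hit)
		s->lstat_misses++;
}

void sketch_count_opendir(struct sketch_stats *s, int hit)
{
	s->opendir_requests++;
	if (!hit)
		s->opendir_misses++;
}

/*
 * Called when the cache is disabled: print one summary line if tracing
 * is on, then reset the counters. Returns 1 if a line was printed.
 */
int sketch_report_and_clear(struct sketch_stats *s, int trace_enabled)
{
	int printed = 0;

	if (trace_enabled) {
		fprintf(stderr,
			"fscache: lstat %u requests, %u misses; "
			"opendir %u requests, %u misses\n",
			s->lstat_requests, s->lstat_misses,
			s->opendir_requests, s->opendir_misses);
		printed = 1;
	}
	memset(s, 0, sizeof(*s));
	return printed;
}
```

A caller would pass `sketch_trace_enabled()` as the second argument to `sketch_report_and_clear()`. Keeping the flag a parameter makes the reporting path easy to exercise without touching the environment.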

Update enable_fscache() to take an optional initial size parameter, which is
used to size the hashmap up front so that it does not have to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
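
As a rough illustration of what pre-sizing saves, the sketch below counts how often a table would have to grow while entries are added. The 80% grow threshold, the doubling policy, the starting size of 64 buckets, and both function names are assumptions made for this example and do not describe git's hashmap.

```c
#include <stddef.h>

#define SKETCH_LOAD_PERCENT 80 /* assumed grow threshold */

/*
 * Smallest power-of-two bucket count (at least 64) that keeps
 * `expected` entries at or under the grow threshold.
 */
size_t sketch_initial_buckets(size_t expected)
{
	size_t buckets = 64;

	while (buckets * SKETCH_LOAD_PERCENT / 100 < expected)
		buckets <<= 1;
	return buckets;
}

/*
 * Counts how many times a table that starts with `buckets` buckets
 * doubles (and therefore rehashes) while `n` entries are added.
 */
unsigned sketch_count_rehashes(size_t buckets, size_t n)
{
	unsigned rehashes = 0;
	size_t i;

	for (i = 1; i <= n; i++) {
		if (i > buckets * SKETCH_LOAD_PERCENT / 100) {
			buckets <<= 1;
			rehashes++;
		}
	}
	return rehashes;
}
```

Under these assumptions, adding 200,000 entries to a table that starts at 64 buckets triggers 12 rehashes, while a table created with `sketch_initial_buckets(200000)` buckets triggers none.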

The threading model for fscache has been to have a single, global cache.
This requires the cache to be thread safe so that callers like
preload-index can call it from multiple threads.  This was implemented
with a single mutex and completion events, which introduce contention
between the calling threads.

Simplify the threading model by making fscache thread specific.  This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.

At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.

In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly.  On a repo with ~200K files,
it reduced overall status times by ~12%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
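
The thread-specific model with a final merge can be sketched as follows. This is not the fscache implementation: the fixed-size array stands in for the real hashmap, all names are hypothetical, and the sketch assumes that the caller serializes merges when several workers finish at the same time.

```c
#include <stddef.h>

#define SKETCH_MAX 64

/* Stand-in for a cache: a flat list of cached paths. */
struct sketch_cache {
	const char *paths[SKETCH_MAX];
	size_t nr;
};

/* One pointer per thread, so lookups and inserts need no lock. */
static _Thread_local struct sketch_cache *thread_cache;

/* Associates an empty cache with the calling thread. */
void sketch_enable(struct sketch_cache *cache)
{
	cache->nr = 0;
	thread_cache = cache;
}

/* Adds a path to the calling thread's cache; -1 if none or full. */
int sketch_add(const char *path)
{
	struct sketch_cache *c = thread_cache;

	if (!c || c->nr == SKETCH_MAX)
		return -1;
	c->paths[c->nr++] = path;
	return 0;
}

/*
 * Run at the end of a worker thread: copy the worker's entries into
 * the primary thread's cache. Returns the number of entries merged.
 * The caller is assumed to hold a lock around this call.
 */
size_t sketch_merge(struct sketch_cache *primary,
		    const struct sketch_cache *worker)
{
	size_t i, merged = 0;

	for (i = 0; i < worker->nr && primary->nr < SKETCH_MAX; i++) {
		primary->paths[primary->nr++] = worker->paths[i];
		merged++;
	}
	return merged;
}
```

In this design the only shared write happens once per worker, at merge time, instead of on every lookup.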

Now that each fscache is used by a single thread, take advantage of the
mem_pool as the allocator to significantly reduce the cost of allocations
and frees.

With the reduced cost of free, in future patches, we can start freeing the
fscache at the end of commands instead of just leaking it.

Signed-off-by: Ben Peart <benpeart@microsoft.com>

dscho merged commit 52fc4db into git-for-windows:master on Nov 16, 2018
dscho added this to the v2.19.1(2) milestone on Nov 16, 2018
dscho added a commit to git-for-windows/build-extra that referenced this pull request Nov 20, 2018
The FSCache feature [was optimized to become
faster](git-for-windows/git#1926).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>