Faster diff (for some diffs) by skipping pygit2
Background:
===========

pygit2 Tree.diff_to_tree is very slow for some large trees. This seems particularly bad on a repo containing a large table dataset (many rows, but few columns / small objects), e.g. with this 43M-feature repo:

```
$ git ls-tree -r c62a4290d7d1d4039ff06afcbe364bac6abf0e69 | wc -l
43482704
```

A small diff (a couple of thousand features changed) takes 25s for `kart diff` to bootstrap. cProfile shows all of that time spent in:

```
25.534 {method 'diff_to_tree' of '_pygit2.Tree' objects}
```

I was unable to find anything obvious in the pygit2 or libgit2 source code pertaining to this; e.g. the flags we're passing appear to be the right ones.

Since `git diff-tree` can produce this diff within ~0.1s, I just switched to using that as a subprocess. The results are identical but much faster (at least in the above case).

Performance of larger diffs / smaller repos
===========================================

The difference is less stark with other repos. If I switch to a full diff (`[EMPTY]...HEAD`), I find no noticeable difference in performance between the pygit2 and git implementations. If I switch to a repo with fewer features but larger objects, I find a much smaller difference than the above.

So to reiterate, the win appears to be largest when dealing with repos where:

* there is a large number of features
* each feature is small
* the final diff covers a small fraction of the repo (e.g. 1% or less)

e.g. on [NZ Property Titles](https://data.linz.govt.nz/layer/50804-nz-property-titles/repository/) I found a ~25% improvement when asking for a 14477-feature diff (0.6% of total features).

Before:

```
kart diff 'HEAD^^^...HEAD' -o json-lines --no-sort-keys --output /dev/null 1.89s user 0.21s system 99% cpu 2.111 total
```

After:

```
kart diff 'HEAD^^^...HEAD' -o json-lines --no-sort-keys --output /dev/null 1.38s user 0.13s system 98% cpu 1.525 total
```
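For readers unfamiliar with the approach, here is a minimal sketch of what "using `git diff-tree` as a subprocess" can look like. It is not the code from this commit: the names `RawDelta`, `parse_raw_diff` and `diff_trees` are hypothetical, and it assumes a `git` executable on the PATH and rename detection turned off, so every record has exactly one path.

```python
import subprocess
from typing import Iterator, List, NamedTuple


class RawDelta(NamedTuple):
    """One changed path, as reported by `git diff-tree --raw` (hypothetical type)."""
    status: str    # "A" (added), "D" (deleted), "M" (modified) or "T" (type change)
    old_mode: str
    new_mode: str
    old_oid: str
    new_oid: str
    path: str


def parse_raw_diff(output: bytes) -> Iterator[RawDelta]:
    """Parse the output of `git diff-tree -r -z --raw --no-renames`.

    With -z, each record is ":<old_mode> <new_mode> <old_oid> <new_oid> <status>"
    followed by NUL, then the path followed by NUL.
    """
    fields = output.split(b"\0")
    # Fields alternate metadata, path, metadata, path, ... with an empty trailer.
    for meta, path in zip(fields[0::2], fields[1::2]):
        if not meta.startswith(b":"):
            raise ValueError(f"unexpected diff-tree record: {meta!r}")
        old_mode, new_mode, old_oid, new_oid, status = meta[1:].decode("ascii").split(" ")
        yield RawDelta(
            status, old_mode, new_mode, old_oid, new_oid,
            path.decode("utf-8", "surrogateescape"),
        )


def diff_trees(repo_path: str, old: str, new: str) -> List[RawDelta]:
    """Diff two tree-ishes by shelling out to git instead of calling pygit2."""
    proc = subprocess.run(
        ["git", "-C", repo_path, "diff-tree", "-r", "-z", "--raw", "--no-renames", old, new],
        check=True,
        capture_output=True,
    )
    return list(parse_raw_diff(proc.stdout))
```

A call such as `diff_trees(".", "HEAD^", "HEAD")` would return one `RawDelta` per changed blob. Keeping the parser separate from the subprocess call means it can be exercised without a repository.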
Showing 3 changed files with 74 additions and 10 deletions.