Commit bb3fffa

dscho, Git for Windows Build Agent authored and Git for Windows Build Agent committed
Add experimental 'git survey' builtin (#5174)
This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks.

The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides.

The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path.

For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
                                    Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
                       whats-cooking.txt |  1373 |  11637459 |      37226854
             t/helper/test-gvfs-protocol |     2 |   6847105 |      17233072
                      git-rebase--helper |     1 |   6027849 |      15269664
                          compat/mingw.c |  6111 |   5194453 |     463466970
             t/helper/test-parse-options |     1 |   3420385 |       8807968
                  t/helper/test-pkt-line |     1 |   3408661 |       8778960
      t/helper/test-dump-untracked-cache |     1 |   3408645 |       8780816
            t/helper/test-dump-fsmonitor |     1 |   3406639 |       8776656
                                po/vi.po |   104 |   1376337 |      51441603
                                po/de.po |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171.

With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project.

Unfortunately, this will mean that in the next `microsoft/git` rebase, Jeff Hostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
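
The per-path ranking described in the commit message can be illustrated with a small sketch. This is not the code from `builtin/survey.c`: the `path_stat` struct, its field names, and the helpers `select_top()` and `print_top()` are hypothetical, and the sketch assumes the per-path totals have already been accumulated by some walk over the objects. It only shows the final step of sorting paths by on-disk size and keeping the first N.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-path totals; names are illustrative only. */
struct path_stat {
	const char *path;
	unsigned long count;         /* object versions seen at this path */
	unsigned long disk_size;     /* sum of on-disk sizes */
	unsigned long inflated_size; /* sum of inflated sizes */
};

/* Largest disk size first; ties are broken by path name. */
static int cmp_by_disk_size(const void *a, const void *b)
{
	const struct path_stat *x = a, *y = b;

	if (x->disk_size != y->disk_size)
		return x->disk_size < y->disk_size ? 1 : -1;
	return strcmp(x->path, y->path);
}

/*
 * Sort `stats` in place and return how many rows belong in the table:
 * at most `top`, which plays the role of `--top=<N>`.
 */
static size_t select_top(struct path_stat *stats, size_t nr, size_t top)
{
	qsort(stats, nr, sizeof(*stats), cmp_by_disk_size);
	return nr < top ? nr : top;
}

/* Print the first `nr` rows with right-aligned columns. */
static void print_top(const struct path_stat *stats, size_t nr)
{
	size_t i;

	printf("%40s | %5s | %9s | %13s\n",
	       "Path", "Count", "Disk Size", "Inflated Size");
	for (i = 0; i < nr; i++)
		printf("%40s | %5lu | %9lu | %13lu\n", stats[i].path,
		       stats[i].count, stats[i].disk_size,
		       stats[i].inflated_size);
}
```

A caller would fill an array of `path_stat`, call `select_top(stats, nr, 10)`, and pass the returned row count to `print_top()`.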
2 parents e4d41d1 + 1dc073b commit bb3fffa

13 files changed, +1143 -0 lines changed

.gitignore

+1
@@ -166,6 +166,7 @@
 /git-submodule
 /git-submodule--helper
 /git-subtree
+/git-survey
 /git-svn
 /git-switch
 /git-symbolic-ref

Documentation/config.adoc

+2
@@ -538,6 +538,8 @@ include::config/status.adoc[]
 
 include::config/submodule.adoc[]
 
+include::config/survey.adoc[]
+
 include::config/tag.adoc[]
 
 include::config/tar.adoc[]

Documentation/config/survey.adoc

+14
@@ -0,0 +1,14 @@
+survey.*::
+	These variables adjust the default behavior of the `git survey`
+	command. The intention is that this command could be run in the
+	background with these options.
++
+--
+	verbose::
+		This boolean value implies the `--[no-]verbose` option.
+	progress::
+		This boolean value implies the `--[no-]progress` option.
+	top::
+		This integer value implies `--top=<N>`, specifying the
+		number of entries in the detail tables.
+--
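
Since the `survey.*` variables are described as adjusting the command's defaults, one plausible reading is the usual Git layering: an explicit command-line option wins, then the config value, then a built-in default. The sketch below illustrates that layering only. The `survey_settings` struct, the `-1` "unset" convention, and `resolve_settings()` are assumptions made for illustration, not taken from the commit, and the built-in defaults are passed in by the caller because the documentation above does not state them.

```c
/*
 * Hypothetical settings block (names are illustrative only).
 * A value of -1 means "not specified at this layer".
 */
struct survey_settings {
	int verbose;  /* survey.verbose  / --[no-]verbose  */
	int progress; /* survey.progress / --[no-]progress */
	int top;      /* survey.top      / --top=<N>       */
};

/* First specified value wins: command line, then config, then fallback. */
static int pick(int cli, int config, int fallback)
{
	if (cli >= 0)
		return cli;
	if (config >= 0)
		return config;
	return fallback;
}

/* Combine the three layers field by field. */
static struct survey_settings
resolve_settings(const struct survey_settings *cli,
		 const struct survey_settings *config,
		 const struct survey_settings *fallback)
{
	struct survey_settings out;

	out.verbose = pick(cli->verbose, config->verbose, fallback->verbose);
	out.progress = pick(cli->progress, config->progress, fallback->progress);
	out.top = pick(cli->top, config->top, fallback->top);
	return out;
}
```

Under this assumed layering, a repository with `survey.top` set to 25 would show 25 rows unless `--top=<N>` is passed explicitly.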

Documentation/git-survey.adoc

+83
@@ -0,0 +1,83 @@
+git-survey(1)
+=============
+
+NAME
+----
+git-survey - EXPERIMENTAL: Measure various repository dimensions of scale
+
+SYNOPSIS
+--------
+[verse]
+(EXPERIMENTAL!) 'git survey' <options>
+
+DESCRIPTION
+-----------
+
+Survey the repository and measure various dimensions of scale.
+
+As repositories grow to "monorepo" size, certain data shapes can cause
+performance problems. `git-survey` attempts to measure and report on
+known problem areas.
+
+Ref Selection and Reachable Objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In this first analysis phase, `git survey` will iterate over the set of
+requested branches, tags, and other refs and treewalk over all of the
+reachable commits, trees, and blobs and generate various statistics.
+
+OPTIONS
+-------
+
+--progress::
+	Show progress. This is automatically enabled when interactive.
+
+Ref Selection
+~~~~~~~~~~~~~
+
+The following options control the set of refs that `git survey` will examine.
+By default, `git survey` will look at tags, local branches, and remote refs.
+If any of the following options are given, the default set is cleared and
+only refs for the given options are added.
+
+--all-refs::
+	Use all refs. This includes local branches, tags, remote refs,
+	notes, and stashes. This option overrides all of the following.
+
+--branches::
+	Add local branches (`refs/heads/`) to the set.
+
+--tags::
+	Add tags (`refs/tags/`) to the set.
+
+--remotes::
+	Add remote branches (`refs/remote/`) to the set.
+
+--detached::
+	Add HEAD to the set.
+
+--other::
+	Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set.
+
+OUTPUT
+------
+
+By default, `git survey` will print information about the repository in a
+human-readable format that includes overviews and tables.
+
+References Summary
+~~~~~~~~~~~~~~~~~~
+
+The references summary includes a count of each kind of reference,
+including branches, remote refs, and tags (split by "all" and
+"annotated").
+
+Reachable Object Summary
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The reachable object summary shows the total number of each kind of Git
+object, including tags, commits, trees, and blobs.
+
+GIT
+---
+Part of the linkgit:git[1] suite
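
The ref-selection rules in the man page above (a default set that is cleared as soon as any selector is given, and `--all-refs` overriding the rest) can be sketched as a small bit-flag resolver. The enum names and `resolve_ref_kinds()` are hypothetical and not taken from `builtin/survey.c`. Whether `--all-refs` also covers a detached HEAD is not stated in the documentation, so this sketch assumes it covers only the five kinds the text lists.

```c
/* Hypothetical flags, one per ref-selection option in the man page. */
enum ref_kind {
	REF_BRANCHES = 1 << 0, /* --branches */
	REF_TAGS     = 1 << 1, /* --tags     */
	REF_REMOTES  = 1 << 2, /* --remotes  */
	REF_DETACHED = 1 << 3, /* --detached */
	REF_OTHER    = 1 << 4  /* --other: notes and stashes */
};

/*
 * Resolve the set of ref kinds to examine.
 *
 * all_refs:  nonzero if --all-refs was given; it overrides `requested`.
 * requested: OR of the flags for the individual options that were given.
 */
static unsigned resolve_ref_kinds(int all_refs, unsigned requested)
{
	/* Assumption: "all refs" means exactly the kinds the docs list. */
	if (all_refs)
		return REF_BRANCHES | REF_TAGS | REF_REMOTES | REF_OTHER;

	/* No selector given: tags, local branches, and remote refs. */
	if (!requested)
		return REF_BRANCHES | REF_TAGS | REF_REMOTES;

	/* Any selector clears the default; only requested kinds remain. */
	return requested;
}
```

For instance, passing only `REF_TAGS` yields tags alone, matching the statement that giving any option clears the default set.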

Documentation/meson.build

+1
@@ -141,6 +141,7 @@ manpages = {
   'git-status.adoc' : 1,
   'git-stripspace.adoc' : 1,
   'git-submodule.adoc' : 1,
+  'git-survey.adoc' : 1,
   'git-svn.adoc' : 1,
   'git-switch.adoc' : 1,
   'git-symbolic-ref.adoc' : 1,

Makefile

+1
@@ -1326,6 +1326,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o
 BUILTIN_OBJS += builtin/stash.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
+BUILTIN_OBJS += builtin/survey.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
 BUILTIN_OBJS += builtin/tag.o
 BUILTIN_OBJS += builtin/unpack-file.o

builtin.h

+1
@@ -232,6 +232,7 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix, struct
 int cmd_status(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_stash(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_stripspace(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_survey(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_switch(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_symbolic_ref(int argc, const char **argv, const char *prefix, struct repository *repo);
