Integrate commit-graph into 'fsck' and 'gc' #6

Closed
wants to merge 21 commits into from
21 commits
ae66969
commit-graph: UNLEAK before die()
derrickstolee May 24, 2018
0a18dc4
commit-graph: fix GRAPH_MIN_SIZE
derrickstolee May 24, 2018
52280bb
commit-graph: parse commit from chosen graph
derrickstolee May 24, 2018
fc52376
commit: force commit to parse from object database
derrickstolee May 24, 2018
3c84326
commit-graph: load a root tree from specific graph
derrickstolee May 24, 2018
cb74de9
commit-graph: add 'verify' subcommand
derrickstolee May 24, 2018
38828ce
commit-graph: verify catches corrupt signature
derrickstolee May 24, 2018
f54079f
commit-graph: verify required chunks are present
derrickstolee May 24, 2018
f78bda6
commit-graph: verify corrupt OID fanout and lookup
derrickstolee May 24, 2018
4dfc871
commit-graph: verify objects exist
derrickstolee May 24, 2018
21501ee
commit-graph: verify root tree OIDs
derrickstolee May 24, 2018
d3b303f
commit-graph: verify parent list
derrickstolee May 24, 2018
4d1d630
commit-graph: verify generation number
derrickstolee May 24, 2018
80b6f0f
commit-graph: verify commit date
derrickstolee May 24, 2018
702c382
commit-graph: test for corrupted octopus edge
derrickstolee May 24, 2018
672a9c4
commit-graph: verify contents match checksum
derrickstolee May 24, 2018
d3b7938
fsck: verify commit-graph
derrickstolee May 24, 2018
b7c9bf6
commit-graph: use string-list API for input
derrickstolee Jun 4, 2018
49958e2
commit-graph: add '--reachable' option
derrickstolee May 24, 2018
b63137f
gc: automatically write commit-graph files
derrickstolee May 24, 2018
d2b0f48
commit-graph: update design document
derrickstolee May 24, 2018
9 changes: 6 additions & 3 deletions Documentation/config.txt
@@ -904,9 +904,12 @@ core.notesRef::
This setting defaults to "refs/notes/commits", and it can be overridden by
the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].

core.commitGraph::
Enable git commit graph feature. Allows reading from the
commit-graph file.
gc.writeCommitGraph::
If true, then gc will rewrite the commit-graph file when
linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
'--auto' the commit-graph will be updated if housekeeping is
required. Default is false. See linkgit:git-commit-graph[1]
for details.

core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
14 changes: 12 additions & 2 deletions Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
--------
[verse]
'git commit-graph read' [--object-dir <dir>]
'git commit-graph verify' [--object-dir <dir>]
'git commit-graph write' <options> [--object-dir <dir>]


@@ -37,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
+
With the `--stdin-packs` option, generate the new commit graph by
walking objects only in the specified pack-indexes. (Cannot be combined
with --stdin-commits.)
with `--stdin-commits` or `--reachable`.)
+
With the `--stdin-commits` option, generate the new commit graph by
walking commits starting at the commits specified in stdin as a list
of OIDs in hex, one OID per line. (Cannot be combined with
--stdin-packs.)
`--stdin-packs` or `--reachable`.)
+
With the `--reachable` option, generate the new commit graph by walking
commits starting at all refs. (Cannot be combined with `--stdin-commits`
or `--stdin-packs`.)
+
With the `--append` option, include all commits that are present in the
existing commit-graph file.
@@ -52,6 +57,11 @@ existing commit-graph file.
Read a graph file given by the commit-graph file and output basic
details about the graph file. Used for debugging purposes.

'verify'::

Read the commit-graph file and verify its contents against the object
database. Used to check for corrupted data.


EXAMPLES
--------
3 changes: 3 additions & 0 deletions Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
(i.e., you can just remove them and do an 'rsync' with some other site in
the hopes that somebody else has the object you have corrupted).

If core.commitGraph is true, the commit-graph file will also be inspected
using 'git commit-graph verify'. See linkgit:git-commit-graph[1].

Extracted Diagnostics
---------------------

4 changes: 4 additions & 0 deletions Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
it within all non-bare repos or it can be set to a boolean value.
This defaults to true.

The optional configuration variable `gc.writeCommitGraph` determines if
'git gc' should run 'git commit-graph write'. This can be set to a
boolean value. This defaults to false.

The optional configuration variable `gc.aggressiveWindow` controls how
much time is spent optimizing the delta compression of the objects in
the repository when the --aggressive option is specified. The larger
22 changes: 0 additions & 22 deletions Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
- The commit graph feature currently does not honor commit grafts. This can
be remedied by duplicating or refactoring the current graft logic.

- The 'commit-graph' subcommand does not have a "verify" mode that is
necessary for integration with fsck.

- After computing and storing generation numbers, we must make graph
walks aware of generation numbers to gain the performance benefits they
enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
- 'log --topo-order'
- 'tag --merged'

- Currently, parse_commit_gently() requires filling in the root tree
object for a commit. This passes through lookup_tree() and consequently
lookup_object(). Also, it calls lookup_commit() when loading the parents.
These method calls check the ODB for object existence, even if the
consumer does not need the content. For example, we do not need the
tree contents when computing merge bases. Now that commit parsing is
removed from the computation time, these lookup operations are the
slowest operations keeping graph walks from being fast. Consider
loading these objects without verifying their existence in the ODB and
only loading them fully when consumers need them. Consider a method
such as "ensure_tree_loaded(commit)" that fully loads a tree before
using commit->tree.

- The current design uses the 'commit-graph' subcommand to generate the graph.
When this feature stabilizes enough to recommend to most users, we should
add automatic graph writes to common operations that create many commits.
For example, one could compute a graph on 'clone', 'fetch', or 'repack'
commands.

- A server could provide a commit graph file as part of the network protocol
to avoid extra calculations by clients. This feature is only of benefit if
the user is willing to trust the file, because verifying the file is correct
99 changes: 68 additions & 31 deletions builtin/commit-graph.c
@@ -3,12 +3,19 @@
#include "dir.h"
#include "lockfile.h"
#include "parse-options.h"
#include "repository.h"
#include "commit-graph.h"

static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir <objdir>]"),
N_("git commit-graph read [--object-dir <objdir>]"),
N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
N_("git commit-graph verify [--object-dir <objdir>]"),
N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
NULL
};

static const char * const builtin_commit_graph_verify_usage[] = {
N_("git commit-graph verify [--object-dir <objdir>]"),
NULL
};

@@ -18,17 +25,48 @@ static const char * const builtin_commit_graph_read_usage[] = {
};

static const char * const builtin_commit_graph_write_usage[] = {
N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
NULL
};

static struct opts_commit_graph {
const char *obj_dir;
int reachable;
int stdin_packs;
int stdin_commits;
int append;
} opts;


static int graph_verify(int argc, const char **argv)
{
struct commit_graph *graph = NULL;
char *graph_name;

static struct option builtin_commit_graph_verify_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
N_("dir"),
N_("The object directory to store the graph")),
OPT_END(),
};

argc = parse_options(argc, argv, NULL,
builtin_commit_graph_verify_options,
builtin_commit_graph_verify_usage, 0);

if (!opts.obj_dir)
opts.obj_dir = get_object_directory();

graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
FREE_AND_NULL(graph_name);

if (!graph)
return 0;

return verify_commit_graph(the_repository, graph);
}

static int graph_read(int argc, const char **argv)
{
struct commit_graph *graph = NULL;
@@ -51,8 +89,11 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);

if (!graph)
if (!graph) {
UNLEAK(graph_name);
die("graph file %s does not exist", graph_name);
}

FREE_AND_NULL(graph_name);

printf("header: %08x %d %d %d %d\n",
@@ -79,18 +120,16 @@ static int graph_read(int argc, const char **argv)

static int graph_write(int argc, const char **argv)
{
const char **pack_indexes = NULL;
int packs_nr = 0;
const char **commit_hex = NULL;
int commits_nr = 0;
const char **lines = NULL;
int lines_nr = 0;
int lines_alloc = 0;
struct string_list *pack_indexes = NULL;
struct string_list *commit_hex = NULL;
struct string_list lines;

static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
N_("dir"),
N_("The object directory to store the graph")),
OPT_BOOL(0, "reachable", &opts.reachable,
N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -104,39 +143,35 @@ static int graph_write(int argc, const char **argv)
builtin_commit_graph_write_options,
builtin_commit_graph_write_usage, 0);

if (opts.stdin_packs && opts.stdin_commits)
die(_("cannot use both --stdin-commits and --stdin-packs"));
if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();

if (opts.reachable) {
write_commit_graph_reachable(opts.obj_dir, opts.append);
return 0;
}

string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
lines_nr = 0;
lines_alloc = 128;
ALLOC_ARRAY(lines, lines_alloc);

while (strbuf_getline(&buf, stdin) != EOF) {
ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
lines[lines_nr++] = strbuf_detach(&buf, NULL);
}

if (opts.stdin_packs) {
pack_indexes = lines;
packs_nr = lines_nr;
}
if (opts.stdin_commits) {
commit_hex = lines;
commits_nr = lines_nr;
}

while (strbuf_getline(&buf, stdin) != EOF)
string_list_append(&lines, strbuf_detach(&buf, NULL));

if (opts.stdin_packs)
pack_indexes = &lines;
if (opts.stdin_commits)
commit_hex = &lines;
}

write_commit_graph(opts.obj_dir,
pack_indexes,
packs_nr,
commit_hex,
commits_nr,
opts.append);

string_list_clear(&lines, 0);
return 0;
}
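
The stdin handling in `graph_write` above swaps a hand-rolled `ALLOC_GROW` array for a list that tracks its own length and capacity. The following is a minimal standalone sketch of that pattern, not Git's string-list API: `struct line_list`, `line_list_append`, `read_lines`, and `line_list_clear` are hypothetical names, and the fixed 4096-byte read buffer is an assumption made to keep the example short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable list of strings; not Git's string_list. */
struct line_list {
	char **items;
	size_t nr;    /* number of entries in use */
	size_t alloc; /* number of entries allocated */
};

/* Append a heap-allocated string, doubling the capacity when it is full. */
static void line_list_append(struct line_list *list, char *line)
{
	if (list->nr == list->alloc) {
		size_t new_alloc = list->alloc ? list->alloc * 2 : 16;
		char **grown = realloc(list->items, new_alloc * sizeof(*grown));
		if (!grown) {
			perror("realloc");
			exit(1);
		}
		list->items = grown;
		list->alloc = new_alloc;
	}
	list->items[list->nr++] = line;
}

/*
 * Read lines from 'in' and store a copy of each without its trailing
 * newline. Assumption: lines fit in the 4096-byte buffer; longer lines
 * would be split into several entries.
 */
static void read_lines(FILE *in, struct line_list *list)
{
	char buf[4096];

	while (fgets(buf, sizeof(buf), in)) {
		size_t len = strcspn(buf, "\n");
		char *copy = malloc(len + 1);
		if (!copy) {
			perror("malloc");
			exit(1);
		}
		memcpy(copy, buf, len);
		copy[len] = '\0';
		line_list_append(list, copy);
	}
}

/* Free every stored string and reset the list to its empty state. */
static void line_list_clear(struct line_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
	list->items = NULL;
	list->nr = 0;
	list->alloc = 0;
}
```

Because the list carries its own count, a caller can pass a single pointer instead of an array plus a separate length, which is the same simplification the diff makes to the `write_commit_graph` call.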

@@ -162,6 +197,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
if (argc > 0) {
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
if (!strcmp(argv[0], "verify"))
return graph_verify(argc, argv);
if (!strcmp(argv[0], "write"))
return graph_write(argc, argv);
}
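
The option check in `graph_write` rejects any combination of `--reachable`, `--stdin-packs`, and `--stdin-commits` by summing the three flags. A minimal sketch of that idea follows, assuming each flag is treated as a boolean; `struct write_modes` and `check_write_modes` are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Hypothetical holder for the three mutually exclusive input modes. */
struct write_modes {
	int reachable;
	int stdin_packs;
	int stdin_commits;
};

/*
 * Return 0 if at most one mode is selected, -1 otherwise.
 * Each flag is normalized with !! so any nonzero value counts once.
 */
static int check_write_modes(const struct write_modes *m)
{
	int selected = !!m->reachable + !!m->stdin_packs + !!m->stdin_commits;

	if (selected > 1) {
		fprintf(stderr, "use at most one of --reachable, "
				"--stdin-commits, or --stdin-packs\n");
		return -1;
	}
	return 0;
}
```

Summing the flags scales better than pairwise `&&` checks: adding a fourth exclusive mode would mean one more term rather than three more comparisons.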
21 changes: 21 additions & 0 deletions builtin/fsck.c
@@ -18,6 +18,7 @@
#include "decorate.h"
#include "packfile.h"
#include "object-store.h"
#include "run-command.h"

#define REACHABLE 0x0001
#define SEEN 0x0002
@@ -47,6 +48,7 @@ static int name_objects;
#define ERROR_REACHABLE 02
#define ERROR_PACK 04
#define ERROR_REFS 010
#define ERROR_COMMIT_GRAPH 020

static const char *describe_object(struct object *obj)
{
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
}

check_connectivity();

if (core_commit_graph) {
struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
commit_graph_verify.argv = verify_argv;
commit_graph_verify.git_cmd = 1;

if (run_command(&commit_graph_verify))
errors_found |= ERROR_COMMIT_GRAPH;

prepare_alt_odb(the_repository);
for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
verify_argv[2] = "--object-dir";
verify_argv[3] = alt->path;
if (run_command(&commit_graph_verify))
errors_found |= ERROR_COMMIT_GRAPH;
}
}

return errors_found;
}
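
The fsck change above runs the verifier once for the local object directory and once per alternate, folding every failure into the same octal bit of `errors_found`. Below is a rough sketch of that accumulation under stated assumptions: the `ERROR_*` values are copied from the diff, while `verify_fn`, `check_commit_graphs`, and `fake_verify` are hypothetical stand-ins for spawning `git commit-graph verify`.

```c
#include <stddef.h>
#include <string.h>

/* Octal bit flags with the values shown in the diff above. */
#define ERROR_REACHABLE 02
#define ERROR_PACK 04
#define ERROR_REFS 010
#define ERROR_COMMIT_GRAPH 020

/* Hypothetical verifier: returns nonzero on failure; NULL means the local object directory. */
typedef int (*verify_fn)(const char *object_dir);

/*
 * Verify the local object directory first, then each alternate.
 * Failures set ERROR_COMMIT_GRAPH without clearing bits already present.
 */
static int check_commit_graphs(verify_fn verify, const char **alternates,
			       size_t nr, int errors_found)
{
	size_t i;

	if (verify(NULL))
		errors_found |= ERROR_COMMIT_GRAPH;

	for (i = 0; i < nr; i++)
		if (verify(alternates[i]))
			errors_found |= ERROR_COMMIT_GRAPH;

	return errors_found;
}

/* Illustrative verifier: reports failure for any path containing "corrupt". */
static int fake_verify(const char *object_dir)
{
	return object_dir && strstr(object_dir, "corrupt") != NULL;
}
```

Since the flags are distinct bits, the final exit status tells the caller which classes of problem were found, and a commit-graph failure in any alternate is reported the same way as one in the local object directory.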
6 changes: 6 additions & 0 deletions builtin/gc.c
@@ -20,6 +20,7 @@
#include "sigchain.h"
#include "argv-array.h"
#include "commit.h"
#include "commit-graph.h"
#include "packfile.h"
#include "object-store.h"
#include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
static int aggressive_window = 250;
static int gc_auto_threshold = 6700;
static int gc_auto_pack_limit = 50;
static int gc_write_commit_graph = 0;
static int detach_auto = 1;
static timestamp_t gc_log_expire_time;
static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", &aggressive_depth);
git_config_get_int("gc.auto", &gc_auto_threshold);
git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
git_config_get_bool("gc.autodetach", &detach_auto);
git_config_get_expiry("gc.pruneexpire", &prune_expire);
git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();

if (gc_write_commit_graph)
write_commit_graph_reachable(get_object_directory(), 0);

if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));