Commit 40be45b

derrickstolee authored and dscho committed
Merge pull request #84 Create 'expire' and 'repack' subcommands for multi-pack-index

The multi-pack-index provides a fast way to find an object among a large list of pack-files. It stores a single pack-reference for each object id, so duplicate objects are ignored. Among a list of pack-files storing the same object, the most-recently modified one is used.

Create new subcommands for the multi-pack-index builtin.

* 'git multi-pack-index expire': If we have a pack-file indexed by the multi-pack-index, but all objects in that pack are duplicated in more-recently modified packs, then delete that pack (and any others like it). Delete the reference to that pack in the multi-pack-index.

* 'git multi-pack-index repack --batch-size=<size>': Starting from the oldest pack-files covered by the multi-pack-index, find those whose on-disk size is below the batch size until we have a collection of packs whose sizes add up to the batch size. Create a new pack containing all objects that the multi-pack-index refers to in those packs.

This allows us to create a new pattern for repacking objects: run 'repack'. After enough time has passed that all Git commands that started before the last 'repack' are finished, run 'expire' again.

This approach has some advantages over the existing "repack everything" model:

1. Incremental. We can repack a small batch of objects at a time, instead of repacking all reachable objects. We can also limit ourselves to the objects that do not appear in newer pack-files.

2. Highly Available. By adding a new pack-file (and not deleting the old pack-files) we do not interrupt concurrent Git commands, and do not suffer performance degradation. By expiring only pack-files that have no referenced objects, we know that Git commands that are doing normal object lookups* will not be interrupted.

* Note: if someone concurrently runs a Git command that uses get_all_packs(), then that command could try to read the pack-files and pack-indexes that we are deleting during an expire command. Such commands are usually related to object maintenance (e.g. fsck, gc, pack-objects) or are related to less-often-used features (e.g. fast-import, http-backend, server-info).

We plan to use this approach in VFS for Git to do background maintenance of the "shared object cache", which is a Git alternate directory filled with packfiles containing commits and trees. We currently download pack-files on an hourly basis to keep up-to-date with the central server. The cache servers supply packs on an hourly and daily basis, so most of the hourly packs become useless after a new daily pack is downloaded. The 'expire' command would clear out most of those packs, but many will still remain with fewer than 100 objects remaining. The 'repack' command (with a batch size of 1-3 GB, probably) can condense the remaining packs in commands that run for 1-3 min at a time. Since the daily packs range from 100-250 MB, we will also combine and condense those packs.
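The 'expire' rule described above can be illustrated with a small sketch. It is not the code from this commit: the array model, the function name `find_expirable_packs`, and its parameters are assumptions made for illustration. It treats the multi-pack-index as a table that records, for each object, the one pack chosen for it, and flags every pack that no object points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model, not Git's data structures: object_pack[i] is the
 * index of the single pack that the multi-pack-index selected for
 * object i. A pack that no object refers to holds only duplicates of
 * objects found in other packs, so it is a candidate for 'expire'.
 *
 * Sets expirable[p] to 1 for each such pack and returns how many there are.
 */
size_t find_expirable_packs(const uint32_t *object_pack, size_t nr_objects,
			    size_t nr_packs, unsigned char *expirable)
{
	size_t i, nr_expirable = 0;
	/* Allocate at least one slot so calloc(0) corner cases do not matter. */
	uint32_t *count = calloc(nr_packs ? nr_packs : 1, sizeof(*count));

	assert(count);

	/* Count how many objects the index attributes to each pack. */
	for (i = 0; i < nr_objects; i++) {
		assert(object_pack[i] < nr_packs);
		count[object_pack[i]]++;
	}

	/* A pack with zero referenced objects can be dropped. */
	for (i = 0; i < nr_packs; i++) {
		expirable[i] = (count[i] == 0);
		nr_expirable += expirable[i];
	}

	free(count);
	return nr_expirable;
}
```

With three packs and four objects attributed to packs 2, 2, 0 and 2, only pack 1 is flagged, since every object it might contain is served from another pack.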
2 parents e7b24c3 + e6ce075 commit 40be45b

8 files changed, +376 -28 lines changed

Documentation/git-multi-pack-index.txt

Lines changed: 21 additions & 5 deletions
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] <verb>
+'git multi-pack-index' [--object-dir=<dir>] <subcommand>

 DESCRIPTION
 -----------
@@ -23,13 +23,29 @@ OPTIONS
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
 	`<dir>/packs` for the pack-files to index.

+The following subcommands are available:
+
 write::
-	When given as the verb, write a new MIDX file to
-	`<dir>/packs/multi-pack-index`.
+	Write a new MIDX file.

 verify::
-	When given as the verb, verify the contents of the MIDX file
-	at `<dir>/packs/multi-pack-index`.
+	Verify the contents of the MIDX file.
+
+expire::
+	Delete the pack-files that are tracked by the MIDX file, but
+	have no objects referenced by the MIDX. Rewrite the MIDX file
+	afterward to remove all references to these pack-files.
+
+repack::
+	Collect a batch of pack-files whose size are all at most the
+	size given by --batch-size, but whose sizes sum to larger
+	than --batch-size. The batch is selected by greedily adding
+	small pack-files starting with the oldest pack-files that fit
+	the size. Create a new pack-file containing the objects the
+	multi-pack-index indexes into those pack-files, and rewrite
+	the multi-pack-index to contain that pack-file. A later run
+	of 'git multi-pack-index expire' will delete the pack-files
+	that were part of this batch.


 EXAMPLES

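The greedy batch selection that the 'repack' documentation describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation added by this commit: `struct pack_info`, `select_repack_batch`, and the choice to select nothing when the candidates never reach the batch size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for illustration; Git's own bookkeeping is richer. */
struct pack_info {
	size_t size;  /* on-disk size in bytes */
	int selected; /* set to 1 when the pack joins the batch */
};

/*
 * packs[] must be ordered oldest first. Walk the list, skip any pack
 * larger than batch_size, and keep adding packs until their sizes add
 * up to at least batch_size.
 *
 * Returns the number of packs selected, or 0 (with nothing selected)
 * when the candidates never add up to batch_size. That fallback is an
 * assumption of this sketch.
 */
size_t select_repack_batch(struct pack_info *packs, size_t nr,
			   size_t batch_size)
{
	size_t i, total = 0, picked = 0;

	assert(packs || !nr);

	for (i = 0; i < nr; i++)
		packs[i].selected = 0;

	for (i = 0; i < nr && total < batch_size; i++) {
		if (packs[i].size > batch_size)
			continue; /* too large to belong to a small-pack batch */
		packs[i].selected = 1;
		total += packs[i].size;
		picked++;
	}

	if (total < batch_size) {
		/* Not enough small packs to fill a batch: select none. */
		for (i = 0; i < nr; i++)
			packs[i].selected = 0;
		return 0;
	}
	return picked;
}
```

For packs of sizes 40, 500, 30, 50 and 10 (oldest first) and a batch size of 100, the sketch selects the packs of size 40, 30 and 50: the 500-byte pack is skipped as too large, and the walk stops before the newest pack because the running total has already reached the batch size.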
builtin/multi-pack-index.c

Lines changed: 11 additions & 1 deletion
@@ -6,12 +6,13 @@
 #include "trace2.h"

 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
 	NULL
 };

 static struct opts_multi_pack_index {
 	const char *object_dir;
+	unsigned long batch_size;
 } opts;

 int cmd_multi_pack_index(int argc, const char **argv,
@@ -20,6 +21,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};

@@ -43,10 +46,17 @@ int cmd_multi_pack_index(int argc, const char **argv,

 	trace2_cmd_mode(argv[0]);

+	if (!strcmp(argv[0], "repack"))
+		return midx_repack(the_repository, opts.object_dir, (size_t)opts.batch_size);
+	if (opts.batch_size)
+		die(_("--batch-size option is only for 'repack' verb"));
+
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
 		return verify_midx_file(the_repository, opts.object_dir);
+	if (!strcmp(argv[0], "expire"))
+		return expire_midx_packs(the_repository, opts.object_dir);

 	die(_("unrecognized verb: %s"), argv[0]);
 }

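The new --batch-size option is declared with OPT_MAGNITUDE, which accepts a non-negative integer with an optional 'k', 'm' or 'g' suffix that scales the value by powers of 1024. The standalone sketch below shows that kind of parsing so the meaning of a value such as `2g` is concrete. It does not reproduce Git's parser: the name `parse_magnitude`, its return convention, and its exact rejection rules are assumptions of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not Git's parse-options code: parse a size such
 * as "512", "3m" or "2G" into *out. The suffixes k, m and g (either
 * case) multiply by 1024, 1024^2 and 1024^3.
 *
 * Returns 0 on success, -1 on malformed input or overflow.
 */
int parse_magnitude(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long long value, factor = 1;

	assert(arg && out);

	/* Require a leading digit: rejects "", "-1", "+1" and " 1". */
	if (!isdigit((unsigned char)arg[0]))
		return -1;

	errno = 0;
	value = strtoull(arg, &end, 10);
	if (errno)
		return -1;

	switch (tolower((unsigned char)*end)) {
	case 'k': factor = 1024ULL; end++; break;
	case 'm': factor = 1024ULL * 1024; end++; break;
	case 'g': factor = 1024ULL * 1024 * 1024; end++; break;
	}

	/* Anything left after the optional suffix is an error. */
	if (*end)
		return -1;

	/* Refuse values that would not fit in unsigned long once scaled. */
	if (value > ULONG_MAX / factor)
		return -1;

	*out = (unsigned long)(value * factor);
	return 0;
}
```

In the diff above, a parsed size of zero is indistinguishable from an absent option, which is why the `if (opts.batch_size)` check only rejects a non-zero batch size on subcommands other than 'repack'.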
builtin/repack.c

Lines changed: 2 additions & 12 deletions
@@ -129,19 +129,9 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,

 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
-	int i;
 	struct strbuf buf = STRBUF_INIT;
-	size_t plen;
-
-	strbuf_addf(&buf, "%s/%s", dir_name, base_name);
-	plen = buf.len;
-
-	for (i = 0; i < ARRAY_SIZE(exts); i++) {
-		strbuf_setlen(&buf, plen);
-		strbuf_addstr(&buf, exts[i]);
-		unlink(buf.buf);
-	}
+	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+	unlink_pack_path(buf.buf, 1);
 	strbuf_release(&buf);
 }

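The removed loop deleted a pack together with its companion files, and the replacement delegates that work to a shared helper that takes the full `.pack` path. The body of that helper is not part of the hunks shown here, so the following is only a standalone sketch of the same idea using plain POSIX calls. The name `remove_pack_files` is hypothetical, and the meaning of the second argument to `unlink_pack_path` is not visible in this diff, so the sketch leaves it out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in, not Git's helper: given a path ending in
 * ".pack", unlink that file and the companion files that share its
 * base name. The extension list is the one from the removed loop.
 *
 * Returns the number of files that were actually removed.
 */
int remove_pack_files(const char *pack_path)
{
	static const char *exts[] = {
		".pack", ".idx", ".keep", ".bitmap", ".promisor"
	};
	char buf[4096];
	size_t i, len = strlen(pack_path);
	int removed = 0;

	/* The caller must pass a "<base>.pack" path that fits in buf. */
	assert(len > 5 && !strcmp(pack_path + len - 5, ".pack"));
	assert(len < sizeof(buf));
	len -= 5; /* length of the base name without ".pack" */

	for (i = 0; i < sizeof(exts) / sizeof(exts[0]); i++) {
		int n = snprintf(buf, sizeof(buf), "%.*s%s",
				 (int)len, pack_path, exts[i]);
		if (n < 0 || (size_t)n >= sizeof(buf))
			continue; /* path would not fit; skip it */
		if (!unlink(buf))
			removed++; /* missing companions are simply ignored */
	}
	return removed;
}
```

Passing the `.pack` path rather than a directory and base name mirrors the new call site, which builds `"%s/%s.pack"` before handing the path over.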