Commit 933af84

Ben Peart authored and dscho committed
Hydrate missing loose objects in check_and_freshen()
Hydrate missing loose objects in check_and_freshen() when running
virtualized. Add test cases to verify read-object hook works when
running virtualized.

This hook is called in check_and_freshen() rather than
check_and_freshen_local() to make the hook work also with alternates.

Helped-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
1 parent 6e17496 commit 933af84

6 files changed, +487 -16 lines changed

Diff for: Documentation/technical/read-object-protocol.txt

+102
@@ -0,0 +1,102 @@
+Read Object Process
+^^^^^^^^^^^^^^^^^^^
+
+The read-object process enables Git to read all missing blobs with a
+single process invocation for the entire life of a single Git command.
+This is achieved by using a packet format (pkt-line, see technical/
+protocol-common.txt) based protocol over standard input and standard
+output as follows. All packets, except for the "*CONTENT" packets and
+the "0000" flush packet, are considered text and therefore are
+terminated by an LF.
+
+Git starts the process when it encounters the first missing object that
+needs to be retrieved. After the process is started, Git sends a welcome
+message ("git-read-object-client"), a list of supported protocol version
+numbers, and a flush packet. Git expects to read a welcome response
+message ("git-read-object-server"), exactly one protocol version number
+from the previously sent list, and a flush packet. All further
+communication will be based on the selected version.
+
+The remaining protocol description below documents "version=1". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet: git> git-read-object-client
+packet: git> version=1
+packet: git> version=42
+packet: git> 0000
+packet: git< git-read-object-server
+packet: git< version=1
+packet: git< 0000
+packet: git> capability=get
+packet: git> capability=have
+packet: git> capability=put
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=get
+packet: git< 0000
+------------------------
+The only supported capability in version 1 is "get".
+
+Afterwards Git sends a list of "key=value" pairs terminated with a flush
+packet. The list will contain at least the command (based on the
+supported capabilities) and the sha1 of the object to retrieve. Please
+note that the process must not send any response before it has received the
+final flush packet.
+
+When the process receives the "get" command, it should make the requested
+object available in the git object store and then return success. Git will
+then check the object store again and this time find it and proceed.
+------------------------
+packet: git> command=get
+packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
+packet: git> 0000
+------------------------
+
+The process is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the process does not experience
+problems then the list must contain a "success" status.
+------------------------
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+In case the process cannot or does not want to process the content, it
+is expected to respond with an "error" status.
+------------------------
+packet: git< status=error
+packet: git< 0000
+------------------------
+
+In case the process cannot or does not want to process the content as
+well as any future content for the lifetime of the Git process, then it
+is expected to respond with an "abort" status at any point in the
+protocol.
+------------------------
+packet: git< status=abort
+packet: git< 0000
+------------------------
+
+Git neither stops nor restarts the process in case the "error"/"abort"
+status is set.
+
+If the process dies during the communication or does not adhere to the
+protocol then Git will stop the process and restart it with the next
+object that needs to be processed.
+
+After the read-object process has processed an object it is expected to
+wait for the next "key=value" list containing a command. Git will close
+the command pipe on exit. The process is expected to detect EOF and exit
+gracefully on its own. Git will wait until the process has stopped.
+
+A long running read-object process demo implementation can be found in
+`contrib/long-running-read-object/example.pl` located in the Git core
+repository. If you develop your own long running process then the
+`GIT_TRACE_PACKET` environment variable can be very helpful for
+debugging (see linkgit:git[1]).
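
The framing used by the packets in the protocol description can be sketched in a few lines of C. This is only an illustration of the pkt-line layout (four hex digits giving the total length, including those four bytes, followed by the payload and an LF for text packets); the function names `pkt_format_text` and `pkt_parse_len` are hypothetical and are not part of Git's `pkt-line.h` API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame a text payload as a pkt-line: 4 hex digits holding the total
 * length (4 length bytes + payload + LF), then the payload and an LF.
 * Returns the number of bytes written (excluding the trailing NUL), or
 * -1 if the packet would be too large or would not fit in out[outsz].
 * 65520 is the maximum pkt-line length.
 */
int pkt_format_text(char *out, size_t outsz, const char *payload)
{
	size_t total = strlen(payload) + 1 + 4;

	if (total > 65520 || total + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s\n", total, payload);
	return (int)total;
}

/*
 * Parse the 4-hex-digit length header of a packet.
 * Returns the length (0 for the "0000" flush packet) or -1 if the
 * header contains a non-hex character.
 */
int pkt_parse_len(const char *hdr)
{
	int len = 0;
	int i;

	for (i = 0; i < 4; i++) {
		char c = hdr[i];
		int v;

		if (c >= '0' && c <= '9')
			v = c - '0';
		else if (c >= 'a' && c <= 'f')
			v = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v = c - 'A' + 10;
		else
			return -1;
		len = len * 16 + v;
	}
	return len;
}
```

With these helpers, `command=get` is framed as `0010command=get` plus a trailing LF: 11 payload bytes, 1 LF and 4 length bytes give 16, or 0x0010.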

Diff for: contrib/long-running-read-object/example.pl

+114
@@ -0,0 +1,114 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand." Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+# cut -d' ' -f1 | git pack-objects /guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard <sha from rev-parse call above>
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+#
+
+use strict;
+use warnings;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "/host_repo/.git/";
+
+sub packet_bin_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet: '$buffer'";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size: $pkt_size";
+    }
+}
+
+sub packet_txt_read {
+    my ( $res, $buf ) = packet_bin_read();
+    unless ( $buf =~ s/\n$// ) {
+        die "A non-binary line MUST be terminated by an LF.";
+    }
+    return ( $res, $buf );
+}
+
+sub packet_bin_write {
+    my $buf = shift;
+    print STDOUT sprintf( "%04x", length($buf) + 4 );
+    print STDOUT $buf;
+    STDOUT->flush();
+}
+
+sub packet_txt_write {
+    packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad initialize";
+( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
+( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
+
+packet_txt_write("git-read-object-server");
+packet_txt_write("version=1");
+packet_flush();
+
+( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
+( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
+
+packet_txt_write("capability=get");
+packet_flush();
+
+while (1) {
+    my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
+
+    if ( $command eq "get" ) {
+        my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+        packet_bin_read();
+
+        system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 2>&1');
+        packet_txt_write(($?) ? "status=error" : "status=success");
+        packet_flush();
+    } else {
+        die "bad command '$command'";
+    }
+}
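
The request handling in the example's main loop (read `command=...`, read `sha1=...`, answer with a status line) could be sketched in C roughly as follows. This is a minimal illustration, not code from the commit: `is_sha1_hex`, `parse_get_request` and `status_line` are hypothetical helpers, the request is assumed to have been read up to the flush packet already, and each line is assumed to have had its trailing LF stripped.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if s is exactly 40 lowercase hex digits, else 0. */
int is_sha1_hex(const char *s)
{
	size_t i;

	for (i = 0; i < 40; i++) {
		/* A NUL before position 40 fails this check, so we never read past it. */
		if (!((s[i] >= '0' && s[i] <= '9') ||
		      (s[i] >= 'a' && s[i] <= 'f')))
			return 0;
	}
	return s[40] == '\0';
}

/*
 * Inspect one request: the "key=value" lines received before the flush
 * packet. On a well-formed "get" request, store a pointer to the object
 * name in *sha1 and return 0; otherwise return -1.
 */
int parse_get_request(const char *const *lines, size_t nr, const char **sha1)
{
	const char *command = NULL, *name = NULL;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strncmp(lines[i], "command=", 8))
			command = lines[i] + 8;
		else if (!strncmp(lines[i], "sha1=", 5))
			name = lines[i] + 5;
		/* Unknown keys are ignored in this sketch. */
	}
	if (!command || strcmp(command, "get") || !name || !is_sha1_hex(name))
		return -1;
	*sha1 = name;
	return 0;
}

/* Map the outcome of handling a request to the status line to send back. */
const char *status_line(int failed)
{
	return failed ? "status=error" : "status=success";
}
```

Validating the object name before it is used matters because the example passes it to a shell command line; the 40-hex-digit check mirrors the regular expression in the Perl loop.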

Diff for: object-file.c

+126 -16
@@ -43,6 +43,9 @@
 #include "object-file-convert.h"
 #include "trace.h"
 #include "hook.h"
+#include "sigchain.h"
+#include "sub-process.h"
+#include "pkt-line.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1023,6 +1026,116 @@ int has_alt_odb(struct repository *r)
 	return !!r->objects->odb->next;
 }
 
+#define CAP_GET (1u<<0)
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+struct read_object_process {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static int start_read_object_fn(struct subprocess_entry *subprocess)
+{
+	struct read_object_process *entry = (struct read_object_process *)subprocess;
+	static int versions[] = {1, 0};
+	static struct subprocess_capability capabilities[] = {
+		{ "get", CAP_GET },
+		{ NULL, 0 }
+	};
+
+	return subprocess_handshake(subprocess, "git-read-object", versions,
+				    NULL, capabilities,
+				    &entry->supported_capabilities);
+}
+
+static int read_object_process(const struct object_id *oid)
+{
+	int err;
+	struct read_object_process *entry;
+	struct child_process *process;
+	struct strbuf status = STRBUF_INIT;
+	const char *cmd = find_hook(the_repository, "read-object");
+	uint64_t start;
+
+	start = getnanotime();
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp,
+			     NULL, 0);
+		entry = NULL;
+	} else {
+		entry = (struct read_object_process *) subprocess_find_entry(&subprocess_map, cmd);
+	}
+
+	if (!entry) {
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&subprocess_map, &entry->subprocess, cmd,
+				     start_read_object_fn)) {
+			free(entry);
+			return -1;
+		}
+	}
+	process = &entry->subprocess.process;
+
+	if (!(CAP_GET & entry->supported_capabilities))
+		return -1;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(process->in, "command=get\n");
+	if (err)
+		goto done;
+
+	err = packet_write_fmt_gently(process->in, "sha1=%s\n", oid_to_hex(oid));
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	err = subprocess_read_status(process->out, &status);
+	err = err ? err : strcmp(status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		err = err ? err : errno;
+		if (!strcmp(status.buf, "error")) {
+			/* The process signaled a problem with the file. */
+		}
+		else if (!strcmp(status.buf, "abort")) {
+			/*
+			 * The process signaled a permanent problem. Don't try to read
+			 * objects with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			entry->supported_capabilities &= ~CAP_GET;
+		}
+		else {
+			/*
+			 * Something went wrong with the read-object process.
+			 * Force shutdown and restart if needed.
+			 */
+			error("external process '%s' failed", cmd);
+			subprocess_stop(&subprocess_map,
+					(struct subprocess_entry *)entry);
+			free(entry);
+		}
+	}
+
+	trace_performance_since(start, "read_object_process");
+
+	strbuf_release(&status);
+	return err;
+}
+
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
@@ -1073,8 +1186,19 @@ static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
 
 static int check_and_freshen(const struct object_id *oid, int freshen)
 {
-	return check_and_freshen_local(oid, freshen) ||
+	int ret;
+	int tried_hook = 0;
+
+retry:
+	ret = check_and_freshen_local(oid, freshen) ||
 	       check_and_freshen_nonlocal(oid, freshen);
+	if (!ret && core_virtualize_objects && !tried_hook) {
+		tried_hook = 1;
+		if (!read_object_process(oid))
+			goto retry;
+	}
+
+	return ret;
 }
 
 int has_loose_object_nonlocal(const struct object_id *oid)
@@ -1618,20 +1742,6 @@ void disable_obj_read_lock(void)
 	pthread_mutex_destroy(&obj_read_mutex);
 }
 
-static int run_read_object_hook(struct repository *r, const struct object_id *oid)
-{
-	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
-	int ret;
-	uint64_t start;
-
-	start = getnanotime();
-	strvec_push(&opt.args, oid_to_hex(oid));
-	ret = run_hooks_opt(r, "read-object", &opt);
-	trace_performance_since(start, "run_read_object_hook");
-
-	return ret;
-}
-
 int fetch_if_missing = 1;
 
 static int do_oid_object_info_extended(struct repository *r,
@@ -1690,7 +1800,7 @@ static int do_oid_object_info_extended(struct repository *r,
 				break;
 			if (core_virtualize_objects && !tried_hook) {
 				tried_hook = 1;
-				if (!run_read_object_hook(r, oid))
+				if (!read_object_process(oid))
 					goto retry;
 			}
 		}
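
The control flow added to check_and_freshen() is a "look up, hydrate at most once, look up again" pattern. The standalone sketch below isolates that pattern so it can be exercised without Git. Everything in it is hypothetical scaffolding: `lookup_with_hydration`, the callback types and the `toy_store` struct are stand-ins for the object lookup, read_object_process() and the object store, and `virtualized` plays the role of core_virtualize_objects.

```c
#include <stddef.h>

/* Returns non-zero when the object is found. */
typedef int (*lookup_fn)(void *ctx);
/* Returns 0 when the object was made available, non-zero on failure. */
typedef int (*hydrate_fn)(void *ctx);

/*
 * Look the object up; if it is missing and virtualization is enabled,
 * ask the hydration callback once and retry the lookup only if the
 * callback reported success. The tried_hook flag guarantees at most one
 * hydration attempt per call, so the loop cannot spin.
 */
int lookup_with_hydration(lookup_fn lookup, hydrate_fn hydrate,
			  void *ctx, int virtualized)
{
	int tried_hook = 0;
	int found;

retry:
	found = lookup(ctx);
	if (!found && virtualized && !tried_hook) {
		tried_hook = 1;
		if (!hydrate(ctx))
			goto retry;
	}
	return found;
}

/* Toy object store used only to exercise the pattern. */
struct toy_store {
	int present;     /* is the object currently in the store? */
	int hydrate_ok;  /* should hydration succeed? */
	int lookups;     /* number of lookups performed */
	int hydrations;  /* number of hydration attempts */
};

int toy_lookup(void *ctx)
{
	struct toy_store *s = ctx;

	s->lookups++;
	return s->present;
}

int toy_hydrate(void *ctx)
{
	struct toy_store *s = ctx;

	s->hydrations++;
	if (!s->hydrate_ok)
		return -1;
	s->present = 1;
	return 0;
}
```

With a store where the object is missing and hydration succeeds, the function performs two lookups and one hydration and reports the object as found; when hydration fails, it performs a single lookup and reports the object as missing.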
