From 8b5763e8fa99665d686a0af639c7d02984d7a100 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 02:43:40 -0400 Subject: [PATCH 01/11] midx: avoid duplicate packed_git entries When we scan a pack directory to load the idx entries we find into the packed_git list, we skip any of them that are contained in a midx. We then load them later lazily if we actually need to access the corresponding pack, referencing them both from the midx struct and the packed_git list. The lazy-load in the midx code checks to see if the midx already mentions the pack, but doesn't otherwise check the packed_git list. This makes sense, since we should have added any pack to both lists. But there's a loophole! If we call close_object_store(), that frees the midx entirely, but _not_ the packed_git structs, which we must keep around for Reasons[1]. If we then try to look up more objects, we'll auto-load the midx again, which won't realize that we've already loaded those packs, and will create duplicate entries in the packed_git list. This is possibly inefficient, because it means we may open and map the pack redundantly. But it can also lead to weird user-visible behavior. The case I found is in "git repack", which closes and reopens the midx after repacking and then calls update_server_info(). We end up writing the duplicate entries into objects/info/packs. We could obviously de-dup them while writing that file, but it seems like a violation of more core assumptions that we end up with these duplicate structs at all. We can avoid the duplicates reasonably efficiently by checking their names in the pack_map hash. This annoyingly does require a little more than a straight hash lookup due to the naming conventions, but it should only happen when we are about to actually open a pack. I don't think one extra malloc will be noticeable there. [1] I'm not entirely sure of all the details, except that we generally assume the packed_git structs never go away. We noted this restriction in the comment added by 6f1e9394e2 (object: fix leaking packfiles when closing object store, 2024-08-08), but it's somewhat vague. At any rate, if you try freeing the structs in close_object_store(), you can observe segfaults all over the test suite. So it might be fixable, but it's not trivial. 
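To make the naming dance concrete, the lookup key is derived roughly like this (a standalone sketch; the actual change below does the same thing with git's strbuf and hashmap helpers, and the function name here is made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * pack_map is keyed by the full ".pack" path, but the midx stores
     * "pack-<hash>.idx" names, so the lookup key has to be rebuilt by
     * stripping ".idx" and appending ".pack".
     */
    static char *pack_map_key_from_idx(const char *object_dir, const char *idx_name)
    {
            size_t len = strlen(idx_name);
            size_t keylen;
            char *key;

            if (len < 4 || strcmp(idx_name + len - 4, ".idx"))
                    return NULL; /* midx entries always end in ".idx" */

            keylen = strlen(object_dir) + strlen("/pack/") +
                     (len - 4) + strlen(".pack") + 1;
            key = malloc(keylen);
            if (!key)
                    return NULL;
            snprintf(key, keylen, "%s/pack/%.*s.pack",
                     object_dir, (int)(len - 4), idx_name);
            return key;
    }

    int main(void)
    {
            char *key = pack_map_key_from_idx(".git/objects", "pack-1234abcd.idx");

            if (key) {
                    puts(key); /* .git/objects/pack/pack-1234abcd.pack */
                    free(key);
            }
            return 0;
    }

In the real code the key lives in a strbuf that is released right after the hashmap lookup, and this path only runs when a pack is about to be opened, so the extra allocation is not a concern.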
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- midx.c | 20 +++++++++++++++++--- t/t5200-update-server-info.sh | 8 ++++++++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/midx.c b/midx.c index 67e0d64004666d..479812cb9bfddc 100644 --- a/midx.c +++ b/midx.c @@ -445,6 +445,7 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; + struct strbuf key = STRBUF_INIT; struct packed_git *p; pack_int_id = midx_for_pack(&m, pack_int_id); @@ -455,16 +456,29 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir, m->pack_names[pack_int_id]); - p = add_packed_git(pack_name.buf, pack_name.len, m->local); + /* pack_map holds the ".pack" name, but we have the .idx */ + strbuf_addbuf(&key, &pack_name); + strbuf_strip_suffix(&key, ".idx"); + strbuf_addstr(&key, ".pack"); + p = hashmap_get_entry_from_hash(&r->objects->pack_map, + strhash(key.buf), key.buf, + struct packed_git, packmap_ent); + if (!p) { + p = add_packed_git(pack_name.buf, pack_name.len, m->local); + if (p) { + install_packed_git(r, p); + list_add_tail(&p->mru, &r->objects->packed_git_mru); + } + } + strbuf_release(&pack_name); + strbuf_release(&key); if (!p) return 1; p->multi_pack_index = 1; m->packs[pack_int_id] = p; - install_packed_git(r, p); - list_add_tail(&p->mru, &r->objects->packed_git_mru); return 0; } diff --git a/t/t5200-update-server-info.sh b/t/t5200-update-server-info.sh index ed9dfd624c7542..cc51c73986c112 100755 --- a/t/t5200-update-server-info.sh +++ b/t/t5200-update-server-info.sh @@ -39,4 +39,12 @@ test_expect_success 'info/refs updates when changes are made' ' ! test_cmp a b ' +test_expect_success 'midx does not create duplicate pack entries' ' + git repack -d --write-midx && + git repack -d && + grep ^P .git/objects/info/packs >packs && + uniq -d dups && + test_must_be_empty dups +' + test_done From 019b21d4022c6e3a43f54a3c16c881e9419e25eb Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 02:44:17 -0400 Subject: [PATCH 02/11] t5550: count fetches in "previously-fetched .idx" test We have a test in t5550 that looks at index fetching over dumb http. It creates two branches, each of which is completely stored in its own pack, then fetches the branches independently. What should (and does) happen is that the first fetch grabs both .idx files and one .pack file, and then the fetch of the second branch re-uses the previously downloaded .idx files (fetching none) and grabs the now-required .pack file. Since the next few patches will be touching this area of the code, let's beef up the test a little by checking that we're downloading the expected items at each step. 
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/t5550-http-fetch-dumb.sh | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 58189c9f7dc9bd..3968b822607cae 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -307,6 +307,14 @@ test_expect_success 'fetch notices corrupt idx' ' ) ' +# usage: count_fetches +count_fetches () { + # ignore grep exit code; it may return non-zero if we are expecting no + # matches + grep "GET .*objects/pack/pack-[a-z0-9]*.$2" "$3" >trace.count + test_line_count = "$1" trace.count +} + test_expect_success 'fetch can handle previously-fetched .idx files' ' git checkout --orphan branch1 && echo base >file && @@ -321,8 +329,14 @@ test_expect_success 'fetch can handle previously-fetched .idx files' ' git push "$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git branch2 && git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH"/repo_packed_branches.git repack -d && git --bare init clone_packed_branches.git && - git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch1:branch1 && - git --git-dir=clone_packed_branches.git fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 + GIT_TRACE_CURL=$PWD/one.trace git --git-dir=clone_packed_branches.git \ + fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch1:branch1 && + count_fetches 2 idx one.trace && + count_fetches 1 pack one.trace && + GIT_TRACE_CURL=$PWD/two.trace git --git-dir=clone_packed_branches.git \ + fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 && + count_fetches 0 idx two.trace && + count_fetches 1 pack two.trace ' test_expect_success 'did not use upload-pack service' ' From 63aca3f7f14e01cabccf2165b941205a8a9a01dc Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 02:58:06 -0400 Subject: [PATCH 03/11] dumb-http: store downloaded pack idx as tempfile This patch fixes a regression in b1b8dfde69 (finalize_object_file(): implement collision check, 2024-09-26) where fetching a v1 pack idx file over the dumb-http protocol would cause the fetch to fail. The core of the issue is that dumb-http stores the idx we fetch from the remote at the same path that will eventually hold the idx we generate from "index-pack --stdin". The sequence is something like this: 0. We realize we need some object X, which we don't have locally, and nor does the other side have it as a loose object. 1. We download the list of remote packs from objects/info/packs. 2. For each entry in that file, we download each pack index and store it locally in .git/objects/pack/pack-$hash.idx (the $hash is not something we can verify yet and is given to us by the remote). 3. We check each pack index we got to see if it has object X. When we find a match, we download the matching .pack file from the remote to a tempfile. We feed that to "index-pack --stdin", which reindexes the pack, rather than trusting that it has what the other side claims it does. In most cases, this will end up generating the exact same (byte-for-byte) pack index which we'll store at the same pack-$hash.idx path, because the index generation and $hash id are computed based on what's in the packfile. But: a. The other side might have used other options to generate the index. For instance we use index v2 by default, but long ago it was v1 (and you can still ask for v1 explicitly). b. The other side might even use a different mechanism to determine $hash. 
E.g., long ago it was based on the sorted list of objects in the packfile, but we switched to using the pack checksum in 1190a1acf8 (pack-objects: name pack files after trailer hash, 2013-12-05). The regression we saw in the real world was (3a). A recent client fetching from a server with a v1 index downloaded that index, then complained about trying to overwrite it with its own v2 index. This collision is otherwise harmless; we know we want to replace the remote version with our local one, but the collision check doesn't realize that. There are a few options to fix it: - we could teach index-pack a command-line option to ignore only pack idx collisions, and use it when the dumb-http code invokes index-pack. This would be an awkward thing to expose users to and would involve a lot of boilerplate to get the option down to the collision code. - we could delete the remote .idx file right before running index-pack. It should be redundant at that point (since we've just downloaded the matching pack). But it feels risky to delete something from our own .git/objects based on what the other side has said. I'm not entirely positive that a malicious server couldn't lie about which pack-$hash.idx it has and get us to delete something precious. - we can stop co-mingling the downloaded idx files in our local objects directory. This is a slightly bigger change but I think fixes the root of the problem more directly. This patch implements the third option. The big design questions are: where do we store the downloaded files, and how do we manage their lifetimes? There are some additional quirks to the dumb-http system we should consider. Remember that in step 2 we downloaded every pack index, but in step 3 we may only download some of the matching packs. What happens to those other idx files now? They sit in the .git/objects/pack directory, possibly waiting to be used at a later date. That may save bandwidth for a subsequent fetch, but it also creates a lot of weird corner cases: - our local object directory now has semi-untrusted .idx files sitting around, without their matching .pack - in case 3b, we noted that we might not generate the same hash as the other side. In that case even if we download the matching pack, our index-pack invocation will store it in a different pack-$hash.idx file. And the unmatched .idx will sit there forever. - if the server repacks, it may delete the old packs. Now we have these orphaned .idx files sitting around locally that will never be used (nor deleted). - if we repack locally we may delete our local version of the server's pack index and not realize we have it. So we'll download it again, even though we have all of the objects it mentions. I think the right solution here is probably some more complex cache management system: download the remote .idx files to their own storage directory, mark them as "seen" when we get their matching pack (to avoid re-downloading even if we repack), and then delete them when the server's objects/info/refs no longer mentions them. But since the dumb http protocol is so ancient and so inferior to the smart http protocol, I don't think it's worth spending a lot of time creating such a system. For this patch I'm just downloading the idx files to .git/objects/tmp_pack_*, and marking them as tempfiles to be deleted when we exit (and due to the name, any we miss due to a crash, etc, should eventually be removed by "git gc" runs based on timestamps). 
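Roughly, the lifetime plan looks like this (a standalone sketch only; the real change further down uses git's xstrfmt() and register_tempfile(), while the stand-in here approximates the cleanup with unlink-at-exit):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * The downloaded index gets a unique, recognizable "tmp_pack_" name
     * outside packs/, is removed when the process exits, and anything a
     * crash leaves behind looks like any other stale temporary object
     * file that "git gc" prunes by timestamp.
     */
    static char tmp_idx[4096];

    static void cleanup(void)
    {
            if (*tmp_idx)
                    unlink(tmp_idx);
    }

    int main(void)
    {
            const char *object_dir = ".git/objects";
            const char *hex = "1234abcd"; /* hash as reported by the remote */

            snprintf(tmp_idx, sizeof(tmp_idx), "%s/tmp_pack_%s.idx",
                     object_dir, hex);
            atexit(cleanup);
            printf("would download the remote .idx to %s\n", tmp_idx);
            return 0;
    }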
That is slightly worse for one case: if we download an idx but not the matching pack, we won't retain that idx for subsequent runs. But the flip side is that we're making other cases better (we never hold on to useless idx files forever). I suspect that worse case does not even come up often, since it implies that the packs are generated to match distinct parts of history (i.e., in practice even in a repo with many packs you're going to end up grabbing all of those packs to do a clone). If somebody really cares about that, I think the right path forward is a managed cache directory as above, and this patch is providing the first step in that direction anyway (by moving things out of the objects/pack/ directory). There are two test changes. One demonstrates the broken v1 index case (it double-checks the resulting clone with fsck to be careful, but prior to this patch it actually fails at the clone step). The other tweaks the expectation for a test that covers the "slightly worse" case to accommodate the extra index download. The code changes are fairly simple. We stop using finalize_object_file() to copy the remote's index file into place, and leave it as a tempfile. We give the tempfile a real ".idx" name, since the packfile code expects that, and thus we make sure it is out of the usual packs/ directory (so we'd never mistake it for a real local .idx). We also have to change parse_pack_index(), which creates a temporary packed_git to access our index (we need this because all of the pack idx code assumes we have that struct). It reads the index data from the tempfile, but prior to this patch would speculatively write the finalized name into the packed_git struct using the pack-$hash we expect to use. I was mildly surprised that this worked at all, since we call verify_pack_index() on the packed_git which mentions the final name before moving the file into place! But it works because parse_pack_index() leaves the mmap-ed data in the struct, so the lazy-open in verify_pack_index() never triggers, and we read from the tempfile, ignoring the filename in the struct completely. Hacky, but it works. After this patch, parse_pack_index() now uses the index filename we pass in to derive a matching .pack name. This is OK to change because there are only two callers, both in the dumb http code (and the other passes in an existing pack-$hash.idx name, so the derived name is going to be pack-$hash.pack, which is what we were using anyway). I'll follow up with some more cleanups in that area, but this patch is sufficient to fix the regression. Reported-by: fox Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- http.c | 25 ++++++++++++++++++++----- packfile.c | 11 ++++++++++- t/t5550-http-fetch-dumb.sh | 12 +++++++++++- 3 files changed, 41 insertions(+), 7 deletions(-) diff --git a/http.c b/http.c index d59e59f66b12b6..9642cad2e331f7 100644 --- a/http.c +++ b/http.c @@ -19,6 +19,7 @@ #include "string-list.h" #include "object-file.h" #include "object-store-ll.h" +#include "tempfile.h" static struct trace_key trace_curl = TRACE_KEY_INIT(CURL); static int trace_curl_data = 1; @@ -2388,8 +2389,24 @@ static char *fetch_pack_index(unsigned char *hash, const char *base_url) strbuf_addf(&buf, "objects/pack/pack-%s.idx", hash_to_hex(hash)); url = strbuf_detach(&buf, NULL); - strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(hash)); - tmp = strbuf_detach(&buf, NULL); + /* + * Don't put this into packs/, since it's just temporary and we don't + * want to confuse it with our local .idx files. 
We'll generate our + * own index if we choose to download the matching packfile. + * + * It's tempting to use xmks_tempfile() here, but it's important that + * the file not exist, otherwise http_get_file() complains. So we + * create a filename that should be unique, and then just register it + * as a tempfile so that it will get cleaned up on exit. + * + * In theory we could hold on to the tempfile and delete these as soon + * as we download the matching pack, but it would take a bit of + * refactoring. Leaving them until the process ends is probably OK. + */ + tmp = xstrfmt("%s/tmp_pack_%s.idx", + repo_get_object_directory(the_repository), + hash_to_hex(hash)); + register_tempfile(tmp); if (http_get_file(url, tmp, NULL) != HTTP_OK) { error("Unable to get pack index %s", url); @@ -2427,10 +2444,8 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head, } ret = verify_pack_index(new_pack); - if (!ret) { + if (!ret) close_pack_index(new_pack); - ret = finalize_object_file(tmp_idx, sha1_pack_index_name(sha1)); - } free(tmp_idx); if (ret) return -1; diff --git a/packfile.c b/packfile.c index df4ba677197193..16d3bcf7f70340 100644 --- a/packfile.c +++ b/packfile.c @@ -237,13 +237,22 @@ static struct packed_git *alloc_packed_git(int extra) return p; } +static char *pack_path_from_idx(const char *idx_path) +{ + size_t len; + if (!strip_suffix(idx_path, ".idx", &len)) + BUG("idx path does not end in .idx: %s", idx_path); + return xstrfmt("%.*s.pack", (int)len, idx_path); +} + struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) { - const char *path = sha1_pack_name(sha1); + char *path = pack_path_from_idx(idx_path); size_t alloc = st_add(strlen(path), 1); struct packed_git *p = alloc_packed_git(alloc); memcpy(p->pack_name, path, alloc); /* includes NUL */ + free(path); hashcpy(p->hash, sha1, the_repository->hash_algo); if (check_packed_git_idx(idx_path, p)) { free(p); diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 3968b822607cae..706540ab74efde 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -335,7 +335,7 @@ test_expect_success 'fetch can handle previously-fetched .idx files' ' count_fetches 1 pack one.trace && GIT_TRACE_CURL=$PWD/two.trace git --git-dir=clone_packed_branches.git \ fetch "$HTTPD_URL"/dumb/repo_packed_branches.git branch2:branch2 && - count_fetches 0 idx two.trace && + count_fetches 1 idx two.trace && count_fetches 1 pack two.trace ' @@ -521,4 +521,14 @@ test_expect_success 'fetching via http alternates works' ' git -c http.followredirects=true clone "$HTTPD_URL/dumb/alt-child.git" ' +test_expect_success 'dumb http can fetch index v1' ' + server=$HTTPD_DOCUMENT_ROOT_PATH/idx-v1.git && + git init --bare "$server" && + git -C "$server" --work-tree=. commit --allow-empty -m foo && + git -C "$server" -c pack.indexVersion=1 gc && + + git clone "$HTTPD_URL/dumb/idx-v1.git" && + git -C idx-v1 fsck +' + test_done From 03b8eed7f50a56cd7581e18775c6f3a7caf639b4 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:00:09 -0400 Subject: [PATCH 04/11] packfile: drop has_pack_index() The has_pack_index() function has several oddities that may make it surprising if you are trying to find out if we have a pack with some $hash: - it is not looking for a valid pack that we found while searching object directories. It just looks for any pack-$hash.idx file in the pack directory. 
- it only looks in the local directory, not any alternates - it takes a bare "unsigned char" hash, which we try to avoid these days The only caller it has is in the dumb http code; it wants to know if we already have the pack idx in question. This can happen if we downloaded the pack (and generated its index) during a previous fetch. Before the previous patch ("dumb-http: store downloaded pack idx as tempfile"), it could also happen if we downloaded the .idx from the remote but didn't get the matching .pack. But since that patch, we don't hold on to those .idx files. So there's no need to look for the .idx file in the filesystem; we can just scan through the packed_git list to see if we have it. That lets us simplify the dumb http code a bit, as we know that if we have the .idx we have the matching .pack already. And it lets us get rid of this odd function that is unlikely to be needed again. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- http.c | 15 ++++++++------- packfile.c | 8 -------- packfile.h | 2 -- 3 files changed, 8 insertions(+), 17 deletions(-) diff --git a/http.c b/http.c index 9642cad2e331f7..03802b90495cf9 100644 --- a/http.c +++ b/http.c @@ -2420,15 +2420,17 @@ static char *fetch_pack_index(unsigned char *hash, const char *base_url) static int fetch_and_setup_pack_index(struct packed_git **packs_head, unsigned char *sha1, const char *base_url) { - struct packed_git *new_pack; + struct packed_git *new_pack, *p; char *tmp_idx = NULL; int ret; - if (has_pack_index(sha1)) { - new_pack = parse_pack_index(sha1, sha1_pack_index_name(sha1)); - if (!new_pack) - return -1; /* parse_pack_index() already issued error message */ - goto add_pack; + /* + * If we already have the pack locally, no need to fetch its index or + * even add it to list; we already have all of its objects. + */ + for (p = get_all_packs(the_repository); p; p = p->next) { + if (hasheq(p->hash, sha1, the_repository->hash_algo)) + return 0; } tmp_idx = fetch_pack_index(sha1, base_url); @@ -2450,7 +2452,6 @@ static int fetch_and_setup_pack_index(struct packed_git **packs_head, if (ret) return -1; -add_pack: new_pack->next = *packs_head; *packs_head = new_pack; return 0; diff --git a/packfile.c b/packfile.c index 16d3bcf7f70340..0ead2290d403f3 100644 --- a/packfile.c +++ b/packfile.c @@ -2160,14 +2160,6 @@ int has_object_kept_pack(const struct object_id *oid, unsigned flags) return find_kept_pack_entry(the_repository, oid, flags, &e); } -int has_pack_index(const unsigned char *sha1) -{ - struct stat st; - if (stat(sha1_pack_index_name(sha1), &st)) - return 0; - return 1; -} - int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data, enum for_each_object_flags flags) diff --git a/packfile.h b/packfile.h index 0f7865822916ff..b4df3546a3e20f 100644 --- a/packfile.h +++ b/packfile.h @@ -193,8 +193,6 @@ int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsi int has_object_pack(const struct object_id *oid); int has_object_kept_pack(const struct object_id *oid, unsigned flags); -int has_pack_index(const unsigned char *sha1); - /* * Return 1 if an object in a promisor packfile is or refers to the given * object, 0 otherwise. 
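Before moving on, the replacement check boils down to a list walk like the following (a simplified standalone model; "struct pack" and have_pack_locally() are stand-ins for git's packed_git and the new loop in fetch_and_setup_pack_index()):

    #include <stdbool.h>
    #include <string.h>

    #define RAWSZ 20 /* SHA-1 size, just for the sketch */

    /* stand-in for git's packed_git; only what the check needs */
    struct pack {
            struct pack *next;
            unsigned char hash[RAWSZ];
    };

    /*
     * If a pack with this hash is already in our list, we have all of
     * its objects and there is nothing to fetch; no need to stat any
     * pack-$hash.idx file on disk.
     */
    static bool have_pack_locally(const struct pack *packs,
                                  const unsigned char *hash)
    {
            for (const struct pack *p = packs; p; p = p->next)
                    if (!memcmp(p->hash, hash, RAWSZ))
                            return true;
            return false;
    }

    int main(void)
    {
            struct pack a = { 0 }, b = { 0 };

            a.next = &b;
            memset(b.hash, 0xab, RAWSZ);
            return have_pack_locally(&a, b.hash) ? 0 : 1;
    }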
From c2dc4c9fbb1f119be6ab55ff8676bf18b4b9446a Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:00:55 -0400 Subject: [PATCH 05/11] packfile: drop sha1_pack_name() The sha1_pack_name() function has a few ugly bits: - it writes into a static strbuf (and not even a ring buffer of them), which can lead to subtle invalidation problems - it uses the term "sha1", but it's really using the_hash_algo, which could be sha256 There's only one caller of it left. And in fact that caller is better off using the underlying odb_pack_name() function itself, since it's just copying the result into its own strbuf anyway. Converting that caller lets us get rid of this now-obselete function. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- http.c | 3 ++- packfile.c | 6 ------ packfile.h | 7 ------- 3 files changed, 2 insertions(+), 14 deletions(-) diff --git a/http.c b/http.c index 03802b90495cf9..510332ab04ffb2 100644 --- a/http.c +++ b/http.c @@ -2579,7 +2579,8 @@ struct http_pack_request *new_direct_http_pack_request( preq->url = url; - strbuf_addf(&preq->tmpfile, "%s.temp", sha1_pack_name(packed_git_hash)); + odb_pack_name(&preq->tmpfile, packed_git_hash, "pack"); + strbuf_addstr(&preq->tmpfile, ".temp"); preq->packfile = fopen(preq->tmpfile.buf, "a"); if (!preq->packfile) { error("Unable to open local file %s for pack", diff --git a/packfile.c b/packfile.c index 0ead2290d403f3..48d650161f27e6 100644 --- a/packfile.c +++ b/packfile.c @@ -35,12 +35,6 @@ char *odb_pack_name(struct strbuf *buf, return buf->buf; } -char *sha1_pack_name(const unsigned char *sha1) -{ - static struct strbuf buf = STRBUF_INIT; - return odb_pack_name(&buf, sha1, "pack"); -} - char *sha1_pack_index_name(const unsigned char *sha1) { static struct strbuf buf = STRBUF_INIT; diff --git a/packfile.h b/packfile.h index b4df3546a3e20f..2bbcc585719f2e 100644 --- a/packfile.h +++ b/packfile.h @@ -31,13 +31,6 @@ struct pack_entry { */ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *ext); -/* - * Return the name of the (local) packfile with the specified sha1 in - * its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -char *sha1_pack_name(const unsigned char *sha1); - /* * Return the name of the (local) pack index file with the specified * sha1 in its name. The return value is a pointer to memory that is From 4390fea963b284a3cc904ffc6c48d4ab02ea5e68 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:01:25 -0400 Subject: [PATCH 06/11] packfile: drop sha1_pack_index_name() Like sha1_pack_name() that we dropped in the previous commit, this function uses an error-prone static strbuf and the somewhat misleading name "sha1". The only caller left is in pack-redundant.c. While this command is marked for potential removal in our BreakingChanges document, we still have it for now. But it's simple enough to convert it to use its own strbuf with the underlying odb_pack_name() function, letting us drop the otherwise obsolete function. Note that odb_pack_name() does its own strbuf_reset(), so it's safe to use directly within a loop like this. 
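The loop-reuse pattern that makes this safe looks roughly like this (a standalone sketch; the fixed-size "struct buf" stands in for git's strbuf, and writing from the start of the buffer on each call plays the role of the strbuf_reset() inside odb_pack_name()):

    #include <stdio.h>
    #include <string.h>

    /* minimal stand-in for git's strbuf, just enough for the pattern */
    struct buf {
            char data[64];
    };

    /*
     * snprintf() always writes from offset 0, so each call overwrites
     * the previous contents; one caller-owned buffer serves every
     * iteration of the loop.
     */
    static const char *idx_name(struct buf *b, const char *hex)
    {
            snprintf(b->data, sizeof(b->data), "objects/pack/pack-%s.idx", hex);
            return b->data;
    }

    int main(void)
    {
            const char *hashes[] = { "1111", "2222", "3333" };
            struct buf b = { "" }; /* one buffer, reused across iterations */

            for (size_t i = 0; i < sizeof(hashes) / sizeof(*hashes); i++)
                    puts(idx_name(&b, hashes[i]));
            return 0;
    }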
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-redundant.c | 5 ++++- packfile.c | 6 ------ packfile.h | 7 ------- 3 files changed, 4 insertions(+), 14 deletions(-) diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c index 580961300283a4..d2c1c4e5ec1471 100644 --- a/builtin/pack-redundant.c +++ b/builtin/pack-redundant.c @@ -13,6 +13,7 @@ #include "packfile.h" #include "object-store-ll.h" +#include "strbuf.h" #define BLKSIZE 512 @@ -591,6 +592,7 @@ static void load_all(void) int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, struct repository *repo UNUSED) { int i; int i_still_use_this = 0; struct pack_list *min = NULL, *red, *pl; struct llist *ignore; + struct strbuf idx_name = STRBUF_INIT; char buf[GIT_MAX_HEXSZ + 2]; /* hex hash + \n + \0 */ if (argc == 2 && !strcmp(argv[1], "-h")) @@ -688,7 +690,7 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, s pl = red = pack_list_difference(local_packs, min); while (pl) { printf("%s\n%s\n", - sha1_pack_index_name(pl->pack->hash), + odb_pack_name(&idx_name, pl->pack->hash, "idx"), pl->pack->pack_name); pl = pl->next; } @@ -699,5 +701,6 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix UNUSED, s pack_list_free(red); pack_list_free(min); llist_free(ignore); + strbuf_release(&idx_name); return 0; } diff --git a/packfile.c b/packfile.c index 48d650161f27e6..0d4fd2737acd2c 100644 --- a/packfile.c +++ b/packfile.c @@ -35,12 +35,6 @@ char *odb_pack_name(struct strbuf *buf, return buf->buf; } -char *sha1_pack_index_name(const unsigned char *sha1) -{ - static struct strbuf buf = STRBUF_INIT; - return odb_pack_name(&buf, sha1, "idx"); -} - static unsigned int pack_used_ctr; static unsigned int pack_mmap_calls; static unsigned int peak_pack_open_windows; diff --git a/packfile.h b/packfile.h index 2bbcc585719f2e..6812abaae0bce7 100644 --- a/packfile.h +++ b/packfile.h @@ -31,13 +31,6 @@ struct pack_entry { */ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *ext); -/* - * Return the name of the (local) pack index file with the specified - * sha1 in its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -char *sha1_pack_index_name(const unsigned char *sha1); - /* * Return the basename of the packfile, omitting any containing directory * (e.g., "pack-1234abcd[...].pack"). From 6b2fc22050227e102af692af28c98925c18c6cc1 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:02:22 -0400 Subject: [PATCH 07/11] packfile: warn people away from parse_packed_git() With a name like parse_packed_git(), you might think it's the right way to access a local pack index and its associated objects. But not so! It's a one-off used by the dumb-http code to access pack idx files we've downloaded from the remote, but whose packs we might not have. There's only one caller left for this function, and ideally we'd drop it completely and just inline it there. But that would require exposing other internals from packfile.[ch], like alloc_packed_git() and check_packed_git_idx(). So let's leave it be for now, and just warn people that it's probably not what they're looking for. Perhaps in the long run if we eventually drop dumb-http support, we can remove the function entirely then. 
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- packfile.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/packfile.h b/packfile.h index 6812abaae0bce7..ad393be9f1aad9 100644 --- a/packfile.h +++ b/packfile.h @@ -37,6 +37,15 @@ char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *e */ const char *pack_basename(struct packed_git *p); +/* + * Parse the pack idx file found at idx_path and create a packed_git struct + * which can be used with find_pack_entry_one(). + * + * You probably don't want to use this function! It skips most of the normal + * sanity checks (including whether we even have the matching .pack file), + * and does not add the resulting packed_git struct to the internal list of + * packs. You probably want add_packed_git() instead. + */ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len, @@ -193,7 +202,7 @@ int is_promisor_object(const struct object_id *oid); * * This function should not be used directly. It is exposed here only so that we * have a convenient entry-point for fuzz testing. For real uses, you should - * probably use open_pack_index() or parse_pack_index() instead. + * probably use open_pack_index() instead. */ int load_idx(const char *path, const unsigned int hashsz, void *idx_map, size_t idx_size, struct packed_git *p); From 0af861e0c8d7b7c6135bfc688be13370ac2c9468 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:03:42 -0400 Subject: [PATCH 08/11] http-walker: use object_id instead of bare hash We long ago switched most code to using object_id structs instead of bare "unsigned char *" hashes. This gives us more type safety from the compiler, and generally makes it easier to understand what we expect in each parameter. But the dumb-http code has lagged behind. And indeed, the whole "walker" subsystem interface has the same problem, though http-walker is the only user left. So let's update the walker interface to pass object_id structs (which we already have anyway at all call sites!), and likewise use those within the http-walker methods that it calls. This cleans up the dumb-http code a bit, but will also let us fix a few more commonly used helper functions. 
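The type-safety gain is easy to demonstrate in isolation (simplified types; git's real object_id also records the hash algorithm): a bare pointer accepts any byte buffer of any length, while a struct pointer turns the same mistake into a compiler diagnostic.

    #include <string.h>

    #define RAWSZ 20

    struct object_id {
            unsigned char hash[RAWSZ];
    };

    /* old style: any unsigned char pointer will do, right or wrong */
    static void lookup_raw(const unsigned char *hash) { (void)hash; }

    /* new style: callers must actually have an object_id */
    static void lookup_oid(const struct object_id *oid) { (void)oid; }

    int main(void)
    {
            struct object_id oid = { { 0 } };
            unsigned char not_a_hash[4] = "abc";

            lookup_raw(not_a_hash); /* compiles, even though this is not a hash */
            lookup_oid(&oid);       /* fine */
            /*
             * lookup_oid(not_a_hash) would draw an incompatible-pointer
             * diagnostic from the compiler.
             */
            return 0;
    }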
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- http-walker.c | 25 +++++++++++++------------ walker.c | 4 ++-- walker.h | 4 ++-- 3 files changed, 17 insertions(+), 16 deletions(-) diff --git a/http-walker.c b/http-walker.c index fb2d86d5e70f0e..36dd1f33c0fe81 100644 --- a/http-walker.c +++ b/http-walker.c @@ -147,14 +147,14 @@ static int fill_active_slot(void *data UNUSED) return 0; } -static void prefetch(struct walker *walker, unsigned char *sha1) +static void prefetch(struct walker *walker, const struct object_id *oid) { struct object_request *newreq; struct walker_data *data = walker->data; newreq = xmalloc(sizeof(*newreq)); newreq->walker = walker; - oidread(&newreq->oid, sha1, the_repository->hash_algo); + oidcpy(&newreq->oid, oid); newreq->repo = data->alt; newreq->state = WAITING; newreq->req = NULL; @@ -422,7 +422,8 @@ static int fetch_indices(struct walker *walker, struct alt_base *repo) return ret; } -static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigned char *sha1) +static int http_fetch_pack(struct walker *walker, struct alt_base *repo, + const struct object_id *oid) { struct packed_git *target; int ret; @@ -431,7 +432,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne if (fetch_indices(walker, repo)) return -1; - target = find_sha1_pack(sha1, repo->packs); + target = find_sha1_pack(oid->hash, repo->packs); if (!target) return -1; close_pack_index(target); @@ -440,7 +441,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, unsigne fprintf(stderr, "Getting pack %s\n", hash_to_hex(target->hash)); fprintf(stderr, " which contains %s\n", - hash_to_hex(sha1)); + oid_to_hex(oid)); } preq = new_http_pack_request(target->hash, repo->base); @@ -477,9 +478,9 @@ static void abort_object_request(struct object_request *obj_req) release_object_request(obj_req); } -static int fetch_object(struct walker *walker, unsigned char *hash) +static int fetch_object(struct walker *walker, const struct object_id *oid) { - char *hex = hash_to_hex(hash); + char *hex = oid_to_hex(oid); int ret = 0; struct object_request *obj_req = NULL; struct http_object_request *req; @@ -487,7 +488,7 @@ static int fetch_object(struct walker *walker, unsigned char *hash) list_for_each(pos, head) { obj_req = list_entry(pos, struct object_request, node); - if (hasheq(obj_req->oid.hash, hash, the_repository->hash_algo)) + if (oideq(&obj_req->oid, oid)) break; } if (!obj_req) @@ -548,20 +549,20 @@ static int fetch_object(struct walker *walker, unsigned char *hash) return ret; } -static int fetch(struct walker *walker, unsigned char *hash) +static int fetch(struct walker *walker, const struct object_id *oid) { struct walker_data *data = walker->data; struct alt_base *altbase = data->alt; - if (!fetch_object(walker, hash)) + if (!fetch_object(walker, oid)) return 0; while (altbase) { - if (!http_fetch_pack(walker, altbase, hash)) + if (!http_fetch_pack(walker, altbase, oid)) return 0; fetch_alternates(walker, data->alt->base); altbase = altbase->next; } - return error("Unable to find %s under %s", hash_to_hex(hash), + return error("Unable to find %s under %s", oid_to_hex(oid), data->alt->base); } diff --git a/walker.c b/walker.c index 807a7a388131fe..5ea7e5b392b2bd 100644 --- a/walker.c +++ b/walker.c @@ -157,7 +157,7 @@ static int process(struct walker *walker, struct object *obj) else { if (obj->flags & COMPLETE) return 0; - walker->prefetch(walker, obj->oid.hash); + walker->prefetch(walker, &obj->oid); } object_list_insert(obj, 
process_queue_end); @@ -186,7 +186,7 @@ static int loop(struct walker *walker) * the queue because we needed to fetch it first. */ if (! (obj->flags & TO_SCAN)) { - if (walker->fetch(walker, obj->oid.hash)) { + if (walker->fetch(walker, &obj->oid)) { stop_progress(&progress); report_missing(obj); return -1; diff --git a/walker.h b/walker.h index d40b016bab8798..25aaa3631c4258 100644 --- a/walker.h +++ b/walker.h @@ -6,8 +6,8 @@ struct walker { void *data; int (*fetch_ref)(struct walker *, struct ref *ref); - void (*prefetch)(struct walker *, unsigned char *sha1); - int (*fetch)(struct walker *, unsigned char *sha1); + void (*prefetch)(struct walker *, const struct object_id *oid); + int (*fetch)(struct walker *, const struct object_id *oid); void (*cleanup)(struct walker *); int get_verbosely; int get_progress; From 4d995591476f0e0b11c512e4d711541118ea2b79 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:05:03 -0400 Subject: [PATCH 09/11] packfile: convert find_sha1_pack() to use object_id The find_sha1_pack() function has a few problems: - it's badly named, since it works with any object hash - it takes the hash as a bare pointer rather than an object_id struct We can fix both of these easily, as all callers actually have a real object_id anyway. I also found the existence of this function somewhat confusing, as it is about looking in an arbitrary set of linked packed_git structs. It's good for things like dumb-http which are looking in downloaded remote packs, and not our local packs. But despite the name, it is not a good way to find the pack which contains a local object (it skips the use of the midx, the pack mru list, and so on). So let's also add an explanatory comment above the declaration that may point people in the right direction. I suspect the calls in fast-import.c, which use the packed_git list from the repository struct, could actually just be using find_pack_entry(). But since we'd need to keep it anyway for dumb-http, I didn't dig further there. If we eventually drop dumb-http support, then it might be worth examining them to see if we can get rid of the function entirely. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/fast-import.c | 6 ++---- http-push.c | 4 ++-- http-walker.c | 2 +- packfile.c | 6 +++--- packfile.h | 9 +++++++-- 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 1e7ab67f6e5f13..76d5c20f141f42 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -966,8 +966,7 @@ static int store_object( if (e->idx.offset) { duplicate_count_by_type[type]++; return 1; - } else if (find_sha1_pack(oid.hash, - get_all_packs(the_repository))) { + } else if (find_oid_pack(&oid, get_all_packs(the_repository))) { e->type = type; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! */ @@ -1167,8 +1166,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark) duplicate_count_by_type[OBJ_BLOB]++; truncate_pack(&checkpoint); - } else if (find_sha1_pack(oid.hash, - get_all_packs(the_repository))) { + } else if (find_oid_pack(&oid, get_all_packs(the_repository))) { e->type = OBJ_BLOB; e->pack_id = MAX_PACK_ID; e->idx.offset = 1; /* just not zero! 
*/ diff --git a/http-push.c b/http-push.c index aad89f2eab3b52..4d24e6b8d4e092 100644 --- a/http-push.c +++ b/http-push.c @@ -309,7 +309,7 @@ static void start_fetch_packed(struct transfer_request *request) struct transfer_request *check_request = request_queue_head; struct http_pack_request *preq; - target = find_sha1_pack(request->obj->oid.hash, repo->packs); + target = find_oid_pack(&request->obj->oid, repo->packs); if (!target) { fprintf(stderr, "Unable to fetch %s, will not be able to update server info refs\n", oid_to_hex(&request->obj->oid)); repo->can_update_info_refs = 0; @@ -681,7 +681,7 @@ static int add_send_request(struct object *obj, struct remote_lock *lock) get_remote_object_list(obj->oid.hash[0]); if (obj->flags & (REMOTE | PUSHING)) return 0; - target = find_sha1_pack(obj->oid.hash, repo->packs); + target = find_oid_pack(&obj->oid, repo->packs); if (target) { obj->flags |= REMOTE; return 0; diff --git a/http-walker.c b/http-walker.c index 36dd1f33c0fe81..43cde0ebe51c3f 100644 --- a/http-walker.c +++ b/http-walker.c @@ -432,7 +432,7 @@ static int http_fetch_pack(struct walker *walker, struct alt_base *repo, if (fetch_indices(walker, repo)) return -1; - target = find_sha1_pack(oid->hash, repo->packs); + target = find_oid_pack(oid, repo->packs); if (!target) return -1; close_pack_index(target); diff --git a/packfile.c b/packfile.c index 0d4fd2737acd2c..c51eab15a57dc9 100644 --- a/packfile.c +++ b/packfile.c @@ -2010,13 +2010,13 @@ int is_pack_valid(struct packed_git *p) return !open_packed_git(p); } -struct packed_git *find_sha1_pack(const unsigned char *sha1, - struct packed_git *packs) +struct packed_git *find_oid_pack(const struct object_id *oid, + struct packed_git *packs) { struct packed_git *p; for (p = packs; p; p = p->next) { - if (find_pack_entry_one(sha1, p)) + if (find_pack_entry_one(oid->hash, p)) return p; } return NULL; diff --git a/packfile.h b/packfile.h index ad393be9f1aad9..3baffa940cf2ab 100644 --- a/packfile.h +++ b/packfile.h @@ -79,8 +79,13 @@ struct packed_git *get_all_packs(struct repository *r); */ unsigned long repo_approximate_object_count(struct repository *r); -struct packed_git *find_sha1_pack(const unsigned char *sha1, - struct packed_git *packs); +/* + * Find the pack within the "packs" list whose index contains the object "oid". + * For general object lookups, you probably don't want this; use + * find_pack_entry() instead. + */ +struct packed_git *find_oid_pack(const struct object_id *oid, + struct packed_git *packs); void pack_report(void); From 479ab76c9ffbd35585a1506ac5c99fe218df70b9 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:06:06 -0400 Subject: [PATCH 10/11] packfile: use object_id in find_pack_entry_one() The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. 
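In sketch form, the resulting shape is: raw bytes are converted into a struct once, at the boundary where they enter, and everything below that deals only in object_id (simplified standalone types; the helper names are made up for illustration):

    #include <string.h>

    #define RAWSZ 20

    struct object_id {
            unsigned char hash[RAWSZ];
    };

    /* downstream lookup takes the struct, as find_pack_entry_one() now does */
    static long find_entry(const struct object_id *oid)
    {
            (void)oid;
            return 0; /* offset lookup elided in this sketch */
    }

    /*
     * The one caller that starts from raw bytes (the REF_DELTA base read
     * straight out of the mmap'd pack) converts at the boundary, instead
     * of every caller copying into a temporary struct inside the lookup.
     */
    static long lookup_ref_delta_base(const unsigned char *raw_base)
    {
            struct object_id oid;

            memcpy(oid.hash, raw_base, RAWSZ); /* the copy the patch hoists here */
            return find_entry(&oid);
    }

    int main(void)
    {
            unsigned char raw[RAWSZ] = { 0 };

            return (int)lookup_ref_delta_base(raw);
    }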
Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 4 ++-- connected.c | 4 ++-- midx.c | 2 +- pack-bitmap.c | 4 ++-- packfile.c | 16 ++++++++-------- packfile.h | 4 ++-- t/helper/test-find-pack.c | 2 +- 7 files changed, 18 insertions(+), 18 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0fc0680b40252a..080071426710da 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1556,7 +1556,7 @@ static int want_object_in_pack_one(struct packed_git *p, if (p == *found_pack) offset = *found_offset; else - offset = find_pack_entry_one(oid->hash, p); + offset = find_pack_entry_one(oid, p); if (offset) { if (!*found_pack) { @@ -3984,7 +3984,7 @@ static int has_sha1_pack_kept_or_nonlocal(const struct object_id *oid) while (p) { if ((!p->pack_local || p->pack_keep || p->pack_keep_in_core) && - find_pack_entry_one(oid->hash, p)) { + find_pack_entry_one(oid, p)) { last_found = p; return 1; } diff --git a/connected.c b/connected.c index 87cc4b57a17ce5..a9e2e139956f0c 100644 --- a/connected.c +++ b/connected.c @@ -78,7 +78,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, for (p = get_all_packs(the_repository); p; p = p->next) { if (!p->pack_promisor) continue; - if (find_pack_entry_one(oid->hash, p)) + if (find_pack_entry_one(oid, p)) goto promisor_pack_found; } /* @@ -144,7 +144,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data, * are sure the ref is good and not sending it to * rev-list for verification. */ - if (new_pack && find_pack_entry_one(oid->hash, new_pack)) + if (new_pack && find_pack_entry_one(oid, new_pack)) continue; if (fprintf(rev_list_in, "%s\n", oid_to_hex(oid)) < 0) diff --git a/midx.c b/midx.c index 479812cb9bfddc..e82d4f2e65496b 100644 --- a/midx.c +++ b/midx.c @@ -987,7 +987,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag } m_offset = e.offset; - p_offset = find_pack_entry_one(oid.hash, e.p); + p_offset = find_pack_entry_one(&oid, e.p); if (m_offset != p_offset) midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64), diff --git a/pack-bitmap.c b/pack-bitmap.c index 32b222a7afce26..4fa9dfc771a305 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -935,7 +935,7 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git, const struct object_id *oid) { uint32_t pos; - off_t offset = find_pack_entry_one(oid->hash, bitmap_git->pack); + off_t offset = find_pack_entry_one(oid, bitmap_git->pack); if (!offset) return -1; @@ -1609,7 +1609,7 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git, if (bsearch_midx(&object->oid, bitmap_git->midx, NULL)) return 1; } else { - if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0) + if (find_pack_entry_one(&object->oid, bitmap_git->pack) > 0) return 1; } } diff --git a/packfile.c b/packfile.c index c51eab15a57dc9..005ca670b4b495 100644 --- a/packfile.c +++ b/packfile.c @@ -1239,7 +1239,9 @@ off_t get_delta_base(struct packed_git *p, *curpos += used; } else if (type == OBJ_REF_DELTA) { /* The base entry _must_ be in the same pack */ - base_offset = find_pack_entry_one(base_info, p); + struct object_id oid; + hashcpy(oid.hash, base_info, the_repository->hash_algo); + base_offset = find_pack_entry_one(&oid, p); *curpos += the_hash_algo->rawsz; } else die("I am totally screwed"); @@ -1971,11 +1973,10 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) } } -off_t find_pack_entry_one(const unsigned char *sha1, - struct packed_git *p) +off_t 
find_pack_entry_one(const struct object_id *oid, + struct packed_git *p) { const unsigned char *index = p->index_data; - struct object_id oid; uint32_t result; if (!index) { @@ -1983,8 +1984,7 @@ off_t find_pack_entry_one(const unsigned char *sha1, return 0; } - hashcpy(oid.hash, sha1, the_repository->hash_algo); - if (bsearch_pack(&oid, p, &result)) + if (bsearch_pack(oid, p, &result)) return nth_packed_object_offset(p, result); return 0; } @@ -2016,7 +2016,7 @@ struct packed_git *find_oid_pack(const struct object_id *oid, struct packed_git *p; for (p = packs; p; p = p->next) { - if (find_pack_entry_one(oid->hash, p)) + if (find_pack_entry_one(oid, p)) return p; } return NULL; @@ -2033,7 +2033,7 @@ static int fill_pack_entry(const struct object_id *oid, oidset_contains(&p->bad_objects, oid)) return 0; - offset = find_pack_entry_one(oid->hash, p); + offset = find_pack_entry_one(oid, p); if (!offset) return 0; diff --git a/packfile.h b/packfile.h index 3baffa940cf2ab..08f88a7ff521e9 100644 --- a/packfile.h +++ b/packfile.h @@ -154,10 +154,10 @@ int nth_packed_object_id(struct object_id *, struct packed_git *, uint32_t n); off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); /* - * If the object named sha1 is present in the specified packfile, + * If the object named by oid is present in the specified packfile, * return its offset within the packfile; otherwise, return 0. */ -off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *); +off_t find_pack_entry_one(const struct object_id *oid, struct packed_git *); int is_pack_valid(struct packed_git *); void *unpack_entry(struct repository *r, struct packed_git *, off_t, enum object_type *, unsigned long *); diff --git a/t/helper/test-find-pack.c b/t/helper/test-find-pack.c index 14b2b0c12c0da3..85a69a4e557694 100644 --- a/t/helper/test-find-pack.c +++ b/t/helper/test-find-pack.c @@ -40,7 +40,7 @@ int cmd__find_pack(int argc, const char **argv) die("cannot parse %s as an object name", argv[0]); for (p = get_all_packs(the_repository); p; p = p->next) - if (find_pack_entry_one(oid.hash, p)) { + if (find_pack_entry_one(&oid, p)) { printf("%s\n", p->pack_name); actual_count++; } From 863f2459a2047406c93758e8c3352cd5d2836f1e Mon Sep 17 00:00:00 2001 From: Jeff King Date: Fri, 25 Oct 2024 03:08:10 -0400 Subject: [PATCH 11/11] packfile: use oidread() instead of hashcpy() to fill object_id When chasing a REF_DELTA, we need to pull the raw hash bytes out of the mmap'd packfile into an object_id struct. We do that with a raw hashcpy() of the appropriate length (that happens directly now, though before the previous commit it happened inside find_pack_entry_one(), also using a hashcpy). But I think this creates a potentially dangerous situation due to d4d364b2c7 (hash: convert `oidcmp()` and `oideq()` to compare whole hash, 2024-06-14). When using sha1, we'll have uninitialized bytes in the latter part of the object_id.hash buffer, which could fool oideq(), etc. We should use oidread() instead, which correctly zero-pads the extra bytes, as of c98d762ed9 (global: ensure that object IDs are always padded, 2024-06-14). As far as I can see, this has not been a problem in practice because the object_id we feed to find_pack_entry_one() is never used with oideq(), etc. It is being compared to the bytes mmap'd from a pack idx file, which of course do not have the extra padding bytes themselves. So there's no bug here, but this just puzzled me while looking at the code. 
We should do the more obviously safe thing, both for future-proofing and to avoid confusing readers. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packfile.c b/packfile.c index 005ca670b4b495..9560f0a33c6f26 100644 --- a/packfile.c +++ b/packfile.c @@ -1240,7 +1240,7 @@ off_t get_delta_base(struct packed_git *p, } else if (type == OBJ_REF_DELTA) { /* The base entry _must_ be in the same pack */ struct object_id oid; - hashcpy(oid.hash, base_info, the_repository->hash_algo); + oidread(&oid, base_info, the_repository->hash_algo); base_offset = find_pack_entry_one(&oid, p); *curpos += the_hash_algo->rawsz; } else
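The padding concern can be shown in a standalone example (simplified stand-ins for hashcpy(), oidread(), and oideq(); in git the buffer is GIT_MAX_RAWSZ = 32 bytes and oideq() compares all of it, per the commits cited above):

    #include <stdio.h>
    #include <string.h>

    #define MAX_RAWSZ 32   /* large enough for SHA-256 */
    #define SHA1_RAWSZ 20

    struct object_id {
            unsigned char hash[MAX_RAWSZ];
    };

    /* hashcpy()-like: copies only the algorithm's bytes, leaves the rest alone */
    static void copy_raw(struct object_id *oid, const unsigned char *raw)
    {
            memcpy(oid->hash, raw, SHA1_RAWSZ);
    }

    /* oidread()-like: zero-fills the unused tail of the buffer */
    static void read_oid(struct object_id *oid, const unsigned char *raw)
    {
            memcpy(oid->hash, raw, SHA1_RAWSZ);
            memset(oid->hash + SHA1_RAWSZ, 0, MAX_RAWSZ - SHA1_RAWSZ);
    }

    /* oideq()-like: compares the whole buffer, padding included */
    static int whole_eq(const struct object_id *a, const struct object_id *b)
    {
            return !memcmp(a->hash, b->hash, MAX_RAWSZ);
    }

    int main(void)
    {
            unsigned char raw[SHA1_RAWSZ] = { 0xab };
            struct object_id a, b;

            memset(a.hash, 0x7f, MAX_RAWSZ); /* pretend stack garbage */
            memset(b.hash, 0x00, MAX_RAWSZ);

            copy_raw(&a, raw);
            read_oid(&b, raw);
            /* same 20 hash bytes, but stale padding makes them "differ" */
            printf("equal with copy_raw: %d\n", whole_eq(&a, &b)); /* 0 */

            read_oid(&a, raw);
            printf("equal with read_oid: %d\n", whole_eq(&a, &b)); /* 1 */
            return 0;
    }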