Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] master from git:master #18

Open
wants to merge 4,009 commits into
base: master
Choose a base branch
from
Open

[pull] master from git:master #18

wants to merge 4,009 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Jan 13, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jan 13, 2024
pks-t added 29 commits December 6, 2024 20:20
Similar to the preceding commit, we get a warning in `get_packet_data()`
on 32 bit platforms due to our lenient use of `ssize_t`. This function
is kind of curious though: we accept an `unsigned size` of bytes to
read, then store the actual number of bytes read in an `ssize_t` and
return it as an `int`. This is a whole lot of integer conversions, and
in theory these can cause us to overflow when the passed-in size is
larger than `ssize_t`, which on 32 bit platforms is implemented as an
`int`.

None of the callers of that function even care about the number of bytes
we have read, so returning that number is moot anyway. Refactor the
function such that it only returns an error code, which plugs the
potential overflow. While at it, convert the passed-in size parameter to
be of type `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnigs because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.

This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several loops in "daemon.c" that use a signed integer to loop
through a `size_t`. Adapt them to instead use a `size_t` as counter
value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `max_connections` type tracks how many children git-daemon(1) would
spawn at the same time. This value can be controlled via a command line
switch: if given a positive value we'll set that up as the limit. But
when given either zero or a negative value we don't enforce any limit at
all.

But even when being passed a negative value we won't actually store it,
but normalize it to 0. Still, the variable used to store the config is
using a signed integer, which causes warnings when comparing the number
of accepted connections (`max_connections`) with the number of current
connections being handled (`live_children`).

Adapt the type of `max_connections` such that the types of both
variables match.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of -Wsign-comparison warnings in "gpg-interface.c".
Most of them are trivial and simply using signed integers to loop
towards an upper unsigned bound.

But in `parse_signed_buffer()` we have one case where the different
signedness of the two values of a ternary expression results in a
warning. Given that:

  - `size` will always be bigger than `len` due to the loop condition.

  - `eol` will always be after `buf + len` because it is found via
    memchr(3p) starting from `buf + len`.

We know that both values will always be natural integers.

Squelch the warning by casting the left-hand side to `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `length` variable is used to store how many bytes we wish to emit
from an object ID. This value will either be the full hash algorithm's
length, or the abbreviated hash that can be set via `--abbrev` or the
"core.abbrev" option. The former is of type `size_t`, whereas the latter
is of type `int`, which causes a warning with "-Wsign-compare".

The reason why `abbrev` is using a signed type is mostly that it is
initialized with `-1` to indicate that we have to compute the minimum
abbreviation length. This length is computed via `find_alignment()`,
which always gets called before `emit_other()`, and thus we can assume
that the value would never be negative in `emit_other()`.

In fact, we can even assume that the value will always be at least
`MINIMUM_ABBREV`, which is enforced by both `git_default_core_config()`
and `parse_opt_abbrev_cb()`. We implicitly rely on this by subtracting
up to 3 without checking for whether the value becomes negative. We then
pass the value to printf(3p) to print the prefix of our object's ID, so
if that assumption was violated we may end up with undefined behaviour.

Squelch the warning by asserting this invariant and casting the value of
`abbrev` to `size_t`. This allows us to store the whole length as an
unsigned integer, which we can then pass to `fwrite()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `get_one_patchid()` we assign either the result of `strlen()` or
`remove_space()` to `len`. But while the former correctly returns a
`size_t`, the latter returns an `int` to indicate the length of the
stripped string even though it cannot ever return a negative value. This
causes a warning with "-Wsign-conversion".

In fact, even `get_one_patchid()` itself is also using an integer as
return value even though it always returns the length of the patch, and
this bubbles up to other callers.

Adapt the function and its helpers to use `size_t` for string lengths
consistently.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two -Wsign-compare warnings in "scalar.c", both of which are
trivial:

  - We mistakenly use a signed integer to loop towards an upper unsigned
    bound in `cmd_reconfigure()`.

  - We subtract `path_sep - enlistment->buf`, which results in a signed
    integer, and use the value in a ternary expression where second
    value is unsigned. But as `path_sep` is being assigned the result of
    `find_last_dir_sep(enlistment->buf + offset)` we know that it must
    always be bigger than or equal to `enlistment->buf`, and thus the
    result will be positive.

Address both of these warnings.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our test helpers we have two cases where we assign -1 to an `unsigned
long`. The intent is to essentially mean "unbounded output", which is
achieved via implicit wraparound of the value.

This pattern causes warnings with -Wsign-compare though. Adapt it and
instead use `ULONG_MAX` explicitly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "GIT-BUILD-OPTIONS" file is generated by our build systems to
propagate built-in features and paths to our tests. The generation is
done ad-hoc, where both our Makefile and the CMake build instructions
simply echo a bunch of strings into the file. This makes it very hard to
figure out what variables are expected to exist and what format they
have, and the written variables can easily get out of sync between build
systems.

Introduce a new "GIT-BUILD-OPTIONS.in" template to address this issue.
This has multiple advantages:

  - It demonstrates which built options exist in the first place.

  - It can serve as a spot to document the build options.

  - Some build systems complain when not all variables could be
    substituted, alerting us of mismatches. Others don't, but if we
    forgot to substitute such variables we now have a bogus string that
    will likely cause our tests to fail, if they have any meaning in the
    first place.

Backfill values that we didn't yet set in our CMake build instructions.
While at it, remove the `SUPPORTS_SIMPLE_IPC` variable that we only set
up in CMake as it isn't used anywhere.

This change requires us to adapt the setup of TEST_OUTPUT_DIRECTORY in
"test-lib.sh" such that it does not get overwritten after sourcing when
it has been set up via the environment. This is the only instance I
could find where we rely on ordering on variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of placeholders in our scripts that we replace at build
time, for example by using sed(1). These placeholders come in three
different formats: @Placeholder@, @@Placeholder@@ and ++PLACEHOLDER++.

Next to being inconsistent it also creates a bit of a problem with
CMake, which only supports the first syntax in its `configure_file()`
function. To work around that we instead manually replace placeholders
via string operations, which is a hassle and removes safeguards that
CMake has to verify that we didn't forget to replace any placeholders.
Besides that, other build systems like Meson also support the CMake
syntax.

Unify our codebase to consistently use the syntax supported by such
build systems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "GIT-VERSION-GEN" script always writes the "GIT-VERSION-FILE" into
the current directory, where the expectation is that it should exist in
the source directory. But other build systems that support out-of-tree
builds may not want to do that to keep the source directory pristine,
even though CMake currently doesn't care.

Refactor the script such that it won't write the "GIT-VERSION-FILE"
directly anymore, but instead knows to replace @placeholders@ in an
arbitrary input file. This allows us to simplify the logic in CMake to
determine the project version, but can also be reused later on in order
to generate other files that need to contain version information like
our "git.rc" file.

While at it, change the format of the version file by removing the
spaces around the equals sign. Like this we can continue to include the
file in our Makefiles, but can also start to source it in shell scripts
in subsequent steps.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set up a couple of preprocessor macros when compiling Git that
propagate the version that Git was built from to `git version` et al.
The way this is set up makes it harder than necessary to reuse the
infrastructure across the different build systems.

Refactor this such that we generate a "version-def.h" header via
`GIT-VERSION-GEN` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When injecting the Perl path into our scripts we sometimes use '@Perl@'
while we othertimes use '@PERL_PATH@'. Refactor the code use the latter
consistently, which makes it easier to reuse the same logic for multiple
scripts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git.rc" is used on Windows to embed information like the project
name and version into the resulting executables. As such we need to
inject the version information, which we do by using preprocessor
defines. The logic to do so is non-trivial and needs to be kept in sync
with the different build systems.

Refactor the logic so that we generate "git.rc" via `GIT-VERSION-GEN`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the script to inject various build-time parameters into our Perl
scripts into a standalone script. This is done such that we can reuse it
in other build systems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation we generate embeds information for the exact Git
version used as well as the date of the commit. This information is
injected by injecting attributes into the build process via command line
argument.

Refactor the logic so that we write the information into "asciidoc.conf"
and "asciidoctor-extensions.rb" via `GIT-VERSION-GEN` for AsciiDoc and
AsciiDoctor, respectively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend "generate-perl.sh" such that it knows to also massage the Perl
library files. There are two major differences:

  - We do not read in the Perl header. This is handled by matching on
    whether or not we have a Perl shebang.

  - We substitute some more variables, which we read in via our
    GIT-BUILD-OPTIONS.

Adapt both our Makefile and the CMake build instructions to use this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as in the preceding commits, extract a script that allows us to
unify how we massage shell scripts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract a script that massages Python scripts. This provides a couple of
benefits:

  - The build logic is deduplicated across Make, CMake and Meson.

  - CMake learns to rewrite scripts as-needed at build time instead of
    only writing them at configure time.

Furthermore, we will use this script when introducing Meson to
deduplicate the logic across build systems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to generate "gitweb.cgi" we have to replace various different
placeholders. This is done ad-hoc and is thus not easily reusable across
different build systems.

Introduce a new GITWEB-BUILD-OPTIONS.in template that we populate at
configuration time with the expected options. This script is then used
as input for a new "generate-gitweb.sh" script that generates the final
"gitweb.cgi" file. While this requires us to repeat the options multiple
times, it is in line to how we generate other build options like our
GIT-BUILD-OPTIONS file.

While at it, refactor how we replace the GITWEB_PROJECT_MAXDEPTH. Even
though this variable is supposed to be an integer, the source file has
the value quoted. The quotes are eventually stripped via sed(1), which
replaces `"@GITWEB_PROJECT_MAXDEPTH@"` with the actual value, which is
rather nonsensical. This is made clearer by just dropping the quotes in
the source file.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, also extract the script to generate the
"gitweb.js" file. While the logic itself is trivial, it helps us avoid
duplication of logic across build systems and ensures that the build
systems will remain in sync with each other in case the logic ever needs
to change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple scripts that generate headers from other data. All of
these scripts have the assumption built-in that they are executed in the
current source directory, which makes them a bit unwieldy to use during
out-of-tree builds.

Refactor them to instead take the source directory as well as the output
file as arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "bin-wrappers/" directory gets created by our build system and is
populated with one script for each of our binaries. There isn't anything
inherently wrong with the current layout, but it is somewhat hard to
adapt for out-of-tree build systems.

Adapt the layout such that our "bin-wrappers/" directory always exists
and contains our "wrap-for-bin.sh" script to make things a little bit
easier for subsequent steps.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write the absolute program path into our bin-wrappers. This allows us to
simplify the Meson build instructions we are about to introduce a bit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we install Git we also install a set of default templates that both
git-init(1) and git-clone(1) populate into our build directories. The
way the pristine templates are laid out in our source directory is
somewhat weird though: instead of reconstructing the actual directory
hierarchy in "templates/", we represent directory separators with "--".

The only reason I could come up with for why we have this is the
"branches/" directory, which is supposed to be empty when installing it.
And as Git famously doesn't store empty directories at all we have to
work around this limitation.

Now the thing is that the "branches/" directory is a leftover to how
branches used to be stored in the dark ages. gitrepository-layout(5)
lists this directory as "slightly deprecated", which I would claim is a
strong understatement. I have never encountered anybody using it today
and would be surprised if it even works as expected. So having the "--"
hack in place for an item that is basically unused, unmaintained and
deprecated doesn't only feel unreasonable, but installing that entry by
default may also cause confusion for users that do not know what this is
supposed to be in the first place.

Remove this directory from our templates and, now that we do not require
the workaround anymore, restructure the templates to form a proper
hierarchy. This makes it way easier for build systems to install these
templates into place.

We should likely think about removing support for "branch/" altogether,
but that is outside of the scope of this patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation uses "include::" directives to include parts that are
either reused across multiple documents or parts that we generate at
build time. Unfortunately, top-level includes are only ever resolved
relative to the base directory, which is typically the directory of the
including document. Most importantly, it is not possible to have either
asciidoc or asciidoctor search multiple directories.

It follows that both kinds of includes must live in the same directory.
This is of course a bummer for out-of-tree builds, because here the
dynamically-built includes live in the build directory whereas the
static includes live in the source directory.

Introduce a `build_dir` attribute and prepend it to all of our includes
for dynamically-built files. This attribute gets set to the build
directory and thus converts the include path to an absolute path, which
asciidoc and asciidoctor know how to resolve.

Note that this change also requires us to update "build-docdep.perl",
which tries to figure out included files such our Makefile can set up
proper build-time dependencies. This script simply scans through the
source files for any lines that match "^include::" and treats the
remainder of the line as included file path. But given that those may
now contain the "{build_dir}" variable we have to teach the script to
replace that attribute with the actual build directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "cmd-list.perl" script generates a list of commands that can be
included into our manpages. The script doesn't know about out-of-tree
builds and instead writes resulting files into the source directory.

Adapt it such that we can read data from the source directory and write
data into the build directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include the list of available mergetools into our manpages. Extract
the script that performs this logic such that we can reuse it in other
build systems.

While at it, refactor the Makefile targets such that we don't create
"mergetools-list.made" anymore. It shouldn't be necessary, as we can
instead have other targets depend on "mergetools-{diff,merge}.txt"
directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
bagasme and others added 30 commits January 6, 2025 15:55
Update following components:

  * advice.c
  * archive.c
  * builtin/checkout.c
  * builtin/clone.c
  * builtin/config.c
  * builtin/describe.c
  * builtin/fetch.c
  * builtin/gc.c
  * builtin/index-pack.c
  * builtin/notes.c
  * builtin/pack-objects.c
  * builtin/remote.c
  * builtin/worktree.c
  * commit.c
  * fetch-pack.c
  * hook.c
  * object-name.c
  * refs.c
  * refs/files-backend.c
  * remote.c
  * worktree.c

Translate following new components:

  * cache-tree.c
  * daemon.c
  * merge-ll.c

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
9e2b700 (fetch set_head: add warn-if-not-$branch option, 2024-12-05)
tried to expand the advice message for set_head with the new option, but
unfortunately did not manage to add the right incantation. Fix the
advice message with the correct usage of warn-if-not-$branch.

Reported-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Docfix.

* mh/doc-windows-home-env:
  git.txt: fix heading line of tildes
Docfix.

* jc/doc-opt-tilde-expand:
  gitcli.txt: typeset pathnames as monospace
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A hotfix on an advice messagge added during this cycle.

* bf/fetch-set-head-config:
  fetch: fix erroneous set_head advice message
* 'l10n/zh-TW/2024-12-17' of github.com:l10n-tw/git-po:
  l10n: zh_TW: Git 2.48 round 2
  l10n: zh_TW: Git 2.48
* 'master' of github.com:nafmo/git-l10n-sv:
  l10n: sv.po, fixed swedish typos
  l10n: sv.po: Update Swedish translation
* 'tr-l10n' of github.com:bitigchi/git-po:
  l10n: tr: Update Turkish translations for 2.48
* 'po-id' of github.com:bagasme/git-po:
  l10n: po-id for 2.48
* 'master' of github.com:alshopov/git-po:
  l10n: bg.po: Updated Bulgarian translation (5804t)
* 'fr_v2.48.0' of github.com:jnavila/git:
  l10n: fr: v2.48.0
  l10n: fr.po: Minor improvements
* 'tl/zh_CN_2.48.0_rnd' of github.com:dyrone/git:
  l10n: zh_CN: updated translation for 2.48
* 'l10n-de-2.48' of github.com:ralfth/git:
  l10n: Update German translation
* 'vi-2.48' of github.com:Nekosha/git-po:
  l10n: vi: Updated translation for 2.48
* '2.48-uk-update' of github.com:arkid15r/git-ukrainian-l10n:
  l10n: uk: v2.48 update
l10n-2.48.0-rnd1

* tag 'l10n-2.48.0-rnd1' of https://github.com/git-l10n/git-po:
  l10n: po-id for 2.48
  l10n: zh_CN: updated translation for 2.48
  l10n: uk: v2.48 update
  l10n: sv.po, fixed swedish typos
  l10n: vi: Updated translation for 2.48
  l10n: Update German translation
  l10n: tr: Update Turkish translations for 2.48
  l10n: sv.po: Update Swedish translation
  l10n: fr: v2.48.0
  l10n: zh_TW: Git 2.48 round 2
  l10n: zh_TW: Git 2.48
  l10n: bg.po: Updated Bulgarian translation (5804t)
  l10n: fr.po: Minor improvements
Correct verb tense, add missing words, avoid double blank lines,
and rephrase things that don’t read well to me like “Turn this linkage
to relative paths”.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`generate_perl_command` needs `depends: [git_version_file]` and the uses
in top-level meson.build were fine, but the ones in perl/ weren't, causing
parallel build failures in some cases as GIT-BUILD-OPTIONS wasn't yet
available.

Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As indicated by the `#undef malloc` line in `reftable/basics.h`, it is
quite common to use allocators other than the default one by defining
`malloc` constants and friends.

This pattern is used e.g. in Git for Windows, which uses the powerful
and performant `mimalloc` allocator.

Furthermore, in `reftable/basics.c` this `#undef malloc` is
_specifically_ disabled by virtue of defining the
`REFTABLE_ALLOW_BANNED_ALLOCATORS` constant before including
`reftable/basic.h`, to ensure that such a custom allocator is also used
in the reftable code.

However, in 8db127d (reftable: avoid leaks on realloc error,
2024-12-28) and in 2cca185 (reftable: fix allocation count on
realloc error, 2024-12-28), `reftable_set_alloc()` function calls were
introduced that pass `malloc`, `realloc` and `free` function pointers as
parameters _after_ `reftable/basics.h` ensured that they were no longer
`#define`d. This would override the custom allocator and re-set it to
the default allocator provided by, say, libc or MSVCRT.

This causes problems because those calls happen after the initial
allocator has already been used to initialize an array, which is
subsequently resized using the overridden default `realloc()` allocator.

You cannot mix and match allocators like that, which leads to a
`STATUS_HEAP_CORRUPTION` (C0000374) on Windows, and when running this
unit test through shell and/or `prove` (which only support 7-bit status
codes), it surfaces as exit code 127.

It is actually unnecessary to use those function pointers to
`malloc`/`realloc`/`free`, though: The `reftable` code goes out of its
way to fall back to the initial allocator when passing `NULL` parameters
instead. So let's do that instead of causing heap corruptions.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure in "meson" for the "perl/" hierarchy lacked
necessary dependencies, which has been corrected.

* sj/meson-perl-build-fix:
  meson: fix perl dependencies
Last-minute fix to a recent update.

* js/reftable-realloc-errors-fix:
  t-reftable-basics: allow for `malloc` to be `#define`d
In 6411a0a (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.

It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, whereas printf(3p) does.

Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing the `-b` flag to git-blame(1), then any blamed boundary
commits which were marked as uninteresting will not get their actual
commit ID printed, but will instead be replaced by a couple of spaces.

The flag can lead to an out-of-bounds write as though when combined with
`--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ`
as we simply use memset(3p) on that array with the user-provided length
directly. The result is most likely that we segfault.

An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes.
But when the underlying object ID is SHA1, and if the abbreviated length
exceeds the SHA1 length, it would cause us to print more bytes than
desired, and the result would be misaligned.

Instead, fix the bug by computing the length via strlen(3p). This makes
us write as many bytes as the formatted object ID requires and thus
effectively limits the length of what we may end up printing to the
length of its hash. If `--abbrev=` asks us to abbreviate to something
shorter than the full length of the underlying hash function it would be
handled by the call to printf(3p) correctly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Why would one want to run it in parallel?" I hear you ask. I am glad
you are curious, because a curious story is what it is, indeed.

The `GIT-VERSION-GEN` script is quite a pillar of Git's source code,
with most lines being unchanged for the past 15 years. Until the v2.48.0
release candidate cycle.

Its original purpose was to generate the version string and store it in
the `GIT-VERSION-FILE`.

This paradigm changed quite dramatically when support for building with
Meson was introduced. Most crucially, a38edab (Makefile: generate
doc versions via GIT-VERSION-GEN, 2024-12-06) changed the way the
documentation is built by using the `GIT-VERSION-GEN` file to write out
the `asciidocor-extensions.rb` and `asciidoc.conf` files with now
hard-coded version strings.

Crucially, the Makefile rule to generate those files needs to be run in
every build because `GIT_VERSION` could have been specified in the
`make` command-line, which would require these files to be modified.

This introduced a surprising race condition!

And this is how that race surfaces: When calling `make -j2 html man`
from the top-level directory (a variant of which is invoked in Git for
Windows' release process), two sub-processes are spawned, a `make -C
Documentation html` one and a `make -C Documentation man` one. Both run
the rule to (re-)generate `asciidoctor-extensions.rb` or
`asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so. That script first
generates a temporary file (appending the `+` character to the
filename), then looks whether it contains something different than the
already existing file (if it exists, that is), and either replaces it if
needed, or removes the temporary file. If one of the two parallel
invocations removes that temporary file before the other can compare it,
or even worse: if one tries to replace the target file just after the
other _started_ writing the temporary file (but did not finish writing
it yet), that race condition now causes bad builds.

This may sound highly theoretical, but due to the design of Git's build
process, Git for Windows is forced to use a (slow) POSIX emulation layer
to run that script and in the blink of an eye it becomes very much not
theoretical at all. See Exhibit A: These GitHub workflow runs failed
because one of the two competing `make` processes tried to remove the
temporary file when the other process had already done so:

https://github.com/git-for-windows/git-sdk-32/actions/runs/12663456654
https://github.com/git-for-windows/git-sdk-32/actions/runs/12683174970
https://github.com/git-for-windows/git-sdk-64/actions/runs/12649348496

While it is undesirable to run this script over and over again,
certainly when this involves above-mentioned slow POSIX emulation layer,
the stage of the release cycle in which we are presently finding
ourselves does not lend itself to a re-design where this script could be
run once, and once only, but instead dictates that a quick and reliable
work-around be implemented that prevents the race condition without
changing the overall architecture of the build process.

This patch does that: By using a filename suffix for the temporary file
which is based on the currently-executing script's process ID, We
guarantee that the two competing invocations cannot overwrite or remove
each others' temporary files.

The filename suffix still ends in `+` to ensure that the temporary
artifacts are matched by the `*+` pattern in `.gitignore` that was added
in f9bbaa3 (Add intermediate build products to .gitignore,
2009-11-08).

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build regression fix.

* js/git-version-gen-update:
  GIT-VERSION-GEN: allow it to be run in parallel
Last-minute fix for a regression in "git blame --abbrev=<length>"
when insane <length> is specified; we used to correctly cap it to
the hash output length but broke it during the cycle.

* ps/build-sign-compare:
  builtin/blame: fix out-of-bounds write with blank boundary commits
  builtin/blame: fix out-of-bounds read with excessive `--abbrev`
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.