-
-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 32-bit test failures and test workspace on a 32-bit target on CI #1687
Conversation
This reverts commit c8c25c1.
Instead of i686-unknown-linux-gnu.
`test-32bit` fails in the job for the `i686-unknown-linux-musl` target since changing `-p gix-hashtable` to `-p gix` in the test step because various data structure size assertions fail, which is because some data structures are smaller on 32-bit archictectures than on 64-bit architectures. That is unrelated to the commented problem of environment not being passed into the test environment, which does not happen on `i686-unknown-linux-musl`. But maybe it would happen on `armv7-linux-androideabi`, due to its use of cross. The i686 job, which doesn't require emulation, is faster, and when it fails due to those assertions, it keeps the armv7 job from getting far enough to reveal if it has its own environment-related (or other) failures. This temporarily sets `fail-fast: false` so that the armv7 job can get far enough that we find out.
This completes the TODO about investigating what was thought to be an environment error, possibly because it references `LD_PRELOAD`. We can't run `cross test -p gix --target armv7-linux-androideabi` because the armv7 `bash` that runs the test fixture scripts (whose archives are `.gitignore`d) finds the `git` command on the runner, which is an amd64 build, and tries to run it as though it is a dynamically linked armv7 executable. This relates to the use of `cross`; no analogous problem occurs with the i686 builds, which are able to run the amd64 `git` avaialble on the system. There are various ways to try to fix this, but for now we do not have any such fix and this is putting it back to testing with `-p gix-hashtable` on armv7 (while still using `-p gix` on i686), and also replacing the old TODO comment with comments on the newly introduced matrix variable about why `gix` is tested on one target and `gix-hashtable` is tested on another.
The test failures that appeared with `i686-unknown-linux-musl` are assertions about the size of data structures, where the actual sizes on 32-bit targets are smaller. But these are not the only such failing assertions in the test suite: local testing with `i686-unknown-linux-gnu` on a 32-bit x86 Debian 12 system, and with `i686-pc-windows-msvc` on a 32-bit x86 Windows 10 system, reveals that there are significantly more such failures. About 20 tests usually fail on these platforms, with most failures being of data structure size assertions. To catch such failures when they arise as regressions, it would be useful to have CI coverage of most of the test suite on some 32-bit target. Since no emulation is needed to run i686 binaries on amd64 CI runners, let try `i686-unknown-linux-musl` for this. This doesn't include running tests with `GIX_TEST_IGNORE_ARCHIVES`, because a test that specifically attempts to exercise fixture scripts should run in an environment where the tools being called, including `git`, and `bash` itself, are 32-bit builds. (That could probably be achieved with a `container` job.)
The previous approach did not work because musl dependencies were not installed. (32-bit `-gnu` dependencies would also not be installed, in that non-container environment.) Since this is switching to a container, it may as well target the OS of the container. The previous reasoning for not (also?) having this 32-bit test use `GIX_TEST_IGNORE_ARCHIVES=1` no longer applies, but that is not done at this time.
One possible problem is that this is actually the downstream Debian build of Node 18, which may have security patches, but which does not have any features or breaking changes new in Node 20, and also which will not report itself as being Node 20. (GitHub Actions switched from Node 16 to Node 20, so I don't think using older actions versions would necessarily help.)
This is a workaround for a problem that is the same or similar to actions/checkout#334: Run actions/checkout@v4 /usr/bin/docker exec a7f94cb30b99bd1ee851270fad4768421eddecd7e8fa674f56c4f2d2d64210b6 sh -c "cat /etc/*release | grep ^ID" exec /__e/node20/bin/node: no such file or directory I had not originally noticed the `cat /etc/*release` part of that output, and had surmised that this was a subtly different problem, but it looks like it's probably exactly the same. The workaround in actions/checkout#334 (comment) is applied in this commit and seems to be working. As noted there, the absence of some 64-bit libraries in 32-bit containers prevents the (existing and already mapped) 64-bit Node.js outside the container from being run, and installing 64-bit Node.js in the container via `apt-get` (even though, in this case, it is a different version) installs the needed dependencies, letting the (existing) Node.js run. (Other problems, related to `safe.directory` protections as well as tests that are known to fail due to assertions about the size of data structures that implicitly assume 64-bit targets, do still prevent the job from succeeding. But `actions/checkout@v4`, which was failing before, seems to be working, as do the other actions.)
Installing `nodejs:amd64` in a 32-bit container to work around actions/checkout#334 works by causing 64-bit libraries the external mapped `node20` needs (even though `nodejs:amd64` in Debian is Node 18), as described in actions/checkout#334 (comment). But this installed more packages than are needed. It turns out that, at least with this image, the only libraries the container is missing are the 32-bit libstdc++ and its dependencies. It is therefore sufficient to install the `libstdc++6:amd64` package. This commit makes that change, squashing a few investigatory steps that led to it (#5). * See if libssl3:amd64 is all we need * Temporarily omit even libssl3:amd64 With only `libssl3:amd64` and its dependencies, `actions/checkout` did fail, but with a different error than before, showing a clear error about a missing library, `libstdc++`. /usr/bin/docker exec c1e55a75713e2c4fd08241fae9f4fecbe3cbb179be45174b3138390a1238090e sh -c "cat /etc/*release | grep ^ID" /__e/node20/bin/node: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory Before trying adding that, I want to check that enabling `amd64` in `dpkg` has the expected *no effect* on the error messages, compared to not enabling it. * Install libstdc++6:amd64 in the container This does not add back libssl3:amd64 yet, in case it is not needed, though I expect that it will be needed.
This uses the same array+comment style as in 8c71bd1 (GitoxideLabs#1672).
And add a fixme about how the test suite could be improved to not require this.
Taking ownership of cloned files in the container only fixed one failing test, `gix-url::baseline run`. This had been failing with a git `safe.directory` error in the the gix-url `make_baseline.sh` fixture script. That failure was also reproduced locally by recursively `chown`ing the cloned files to another user (while preserving write permissions for the first user via the group). Because the tests shouldn't unnecessarily depend on starting out in a repository (nor anything about the outer repository they start in), this should probably be considered a test bug that is not specific to the current container setup on CI that triggered it. This commit undoes the `chown -R` step in the CI workflow that had previously worked around this, and instead fixes it by having the gix-url `make_baseline.sh` fixture script run its `git fetch-pack` commands in a temporary repo created in a subdirectory of the fixture directory (and deleted afterwards).
This creates `/etc/gitconfig` in the container and populates it with a single dummy configuration variable, so that the `system` scope will exist and tests of "GitInstallation"-related functionality will be able to pass. This is intended to let the following tests pass in the container: - gix-config-tests::config source::git_config_no_system - gix-path env::git::tests::exe_info::tolerates_broken_temp - gix-path env::git::tests::exe_info::tolerates_git_config_env_var - gix-path env::git::tests::exe_info::tolerates_oversanitized_env - gix-path env::git::tests::exe_info::with_unmodified_environment - gix-path::path env::installation_config Those tests have been failing in the container on CI, and in local container environments that are also Debian 12 and created through a similar procedure. But they are not among the tests that fail in a non-container 32-bit Debian 12 system, probably because that test system has user config in the `global` scope. When that is the highest scope, it is currently treated as "GitInstallation", even though it is not truly associated with the Git installation. The change in this commit is a workaround *specific* to one CI job: - In at least the case of `git_config_no_system`, it seems like the test should automatically handle the case of there being no scope detected as "GitInstallation". For the others, it may be reasonable for them to fail in a CI environment that does not support robustly testing the scenarios they pertain to, but there should probably be a way to suppress or weaken them on systems that have no scopes that are intended to be detected as "GitInstallation", while still testing "GitInstallation" functionality as robustly as it can be tested under those conditions. This commit does not fix that. - On macOS with Apple Git, the `unknown` scope higher than `system` is meant to be detected as "GitInstallation" in the usual case that it is nonempty. On other systems, `system` is the highest scope that is expected to exist. On some systems (besides macOS), the `system` scope is often empty, in which case the `global` scope, if nonempty, will be detected as "GitInstallation". But allowing the `global` scope, which in practice is rarely if ever associated with the installation itself (it is "global" to the user, and still separate from the `system` scope in a per-user Git installation), to be detected as a "GitInstallation" has never been guaranteed, and may be changed. Ever since the CVE-2024-45305 fix in GitoxideLabs#1523 and follow-up in GitoxideLabs#1567, `local` and `worktree` scopes are never detected as "GitInstallation", so it looks like there would be no harm, and some benefit, to excluding the `global` scope as well. This commit does not make that change. - If/when the change of excluding `global` from consideration for "GitInstallation" is made, tests that require a variable to exist in an installation-associated scope will start to fail regularly on some developers' machines, similar to how they have been failing in the container. This commit does not address that; it only adds a configuration variable in the `system` scope in the container.
This compares using `==` on 64-bit targets and `<=` on 32-bit targets. As noted in the documentation comment, when assertions about data stuructures' sizes are being done to safeguard against them growing too big, then it may be acceptable to use `<=` if the structure is smaller on 32-bit targets, but it is still valuable to be able to use `==` on 64-bit targets in the same assertions, since this guards against a data structure becoming smaller, other changes causing the smaller size to be important for memory usage or speed, but then the data structure growing again, up to its original size. An unconditional `<=` will not catch this, while `size_ok` usually will. A related reason to do a `==` on 64-bit systems is so that the expected value being compared to remains tied to the code. It can otherwise become unclear what the expected value's significance is and whether it ought to be updated.
Some of the assertions comparing the sizes of data structures to expected values have been using `==` (unconditionally). Of those, most but not all are mainly to safeguard against data structures growing larger. For those, this loosens the assertions on 32-bit targets to use `<=`, while still having them use `==` on 64-bit targets. The new `gix_testtools::size_ok` function is used to help with this. See the `size_ok` documentation comment for a rationale as to why it is done this way (and why this approach is not taken with assertions that seem intended to do more than keep the size from growing too large or without being noticed). This is to allow the following 18 tests to pass in the container (and hopefully on 32-bit systems in general): - gix-attributes::attributes search::size_of_outcome - gix-index extension::tree::tests::size_of_tree - gix-index size_of_entry - gix-index-tests::integrate index::size_of_entry - gix-negotiate::negotiate size_of_entry - gix-pack cache::delta::tests::size_of_pack_tree_item - gix-pack cache::delta::tests::size_of_pack_verify_data_structure - gix-pack cache::delta::tree::tests::size_of_pack_tree_item - gix-pack cache::delta::tree::tests::size_of_pack_verify_data_structure - gix-pack data::file::decode::entry::tests::size_of_decode_entry_outcome - gix-pack-tests::pack pack::data::output::size_of_count - gix-pack-tests::pack pack::data::output::size_of_entry - gix-pack-tests::pack pack::iter::size_of_entry - gix-ref raw::tests::size_of_reference - gix-revwalk::revwalk graph::commit::size_of_commit - gix::gix object::object_ref_size_in_memory - gix::gix object::oid_size_in_memory - gix::gix status::index_worktree::iter::item_size This splits a couple of test cases into two, where the added code would otherwise make them less readable. The duplicated tests described in GitoxideLabs#1685 are among those modified here, so this exacerbates that duplication, in that there is more duplicated code, but the duplicated test cases are still easy to resolve, once it is clear which modules they should be in. This adds "FIXME" comments to identify the duplication.
The previous commit added `gix-testtools` (by relative path) as a dev dependency of `gix-index`, to use `gix_testtools::size_ok`. Because `gix-testtools` itself depends on `gix-index` -- at an earlier version to not break releasing with csr (see discussion in GitoxideLabs#1510 for general info) -- this causes `cargo`, when running in the top level workspace directory, to consider `-p gix-index` without an explicit version to be ambiguous. This made the full CI `test` job fail when the `check` recipe attempts to run `cargo check` on `gix-index`, with the message error: There are multiple `gix-index` packages in your project, and the specification `gix-index` is ambiguous. Please re-run this command with one of the following specifications: gix-index@0.33.1 gix-index@0.36.0 error: Recipe `check` failed on line 87 with exit code 101 where the line number is from the `justfile`. To fix this, this changes the command to change to the `gix-index` directory instead of passing `-p gix-index`. (This technique is used elsewhere in the same recipe already.)
Most `assert_eq!` data structure size assertions that had failed on 32-bit targets were to safeguard against the size increasing, so using a `==` comparison on 64-bit targets and a `<=` comparison to the same value on 32-bit targets was a suitable fix, which allowed 18 tests to go from failing to passing in the 32-bit container. However, four remaining tests that assert sizes of buffers or streamed data are not just for flagging increases, but are instead cases where a change in either direction, on any platform, should probably draw attention. The most important of these to keep `==`, even on 32-bit platforms, may be two tests that assert the same value for complementary operations: - gix::gix repository::worktree::archive - gix::gix repository::worktree::stream But I think it's useful to do this in the two others, too: - gix-archive::archive from_tree::basic_usage_internal - gix-worktree-stream::stream from_tree::will_provide_all_information_and_respect_export_ignore This commit adds constants with different 64-bit and 32-bit values for the three buffer lengths asserted in those four tests (the two `gix::gix repository::worktree::*` assert the same length). This change should make the four remaining tests that had been failing on the 32-bit container, and 32-bit GNU/Linux targets in general, pass on them. In specifying the expected 32-bit buffer lengths, I have treated these as approval tests, rather than inferring that the exact length and no other length must be correct. The asserted lengths: - Appeared as the value when the tests were run in the 32-bit container. - Are verified to appear repeatedly across different runs in the container as well as in runs on a non-containerized 32-bit Debian 12 system. - Are equal to the values in assertion messages for the same failed tests run on a 32-bit Windows 10 system. This is significant because some values differ between different 32-bit targets, even targets for the same 32-bit architecture, and two such targets where I've observed differences are `i686-unknown-linux-gnu` and `i686-pc-windows-msvc`. - Are, of course, less than the 64-bit sizes, not greater. - Make sense on inspection. (But please note that this is not the same as an analysis that proves they are the only possible correct values.)
This removes one of each pair of equivalent tests duplicated within `gix-index` and (separately) within `gix-pack`. See GitoxideLabs#1685.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for improving 32 bit support!
Various other assertions of data structure size already do
<=
comparisons. I think most of those should be changed to the approach in (1)--and perhaps some even to (2)--but that change is not necessary to get tests to run on CI or anywhere, so I haven't included it in this PR.
Thank you - not only does that save time, but it's likely these started out with equality comparisons, but ended up a bit more flexible due to frequent failures on various platforms.
[about reliance on the parent
gitoxide
repository] More often, I think this coupling is entirely unintended.
I think so, too!
git fetch-pack
commands that were run to produce baseline output--in a fixture script that run even withoutGIX_TEST_IGNORE_ARCHIVES
--were actually operations on thegitoxide
repository itself.
That's probably also a great example for how easily one could accidentally create a huge tar
file, after fetching the parent repository without noticing it happened. Even though in this case it appears git fetch-pack
was called merely to invoke --diag-url
, which is probably mostly unrelated.
Because of the way ownership works with
actions/checkout
in jobs that use acontainer
key, unless customized orchown
ed, the owner and process UIDs differ. This could be worked around withchown
(47f3819), but after verifying that works I instead fixed the bug in the fixture script by introducing a temporary nesting repository forfetch-pack
to operate on (d07c32d).
Thank you.
Something I found generally interesting was that 32bit data structures were only about 20% smaller, not at all close to 50% like one would intuitively expect.
Because of the way sizes can shrink and later grow again, as described in 77c3c59, I think it's probably worthwhile to bring back equality comparisons for them on 64-bit systems, while still using (Whether However, that only applies if the size is expected to be the same on different 64-bit targets.
Although I can imagine that happening, the more likely kind of problem, with the kinds of coupling I've observed here, would be fetching something into the gitoxide repository. (That's what the code changed here had, notionally, done; but, as you mention, nothing was actually ever fetched.) As far as I know, the outer gitoxide repository is never added as a remote of a test repository.
Although it may be unintuitive, it seems that, at least in the data structures in gitoxide that have tests with size assertions, the "lower level" a data structure is, the less it is likely to shrink going from 64-bit to 32-bit targets:
But there may also be correlation between a data structure holding values of fixed size and having its size considered important enough to be asserted in a test. |
Thanks for elaborating! Now that I read it it makes perfect sense and I am feeling very naive for having that intuition, without even realising (or caring to think about) the most obvious possible reason for this 🤦♂️. |
I should mention--so as to avoid confusion for anyone looking at this later--that in most cases I have not checked the reason for specific data structure size differences. Sometimes the difference is small or zero because the data structure consists mostly or entirely of scalar types whose sizes are not architecture dependent. And it may be that this account for the entire effect. But another possible contributing factor is packing: some data structures may have less efficient layouts on 32-bit architectures, with holes in some places, to satisfy alignment requirements, to satisfy an ABI convention, or because the Rust compiler decides to lay them out that way. I found this to be why some data structures are different sizes on different 32-bit targets, though I didn't spend time examining all their layouts. |
Currently, on 32-bit systems, 19 to 22 tests fail because they assert that data structures have exactly the size they are known to have on 64-bit systems, which the
test-32bit
job does not catch because it only runs tests ingix-hashtable
.At one time,
test-32bit
had included an i686 job, but this was removed because of problems related to glibc versioning in the image used bycross
. This brings that back, splits it out to own job--naming the new jobtest-32bit
and renaming thecross
-using armv7 jobtest-32bit-cross
--and solves that problem for it by using a 32-bit Debian container for the job with thecontainer
key rather than usingcross
.This reveals a few bugs, which this PR includes fixes or partial workarounds for.
Data structure size assertions
As noted above, 22 tests of data structure size have been failing on 32-bit GNU/Linux systems. (Only 19 of the 22 have been failing on 32-bit Windows.) Of these:
<=
comparisons on 32-bit targets.Various other assertions of data structure size already do
<=
comparisons. I think most of those should be changed to the approach in (1)--and perhaps some even to (2)--but that change is not necessary to get tests to run on CI or anywhere, so I haven't included it in this PR.Even as used in this PR, it is not immediately obvious that the approach in (1) makes sense, because it is in principle possible for different 64-bit targets to have differently sized data structures. This does happen on 32-bit targets--sometimes even for the same architecture--which is why three fewer of these tests fail on 32-bit Windows systems than 32-bit GNU/Linux. However, the sizes have all been the same on four separate 64-bit architectures tested:
So at least for now I think it's reasonable to continue viewing our data structures as generally not varying in size across 64-bit targets.
Coupling to gitoxide's own repo
The gitoxide test suite currently depends, I believe mostly unintentionally, on the
gitoxide
repository itself existing and having various properties. Various fixture scripts and tests fail if there is no outer repository; or if the outer repository is blocked by configured ceiling directories; or if the tests are run from a linked worktree, rather than its main worktree; or ifgit
operations cannot be performed on the outer repository due tosafe.directory
errors.As far as I know, the only coupling to the
gitoxide
repository that is firmly intended in the design of the test suite is the use of.gitignore
information to determine which fixtures are allowed to generate archives. A few uses of the outer repository seem intentional, but not essential to the design, and they can be fixed using a fixture with nested repositories, decoupling completely fromgitoxide
's own repository. More often, I think this coupling is entirely unintended.I haven't documented this in full detail yet, and maybe it should have an issue, but see this gist and this repo. (The latter is a full repo rather than a gist to facilitate visual diff via the web editor, and because it benefits from a non-flat directory structure.)
Fortunately, for this 32-bit Debian container on CI, only one case of this had to be addressed, largely because
GIX_TEST_IGNORE_ARCHIVES
is not being used.git fetch-pack
commands that were run to produce baseline output--in a fixture script that run even withoutGIX_TEST_IGNORE_ARCHIVES
--were actually operations on thegitoxide
repository itself.Because of the way ownership works with
actions/checkout
in jobs that use acontainer
key, unless customized orchown
ed, the owner and process UIDs differ. This could be worked around withchown
(47f3819), but after verifying that works I instead fixed the bug in the fixture script by introducing a temporary nesting repository forfetch-pack
to operate on (d07c32d).Tests of "GitInstallation" when it is empty
Independent of the above, six tests failed in the container due to weaknesses of tests I introduced in #1567: when there is no identifiable "GitInstallation" scope, some of the tests of "GitInstallation" scope detection fail. This was an intentional decision at the time, but less sound than I had thought.
My thinking in #1567 had been that it will be natural to fix this at the same time as we exclude the
global
scope from ever being detected as "GitIntallation" (or if for some reason that is not to be done then the reason it shouldn't be done would inform how the tests should be improved). However, I had not anticipated that there were already prominent cases where these tests regularly fail even though there is no bug in the code under test. One such case is running the tests in a mostly minimal container that hasgit
.See 178cf25 for details. That commit works around the problem by defining a variable in the
system
scope in the container. It also describes the current situation in detail and describes the significance of likely future developments that would improve or worsen the situation.This bug is not related to the container being 32-bit, just to it not containing
system
orglobal
configuration (and not being macOS with Apple Git, thus not having a highunknown
scope) as is typical of containers.Duplicated tests
Some data structure size tests were duplicated within their package, as described in #1685. Code duplication would be increased by the changes here, since the tests use more code (see the note at the bottom of fc13fc3). The deduplication resolving #1685 is therefore also included in this PR (dc0a73a).