Update commons-compress to 1.26.1 #22213
Conversation
The CI failure looks related to my hack at https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/repository/CompressedTarFunction.java;l=206;drc=10169bbe66e818868ec37f7d853ab5a567cd2ced. Let me know if you want me to take a look.
@fmeum yes, that’d be helpful, thanks. What I know so far:
It seems plausible we could fix the second thing by contributing a new constructor to ArchiveInputStream, but otherwise out of ideas.
Turns out this was only an issue in Bazel's bootstrap script. Could you try adding this patch?
diff --git a/scripts/bootstrap/compile.sh b/scripts/bootstrap/compile.sh
index 3c49679e1a..dcc7e7f846 100755
--- a/scripts/bootstrap/compile.sh
+++ b/scripts/bootstrap/compile.sh
@@ -155,12 +155,17 @@ function create_deploy_jar() {
local output=$3
shift 3
local packages=""
- for i in $output/classes/*; do
+ # Only keep the services subdirectory of META-INF (needed for AutoService).
+ for i in $output/classes/META-INF/*; do
local package=$(basename $i)
- if [[ "$package" != "META-INF" ]]; then
- packages="$packages -C $output/classes $package"
+ if [[ "$package" != "services" ]]; then
+ rm -r "$i"
fi
done
+ for i in $output/classes/*; do
+ local package=$(basename $i)
+ packages="$packages -C $output/classes $package"
+ done
log "Creating $name.jar..."
echo "Main-Class: $mainClass" > $output/MANIFEST.MF |
@mark-thm The
@fmeum there’s a difftest for that file that fails without the change. Haven’t looked at why the update causes it to be required just yet.
via jdeps for commons-compress and its dependencies commons-io, commons-codec, and commons-lang3:
I don't think Bazel is using these packages. I see there's a denylist and could instead add these classes to the denylist, but unsure what the protocol is here.
Yeah, maybe try that first. We can be pretty confident that Bazel tests are at least loading all packages that Bazel depends on.
@fmeum it looks like deny-listing the classes that were pulling in java.beans/java.desktop has reined in the bloat from ~50MB to around 3MB, and weirdly the 3MB only seems to show up as an overage on macOS.
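For reference, a sketch of the kind of jdeps check that surfaces which classes drag in java.beans/java.desktop; only the commons-compress 1.26.1 and commons-io 2.16.1 jar names come from this thread, and the exact flags you would settle on are an assumption:
# Class-level dependency report, filtered down to the JDK packages in question.
jdeps -verbose:class commons-compress-1.26.1.jar commons-io-2.16.1.jar | grep -E 'java\.(beans|desktop|awt)'
# Per-jar summary of the JDK modules each artifact requires.
jdeps -summary commons-compress-1.26.1.jar commons-io-2.16.1.jar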
In case it's helpful:
Good point, the mailing list post referenced in the JIRA issue actually refers to the unrelated COMPRESS-651, not COMPRESS-654.
Before I started down this path, I repeated the failing test linked at the top of COMPRESS-654 and then updated commons-compress to 1.26.1, and found that with the new commons-compress I was able to successfully extract ruff. I figured I'd start with the simple bump before getting into test authoring; the bump seemed easier to get going with than the test. Live and learn. Re: adding a test:
Maybe this one isn't a good starter contrib?
Running git diff | cat
diff --git a/src/test/shell/bazel/bazel_workspaces_test.sh b/src/test/shell/bazel/bazel_workspaces_test.sh
index 8056929902..7f562bd0a0 100755
--- a/src/test/shell/bazel/bazel_workspaces_test.sh
+++ b/src/test/shell/bazel/bazel_workspaces_test.sh
@@ -504,6 +504,22 @@ function test_extract_default_zip_non_ascii_utf8_file_names() {
ensure_output_contains_exactly_once "external/repo/out_dir/Ä_foo_∅.txt" "bar"
}
+function test_sparse_tar() {
+ set_workspace_command "
+ repository_ctx.download_and_extract(
+ url='https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz',
+ sha256='0b626e88762b16908b3dbba8327341ddc13b37ebe6ec1a0db3f033ce5a44162d',
+ )"
+
+ build_and_process_log --exclude_rule "repository @@local_config_cc"
+
+ ensure_contains_exactly 'location: .*repos.bzl:3:25' 1
+ ensure_contains_atleast 'context: "repository @@repo"' 2
+ ensure_contains_exactly 'download_and_extract_event' 1
+
+ [[ -f $(bazel info output_base)/external/repo/ruff-aarch64-apple-darwin/ruff ]] || fail "Expected ruff binary to be extracted"
+}
+
function test_file() {
set_workspace_command 'repository_ctx.file("filefile.sh", "echo filefile", True)'
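In case it helps, a sketch of how the new case might be run on its own; the test target name is an assumption inferred from the file name, and --test_filter relies on the shell test framework honoring it:
# Run only the new sparse-tar case from the workspace events suite.
bazel test //src/test/shell/bazel:bazel_workspaces_test --test_filter=test_sparse_tar --test_output=streamed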
But this works, so it looks like we need to do more to pick up the fix:
#!/usr/bin/env bash
set -o errexit -o nounset
echo "Downloading commons-compress"
wget https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.26.1/commons-compress-1.26.1.jar
wget https://repo1.maven.org/maven2/commons-io/commons-io/2.16.1/commons-io-2.16.1.jar
echo "Downloading sample sparse archive"
wget https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz
gunzip ruff-aarch64-apple-darwin.tar.gz
echo "Testing with system tar"
tar -tf ruff-aarch64-apple-darwin.tar
echo "Testing with commons-compress"
java -cp commons-compress-1.26.1.jar:commons-io-2.16.1.jar org.apache.commons.compress.archivers.Lister ruff-aarch64-apple-darwin.tar
Yep, that's what I tried before getting started.
ensure_contains_atleast 'context: "repository @@repo"' 2
ensure_contains_exactly 'download_and_extract_event' 1

[[ -f "$(bazel info output_base)/external/repo/ruff" ]] || fail "Expected ruff binary to be extracted"
the ruff tgzs have no prefix folder.
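A quick way to confirm that locally, using the same release asset the test downloads:
# List the archive contents; per the comment above, entries sit at the top level with no prefix directory,
# which is why the extracted binary ends up at external/repo/ruff.
curl -sLO https://github.com/astral-sh/ruff/releases/download/v0.1.6/ruff-aarch64-apple-darwin.tar.gz
tar -tzf ruff-aarch64-apple-darwin.tar.gz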
nice! thanks for the fix.
just realized I had this comment on draft, which may or may not be correct/useful... but leaving it here for posterity:
I believe your comment is correct but, as the tests show, it's not required to switch to TarFile to pick up the improvement. I also looked through the other compression formats to see whether a similar switch from the compression's native InputStream impls to the commons-compress ones was required, and found that for Zstd there's no option to flip and for Xz we are already using the correct option. This PR fixes both the gz and bz2 formats, the latter since we were here anyway.
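To make the gz case concrete, here is a small shell sketch of the concatenated-member behavior this swap is about; the claim that GzipCompressorInputStream with decompressConcatenated=true handles what java.util.zip.GZIPInputStream can miss is taken from the javadoc linked in the PR description, not re-verified here:
# A .gz file may consist of several concatenated members; command-line gzip decompresses them all.
printf 'hello\n' | gzip >  concat.gz
printf 'world\n' | gzip >> concat.gz
gunzip -c concat.gz   # prints both "hello" and "world"
# The PR swaps GZIPInputStream for commons-compress' GzipCompressorInputStream(in, true)
# so that extraction in Bazel sees every member, matching the behavior above.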
@bazel-io fork 7.2.0 |
What’s the procedure for actually getting this merged? Also, what’s your preferred approach at this point for resolving module lockfile conflicts … should I just continue rebasing? |
No action required on your side. We'll import this into Google's codebase, and then Copybara will copy it back out. We'll fix the lockfile during the import. |
Fixes #20269. Update commons-compress to 1.26.1 and swap use of GZIPInputStream to commons-compress' GzipCompressorInputStream, which [deals correctly with concatenated gz files](https://github.com/apache/commons-compress/blob/53c5e19208caaf63946a41d2763cda1f1b7eadc8/src/main/java/org/apache/commons/compress/compressors/gzip/GzipCompressorInputStream.java#L38-L70). Add a test to demonstrate this fixes the ruff extraction (thanks, fmeum) and update all related lockfiles. Closes #22213. PiperOrigin-RevId: 631509796 Change-Id: I4038244bfbdfbace747554e988587663ca580c16