fix: improve parsing of Package.gz metadata, matching Debian parser logic #43

Merged

@alexconrey alexconrey commented Apr 29, 2024

Problem

On Ubuntu 24.04, and likely on other distributions, a package can be published to the Ubuntu repositories with non-standard metadata formatting.

Example (package btm):

% grep 'pool/universe/b/btm/btm_0.9.6-4_amd64.deb' ~/Downloads/Packages\ 3 -A10 -B14
Package: btm
Architecture: amd64
Version: 0.9.6-4
Built-Using: rust-ahash-0.7 (= 0.7.7-2), rust-nix (= 0.26.2-1), rust-option-ext (= 0.2.0-1), rustc (= 1.75.0+dfsg0ubuntu1-0ubuntu1)
Multi-Arch: allowed
Priority: optional
Section: universe/utils
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Jonas Smedegaard <dr@jones.dk>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 4799
Depends: libc6 (>= 2.38), libgcc-s1 (>= 4.2)
Suggests: bash-completion
Filename: pool/universe/b/btm/btm_0.9.6-4_amd64.deb
Size: 1607224
MD5sum: 5af6a25fa3b1bb766aecbc7a290670e7
SHA1: 5a8e3563ffc958d5a7b1dff8b97300eb03b1cd02
SHA256: 62b3c95436097e45edeebd72396831938df40de055d2e0dd9fcf276639314799
SHA512: 29806c67d9f74461eedadb488a94bf66d478cda15f33c44fbbc2c710902455fff875d38cada331c1aea84fc2fc6d7bf80aecb0c415273a70751a7ce543ff5518
Homepage: https://clementtsang.github.io/bottom
Description: customizable graphical process/system monitor for the terminal
X-Cargo-Built-Using:
 rust-addr2line (= 0.21.0-2), rust-adler (= 1.0.2-2), rust-ahash-0.7 (= 0.7.7-2), rust-aho-corasick (= 1.1.2-1), rust-anstream (= 0.6.7-1), rust-anstyle (= 1.0.4-1), rust-anstyle-parse (= 0.2.1-1), rust-anstyle-query (= 1.0.0-1), rust-anyhow (= 1.0.75-1), rust-assert-cmd (= 2.0.12-1), rust-backtrace (= 0.3.69-2), rust-bitflags-1 (= 1.3.2-5), rust-bitflags (= 2.4.2-1), rust-bstr (= 1.7.0-2build1), rust-cassowary (= 0.3.0-2), rust-cfg-if (= 1.0.0-1), rust-clap-builder (= 4.4.18-1), rust-clap (= 4.4.18-1), rust-clap-lex (= 0.6.0-2), rust-colorchoice (= 1.0.0-1), rust-concat-string (= 1.0.1-1), rust-crossbeam-deque (= 0.8.5-1), rust-crossbeam-epoch (= 0.9.18-1), rust-crossbeam-utils (= 0.8.19-1), rust-crossterm (= 0.27.0-3), rust-ctrlc (= 3.4.2-1), rust-difflib (= 0.4.0-1), rust-dirs (= 5.0.1-1), rust-dirs-sys (= 0.4.1-1), rust-doc-comment (= 0.3.3-1), rust-either (= 1.9.0-1), rust-float-cmp (= 0.9.0-1), rust-getrandom (= 0.2.10-1), rust-gimli (= 0.28.1-2), rust-hashbrown (= 0.12.3-1), rust-humantime (= 2.1.0-1), rust-indexmap (= 1.9.3-2), rust-itertools (= 0.10.5-1), rust-itoa (= 1.0.9-1), rust-kstring (= 2.0.0-1), rust-lazycell (= 1.3.0-3), rust-libc (= 0.2.152-1), rust-libloading (= 0.7.4-1), rust-linux-raw-sys (= 0.4.12-1), rust-lock-api (= 0.4.11-1), rust-log (= 0.4.20-2), rust-memchr (= 2.7.1-1), rust-miniz-oxide (= 0.7.1-1), rust-mio (= 0.8.10-1), rust-nix (= 0.26.2-1), rust-normalize-line-endings (= 0.3.0-1), rust-num-traits (= 0.2.15-1), rust-nvml-wrapper (= 0.9.0-1), rust-nvml-wrapper-sys (= 0.7.0-1), rust-object (= 0.32.2-1), rust-once-cell (= 1.19.0-1), rust-option-ext (= 0.2.0-1), rust-parking-lot-core (= 0.9.9-1), rust-parking-lot (= 0.12.1-2build1), rust-predicates-core (= 1.0.6-1), rust-predicates (= 3.0.3-1), rust-predicates-tree (= 1.0.7-1), rust-ratatui (= 0.23.0-4), rust-rayon-core (= 1.12.1-1), rust-rayon (= 1.8.1-1), rust-regex-automata (= 0.4.3-1build2), rust-regex (= 1.10.2-2build2), rust-regex-syntax (= 0.8.2-1), rust-rustc-demangle (= 
0.1.21-1), rust-rustix (= 0.38.30-1), rust-scopeguard (= 1.1.0-1), rust-serde (= 1.0.195-1), rust-serde-spanned (= 0.6.4-1), rust-signal-hook (= 0.3.17-1), rust-signal-hook-mio (= 0.2.3-2), rust-signal-hook-registry (= 1.4.0-1), rust-smallvec (= 1.11.2-1), rust-starship-battery (= 0.8.2-1), rust-static-assertions (= 1.1.0-1), rust-strsim (= 0.10.0-1), rust-strum (= 0.25.0-1), rust-sysinfo (= 0.28.4-4), rust-terminal-size (= 0.3.0-2), rust-termtree (= 0.4.1-1), rust-thiserror (= 1.0.50-1), rust-time-core (= 0.1.1-1), rust-time (= 0.3.23-2), rust-toml-datetime (= 0.6.5-1), rust-toml-edit (= 0.21.0-2), rust-typenum (= 1.16.0-2), rust-unicode-segmentation (= 1.10.1-1), rust-unicode-width (= 0.1.11-1), rust-uom (= 0.35.0-1), rust-utf8parse (= 0.2.1-1), rust-wait-timeout (= 0.2.0-1), rust-winnow (= 0.5.15-1), rustc (= 1.75.0+dfsg0ubuntu1-0ubuntu1),
Description-md5: e39e31ca350d6a0cb1ee1479936064f3

The current parser logic of rules_distroless assumes that every key is followed on the same line by ": " and a non-empty value. Here, X-Cargo-Built-Using: is followed directly by a newline, so the key appears to have an empty value, and the real value (the list of dependencies used to compile the package) only appears on the continuation line. The key/value split therefore fails on the X-Cargo-Built-Using: line.

This results in the following error when running:

INFO: Repository ubuntu_noble_resolution instantiated at:
  mypath/WORKSPACE:3:10: in <toplevel>
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/index.bzl:65:17: in deb_index
Repository rule deb_resolve defined at:
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:135:30: in <toplevel>
ERROR: An error occurred during the fetch of repository 'ubuntu_noble_resolution':
   Traceback (most recent call last):
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl", line 74, column 33, in _deb_resolve_impl
                pkgindex = package_index.new(rctx, sources = sources, archs = manifest["archs"])
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_index.bzl", line 73, column 33, in _create
                _parse_package_index(state, rctx.read(output), arch, url)
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_index.bzl", line 36, column 26, in _parse_package_index
                (key, value) = line.split(": ", 1)
Error: too few values to unpack (got 1, want 2)
ERROR: mypath/WORKSPACE:3:10: fetching deb_resolve rule //external:ubuntu_noble_resolution: Traceback (most recent call last):
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl", line 74, column 33, in _deb_resolve_impl
                pkgindex = package_index.new(rctx, sources = sources, archs = manifest["archs"])
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_index.bzl", line 73, column 33, in _create
                _parse_package_index(state, rctx.read(output), arch, url)
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_index.bzl", line 36, column 26, in _parse_package_index
                (key, value) = line.split(": ", 1)
Error: too few values to unpack (got 1, want 2)
ERROR: no such package '@@ubuntu_noble_resolution//': too few values to unpack (got 1, want 2)
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/ubuntu_noble/BUILD.bazel:2:6: @@ubuntu_noble//:lock depends on @@ubuntu_noble_resolution//:lock in repository @@ubuntu_noble_resolution which failed to fetch. no such package '@@ubuntu_noble_resolution//': too few values to unpack (got 1, want 2)
ERROR: Analysis of target '@@ubuntu_noble//:lock' failed; build aborted: Analysis failed
INFO: Elapsed time: 17.174s, Critical Path: 0.14s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target
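
The failure above can be reproduced outside Bazel. The following is a minimal sketch in plain Python rather than Starlark; `split_strict` is a hypothetical name, and it only mimics the `line.split(": ", 1)` call shown in the traceback. Python words the unpacking error differently from Starlark, but the cause is the same: the split returns a single element.

```python
# Illustration only: mimics the split shown in the traceback above.
def split_strict(line):
    # Same call as the one reported at package_index.bzl line 36.
    return line.split(": ", 1)


ok = split_strict("Package: btm")
bad = split_strict("X-Cargo-Built-Using:")

print(ok)   # key and value
print(bad)  # a single element, because the line contains no ": "

try:
    key, value = bad
except ValueError as err:
    print("unpack failed:", err)
```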

Solution

This PR complements the issue I've opened up here: #42

This PR sanitizes package metadata when parsing the Packages.gz index of a given Debian/Ubuntu repository, allowing for newlines and other characters that may follow the : in a key/value definition.
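
As a rough illustration of the intended behaviour, here is a sketch in Python (not the Starlark code in this PR). `parse_stanza` is a hypothetical helper, and folding continuation lines into a single stripped value is a simplification chosen for the example; the parser in this repository appends continuation lines with a newline instead.

```python
def parse_stanza(text):
    """Parse one Packages stanza into a dict (illustrative sketch only)."""
    pkg = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in (" ", "\t") and last_key is not None:
            # Continuation line: append it to the previous field.
            pkg[last_key] = (pkg[last_key] + "\n" + line.strip()).strip()
            continue
        # partition splits on the first ":" only, so the value may be
        # empty or may itself contain colons (for example a URL).
        key, _, value = line.partition(":")
        pkg[key] = value.strip()
        last_key = key
    return pkg


sample = "\n".join([
    "Package: btm",
    "Homepage: https://clementtsang.github.io/bottom",
    "X-Cargo-Built-Using:",
    " rust-addr2line (= 0.21.0-2), rust-adler (= 1.0.2-2),",
    "Description-md5: e39e31ca350d6a0cb1ee1479936064f3",
])
fields = parse_stanza(sample)
print(fields["X-Cargo-Built-Using"])
```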

Verification Evidence

rules_distroless$ git diff origin/main -- apt/
diff --git a/apt/private/package_index.bzl b/apt/private/package_index.bzl
index 620a021..191d18c 100644
--- a/apt/private/package_index.bzl
+++ b/apt/private/package_index.bzl
@@ -28,7 +28,17 @@ def _parse_package_index(state, contents, arch, root):
                 pkg[last_key] += "\n" + line
                 continue
 
-            (key, value) = line.split(": ", 1)
+            # This allows for (more) graceful parsing of Package metadata (such as X-* attributes)
+            # which may contain patterns that are non-standard. This logic is intended to closely follow
+            # the Debian team's parser logic:
+            # * https://salsa.debian.org/python-debian-team/python-debian/-/blob/master/src/debian/deb822.py?ref_type=heads#L788
+            split = line.split(":")
+            key = split[0]
+            value = ""
+
+            if len(split) == 2:
+                value = split[1]
+
             if not last_key and len(pkg) == 0 and key != "Package":
                 fail("do not expect this. fix it.")
$ bazel run --sandbox_debug --override_repository=rules_distroless=$HOME/git/alexconrey/rules_distroless  @ubuntu_noble//:lock
INFO: Analyzed target @@ubuntu_noble//:lock (91 packages loaded, 742 targets configured).
INFO: Found 1 target...
Target @@ubuntu_noble_resolution//:lock up-to-date:
  bazel-bin/external/ubuntu_noble_resolution/lock
INFO: Elapsed time: 3.277s, Critical Path: 0.13s
INFO: 5 processes: 5 internal.
INFO: Build completed successfully, 5 total actions
INFO: Running command line: bazel-bin/external/ubuntu_noble_resolution/lock external/ubuntu_noble_resolution/lock.json

Writing lockfile to ubuntu/24.04/packages.lock.json

Run the following command to add the lockfile or pass --autofix flag to do it automatically.

   buildozer set lock @@//ubuntu/24.04:packages.lock.json WORKSPACE.bazel:ubuntu_noble


google-cla bot commented Apr 29, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@alexconrey
Contributor Author

I've just signed the CLA on behalf of myself for my GitHub user. Let me know if there's anything I can do to get this moved along!

@thesayyn
Collaborator

Could you provide more information on why this is necessary?

@alexconrey
Contributor Author

@thesayyn certainly! I have updated the PR to contain more details around the nature of this fix. I did include a bit more information in my issue that I am happy to copy/paste over here for completeness.

@thesayyn
Collaborator

Okay, ignoring the lines that start with X sounds like a workaround. It looks like this input is parsed successfully by other parsers, so we should fix our parser to allow nothing after the :.

https://salsa.debian.org/python-debian-team/python-debian/-/blob/master/src/debian/deb822.py?ref_type=heads#L788

https://regex101.com/r/BrBMqo/1

@alexconrey
Contributor Author

Agreed on it being more of a workaround. Let me see if I can wrestle this logic into something similar to those links you've referenced.

@thesayyn
Collaborator

Something like this should work:

split = line.split(":")
key = split[0]
value = ""
if len(split) == 2:
    value = split[1]
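
One edge case may be worth checking with this approach: without a maxsplit argument, a value that itself contains a colon (a URL, for instance) splits into more than two parts and would be left empty. The sketch below is Python, and `split_field` is a hypothetical helper that applies the logic above with and without a maxsplit of 1. The lines it is run on are taken from the btm example in the PR description.

```python
def split_field(line, maxsplit=None):
    # Hypothetical helper: the suggested logic, optionally with maxsplit.
    parts = line.split(":") if maxsplit is None else line.split(":", maxsplit)
    key = parts[0]
    value = ""
    if len(parts) == 2:
        value = parts[1]
    return key, value


line = "Homepage: https://clementtsang.github.io/bottom"
print(split_field(line))     # value is dropped: the URL adds a third part
print(split_field(line, 1))  # value kept, including its leading space
print(split_field("X-Cargo-Built-Using:", 1))  # empty value, no error
```

Note that the value keeps its leading space in either variant, so callers that compare values would need to account for that.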

@alexconrey
Contributor Author

Thanks @thesayyn! That was an immense help. I was able to get a successful run against my lockfile with those changes. If you wouldn't mind glancing at this again, it would be appreciated! The only addition I made to your logic was stripping leading whitespace. I thought the whitespace was causing issues when running locally, but that was not the case. It seems like good practice to strip it, but I'm happy to remove that logic if necessary.

@alexconrey alexconrey force-pushed the aconrey/drop-x-package-metadata branch from ed7f4e6 to b3b6ef4 Compare April 29, 2024 18:13
@alexconrey alexconrey changed the title from "fix: drop X-* package metadata from Packages.gz which can lead to parser failures" to "fix: improve parsing of Package.gz metadata, matching Debian parser logic" Apr 29, 2024
@thesayyn
Collaborator

Looks great. Stripping sounds good, but I am afraid it might break some other code path, as the parsing doesn't have unit tests for edge cases. Can we remove that part?

@alexconrey
Contributor Author

Removed!

@thesayyn
Collaborator

thesayyn commented Apr 29, 2024

There are some buildifier errors on CI. You can run: buildifier <file_changed.bzl>

@alexconrey
Contributor Author

It does seem that I can reproduce the whitespace issue consistently, which makes me think I fell victim to caching in my prior comment. I am now cleaning my workspace before each run as a means of verification:

without whitespace trim:

$ bazel clean --async --expunge; bazel run --sandbox_debug --override_repository=rules_distroless=$HOME/git/alexconrey/rules_distroless  @ubuntu_noble//:lock
INFO: Starting clean.
INFO: Output base moved to /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4_tmp_444983_6dfe942f-cdb5-494e-93d1-ebd46e153625 for deletion
Starting local Bazel server and connecting to it...
INFO: Repository ubuntu_noble_resolution instantiated at:
  mypath/WORKSPACE:3:10: in <toplevel>
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/index.bzl:65:17: in deb_index
Repository rule deb_resolve defined at:
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:135:30: in <toplevel>
ERROR: An error occurred during the fetch of repository 'ubuntu_noble_resolution':
   Traceback (most recent call last):
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl", line 92, column 21, in _deb_resolve_impl
                fail("Unable to locate package `%s`" % dep_constraint)
Error in fail: Unable to locate package `ncurses-base`
ERROR: mypath/WORKSPACE:3:10: fetching deb_resolve rule //external:ubuntu_noble_resolution: Traceback (most recent call last):
        File "/home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl", line 92, column 21, in _deb_resolve_impl
                fail("Unable to locate package `%s`" % dep_constraint)
Error in fail: Unable to locate package `ncurses-base`
ERROR: no such package '@@ubuntu_noble_resolution//': Unable to locate package `ncurses-base`
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/ubuntu_noble/BUILD.bazel:2:6: @@ubuntu_noble//:lock depends on @@ubuntu_noble_resolution//:lock in repository @@ubuntu_noble_resolution which failed to fetch. no such package '@@ubuntu_noble_resolution//': Unable to locate package `ncurses-base`
ERROR: Analysis of target '@@ubuntu_noble//:lock' failed; build aborted: Analysis failed
INFO: Elapsed time: 27.970s, Critical Path: 0.21s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target

with whitespace trim:

$ bazel clean --async --expunge; bazel run --sandbox_debug --override_repository=rules_distroless=$HOME/git/alexconrey/rules_distroless  @ubuntu_noble//:lock
INFO: Starting clean.
INFO: Output base moved to /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4_tmp_443909_835417bd-715d-4781-91b6-8ff2b659b67e for deletion
Starting local Bazel server and connecting to it...
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_resolution.bzl:141:22: Warning: optional dependencies are not supported yet. https://github.com/GoogleContainerTools/rules_distroless/issues/27
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:96:22: the following packages have unmet dependencies: ,awk
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:96:22: the following packages have unmet dependencies: 
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:96:22: the following packages have unmet dependencies: 
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/package_resolution.bzl:141:22: Warning: optional dependencies are not supported yet. https://github.com/GoogleContainerTools/rules_distroless/issues/27
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:96:22: the following packages have unmet dependencies: ,,,
DEBUG: /home/ubuntu/.cache/bazel/_bazel_ubuntu/c72e88c157bca566b19afcd933de7ae4/external/rules_distroless/apt/private/resolve.bzl:96:22: the following packages have unmet dependencies: ,
INFO: Analyzed target @@ubuntu_noble//:lock (91 packages loaded, 742 targets configured).
INFO: Found 1 target...
Target @@ubuntu_noble_resolution//:lock up-to-date:
  bazel-bin/external/ubuntu_noble_resolution/lock
INFO: Elapsed time: 38.654s, Critical Path: 0.18s
INFO: 5 processes: 5 internal.
INFO: Build completed successfully, 5 total actions
INFO: Running command line: bazel-bin/external/ubuntu_noble_resolution/lock external/ubuntu_noble_resolution/lock.json

Writing lockfile to ubuntu/24.04/packages.lock.json

Run the following command to add the lockfile or pass --autofix flag to do it automatically.

   buildozer set lock @@//ubuntu/24.04:packages.lock.json WORKSPACE.bazel:ubuntu_noble
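
One plausible explanation for the difference between the two runs above, not confirmed anywhere in this thread: splitting on ":" alone leaves a leading space in every value, so a package would be stored under " ncurses-base" and a lookup for "ncurses-base" would miss. The Python sketch below assumes an index keyed by the Package value; `index_packages` is a hypothetical helper, not a function from rules_distroless.

```python
def index_packages(lines, strip):
    # Hypothetical index keyed by the Package value, using the split
    # from the diff in the PR description.
    index = {}
    for line in lines:
        split = line.split(":")
        key = split[0]
        value = ""
        if len(split) == 2:
            value = split[1].strip() if strip else split[1]
        if key == "Package":
            index[value] = True
    return index


lines = ["Package: ncurses-base", "Package: btm"]
untrimmed = index_packages(lines, strip=False)
trimmed = index_packages(lines, strip=True)

print("ncurses-base" in untrimmed)  # lookup against keys with a leading space
print("ncurses-base" in trimmed)    # lookup against stripped keys
```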

@alexconrey
Contributor Author

Alright, we should be good for (another) review. Thanks for the whitespace spot, I would have been here for a few more hours 😝

Buildifier is showing no changes after making these updates, CI should hopefully reflect the same.

@thesayyn thesayyn merged commit b706a0a into GoogleContainerTools:main Apr 30, 2024
7 checks passed
@thesayyn
Collaborator

Thank you!


2 participants