
v6.20.0 container images errors when pulling #2348

Closed
echoix opened this issue Feb 11, 2023 · 14 comments · Fixed by #2434 or #2435
Labels
bug Something isn't working

Comments

@echoix
Collaborator

echoix commented Feb 11, 2023

Describe the bug
When pulling the latest beta images (as of 2023-02-11) on Gitpod, docker doesn't complete and shows an error when extracting a layer.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Gitpod.
  2. In the bash terminal, run docker pull oxsecurity/megalinter:beta or docker pull oxsecurity/megalinter-go:beta (at the time of the bug these were https://hub.docker.com/layers/oxsecurity/megalinter-go/beta/images/sha256-8fab0a400aa67089841912c597863e78a871c14bb6f1b26d57d947ca2b1c2807?context=explore and https://hub.docker.com/layers/oxsecurity/megalinter/beta/images/sha256-43fb5d5d36b4d623be36ea8c46958ed517e230fe17707093454e317d2edae384?context=explore); the exact commands are listed below the steps.
  3. See error when extracting a layer that has node_modules.
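
For reference, the exact commands run from the Gitpod bash terminal:

docker pull oxsecurity/megalinter:beta
docker pull oxsecurity/megalinter-go:beta

Both pulls fail while extracting the layer that contains /node-deps/node_modules.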

Expected behavior
Docker images can be pulled and run without any errors.

Screenshots
[screenshot: docker pull output showing the error below]

failed to register layer: ApplyLayer exit status 1 stdout: stderr: lchown /node-deps/node_modules/ast-types-flow/lib/types.js: invalid argument

Additional context
Discovered in #2318

There seems to be a relation between UIDs, lchown, and maybe rootless containers; see moby/moby#43576.

In the go flavor, the error message was failed to register layer: ApplyLayer exit status 1 stdout: stderr: lchown /node-deps/node_modules/character-parser/LICENSE: invalid argument
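
My current understanding (an assumption, not verified yet): in rootless mode every UID/GID inside the container has to be remapped through /etc/subuid and /etc/subgid, and a file whose owner falls outside the mappable range makes the lchown call in ApplyLayer fail with EINVAL. The same "invalid argument" can be reproduced in a plain user namespace, with no Docker involved:

unshare --user --map-root-user sh -c 'touch /tmp/probe && chown 200000:200000 /tmp/probe'
# chown: changing ownership of '/tmp/probe': Invalid argument  (200000 is not mapped in the namespace)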

@echoix echoix added the bug Something isn't working label Feb 11, 2023
@echoix echoix changed the title Beta (2023-02-11container images errors when pulling Beta (2023-02-11) container images errors when pulling Feb 11, 2023
@echoix
Collaborator Author

echoix commented Feb 11, 2023

Oops I pressed enter when writing the title and it sent.

@echoix
Collaborator Author

echoix commented Feb 11, 2023

Running the commands:

cat /etc/subuid
cat /etc/subgid

gave:

(.venv) gitpod /workspace/megalinter (dev/remove-apk-go) $ cat /etc/subgid
gitpod:100000:65536
nixbld1:165536:65536
nixbld2:231072:65536
nixbld3:296608:65536
nixbld4:362144:65536
nixbld5:427680:65536
nixbld6:493216:65536
nixbld7:558752:65536
nixbld8:624288:65536
nixbld9:689824:65536
nixbld10:755360:65536
nixbld11:820896:65536
nixbld12:886432:65536
nixbld13:951968:65536
nixbld14:1017504:65536
nixbld15:1083040:65536
nixbld16:1148576:65536
nixbld17:1214112:65536
nixbld18:1279648:65536
nixbld19:1345184:65536
nixbld20:1410720:65536
nixbld21:1476256:65536
nixbld22:1541792:65536
nixbld23:1607328:65536
nixbld24:1672864:65536
nixbld25:1738400:65536
nixbld26:1803936:65536
nixbld27:1869472:65536
nixbld28:1935008:65536
nixbld29:2000544:65536
nixbld30:2066080:65536
(.venv) gitpod /workspace/megalinter (dev/remove-apk-go) $ cat /etc/subuid
gitpod:100000:65536
nixbld1:165536:65536
nixbld2:231072:65536
nixbld3:296608:65536
nixbld4:362144:65536
nixbld5:427680:65536
nixbld6:493216:65536
nixbld7:558752:65536
nixbld8:624288:65536
nixbld9:689824:65536
nixbld10:755360:65536
nixbld11:820896:65536
nixbld12:886432:65536
nixbld13:951968:65536
nixbld14:1017504:65536
nixbld15:1083040:65536
nixbld16:1148576:65536
nixbld17:1214112:65536
nixbld18:1279648:65536
nixbld19:1345184:65536
nixbld20:1410720:65536
nixbld21:1476256:65536
nixbld22:1541792:65536
nixbld23:1607328:65536
nixbld24:1672864:65536
nixbld25:1738400:65536
nixbld26:1803936:65536
nixbld27:1869472:65536
nixbld28:1935008:65536
nixbld29:2000544:65536
nixbld30:2066080:65536

In containers/podman#2542, notably containers/podman#2542 (comment) and containers/podman#2542 (comment), it seems we can do something in the image itself.
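
If I'm reading the ranges right (my assumption), gitpod gets 65536 subordinate IDs starting at 100000, so a rootless container can only represent IDs 0 through 65535:

# container UID c -> host UID 100000 + c, valid only for 0 <= c <= 65535
#   container UID 0     -> host 100000
#   container UID 65535 -> host 165535
# A file in a layer owned by, say, UID 100000 inside the image has no host
# mapping, so ApplyLayer's lchown returns EINVAL ("invalid argument").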

@Kurt-von-Laven
Collaborator

I was able to reproduce this issue on the current beta javascript flavor when using rootless Docker on Linux. Noting for myself that this issue doesn't reproduce on Windows according to #2318 (comment).
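
A quick way to confirm which mode a daemon is running in (output details vary by Docker version, so treat this as a rough check):

docker info --format '{{.SecurityOptions}}'
# contains name=rootless on a rootless daemon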

@echoix
Collaborator Author

echoix commented Mar 4, 2023

I have a rough idea for a temporary workaround so we can release without having fully found and fixed the problem. I was still searching this afternoon, but somewhere along the way I read something, tried it, and it worked.

So if we build the image in an already restricted environment, the UID:GID problem can't arise (since it won't be mapped to an unavailable range? I'm not sure). I reconfirmed the bug in another Gitpod workspace, and there I'm able to build CI-light and it works.
All the tools I know of for exploring images to see where the bad UIDs are (Dive, slim.ai, Docker extensions, online services) are unusable on images that can't be pulled, even on other platforms. Maybe it's because they are themselves containers mounting another image?
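
One idea (a sketch, assuming the classic docker save layout and an image that was built locally or pulled on a rootful host): export the image to a tarball and scan each layer listing for owners above 65535, without ever extracting anything:

docker save oxsecurity/megalinter:beta -o ml.tar
mkdir ml && tar -xf ml.tar -C ml
# print any entry whose uid or gid is outside the rootless-mappable range
for layer in ml/*/layer.tar; do
  tar --numeric-owner -tvf "$layer" |
    awk -v l="$layer" '{ split($2, own, "/"); if (own[1]+0 > 65535 || own[2]+0 > 65535) print l ": " $0 }'
done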

@echoix
Collaborator Author

echoix commented Mar 4, 2023

I have to go for now, but there's a ghcr package on my profile (it doesn't appear on my megalinter fork).

@Kurt-von-Laven
Collaborator

I am still hitting this when using rootless Docker in the javascript flavor of MegaLinter v6.20.0.

@Kurt-von-Laven Kurt-von-Laven pinned this issue Mar 6, 2023
@Kurt-von-Laven Kurt-von-Laven changed the title Beta (2023-02-11) container images errors when pulling v6.20.0 container images errors when pulling Mar 6, 2023
@echoix
Collaborator Author

echoix commented Mar 6, 2023

@nvuillam could we add a section to the release notes with known issues, and maybe direct users to add info or resolution steps/ideas here?

@nvuillam
Member

nvuillam commented Mar 6, 2023

We can pin an issue at the top of the repo

I'm currently solving #2427

@echoix
Collaborator Author

echoix commented Mar 6, 2023

The CircleCI link in the error message of #2429 is quite useful, but I haven't found a way yet. I know there's something in the node_modules. Adding the chown root:root as the CircleCI help page suggests could be a bazooka way to get a quick fix. I saw it when searching, but I wasn't sure of the consequences. It might be another thing to fix when we want to be less dependent on the root user (the issues where we can't remove a folder/file that was linted).
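
If we do go that route, the change would look roughly like this in the Dockerfile (hypothetical sketch; the exact stage and path are my assumptions based on the error message, which points at /node-deps):

# Hypothetical: normalize ownership of the npm output so no layer carries
# UIDs/GIDs outside the range a rootless runtime can map
RUN npm install --force && \
    chown -R root:root /node-deps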

@echoix
Collaborator Author

echoix commented Mar 6, 2023

I suspect the workaround for #2348 is to use rootful rather than rootless Docker, although I realize that may be too much security risk to be acceptable for some. I wonder if we can use git bisect to narrow down the commit at which some of our issues were introduced, but I'm not sure what command to run at each git bisect in order to create a local MegaLinter image for testing purposes. I tried docker build ., but this relies on GitHub Actions environment variables, which makes me wonder if I should try doing this with act?

Originally posted by @Kurt-von-Laven in #2431 (comment)

Well, at least I know the problem started around the date I filed the issue (2023-02-11), and I'm pretty confident the betas from the previous weekend were OK. That is about when the build-push-action was added, but even though it is a big change, I don't think it's the only culprit, since we were already using buildx before that. So for the bisect, you could try before #2342 was merged. But since we don't have the package-lock.json files stored from CI, I'm not sure you could recreate the exact image.
One PR of interest is #2343, where --force was added to npm install. Maybe look at the build output.
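
For the bisect itself, something along these lines could work (a sketch; check-image-uids.sh is a hypothetical script that builds one flavor locally, runs the layer-scanning loop from my earlier comment, and exits non-zero when it finds bad UIDs):

git bisect start
git bisect bad HEAD        # current beta, known to produce a bad image
git bisect good v6.19.0    # last release known to pull cleanly
git bisect run ./check-image-uids.sh

But as said above, without the package-lock.json files from CI the rebuilt images may not match the published ones exactly.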

@Kurt-von-Laven
Collaborator

That was more or less the range I intended to bisect, although yours is more specific, which is helpful. I don't know what commands to run at each bisection point though. I figure if the bisection claims the v6.19.0 release was bad, we will at least have proven by contradiction that the issue cropped up through an unpinned dependency (or possibly something else, but likely unrelated to any specific recent PR in our repo).

@echoix
Collaborator Author

echoix commented Mar 7, 2023

Well, I have proven in the issue description (the screenshot) that v6.19.0 was correct, and in multiple independent tries afterwards, on multiple platforms and with multiple methods, the betas were failing while v6.19.0 was still correct. And as of 2023-03-04, v6.19.0 and v6.18.0 were both correct.

@echoix
Collaborator Author

echoix commented Mar 7, 2023

Ok, what about this as a bazooka temporary fix: Qusic/SmartBoy@1b7c1f8

@Kurt-von-Laven
Collaborator

Impressive find, @echoix. What led you to the solution?
