Provide scratch/distroless base image #9029
Comments
@jannfis is there any reason we couldn't have a different base for the repo-server vs. the API server besides complexity? |
A breached Argo CD container is a serious issue. Once breached, a tiny amount of misconfiguration (OWASP #5) means an attacker can very easily take down not just the cluster they've breached, but any cluster that Argo CD is configured to connect to. I struggle to see a more target-rich environment. This seems like a straightforward fix to really improve the security posture. |
@alexec I strongly agree. I've added it to the agenda for the next security meeting, which is in 5 days. https://docs.google.com/document/d/1nl1adCRf4tD87Y01i3JcrLcHvtm1O5a7slqzNstct94/edit |
Argo CD needs to have a high bar for security: after the Kubernetes master, it's going to be the most attacked component. |
Adding this as a potential GHC bug. I think the argocd-application-controller version of this is pretty simple: create a new Dockerfile, build the controller without a distro, and then test it in a local deployment. Given enough time, the same base image change could be tested for the api-server. |
@crenshaw-dev Can you assign this to me? I currently have Do we want a separate image for each controller? Currently all the controllers, utils, and cli are compiled into one binary. It might make more sense to stick with one distroless image. |
@34fathombelow I think one distroless image is probably best. |
Any update on this? |
@PeterBennink I'm hoping to have this ready for Argo CD v2.6. I should have a PR and a proposal ready in the next few weeks. |
@34fathombelow how's it going on this? Would you like some help? I work for Chainguard and we have some distroless images that I think would be a good fit: https://github.com/chainguard-images/images/tree/main/images/static They are similar in nature to the Google distroless images but are easier to customise. |
@amouat Thank you for the inquiry regarding a distroless image. Currently I am still working on this and doing some research. At this moment we still require a few additional binaries to be installed. I love the work that you guys do over at Chainguard, so I will definitely be evaluating all possible options. Apko, Melange, and Wolfi are on my radar. |
Good to hear! We'd be happy to work with you to create a base image with the binaries you need. I'm adrian@chainguard.dev if you want to get in touch. |
Should we consider using Alpine as the base image to reduce high-risk vulnerabilities? @crenshaw-dev @34fathombelow |
I would like to share a couple of thoughts here. For creating a distroless image, there are a couple of showstoppers for now, as I've experienced. The first is Shell is worse. Every argocd service is being started with an Argocd needs the /etc/ssl ca certs, which can be just copied over from alpine or something. Regarding the plugin architecture, argocd depends on having a shell which pipes outputs to inputs between different commands. To have a distroless image, which does not include a shell, the plugin architecture needs to be able to run the plugin pipeline without a shell's pipe operator. All the tools (helm, kustomize and argocd) have to be statically linked, otherwise all the supporting libraries need to be present. This is not a huge deal, it's fairly easy to build them in the Dockerfile as such. |
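To make the "pipeline without a shell" idea concrete, here is a minimal sketch of chaining commands by passing each stage's stdout to the next stage's stdin from the parent process. It is written in Python purely for illustration (Argo CD itself is Go, where `os/exec` offers the same capability); `run_pipeline`, `GENERATE`, and `REPLACE` are hypothetical names, and the two stages are stand-ins for a manifest generator and a placeholder-replacing plugin, not real Argo CD plugin commands.

```python
import subprocess
import sys


def run_pipeline(stages, input_bytes=b""):
    """Run each stage as its own process, feeding one stage's stdout to the next.

    `stages` is a list of argv lists. No shell is involved, so the image
    does not need `sh -c "generator | plugin"` to connect the commands.
    """
    data = input_bytes
    for argv in stages:
        # shell=False is the default: argv is executed directly.
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data


# Hypothetical stand-in for a generator such as `helm template`.
GENERATE = [sys.executable, "-c", "print('password: <placeholder>')"]

# Hypothetical stand-in for a secret-handling plugin that replaces placeholders.
REPLACE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().replace('<placeholder>', 'value-from-store'))",
]

if __name__ == "__main__":
    print(run_pipeline([GENERATE, REPLACE]).decode(), end="")
```

This version buffers each stage's full output before starting the next one, which keeps the sketch short; a real implementation would more likely connect the processes with OS pipes so that large manifests stream through.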
Awesome analysis, @gczuczy!
Is that true of all plugins, or is it only true if the plugin author chooses to use a shell? I think things will be simplified when we finally remove argocd-cm plugins, and all plugins run in sidecars (therefore with their own images). I think it might make sense to start by moving the application controller and applicationset controller to distroless. Then we can tackle the repo-server. |
@crenshaw-dev Thank you. From a security point of view, deprecating the plugin section in the appdef and moving the functionality over to a sidecar only moves the issue and doesn't solve it: there's still a requirement for a shell, and the plugin-bearing sidecar has the same issue. I've been thinking about this one, but I don't really have any concrete idea about it. The furthest I got is that maybe having the plugins in some kind of prioritized chain-like system would solve it. For example, first you execute the helm or kustomize mechanics, grab the output, pipe it over to another plugin-bearing sidecar which does its thing, and repeat the process. This way different plugins could be chained arbitrarily and you wouldn't need complex discovery rules in complex sidecars if you wish to use multiple plugins. However, I think the majority of the userbase is only using some kind of secret-handling plugin that replaces placeholders with values from various secret stores. Also, some plugins can always be run, like placeholder-replacing ones. If there's nothing in the data to replace, then it's effectively a simple no-op (though expensive CPU-wise). Pipeline-starting functionality like helm or kustomize has to come first, but argocd can pretty much detect those itself. That's just my tuppence on plugins. I don't even know how viable these thoughts are, as I don't know the internal mechanics that well.
And one more note on going distroless: gpg and similar functionality. I'm not sure how easy it is to provide a statically linked gpg environment. From a distance, it looks way more complicated than tini (which already has a predefined target in the CMakefile) or golang projects (where it's just an envvar and a linker flag). If there's a Go package to do the gpg operations with, I think that would pretty much solve all the hassle with the dynamically linked library, and would simplify the image a lot. |
I think it's a bit better than just moving the issue. Having a shell on the sidecar is better than having it on the repo-server for a few reasons:
But point taken, getting to a place where a shell is completely unnecessary would be ideal. |
@crenshaw-dev Yes, that's completely true, and the severity in the two environments is on a different level. I was focusing on an ideal and final goal. And I completely agree, it's a very welcome first step to remove the dependency on a shell and related environments. I apologize for the misunderstanding. |
You're good! Very fair to note the problems that distroless on the repo-server won't solve, because those should still be considered. :-) |
Just for some context on the chainguard distroless images, it's easy for us to add a shell and any utilities you need, without pulling in any more than that. Would it be useful if I put together an image with bash for you? |
@amouat Thank you so much for following up. I spent a bit of time looking into Chainguard images. I had a quick question regarding the architectures you support. Because we require a few additional binaries, we would need support for linux/s390x and linux/ppc64le. Is that possible with Chainguard images? |
That's fair. I know nothing about statically building binaries, so I'll trust other folks' wisdom on whether it's a good solution for us. Sounds like "yes." |
You get problems with statically compiled binaries, too. For example, if a binary links a given version of the OpenSSL libraries statically, you lose track of this information (i.e. the version of OpenSSL this particular binary was linked against, or even that this binary includes OpenSSL). So when there's a vulnerability in one of the libraries that you statically linked into your binary, your image scanners won't trigger. You will have to set up watching for these elsewhere. It's an often overlooked risk. |
IMHO, distroless + statically compiled binaries may be worse from a security perspective. Usually, the tools that tell you whether your image has issues use some kind of information available from the image. For example, how does a scanner find out what version of a particular tool is available in your image? I would assume most of them look into whatever package manager that image is using (apt, rpm, apk, etc.). So when you have a vulnerable Git version in your image that has been installed using the package manager, the scanner will easily figure this out and trigger an alert. The same goes for any dependencies of Git in this particular example. Distroless doesn't necessarily reduce security risks. It reduces attack vectors by leaving out what's considered bloat. Distroless with only your particular workload in it, that's great. And that's its intent, I would say. But once you're pulling in something else, for example statically linked stuff that your workload needs and that comes from elsewhere, you get a false sense of security. No scanner will trigger, but you will still have vulnerabilities in your image. They just go under the radar. |
That's fair. On the other hand, Argo CD, especially the repo-server, runs a lot of code based on a large variety of user inputs. I worry a lot more about someone using stuff on the Argo CD image to do bad things than I do about missing an out-of-date binary (especially when the list of binaries is short, intuitive, and visible: e.g. git, kustomize, helm - and can be added to our SBOMs relatively easily). |
Is our SBOM generated for all layers of our image build process, or only the final image? Because with statically compiled binaries, very important information gets lost in the final image. And I'm not talking about the version of the tool that ends up in the final image, but the dependencies that are now baked in, instead of being just referenced. So let's take git as an example: if git is visible in the SBOM, that's great. But if its dependencies, such as libopenssl and libcurl (which are now directly baked into the binary and not available anywhere else in the final image), are not in the SBOM, that's bad. When there is a vulnerability in either of those two libraries, this information won't be surfaced. That's the risk I see, and we should consider it when going forward with distroless and static binaries. |
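One way to catch the gap described above is to assert in CI that the SBOM lists the libraries known to be baked into each static binary. The sketch below assumes a CycloneDX-style JSON document with a top-level `components` list whose entries carry a `name`; the function name `missing_components`, the sample SBOM, and the expected dependency list are all made up for illustration and do not describe Argo CD's actual SBOM.

```python
def missing_components(sbom, expected):
    """Return the expected component names that the SBOM does not list."""
    present = {component.get("name") for component in sbom.get("components", [])}
    return sorted(name for name in expected if name not in present)


# Illustrative SBOM: git is listed, but the libraries statically linked
# into it are not, which is the situation the comment above warns about.
SAMPLE_SBOM = {
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "application", "name": "git", "version": "0.0.0-example"},
        {"type": "application", "name": "helm", "version": "0.0.0-example"},
    ],
}

# Assumed list of libraries baked into the static git build.
EXPECTED_STATIC_DEPS = ["git", "openssl", "libcurl"]

if __name__ == "__main__":
    print(missing_components(SAMPLE_SBOM, EXPECTED_STATIC_DEPS))
```

A check like this only verifies that the names appear; keeping the expected list accurate still depends on knowing how each binary was built.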
We use |
Another shameless plug for the Wolfi-based images: we generate the SBOMs as part of the build, and also fill in the package manager manifest accurately so scanners can find everything. We already have a lot of the argocd stuff packaged, so we should be able to get a POC together soon to show what it would actually look like in practice. cc @eddiezane |
That's the major difference between distroless and Wolfi: Wolfi generates the SBOM because it builds everything from source, whereas the current approach tries to extract an SBOM from the built container image. An SBOM generated from source is of higher quality. |
Sounds like wolfi image might be pretty straightforward. Hardest part being setting up the CI part to build/push it and docs to let folks know it's an experimental option. |
That sounds truly awesome. I'm all for a PoC of this. |
I'll work on a PR, unless @dlorenc has something ready to submit already :) @dlorenc I think the argo-cd team would prefer to have the packaging done here so they have complete control over how it's built? (melange pkg + apko) |
@tuananh yep, I think that would be preferable! |
Has there been any traction on the distroless image lately? I'm especially after an argocd base image that has the dependency on BerkeleyDB completely removed for licensing reasons: #11305 |
I'm definitely happy to help out with moving to a Wolfi/Chainguard image, but I'm pretty flat out until after KubeCon. |
@amouat that would be great! I think we'd want to build the Wolfi image in parallel to the Ubuntu image so that people can opt for whichever image they prefer. |
Is there already a PR for this that we can track? 🤔 |
Looking at this now, should have something to share shortly @xavier83 |
I wrote something up at #16481. It's not a distroless build, but it is considerably smaller than the current build. I think it may make a good middle ground to begin with. A distroless build is possible, but it's difficult with Dockerfiles (there was already a rejected proposal using apko https://github.com/tuananh/distroless-argocd). The Helm and Kustomize dependencies are easy as they are already statically compiled and can be just copied over. git and gpg are more difficult. There are also two wrapper scripts that would need to be investigated to see if we could remove them. (Taking a quick look I think the scripts could be removed by just calling git/gpg directly from Go, which would probably be an improvement). This isn't production ready, I was hoping to get some thoughts/feedback before going further. |
If a PR falls in the forest, is there anyone to share thoughts/feedback? I'd love to see a distroless build of ArgoCD happen! We are currently jumping through hoops to get the BerkeleyDB dependency removed from the ArgoCD image: #11305 |
Hi @amouat @crenshaw-dev Is anyone else also working on this issue for IBM architectures like ppc64le? Also, let us know how we can help. |
@sahu-apoorva most Chainguard images are only built for aarch64 and x86_64. That being said, the "static" image does have a ppc64le build. The above PR uses the wolfi-base image as a halfway house to a completely distroless build, and wolfi-base doesn't support ppc64le. Supporting ppc64le would mean either building wolfi-base for ppc64le or completing the move to a completely distroless image on top of static (and cross-compiling argo & related tooling for ppc64le). |
@amouat @crenshaw-dev |
Current Workflows maintainer here 👋 Coming from a SIG Security meeting earlier today.
Notably, Workflows has a substantially more limited use of
💯 I think this is an optimal solution. Ideally, once
As a separate, but related goal, I might recommend splitting up the binaries and images as well to reduce the attack surface. They do not all have the same deps etc. Workflows has 3 images,
To clarify on this topic, all scanners that I know of do this at minimum. But some naive scanners do only this (personally, I would not recommend such scanners). Such naive scanners are generally incompatible with most distroless images already, since they lack a package manager. More intelligent scanners will scan the whole image's FS for binaries, hashes, etc. |
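As a rough illustration of the "scan the whole filesystem" approach, the sketch below walks an unpacked image root, picks out ELF binaries by their magic bytes, and records a SHA-256 digest for each, which could then be compared against a database of known builds. The function name `find_elf_binaries` is hypothetical, and this is an assumption about the general technique rather than a description of how any particular scanner works.

```python
import hashlib
import os

# Every ELF file starts with these four bytes.
ELF_MAGIC = b"\x7fELF"


def find_elf_binaries(root):
    """Map each ELF file under `root` (by relative path) to its SHA-256 digest."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                # Unreadable entries (broken symlinks, permissions) are skipped.
                continue
            if data.startswith(ELF_MAGIC):
                found[os.path.relpath(path, root)] = hashlib.sha256(data).hexdigest()
    return found
```

Hashes only identify binaries someone has already catalogued, so a locally built static binary would still need other signals (embedded version strings or build metadata) to be matched to known vulnerabilities.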
Wolfi images do have an apk package database (but no package manager) by default. This means they work with the majority of scanners (even "naive" ones ;) ). |
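For readers unfamiliar with what a package-database-based scanner actually reads, here is a minimal sketch that parses the apk "installed" database format, where records are separated by blank lines and each line is a one-letter key followed by a colon (`P:` for the package name, `V:` for the version). The function name and the sample records are made up for illustration; on Alpine-style images the file typically lives at `lib/apk/db/installed`, but treat that path as an assumption for any given image.

```python
def parse_apk_installed(text):
    """Return {package name: version} from apk installed-database text."""
    packages = {}
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only; values may contain colons.
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value
        if "P" in fields and "V" in fields:
            packages[fields["P"]] = fields["V"]
    return packages


# Illustrative records, not taken from a real image.
SAMPLE = """\
P:git
V:2.40.0-r0
A:x86_64
D:so:libc.musl-x86_64.so.1

P:libcurl
V:8.0.1-r0
A:x86_64
"""

if __name__ == "__main__":
    for name, version in parse_apk_installed(SAMPLE).items():
        print(name, version)
```

Because this file is plain data, a scanner can read it even though the image ships no `apk` binary to query it.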
🙊 |
One does not simply provide a distroless base image, it seems. We are also post-processing the argocd base image in order to remove all Oracle-licensed BerkeleyDB (libdb) packages coming from Ubuntu, due to their incompatible license in the enterprise context. |
Currently, Argo CD uses ubuntu:21.10 as a base image.
Should an attacker gain access to the container they'll have a shell to use. They'll also have access to the Kubernetes API. Currently, they would not be able to install any apps (because run-as-non-root), but they do have git. So they could clone a repository with the kubectl binary installed. At this point they would be able to make API requests.
For the API server, I'm not clear why it would need git installed (or any other binary), but maybe I'm just missing something.
Using a scratch or distroless image would improve security posture.