Provide scratch/distroless base image #9029

Open
alexec opened this issue Apr 7, 2022 · 70 comments
Labels
component:api API bugs and enhancements enhancement New feature or request security Security related type:security Something is not secure

Comments

@alexec
Contributor

alexec commented Apr 7, 2022

Currently, Argo CD uses ubuntu:21.10 as a base image.

Should an attacker gain access to the container, they'll have a shell to use. They'll also have access to the Kubernetes API. Currently, they would not be able to install any apps (because of run-as-non-root), but they do have git, so they could clone a repository that contains the kubectl binary. At that point they would be able to make API requests.

For the API server, I'm not clear why it would need git installed (or any other binary), but maybe I'm just missing something.

Using a scratch or distroless image would improve security posture.

@alexec alexec added enhancement New feature or request security Security related component:api API bugs and enhancements labels Apr 7, 2022
@crenshaw-dev
Member

@jannfis is there any reason we couldn't have a different base for the repo-server vs. the API server besides complexity?

@alexec
Contributor Author

alexec commented May 5, 2022

A breached Argo CD container is a serious issue. Once breached, a tiny amount of misconfiguration (OWASP #5) means an attacker can very easily take down not just the cluster they've breached, but any cluster that Argo CD is configured to connect to. I struggle to see a more target-rich environment.

This seems like a straightforward fix to really improve the security posture.

@crenshaw-dev
Member

@alexec I strongly agree. I've added it to the agenda for the next security meeting, which is in 5 days. https://docs.google.com/document/d/1nl1adCRf4tD87Y01i3JcrLcHvtm1O5a7slqzNstct94/edit

@alexec
Contributor Author

alexec commented May 5, 2022

Argo CD needs to have a high bar for security: after the Kubernetes master, it's going to be the most attacked component.

@crenshaw-dev crenshaw-dev added the GraceHopperOSD2022 Good for 2022 Grace Hopper Celebration Open Source Day label Jun 13, 2022
@crenshaw-dev
Member

Adding this as a potential GHC bug. I think the argocd-application-controller version of this is pretty simple: create a new Dockerfile, build the controller without a distro, and then test it in a local deployment. Given enough time, the same base image change could be tested for the api-server.

@34fathombelow
Member

@crenshaw-dev Can you assign this to me? I currently have argocd-application-controller & argocd-applicationset-controller running distroless.

Do we want a separate image for each controller? Currently all the controllers, utils, and cli are compiled into one binary. It might make more sense to stick with one distroless image.

@crenshaw-dev
Member

@34fathombelow I think one distroless image is probably best.

@alexec alexec changed the title Provide to scratch/distroless base image Provide scratch/distroless base image Aug 11, 2022
@PeterBennink

Any update on this?

@crenshaw-dev crenshaw-dev removed the GraceHopperOSD2022 Good for 2022 Grace Hopper Celebration Open Source Day label Oct 7, 2022
@34fathombelow
Member

34fathombelow commented Oct 11, 2022

> Any update on this?

@PeterBennink I'm hoping to have this ready for Argo CD v2.6. I should have a PR and a proposal ready in the next few weeks.

@amouat

amouat commented Jan 9, 2023

@34fathombelow how's it going on this? Would you like some help? I work for Chainguard and we have some distroless images that I think would be a good fit: https://github.com/chainguard-images/images/tree/main/images/static

They are similar in nature to the Google distroless images but are easier to customise.

@34fathombelow
Member

@amouat Thank you for the inquiry regarding a distroless image. I am still working on this and doing some research. At the moment we still require a few additional binaries to be installed. I love the work that you guys do over at Chainguard, so I will definitely be evaluating all possible options. Apko, Melange, and Wolfi are on my radar.

@amouat

amouat commented Jan 20, 2023

Good to hear!

We'd be happy to work with you to create a base image with the binaries you need.

I'm adrian@chainguard.dev if you want to get in touch.

@fengshunli
Member

Should we consider using Alpine as the base image to reduce high-risk vulnerabilities? @crenshaw-dev @34fathombelow

@gczuczy
Contributor

gczuczy commented Feb 6, 2023

I would like to share a couple of thoughts here.

For creating a distroless image, there are a couple of showstoppers for now, as I've experienced.

The first is cp and a shell. Cp is needed for the initcontainers, and it's a really minor thing. If the need for it cannot be eliminated, then busybox has a statically linked cp.

The shell is worse. Every argocd service is started through an entrypoint.sh. The need for it could be eliminated by simply making tini a requirement and setting it as the image's entrypoint. Apart from that, the current entrypoint sets bash as its interpreter, which is not required: the function does not need any bash extensions, and a basic statically linked POSIX sh from busybox does the job just as well. Requiring bash only complicates things.

Argocd needs the /etc/ssl ca certs, which can be just copied over from alpine or something.

Regarding the plugin architecture, argocd depends on having a shell which pipes outputs to inputs between different commands. To have a distroless image, which does not include a shell, the plugin architecture needs to be able to work the plugin pipeline without a shell's pipelining operator.

All the tools (helm, kustomize and argocd) have to be statically linked, otherwise all the supporting libraries need to be present. This is not a huge deal; it's fairly easy to build them that way in the Dockerfile.
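
The point about running a plugin pipeline without a shell's pipe operator can be sketched as follows. This is only an illustration in Python, not Argo CD's plugin code: the function name `pipe_without_shell` and the two stand-in commands are assumptions made for the example. Both processes are started directly from argument lists and connected with a kernel pipe, so no `sh` or `bash` needs to exist in the image.

```python
import subprocess
import sys


def pipe_without_shell(producer_argv, consumer_argv):
    """Run the equivalent of `producer | consumer` without invoking a shell.

    Both commands are started directly from argv lists; the OS pipe between
    them takes the place of the shell's `|` operator.
    """
    producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_argv, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the read end so the producer receives SIGPIPE
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer.returncode != 0:
        raise RuntimeError(
            f"pipeline failed: producer={producer_rc}, consumer={consumer.returncode}"
        )
    return output


if __name__ == "__main__":
    # Hypothetical stand-ins for a manifest generator and a post-processing plugin.
    generate = [sys.executable, "-c", "print('kind: ConfigMap')"]
    transform = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(pipe_without_shell(generate, transform).decode(), end="")
```

The same pattern exists in most languages (in Go, for instance, by wiring one command's stdout to the next command's stdin), so the shell is a convenience here rather than a hard requirement.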

@crenshaw-dev
Member

Awesome analysis, @gczuczy!

> Regarding the plugin architecture, argocd depends on having a shell which pipes outputs to inputs between different commands.

Is that true of all plugins, or is it only true if the plugin author chooses to use a shell?

I think things will be simplified when we finally remove argocd-cm plugins, and all plugins run in sidecars (therefore with their own images).

I think it might make sense to start by moving the application controller and applicationset controller to distroless. Then we can tackle the repo-server.

@gczuczy
Contributor

gczuczy commented Feb 6, 2023

@crenshaw-dev Thank you. From a security point of view, deprecating the plugin section in the appdef and moving the functionality over to a sidecar also only moves the issue, and doesn't solve it: there's still a requirement for a shell, and the plugin-bearing sidecar has the same issue. I've been thinking about this one, but I don't really have any concrete idea about it.

The furthest I've got with this issue is that having the plugins in some kind of prioritized, chain-like system might solve it. For example, first you execute the helm or kustomize mechanics, grab the output, pipe it over to another plugin-bearing sidecar which does its thing, and repeat the process. This way different plugins could be chained arbitrarily, and you wouldn't need complex discovery rules in complex sidecars if you wish to use multiple plugins. However, I think the majority of the userbase is only using some kind of secret-handling plugin that replaces placeholders with values from various secret stores.

Also, some plugins can be run always, like placeholder-replacing ones. If there's nothing in the data to replace, then it's a simple noop effectively (though expensive cpu-wise). Pipeline-starting functionality like helm or kustomize has to come first, but argocd can pretty much detect those itself.

That's just my tuppence on plugins. I don't even know how viable these thoughts are, as I don't know the internal mechanics that well.
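
The chained-plugin idea above could look roughly like this. It is a sketch under stated assumptions, not a proposal for Argo CD's actual interfaces: the `Plugin` type, `run_chain`, and the `<secret:NAME>` placeholder syntax are all made up for illustration. It also shows why a placeholder-replacing plugin can always run: on input with nothing to replace it is effectively a no-op.

```python
import re
from typing import Callable, Dict, List

# A plugin takes rendered manifest text and returns transformed manifest text.
Plugin = Callable[[str], str]


def make_placeholder_plugin(secrets: Dict[str, str]) -> Plugin:
    """Build a plugin that replaces <secret:NAME> placeholders (hypothetical syntax)."""
    pattern = re.compile(r"<secret:([A-Za-z0-9_]+)>")

    def plugin(manifest: str) -> str:
        # Unknown names are left untouched; input without placeholders
        # passes through unchanged.
        return pattern.sub(lambda m: secrets.get(m.group(1), m.group(0)), manifest)

    return plugin


def run_chain(generated: str, plugins: List[Plugin]) -> str:
    """Feed the generator's output (e.g. from helm or kustomize) through each plugin in order."""
    for plugin in plugins:
        generated = plugin(generated)
    return generated


if __name__ == "__main__":
    chain = [make_placeholder_plugin({"db_password": "s3cr3t"})]
    print(run_chain("password: <secret:db_password>", chain))
    print(run_chain("replicas: 2", chain))
```

In a sidecar-based design each element of the chain would be a separate container rather than an in-process function, but the ordering logic stays the same: the generator runs first, then every transformer in priority order.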

And one more note on going distroless: gpg and similar functionality. I'm not sure how easy it is to provide a statically linked gpg environment. At first glance, it's way more complicated than tini (which already has a predefined target in the CMakefile) or golang projects (where it's just an envvar and a linker flag). If there's a Go package that can do the gpg operations, I think that would solve pretty much all the hassle around the dynamically linked library, and would simplify the image a lot.

@crenshaw-dev
Member

> From a security point of view, deprecating the plugin section in the appdef and moving the functionality over to a sidecar also only moves the issue, and doesn't solve it: there's still a requirement for a shell, and the plugin-bearing sidecar has the same issue.

I think it's a bit better than just moving the issue. Having a shell on the sidecar is better than having it on the repo-server for a few reasons:

  1. the sidecar doesn't keep a static cache of all configured repositories, so there's less sensitive info to leak
  2. there's less code running on the sidecar, so less stuff to break
  3. the plugin author knows their threat environment better than we do, so they can craft the image in a way that fits their use case
  4. they may not even need a shell - their plugin can directly invoke commands (e.g. a go app) rather than use a shell

But point taken, getting to a place where a shell is completely unnecessary would be ideal.

@gczuczy
Contributor

gczuczy commented Feb 6, 2023

@crenshaw-dev Yes, that's completely true, and the severity in the two environments is on a different level. I was focusing on an ideal, final goal. And I completely agree, removing the dependency on a shell and related environments is a very welcome first step. I apologize for the misunderstanding.

@crenshaw-dev
Member

You're good! Very fair to note the problems that distroless on the repo-server won't solve, because those should still be considered. :-)

@amouat

amouat commented Feb 9, 2023

Just for some context on the chainguard distroless images, it's easy for us to add a shell and any utilities you need, without pulling in any more than that. Would it be useful if I put together an image with bash for you?

@34fathombelow
Member

@amouat Thank you so much for following up. I spent a bit of time looking into Chainguard images. I had a quick question regarding the architectures you support. Because we require a few additional binaries, we would need support for linux/s390x and linux/ppc64le. Is that possible with Chainguard images?

@crenshaw-dev
Member

> it's easy for us to add a shell

@amouat I think our only needs for a shell are 1) entrypoint.sh and 2) plugins.

We're going to move plugins out of our images, so there will be no need for a shell then.

I believe @gczuczy indicated we can use tini without a shell.

So hopefully no shell. :-)

@crenshaw-dev
Member

That's fair. I know nothing about statically building binaries, so I'll trust other folks' wisdom on whether it's a good solution for us. Sounds like "yes."

@jannfis
Member

jannfis commented Jun 16, 2023

> If you can solve with statically compiled binaries, you may get secure and compact image, without the headache of partially implemented/buggy Go libraries.

You get problems with statically compiled binaries, too. For example, if a binary links a given version of the OpenSSL libraries statically, you lose track of this information (i.e. the version of OpenSSL this particular binary was linked against, or even that this binary includes OpenSSL). So when there's a vulnerability in one of the libraries that you statically linked into your binary, your image scanners won't trigger. You will have to set up monitoring for these elsewhere. It's an often overlooked risk.

@jannfis
Member

jannfis commented Jun 16, 2023

IMHO, distroless + statically compiled binaries may be worse from a security perspective.

Usually, the tools that tell you whether your image has issues use some kind of information available from the image. For example, how does a scanner find out what version of a particular tool is available in your image? I would assume most of them look into whatever package manager that image is using (apt, rpm, apk, etc).

So, for example, when you have a vulnerable Git version in your image that has been installed using the package manager, the scanner will easily figure out this fact and trigger an alert. The same goes for any dependencies of Git, in this particular example.

distroless doesn't necessarily reduce security risks. It reduces attack vectors, by leaving out what's considered bloat. distroless with only your particular workload in it, that's great. And that's its intent, I would say.

But once you're pulling in something else, for example, statically linked stuff that your workload needs, and that comes from elsewhere, you get a false sense of security. No scanner will trigger, but you will still have vulnerabilities in your image. They just go under the radar.

@crenshaw-dev
Member

crenshaw-dev commented Jun 16, 2023

That's fair. On the other hand, Argo CD, especially the repo-server, runs a lot of code based on a large variety of user inputs. I worry a lot more about someone using stuff on the Argo CD image to do bad things than I do about missing an out-of-date binary (especially when the list of binaries is short, intuitive, and visible: e.g. git, kustomize, helm - and can be added to our SBOMs relatively easily).

@jannfis
Member

jannfis commented Jun 16, 2023

Is our SBOM generated for all layers of our image build process, or only the final image?

Because with statically compiled binaries, very important information gets lost in the final image. And I'm not talking about the version of the tool that ends up in the final image, but the dependencies that are now baked in, instead of being just referenced.

So let's take git as an example: if git is visible in the SBOM, that's great. But if its dependencies, such as libopenssl and libcurl (which are now directly baked into the binary and not available anywhere else in the final image), are not in the SBOM, that's bad. When there is a vulnerability in either of those two libraries, this information won't be surfaced. That's the risk I see, and we should consider it when going forward with distroless and static binaries.
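
A toy model may make this risk concrete. It assumes a scanner that only compares a package manifest (or SBOM) against an advisory list; the `scan` function, package names, versions, and the advisory entry are all invented for illustration and do not describe any real scanner or real vulnerability.

```python
from typing import Dict, List, Set


def scan(manifest: Dict[str, str], advisories: Dict[str, Set[str]]) -> List[str]:
    """Return 'name@version' for every manifest entry with a known-vulnerable version."""
    return sorted(
        f"{name}@{version}"
        for name, version in manifest.items()
        if version in advisories.get(name, set())
    )


if __name__ == "__main__":
    # Made-up advisory: pretend openssl 3.0.1 is vulnerable.
    advisories = {"openssl": {"3.0.1"}}

    # Dynamically linked image: openssl is its own package, so it is listed.
    dynamic_image = {"git": "2.40.0", "openssl": "3.0.1"}

    # Static build without build-time metadata: openssl lives inside the git
    # binary and never appears in the manifest.
    static_image = {"git": "2.40.0"}

    # Static build whose manifest records embedded dependencies at build time.
    static_image_with_build_sbom = {"git": "2.40.0", "openssl": "3.0.1"}

    print(scan(dynamic_image, advisories))
    print(scan(static_image, advisories))
    print(scan(static_image_with_build_sbom, advisories))
```

In this model the second image contains the same vulnerable code as the first but produces no finding, which is the "under the radar" case described above. Only a manifest written at build time brings the embedded library back into view.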

@crenshaw-dev
Member

We use bom on the final image, so I kinda doubt it has insight into all build layers.

@dlorenc
Contributor

dlorenc commented Jun 16, 2023

Another shameless plug for the Wolfi-based images: we generate the SBOMs as part of the build, and also fill in the package manager manifest accurately so scanners can find everything.

We already have a lot of the argocd stuff packaged, we should be able to get a POC together here soon to show what it would actually look like in practice.

cc @eddiezane

@tuananh
Contributor

tuananh commented Jun 16, 2023

That's the major difference between distroless and Wolfi: one generates the SBOM as part of the build because it builds everything from source, whereas the current approach tries to extract an SBOM from the built container image.

An SBOM generated from source is of higher quality.

@crenshaw-dev
Member

Sounds like a Wolfi image might be pretty straightforward. The hardest parts would be setting up CI to build and push it, and writing docs to let folks know it's an experimental option.

@jannfis
Member

jannfis commented Jun 16, 2023

> we generate the SBOMs as part of the build, and also fill up the package manager manifest accurately so scanners can find everything.

That sounds truly awesome. I'm all for a PoC of this.

@tuananh
Contributor

tuananh commented Jun 18, 2023

> > we generate the SBOMs as part of the build, and also fill up the package manager manifest accurately so scanners can find everything.
>
> That sounds truly awesome. I'm all for a PoC of this.

I'll work on a PR, unless @dlorenc has something ready to submit already :)

@dlorenc I think the argo-cd team would prefer to have the packaging done here so they have complete control over how it's built? (melange pkg + apko)

@crenshaw-dev
Member

@tuananh yep, I think that would be preferable!

@pre

pre commented Oct 20, 2023

Has there been any traction with the distroless image lately?

I'm especially after an argocd base image that has the dependency on BerkeleyDB completely removed for licensing reasons: #11305

@amouat

amouat commented Oct 20, 2023

I'm definitely happy to help out with moving to a Wolfi/Chainguard image, but I'm pretty flat out until after KubeCon.

@crenshaw-dev
Member

@amouat that would be great! I think we'd want to build the Wolfi image in parallel to the Ubuntu image so that people can opt for whichever image they prefer.

@xavier83

Is there already a PR for this that we can track? 🤔

@amouat

amouat commented Nov 28, 2023

Looking at this now, should have something to share shortly @xavier83

@amouat

amouat commented Nov 29, 2023

I wrote something up at #16481.

It's not a distroless build, but it is considerably smaller than the current build. I think it may make a good middle ground to begin with.

A distroless build is possible, but it's difficult with Dockerfiles (there was already a rejected proposal using apko https://github.com/tuananh/distroless-argocd). The Helm and Kustomize dependencies are easy as they are already statically compiled and can be just copied over. git and gpg are more difficult. There are also two wrapper scripts that would need to be investigated to see if we could remove them. (Taking a quick look I think the scripts could be removed by just calling git/gpg directly from Go, which would probably be an improvement).

This isn't production ready, I was hoping to get some thoughts/feedback before going further.

@pre

pre commented Dec 12, 2023

If a PR falls in the forest, is there anyone to share thoughts/feedback?

I'd love to see a distroless build of ArgoCD happen!

We are currently jumping through hoops to get the BerkeleyDB dependency removed from the ArgoCD image: #11305

@sahu-apoorva

Hi @amouat @crenshaw-dev, is anyone else working on this issue for IBM architectures like ppc64le? Please also let us know how we can help.

@amouat

amouat commented Jan 11, 2024

@sahu-apoorva most Chainguard images are only built for aarch64 and x86_64. That being said, the "static" image does have a ppc64le build. The above PR uses the wolfi-base image, which doesn't support ppc64le, as a halfway house to a completely distroless build. Supporting ppc64le would mean either building wolfi-base for ppc64le or completing the move to a completely distroless image on top of static (and cross-compiling argo & related tooling for ppc64le).

@sahu-apoorva

@amouat @crenshaw-dev
I tried to create a wolfi-base image for ppc64le but encountered an error. I have started a discussion about it, with the error details, at #17246.
Could you kindly provide some advice there?

@agilgur5
Contributor

> Argo Workflows uses mainly distroless images for security. It was migrated to go-git and that was both costly and caused many bugs.

Current Workflows maintainer here 👋 Coming from a SIG Security meeting earlier today.
Alex C's time as Lead and this migration both predate me, but yes, the go-git changes probably had, and still have, the most regressions of any change.
The main changes to distroless and removing binaries occurred in argoproj/argo-workflows#8806 and argoproj/argo-workflows#8292.
go-git also recently became maintained again, so we switched from Argo's fork to upstream in argoproj/argo-workflows#12515, which did cause some regressions. Switching back to the git binary or something else has been proposed more than once as well: argoproj/argo-workflows#10091

go-git maintains a compatibility doc and has fallbacks to the git binary in some cases.

Notably, Workflows has a substantially more limited use of git than CD as it's only used for certain artifacts. Many Workflows users never even use git artifacts or artifacts at all (including me as a user). It's also ripe for being ripped out of Workflows core and into an artifact plugin, where users could use either go-git or the git binary or both or others, and any CVEs therein would not affect the core.
Comparatively, git is pretty foundational to CD; at least to the repo server.

> I'm in favor of parallel, experimental builds of both -distroless and -wolfi. My guess is that some stuff will be broken in -wolfi and more stuff will be broken in -distroless, and we'll slowly fix problems in both.

💯 I think this is an optimal solution. Ideally, once -distroless is stable and well used, you could make that the default build and entirely remove/replace the ubuntu build.

> Do we want a separate image for each controller? Currently all the controllers, utils, and cli are compiled into one binary. It might make more sense to stick with one distroless image.

As a separate, but related goal, I might recommend splitting up the binaries and images as well to reduce the attack surface. They do not all have the same deps etc. Workflows has 3 images, workflow-controller, argoexec, and argocli (the CLI includes the Server, which can run locally). More concretely, argoexec is the main one that needs git functionality for artifacts.

> Usually, the tools that tell you whether your image has issues use some kind of information available from the image. For example, how does a scanner find out what version of a particular tool is available in your image? I would assume most of them look into whatever package manager that image is using (apt, rpm, apk, etc).

To clarify on this topic, all scanners that I know of do this at minimum. But some naive scanners do only this (personally, I would not recommend such scanners). Such naive scanners are generally incompatible with most distroless images already, since they lack a package manager. More intelligent scanners will scan the whole image's FS for binaries, hashes, etc.
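
As a rough sketch of what "scan the whole image's FS" can mean, the snippet below walks an unpacked image root and compares every file's SHA-256 against a table of known-bad digests. It is a deliberately simplified assumption of how such a scanner might work: `scan_tree`, the `known_bad` table, and the advisory label are hypothetical, and real scanners use richer signals than exact hashes.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root: str, known_bad: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (relative path, advisory label) for files whose digest is in known_bad."""
    findings = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            if os.path.islink(path):
                continue  # symlinks point at files that are hashed on their own
            label = known_bad.get(sha256_of(path))
            if label is not None:
                findings.append((os.path.relpath(path, root), label))
    return sorted(findings)


if __name__ == "__main__":
    # Build a tiny fake image root; no package database is involved at all.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "usr", "bin"))
        with open(os.path.join(root, "usr", "bin", "tool"), "wb") as handle:
            handle.write(b"pretend binary")
        known_bad = {hashlib.sha256(b"pretend binary").hexdigest(): "example-advisory"}
        print(scan_tree(root, known_bad))
```

Because this approach never consults a package manager, it works on distroless images too, though exact-hash matching only catches binaries that someone has already catalogued.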

@amouat

amouat commented May 29, 2024

> To clarify on this topic, all scanners that I know of do this at minimum. But some naive scanners do only this (personally, I would not recommend such scanners). Such naive scanners are generally incompatible with most distroless images already, since they lack a package manager. More intelligent scanners will scan the whole image's FS for binaries, hashes, etc.

Wolfi images do have an apk package database (but no package manager) by default. This means they work with the majority of scanners (even "naive" ones ;) ).
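
To illustrate why a package database is enough even without a package manager, here is a small sketch of how a scanner might read it. It assumes the usual apk installed-database layout (typically found at `/lib/apk/db/installed`), where blank-line-separated records carry a `P:` line for the package name and a `V:` line for the version. The sample records and versions are made up, and `parse_apk_installed` is a hypothetical helper, not part of any real scanner.

```python
from typing import Dict

# Made-up sample in the shape of an apk installed database.
SAMPLE = """\
P:git
V:2.40.0-r0

P:openssl
V:3.1.0-r0
"""


def parse_apk_installed(text: str) -> Dict[str, str]:
    """Map package name to version from apk installed-database text."""
    packages: Dict[str, str] = {}
    for record in text.strip().split("\n\n"):
        fields: Dict[str, str] = {}
        for line in record.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value
        # Only records with both a name and a version are usable.
        if "P" in fields and "V" in fields:
            packages[fields["P"]] = fields["V"]
    return packages


if __name__ == "__main__":
    for name, version in sorted(parse_apk_installed(SAMPLE).items()):
        print(f"{name} {version}")
```

A scanner needs nothing beyond this file to list what is installed, which is why shipping the database without the `apk` binary keeps images scannable while leaving out the tool itself.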

@tooptoop4

🙊

@pre

pre commented Oct 19, 2024

One does not simply provide a distroless base image, it seems.

We are also post-processing the argocd base image to remove all Oracle-licensed BerkeleyDB (libdb) packages that come from Ubuntu, because their license is incompatible with our enterprise context.
