
AWS ECR registry authentication only works in the same/default region as caller #1026

Closed
msw-kialo opened this issue May 28, 2021 · 22 comments · Fixed by #6217
Labels
  • good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@msw-kialo
Contributor

Description

Trivy is deployed within an AWS EKS Kubernetes pod. The pod has an associated AWS role that allows it to read ECR images. Trivy is executed to scan:

  • Private AWS ECR images
  • Public AWS ECR images in the same region
  • Public AWS ECR images in other regions

What did you expect to happen?

Trivy is able to scan all three image types, and handles authentication transparently for each of them. On failure, it also tries an available docker-credential-ecr-login helper.

What happened instead?

Trivy is unable to scan (public) ECR images in any AWS region other than the region of the EKS cluster.

The automatic AWS authentication handler injects ECR authentication tokens from the current (default) region, not the region of the ECR image itself.
Having the docker-credential-ecr-login helper available and configured does not mitigate the authentication issue: Trivy never uses the helper, although it would return valid credentials. I know of no configuration option that forces Trivy to fall back to the docker-credential-ecr-login helper.

In our case, we start Trivy in an EKS cluster in us-east-1. It is able to scan private and public ECR images in us-east-1. Public images in other regions, such as eu-central-1, fail (I have not yet been able to test this with private ECR images).

Output of a run with --debug:

/ # trivy --debug image --skip-update --severity CRITICAL 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1
2021-05-27T14:20:50.859Z        DEBUG   Severities: CRITICAL
2021-05-27T14:20:50.860Z        DEBUG   cache dir:  /root/.cache/trivy
2021-05-27T14:20:50.861Z        DEBUG   DB Schema: 1, Type: 1, UpdatedAt: 2021-05-27 12:13:39.579564022 +0000 UTC, NextUpdate: 2021-05-28 00:13:39.579563622 +0000 UTC, DownloadedAt: 2021-05-27 13:37:54.854680347 +0000 UTC
2021-05-27T14:20:50.861Z        DEBUG   Vulnerability type:  [os library]
2021-05-27T14:20:51.204Z        DEBUG   Image ID: sha256:e3adaca0b74ac9810f91d8ae39d80681772f4a1338201e498fb4abdd0b80e5f3
2021-05-27T14:20:51.204Z        DEBUG   Diff IDs: [sha256:32e8e94d13e789ff20ec4ea815f971bbcc3ce52955c6ac8b7ce25546ad978024 sha256:b0bc0874083cf0b4db645481dcd8d4b5fa0ab02295676c19d8a4f3960eaf711e]
2021-05-27T14:20:51.205Z        DEBUG   OS is not detected and vulnerabilities in OS packages are not detected.
2021-05-27T14:20:51.205Z        INFO    Detected OS: unknown
2021-05-27T14:20:51.205Z        INFO    Number of PL dependency files: 1
2021-05-27T14:20:51.205Z        INFO    Detecting gobinary vulnerabilities...
2021-05-27T14:20:51.206Z        DEBUG   Detecting library vulnerabilities, type: gobinary, path: coredns

coredns
=======
Total: 0 (CRITICAL: 0)

/ # trivy --debug image --skip-update --severity CRITICAL 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1
2021-05-27T14:21:04.191Z        DEBUG   Severities: CRITICAL
2021-05-27T14:21:04.192Z        DEBUG   cache dir:  /root/.cache/trivy
2021-05-27T14:21:04.192Z        DEBUG   DB Schema: 1, Type: 1, UpdatedAt: 2021-05-27 12:13:39.579564022 +0000 UTC, NextUpdate: 2021-05-28 00:13:39.579563622 +0000 UTC, DownloadedAt: 2021-05-27 13:37:54.854680347 +0000 UTC
2021-05-27T14:21:04.192Z        DEBUG   Vulnerability type:  [os library]
2021-05-27T14:21:04.903Z        FATAL   scan error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.runWithTimeout
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:67
  - unable to initialize a scanner:
    github.com/aquasecurity/trivy/pkg/commands/artifact.scan
        /home/runner/work/trivy/trivy/pkg/commands/artifact/run.go:157
  - unable to initialize a docker scanner:
    github.com/aquasecurity/trivy/pkg/commands/artifact.dockerScanner
        /home/runner/work/trivy/trivy/pkg/commands/artifact/image.go:29
  - 3 errors occurred:
        * unable to inspect the image (602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
        * unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
        * GET https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/coredns/manifests/v1.7.0-eksbuild.1: DENIED: Your Authorization Token is invalid.

Output of trivy -v:

Version: 0.18.3

The issue can also be reproduced with version 0.16.0.

Additional details (base image name, container registry info...):

Reproducing the same error message with the AWS CLI and curl

The error message Your Authorization Token is invalid. can be reproduced by providing an AWS authorization token from a different region:

/ # AWS_TOKEN_EU_CENTRAL_1=$(aws --region=eu-central-1 ecr get-authorization-token --output text --query 'authorizationData[].authorizationToken')
/ # curl -i -H "Authorization: Basic $AWS_TOKEN_EU_CENTRAL_1" https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/coredns/manifests/v1.7.0-eksbuild.1
HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.list.v2+json
Docker-Distribution-Api-Version: registry/2.0
Date: Thu, 27 May 2021 15:12:48 GMT
Content-Length: 741

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 739,
         "digest": "sha256:0e681df097589fa1cdd6c09bf140d9dd9ade86f2cadca06fa6a5008ac7da1cd2",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 739,
         "digest": "sha256:3174688c1f4bcc94c786c22e2b3c7cc8e171b1b3997f8ddc777152766618ef6e",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}
/ # AWS_TOKEN_US_EAST_1=$(aws --region=us-east-1 ecr get-authorization-token --output text --query 'authorizationData[].authorizationToken')
/ # curl -i -H "Authorization: Basic $AWS_TOKEN_US_EAST_1" https://602401143452.dkr.ecr.eu-central-1.amazonaws.com/v2/eks/coredns/manifests/v1.7.0-eksbuild.1
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
Date: Thu, 27 May 2021 15:12:59 GMT
Content-Length: 80

{"errors":[{"code":"DENIED","message":"Your Authorization Token is invalid."}]}

Authentication via docker-credential-ecr-login would work

Previously, the pod assumed its role via an older kube2iam version (0.10.x). Apparently, the AWS API emulated by kube2iam was not enough to convince Trivy that it was running in an AWS context, so Trivy did not handle authentication itself. Instead, the available docker-credential-ecr-login helper was used successfully for all image types:

/ # docker-credential-ecr-login list
{}

/ # trivy --debug image --skip-update --severity CRITICAL 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1
2021-05-27T15:27:35.912Z        DEBUG   Severities: CRITICAL
2021-05-27T15:27:35.913Z        DEBUG   cache dir:  /root/.cache/trivy
2021-05-27T15:27:35.913Z        DEBUG   DB Schema: 1, Type: 1, UpdatedAt: 2021-05-27 12:13:39.579564022 +0000 UTC, NextUpdate: 2021-05-28 00:13:39.579563622 +0000 UTC, DownloadedAt: 2021-05-27 13:37:50.414451 +0000 UTC
2021-05-27T15:27:35.913Z        DEBUG   Vulnerability type:  [os library]
2021-05-27T15:27:36.381Z        DEBUG   Image ID: sha256:e3adaca0b74ac9810f91d8ae39d80681772f4a1338201e498fb4abdd0b80e5f3
2021-05-27T15:27:36.381Z        DEBUG   Diff IDs: [sha256:32e8e94d13e789ff20ec4ea815f971bbcc3ce52955c6ac8b7ce25546ad978024 sha256:b0bc0874083cf0b4db645481dcd8d4b5fa0ab02295676c19d8a4f3960eaf711e]
2021-05-27T15:27:36.382Z        DEBUG   OS is not detected and vulnerabilities in OS packages are not detected.
2021-05-27T15:27:36.382Z        INFO    Detected OS: unknown
2021-05-27T15:27:36.382Z        INFO    Number of PL dependency files: 1
2021-05-27T15:27:36.382Z        INFO    Detecting gobinary vulnerabilities...
2021-05-27T15:27:36.382Z        DEBUG   Detecting library vulnerabilities, type: gobinary, path: coredns

coredns
=======
Total: 0 (CRITICAL: 0)

/ # docker-credential-ecr-login list
{"https://602401143452.dkr.ecr.us-east-1.amazonaws.com":"AWS"}

/ # trivy --debug image --skip-update --severity CRITICAL 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1
2021-05-27T15:27:04.407Z        DEBUG   Severities: CRITICAL
2021-05-27T15:27:04.408Z        DEBUG   cache dir:  /root/.cache/trivy
2021-05-27T15:27:04.408Z        DEBUG   DB Schema: 1, Type: 1, UpdatedAt: 2021-05-27 12:13:39.579564022 +0000 UTC, NextUpdate: 2021-05-28 00:13:39.579563622 +0000 UTC, DownloadedAt: 2021-05-27 13:37:50.414451 +0000 UTC
2021-05-27T15:27:04.408Z        DEBUG   Vulnerability type:  [os library]
2021-05-27T15:27:06.164Z        DEBUG   Image ID: sha256:e3adaca0b74ac9810f91d8ae39d80681772f4a1338201e498fb4abdd0b80e5f3
2021-05-27T15:27:06.164Z        DEBUG   Diff IDs: [sha256:32e8e94d13e789ff20ec4ea815f971bbcc3ce52955c6ac8b7ce25546ad978024 sha256:b0bc0874083cf0b4db645481dcd8d4b5fa0ab02295676c19d8a4f3960eaf711e]
2021-05-27T15:27:06.165Z        DEBUG   OS is not detected and vulnerabilities in OS packages are not detected.
2021-05-27T15:27:06.165Z        INFO    Detected OS: unknown
2021-05-27T15:27:06.165Z        INFO    Number of PL dependency files: 1
2021-05-27T15:27:06.165Z        INFO    Detecting gobinary vulnerabilities...
2021-05-27T15:27:06.165Z        DEBUG   Detecting library vulnerabilities, type: gobinary, path: coredns

coredns
=======
Total: 0 (CRITICAL: 0)

/ # docker-credential-ecr-login list
{"https://602401143452.dkr.ecr.eu-central-1.amazonaws.com":"AWS","https://602401143452.dkr.ecr.us-east-1.amazonaws.com":"AWS"}
@msw-kialo added the kind/bug label May 28, 2021
@knqyf263
Collaborator

knqyf263 commented Jun 1, 2021

If you set AWS_DEFAULT_REGION=eu-central-1, does it work?

@icecream-monster

Yes, @knqyf263.
I opened the same issue (#1034) before realizing someone had already opened this one; I will close mine now. Thanks for the tip!

Could you provide some pointers on how Trivy reads ~/.aws/config? I would like to switch back and forth between a few regions, and it would be nice to understand that rather than changing region=<region> under [default] every time, if possible.

Thanks!

@msw-kialo
Contributor Author

@knqyf263 Setting AWS_REGION / AWS_DEFAULT_REGION to the region of the image location resolves the issue.

I discovered that EKS clusters have a pod-identity-webhook configured to inject the environment variables needed to assume roles via OIDC / web identity. It also injects AWS_REGION and AWS_DEFAULT_REGION based on the cluster location if they are not specified.
If both variables are unset or blank, Trivy correctly auto-selects the region based on the image location.
Creating the pod with explicit blank values like

    env:
    - name: AWS_DEFAULT_REGION
      value: ""
    - name: AWS_REGION
      value: ""

prevents the webhook from injecting the region environment variables, and Trivy's auto-detection then works as expected. I wasn't able to resolve the issue by creating a ~/.aws/config file.

But this still feels like a workaround: running Trivy in an EKS cluster to scan ECR images seems like a common use case to me, and the ECR integration page of the documentation reads as if this should work out of the box.

Is it possible to override the region directly in Trivy? Otherwise, this pitfall seems worth documenting.

@github-actions

github-actions bot commented Aug 1, 2021

This issue is stale because it has been labeled with inactivity.

@github-actions bot added the lifecycle/stale label Aug 1, 2021
@msw-kialo
Contributor Author

I have a workaround in place (blank AWS_REGION and AWS_DEFAULT_REGION), but would prefer a proper solution in Trivy if reasonable.

@github-actions bot removed the lifecycle/stale label Aug 10, 2021
@github-actions

This issue is stale because it has been labeled with inactivity.

@github-actions bot added the lifecycle/stale label Oct 10, 2021
@msw-kialo
Contributor Author

The situation hasn't changed on my side, but I would appreciate a statement regarding options to improve the situation for future users.

@github-actions bot removed the lifecycle/stale label Oct 22, 2021
@github-actions

This issue is stale because it has been labeled with inactivity.

@github-actions bot added the lifecycle/stale label Dec 21, 2021
@msw-kialo
Contributor Author

Status remains unchanged.

@github-actions bot removed the lifecycle/stale label Dec 22, 2021
@jananathbanuka

Still the same issue

@sinner-

sinner- commented Jul 20, 2022

I ran into this issue in version 0.28.0 (I couldn't use 0.30.0 due to #2349) while troubleshooting both trivy image and trivy k8s scans of images stored in a private ECR repository.

This is just running Trivy on an EC2 instance with access to the EKS cluster API, not running it as a pod or operator. The instance has an instance profile with an associated role that allows access to EKS and ECR.

Using trivy image without AWS_REGION set results in the same errors mentioned in the original report.

Using trivy k8s without AWS_REGION set results in the scan completing without any error, but many images are silently skipped: there is no warning, error, or other output indicating that they could not be scanned.

The workaround suggested by @msw-kialo (setting AWS_REGION or AWS_DEFAULT_REGION to empty strings) did not seem to work for me; I could not trigger the auto-selection of the region.

However I was able to get it working by installing docker, starting docker daemon, adding the user to the docker group, and installing/enabling config for docker-credential-ecr-login.

It would be great if trivy k8s provided some output on images it was unable to scan because right now it gives the false impression that it didn't find any vulnerabilities.

@github-actions

This issue is stale because it has been labeled with inactivity.

@github-actions bot added the lifecycle/stale label May 23, 2023
@msw-kialo
Contributor Author

While we have a workaround in place, a proper resolution would be great (I am reluctant to try fixing it myself as I am not familiar with the code base).

@github-actions bot removed the lifecycle/stale label May 24, 2023
@knqyf263
Collaborator

@msw-kialo We've added --aws-region to VM scanning. Do you think adding the same option to image scanning would help?
#3284

@sinner-

sinner- commented May 24, 2023

@msw-kialo did you try

However I was able to get it working by installing docker, starting docker daemon, adding the user to the docker group, and installing/enabling config for docker-credential-ecr-login.

@msw-kialo
Contributor Author

@sinner- I didn't try adding a Docker daemon. IMO it is also a workaround, and it requires much more maintenance attention and additional permissions (at least if you run in a Kubernetes cluster). Unsetting a few environment variables is easier.

@knqyf263 Based on the PR, the --aws-region parameter is specific to VM scanning (we scan container images). However, the PR shows that it might be feasible to extract an aws-region from the image reference that is about to be scanned and use that region for authentication 🤔

@knqyf263
Collaborator

Based on the PR, the --aws-region parameter is specific to VM scanning (we scan container images).

Yes, I'm asking whether you want the option for image scanning. The CLI flag could override environment variables injected by the webhook.

@msw-kialo
Contributor Author

Sorry (apparently I skipped over the second sentence).

Yes, such an option would be good to have:

  1. It would allow users to resolve such permission issues without digging into injected environment variables. In some cases it might be difficult to change that.
  2. The corresponding parameter description could help users understand such issues without internet searches; if there is a region parameter, it might be necessary to provide the correct value.

The optimal solution would auto-detect the region based on the image registry URL. If it is an AWS-managed registry (and only for those does this parameter make sense), the full registry URL already contains the correct region, e.g.
602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1 (us-east-1) and 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1 (eu-central-1).
With such auto-detection in place, the CLI parameter might no longer be needed (providing a different value would almost always result in a permission issue). That is the difference from EBS / AMI references, which don't include the region.
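A minimal Go sketch of the proposed auto-detection, assuming the standard private ECR host pattern <account>.dkr.ecr.<region>.amazonaws.com. The function name ecrRegion is hypothetical. Hosts that don't match (other registries, or host variants such as the amazonaws.com.cn domain used in the China regions) yield an empty string, so a caller can fall back to the default region resolution.

```go
package main

import (
	"fmt"
	"strings"
)

// ecrRegion (hypothetical helper) extracts the AWS region from an image
// reference whose registry host matches <account>.dkr.ecr.<region>.amazonaws.com.
// It returns "" for any other host so the caller can fall back to a blank
// region and let the default resolution decide.
func ecrRegion(imageRef string) string {
	// The registry host is everything before the first "/".
	host, _, _ := strings.Cut(imageRef, "/")
	parts := strings.Split(host, ".")
	// Expected layout: [account, "dkr", "ecr", region, "amazonaws", "com"].
	if len(parts) != 6 || parts[1] != "dkr" || parts[2] != "ecr" ||
		parts[4] != "amazonaws" || parts[5] != "com" {
		return ""
	}
	return parts[3]
}

func main() {
	for _, ref := range []string{
		"602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1",
		"602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1",
		"docker.io/library/alpine:3.19",
	} {
		fmt.Printf("%s -> %q\n", ref, ecrRegion(ref))
	}
}
```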

@knqyf263
Collaborator

knqyf263 commented May 24, 2023

OK, it seems like the region is always included. We can take it and fall back to a blank region when it cannot be detected.
https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html

That should be easy to do. Here is the ECR logic:
https://github.com/aquasecurity/trivy/blob/50fe43f14c655c5e7bca45f0a0e4bd92891487d9/pkg/fanal/image/registry/ecr/ecr.go

@knqyf263 added the help wanted, good first issue, and priority/backlog labels May 24, 2023
@msw-kialo
Contributor Author

Apparently, this issue has been fixed in the meantime (either in Trivy itself or in the AWS libraries it uses).
I tested Trivy 0.48.1 without our workaround (which ensures Trivy is called without AWS_REGION and AWS_DEFAULT_REGION): even with AWS_REGION and AWS_DEFAULT_REGION set to us-east-1, Trivy was able to scan images in other regions such as eu-central-1 successfully.

@knrc
Contributor

knrc commented Feb 26, 2024

This issue still exists with the latest version.

Setting AWS_REGION and AWS_DEFAULT_REGION to "" fails for images in multiple regions:

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.us-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.us-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

2024-02-26T18:56:03.738Z	ERROR	Error during vulnerabilities or misconfiguration scan: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (127647282379.dkr.ecr.sa-east-1.amazonaws.com/undistro-test-image:1.25.3): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://127647282379.dkr.ecr.sa-east-1.amazonaws.com/v2/undistro-test-image/manifests/1.25.3: unexpected status code 401 Unauthorized: Not Authorized

By default it uses the EKS region, and overriding it allows us to access an alternative region.

Scanning images from multiple regions requires a code change; I have something working and will submit a PR shortly.

@achantavy

I haven't had a chance to upgrade yet, but I'm seeing the issue in 0.48.0. Thought I'd share a workaround:

I had one account where Trivy was only able to scan images in us-east-1, but not in eu-west-3 and several other regions. In the other accounts, Trivy ran just fine; I'm not sure why.

I used a workaround by setting AWS_DEFAULT_REGION and AWS_REGION to the region of the repo immediately before each image scan and it works for all my accounts. I tried the workaround mentioned above where those env vars were set to the empty string but that didn't work for me.
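The per-scan workaround can be scripted, for example in Go. This sketch assumes the caller already knows each repository's region (it could be taken from the registry host); withRegion and scanCmd are hypothetical helper names, and only the trivy image <image> invocation comes from this thread. The command is built but not started, so call Run() on it to perform the scan.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withRegion (hypothetical helper) returns a copy of env in which AWS_REGION
// and AWS_DEFAULT_REGION are both set to region, replacing any existing values
// such as those injected by the EKS webhook.
func withRegion(env []string, region string) []string {
	out := make([]string, 0, len(env)+2)
	for _, kv := range env {
		if strings.HasPrefix(kv, "AWS_REGION=") || strings.HasPrefix(kv, "AWS_DEFAULT_REGION=") {
			continue // drop the old value; it is re-added below
		}
		out = append(out, kv)
	}
	return append(out, "AWS_REGION="+region, "AWS_DEFAULT_REGION="+region)
}

// scanCmd (hypothetical helper) builds, but does not run, a `trivy image`
// invocation whose environment carries the region of the scanned repository.
func scanCmd(image, region string) *exec.Cmd {
	cmd := exec.Command("trivy", "image", image)
	cmd.Env = withRegion(os.Environ(), region)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := scanCmd("602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1", "eu-central-1")
	fmt.Println(strings.Join(cmd.Args, " "))
	// Call cmd.Run() to start the scan; trivy must be on PATH.
}
```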


7 participants