Support EKS Pod Identity feature #3899

Closed
Rohlik opened this issue Jul 23, 2024 · 9 comments

@Rohlik
Contributor

Rohlik commented Jul 23, 2024

Is your feature request related to a problem? Please describe.
A very common way to grant pods access to S3 buckets is IAM Roles for Service Accounts (IRSA). Recently (2023), AWS introduced EKS Pod Identity, which simplifies granting pods running in an EKS cluster access to AWS services.
However, Tempo (and other Grafana components) seem to be incompatible with it, based on the docs and my tests:

err="failed to init module services: error initialising module: store: failed to create store: unexpected error from ListObjects on dev-tempo: Access Denied"

Describe the solution you'd like
Support EKS Pod Identity, the newer way of granting pods access to AWS services.

Describe alternatives you've considered
The mentioned alternative solution with IRSA works fine. However, it can be unnecessarily complicated, especially in big deployments.

Additional context
The primary prerequisite is aws-sdk-go with version > v1.47.11, which Tempo fulfills.
We use the tempo-distributed Helm chart.
Below is the relevant pod spec of the compactor. It shows that the container has the proper environment variables and mounts set automatically, but the container itself doesn't use them for some reason:

spec:
  containers:
  - args:
    - -target=compactor
    - -config.file=/conf/tempo.yaml
    - -mem-ballast-size-mbs=1024
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: eu-central-1
    - name: AWS_REGION
      value: eu-central-1
    - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
      value: http://169.254.170.23/v1/credentials
    - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
      value: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
    image: docker.io/grafana/tempo:2.5.0
    imagePullPolicy: IfNotPresent
    name: compactor
    ports:
    - containerPort: 3100
      name: http-metrics
      protocol: TCP
    - containerPort: 7946
      name: http-memberlist
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 2000Mi
      requests:
        cpu: 5m
        memory: 300Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsGroup: 1000
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /conf
      name: config
    - mountPath: /runtime-config
      name: runtime-config
    - mountPath: /var/tempo
      name: tempo-compactor-store
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9s9fj
      readOnly: true
    - mountPath: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount
      name: eks-pod-identity-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: false
  nodeName: ip-10-2-6-59.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
  serviceAccount: tempo-pi
  serviceAccountName: tempo-pi
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: eks-pod-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: pods.eks.amazonaws.com
          expirationSeconds: 86400
          path: eks-pod-identity-token
  - configMap:
      defaultMode: 420
      items:
      - key: tempo.yaml
        path: tempo.yaml
      name: tempo-config
    name: config
  - configMap:
      defaultMode: 420
      items:
      - key: overrides.yaml
        path: overrides.yaml
      name: tempo-runtime
    name: runtime-config
  - emptyDir: {}
    name: tempo-compactor-store
  - name: kube-api-access-9s9fj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
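
For readers unfamiliar with the two injected variables above (`AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`), here is a minimal Go sketch of what a client has to do with them: read the token file, send its contents in the `Authorization` header to the credentials URI, and decode the JSON reply. The JSON field names, the helper names (`fetchContainerCreds`, `demo`), and the stub server are assumptions made for illustration; this is not Tempo's or any SDK's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// containerCreds mirrors the JSON fields a container credentials endpoint
// is assumed to return in this sketch.
type containerCreds struct {
	AccessKeyID     string    `json:"AccessKeyId"`
	SecretAccessKey string    `json:"SecretAccessKey"`
	Token           string    `json:"Token"`
	Expiration      time.Time `json:"Expiration"`
}

// fetchContainerCreds reads the token file, presents its contents in the
// Authorization header, and decodes the credentials returned by fullURI.
func fetchContainerCreds(client *http.Client, fullURI, tokenFile string) (containerCreds, error) {
	var creds containerCreds
	token, err := os.ReadFile(tokenFile)
	if err != nil {
		return creds, fmt.Errorf("read token file: %w", err)
	}
	req, err := http.NewRequest(http.MethodGet, fullURI, nil)
	if err != nil {
		return creds, err
	}
	req.Header.Set("Authorization", strings.TrimSpace(string(token)))
	resp, err := client.Do(req)
	if err != nil {
		return creds, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return creds, fmt.Errorf("credentials endpoint returned %s", resp.Status)
	}
	if err := json.NewDecoder(resp.Body).Decode(&creds); err != nil {
		return creds, fmt.Errorf("decode credentials: %w", err)
	}
	return creds, nil
}

// demo runs the fetch against a local stub server (hypothetical values)
// instead of the real Pod Identity agent.
func demo() (containerCreds, error) {
	const wantToken = "stub-token"
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != wantToken {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, `{"AccessKeyId":"AKIDEXAMPLE","SecretAccessKey":"secret","Token":"session","Expiration":"2030-01-01T00:00:00Z"}`)
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "pod-identity")
	if err != nil {
		return containerCreds{}, err
	}
	defer os.RemoveAll(dir)
	tokenFile := filepath.Join(dir, "eks-pod-identity-token")
	if err := os.WriteFile(tokenFile, []byte(wantToken+"\n"), 0o600); err != nil {
		return containerCreds{}, err
	}
	return fetchContainerCreds(srv.Client(), srv.URL+"/v1/credentials", tokenFile)
}

func main() {
	creds, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("access key:", creds.AccessKeyID, "expires:", creds.Expiration.Format(time.RFC3339))
}
```

Inside a real pod, the two arguments would presumably come from `os.Getenv("AWS_CONTAINER_CREDENTIALS_FULL_URI")` and `os.Getenv("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")`. A client that does not implement this step never obtains the role's credentials, even though the variables and the mount are present.
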
@joe-elliott
Member

joe-elliott commented Jul 23, 2024

We actually use the minio s3 client. Here is our Tempo s3 config:

type Config struct {
	tls.ClientConfig  `yaml:",inline"`
	Bucket            string         `yaml:"bucket"`
	Prefix            string         `yaml:"prefix"`
	Endpoint          string         `yaml:"endpoint"`
	Region            string         `yaml:"region"`
	AccessKey         string         `yaml:"access_key"`
	SecretKey         flagext.Secret `yaml:"secret_key"`
	SessionToken      flagext.Secret `yaml:"session_token"`
	Insecure          bool           `yaml:"insecure"`
	PartSize          uint64         `yaml:"part_size"`
	HedgeRequestsAt   time.Duration  `yaml:"hedge_requests_at"`
	HedgeRequestsUpTo int            `yaml:"hedge_requests_up_to"`
	// SignatureV2 configures the object storage to use V2 signing instead of V4
	SignatureV2      bool              `yaml:"signature_v2"`
	ForcePathStyle   bool              `yaml:"forcepathstyle"`
	UseDualStack     bool              `yaml:"enable_dual_stack"`
	BucketLookupType int               `yaml:"bucket_lookup_type"`
	Tags             map[string]string `yaml:"tags"`
	StorageClass     string            `yaml:"storage_class"`
	Metadata         map[string]string `yaml:"metadata"`
	// Deprecated
	// See https://github.com/grafana/tempo/pull/3006 for more details
	NativeAWSAuthEnabled  bool `yaml:"native_aws_auth_enabled"`
	ListBlocksConcurrency int  `yaml:"list_blocks_concurrency"`
}

And here is where we use it to build a minio client:

func createCore(cfg *Config, hedge bool) (*minio.Core, error) {

This appears relevant to our interests:

minio/minio-go#1940

Looks like this was released here:

https://github.com/minio/minio-go/releases/tag/v7.0.70

We updated to this version here:

#3721

So with a little luck this will be supported in 2.6.0?
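
To illustrate why a client upgrade alone could change the outcome, here is a deliberately simplified Go model of a credential provider chain. It is a sketch under stated assumptions: the names (`source`, `pickSource`, `olderChain`, `newerChain`) are hypothetical, real clients have more sources and different failure modes, and this is not the minio-go or Tempo implementation. The idea is only that the first source able to supply credentials wins, so a chain without a Pod Identity source ignores the injected variables.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCredentials is returned when no source in the chain matches.
var errNoCredentials = errors.New("no credential source matched")

// staticConfig holds only the static-key fields from the S3 config above.
type staticConfig struct {
	AccessKey string
	SecretKey string
}

// source is one hypothetical credential source in a chain.
type source struct {
	name  string
	ready func(cfg staticConfig, env map[string]string) bool
}

var staticKeys = source{name: "static-config", ready: func(cfg staticConfig, _ map[string]string) bool {
	return cfg.AccessKey != "" && cfg.SecretKey != ""
}}

var envKeys = source{name: "env-keys", ready: func(_ staticConfig, env map[string]string) bool {
	return env["AWS_ACCESS_KEY_ID"] != "" && env["AWS_SECRET_ACCESS_KEY"] != ""
}}

var podIdentity = source{name: "pod-identity", ready: func(_ staticConfig, env map[string]string) bool {
	return env["AWS_CONTAINER_CREDENTIALS_FULL_URI"] != "" &&
		env["AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"] != ""
}}

// olderChain models a client without a Pod Identity source;
// newerChain models one that has it. Both are illustrative only.
var (
	olderChain = []source{staticKeys, envKeys}
	newerChain = []source{staticKeys, envKeys, podIdentity}
)

// pickSource returns the name of the first source that can supply credentials.
func pickSource(chain []source, cfg staticConfig, env map[string]string) (string, error) {
	for _, s := range chain {
		if s.ready(cfg, env) {
			return s.name, nil
		}
	}
	return "", errNoCredentials
}

func main() {
	// Environment as injected in the pod spec from the issue description.
	env := map[string]string{
		"AWS_CONTAINER_CREDENTIALS_FULL_URI":     "http://169.254.170.23/v1/credentials",
		"AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE": "/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token",
	}
	for label, chain := range map[string][]source{"older": olderChain, "newer": newerChain} {
		name, err := pickSource(chain, staticConfig{}, env)
		fmt.Println(label, "chain:", name, err)
	}
}
```

In this model, leaving `access_key` and `secret_key` empty is what lets the chain reach the later sources at all.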

@AnhQKatalon

AnhQKatalon commented Jul 24, 2024

We are having the same issue. Pod Identity was configured correctly and the environment variables were injected into the containers automatically.

But somehow, Tempo's services do not pick up those credentials. The only approach that seems to work for now is IRSA.

level=error ts=2024-07-24T04:34:13.563927913Z caller=main.go:121 msg="error running Tempo" err="failed to init module services: error initialising module: store: failed to create store: unexpected error from ListObjects on s3-tempo: Access Denied"

Additional information: Grafana Loki and Mimir work normally with EKS Pod Identity.

@Rohlik
Contributor Author

Rohlik commented Jul 24, 2024

@AnhQKatalon 🧐 I was not able to make it work even with Mimir. I'm getting an error similar to the one from Tempo:

err="blocks storage: unable to successfully send a request to object storage: Access Denied"

@joe-elliott Thank you for that clarification about the Go library 😇.

@mogopz

mogopz commented Aug 1, 2024

@Rohlik I can confirm Mimir works with Pod Identity.

We're running most Grafana OSS services and Tempo + Pyroscope are the only two that don't work with Pod Identity at the moment.

Contributor

github-actions bot commented Oct 1, 2024

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

github-actions bot added the stale label Oct 1, 2024
@Dr4il

Dr4il commented Oct 8, 2024

bump

@joe-elliott
Member

So we have released 2.6.0 with a version of the minio client that presumably supports this feature. Can those who are interested in this feature confirm/disconfirm it works in 2.6.0?

github-actions bot removed the stale label Oct 9, 2024
@Rohlik
Contributor Author

Rohlik commented Oct 9, 2024

I did a quick test and it works.
I appreciate your cooperation on this one 👍🏼.
The docs mentioned in the initial message should be updated accordingly. I will try to do it 🤞🏼.

Rohlik added a commit to Rohlik/tempo that referenced this issue Oct 10, 2024
@joe-elliott
Member

Sounds like this is fixed! Thanks for confirming @Rohlik and updating the docs 🙏

knylander-grafana pushed a commit that referenced this issue Oct 11, 2024
github-actions bot pushed a commit that referenced this issue Oct 11, 2024
Relates to #3899.

(cherry picked from commit 793bd5e)
electron0zero pushed a commit that referenced this issue Oct 11, 2024
Relates to #3899.

(cherry picked from commit 793bd5e)

Co-authored-by: TomasR <linux@rohlik.xyz>
knylander-grafana pushed a commit to knylander-grafana/tempo-doc-work that referenced this issue Oct 11, 2024