AWS IRSA requires delicate dance #3069

Closed
james-callahan opened this issue Dec 11, 2020 · 12 comments

@james-callahan
Contributor

Describe the bug

For AWS IAM roles for service accounts (IRSA) to work, it's important that the .WithCredentials method is never called, as it is here:

s3Config = s3Config.WithCredentials(credentials.AnonymousCredentials)

This means you need to pass an S3 URL and dodge the other branches.


This appears to have been fixed in master of cortex: https://github.com/cortexproject/cortex/blob/d775e195f186fe4e2407ca2c643bf7f2350bd6cd/pkg/chunk/aws/s3_storage_client.go#L190. So the fix should be to simply update the dependency.

@chancez
Contributor

chancez commented Dec 11, 2020

I'm using IRSA just fine; all you have to do is specify the S3 URL like so:

s3://us-west-2/my-bucket-name

For dynamodb_url:

dynamodb://us-west-2
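As a rough sketch of where these two URLs would sit in a Loki config file (assuming the usual storage_config layout; the region and bucket name are just the placeholders used above):

```yaml
storage_config:
  aws:
    # Region goes in the host position, bucket name in the path.
    # No access key or secret is embedded in the URL.
    s3: s3://us-west-2/my-bucket-name
    dynamodb:
      # Only relevant if DynamoDB is used as the index store.
      dynamodb_url: dynamodb://us-west-2
```

With no static credentials in the URL, the AWS SDK is left to resolve credentials on its own, which is what IRSA relies on.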

Depending on what you're using to deploy Loki, you may also need to set the fsGroup so the token is mounted with the correct permissions for Loki to read it.
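A minimal sketch of the Kubernetes side, assuming an EKS cluster with IRSA enabled. The service account name, role ARN, and group ID below are hypothetical and should be replaced with whatever the deployment actually uses:

```yaml
# ServiceAccount annotated with the IAM role to assume (ARN is hypothetical).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/loki-storage
---
# Fragment of the Loki pod template (spec.template.spec of a
# Deployment or StatefulSet), not a complete manifest.
serviceAccountName: loki
securityContext:
  # Group applied to mounted volumes, including the projected token.
  # 10001 is an assumption; use the group the Loki container runs as.
  fsGroup: 10001
```

Whether fsGroup is needed at all depends on the user the container runs as and on how the deployment tooling sets up the pod.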

@james-callahan
Contributor Author

you have to do is specify the S3 URL like so:

s3://us-west-2/my-bucket-name

yes. if you specify e.g. bucketnames it all breaks.

@stale

stale bot commented Jan 11, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2021

@stale stale bot removed the stale label Jan 11, 2021
@stale

stale bot commented Feb 13, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 13, 2021
@james-callahan
Contributor Author

I wonder if this was fixed with #3267

@stale stale bot closed this as completed Feb 21, 2021
@james-callahan
Contributor Author

???? this was commented on. why did stale bot close it?

@dannykopping dannykopping added the keepalive label and removed the stale label Feb 22, 2021

@dannykopping dannykopping reopened this Apr 10, 2021
@owen-d owen-d removed the keepalive label May 27, 2021
@owen-d
Member

owen-d commented May 27, 2021

@james-callahan Can you verify if this is still an issue now that we've updated our Cortex dependency and reopen this? I believe this should be fixed.

@owen-d owen-d closed this as completed May 27, 2021
@james-callahan
Contributor Author

@james-callahan Can you verify if this is still an issue now that we've updated our Cortex dependency and reopen this? I believe this should be fixed.

This is still a problem. Starting up without s3 set in the storage_config when using IRSA gives:
level=error ts=2021-06-23T12:47:02.103948575Z caller=log.go:106 msg="error running loki" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.dummy.amazonaws.com/\": dial tcp: lookup sts.dummy.amazonaws.com on 172.20.0.10:53: no such host\nerror creating index client\ngithub.com/cortexproject/cortex/pkg/chunk/storage.NewStore\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/chunk/storage/factory.go:179\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:319\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:103\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:220\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nerror initialising module: store\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:105\ngithub.com/cortexproject/cortex/pkg/util/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/cortexproject/cortex/pkg/util/modules/modules.go:75\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:220\nmain.main\n\t/src/loki/cmd/loki/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"

However, it's not quite as bad as it was: bucketnames is now at least usable.
Will be updating with:

--- a/applications/loki/files/config.yaml
+++ b/applications/loki/files/config.yaml
@@ -108,7 +108,8 @@ server:
   http_server_write_timeout: 1m
 storage_config:
   aws:
-    s3: s3:///${LOKI_STORAGE_CONFIG_AWS_BUCKETNAMES}
+    s3: s3:///
+    bucketnames: ${LOKI_STORAGE_CONFIG_AWS_BUCKETNAMES}
     sse_encryption: true
   boltdb_shipper:
     shared_store: aws

james-callahan added a commit to james-callahan/kustomize-loki that referenced this issue Jun 23, 2021
grafana/loki#3069 is partially addressed;
however an empty S3 url is still required to avoid loki reaching out to sts.dummy.amazonaws.com
@Fantaztig

Fantaztig commented May 23, 2022

This issue still applies to the S3 config of the Loki ruler: using bucketnames + endpoint doesn't work with IRSA, while specifying the exact S3 URL works fine.

Part of the error message is (for making it easier to find this issue and solution when encountering the problem):

WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError: failed to unmarshal error message\n\tstatus code: 405, request id: \ncaused by: UnmarshalError: failed to unmarshal error message
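For anyone landing here from that error, a sketch of the ruler variant that is reported to work, assuming the ruler's S3 block accepts the same URL form as storage_config (the region and bucket name are hypothetical):

```yaml
ruler:
  storage:
    type: s3
    s3:
      # Full URL form: region as host, bucket as path.
      s3: s3://us-west-2/my-ruler-bucket
      # The combination reported above as failing with IRSA:
      # bucketnames: my-ruler-bucket
      # endpoint: s3.us-west-2.amazonaws.com
```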

@nicon89

nicon89 commented Dec 23, 2022

I'm getting the same error in me-central-1.
Is it because there's no STS endpoint in this region?
