Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ec2 tags collection is broken when kube2iam is used #3174

Merged
merged 2 commits into from
Mar 29, 2019

Conversation

arminioa
Copy link
Contributor

What does this PR do?

Fix ec2 tag collection when datadog agent is used in kubernetes with kube2iam.

Additional Notes

There is no need to add "/" after a IAM role when composing the url used to get AWS security credentials.

Please see these issues:
DataDog/datadog-agen#3173
jtblin/kube2iam#204

@arminioa arminioa requested a review from a team as a code owner March 20, 2019 10:20
@bits-bot
Copy link
Collaborator

bits-bot commented Mar 20, 2019

CLA assistant check
All committers have signed the CLA.

@arminioa arminioa changed the title Remove trailing slash from url used to retrive security credentials ec2 tags collection is broken when kube2iam is used Mar 20, 2019
@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #3174 into master will increase coverage by 2.96%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3174      +/-   ##
==========================================
+ Coverage   54.39%   57.36%   +2.96%     
==========================================
  Files         543      462      -81     
  Lines       39043    28796   -10247     
==========================================
- Hits        21239    16519    -4720     
+ Misses      16536    11165    -5371     
+ Partials     1268     1112     -156
Flag Coverage Δ
#linux 57.36% <100%> (ø) ⬆️
#windows ?
Impacted Files Coverage Δ
pkg/util/ec2/ec2_tags.go 28.98% <100%> (+3.69%) ⬆️
pkg/trace/sampler/prioritysampler.go 49.05% <0%> (-11.99%) ⬇️
pkg/metadata/resources/resources.go 75% <0%> (-8.34%) ⬇️
pkg/trace/sampler/scoresampler.go 58.33% <0%> (-7.71%) ⬇️
pkg/quantile/key.go 46.15% <0%> (-7.18%) ⬇️
pkg/metadata/host/host_tags.go 74% <0%> (-6%) ⬇️
pkg/procmatch/signature.go 66.66% <0%> (-4.77%) ⬇️
pkg/metrics/event.go 67.1% <0%> (-4.64%) ⬇️
pkg/metrics/service_check.go 56.71% <0%> (-4.33%) ⬇️
pkg/trace/test/buffer.go 75% <0%> (-4.32%) ⬇️
... and 386 more

@arminioa
Copy link
Contributor Author

@truthbk please review

@arminioa arminioa force-pushed the ec2_tags_iam_role_fix branch from 7b0b465 to 13c16c9 Compare March 20, 2019 12:38
@arminioa arminioa force-pushed the ec2_tags_iam_role_fix branch from 13c16c9 to 3e6975b Compare March 20, 2019 12:40
@Simwar
Copy link
Contributor

Simwar commented Mar 28, 2019

@clamoriniere can you review?

Thanks!

@hush-hush hush-hush added this to the 6.11.0 milestone Mar 28, 2019
@2rs2ts
Copy link

2rs2ts commented Mar 28, 2019

This may actually depend on which instance type you are running on. On m5 and c5, you do not want the trailing /, but on other instance types you do. See this issue for more details (particularly this comment) uswitch/kiam#42 (comment)

@2rs2ts
Copy link

2rs2ts commented Mar 28, 2019

I wonder if you even need to manually fetch credentials at all. Normally, if you provide no credentials the AWS SDK will attempt to use the instance profile, and it does all this work behind the scenes.

@Simwar
Copy link
Contributor

Simwar commented Mar 29, 2019

Hi @2rs2ts

The issue linked is, I believe, about the URL:
http://169.254.169.254/latest/meta-data/iam/security-credentials having a trailing '/' or not.
Our issue here is about http://169.254.169.254/latest/meta-data/iam/security-credentials/iam-role/ so not the same URL.
I tested on a m5 and on a t2 instance with or without a '/' and it works.
Need to test on kube2iam and we can merge.

@Simwar
Copy link
Contributor

Simwar commented Mar 29, 2019

Tested on kube2iam. Working without the '/' in the curl.

@Simwar Simwar merged commit 36d7364 into DataDog:master Mar 29, 2019
@arminioa
Copy link
Contributor Author

Thank you guys!

@xvello
Copy link
Contributor

xvello commented Apr 3, 2019

While QA-ing this feature, I stumbled upon this known kube2iam issue about cold caches. This is an issue for the agent as our cloud provider autodetection algorithm has intentionally short timeouts, to avoid delaying too much the agent startup.

To pre-fill the kube2iam cache, I had to add the following init container to the agent daemonset:

      initContainers:
      - image: datadog/agent:latest
        imagePullPolicy: Always
        name: curl-iam
        command: ['curl', 'http://169.254.169.254/latest/meta-data/iam/security-credentials/<ASSUMED_IAM_ROLE_NAME>']

@2rs2ts
Copy link

2rs2ts commented Apr 4, 2019

That intentionally short timeout causes a lot of issues for us and many others... perhaps it should be reconsidered. One really really common problem is that the metadata endpoint isn't available (often due to setups like kube2iam or kiam) and so the agent boots up with the ec2 hostname as the agent hostname, which of course causes all sorts of problems in the backend because hostnames are not unique so you get tons of no data alerts and other confusion. If the user knows they're deploying the agent to a cloud provider, autodetection only makes things harder for them.

@hkaj
Copy link
Member

hkaj commented Apr 5, 2019

Thanks for the feedback @2rs2ts

I'll add something to the backlog to make it configurable. I think it requires a different config for each hostname provider, as otherwise we'll wait for X seconds * Y hostname provider, Y being the amount of providers preceding the one that will succeed in the user's environment, and the EC2 provider is the second last one which would significantly slow down the hostname detection. Do you agree, or did you have another solution in mind?

@2rs2ts
Copy link

2rs2ts commented Apr 5, 2019

I think you probably just want to provide a config value for the cloud provider in general. If I say to the agent "you are in EC2" then it should do only EC2 things, not GCE things or anything else. Hopefully that makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants