Invalid security token issue on EC2 that has an IAM instance profile #2693

Closed
killercentury opened this issue Jul 11, 2015 · 30 comments
Labels: bug, provider/aws, waiting-response

Comments

@killercentury

When I run plan and apply on my Mac to create an ECS cluster, everything seems fine. But when I run the exact same thing on an EC2 instance that has an IAM instance profile, it fails with the following error:

UnrecognizedClientException: The security token included in the request is invalid
    status code: 400, request id: []

I actually want to use the "access_key" and "secret_key" specified in my terraform configs instead of inheriting credentials from the IAM role. But it seems that neither those variables nor the credentials from the IAM role were used properly, even though both have full permission for the operation.

So I tried a few different combinations as follows.

  1. I changed the values of "access_key" and "secret_key" in my terraform configs to the ones from the IAM role, and it works fine. (Of course, EC2 will rotate these values after a while.)
  2. I set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to some random values (anything, such as AWS_ACCESS_KEY_ID=hello and AWS_SECRET_ACCESS_KEY=world). Then it works fine with the values I specified in my terraform.tfvars file.

So without looking into the code, I guess it has to do with some credentials detection logic for the IAM role that overrides the values specified in the terraform.tfvars file.

@phinze
Contributor

phinze commented Jul 14, 2015

@killercentury that's definitely odd!

We have been working on improving the credentials detection code recently - so it's definitely possible there's a bug there.

From reading your description, it's hard to tell exactly the steps to reproduce the unexpected behavior. Can you lay them out for me?

@evidex

evidex commented Aug 24, 2015

+1 Seeing the same issue on EC2 when using Terraform. As @killercentury suggested, setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to random values allows terraform to pick up the correct values from the provider file.

@ckelner

ckelner commented Nov 7, 2015

I am really glad I came across this. I am also having this issue.

I'm working on automation to drive terraform and recently ran into this issue w/ running terraform on an AWS EC2 instance. I was able to run terraform plan from my local dev machine just fine with the same keys I was trying to use on the EC2 instance, but on EC2 I kept getting:

TF_VAR_project_aws_access_key=<key> TF_VAR_project_aws_secret_key=<secret> terraform plan 
....
* InvalidClientTokenId: The security token included in the request is invalid
    status code: 403, request id: abc123

I finally ended up running tcpdump to try to figure out what the issue was:

22:00:33.251354 IP 192.168.0.24.53165 > 169.254.169.254.http: Flags [.], ack 1, win 141, options [nop,nop,TS val 18826934 ecr 3018831768], length 0
....
22:00:33.252092 IP 169.254.169.254.http > 192.168.0.24.53165: Flags [.], ack 216, win 149, options [nop,nop,TS val 3018831768 ecr 18826935], length 0

I noticed terraform was contacting 169.254.169.254, which is the EC2 metadata service.

After finding this issue, I tried setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to some random value and finally I was able to successfully execute terraform from my EC2 instance:

AWS_ACCESS_KEY_ID=123 AWS_SECRET_ACCESS_KEY=567 TF_VAR_project_aws_access_key=<key> TF_VAR_project_aws_secret_key=<secret> terraform plan
Refreshing Terraform state prior to plan...
....
+ aws_vpc.main
    cidr_block:                "" => "192.168.0.0/24"
    default_network_acl_id:    "" => "<computed>"
    default_security_group_id: "" => "<computed>"
    dhcp_options_id:           "" => "<computed>"
    enable_dns_hostnames:      "" => "<computed>"
    enable_dns_support:        "" => "<computed>"
    main_route_table_id:       "" => "<computed>"


Plan: 22 to add, 0 to change, 0 to destroy.

I am not working on this alone, so the next piece of information could be erroneous, but I believe we only recently added an EC2 instance profile (IAM role) to the instance, so perhaps this has something to do with terraform trying to use the metadata service.

EDIT: To clarify, I found this to be true with both version 0.6.3 and 0.6.6

@artburkart
Contributor

I have experienced identical behavior: #3243

@catsby
Contributor

catsby commented Dec 3, 2015

Hello friends –

Sorry for the late response here. Can you confirm this is still an issue?
We had an update to the provider code that should properly load all the creds from the environment on EC2 (see 9e66e18#diff-d6065946c2cbff2b5f5d3beadfdfa8a4). I'm wondering if it fixed your issues here, as I'm attempting to reproduce your issues here and not quite hitting them.

On an EC2 instance, I'm able to curl out to the metadata, retrieve the security creds (including SessionToken) and run TF like so:

$ AWS_ACCESS_KEY_ID=anaccessid AWS_SECRET_ACCESS_KEY=anaccesskey AWS_SESSION_TOKEN=alongtoken ./bin/terraform plan

The plan executes successfully. Editing any of those variables causes the plan to fail to auth, as expected.
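
A minimal Go sketch of that manual step, for readers who want to reproduce it. It is not Terraform code. It assumes the instance metadata service returns role credentials as JSON from `http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>` with the fields `AccessKeyId`, `SecretAccessKey`, and `Token` (details not stated in this thread), and it only parses a sample response so that it runs anywhere. The helper names `parseRoleCredentials` and `envAssignments` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// roleCredentials holds the fields of the metadata response that matter here.
// The JSON field names are an assumption about the metadata service format.
type roleCredentials struct {
	AccessKeyID     string `json:"AccessKeyId"`
	SecretAccessKey string `json:"SecretAccessKey"`
	Token           string `json:"Token"`
	Expiration      string `json:"Expiration"`
}

// parseRoleCredentials decodes a metadata response body and rejects it if
// any part of the id/secret/token triple is missing.
func parseRoleCredentials(body []byte) (roleCredentials, error) {
	var c roleCredentials
	if err := json.Unmarshal(body, &c); err != nil {
		return c, err
	}
	if c.AccessKeyID == "" || c.SecretAccessKey == "" || c.Token == "" {
		return c, errors.New("metadata response is missing a credential field")
	}
	return c, nil
}

// envAssignments renders the triple as KEY=value pairs, matching the
// variables used in the terraform invocation above.
func envAssignments(c roleCredentials) []string {
	return []string{
		"AWS_ACCESS_KEY_ID=" + c.AccessKeyID,
		"AWS_SECRET_ACCESS_KEY=" + c.SecretAccessKey,
		"AWS_SESSION_TOKEN=" + c.Token,
	}
}

func main() {
	// Sample body with placeholder values; a real one comes from the
	// metadata endpoint on the instance.
	sample := []byte(`{
  "Code": "Success",
  "AccessKeyId": "anaccessid",
  "SecretAccessKey": "anaccesskey",
  "Token": "alongtoken",
  "Expiration": "2015-12-03T23:00:00Z"
}`)
	creds, err := parseRoleCredentials(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, kv := range envAssignments(creds) {
		fmt.Println(kv)
	}
}
```

The point of keeping all three values together is that the id, secret, and token from an IAM role only authenticate as a set.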

What am I missing here? Am I not reproducing this correctly?

@catsby catsby added the waiting-response label Dec 3, 2015
@artburkart
Contributor

@catsby - Would this change be in the latest binary I can install, or will I need to build from source?
Aight, I tried it with the latest binary; still broken there. I'm about to build it from source and see if I still get it.

@catsby
Contributor

catsby commented Dec 3, 2015

Hey @artburkart – the change I referenced is in v0.6.8, the current binary.

@artburkart
Contributor

@catsby - okay good, cuz I only just installed go on my test machine 😉 If that's the case, then it appears to still be there. Allow me to double check.

@catsby
Contributor

catsby commented Dec 3, 2015

@artburkart to confirm, this is your issue, specifically, correct?

@artburkart
Contributor

@catsby, yes that is my issue. It's identical to that defined by @ckelner above

terraform -v
Terraform v0.6.8

Then I run terraform plan

Error refreshing state: 1 error(s) occurred:

* 1 error(s) occurred:

* InvalidClientTokenId: The security token included in the request is invalid
    status code: 403, request id: abc123

Then I run

AWS_ACCESS_KEY_ID=123 AWS_SECRET_ACCESS_KEY=567 terraform plan

And bingo!

@ckelner

ckelner commented Dec 3, 2015

@catsby maybe I can help add a little more color, I'm recalling from memory, but if need be I'll be happy to dig in and get real data/examples.

  • launch aws ec2 instance (it may require that it has an instance role [iam profile] associated, I was never able to confirm)
  • Either export environment variables or assign them just before execution:
    • export TF_VAR_project_aws_access_key=<your real key> (where project_aws_access_key is a variable used in tf)
    • export TF_VAR_project_aws_secret_key=<your real secret> (where project_aws_secret_key is a variable used in tf)
    • Alternatively: TF_VAR_project_aws_access_key=<key> TF_VAR_project_aws_secret_key=<secret> terraform plan
  • Execute terraform plan and you should get (if the issue still persists) InvalidClientTokenId from AWS
  • Now either export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY or set them just before execution, the same way as the TF_VAR environment variables above. (Edit: they can be set to some bogus value, it doesn't matter.) Be sure to ALSO set the TF_VAR environment variables again to valid values.
  • Now execute terraform plan again and it should now work.

Hopefully this is helpful in your testing.

I will try 0.6.8 as soon as I can.

@artburkart
Contributor

I also have an IAM role associated with my instance. But allow me to try it without one.

@catsby
Contributor

catsby commented Dec 4, 2015

Thanks all for the additional info! I have some kind of idea, but probably won't get to it until tomorrow.

@artburkart
Contributor

Okay, here are my steps to repro:

  • Start EC2 instance with IAM role
  • ssh as ubuntu user
  • Install terraform
    1. curl -L https://releases.hashicorp.com/terraform/0.6.8/terraform_0.6.8_linux_amd64.zip -O
    2. unzip terraform_0.6.8_linux_amd64.zip -d /usr/local/bin
  • Create a main.tf file:
provider "aws" {
    access_key = "${var.access_key}"
    secret_key = "${var.secret_key}"
    region = "us-east-1"
}

resource "aws_instance" "web" {
    ami = "ami-d05e75b8"
    instance_type = "m1.small"
    tags {
        Name = "HelloWorld"
    }
}
  • Create a variables.tf file:
variable "access_key" {
    description = "Access key to provider (AWS, openstack, etc)"
}

variable "secret_key" {
    description = "Secret key to provider (AWS, openstack, etc)"
}
  • Create a terraform.tfvars file:
access_key="some_valid_key"
secret_key="some_other_valid_key"
aws_key_name="some_pem_file"
  • Run terraform plan and you get the error:
Error refreshing state: 1 error(s) occurred:

* 1 error(s) occurred:

* InvalidClientTokenId: The security token included in the request is invalid
    status code: 403, request id: abc123

I have verified that if you do the exact same thing without associating an IAM role, you can successfully execute terraform plan

@ckelner

ckelner commented Dec 4, 2015

Great work @artburkart ! 💃 Glad you were able to pinpoint that it was the IAM role.

@catsby
Contributor

catsby commented Dec 4, 2015

Hello Friends! Thanks to all of your help, we've identified the issue here. The fix however will take some time to work out and test.

The crux of it is here:

Specifically, the DefaultFuncs. If you manually supply id/secret, the default is not used for id/secret. However, it is used for token. In this case, if you do so on an instance with IAM, the DefaultFunc calls getCredDefault, which goes through the chain providers to detect credentials. The first two (env, shared) fail, but the EC2RoleProvider succeeds in finding the IAM info. Unless you used the IAM id/secret in the invocation, you'll end up with mismatched id/secret/token.

If you specify AWS_ACCESS_KEY_ID et al. with bad values while using tfvars, things should work, because DefaultFunc is only called for token, in which case the ENV provider evaluates successfully and the EC2RoleProvider is never attempted. The check in the chain providers doesn't actually authenticate with what it finds, it just checks if the right things are there. In this case, your token becomes the value of whatever is in AWS_SESSION_TOKEN (probably nothing). Terraform then uses the values from your tfvars and the actual authentication ignores the empty string for token.
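
The behavior described above can be modeled with a small self-contained Go sketch. This is a simplified illustration, not the code in aws/provider.go or aws-sdk-go: the types and functions (`Creds`, `provider`, `chain`, `resolve`, `envProvider`, `roleProvider`) are hypothetical stand-ins, and the shared-credentials provider is left out for brevity. It assumes only what the explanation states: explicitly supplied fields win, any missing field falls back to the first chain provider that finds something, and the chain never verifies what it finds.

```go
package main

import "fmt"

// Creds is a simplified id/secret/token triple.
type Creds struct {
	AccessKey, SecretKey, Token string
}

// provider is one link in a credential chain. It only reports whether it
// found credentials; it does not authenticate with them.
type provider func() (Creds, bool)

// envProvider succeeds when both key variables are set, whatever their values.
func envProvider(env map[string]string) provider {
	return func() (Creds, bool) {
		id, secret := env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
		if id == "" || secret == "" {
			return Creds{}, false
		}
		return Creds{AccessKey: id, SecretKey: secret, Token: env["AWS_SESSION_TOKEN"]}, true
	}
}

// roleProvider stands in for the EC2 role lookup on an instance with an IAM role.
func roleProvider(c Creds) provider {
	return func() (Creds, bool) { return c, true }
}

// chain returns the credentials from the first provider that finds any.
func chain(providers ...provider) Creds {
	for _, p := range providers {
		if c, ok := p(); ok {
			return c
		}
	}
	return Creds{}
}

// resolve models the per-field defaults: an explicit value is kept, and an
// empty field is filled from the chain.
func resolve(explicit Creds, providers ...provider) Creds {
	def := chain(providers...)
	out := explicit
	if out.AccessKey == "" {
		out.AccessKey = def.AccessKey
	}
	if out.SecretKey == "" {
		out.SecretKey = def.SecretKey
	}
	if out.Token == "" {
		out.Token = def.Token
	}
	return out
}

func main() {
	tfvars := Creds{AccessKey: "tfvars-id", SecretKey: "tfvars-secret"}
	role := roleProvider(Creds{AccessKey: "role-id", SecretKey: "role-secret", Token: "role-token"})

	// No AWS_* variables set: the role's token is mixed with the tfvars keys.
	fmt.Printf("%+v\n", resolve(tfvars, envProvider(map[string]string{}), role))

	// Bogus AWS_* variables set: the env provider wins, the token stays empty.
	bogus := map[string]string{"AWS_ACCESS_KEY_ID": "hello", "AWS_SECRET_ACCESS_KEY": "world"}
	fmt.Printf("%+v\n", resolve(tfvars, envProvider(bogus), role))
}
```

In the first case the result carries the tfvars id/secret together with the role's token, which is the mismatched triple that AWS rejects. In the second case the bogus environment values short-circuit the chain before the role lookup, so the token stays empty and the tfvars keys are used on their own.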

We believe we can fix this by dumbing down the logic in aws/provider.go and leveraging the chain provider(s) more. We think they were built up the way they are now back in the darker times of the aws-sdk-go when things were less sturdy. I'm going to experiment and find out!

Thanks again for the help

@ckelner

ckelner commented Dec 4, 2015

Awesome work @catsby thanks for the update!

@avdhoot

avdhoot commented Dec 4, 2015

+1 facing same issue with aws ec2 instance launched with IAM role. Waiting for fix :)

@ckelner

ckelner commented Dec 4, 2015

@avdhoot Not sure if you noticed the work around, but setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to some random value does the trick.

EDIT: Not that it should be accepted as a long term fix, but in the event it was blocking you :)

@avdhoot

avdhoot commented Dec 11, 2015

@ckelner This behavior is not straightforward. If we export the AWS keys before running terraform plan, it works. But in our use case we have to manage two accounts from the same machine. If we forget to export the keys and run terraform plan with a different account's keys, we lose our tfstate. Then we have to depend on the backup. If we run terraform plan twice, the backup is lost too. I hope I was able to explain my use case.

Note: ec2 instance is launched with empty instance role.

@catsby
Contributor

catsby commented Dec 11, 2015

Note: ec2 instance is launched with empty instance role.

Can you elaborate on what an "empty" instance role is? In your previous message you mention "launched with IAM role".

@catsby
Contributor

catsby commented Dec 11, 2015

An update: a rough version of the patch has been pushed and a PR sent. I've tested the scenario(s) above but am still testing and adjusting:

@avdhoot

avdhoot commented Dec 12, 2015

"empty" means ec2 instance launched with IAM role but no permission attached to that role.


@catsby
Contributor

catsby commented Dec 14, 2015

@avdhoot ah, thanks for clarifying

@catsby
Contributor

catsby commented Dec 16, 2015

Hello friends! Thank you for your patience here. I just merged #4254 to address this, if you have the ability to make Terraform from source and could be so kind, please do so and let me know how it goes. Otherwise it will be out in the next release (soonish).

Thanks for all the help here!

@catsby catsby closed this as completed Dec 16, 2015
@mkirchner

Is anyone still experiencing this?

I have an IAM user with full EC2 access and running the basic example causes the following error:

$ terraform plan
Refreshing Terraform state prior to plan...

Error refreshing state: 1 error(s) occurred:

* 1 error(s) occurred:

* InvalidClientTokenId: The security token included in the request is invalid.
    status code: 403, request id: 25xxxxxxx-xxxx-11e6-8891-xxxxxxxxx52
$

This is v0.6.16 on OSX 10.11 (El Capitan). I tried the workarounds mentioned above, to no avail.
Any pointers would be much appreciated... Thanks!

@agarstang

This is also a problem for my desired workflow. In a multi-account setup we wish to assume a role in the target account, and we do this from a machine with an IAM role in the "source" account.

We assume the role and then populate the environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SECURITY_TOKEN) with the temporary credentials. Because we are using an IAM role, Terraform does not use the token from AWS_SECURITY_TOKEN and attempts to use the token from the metadata service with the AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY environment variables (which obviously does not work).
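
A minimal Go sketch of a token lookup that would avoid this, shown only as an illustration. It is not Terraform's or the Go SDK's code, and `sessionTokenFromEnv` is a hypothetical helper. It assumes the intent is to accept the token under either AWS_SESSION_TOKEN or the AWS_SECURITY_TOKEN name used above, so that a caller falls back to the metadata service only when neither variable is set.

```go
package main

import (
	"fmt"
	"os"
)

// sessionTokenFromEnv returns the first non-empty session token found under
// either variable name, and whether one was found. The getenv parameter
// makes the lookup easy to exercise without touching the real environment.
func sessionTokenFromEnv(getenv func(string) string) (string, bool) {
	for _, name := range []string{"AWS_SESSION_TOKEN", "AWS_SECURITY_TOKEN"} {
		if v := getenv(name); v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	if _, ok := sessionTokenFromEnv(os.Getenv); ok {
		fmt.Println("using the session token from the environment")
	} else {
		fmt.Println("no session token in the environment")
	}
}
```

The design choice is that environment credentials are treated as a complete set: when the id and secret come from the environment, the token should come from there too.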

As far as I can tell this is still broken in 0.7.0-rc3.

@rico-spaceship

@agarstang, sorry, I know this has been closed, but I'm just wondering: did this work for you in the newer versions of terraform? We are using the latest terraform, 0.9.11, but we are hitting the same issue. We have a multi-account setup: we create an IAM role in the source account and let another account assume the role, then grab the temporary credentials. But we are still hitting this Get Caller Identity 403 error.

@agarstang

Hi @rico-spaceship, iirc our attempt to use environment variables was to get around the lack of role assumption within Terraform (I believe it may have been a Go SDK limitation at the time).

We now use the role_arn parameter of the AWS provider to do this within Terraform.

@ghost

ghost commented Apr 8, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 8, 2020