aws_launch_configuration does not wait for aws_iam_instance_profile #2349

Closed
jedineeper opened this issue Jun 15, 2015 · 20 comments

@jedineeper
Contributor

I've created a launch configuration and instance profile as shown below:

resource "aws_iam_instance_profile" "myprofile" {
    name = "myprofile"
    roles = ["${aws_iam_role.myrole.name}"]
}

resource "aws_launch_configuration" "mylc" {
    name = "my-server"
    image_id = "ami-1234567" 
    instance_type = "t2.micro"
    key_name = "mykey"
    iam_instance_profile = "${aws_iam_instance_profile.myprofile.id}"
    security_groups = ["${aws_security_group.mysg.id}"]
}

When I run an apply, the instance profile is not ready by the time the launch configuration is created, so the apply fails. Running the apply again immediately afterwards succeeds.

Sounds like something is missing from the dependency graph?

I've not included all of the template dependencies to keep the issue tidy, but if you need a stronger example then please let me know.

@jedineeper
Contributor Author

I should add that only two items are created on the second apply: the launch configuration and the autoscaling group that depends on it.

@mikeyhill

Can confirm, I've run into this issue numerous times. It just seems to be a timing issue with the IAM profile, and re-running terraform a second time allows the ASG to be set up.

@alexintel

+1

Terraform v. 0.5.3: same issue here.
aws_launch_configuration does not wait on aws_iam_instance_profile

I have to run plan and then apply a second time.

@jedineeper
Contributor Author

I should add for other people having this issue that I've worked around this by adding

depends_on = ["aws_iam_instance_profile.myprofile"]

to the launch configuration resource.
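
For context, here is a sketch of where that line would sit, reusing the launch configuration from the original report. Note that the `${aws_iam_instance_profile.myprofile.id}` interpolation already gives Terraform an implicit dependency on the profile, so the explicit `depends_on` only restates the ordering. It does not add any wait for the profile to become usable on the AWS side, so treat it as a workaround that may or may not help.

```hcl
resource "aws_launch_configuration" "mylc" {
    name = "my-server"
    image_id = "ami-1234567"
    instance_type = "t2.micro"
    key_name = "mykey"
    # This reference already makes the launch configuration depend on the profile.
    iam_instance_profile = "${aws_iam_instance_profile.myprofile.id}"
    security_groups = ["${aws_security_group.mysg.id}"]

    # Explicit ordering hint: create the profile first.
    # It affects ordering only; it does not wait for the profile to propagate.
    depends_on = ["aws_iam_instance_profile.myprofile"]
}
```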

@delianides

Adding @jedineeper's solution above did not resolve the issue for me. I still had to wait a few seconds and then run apply again.

@alexintel

@delianides
@jedineeper

depends_on does not work for me in 0.5.3. I tried it previously to force proper graph mapping, but it didn't help.

Putting the following local-exec provisioner in the aws_iam_instance_profile resource works for me in 0.5.3:

provisioner "local-exec" {
  command = "sleep 10"
}
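
For context, a sketch of the provisioner placed inside the instance profile from the original report. Because a resource is not considered created until its provisioners have finished, anything that depends on the profile is held back for the length of the sleep. The 10 second value is simply the one reported above; it is an assumption about propagation time, not a guarantee, and `sleep` assumes a Unix-like shell on the machine running Terraform.

```hcl
resource "aws_iam_instance_profile" "myprofile" {
    name = "myprofile"
    roles = ["${aws_iam_role.myrole.name}"]

    # Runs locally once the profile has been created.
    # Dependent resources (such as the launch configuration) wait until it finishes.
    provisioner "local-exec" {
        command = "sleep 10"
    }
}
```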

@joekhoobyar
Contributor

In my opinion, the real issue here is that AWS is "eventually consistent". Just because the IAM instance profile has been created and an ID returned, there is no guarantee that the subsequent call to create the LC that references the profile will be able to access it. Terraform needs to be able to detect that condition, wait, and then retry.

@phinze
Contributor

phinze commented Jun 24, 2015

Hey folks,

In similar scenarios with other resources, we've added logic at the end of Create to ensure the resource is ready to use. So in this case, we'd be looking for an API call we can poll before returning from the aws_iam_instance_profile create, so that any descendant resources can assume that the IAM profile is ready by the time they see it.

Concretely, we could try polling on GetInstanceProfile until the profile is successfully returned. In other areas of the AWS API, showing up in the return from a Get has correlated well with a resource being ready to use.

I'll try to follow up with a PR, but describing the strategy here in case somebody else beats me to it. 😀

@zollie

zollie commented Jun 24, 2015

Normally I would agree with your approach, but my understanding is that the AWS API can report the profile as created while its permissions have not yet been propagated to the EC2 infrastructure (~10 seconds).

For example, the response to aws iam create-instance-profile --instance-profile-name=Webserver is the profile:

{
    "InstanceProfile": {
        "InstanceProfileId": "AIPAJMBYC7DLSPEXAMPLE",
        "Roles": [],
        "CreateDate": "2015-03-09T20:33:19.626Z",
        "InstanceProfileName": "Webserver",
        "Path": "/",
        "Arn": "arn:aws:iam::123456789012:instance-profile/Webserver"
    }
}

From: http://docs.aws.amazon.com/cli/latest/reference/iam/create-instance-profile.html

@alexintel

Awesome! Thanks @phinze. You guys are doing an amazing job. I'm really excited to use it. :)

@phinze
Contributor

phinze commented Jun 24, 2015

@zollie Ah gotchya - sounds like you've got good domain knowledge in this area. 👍

I still believe the responsibility should fall on the Create function of the aws_iam_instance_profile resource. If there's no better way to guarantee propagation other than to wait 10 seconds, then I think it's better to add a sleep before iam_instance_profile returns from its Create than to introduce the details of potential dependencies into the aws_instance resource.

In general I'm trying to maintain the policy of "don't return from Create until the resource is truly ready to use" so we can avoid leaking resource details across dependencies.

And then maybe somebody from the community can make noise upstream to see if AWS will yield a better API for guaranteeing that permissions propagation has occurred. 😀

Let me know if this all makes sense to you. Happy to continue to discuss.

@zollie

zollie commented Jun 24, 2015

Ya, I get what you're saying. :) But a few things to think about ...

How long to sleep before returning from Create? Too short, and the profile may not be ready; too long, and the apply is not as timely as it could be. Maybe not a big deal?

By using the exponential backoff of resource.Retry, though, we make the aws_launch_configuration Create, and in turn the apply, as fast as possible.

Also, I can only think of two places an instance profile is used: launch configs and standalone EC2 instances?

I get not leaking the abstraction though so ...tough call.

@jedineeper
Contributor Author

If it is relevant, I think this might be related to the AWS API itself. I've seen a similar situation in CloudFormation: a delay of exactly two minutes in the creation of IAM roles.

@phinze
Contributor

phinze commented Jun 24, 2015

Yeah I totally agree that a sleep is a sad tool to have to use.

Here are the scenarios I'm worried about:

  • resources like instance ending up with dozens of conditionals: "if X dependency not ready, if Y dependency not ready", and so on.
  • down the road, when a new AWS resource comes out that can also depend on instance profiles, we need to remember to implement the same behavior there.
  • even today, there are use cases where somebody might be kicking off an external system. Picture this:

resource "aws_iam_instance_profile" "foo" { /* ... */ }

resource "null_resource" "kick-off-homegrown-autoscaler" {
  provisioner "local-exec" {
    command = "./launch-instances-with-profile ${aws_iam_instance_profile.foo.id}"
  }
}

For all of these reasons, I think that even though it's the less efficient solution, it's better to have aws_iam_instance_profile provide the guarantee of resource usability than to have dependent resources tolerate the discrepancy.

I'm willing to revisit if we find it difficult to land on a stable / reasonable sleep time, but can we pursue that first? Happy to do the leg work to cook up a PR implementing it.

(FWIW I also experimented with GetInstanceProfile, and it's true that the profile is immediately available there.)

(p.s. Just thought of a crazy idea that might be worth exploring - the iam_instance_profile resource could block until a dummy LaunchConfiguration creates successfully, at which point it immediately removes it and continues.)

@zollie

zollie commented Jun 24, 2015

Sure, sleep may just work every time :)

Also, to @jedineeper's point, if the AWS API exhibits this behavior in other areas, it would be good to handle it consistently.

@mitchellh
Contributor

Fixed by #2136.

@mtougeron
Contributor

@mitchellh This issue is back in v0.6.14. Examples are given in #2136

@phinze
Contributor

phinze commented Mar 25, 2016

Thanks for the ping @mtougeron - tracking new fix in #5862. 👍

@hposca

hposca commented Dec 26, 2016

I've been fighting with this issue for a month now, and the "local-exec" solution, although not beautiful, solved the problem. During this period I used versions 0.7.7, 0.8.1 and 0.8.2.
As it appears to be an AWS API issue, #9164 looks like an interesting suggestion.

@ghost

ghost commented Apr 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 18, 2020