Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spot instance support #2095

Closed
wants to merge 5 commits into from
Closed

Conversation

dalehamel
Copy link

WIP

This is an attempt to close #1339.

I'd like some feedback on the approach I've taken here before I smooth off the rough edges

Initially I thought it would be a good idea to implement spot instances as a separate resources, but since spot instances share almost all of the same attributes as normal on demand instances, this isn't very DRY.

So, the approach I've taken here is to instead add spot related attributes to the aws_instance resource. If 'spot_price' is specified, it will attempt to launch a spot instance. It will fail if the spot instance provisioning fails unless the 'spot_fallback' option is specified, in which case it will just execute the normal on-demand provisioning path.

The code changes here do a few things:

  • Add the necessary resource data for spot instances (spot_price, spot_fallback). I plan on also adding a 'spot_persist' flag, so that the spot instance can be automatically relaunched if the spot instance is preempted by amazon.
  • The provisioning of the on demand instance is now conditional on the request not being for a spot instance or a spot request failing, and the spot_fallback option being set.
  • Spot requests will take advantage of all of the attribute parsing that is done in the runOpts for on demand instances, to keep things DRY.

There are several things that I recognize need to be fixed before this can be merged:

  • delete should delete the spot request
  • all other FIXMEs should be addressed
  • 2 spaces should be converted to tabs to be consistent with the rest of the code

@ketzacoatl @mitchellh @catsby for review

@dalehamel
Copy link
Author

FYI this is also my first time writing any Go code, so please don't be afraid to nitpick for styles / Go idioms I don't know.

@cbednarski
Copy link
Contributor

@dalehamel Can you run go fmt on your checkins? Go uses hard tabs for indentation. The formatting tool will fix it for you. Reading through the rest of it atm.

@dalehamel
Copy link
Author

thanks @cbednarski, thanks handy - done.

Pending: []string{"start", "pending-evaluation", "pending-fulfillment"},
Target: "fulfilled",
Refresh: SpotInstanceStateRefreshFunc(conn, request_id),
Timeout: 10 * time.Minute,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the script may wait 10 minutes for a bid to complete. You can't run terraform apply in parallel so In the mean time, you will be unable to apply other changes to your infrastructure. How likely is it that you'll wait the full 10 minutes? Can this be shorter (e.g. 10-30s)? Or configurable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all comes down to how fast amazon resolves your bid. Typically it's about 5 minutes or less, but I've seen it take as long as half an hour in rare cases.

Making it configurable wouldn't hurt - to be honest though this is just copy pasta. If someone decides they want a spot instance, they should be aware that it will take some time for their bid to resolve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I actually need to wait for the bid to resolve? Or can I just post it and move on, and assume my instances may spin up later? When I read the docs on spot I got the impression that you can post a bid and walk away and AWS will boot the instances for you later if the price drops. Maybe that feature is optional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbednarski that's an interesting proposition.

If we don't wait for the bid to resolve, there's no guarantee that the resource will ever actually be provided, which kind of sucks. On the other hand, it would be nice to be able to put the bid in proceed irregardless of if the bid ever resolves.

What do you think about making this another switch? Something like "spot_wait" to toggle if you should wait for the instance to come up or not. I can see use cases for both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more Terraform-like to just wait for the bid for now. Terraform waits for things to be fully ready at a resource level. A spot instance isn't ready until it exists.

Let's get this merged and then we might want to entertain adding another flag to the resource that just tells Terraform to just wait for the bid and move on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'm working moving spot bids into their own resource right now - i'll make sure that instances wait for the bid to resolve (i had made this a flag)

@dalehamel
Copy link
Author

closing in favor of spot bids as a separate resource, which is provided by #2109. We can reopen this if we decide to back-peddle.

@dalehamel dalehamel closed this May 27, 2015
@dalehamel dalehamel reopened this May 27, 2015
@dalehamel
Copy link
Author

As discussed in #2109, this approach is simpler than a separate resource. I've re-opened the PR and I've done some cleanup and fixed most of the FIXME's

Aside from missing tests (there are none, but I've started to point out the ones I expect to have to write), I think this is now ready for a more serious and thorough code review.

Behavior should be:

  • If a spot_price is specified, the launch strategy is to request a spot instance and block waiting for it to resolve. If it cannot resolve (or times out - note timeout still not configurable), then the resource creation will fail.
  • Deleting the resource should delete the spot request as well - this is particularly important for persistent spot requests, as the instance can and will come back from the dead if the spot request is left around and the instance is terminated
  • "read" should update and store the spot instance status

Usage is like:

resource "aws_instance" "test" {
    ami = "ami-50948838"
    instance_type = "m1.small"
    spot_price = "0.05"
    spot_persist = true
    key_name = "dale.hamel"
    tags {
        Name = "HelloWorld"
    }
}

@cbednarski @mitchellh @catsby for review


if output != nil {
cancelled := output.CancelledSpotInstanceRequests
if len(cancelled) != 1 || *cancelled[0].SpotInstanceRequestID != reqID {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wary of this logic myself. It might cause the stack to fail to delete if the spot instance request wasn't fulfilled correctly (I'm not sure how cancel responds to requests that are already cancelled / in some other bad state)

One option I've considered is to check if the spot instance is actually active, and if it's not, just skip trying to cancel altogether. That seems cleanest / safest to me.

What do you guys think we should do here?

@dalehamel
Copy link
Author

FYI i tried writing some tests... but it's very difficult given that the tests I saw assume an instance has already been launched, and there's no way to access the resource scheme from the tests as far as I can tell.

I also read that writing acceptance tests are not required to accept the PR, so maybe the manual testing I have done is sufficient?

If that's the case, I guess we're down to just nitpicking the code?

@dalehamel dalehamel changed the title WIP spot instances Spot instance support May 28, 2015
@dalehamel
Copy link
Author

Anything needed from me to help get this reviewed and merged?

@mitchellh
Copy link
Contributor

Nope we'll do so today :)

@dalehamel
Copy link
Author

Nope we'll do so today :)

\o/

@ketzacoatl
Copy link
Contributor

super exciting, great work @dalehamel!!

@dalehamel
Copy link
Author

Thanks!

I do feel weird not writing any tests for this though.

I would love to pair with someone more familiar with the testing framework
and the project to submit some tests to make sure that all cases and states
are covered

On Thursday, May 28, 2015, ketzacoatl notifications@github.com wrote:

super exciting, great work @dalehamel https://github.com/dalehamel!!


Reply to this email directly or view it on GitHub
#2095 (comment).

@dalehamel
Copy link
Author

bump - any movement on this?

@dalehamel
Copy link
Author

sorry to keep bumping guys, we'd like to ship this internally and don't want to have to fork terraform.

Is there anything further I can do to facilitate the review and help get this 🚢 💨 ?

cc @mitchellh

@phinze
Copy link
Contributor

phinze commented Jun 3, 2015

Hi @dalehamel - sorry for the radio silence here - bump received! We'll get this moving today one way or another. 👍

@dalehamel
Copy link
Author

✨ ❤️ ✨

@catsby
Copy link
Contributor

catsby commented Jun 3, 2015

Hey @dalehamel I'm sorry for the silence here, it's my fault. I'm re-reviewing this and your other PR just to make sure I grok things.

In the midsts of researching all this (I admit spot instances are new-ish to me), I found that you can use an Auto Scaling Group to launch spot instances:

Terraform has support for the spot_price field in Launch Configurations used by ASGs (sadly undocumented, but it's there). On the surface, to me this seems a more viable route for maintaining spot instances.

You seem to be well versed here, can you help me understand if AGSs+LC with spot prices is not a good route, or at least the specifics why they are not a viable option for your use case?

Thanks!

@dalehamel
Copy link
Author

You seem to be well versed here, can you help me understand if AGSs+LC with spot prices is not a good route, or at least the specifics why they are not a viable option for your use case?

@catsby yeah, i was already aware of the ASG support and it's great to see that exists.

Unfortunately, I do think that there's a very valid separate use case for supporting standalone spot instances that is not supported by the autoscaling groups.

The biggest one I can think of is simplicity, if you want to have a very simple stack you shouldn't have to add an autoscaling group with one spot instance - that's just a hack. It's an additional level of indirection.

For longer running instances that just happen to be spot, where you only ever want one of them, creating an autoscaling group is a silly complication, and extra layer of indirection.

@ketzacoatl
Copy link
Contributor

@catsby, as a heavy user of ASG and AWS instances in Terraform, but also needing to use spot pricing.. I agree with @dalehamel on this.. it makes the most sense for both resource types (aws instance and asg) to support spot pricing.

Thank you for your work to review this PR!

@dalehamel
Copy link
Author

@ketzacoatl just to be sure you are aware, you can already use spot pricing with ASGs in terraform. (I think that was linked to in the issue i referenced above).

Can you clarify that you also need individual instances to have spot prices?

@ketzacoatl
Copy link
Contributor

Yes @dalehamel, I can confirm wanting spot prices for individual instances. I think spot pricing should be supported on both resources. I was happy to find out about the existing support for ASG, and I'm excited about (and grateful for) your work here. thanks!

@phinze
Copy link
Contributor

phinze commented Jun 7, 2015

Hi @dalehamel - just pushed #2263 with a take on this that incorporates your code along with the latest from the core team's discussion about how best to model this.

Since we've again shifted to "additional resource is better" - my idea there was to pick up the coding work instead of asking you to yet again change tacks.

Have a look and let me know what you think!

phinze added a commit that referenced this pull request Jun 7, 2015
This is an iteration on the great work done by @dalehamel in PRs #2095
and #2109.

The core team went back and forth on how to best model Spot Instance
Requests, requesting and then rejecting a separate-resource
implementation in #2109.

After more internal discussion, we landed once again on a separate
resource to model Spot Instance Requests. Out of respect for
@dalehamel's already-significant donated time, with this I'm attempting
to pick up the work to take this across the finish line.

Important architectural decisions represented here:

 * Spot Instance Requests are always of type "persistent", to properly
   match Terraform's declarative model.
 * The spot_instance_request resource exports several attributes that
   are expected to be constantly changing as the spot market changes:
   spot_bid_status, spot_request_state, and instance_id. Creating
   additional resource dependencies based on these attributes is not
   recommended, as Terraform diffs will be continually generated to keep
   up with the live changes.
 * When a Spot Instance Request is deleted/canceled, an attempt is made
   to terminate the last-known attached spot instance. Race conditions
   dictate that this attempt cannot guarantee that the associated spot
   instance is terminated immediately.

Implementation notes:

 * This version of aws_spot_instance_request borrows a lot of common
   code from aws_instance.
 * In order to facilitate borrowing, we introduce `awsInstanceOpts`, an
   internal representation of instance details that's meant to be shared
   between resources. The goal here would be to refactor ASG Launch
   Configurations to use the same struct.
 * The new aws_spot_instance_request acc. test is passing.
 * All aws_instance acc. tests remain passing.
@phinze
Copy link
Contributor

phinze commented Jun 9, 2015

Closing this since we finished it up in #2263 - thanks again for all your hard work on this @dalehamel!

@phinze phinze closed this Jun 9, 2015
@dalehamel
Copy link
Author

👏

@ghost
Copy link

ghost commented May 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators May 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support EC2 spot instances
6 participants