-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spot instance support #2095
Spot instance support #2095
Conversation
FYI this is also my first time writing any Go code, so please don't be afraid to nitpick for styles / Go idioms I don't know. |
@dalehamel Can you run |
thanks @cbednarski, thanks handy - done. |
Pending: []string{"start", "pending-evaluation", "pending-fulfillment"}, | ||
Target: "fulfilled", | ||
Refresh: SpotInstanceStateRefreshFunc(conn, request_id), | ||
Timeout: 10 * time.Minute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the script may wait 10 minutes for a bid to complete. You can't run terraform apply
in parallel so In the mean time, you will be unable to apply other changes to your infrastructure. How likely is it that you'll wait the full 10 minutes? Can this be shorter (e.g. 10-30s)? Or configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all comes down to how fast amazon resolves your bid. Typically it's about 5 minutes or less, but I've seen it take as long as half an hour in rare cases.
Making it configurable wouldn't hurt - to be honest though this is just copy pasta. If someone decides they want a spot instance, they should be aware that it will take some time for their bid to resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I actually need to wait for the bid to resolve? Or can I just post it and move on, and assume my instances may spin up later? When I read the docs on spot I got the impression that you can post a bid and walk away and AWS will boot the instances for you later if the price drops. Maybe that feature is optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbednarski that's an interesting proposition.
If we don't wait for the bid to resolve, there's no guarantee that the resource will ever actually be provided, which kind of sucks. On the other hand, it would be nice to be able to put the bid in proceed irregardless of if the bid ever resolves.
What do you think about making this another switch? Something like "spot_wait" to toggle if you should wait for the instance to come up or not. I can see use cases for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more Terraform-like to just wait for the bid for now. Terraform waits for things to be fully ready at a resource level. A spot instance isn't ready until it exists.
Let's get this merged and then we might want to entertain adding another flag to the resource that just tells Terraform to just wait for the bid and move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'm working moving spot bids into their own resource right now - i'll make sure that instances wait for the bid to resolve (i had made this a flag)
closing in favor of spot bids as a separate resource, which is provided by #2109. We can reopen this if we decide to back-peddle. |
As discussed in #2109, this approach is simpler than a separate resource. I've re-opened the PR and I've done some cleanup and fixed most of the FIXME's Aside from missing tests (there are none, but I've started to point out the ones I expect to have to write), I think this is now ready for a more serious and thorough code review. Behavior should be:
Usage is like:
@cbednarski @mitchellh @catsby for review |
|
||
if output != nil { | ||
cancelled := output.CancelledSpotInstanceRequests | ||
if len(cancelled) != 1 || *cancelled[0].SpotInstanceRequestID != reqID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little wary of this logic myself. It might cause the stack to fail to delete if the spot instance request wasn't fulfilled correctly (I'm not sure how cancel responds to requests that are already cancelled / in some other bad state)
One option I've considered is to check if the spot instance is actually active, and if it's not, just skip trying to cancel altogether. That seems cleanest / safest to me.
What do you guys think we should do here?
FYI i tried writing some tests... but it's very difficult given that the tests I saw assume an instance has already been launched, and there's no way to access the resource scheme from the tests as far as I can tell. I also read that writing acceptance tests are not required to accept the PR, so maybe the manual testing I have done is sufficient? If that's the case, I guess we're down to just nitpicking the code? |
Anything needed from me to help get this reviewed and merged? |
Nope we'll do so today :) |
\o/ |
super exciting, great work @dalehamel!! |
Thanks! I do feel weird not writing any tests for this though. I would love to pair with someone more familiar with the testing framework On Thursday, May 28, 2015, ketzacoatl notifications@github.com wrote:
|
bump - any movement on this? |
sorry to keep bumping guys, we'd like to ship this internally and don't want to have to fork terraform. Is there anything further I can do to facilitate the review and help get this 🚢 💨 ? cc @mitchellh |
Hi @dalehamel - sorry for the radio silence here - bump received! We'll get this moving today one way or another. 👍 |
✨ ❤️ ✨ |
Hey @dalehamel I'm sorry for the silence here, it's my fault. I'm re-reviewing this and your other PR just to make sure I grok things. In the midsts of researching all this (I admit spot instances are new-ish to me), I found that you can use an Auto Scaling Group to launch spot instances: Terraform has support for the You seem to be well versed here, can you help me understand if AGSs+LC with spot prices is not a good route, or at least the specifics why they are not a viable option for your use case? Thanks! |
@catsby yeah, i was already aware of the ASG support and it's great to see that exists. Unfortunately, I do think that there's a very valid separate use case for supporting standalone spot instances that is not supported by the autoscaling groups. The biggest one I can think of is simplicity, if you want to have a very simple stack you shouldn't have to add an autoscaling group with one spot instance - that's just a hack. It's an additional level of indirection. For longer running instances that just happen to be spot, where you only ever want one of them, creating an autoscaling group is a silly complication, and extra layer of indirection. |
@catsby, as a heavy user of ASG and AWS instances in Terraform, but also needing to use spot pricing.. I agree with @dalehamel on this.. it makes the most sense for both resource types (aws instance and asg) to support spot pricing. Thank you for your work to review this PR! |
@ketzacoatl just to be sure you are aware, you can already use spot pricing with ASGs in terraform. (I think that was linked to in the issue i referenced above). Can you clarify that you also need individual instances to have spot prices? |
Yes @dalehamel, I can confirm wanting spot prices for individual instances. I think spot pricing should be supported on both resources. I was happy to find out about the existing support for ASG, and I'm excited about (and grateful for) your work here. thanks! |
Hi @dalehamel - just pushed #2263 with a take on this that incorporates your code along with the latest from the core team's discussion about how best to model this. Since we've again shifted to "additional resource is better" - my idea there was to pick up the coding work instead of asking you to yet again change tacks. Have a look and let me know what you think! |
This is an iteration on the great work done by @dalehamel in PRs #2095 and #2109. The core team went back and forth on how to best model Spot Instance Requests, requesting and then rejecting a separate-resource implementation in #2109. After more internal discussion, we landed once again on a separate resource to model Spot Instance Requests. Out of respect for @dalehamel's already-significant donated time, with this I'm attempting to pick up the work to take this across the finish line. Important architectural decisions represented here: * Spot Instance Requests are always of type "persistent", to properly match Terraform's declarative model. * The spot_instance_request resource exports several attributes that are expected to be constantly changing as the spot market changes: spot_bid_status, spot_request_state, and instance_id. Creating additional resource dependencies based on these attributes is not recommended, as Terraform diffs will be continually generated to keep up with the live changes. * When a Spot Instance Request is deleted/canceled, an attempt is made to terminate the last-known attached spot instance. Race conditions dictate that this attempt cannot guarantee that the associated spot instance is terminated immediately. Implementation notes: * This version of aws_spot_instance_request borrows a lot of common code from aws_instance. * In order to facilitate borrowing, we introduce `awsInstanceOpts`, an internal representation of instance details that's meant to be shared between resources. The goal here would be to refactor ASG Launch Configurations to use the same struct. * The new aws_spot_instance_request acc. test is passing. * All aws_instance acc. tests remain passing.
Closing this since we finished it up in #2263 - thanks again for all your hard work on this @dalehamel! |
👏 |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
WIP
This is an attempt to close #1339.
I'd like some feedback on the approach I've taken here before I smooth off the rough edges
Initially I thought it would be a good idea to implement spot instances as a separate resources, but since spot instances share almost all of the same attributes as normal on demand instances, this isn't very DRY.
So, the approach I've taken here is to instead add spot related attributes to the aws_instance resource. If 'spot_price' is specified, it will attempt to launch a spot instance. It will fail if the spot instance provisioning fails unless the 'spot_fallback' option is specified, in which case it will just execute the normal on-demand provisioning path.
The code changes here do a few things:
There are several things that I recognize need to be fixed before this can be merged:
@ketzacoatl @mitchellh @catsby for review