EC2 vCPU-based On-Demand Instance Limits #432

Closed
emilburzo opened this issue Sep 30, 2019 · 11 comments

Comments

@emilburzo

emilburzo commented Sep 30, 2019

It looks like AWS is changing how EC2 On-Demand limits work:

On-Demand Instance usage toward the vCPU-based limit is measured in terms of the number of virtual central processing units (vCPUs) attached to your running instances, making it easier to take advantage of Amazon EC2’s broad selection of Instance Types. In addition, there are only five different On-Demand Instance limits

(source) (faq)

The change is opt-in until October 24, 2019; after that date, all accounts will switch to vCPU-based limits regardless of the account’s opt-in status.

Are there plans / is it possible to support this new style in awslimitchecker?
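To make the quoted change concrete, here is a minimal sketch of how usage might be tallied under vCPU-based limits. The five buckets (Standard, F, G, P, X) follow the AWS announcement as I understand it; the `VCPUS` table is a small illustrative subset, and `limit_bucket` and `vcpu_usage` are hypothetical helper names, not awslimitchecker code.

```python
from collections import defaultdict

# Illustrative subset of instance type -> vCPU count. A real tool would
# look these up rather than hard-code them.
VCPUS = {"m5.large": 2, "c5.xlarge": 4, "p3.2xlarge": 8, "x1.16xlarge": 64}

# Assumed mapping from the first letter of the instance family to one of
# the five vCPU-based limits.
STANDARD_FAMILIES = set("acdhimrtz")
SPECIAL_FAMILIES = {"f": "F", "g": "G", "p": "P", "x": "X"}


def limit_bucket(instance_type):
    """Return the name of the vCPU limit bucket an instance type counts toward."""
    first = instance_type[0].lower()
    if first in SPECIAL_FAMILIES:
        return SPECIAL_FAMILIES[first]
    if first in STANDARD_FAMILIES:
        return "Standard"
    raise ValueError("unknown instance family: %s" % instance_type)


def vcpu_usage(running_types):
    """Sum vCPUs per limit bucket for an iterable of running instance types."""
    usage = defaultdict(int)
    for instance_type in running_types:
        usage[limit_bucket(instance_type)] += VCPUS[instance_type]
    return dict(usage)
```

With this sketch, two `m5.large` instances and one `c5.xlarge` count as 8 vCPUs against the single Standard limit, instead of counting against two separate per-type limits.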

@jantman
Owner

jantman commented Oct 9, 2019

@emilburzo I was on vacation for the last two weeks, which seems to be when the notice about this came out, and I only found out about it yesterday. It looks like I'm going to need to plan a day or two of awslimitchecker work some time in the near future.

The short answer is: yes, I'll definitely need to do the work to support this. My gut reaction is that, assuming I can get the work done in time, I'll be releasing a new major version (8.0.0) to support the vCPU-based limits, and users shouldn't update to that version until they have either opted in or October 24 has passed. Given the short time frame, it doesn't seem worth adding all the code to support both types of limits, and awslimitchecker's versioning policy and install docs are pretty clear about not blindly upgrading to the latest version.

I may need some assistance in testing this though, as I'm not sure that I have an account with Trusted Advisor that I can safely opt in.

On the positive side, they mention CloudWatch Metrics for checking usage and limits... if those are reliable, that would greatly simplify the awslimitchecker code, if we could just grab the most recent values for some CW metrics.

It's also definitely notable that the FAQ says they'll be automatically increasing limits based on usage, which makes it sound like hitting these limits should be less common in the future.
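As a rough illustration of the CloudWatch idea above, the sketch below grabs the most recent usage datapoint. It assumes the usage metrics live in the `AWS/Usage` namespace under the `ResourceCount` metric with the dimensions shown; those values come from my reading of the AWS docs, not from this thread, and `latest_vcpu_usage` is a hypothetical helper. The `cloudwatch` argument is expected to behave like `boto3.client("cloudwatch")`.

```python
from datetime import datetime, timedelta, timezone


def latest_vcpu_usage(cloudwatch, limit_class="Standard/OnDemand"):
    """Return the newest vCPU usage datapoint, or None if there are none.

    The namespace, metric name, and dimensions are assumptions about how
    AWS publishes EC2 vCPU usage; verify them against your own account.
    """
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Usage",
        MetricName="ResourceCount",
        Dimensions=[
            {"Name": "Service", "Value": "EC2"},
            {"Name": "Type", "Value": "Resource"},
            {"Name": "Resource", "Value": "vCPU"},
            {"Name": "Class", "Value": limit_class},
        ],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=["Maximum"],
    )
    points = resp.get("Datapoints", [])
    if not points:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest one.
    newest = max(points, key=lambda point: point["Timestamp"])
    return newest["Maximum"]
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise with a stub before pointing it at a real account.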

@jantman
Owner

jantman commented Oct 9, 2019

Update: Eek! In the FAQ:

Starting September 24, 2019, vCPU-based instance limits will be available in all commercial AWS Regions except the AWS China (Beijing and Ningxia) Regions.

So... this greatly complicates things, as it seems to indicate that we'll need two completely separate code paths with separate sets of limits: one with vCPU limits for the "normal" partitions, and one with the current code/limits for China, GovCloud, etc.

@emilburzo
Author

> I may need some assistance in testing this though, as I'm not sure that I have an account with Trusted Advisor that I can safely opt-in.

Sure. I won't be around next week, but after that I can help with testing this.

> It's also definitely notable that the FAQ says they'll be automatically increasing limits based on usage, which makes it sound like hitting these limits should be less common in the future.

🤞 that they do that and it works as expected.

> So... this greatly complicates things, as it seems to indicate that we'll need two completely separate code paths with separate sets of limits: one with vCPU limits for the "normal" partitions, and one with the current code/limits for China, GovCloud, etc.

Damn, I was hoping it would simplify things in the end.

@mouellet-coveo

@jantman, I'm already running awslimitchecker and I have an account that is already opted in. Let me know if I can help test it.

@jantman
Owner

jantman commented Oct 28, 2019

Many apologies for the delay in implementing this... it's occasionally difficult to balance life and maintaining an open source project in my spare time.

For some reason, I'd remembered the date as October 30, not October 24. I'm sincerely sorry if this breaks things for anyone (though I'm not sure how quickly Trusted Advisor will be updated, and given the lack of an API to retrieve EC2 service limits, I'm hoping it won't).

I'm going to do my best to get this at least fixed in a branch, if not released, within the next few days, though I do have some personal commitments that may push it back to mid- or late-week.

One thing that's going to complicate this a bit is that awslimitchecker needs to know all of the limits and their default values before connecting to any APIs (though it will know the region name). This is the first time that the limits themselves can differ by region/partition, and I'm not quite sure how I'm going to handle it yet.

The easiest option for users would probably be to accept possible breakage over the period from the release date until November 7th, and to hard-code behavior based on the region name beginning with cn- or us-gov- (perhaps with an environment variable to override this behavior). The alternative would be to expose all limits and only set data for the ones that are in use in the current region, but that would likely be very confusing to users and would really complicate things for anyone using awslimitchecker via the Python API.
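The region-prefix idea could look roughly like the sketch below. This is only an illustration of the approach described in this comment, not the code that shipped; the function name `use_vcpu_limits` and the environment variable `USE_VCPU_LIMITS` are hypothetical.

```python
import os

# Hypothetical override variable; not an existing awslimitchecker setting.
OVERRIDE_ENV_VAR = "USE_VCPU_LIMITS"

# Region name prefixes assumed to still use the per-instance-type limits.
NON_VCPU_PREFIXES = ("cn-", "us-gov-")


def use_vcpu_limits(region_name, environ=None):
    """Decide which set of EC2 limits applies, using only the region name.

    An explicit environment override wins; otherwise fall back to the
    hard-coded region prefixes.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(OVERRIDE_ENV_VAR)
    if override is not None:
        return override.strip().lower() in ("1", "true", "yes")
    return not region_name.startswith(NON_VCPU_PREFIXES)
```

Because the decision needs nothing but the region name, it can be made before any API connection, which is the constraint mentioned above.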

@jantman
Owner

jantman commented Oct 29, 2019

So... I'm working on implementing this, but I've hit a bit of a roadblock: in the past, when Running On-Demand Instances limits were based on instance type, we could easily look up our Reserved Instances and subtract them from the count of running instances.

I'm not quite sure how this is going to work with vCPU-based limits... RIs don't count against your Running On-Demand Instances limit, but there's no clear way to match up type-specific RIs with vCPU-based limits.

I'm going to try to reach out to support on one of my accounts and see if I can get an explanation.

@jantman
Owner

jantman commented Oct 30, 2019

@mouellet-coveo @emilburzo I believe I have the code ready for this in the issues/432 branch, if you'd be willing to test it. There are currently 2 caveats:

  1. I've just guessed at how Reserved Instances work with these new limits, since AWS hasn't confirmed yet. Please see the relevant section of the Changelog.
  2. I haven't implemented Trusted Advisor support for the new limits, since I'm planning on finally working on #413 (Retrieve Current Limits from Service Quotas) to implement Service Quotas support. As a result, awslimitchecker will only know about the default limit values for the new vCPU-based limits unless you manually override them, until the work for #413 is done.

jantman added a commit that referenced this issue Oct 30, 2019
jantman added a commit that referenced this issue Oct 30, 2019
Issues/432 - vCPU based EC2 limits
@jantman
Owner

jantman commented Oct 30, 2019

Regarding reserved instances, I've confirmed with AWS that my calculations are correct.

The fix for this has been merged in #440 and will be present in the next release.

@jantman
Owner

jantman commented Nov 4, 2019

This has been fixed in 8.0.0 which is now live on PyPI and Docker Hub. Apologies for any delays in getting this released.

@jantman jantman closed this as completed Nov 4, 2019
@bergkampsliew
Contributor

Thanks so much for the fix!

I just tried it and I'm getting the following console output. I'm not sure whether it is a bug or expected behavior (it didn't halt execution).

WARNING:awslimitchecker.quotas:Attempted to retrieve Service Quotas for service code iam but received NoSuchResourceException

I also ran the command below and got a similar result.

aws service-quotas list-service-quotas --service-code iam
An error occurred (NoSuchResourceException) when calling the ListServiceQuotas operation: The request failed because the specified service does not exist.

@jantman
Owner

jantman commented Nov 4, 2019

@bergkampsliew I'm not quite sure why, but it seems that Service Quotas only reports certain limits - including iam - in certain regions (us-east-1 in this case).

That warning should be safe to ignore, especially since IAM also provides that information via its own API... I just haven't had this in the wild long enough to decide whether it can be silently ignored or not.
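For anyone calling Service Quotas directly, one way to tolerate this is sketched below: treat `NoSuchResourceException` as "no quota data for this service in this region" and fall back to other sources. This is an assumption about reasonable handling, not the awslimitchecker implementation; `quotas_for_service` is a hypothetical helper, and `client` is expected to behave like `boto3.client("service-quotas")`.

```python
import logging

logger = logging.getLogger(__name__)


def quotas_for_service(client, service_code):
    """Return {quota name: value} for a service, or {} if it is unavailable."""
    quotas = {}
    kwargs = {"ServiceCode": service_code}
    while True:
        try:
            resp = client.list_service_quotas(**kwargs)
        except client.exceptions.NoSuchResourceException:
            # The service is not reported in this region; warn and let the
            # caller fall back to defaults or the service's own API.
            logger.warning(
                "Service Quotas has no data for service code %s", service_code
            )
            return {}
        for quota in resp.get("Quotas", []):
            quotas[quota["QuotaName"]] = quota["Value"]
        token = resp.get("NextToken")
        if not token:
            return quotas
        # More pages remain; request the next one.
        kwargs["NextToken"] = token
```

Returning an empty dict rather than raising keeps a missing service from halting the whole run, which matches the behavior reported above.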
