Possible regression? Cluster Launch With More than 20 instances hits AWS Rate Limits #340

Closed
rlaabs opened this issue Apr 23, 2021 · 4 comments · Fixed by #342

Comments

@rlaabs
Contributor

rlaabs commented Apr 23, 2021

When launching a cluster (I tried m5 and r4 instances) with more than about 20 instances, the following error is raised:

botocore.exceptions.ClientError: An error occurred (RequestLimitExceeded) when calling the DescribeSubnets operation (reached max retries: 4): Request limit exceeded.

From the traceback, it looks like this may be caused by the changes in #296?

  • Flintrock version: 2.0.0.dev0
  • Python version: 3.9
  • OS: macOS

Traceback:

Traceback (most recent call last):
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/bin/flintrock", line 8, in <module>
    sys.exit(main())
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/flintrock.py", line 1247, in main
    cli(obj={})
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/flintrock.py", line 486, in launch
    cluster = ec2.launch(
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/ec2.py", line 54, in wrapper
    res = func(*args, **kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/ec2.py", line 983, in launch
    provision_cluster(
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/core.py", line 714, in provision_cluster
    run_against_hosts(partial_func=partial_func, hosts=hosts)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/core.py", line 510, in run_against_hosts
    future.result()
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/core.py", line 775, in provision_node
    service.configure(
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/services.py", line 222, in configure
    mapping=generate_template_mapping(
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/core.py", line 458, in generate_template_mapping
    'master_ip': cluster.master_ip,
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/ec2.py", line 90, in master_ip
    if self.private_network:
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/flintrock/ec2.py", line 86, in private_network
    return not ec2.Subnet(self.master_instance.subnet_id).map_public_ip_on_launch
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/boto3/resources/factory.py", line 339, in property_loader
    self.load()
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/botocore/client.py", line 276, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/robertlaabs/opt/miniconda3/envs/flintrock_test/lib/python3.9/site-packages/botocore/client.py", line 586, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (RequestLimitExceeded) when calling the DescribeSubnets operation (reached max retries: 4): Request limit exceeded.
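
The traceback suggests why the failure scales with cluster size: run_against_hosts provisions every node in a thread pool, and each provision_node call ends up resolving cluster.master_ip, which reads the uncached private_network property and so sends its own DescribeSubnets request. The sketch below mimics that fan-out with a hypothetical FakeCluster class and simplified, hypothetical versions of provision_node and run_against_hosts; a counter stands in for the real boto3 call, so no AWS access is involved.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeCluster:
    """Hypothetical stand-in for Flintrock's EC2 cluster object."""

    def __init__(self):
        self.api_calls = 0
        self._lock = threading.Lock()

    def describe_subnet(self):
        # Stands in for ec2.Subnet(...).map_public_ip_on_launch, which
        # sends one DescribeSubnets request each time it is evaluated.
        with self._lock:
            self.api_calls += 1
        return True  # pretend the subnet maps public IPs on launch

    @property
    def private_network(self):
        # Uncached, like the original property: every read is a new request.
        return not self.describe_subnet()


def provision_node(cluster, host):
    # Simplified: each node's service configuration reads the property.
    return host, cluster.private_network


def run_against_hosts(cluster, hosts):
    # Simplified version of the thread-pool fan-out seen in the traceback.
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(provision_node, cluster, h) for h in hosts]
        return [f.result() for f in futures]


cluster = FakeCluster()
results = run_against_hosts(cluster, [f"host-{i}" for i in range(25)])
print(cluster.api_calls)  # 25
```

Each extra node adds at least one more request, all issued within a short window, which is consistent with EC2 API throttling kicking in once the cluster grows past a certain size.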
@rlaabs changed the title from "Possible regression? Cluster Lunch With More than 20 instances hits AWS Rate Limits" to "Possible regression? Cluster Launch With More than 20 instances hits AWS Rate Limits" on Apr 23, 2021
@nchammas
Owner

Thanks for the report. I suspect the problem is the repeated calls to ec2.Subnet():

@property
def private_network(self):
    ec2 = boto3.resource(service_name='ec2', region_name=self.region)
    return not ec2.Subnet(self.master_instance.subnet_id).map_public_ip_on_launch

I believe @luhhujbb warned that this would happen. I think a fix worth exploring would be to decorate private_network() with @lru_cache. (There are newer alternatives available like @cached_property, but that requires Python 3.8+. Flintrock currently supports 3.6+.)
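
A minimal sketch of the @lru_cache idea, assuming a hypothetical CachedCluster class whose counter stands in for the DescribeSubnets request. Stacking @property on top of @functools.lru_cache memoizes the getter keyed on self, so repeated reads on the same object trigger a single lookup.

```python
import functools


class CachedCluster:
    """Hypothetical cluster with a memoized private_network property."""

    def __init__(self):
        self.api_calls = 0

    def describe_subnet(self):
        # Stands in for the DescribeSubnets request.
        self.api_calls += 1
        return True

    # Order matters: @property must be outermost so that attribute access
    # calls the lru_cache-wrapped getter, which is keyed on `self`.
    @property
    @functools.lru_cache(maxsize=None)
    def private_network(self):
        return not self.describe_subnet()


cluster = CachedCluster()
for _ in range(25):
    _ = cluster.private_network
print(cluster.api_calls)  # 1
```

One caveat for the thread-pool case: lru_cache does not block concurrent cache misses, so threads that read the property before the first lookup has finished may each still call the underlying function. Reading the property once before fanning out to the hosts would sidestep that.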

@luhhujbb
Contributor

Hi, it seems that a @cached_property compatible with Python 3.6 is available as a third-party package: cached-property

@nchammas
Owner

That's neat. I would prefer to stick to the standard library, but if there is a problem combining @property with @lru_cache then cached-property would be a good way to go.
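
One way to honor the standard-library preference while still supporting 3.6 and 3.7 is a guarded import: use functools.cached_property where it exists (Python 3.8+) and fall back to the cached-property package otherwise. This is a sketch under the assumption that the package is installed on older interpreters (it exposes its decorator as cached_property in the cached_property module); the Cluster class and its counter are hypothetical.

```python
try:
    # Standard library, Python 3.8+.
    from functools import cached_property
except ImportError:  # Python 3.6 / 3.7
    # Third-party package: pip install cached-property
    from cached_property import cached_property


class Cluster:
    """Hypothetical cluster; the counter stands in for DescribeSubnets."""

    def __init__(self):
        self.api_calls = 0

    @cached_property
    def private_network(self):
        # Computed on first access, then stored in the instance __dict__.
        self.api_calls += 1
        return False


cluster = Cluster()
_ = cluster.private_network
_ = cluster.private_network
print(cluster.api_calls)  # 1
```

Unlike lru_cache, both variants store the computed value on the instance itself, so the cached result is released together with the cluster object.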

@luhhujbb
Contributor

luhhujbb commented May 3, 2021

@nchammas It seems OK to combine @property and @functools.lru_cache; cf. https://stackoverflow.com/questions/4037481/caching-class-attributes-in-python
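
A detail worth knowing about this combination: the cache lives on the decorated function, not on each instance. It is keyed by self, so every cluster object still gets its own entry, but the cache also holds references to those objects until they are evicted or cache_clear() is called. Because property keeps the wrapped getter as fget, the cache statistics stay reachable. The Cluster class and subnet IDs below are hypothetical.

```python
import functools


class Cluster:
    """Hypothetical cluster whose property is memoized with lru_cache."""

    def __init__(self, subnet_id):
        self.subnet_id = subnet_id
        self.api_calls = 0

    @property
    @functools.lru_cache(maxsize=None)
    def private_network(self):
        # Stands in for one DescribeSubnets request per distinct instance.
        self.api_calls += 1
        return self.subnet_id.startswith("private")


a = Cluster("private-subnet")
b = Cluster("public-subnet")
for _ in range(3):
    _ = a.private_network
    _ = b.private_network

print(a.api_calls, b.api_calls)  # 1 1
# The property's getter is the lru_cache wrapper, so its stats are visible.
info = Cluster.private_network.fget.cache_info()
print(info.hits, info.misses, info.currsize)  # 4 2 2
```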
