Commit b04fde0

demartinofra authored and tilne committed
Add retries to change_resource_record_sets Route53 API call
Signed-off-by: Francesco De Martino <fdm@amazon.com>
1 parent f5c3cae commit b04fde0

2 files changed: +9 -1 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ This file is used to list changes made in each version of the aws-parallelcluste
 **CHANGES**
 - Use inclusive language in internal naming convention.
 - Improve error handling in slurm plugin processes when clustermgtd is down.
+- Increase max attempts when retrying on Route53 API call failures.

 2.10.0
 -----

src/slurm_plugin/common.py

Lines changed: 8 additions & 1 deletion
@@ -18,6 +18,7 @@
 from datetime import datetime, timezone

 import boto3
+from botocore.config import Config
 from botocore.exceptions import ClientError

 from common.schedulers.slurm_commands import InvalidNodenameError, parse_nodename, update_nodes
@@ -244,7 +245,13 @@ def _update_dns_hostnames(self, nodes):
 # Submit calls to change_resource_record_sets in batches of 500 elements each.
 # change_resource_record_sets API call has limit of 1000 changes,
 # but the UPSERT action counts for 2 calls
-route53_client = boto3.client("route53", region_name=self._region, config=self._boto3_config)
+# Also pick the number of retries to be the max between the globally configured one and 3.
+# This is done to address Route53 API throttling without changing the configured retries for all API calls.
+configured_retry = self._boto3_config.retries.get("max_attempts", 0) if self._boto3_config.retries else 0
+boto3_config = self._boto3_config.merge(
+    Config(retries={"max_attempts": max([configured_retry, 3]), "mode": "standard"})
+)
+route53_client = boto3.client("route53", region_name=self._region, config=boto3_config)
 changes_batch_size = 500
 for changes_batch in grouper(changes, changes_batch_size):
     route53_client.change_resource_record_sets(
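
The two ideas in this hunk, a floor on the retry count and batching of record changes, can be illustrated with a minimal standard-library sketch. The names `effective_max_attempts` and `batched` are hypothetical and not part of the commit. In particular, `batched` is only a stand-in for the repository's `grouper` helper, whose real behaviour (for example padding or tuple output) may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def effective_max_attempts(configured_retries: Optional[dict], floor: int = 3) -> int:
    """Return the larger of the configured max_attempts and the floor.

    Mirrors the selection in the diff: a missing or empty retries dict
    counts as 0 configured attempts, so the floor wins.
    """
    configured = configured_retries.get("max_attempts", 0) if configured_retries else 0
    return max(configured, floor)


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items.

    Hypothetical stand-in for `grouper`; the last batch may be shorter.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A globally configured value below the floor is raised to 3.
    print(effective_max_attempts({"max_attempts": 1}))
    # A larger configured value is kept as-is.
    print(effective_max_attempts({"max_attempts": 5}))
    # 1200 changes are split into batches no larger than 500.
    print([len(batch) for batch in batched(range(1200), 500)])
```

In the commit itself the selected value is passed to botocore as `Config(retries={"max_attempts": ..., "mode": "standard"})` and merged into the existing client configuration, so only the Route53 client gets the raised retry count.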
