Use exponential backoff when getting the workflow status. (kubeflow#170)

* We were using a fixed retry window that was too short, O(5 seconds).
* Use exponential backoff and retry for up to 3 minutes. We are seeing test flakes due to problems getting the workflow status.
* Related to kubeflow#169
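
To make the new retry budget concrete, here is a rough sketch of the wait schedule implied by the decorator arguments in this change (1000 ms multiplier, 10000 ms cap, 3 minute limit). It does not call the retry library itself. The function name `backoff_schedule` and its parameters are hypothetical, and it assumes that the wait after attempt x is min(multiplier * 2^x, cap), that retrying stops once the cumulative wait would exceed the limit, and that the time spent inside each attempt is negligible.

```python
def backoff_schedule(multiplier_ms=1000, max_wait_ms=10000,
                     stop_max_delay_ms=3 * 60 * 1000):
    """Return the waits (in ms) slept between attempts under the stated assumptions.

    Hypothetical helper for illustration only: it models the schedule
    described in the code comment (2^x * 1 second, capped at 10 seconds)
    and ignores the time each attempt itself takes.
    """
    waits = []
    elapsed_ms = 0
    attempt = 1
    while True:
        # Exponential growth, clamped to the configured maximum wait.
        wait_ms = min(multiplier_ms * 2 ** attempt, max_wait_ms)
        # Stop once the next wait would push us past the overall limit.
        if elapsed_ms + wait_ms > stop_max_delay_ms:
            break
        waits.append(wait_ms)
        elapsed_ms += wait_ms
        attempt += 1
    return waits


if __name__ == "__main__":
    schedule = backoff_schedule()
    print("first waits (ms):", schedule[:5])
    print("number of waits:", len(schedule))
    print("total wait (s):", sum(schedule) / 1000)
```

Under these assumptions the waits begin at 2, 4, and 8 seconds and then stay at the 10 second cap, so the budget allows far more attempts than the previous three attempts spaced 2 seconds apart.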
Linchin · Jul 6, 2018 · f7c7645
1 parent 2f9a392
commit f7c7645
Showing 1 changed file with 5 additions and 1 deletion.
diff --git a/py/kubeflow/testing/argo_client.py b/py/kubeflow/testing/argo_client.py
@@ -28,7 +28,11 @@ def log_status(workflow):
     logging.exception('KeyError: %s', e)
 
 
-@retry(stop_max_attempt_number=3, wait_fixed=2000,
+# Wait 2^x * 1 second between retries up to a max of 10 seconds between
+# retries.
+# Retry for a maximum of 3 minutes.
+@retry(wait_exponential_multiplier=1000, wait_exponential_max=10000,
+       stop_max_delay=3*60*1000,
        retry_on_exception=lambda e: not isinstance(e, util.TimeoutError))
 def wait_for_workflows(client, namespace, names,
                       timeout=datetime.timedelta(minutes=30),