feat(analysis): Add Dry-Run Mode #1627
Codecov Report
@@ Coverage Diff @@
## master #1627 +/- ##
==========================================
+ Coverage 81.97% 82.07% +0.09%
==========================================
Files 116 116
Lines 15929 16064 +135
==========================================
+ Hits 13058 13184 +126
- Misses 2201 2208 +7
- Partials 670 672 +2
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
I think this is the last change request, about the status type structure. After that, I think this will be good to go.
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
Really great feature. Thank you!
@jessesuen amazing feature!
Background
We are now using Argo Rollouts in production and have a few additional requirements before we can extend it to all our backend services. One of the top requirements on our checklist is the ability to evaluate new analyses for their real behavior before actually rolling back based on those analyses. One possible solution would be to add a Dry-Run mode to the AnalysisTemplate so that, after analyzing the metrics marked as Dry-Run, the controller does not trigger a rollback or failure.
Use Cases
When we add a new M3/PROM metric to the AnalysisTemplate, a Dry-Run mode can help us identify issues (if any) in our metrics and resolve them before we start relying on them to make actual rollout decisions.
For metrics marked as Dry-Run, we need a mechanism to answer the question "How would Argo have evaluated this query?" while explicitly taking no action.
Changes
In this PR, we are adding a Dry-Run mode at the per-metric level.
dryRun can be used on a metric to control whether or not that metric is evaluated in dry-run mode. A metric running in dry-run mode won't impact the final state of the rollout or experiment, even if it fails or its evaluation comes out as inconclusive.
For example, consider an analysis that queries Prometheus every 5 minutes to get the total number of 4XX and 5XX errors, with the 5XX metric running in dry-run mode: even if the evaluation of the metric that monitors the 5XX error rate fails, the analysis run will pass.
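The rule above can be illustrated with a small Python model. This is only a sketch of the described behavior, not the controller's actual code: the MetricResult type, the analysis_run_phase function, the phase strings, and the assumption that a failure outranks an inconclusive result are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical phase names for this sketch; not taken from the controller.
SUCCESSFUL, FAILED, INCONCLUSIVE = "Successful", "Failed", "Inconclusive"


@dataclass
class MetricResult:
    name: str
    phase: str
    dry_run: bool = False  # True if the metric was evaluated in dry-run mode


def analysis_run_phase(results):
    """Return the overall phase, ignoring metrics that ran in dry-run mode."""
    effective = [r.phase for r in results if not r.dry_run]
    # Assumption: a failed metric outranks an inconclusive one.
    if FAILED in effective:
        return FAILED
    if INCONCLUSIVE in effective:
        return INCONCLUSIVE
    return SUCCESSFUL


results = [
    MetricResult("total-4xx-errors", SUCCESSFUL),
    MetricResult("total-5xx-errors", FAILED, dry_run=True),
]
print(analysis_run_phase(results))
```

In this model the failed total-5xx-errors metric is still evaluated and recorded, but it is filtered out before the overall phase is decided, which is the "evaluate but take no action" behavior the description asks for.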
A wildcard '*' can be used to make all the metrics run in dry-run mode. In that case, even if one or both of the metrics above fail, the analysis run will pass.
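A minimal sketch of how a list of dry-run entries, including the '*' wildcard, might select metrics. It assumes shell-style matching via Python's fnmatch module; the controller's real matching rules are not stated here, and the dry_run_metrics function is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def dry_run_metrics(metric_names, dry_run_patterns):
    """Return the metric names selected for dry-run mode by any pattern."""
    return [
        name
        for name in metric_names
        if any(fnmatchcase(name, pattern) for pattern in dry_run_patterns)
    ]


metrics = ["total-4xx-errors", "total-5xx-errors"]

# A single named metric runs in dry-run mode.
print(dry_run_metrics(metrics, ["total-5xx-errors"]))

# The wildcard selects every metric.
print(dry_run_metrics(metrics, ["*"]))
```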
Dry-Run Summary
If one or more metrics are running in dry-run mode, a summary of the dry-run results is appended to the analysis run message. Assuming that the total-4xx-errors metric fails in the above example but the total-5xx-errors metric succeeds, the final dry-run summary will report one failure and one success.
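The counting behind such a summary can be sketched as follows. The output format, the separator, and the with_dry_run_summary helper are hypothetical and chosen only for illustration; the exact text produced by the controller is not reproduced here.

```python
from collections import Counter


def with_dry_run_summary(message, dry_run_phases):
    """Append a count of dry-run results per phase to an analysis run message.

    dry_run_phases maps metric name -> phase string for dry-run metrics only.
    """
    if not dry_run_phases:
        return message  # nothing ran in dry-run mode, so nothing to append
    counts = Counter(dry_run_phases.values())
    summary = (
        f"Dry-Run Summary: Total={len(dry_run_phases)}, "
        f"Successful={counts['Successful']}, "
        f"Failed={counts['Failed']}, "
        f"Inconclusive={counts['Inconclusive']}"
    )
    return f"{message}; {summary}" if message else summary


phases = {"total-4xx-errors": "Failed", "total-5xx-errors": "Successful"}
print(with_dry_run_summary("", phases))
```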
Dry-Run Rollouts
If a rollout wants to dry run its analysis, it simply needs to specify the dryRun field in its analysis stanza. For example, when the dryRun field is set on an analysis that references the random-fail and always-pass templates, all the metrics from both templates get merged and executed in dry-run mode.
Metrics
The analysis_run_metric_phase metrics will have an additional "dry_run" label to indicate whether or not the metric is running in Dry-Run mode.
Testing
Checklist
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>