Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: metrics don't get emitted properly during retry. Fixes #8207 #10463 #10489

Merged
merged 11 commits into from
Mar 27, 2023

Conversation

jiachengxu
Copy link
Member

@jiachengxu jiachengxu commented Feb 8, 2023

Signed-off-by: Jiacheng Xu xjcmaxwellcjx@gmail.com

Fixes #8207
Fixes #10463
This PR fixes the issue that metrics don't get emitted correctly, and lets the controller also emit metrics during every retry.

Please do not open a pull request until you have checked ALL of these:

  • Create the PR as draft .
  • Run make pre-commit -B to fix codegen and lint problems.
  • Sign-off your commits (otherwise the DCO check will fail).
  • Use a conventional commit message (otherwise the commit message check will fail).
  • "Fixes #" is in both the PR title (for release notes) and this description (to automatically link and close the issue).
  • Add unit or e2e tests. Say how you tested your changes. If you changed the UI, attach screenshots.
  • Github checks are green.
  • Once required tests have passed, mark your PR "Ready for review".

If changes were requested, and you've made them, dismiss the review to get it reviewed again.

Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
@jiachengxu jiachengxu changed the title fix: fix metrics don't get emited properly during retry fix: fix metrics don't get emited properly during retry. Fixes #8207 #10463 Feb 8, 2023
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
@jiachengxu jiachengxu changed the title fix: fix metrics don't get emited properly during retry. Fixes #8207 #10463 fix: fix metrics don't get emitted properly during retry. Fixes #8207 #10463 Feb 9, 2023
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
@jiachengxu jiachengxu marked this pull request as ready for review February 9, 2023 13:17
@@ -57,6 +60,23 @@ func (s *MetricsSuite) TestMetricsEndpoint() {
})
}

func (s *MetricsSuite) TestRetryMetrics() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, the e2e test is added to the MetricsSuite(within api test group) since we don't really have a suite for testing metrics functionalities.

@alexec alexec changed the title fix: fix metrics don't get emitted properly during retry. Fixes #8207 #10463 fix: metrics don't get emitted properly during retry. Fixes #8207 #10463 Feb 12, 2023
// Runtime parameters (e.g., `status`, `resourceDuration`) in the output will be used to emit metrics.
if lastChildNode != nil {
retryParentNode.Outputs = lastChildNode.Outputs.DeepCopy()
woc.wf.Status.Nodes[node.ID] = *retryParentNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need locking with mutex to prevent race?

Copy link
Member Author

@jiachengxu jiachengxu Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably doesn't require locking because the woc.wf.Status.Nodes[node.ID] = *retryParentNode only happens when the retryParentNode.Fulfilled() meets, which indicates that either all retries are failed or one retry succeeds, and for both cases, we only set woc.wf.Status.Nodes[node.ID] once to the lastChildNode.

Also, the code is actually moved from https://github.com/argoproj/argo-workflows/pull/10489/files#diff-f321d4af83fcf8311dc80c0d50c59ac4c240f761206e7bb652709870eb9feb43L1925-L1928 to the current place because the Outputs are used for emitting metrics. Was it already a race condition before this PR?

@jiachengxu jiachengxu requested a review from alexec March 14, 2023 07:59
@terrytangyuan
Copy link
Member

@alexec Want to take another look since you left some comments?

@terrytangyuan terrytangyuan merged commit cbd40e7 into argoproj:master Mar 27, 2023
terrytangyuan pushed a commit that referenced this pull request Mar 29, 2023
 (#10489)

Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
JPZ13 pushed a commit to pipekit/argo-workflows that referenced this pull request Jul 4, 2023
…8207 argoproj#10463 (argoproj#10489)

Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
@agilgur5 agilgur5 added area/metrics area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Sep 8, 2023
@agilgur5 agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024
@jiachengxu jiachengxu deleted the 8207 branch June 14, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics area/retryStrategy Template-level retryStrategy
Projects
None yet
5 participants