Add log and metrics to workflow termination events #6146
@@ -47,5 +54,19 @@ func TerminateWorkflow(
		terminateDetails,
		terminateIdentity,
	)

	domainName := mutableState.GetDomainEntry().GetInfo().Name
I suggest moving these metrics to AddWorkflowExecutionTerminatedEvent. That simplifies things because mutableStateBuilder already has a metrics.Client and a logger, so you don't need to change the signature of this method.
	tag.WorkflowDomainName(domainName),
	tag.WorkflowID(r.mutableState.GetExecutionInfo().WorkflowID),
	tag.WorkflowRunID(r.mutableState.GetExecutionInfo().RunID),
	tag.WorkflowTerminationReason(WorkflowTerminationReason),
Is this code path only called for conflict-based terminations?
I recommend passing the reason as a parameter, because this looks like a helper function that may be called from other places in the future. A reason parameter forces future callers to pass the actual reason, which is better than logging an incorrect reason in those cases.
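To illustrate the suggestion above, here is a minimal sketch of a helper that takes the termination reason as an explicit argument. The name `terminationLogLine` and the message format are hypothetical and are not taken from the Cadence code base; the point is only that no reason is hard-coded inside the helper.

```go
package main

import "fmt"

// terminationLogLine is a hypothetical helper, not Cadence code. It takes the
// termination reason as an explicit parameter, so every caller has to state it.
func terminationLogLine(domainName, workflowID, runID, reason string) string {
	return fmt.Sprintf(
		"workflow terminated: domain=%s workflowID=%s runID=%s reason=%s",
		domainName, workflowID, runID, reason,
	)
}

func main() {
	// Each call site supplies its own reason.
	fmt.Println(terminationLogLine("sample-domain", "wf-1", "run-1", "conflict resolution"))
	fmt.Println(terminationLogLine("sample-domain", "wf-2", "run-2", "failover"))
}
```

A new call site added later cannot compile without supplying a reason, which is the property the review comment asks for.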
	tag.WorkflowTerminationReason(reason),
)

scopeWithDomainTag := e.metricsClient.Scope(metrics.HistoryTerminateWorkflowExecutionScope).Tagged(metrics.DomainTag(domainName))
Perhaps we can add the reason dimension to this?
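As a rough illustration of what a reason dimension buys, the sketch below counts terminations per (domain, reason) pair. `terminationKey` and `terminationCounter` are hypothetical in-memory stand-ins and do not use the Cadence metrics client; they only show that one counter becomes separable by reason once the extra tag is part of the key.

```go
package main

import "fmt"

// terminationKey is a hypothetical pair of metric dimensions.
type terminationKey struct {
	domain string
	reason string
}

// terminationCounter is an in-memory stand-in for a tagged counter.
// It is not the Cadence metrics client.
type terminationCounter map[terminationKey]int

// inc increments the counter for one (domain, reason) combination.
func (c terminationCounter) inc(domain, reason string) {
	c[terminationKey{domain: domain, reason: reason}]++
}

func main() {
	c := terminationCounter{}
	c.inc("sample-domain", "failover")
	c.inc("sample-domain", "failover")
	c.inc("sample-domain", "user_request")

	// With the reason dimension, the two causes can be read separately.
	fmt.Println(c[terminationKey{domain: "sample-domain", reason: "failover"}])
	fmt.Println(c[terminationKey{domain: "sample-domain", reason: "user_request"}])
}
```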
**What changed?**
Added logs to workflow termination events, containing the reason for the termination, tagged with the `domainName`, `workflowID`, `terminationReason`, and `runID`. Added metrics to workflow termination events, using a per-domain counter `WorkflowTerminateCounterPerDomain` under the `HistoryTerminateWorkflowExecutionScope` scope, with `WorkflowTerminationReasonTag`.

**Why?**
Improve workflow termination visibility, allowing Cadence and clients to easily find terminated workflows. This is particularly important for providing better information about workflows terminated during failovers.

**How did you test it?**
Unit tests.

**Potential risks**
The risks come from the changes to the parameters passed to the affected functions. We need to ensure that the parameters are correct and that they do not contain `nil` values.

**Release notes**

**Documentation Changes**
…WorkflowTerminationReasonTag`.
	tag.WorkflowTerminationReason(reason),
)

re := regexp.MustCompile(`[^a-zA-Z0-9]`)
Sorry to be annoying, but let's move this outside the function invocation; there's no need to recompile it each time.
By convention, Go programs put constants and type declarations at the top of the file, so perhaps define the regex near the top of metrics/tags.go.
We can then use it inside the metrics.WorkflowTerminationReasonTag function,
so that we don't need to remember to sanitize inline like this for every use.
The syntax outside a function body is slightly different, so either:
var safeAlphaNumericStringRE = regexp.MustCompile(`[^a-zA-Z0-9]`)
// or, within a var block like:
var (
	safeAlphaNumericStringRE = regexp.MustCompile(`[^a-zA-Z0-9]`)
)
That makes it easy to reference the already compiled regex:
// WorkflowTerminationReasonTag reports the reason for workflow termination
func WorkflowTerminationReasonTag(value string) Tag {
	value = safeAlphaNumericStringRE.ReplaceAllString(value, "_")
	return simpleMetric{key: workflowTerminationReason, value: value}
}
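The snippets above depend on types from the metrics package, so here is a self-contained sketch of the same idea that can be run on its own. `sanitizeTagValue` is a hypothetical stand-in for the sanitizing step inside `WorkflowTerminationReasonTag`; only the regex and the `"_"` replacement come from the review comment.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package initialization instead of on every call.
var safeAlphaNumericStringRE = regexp.MustCompile(`[^a-zA-Z0-9]`)

// sanitizeTagValue is a hypothetical helper: it replaces every character that
// is not an ASCII letter or digit with an underscore.
func sanitizeTagValue(value string) string {
	return safeAlphaNumericStringRE.ReplaceAllString(value, "_")
}

func main() {
	// Prints: terminated_by_failover__conflict_
	fmt.Println(sanitizeTagValue("terminated by failover: conflict!"))
}
```

Each disallowed character is replaced individually, so runs such as `": "` become two underscores. If collapsed runs are preferred, the pattern could be changed to `[^a-zA-Z0-9]+`, though that is a design choice the review does not discuss.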
…unction to avoid performance issues