Add canary jitter workflow debugging log #6278

Merged
merged 14 commits into from
Sep 11, 2024

bowenxia
Contributor

What changed?

  • Log the jitter start time at debug level

Why?

  • The canary has recently reported "early start" errors. This means that more than half of the jitter workflows started by the jitter collector workflow began before 1/10 of the jitter time had elapsed. This should be a very rare case, so we need to debug it.

How did you test it?

Potential risks

Release notes

Documentation Changes


codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.10%. Comparing base (1771349) to head (78284c9).
Report is 1 commits behind head on master.

Additional details and impacted files
Files with missing lines Coverage Δ
service/frontend/api/handler.go 65.97% <100.00%> (+0.11%) ⬆️

... and 7 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bowenxia bowenxia changed the title Canary jitter workflow debugging add canary jitter workflow debugging log Sep 11, 2024
@bowenxia bowenxia changed the title add canary jitter workflow debugging log Add canary jitter workflow debugging log Sep 11, 2024
Comment on lines +1902 to +1907
    wh.GetLogger().Debug("Start workflow execution request domainID",
        tag.WorkflowDomainID(domainID),
        tag.WorkflowID(startRequest.WorkflowID),
        tag.Dynamic("JitterStartSeconds", jitterStartSeconds),
        tag.Dynamic("firstDecisionTaskBackoffSeconds", historyRequest.GetFirstDecisionTaskBackoffSeconds()),
    )
Contributor

Is it possible to cover these in tests?

Contributor

@bowenxia I wonder if you need to have this permanently or just for debugging.
If it's purely for debugging (I guess so since tags are too specific) then you can temporarily deploy your own branch.

Contributor Author

As I mentioned at the Cadence all-hands, this kind of error happens randomly. It has happened in Prod07, Prod04, Prod12, etc. I can't predict which environment will hit the error next, so I'll have to merge this to main and turn on debug mode the next time it happens. :(

Contributor

@ketsiambaku ketsiambaku left a comment

LGTM

@bowenxia bowenxia merged commit e5bd91e into master Sep 11, 2024
21 checks passed
@bowenxia bowenxia deleted the xbowen_debug_jitter_00 branch September 11, 2024 22:48
3 participants