Update frontend poller metrics to include tasklist tag #6237

Merged
merged 1 commit into from
Aug 20, 2024

Conversation

Shaddoll
Member

What changed?
Update frontend poller metrics to include tasklist tag

Why?
Improve observability

How did you test it?

Potential risks

Release notes

Documentation Changes

codecov bot commented Aug 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.93%. Comparing base (67fcf12) to head (001386d).
Report is 3 commits behind head on master.

Additional details and impacted files

see 11 files with indirect coverage changes



Member

@timl3136 timl3136 left a comment

LGTM

Comment on lines -322 to +326
- scope := h.metricsClient.Scope(metrics.FrontendPollForDecisionTaskScope).Tagged(metrics.DomainTag(pp1.GetDomain())).Tagged(metrics.GetContextTags(ctx)...)
- scope.IncCounter(metrics.CadenceRequests)
- sw := scope.StartTimer(metrics.CadenceLatency)
+ scope := common.NewPerTaskListScope(pp1.Domain, pp1.TaskList.GetName(), pp1.TaskList.GetKind(), h.metricsClient, metrics.FrontendPollForDecisionTaskScope).Tagged(metrics.GetContextTags(ctx)...)
+ scope.IncCounter(metrics.CadenceRequestsPerTaskList)
+ sw := scope.StartTimer(metrics.CadenceLatencyPerTaskList)
Member

@Groxx Groxx Aug 19, 2024

this might deserve a rollup value for the timer, as we'll lose domain-wide aggregates with this change.
(or is that somehow already handled? I find it super hard to figure that out tbh)

Member

Agree. I also thought that might be a problem, and based on this line it looks like there's a rollup defined for this metric.

Member Author

UPDATED! Also fixed a bug in metrics package.

Member

ah, just doing a manual non-TL-containing timer. yea, that seems fine 👍

Member

@Groxx Groxx left a comment

just blocking temporarily to make sure a rollup value is considered.
if we don't want/need the per-domain rollup, or it's somehow present (I find it super hard to tell), then LGTM happy to unblock 👍

Comment on lines 122 to 124
  func (m *metricsScope) Tagged(tags ...Tag) Scope {
- 	domainTagged := false
+ 	domainTagged := m.isDomainTagged
  	tagMap := make(map[string]string, len(tags))
Member

aaah. yeah, this makes sense - otherwise we'd lose the value when we make a cached scope with a domain tag. 👍

Member

@Groxx Groxx left a comment

Yep, looks good now. thank you!

What does the isDomainTagged value do, anyway? The fix definitely looks correct, I'm just not familiar with what it actually does.


edit: from chat messages:

if it's domain tagged, it emits a summary metric with domain:all

so yea, that's definitely a good fix to have. thanks!
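
To make the behavior concrete, here is a toy model of the fix, assuming only what the thread states: a domain-tagged scope also emits a `domain:all` summary, and `Tagged` used to reset the flag to `false`. Everything here (`toyScope`, `Tag` as a struct, `series`, `emitExample`, the `client` tag) is hypothetical and is not the real `metricsScope` implementation; in particular, which tags the real summary series carries is not stated in this conversation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag is a simplified key/value pair (hypothetical; not the real Tag type).
type Tag struct{ Key, Value string }

const (
	domainKey = "domain"
	allValue  = "all"
)

// toyScope is an illustrative stand-in for a metrics scope.
type toyScope struct {
	tags           map[string]string
	isDomainTagged bool
	emitted        *[]string // shared log of emitted series, for demonstration
}

func newToyScope(emitted *[]string) *toyScope {
	return &toyScope{tags: map[string]string{domainKey: allValue}, emitted: emitted}
}

// Tagged returns a child scope with the extra tags applied.
func (s *toyScope) Tagged(tags ...Tag) *toyScope {
	// Inherit the flag from the parent. Starting from false here would
	// forget a domain tag applied by an earlier Tagged call.
	domainTagged := s.isDomainTagged
	merged := make(map[string]string, len(s.tags)+len(tags))
	for k, v := range s.tags {
		merged[k] = v
	}
	for _, t := range tags {
		if t.Key == domainKey && t.Value != allValue {
			domainTagged = true
		}
		merged[t.Key] = t.Value
	}
	return &toyScope{tags: merged, isDomainTagged: domainTagged, emitted: s.emitted}
}

// RecordTimer logs the series and, for domain-tagged scopes, a domain:all summary.
func (s *toyScope) RecordTimer(name string) {
	*s.emitted = append(*s.emitted, series(name, s.tags))
	if s.isDomainTagged {
		summary := make(map[string]string, len(s.tags))
		for k, v := range s.tags {
			summary[k] = v
		}
		summary[domainKey] = allValue
		*s.emitted = append(*s.emitted, series(name, summary))
	}
}

// series renders a metric name and its tags in sorted key order.
func series(name string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+tags[k])
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// emitExample tags by domain first, then by another key, and records one timer.
func emitExample() []string {
	var emitted []string
	scope := newToyScope(&emitted).
		Tagged(Tag{Key: domainKey, Value: "orders"}).
		Tagged(Tag{Key: "client", Value: "go-sdk"})
	scope.RecordTimer("latency")
	return emitted
}

func main() {
	for _, s := range emitExample() {
		fmt.Println(s)
	}
}
```

In this toy, if `Tagged` started from `domainTagged := false`, the second `Tagged` call would clear the flag and the `domain=all` summary line would no longer be emitted.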

@Shaddoll Shaddoll merged commit 44353d6 into cadence-workflow:master Aug 20, 2024
19 checks passed
@Shaddoll Shaddoll deleted the poll branch August 20, 2024 20:40
4 participants