[TiCDC] Owner fails to clean up stale metrics leading to incorrect display in dashboard #4774

Closed
liuzix opened this issue Mar 4, 2022 · 2 comments · Fixed by #4775 or #4998

Comments

Contributor

liuzix commented Mar 4, 2022

If an owner has resigned, it may fail to clean up its metrics data. For some metrics, such as changefeed checkpoint lag, the dashboard computes the maximum over all existing time series, so a stale series left behind by the old owner causes incorrect values to be displayed.

image

For example, in the screenshot above, the displayed checkpoint lag stays unchanged after an old owner with a high lag has resigned.
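
To illustrate why one stale series is enough to distort the panel, here is a minimal sketch in plain Go. It does not use TiCDC or Prometheus code; the `series` type, the `maxLag` helper, the instance names, and the lag values are all hypothetical and only model a "maximum over all existing series" aggregation.

```go
package main

import "fmt"

// series is a hypothetical stand-in for one scraped time series:
// the instance that exported it and its last reported checkpoint lag.
type series struct {
	instance string
	lagSec   float64
}

// maxLag mimics a dashboard panel that takes the maximum over every
// series that still exists, no matter which instance exported it.
func maxLag(all []series) float64 {
	m := 0.0
	for _, s := range all {
		if s.lagSec > m {
			m = s.lagSec
		}
	}
	return m
}

func main() {
	all := []series{
		// Stale: the old owner resigned but its series was never removed.
		{instance: "old-owner", lagSec: 300},
		// Live value reported by the current owner.
		{instance: "new-owner", lagSec: 2},
	}

	// With the stale series present, the maximum is pinned to the old value.
	fmt.Println("with stale series:", maxLag(all))

	// Once the stale series is gone, the panel reflects the live owner.
	fmt.Println("without stale series:", maxLag(all[1:]))
}
```

As long as the old owner's series keeps being exported, the maximum cannot drop below its last value, which would match the flat line described above.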

Contributor Author

liuzix commented Mar 23, 2022

#4775 seems to be an incomplete fix.

Contributor Author

liuzix commented Mar 23, 2022

#4775 only partially mitigated the problem. It assumed that calling the Reset() methods on the Prometheus metric items cleans up data from all instances, which is wrong. A new PR has been opened as a complete fix.
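
The point about Reset() can be sketched with a simplified model, assuming each TiCDC instance holds its own metric objects in its own process memory. The `gaugeVec` type below is hypothetical and is not the Prometheus client API; it only mirrors the idea that a reset in one process cannot touch series exported by another. The cleanup shown at the end (the resigning owner deleting its own series) is one plausible approach, not necessarily what the follow-up PR does.

```go
package main

import "fmt"

// gaugeVec is a hypothetical, simplified model of a labelled gauge that
// lives in ONE process's memory. It is not the Prometheus client API.
type gaugeVec struct {
	values map[string]float64 // changefeed ID -> checkpoint lag
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{values: map[string]float64{}}
}

// Set records the latest value for one changefeed.
func (g *gaugeVec) Set(changefeed string, v float64) {
	g.values[changefeed] = v
}

// Reset drops every series held by this process only.
func (g *gaugeVec) Reset() {
	g.values = map[string]float64{}
}

// Delete drops a single series held by this process.
func (g *gaugeVec) Delete(changefeed string) {
	delete(g.values, changefeed)
}

func main() {
	// Two instances, each with its own in-memory metrics.
	oldOwner := newGaugeVec()
	newOwner := newGaugeVec()

	oldOwner.Set("cf-1", 300) // reported before the old owner resigned
	newOwner.Set("cf-1", 2)

	// Resetting on the new owner leaves the old owner's series untouched.
	newOwner.Reset()
	fmt.Println("old owner series after remote Reset:", len(oldOwner.values))

	// Assumed cleanup: the resigning owner removes its own series.
	oldOwner.Delete("cf-1")
	fmt.Println("old owner series after local Delete:", len(oldOwner.values))
}
```

In the real Prometheus Go client, `Reset()` and `DeleteLabelValues()` on a metric vector likewise act only on the vector in the calling process, so any cleanup has to run on the instance that exported the stale series.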

liuzix added a commit to liuzix/ticdc that referenced this issue Apr 24, 2022
liuzix added a commit to liuzix/ticdc that referenced this issue Apr 24, 2022
liuzix added a commit to liuzix/ticdc that referenced this issue Jun 16, 2022