-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
statistics: fix PK column TopN not loading when init stats #37552
Conversation
[REVIEW NOTIFICATION] This pull request has not been approved. To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
} | ||
|
||
func (h *Handle) initStatsTopN(cache *statsCache) error { | ||
ctx := kv.WithInternalSourceType(context.Background(), kv.InternalTxnStats) | ||
sql := "select HIGH_PRIORITY table_id, hist_id, value, count from mysql.stats_top_n where is_index = 1" | ||
sql := "select HIGH_PRIORITY table_id, is_index, hist_id, value, count from mysql.stats_top_n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we only load Histograms into memory for indexes when starting TiDB, so should we keep the TopN corresponding with Histograms here? (only load index-TopN and skip column-TopN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should do that, except for PK. If the column is PK, we load its histogram and TopN into memory, otherwise we don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can join information_schema.columns to get entries whose is_index=1 or COLUMN_KEY='PRI'?
} | ||
|
||
func (h *Handle) initStatsTopN(cache *statsCache) error { | ||
ctx := kv.WithInternalSourceType(context.Background(), kv.InternalTxnStats) | ||
sql := "select HIGH_PRIORITY table_id, hist_id, value, count from mysql.stats_top_n where is_index = 1" | ||
sql := "select HIGH_PRIORITY table_id, is_index, hist_id, value, count from mysql.stats_top_n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we will read many unneeded data (non-PK columns' TopN). Is this acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to avoid reading unnecessary data. It sometimes takes about 20 minutes to init stats when the cluster is under high pressure, and the modification will make it worse. Maybe we need a more efficient way to do this.
idx.TopN.AppendTopN(data, row.GetUint64(4)) | ||
} else { | ||
col, ok := table.Columns[row.GetInt64(2)] | ||
if !ok || col.Info == nil || !mysql.HasPriKeyFlag(col.Info.GetFlag()) || (col.CMSketch == nil && col.StatsVer <= statistics.Version1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this fix should also apply to stats ver1🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some simple experiments and believe this bug also affects stats ver1, so probably this fix should also apply to stats ver1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will make the fix apply to stats ver1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested title: fix PK column TopN not loading when init stats |
Co-authored-by: Yuanjia Zhang <qw4990@163.com>
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@xuyifangreeneyes: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@xuyifangreeneyes: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #37552 +/- ##
================================================
+ Coverage 73.4684% 73.5833% +0.1148%
================================================
Files 1124 1125 +1
Lines 357948 358134 +186
================================================
+ Hits 262979 263527 +548
+ Misses 77937 77584 -353
+ Partials 17032 17023 -9 |
It has been fixed by #53298. |
What problem does this PR solve?
Issue Number: close #37548
Problem Summary:
What is changed and how it works?
Since we have topn for PK in stats ver2, when init stats, we need to load topn for PK.
Check List
Tests
Test as primary key column's Histogram and TopN are not loaded after restarting TiDB #37548 suggests.
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.