statistics: fix PK column TopN not loading when init stats #37552

Closed
wants to merge 9 commits into from

Conversation

xuyifangreeneyes
Contributor

@xuyifangreeneyes xuyifangreeneyes commented Sep 1, 2022

What problem does this PR solve?

Issue Number: close #37548

Problem Summary:

What is changed and how it works?

Stats ver2 keeps a TopN for the PK column, so init stats needs to load the PK column's TopN as well.

Check List

Tests

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 1, 2022
@xuyifangreeneyes xuyifangreeneyes marked this pull request as ready for review January 10, 2023 08:18
@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-5.3 Type: Need cherry pick to release-5.3 needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed labels Jan 10, 2023
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 23, 2023
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2023
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 16, 2023
@ti-chi-bot ti-chi-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 16, 2023
}

func (h *Handle) initStatsTopN(cache *statsCache) error {
ctx := kv.WithInternalSourceType(context.Background(), kv.InternalTxnStats)
- sql := "select HIGH_PRIORITY table_id, hist_id, value, count from mysql.stats_top_n where is_index = 1"
+ sql := "select HIGH_PRIORITY table_id, is_index, hist_id, value, count from mysql.stats_top_n"
Contributor

@qw4990 qw4990 Feb 16, 2023

I see that we only load Histograms into memory for indexes when starting TiDB, so should we keep the TopN corresponding with Histograms here? (only load index-TopN and skip column-TopN)

Contributor Author

Yes, we should do that, except for PK. If the column is PK, we load its histogram and TopN into memory, otherwise we don't.
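The rule described in this reply (keep every index TopN, keep a column TopN only when the column is the PK) can be sketched as a small row filter. This is an illustrative sketch, not the code in this PR: `topNRow`, `pkColumns`, `shouldLoadTopN`, and `filterTopN` are hypothetical names, and the row struct is a simplified stand-in for a `mysql.stats_top_n` row.

```go
package main

import "fmt"

// topNRow is a hypothetical, simplified view of one mysql.stats_top_n row.
type topNRow struct {
	TableID int64
	IsIndex bool
	HistID  int64 // index ID when IsIndex is true, column ID otherwise
}

// pkColumns maps a table ID to the set of column IDs that carry the
// primary key flag. How this set is built is outside this sketch.
type pkColumns map[int64]map[int64]bool

// shouldLoadTopN reports whether a row should be kept at init time:
// every index TopN is kept, a column TopN only if the column is a PK.
func shouldLoadTopN(row topNRow, pks pkColumns) bool {
	if row.IsIndex {
		return true
	}
	// Indexing a missing (nil) inner map yields false, so tables
	// without a recorded PK column are skipped.
	return pks[row.TableID][row.HistID]
}

// filterTopN returns only the rows that shouldLoadTopN accepts.
func filterTopN(rows []topNRow, pks pkColumns) []topNRow {
	kept := make([]topNRow, 0, len(rows))
	for _, r := range rows {
		if shouldLoadTopN(r, pks) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	pks := pkColumns{1: {1: true}} // assumption: table 1, column 1 is the PK
	rows := []topNRow{
		{TableID: 1, IsIndex: true, HistID: 1},  // index TopN: kept
		{TableID: 1, IsIndex: false, HistID: 1}, // PK column TopN: kept
		{TableID: 1, IsIndex: false, HistID: 2}, // non-PK column TopN: skipped
	}
	fmt.Println(len(filterTopN(rows, pks)))
}
```

Filtering in memory like this still reads every row from storage, which is the cost raised in the other thread; pushing the same predicate into the query would avoid that.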

Contributor

Maybe we can join information_schema.columns to get entries whose is_index=1 or COLUMN_KEY='PRI'?

}

func (h *Handle) initStatsTopN(cache *statsCache) error {
ctx := kv.WithInternalSourceType(context.Background(), kv.InternalTxnStats)
- sql := "select HIGH_PRIORITY table_id, hist_id, value, count from mysql.stats_top_n where is_index = 1"
+ sql := "select HIGH_PRIORITY table_id, is_index, hist_id, value, count from mysql.stats_top_n"
Member

@time-and-fate time-and-fate Feb 16, 2023

It seems we will read a lot of unneeded data (the TopN of non-PK columns). Is this acceptable?

Contributor Author

I think we need to avoid reading unnecessary data. It sometimes takes about 20 minutes to init stats when the cluster is under high pressure, and this change would make it worse. Maybe we need a more efficient way to do this.

idx.TopN.AppendTopN(data, row.GetUint64(4))
} else {
col, ok := table.Columns[row.GetInt64(2)]
if !ok || col.Info == nil || !mysql.HasPriKeyFlag(col.Info.GetFlag()) || (col.CMSketch == nil && col.StatsVer <= statistics.Version1) {
Member

Looks like this fix should also apply to stats ver1🤔

Member

I did some simple experiments and believe this bug also affects stats ver1, so probably this fix should also apply to stats ver1.

Contributor Author

Got it. I will make the fix apply to stats ver1.

Contributor Author

The condition col.CMSketch == nil && col.StatsVer <= statistics.Version1 is copied from the condition for loading index TopN (idx.CMSketch == nil && idx.StatsVer <= statistics.Version1), which I don't fully understand yet. The condition comes from #14421 and #24623. I will take a deeper look into it.
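To make the skip condition from the diff easier to reason about, here is a minimal standalone sketch of it. The types are hypothetical stand-ins rather than TiDB's: `columnStats` flattens the fields the condition reads into booleans, `skipColumnTopN` is an invented name, and `version1 = 1` is assumed to match `statistics.Version1`.

```go
package main

import "fmt"

// version1 is assumed to stand in for statistics.Version1.
const version1 = 1

// columnStats is a hypothetical stand-in for the fields the condition reads.
type columnStats struct {
	HasInfo     bool // col.Info != nil
	IsPK        bool // mysql.HasPriKeyFlag(col.Info.GetFlag())
	HasCMSketch bool // col.CMSketch != nil
	StatsVer    int  // col.StatsVer
}

// skipColumnTopN mirrors the condition in the diff. A nil col models the
// "!ok" case where the column is not present in table.Columns.
func skipColumnTopN(col *columnStats) bool {
	if col == nil || !col.HasInfo || !col.IsPK {
		return true
	}
	// Copied clause: a column at ver1 or lower without a CMSketch is skipped.
	return !col.HasCMSketch && col.StatsVer <= version1
}

func main() {
	cases := []struct {
		name string
		col  *columnStats
	}{
		{"missing column", nil},
		{"non-PK column, ver2", &columnStats{HasInfo: true, StatsVer: 2}},
		{"PK column, ver2, no CMSketch", &columnStats{HasInfo: true, IsPK: true, StatsVer: 2}},
		{"PK column, ver1, no CMSketch", &columnStats{HasInfo: true, IsPK: true, StatsVer: 1}},
		{"PK column, ver1, with CMSketch", &columnStats{HasInfo: true, IsPK: true, HasCMSketch: true, StatsVer: 1}},
	}
	for _, c := range cases {
		fmt.Printf("%-32s skip=%v\n", c.name, skipColumnTopN(c.col))
	}
}
```

In this model, a ver1 PK column is loaded only when its CMSketch is already present, which is the case the reviewers' ver1 question hinges on.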

@time-and-fate
Member

Suggested title: fix PK column TopN not loading when init stats

@xuyifangreeneyes xuyifangreeneyes changed the title from "statistics: fix loading column stats when init stats" to "statistics: fix PK column TopN not loading when init stats" Feb 20, 2023
Co-authored-by: Yuanjia Zhang <qw4990@163.com>
@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2023
@ti-chi-bot

ti-chi-bot bot commented Apr 22, 2023

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot

ti-chi-bot bot commented Jul 6, 2023

@xuyifangreeneyes: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-br-integration-test c359a4e link true /test pull-br-integration-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

codecov bot commented Nov 17, 2023

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 73.5833%. Comparing base (f9e1845) to head (c359a4e).
Report is 4160 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #37552        +/-   ##
================================================
+ Coverage   73.4684%   73.5833%   +0.1148%     
================================================
  Files          1124       1125         +1     
  Lines        357948     358134       +186     
================================================
+ Hits         262979     263527       +548     
+ Misses        77937      77584       -353     
+ Partials      17032      17023         -9     

@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. labels May 16, 2024
@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Jun 25, 2024
@hawkingrei
Member

It has been fixed by #53298.

@hawkingrei hawkingrei closed this Jun 28, 2024
Labels
needs-cherry-pick-release-5.3 Type: Need cherry pick to release-5.3 needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

primary key column's Histogram and TopN are not loaded after restarting TiDB
6 participants