sessionctx/variable, executor: introduce a limit on "thread" config #28842

Merged on Oct 22, 2021 (5 commits)

Conversation

morgo
Contributor

@morgo morgo commented Oct 15, 2021

What problem does this PR solve?

Problem Summary:

Currently TiDB has many sysvars that configure "threads" (goroutines). The upper bound of some of these variables is the maximum int32 value, which is not practical. A practical limit is usually around the number of CPU cores, or 2x the number of CPU cores.

I have picked 256 as an upper bound because there are already some thread settings which safely use this as their upper bound.

What is changed and how it works?

What's Changed:

The maximum value of several system variables has been lowered to 256 for safety.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Technically this breaks BC in two ways:

  1. Users might need a value higher than 256. This seems possible, particularly in cases where latency is high and thread counts might be configured higher.
  2. If the PR is cherry-picked, the 'SET' command might return an error when setting a high value. This is not a problem in master, because 'SET' with an out-of-range value now returns a warning.
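
The difference between the two behaviors in item 2 can be sketched as follows. This is a minimal illustration and not TiDB's implementation: `setThreadVar`, `errOutOfRange`, and the `strict` flag are hypothetical names, and only the upper bound is checked.

```go
package main

import (
	"errors"
	"fmt"
)

// maxThreads mirrors the 256 upper bound proposed in this PR.
const maxThreads = 256

// errOutOfRange is a hypothetical error used for the strict behavior.
var errOutOfRange = errors.New("value out of range")

// setThreadVar is a hypothetical helper, not a TiDB API.
// With strict=true an out-of-range value is rejected with an error.
// With strict=false the value is clamped to maxThreads and a warning
// message is returned instead.
func setThreadVar(requested int, strict bool) (value int, warning string, err error) {
	if requested <= maxThreads {
		return requested, "", nil
	}
	if strict {
		return 0, "", errOutOfRange
	}
	return maxThreads, fmt.Sprintf("Truncated incorrect value: '%d'", requested), nil
}

func main() {
	// Warning behavior: the statement succeeds with a clamped value.
	v, w, err := setThreadVar(1024, false)
	fmt.Println(v, w, err)

	// Error behavior: the statement fails outright.
	_, _, err = setThreadVar(1024, true)
	fmt.Println(err)
}
```

In the warning case an existing script that sets a large value keeps running with the clamped value, while in the error case the same script would fail.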

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

The maximum value of several system variables has been lowered to 256 for safety.

@morgo morgo added the compatibility-breaker Violation of forwards/backwards compatibility in a design-time piece. label Oct 15, 2021
@ti-chi-bot
Member

ti-chi-bot commented Oct 15, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • bb7133
  • djshow832

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@morgo morgo requested a review from a team as a code owner October 15, 2021 04:26
@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 15, 2021
@kennytm
Contributor

kennytm commented Oct 15, 2021

isn't 128 too low? 96-core machines are not that hard to find, and 2×96 = 192.

@morgo
Contributor Author

morgo commented Oct 15, 2021

isn't 128 too low? 96-core machines are not that hard to find, and 2×96 = 192.

I picked it because it was an existing convention, but I had the same thought as you. For an absolute upper bound it is on the low side. I had also considered 256.

@kennytm
Contributor

kennytm commented Oct 15, 2021

In the TiDB repository I've only found gcMaxConcurrency = 128 (#5837 (review)) and maxDDLReorgWorkerCount = 128 (#6441 (review)). The PR review did not explain why 128 was chosen though.

@morgo
Contributor Author

morgo commented Oct 15, 2021

/run-check_dev_2

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 15, 2021
@morgo
Contributor Author

morgo commented Oct 15, 2021

OK, I've changed the upper bound to 256 instead.

@morgo morgo force-pushed the concurrency-thread-limits branch from 068f088 to 397eb52 Compare October 15, 2021 05:27
@morgo morgo requested review from djshow832 and bb7133 October 20, 2021 14:05
Contributor

@djshow832 djshow832 left a comment

Have you tried this scenario: a user sets the concurrency to 257 in an older TiDB version and then upgrades to this version? The TiDB cluster should work correctly whether the effective value is 257 or 256.
I assume it will work correctly, but it is worth a trial.

@XuHuaiyu
Contributor

Can we use runtime.NumCPU() * n?

@morgo
Contributor Author

morgo commented Oct 21, 2021

Have you tried this scenario: a user sets the concurrency to 257 in an older TiDB version and then upgrades to this version? The TiDB cluster should work correctly whether the effective value is 257 or 256. I assume it will work correctly, but it is worth a trial.

Yes, it will work. When a sysvar is loaded from a previous version, the validation func is always called (this is intentional, since validation rules can change across versions). The value will then be truncated to MaxValue. It is similar to the test in executor/ddl_test.go:

	tk.MustExec("set @@global.tidb_ddl_reorg_worker_cnt = 257")
	tk.MustQuery("SHOW WARNINGS").Check(testkit.Rows("Warning 1292 Truncated incorrect tidb_ddl_reorg_worker_cnt value: '257'"))
	tk.MustQuery("select @@global.tidb_ddl_reorg_worker_cnt").Check(testkit.Rows("256"))
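
The load path described above can be sketched roughly like this. It is an illustration under assumptions, not TiDB's code: the `sysVar` type, its `validate` method, and `loadPersisted` are hypothetical names, and the only detail taken from this thread is that every stored value passes through validation and is truncated to the maximum.

```go
package main

import "fmt"

// sysVar is a hypothetical, simplified definition of an integer sysvar.
type sysVar struct {
	Name     string
	Min, Max int
}

// validate clamps v into [Min, Max] and reports whether it was changed.
func (sv sysVar) validate(v int) (int, bool) {
	switch {
	case v < sv.Min:
		return sv.Min, true
	case v > sv.Max:
		return sv.Max, true
	}
	return v, false
}

// loadPersisted runs every stored value through the current validation
// rules, so values written by an older version are truncated on load.
func loadPersisted(defs map[string]sysVar, stored map[string]int) map[string]int {
	out := make(map[string]int, len(stored))
	for name, v := range stored {
		if def, ok := defs[name]; ok {
			v, _ = def.validate(v)
		}
		out[name] = v
	}
	return out
}

func main() {
	defs := map[string]sysVar{
		"tidb_ddl_reorg_worker_cnt": {Name: "tidb_ddl_reorg_worker_cnt", Min: 1, Max: 256},
	}
	// Value persisted by an older version with a larger allowed range.
	stored := map[string]int{"tidb_ddl_reorg_worker_cnt": 257}
	fmt.Println(loadPersisted(defs, stored))
}
```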

@morgo
Contributor Author

morgo commented Oct 21, 2021

Can we use runtime.NumCPU() * n?

I think it's better if we use static values for min/max. We can use dynamic values for defaults though.
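
One way to read this suggestion, as a sketch only: the bounds stay fixed constants while the default is derived from the core count and then clamped into those bounds. The function `defaultConcurrency` and the multiplier are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"runtime"
)

// Static bounds: identical on every machine, so validation is predictable.
const (
	minConcurrency = 1
	maxConcurrency = 256
)

// defaultConcurrency is a hypothetical helper that derives a default from
// the core count (numCPU * factor) and clamps it into the static bounds.
func defaultConcurrency(numCPU, factor int) int {
	n := numCPU * factor
	if n < minConcurrency {
		return minConcurrency
	}
	if n > maxConcurrency {
		return maxConcurrency
	}
	return n
}

func main() {
	// The default adapts to the host; the allowed range does not.
	fmt.Println(defaultConcurrency(runtime.NumCPU(), 2))
}
```

With static bounds, a value that is valid on one TiDB node is valid on every node, regardless of how many cores each machine has.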

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 21, 2021
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 22, 2021
@bb7133
Member

bb7133 commented Oct 22, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 397eb52

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 22, 2021
@ti-chi-bot
Member

@morgo: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If a CI test fails, you can re-trigger just the test that failed, and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Labels
compatibility-breaker Violation of forwards/backwards compatibility in a design-time piece. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
6 participants