Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config, sysvar: move configs tmp-storage-* to instance sysvars #36849

Open
wants to merge 82 commits into
base: master
Choose a base branch
from

Conversation

CbcWestwolf
Copy link
Member

@CbcWestwolf CbcWestwolf commented Aug 3, 2022

What problem does this PR solve?

Issue Number: ref #34960

Problem Summary:

Before, if we want to change the temporary storage path/quota in a node/instance, we have to change the configuration and restart the node.

What is changed and how it works?

In this PR we introduce two instance scope variables tmpdir and tidb_tmp_storage_quota, and the corresponding config items in [instance] section.

Check List

Tests

  • Unit test (existing UT)
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual Test

Data preparation

Setup t1 and t2:

#!/bin/bash
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "DROP TABLE IF EXISTS t1, t2"
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "CREATE TABLE t1 (c INT); CREATE TABLE t2 (c VARCHAR(100));"
for ((i=1; i<=9999; i++)); do
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "INSERT INTO t1 VALUES ($i); INSERT INTO t2 VALUES ($i);"
done
Test

There are several functions using tmpdir:

  1. util/disk/tempDir.go: InitializeTempDir() would be called each time tmpdir is modified, and create a _dir.lock file. We expect to see this file after we modify tmpdir.

    You can just switch different tmpdir like TestSwitchMultipleTmpDir, and check that both $tmpdir/record/ and $tmpdir/_dir.lock exist.

  2. executor/explain.go: getHeapProfile() would create a temp file named like $tmpdir/record/heapGC2006-01-02T15:04:05Z07:001. We can use the sql explain analyze select ... to trigger a long-time query and modify [instance].tmpdir in another session. We expect that the modification is successful and the origin heap file names are recorded in memoryDebugModeHandler.infoField and shown on logs.

    • connect to session 1 and execute:
    set tidb_memory_debug_mode_min_heap_inuse = 1024;
    set tidb_memory_debug_mode_alarm_ratio = 100;
    -- The following query costs about 3 seconds to execute.
    explain analyze select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
    • connect to session 2 and execute set global tmpdir = 'another_tmpdir';
    • waiting the explain analyze in session 1 to finish. The log would show a line containing ["Memory Debug Mode"] [sql=finished]. In this line you could see heap profile following the tmpdir. In most cases it has been set to another_tmpdir, but it is still reasonable if it keeps the origin value.
  3. util/misc.go: LoadTLSCertificates() creates a key file and certificate in tmpdir. We can set auto-tls to true in config.toml and use the sql alter instance reload tls to call LoadTLSCertificates(). By checking the log, we expect that the paths of cert and key file keep the same as the original.

    • create a config.toml as below and start the tidb:
    [security]
    auto-tls = true
    • connect to tidb and check the ssl key and cert by select @@ssl_key, @@ssl_cert;
    • execute alter instance reload tls;, and the log ["execute reload tls"] shows the ssl_key and ssl_path
    • select @@ssl_key, @@ssl_cert; and we expect to get the origin result
  4. util/chunk/disk.go: initDiskFile() uses disk.TmpDirMutex to prevent possible conflicts since it has several statements reading tmpdir. We can perform a large query with sort or join operator under limited memory to call this method. Since the block of mutex, the modification of tmpdir would wait for the query finished.

    • do the query in session 1. Two temporary files chunk.ListInDisk-xxxxx and chunk.ListInDiskOffset-xxxxx exists on tmpdir:
    set @@tidb_mem_quota_query = 100 << 10; -- 100KB
    set global tidb_mem_oom_action = 'log';
    -- The following query costs hundreds of seconds to execute.
    select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
    • when the query in session 1 is going on, run set global tmpdir = "another_tmpdir"; in session 2
    • when the query finished in session 1, the origin 2 temp chunk files are deleted, and select @@global.tmpdir; returns another_tmpdir in session 1.
  5. type memoryUsageAlarm struct binds to each domain, and owns a tmpDir field for dumping information about OOM. We can modify tidb_memory_usage_alarm_ratio to trigger the alarm dumping.

    • create a config.toml as below and start the tidb:
    [performance]
    # you can set some other suitable values to trigger the memory alarm
    server-memory-quota = 1048576 # 1MB
    memory-usage-alarm-ratio = 0.01
    • query select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
  6. ddl/ingest/env.go: genLightningDataDir() is called each time a ddl is starting. We expect to see a directory named $tmpdir/tmp_ddl-<port> after we modify the tmpdir and do another ddl.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

1. Merge configs `tmp-storage-path` and `temp-dir` into one config `[instance].tmpdir`, and create a new corresponding instance system variable `tmpdir`
2. Move config `tmp-storage-quota` to `[instance].tidb_tmp_storage_quota`, and create a new corresponding instance system variable `tidb_tmp_storage_quota`

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 3, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • morgo
  • xhebox

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 3, 2022
@sre-bot
Copy link
Contributor

sre-bot commented Aug 3, 2022

@CbcWestwolf CbcWestwolf marked this pull request as ready for review August 5, 2022 03:37
@CbcWestwolf CbcWestwolf requested a review from a team as a code owner August 5, 2022 03:37
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 5, 2022
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 9, 2022
@CbcWestwolf
Copy link
Member Author

/cc wshwsh12

@ti-chi-bot ti-chi-bot requested a review from wshwsh12 August 9, 2022 13:15
@CbcWestwolf
Copy link
Member Author

/run-mysql-test

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2022
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 1, 2022
@ti-chi-bot
Copy link
Member

@CbcWestwolf: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2022
@ti-chi-bot ti-chi-bot bot deleted a comment from ti-chi-bot Dec 19, 2023
Copy link

tiprow bot commented Feb 27, 2024

@CbcWestwolf: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test 9a6e689 link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link

ti-chi-bot bot commented Nov 18, 2024

@CbcWestwolf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/mysql-test 9a6e689 link true /test mysql-test
idc-jenkins-ci-tidb/unit-test 9a6e689 link true /test unit-test
idc-jenkins-ci-tidb/build 9a6e689 link true /test build
idc-jenkins-ci-tidb/check_dev_2 9a6e689 link true /test check-dev2
pull-mysql-client-test 9a6e689 link true /test pull-mysql-client-test
pull-integration-ddl-test 9a6e689 link true /test pull-integration-ddl-test
idc-jenkins-ci-tidb/check_dev 9a6e689 link true /test check-dev
pull-br-integration-test 9a6e689 link true /test pull-br-integration-test
pull-lightning-integration-test 9a6e689 link true /test pull-lightning-integration-test
pull-integration-e2e-test 9a6e689 link true /test pull-integration-e2e-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants