Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session: add session var to update global logger max days #27595

Merged
merged 14 commits into from
Oct 27, 2021

Conversation

db-will
Copy link
Contributor

@db-will db-will commented Aug 26, 2021

What problem does this PR solve?

Issue Number: close #27520

Problem Summary: Currently, it is required to restart instance to change max-days of log files(log.file.max-days in config file). We want to enable user to change such config dynamically.

What is changed and how it works?

What's Changed:

  • added a session scope variable(tidb_log_file_max_days) to adjust max days of logger

How it Works:

  • when user want to dynamically change max days of general log, they can change the session variable
  • internally, it will create a new logger instance with new max days config to replace the global logger instance
  • the session value is stored in memory, and it may inconsistent with the actual log file config. the main purpose is to provide a way to dynamically adjust the max days, but not to provide a consistent view of log file max days config

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

We setup and run an tidb cluster with a given max-days=30 and max-size=30 in config

mysql> select @@tidb_log_file_max_days;
+--------------------------+
| @@tidb_log_file_max_days |
+--------------------------+
| 30                       |
+--------------------------+
1 row in set (0.00 sec)

Enable general log

mysql> set tidb_general_log = 1;
Query OK, 0 rows affected (0.01 sec)

Run sysbench prepare to init table and add data, so that we can generate some general logs

sysbench oltp_read_write --table-size=1000000 --mysql-db=sysbench --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=4000 prepare
❯ ls -larth
total 392488
drwxr-xr-x  11 will  staff   352B  2 Sep 22:08 ..
-rw-r--r--   1 will  staff    30M  2 Sep 22:10 tidb-2021-09-02T22-10-56.957.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-00.343.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-03.773.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-07.151.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-10.400.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-13.926.log
drwxr-xr-x  10 will  staff   320B  2 Sep 22:11 .
-rw-r--r--   1 will  staff    11M  2 Sep 22:11 tidb.log
-rw-r--r--   1 will  staff   2.4K  2 Sep 22:11 tidb-slow.log

We can fake some last 5 days log files by creating similar log files with different timestamp and clean up the generate logs:

❯ cp tidb-2021-09-02T22-10-56.957.log tidb-2021-08-25T22-10-56.957.log
❯ cp tidb-2021-09-02T22-11-00.343.log tidb-2021-08-26T22-11-00.343.log
❯ cp tidb-2021-09-02T22-11-03.773.log tidb-2021-08-27T22-11-03.773.log
❯ cp tidb-2021-09-02T22-11-07.151.log tidb-2021-08-28T22-11-07.151.log
❯ cp tidb-2021-09-02T22-11-10.400.log tidb-2021-08-29T22-11-10.400.log
❯ ls -larth
total 695648
drwxr-xr-x  11 will  staff   352B  2 Sep 22:08 ..
-rw-r--r--   1 will  staff    30M  2 Sep 22:10 tidb-2021-09-02T22-10-56.957.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-00.343.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-03.773.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-07.151.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-10.400.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:11 tidb-2021-09-02T22-11-13.926.log
-rw-r--r--   1 will  staff   2.4K  2 Sep 22:11 tidb-slow.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:16 tidb-2021-08-25T22-10-56.957.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:17 tidb-2021-08-26T22-11-00.343.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:17 tidb-2021-08-27T22-11-03.773.log
-rw-r--r--   1 will  staff    30M  2 Sep 22:18 tidb-2021-08-28T22-11-07.151.log
-rw-r--r--   1 will  staff    11M  2 Sep 22:18 tidb.log
drwxr-xr-x  15 will  staff   480B  2 Sep 22:18 .
-rw-r--r--   1 will  staff    30M  2 Sep 22:18 tidb-2021-08-29T22-11-10.400.log

Let set max days to some values and run sysbench to generate new logs, and check if logs are purged as expected(this is due to purge only happen when new logs are generated):

mysql> set tidb_log_file_max_days = 5;
Query OK, 0 rows affected (0.01 sec)

❯ sysbench oltp_read_write --mysql-db=sysbench --mysql-user=root --mysql-port=4000  --max-time=10 --mysql-host=127.0.0.1 run

❯ ls
tidb-2021-08-29T22-11-10.400.log tidb-2021-09-02T22-11-10.400.log
tidb-2021-09-02T22-10-56.957.log tidb-2021-09-02T22-11-13.926.log
tidb-2021-09-02T22-11-00.343.log tidb-slow.log
tidb-2021-09-02T22-11-03.773.log tidb.log
tidb-2021-09-02T22-11-07.151.log

change max days to larger value, and we expect it won't wrongly purge files:

mysql> set tidb_log_file_max_days = 10;
Query OK, 0 rows affected (0.00 sec)

❯ sysbench oltp_read_write --mysql-db=sysbench --mysql-user=root --mysql-port=4000  --max-time=10 --mysql-host=127.0.0.1 run

❯ ls
tidb-2021-08-29T22-11-10.400.log tidb-2021-09-02T22-11-10.400.log
tidb-2021-09-02T22-10-56.957.log tidb-2021-09-02T22-11-13.926.log
tidb-2021-09-02T22-11-00.343.log tidb-slow.log
tidb-2021-09-02T22-11-03.773.log tidb.log
tidb-2021-09-02T22-11-07.151.log

Set to max days = 1

mysql> set tidb_log_file_max_days = 1;
Query OK, 0 rows affected (0.00 sec)

❯ sysbench oltp_read_write --mysql-db=sysbench --mysql-user=root --mysql-port=4000  --max-time=10 --mysql-host=127.0.0.1 run

❯ ls
tidb-2021-09-02T22-10-56.957.log tidb-2021-09-02T22-11-10.400.log
tidb-2021-09-02T22-11-00.343.log tidb-2021-09-02T22-11-13.926.log
tidb-2021-09-02T22-11-03.773.log tidb-slow.log
tidb-2021-09-02T22-11-07.151.log tidb.log

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Add session variable tidb_log_file_max_days to dynamically adjust general log and slow log max days

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 26, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • morgo
  • xhebox

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Aug 26, 2021
@CLAassistant
Copy link

CLAassistant commented Aug 26, 2021

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 26, 2021
@sre-bot
Copy link
Contributor

sre-bot commented Aug 26, 2021

Please follow PR Title Format:

  • pkg [, pkg2, pkg3]: what is changed

Or if the count of mainly changed packages are more than 3, use

  • *: what is changed

After you have format title, you can leave a comment /run-check_title to recheck it

@db-will db-will changed the title Add session var to update global logger max age session: add session var to update global logger max age Aug 26, 2021
@db-will
Copy link
Contributor Author

db-will commented Aug 26, 2021

/assign @morgo

@db-will
Copy link
Contributor Author

db-will commented Aug 26, 2021

/run-check_title

@morgo morgo self-requested a review August 26, 2021 23:29

return nil
}, GetSession: func(s *SessionVars) (string, error) {
return strconv.FormatInt(int64(atomic.LoadInt32(&GeneralLogMaxAge)), 10), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be out of sync until the first ReplaceLogger event happens right? So if they never replace the logger, it will not be safe to read TiDBGeneralLogMaxAge to see what the value defined in config is.

A workaround is, you can set the sysvar in tidb-server/main.go when the server starts up. We have some examples for global sysvars, but because you have a getter func (and it's session scope) you can just atomic.StoreInt32(&GeneralLogMaxAge, maxAge)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i found setGlobalVars in tidb-server/main.go initialize variables during startup, from what i can tell, we can actually change this to be global variable here and i think it make more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are setting ScopeNone (read-only) sysvars. If you make this ScopeGlobal it will be cluster wide.

The behavior of ScopeGlobal + config file is weird, because any new joining server would overwrite the other server's log configuration. Usually with logs (or things refering to the local filesystem) we have kept them instance scoped. This also supports the use case that a user might want to have one server log everything.

I hope in future we clean this up though, so there is a native concept of instance scope.

@morgo morgo removed their assignment Aug 27, 2021
@db-will db-will changed the title session: add session var to update global logger max age session: add session var to update global logger max days Aug 29, 2021
@morgo
Copy link
Contributor

morgo commented Aug 29, 2021

@db-will Please run go fmt sessionctx/variable/tidb_vars.go. Otherwise LGTM.

@morgo morgo self-requested a review August 29, 2021 19:14
Copy link
Contributor

@morgo morgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 29, 2021
@morgo morgo requested a review from a team August 29, 2021 19:15
@db-will
Copy link
Contributor Author

db-will commented Aug 30, 2021

/assign @SabaPing

@ti-chi-bot
Copy link
Member

@db-will: GitHub didn't allow me to assign the following users: SabaPing.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @SabaPing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bb7133
Copy link
Member

bb7133 commented Aug 30, 2021

This variable is named 'tidb_general_log_max_days' but it is not limited to the 'general log' actually. Is it a good idea to name it 'tidb_log_file_max_days'?

Copy link
Contributor

@SabaPing SabaPing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are max-size and max-backups variables in log config. Should we add them to session var to make the behavior consistent?

@@ -121,6 +121,9 @@ const (
// tidb_general_log is used to log every query in the server in info level.
TiDBGeneralLog = "tidb_general_log"

// tidb_general_log is used to log every query in the server in info level.
TiDBGeneralLogMaxDays = "tidb_general_log_max_days"
Copy link
Contributor

@SabaPing SabaPing Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @bb7133 said, should we rename it to tidb_log_file_max_days?

@@ -143,6 +144,24 @@ func initGRPCLogger(cfg *LogConfig) (*zap.Logger, *log.ZapProperties, error) {
return l, prop, nil
}

// ReplaceLogger replace global logger instance with given log config.
func ReplaceLogger(cfg *LogConfig) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to replace slow query logger to enable slow log runtime modification either?

@db-will
Copy link
Contributor Author

db-will commented Aug 30, 2021

@bb7133 Previously, i found newSlowQueryLogger() doesn't change max days of slow logger, but @SabaPing is creating a PR to fix that: #27625. Combined with these two, we can make it as tidb_log_file_max_days . good suggestions here!

@morgo morgo requested a review from a team September 3, 2021 02:54
@db-will
Copy link
Contributor Author

db-will commented Sep 3, 2021

/assign @bb7133

@db-will
Copy link
Contributor Author

db-will commented Sep 15, 2021

/assign @yudongusa
/assign @XuHuaiyu
/assign @cfzjywxk

Looking for one another approval to meet the merge criterial

@dveeden
Copy link
Contributor

dveeden commented Oct 14, 2021

Unfortunately libfaketime doesn't work with Go, that would have made testing easier and/or more realistic. Interesting enough the golang playground etc are using a fake time (example). However I think the testing here is sufficient for this PR.

In MySQL expire_logs_days was replaced by binlog_expire_logs_seconds. While this variable doesn't server the same purpose it might be good to consider settings in seconds or hours instead of days. Another option would be to allow decimal values like 0.5 days.

Another option is to leave some of this work to the OS, like logrotate and journald. Sending this to a socket is another option. Some of these might make it easier to filter, compress, forward and de-duplicate log entries, especially when using JSON. However I don't think this is in scope for this PR.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2021
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 25, 2021
@db-will db-will requested a review from a team as a code owner October 26, 2021 02:39
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2021
@xiongjiwei
Copy link
Contributor

please format the code and then the pr can be merge

@db-will
Copy link
Contributor Author

db-will commented Oct 27, 2021

@xiongjiwei May i merge this PR?

@xiongjiwei
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 125887b

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 27, 2021
@ti-chi-bot
Copy link
Member

@db-will: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit c9f33cb into pingcap:master Oct 27, 2021
@db-will db-will deleted the general-log-max-age branch November 1, 2021 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

expose log.file.max-days as a dynamic parameter