txn: Slow txn log #41864
Signed-off-by: ekexium <eke@fastmail.com>
Is this change intended to resolve issues like #41471? If so, we may need to find a way to show long-running transactions like
No. I'd rather treat them as different problems. This PR is the transaction counterpart of slow query logs; I intend to use it as a tool to diagnose tail latencies, though finished long-running transactions could be logged by it as well.
How can tail latencies be diagnosed using the long-txn log? I don't quite get it. Could you give an example?
For example, we see high tail latencies for transactions and high max idle durations in the metrics, but we cannot be sure that the tail latency is caused by the idle duration.
With the slow txn log, we can tell that there was an idle duration of 218ms before the INSERT statement arrived, which might help further diagnosis on the client side.
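To illustrate the diagnosis described above, here is a minimal Go sketch that scans a transaction's events for the longest preceding gap. The `txnEvent` type, the `largestGap` helper, and the sample values are hypothetical and are not taken from the PR; only the 218ms figure mirrors the example in the comment.

```go
package main

import (
	"fmt"
	"time"
)

// txnEvent is a hypothetical, simplified record of one transaction event:
// what happened and how long the session was idle before it arrived.
type txnEvent struct {
	name string        // e.g. a statement label or "commit"
	gap  time.Duration // time elapsed since the previous event
}

// largestGap returns the index of the event preceded by the longest gap,
// or -1 if there are no events.
func largestGap(events []txnEvent) int {
	idx := -1
	var longest time.Duration
	for i, e := range events {
		if idx == -1 || e.gap > longest {
			idx, longest = i, e.gap
		}
	}
	return idx
}

func main() {
	// Illustrative values only, not real log output.
	events := []txnEvent{
		{name: "begin", gap: 0},
		{name: "insert", gap: 218 * time.Millisecond},
		{name: "commit", gap: 2 * time.Millisecond},
	}
	if i := largestGap(events); i >= 0 {
		fmt.Printf("longest gap: %v before %q\n", events[i].gap, events[i].name)
	}
}
```

A large gap before a statement points at time spent outside statement execution, which is a hint to look at the client side rather than a proof.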
@@ -2175,6 +2175,7 @@ func (s *session) ExecuteStmt(ctx context.Context, stmtNode ast.StmtNode) (sqlex
cmd32 := atomic.LoadUint32(&s.GetSessionVars().CommandValue)
s.SetProcessInfo(stmtNode.Text(), time.Now(), byte(cmd32), 0)
s.txn.onStmtStart(digest.String())
defer sessiontxn.GetTxnManager(s).OnStmtEnd()
I tried to merge the two OnStmtEnd calls into one in #41122, but it looks difficult.
@@ -38,6 +38,8 @@ const (
DefaultLogFormat = "text"
// DefaultSlowThreshold is the default slow log threshold in millisecond.
DefaultSlowThreshold = 300
// DefaultSlowTxnThreshold is the default slow txn log threshold in ms.
DefaultSlowTxnThreshold = 0
How to choose a default-enabled value?
Default to 0. What counts as "slow" depends entirely on the application logic, and defaulting to 0 prevents noisy logs from inner transactions.
That is also why this should be a SESSION-scoped variable.
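The reasoning above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: the `session` type, its `slowTxnThresholdMs` field, and `isSlowTxn` are hypothetical, and only the constant `DefaultSlowTxnThreshold = 0` comes from the diff.

```go
package main

import (
	"fmt"
	"time"
)

// DefaultSlowTxnThreshold mirrors the constant in the diff: 0 means disabled.
const DefaultSlowTxnThreshold = 0

// session is a hypothetical stand-in for a session holding its own
// threshold, which is what a SESSION-scoped variable implies.
type session struct {
	name               string
	slowTxnThresholdMs uint64 // 0 disables the slow txn log for this session
}

// isSlowTxn reports whether a transaction of the given duration should be
// logged for this session. A threshold of 0 never logs.
func (s session) isSlowTxn(elapsed time.Duration) bool {
	if s.slowTxnThresholdMs == 0 {
		return false
	}
	return elapsed > time.Duration(s.slowTxnThresholdMs)*time.Millisecond
}

func main() {
	// An application session opts in with a threshold it considers "slow".
	app := session{name: "app", slowTxnThresholdMs: 200}
	// A session running inner transactions keeps the default and stays quiet.
	inner := session{name: "inner", slowTxnThresholdMs: DefaultSlowTxnThreshold}

	elapsed := 235 * time.Millisecond
	fmt.Println(app.name, app.isSlowTxn(elapsed))
	fmt.Println(inner.name, inner.isSlowTxn(elapsed))
}
```

Keeping the threshold per session lets each application choose its own definition of "slow" without enabling the log for sessions that never asked for it.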
Note that query_dur > stmt_dur; that is, {event="select ...",gap=100ms} may be caused by slow parsing. We cannot confirm that it is caused by client idle time without the slow query log of that query.
// used for slow transaction logs
events []event
lastInstant time.Time
enterTxnInstant time.Time
Maybe lastInstant is unnecessary. We can record the start instant and the duration (since the start instant) of each event; the gaps can then be inferred from this information.
Yeah, I've considered the alternative. I think they are basically equivalent and either is fine. If you prefer this style, I can change it.
I think it’s fine. A field of type time.Time here is not expensive at all.
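For comparison, the alternative style suggested in this thread could look roughly like the sketch below. It is one possible reading of the suggestion, not code from the PR: the `spanEvent` type and the `gaps` helper are hypothetical, and the sketch assumes each event stores its start offset from the transaction start plus its own duration, with events ordered and non-overlapping.

```go
package main

import (
	"fmt"
	"time"
)

// spanEvent is a hypothetical record in the alternative style: no shared
// lastInstant is kept; each event carries its own start and duration.
type spanEvent struct {
	name     string
	start    time.Duration // offset from the instant the transaction was entered
	duration time.Duration // how long the event itself took
}

// gaps infers the idle time before each event as the event's start minus
// the end of the previous event (zero for the first event's predecessor).
func gaps(events []spanEvent) []time.Duration {
	out := make([]time.Duration, len(events))
	var prevEnd time.Duration
	for i, e := range events {
		out[i] = e.start - prevEnd
		prevEnd = e.start + e.duration
	}
	return out
}

func main() {
	// Illustrative values only.
	events := []spanEvent{
		{"begin", 0, 1 * time.Millisecond},
		{"insert", 219 * time.Millisecond, 3 * time.Millisecond},
		{"commit", 224 * time.Millisecond, 5 * time.Millisecond},
	}
	for i, g := range gaps(events) {
		fmt.Printf("%s: gap=%v\n", events[i].name, g)
	}
}
```

Both styles carry the same information, which matches the conclusion in the thread that either is fine; the difference is whether gaps are stored directly or derived when the log is written.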
That's true. And the stat doesn't count the time spent in TiDB after In real cases we should use it together with slow query logs and stmt info. For things like |
BTW I think we've added some "event tracking" things in the consistency check enhancement, should we also include the |
@ekexium |
/retest |
/merge |
This pull request has been accepted and is ready to merge. Commit hash: af9d4df
What problem does this PR solve?
Issue Number: close #41863
Problem Summary:
What is changed and how it works?
Add a session variable tidb_slow_txn_log_threshold. For any transaction that takes longer than the threshold, log the transaction events observed by the transaction manager. Setting it to 0 disables the log. The log looks like
Check List
Tests
Side effects
Documentation
Release note