Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txn: Slow txn log #41864

Merged
merged 12 commits into from
Mar 6, 2023
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions session/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@ go_library(
"@com_github_tikv_client_go_v2//util",
"@io_etcd_go_etcd_client_v3//concurrency",
"@org_uber_go_zap//:zap",
"@org_uber_go_zap//zapcore",
],
)

Expand Down
1 change: 1 addition & 0 deletions session/session.go
Original file line number Diff line number Diff line change
Expand Up @@ -2175,6 +2175,7 @@ func (s *session) ExecuteStmt(ctx context.Context, stmtNode ast.StmtNode) (sqlex
cmd32 := atomic.LoadUint32(&s.GetSessionVars().CommandValue)
s.SetProcessInfo(stmtNode.Text(), time.Now(), byte(cmd32), 0)
s.txn.onStmtStart(digest.String())
defer sessiontxn.GetTxnManager(s).OnStmtEnd()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to merge the two OnStmtEnd into one #41122 but it looks difficult..

defer s.txn.onStmtEnd()

if err := s.onTxnManagerStmtStartOrRetry(ctx, stmtNode); err != nil {
Expand Down
72 changes: 72 additions & 0 deletions session/txnmanager.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,21 @@ package session
import (
"context"
"fmt"
"time"

"github.com/pingcap/errors"
"github.com/pingcap/failpoint"
"github.com/pingcap/tidb/infoschema"
"github.com/pingcap/tidb/kv"
"github.com/pingcap/tidb/parser"
"github.com/pingcap/tidb/parser/ast"
"github.com/pingcap/tidb/sessionctx"
"github.com/pingcap/tidb/sessiontxn"
"github.com/pingcap/tidb/sessiontxn/isolation"
"github.com/pingcap/tidb/sessiontxn/staleread"
"github.com/pingcap/tidb/util/logutil"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
)

func init() {
Expand All @@ -52,6 +57,22 @@ type txnManager struct {

// We always reuse the same OptimisticTxnContextProvider in one session to reduce memory allocation cost for every new txn.
reservedOptimisticProviders [2]isolation.OptimisticTxnContextProvider

// used for slow transaction logs
events []event
lastInstant time.Time
enterTxnInstant time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lastInstant is unnecessary, we can record the start instant and the duration (since the start instant) of each event, then the gaps can be inferred from these info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've considered the alternative. I think they are basically equivalent and either is fine. If you prefer this style I can change it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s fine. A field of type time.Time here is not expensive at all.

}

type event struct {
event string
duration time.Duration
}

func (s event) MarshalLogObject(enc zapcore.ObjectEncoder) error {
enc.AddString("event", s.event)
enc.AddDuration("gap", s.duration)
return nil
}

func newTxnManager(sctx sessionctx.Context) *txnManager {
Expand Down Expand Up @@ -153,12 +174,30 @@ func (m *txnManager) EnterNewTxn(ctx context.Context, r *sessiontxn.EnterNewTxnR
if r.Type == sessiontxn.EnterNewTxnWithBeginStmt {
m.sctx.GetSessionVars().SetInTxn(true)
}

m.resetEvents()
m.recordEvent("enter txn")
return nil
}

func (m *txnManager) OnTxnEnd() {
m.ctxProvider = nil
m.stmtNode = nil

m.events = append(m.events, event{event: "txn end", duration: time.Since(m.lastInstant)})

duration := time.Since(m.enterTxnInstant)
threshold := m.sctx.GetSessionVars().SlowTxnThreshold
if threshold > 0 && uint64(duration.Milliseconds()) >= threshold {
logutil.BgLogger().Info(
ekexium marked this conversation as resolved.
Show resolved Hide resolved
"slow transaction", zap.Duration("duration", duration),
zap.Uint64("conn", m.sctx.GetSessionVars().ConnectionID),
zap.Uint64("txnStartTS", m.sctx.GetSessionVars().TxnCtx.StartTS),
zap.Objects("events", m.events),
)
}

m.lastInstant = time.Now()
}

func (m *txnManager) GetCurrentStmt() ast.StmtNode {
Expand All @@ -172,9 +211,23 @@ func (m *txnManager) OnStmtStart(ctx context.Context, node ast.StmtNode) error {
if m.ctxProvider == nil {
return errors.New("context provider not set")
}

var sql string
if node != nil {
sql = node.OriginalText()
if m.sctx.GetSessionVars().EnableRedactLog {
sql = parser.Normalize(sql)
}
}
m.recordEvent(sql)
return m.ctxProvider.OnStmtStart(ctx, m.stmtNode)
}

// OnStmtEnd implements the TxnManager interface
func (m *txnManager) OnStmtEnd() {
m.recordEvent("stmt end")
}

// OnPessimisticStmtStart is the hook that should be called when starts handling a pessimistic DML or
// a pessimistic select-for-update statements.
func (m *txnManager) OnPessimisticStmtStart(ctx context.Context) error {
Expand Down Expand Up @@ -222,14 +275,33 @@ func (m *txnManager) OnStmtCommit(ctx context.Context) error {
if m.ctxProvider == nil {
return errors.New("context provider not set")
}
m.recordEvent("stmt commit")
return m.ctxProvider.OnStmtCommit(ctx)
}

func (m *txnManager) recordEvent(eventName string) {
if m.events == nil {
m.resetEvents()
}
m.events = append(m.events, event{event: eventName, duration: time.Since(m.lastInstant)})
m.lastInstant = time.Now()
}

func (m *txnManager) resetEvents() {
if m.events == nil {
m.events = make([]event, 0, 10)
} else {
m.events = m.events[:0]
}
m.enterTxnInstant = time.Now()
}

// OnStmtRollback is the hook that should be called when a statement fails to execute.
func (m *txnManager) OnStmtRollback(ctx context.Context, isForPessimisticRetry bool) error {
if m.ctxProvider == nil {
return errors.New("context provider not set")
}
m.recordEvent("stmt rollback")
return m.ctxProvider.OnStmtRollback(ctx, isForPessimisticRetry)
}

Expand Down
3 changes: 3 additions & 0 deletions sessionctx/variable/session.go
Original file line number Diff line number Diff line change
Expand Up @@ -1373,6 +1373,9 @@ type SessionVars struct {
// EnableINLJoinInnerMultiPattern indicates whether enable multi pattern for index join inner side
// For now it is not public to user
EnableINLJoinInnerMultiPattern bool

// SlowTxnThreshold is the threshold of slow transaction logs
SlowTxnThreshold uint64
}

// planReplayerSessionFinishedTaskKeyLen is used to control the max size for the finished plan replayer task key in session
Expand Down
6 changes: 6 additions & 0 deletions sessionctx/variable/sysvar.go
Original file line number Diff line number Diff line change
Expand Up @@ -389,6 +389,12 @@ var defaultSysVars = []*SysVar{
}, GetGlobal: func(_ context.Context, s *SessionVars) (string, error) {
return BoolToOnOff(ProcessGeneralLog.Load()), nil
}},
{Scope: ScopeSession, Name: TiDBSlowTxnLogThreshold, Value: strconv.Itoa(logutil.DefaultSlowTxnThreshold),
Type: TypeUnsigned, MinValue: 0, MaxValue: math.MaxInt64, SetSession: func(s *SessionVars, val string) error {
s.SlowTxnThreshold = TidbOptUint64(val, logutil.DefaultSlowTxnThreshold)
return nil
},
},
{Scope: ScopeInstance, Name: TiDBSlowLogThreshold, Value: strconv.Itoa(logutil.DefaultSlowThreshold), Type: TypeInt, MinValue: -1, MaxValue: math.MaxInt64, SetGlobal: func(_ context.Context, s *SessionVars, val string) error {
atomic.StoreUint64(&config.GetGlobalConfig().Instance.SlowThreshold, uint64(TidbOptInt64(val, logutil.DefaultSlowThreshold)))
return nil
Expand Down
3 changes: 3 additions & 0 deletions sessionctx/variable/tidb_vars.go
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,9 @@ const (
// TiDBSlowLogThreshold is used to set the slow log threshold in the server.
TiDBSlowLogThreshold = "tidb_slow_log_threshold"

// TiDBSlowTxnLogThreshold is used to set the slow transaction log threshold in the server.
TiDBSlowTxnLogThreshold = "tidb_slow_txn_log_threshold"

// TiDBRecordPlanInSlowLog is used to log the plan of the slow query.
TiDBRecordPlanInSlowLog = "tidb_record_plan_in_slow_log"

Expand Down
6 changes: 4 additions & 2 deletions sessiontxn/interface.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ type EnterNewTxnRequest struct {
// causalConsistencyOnly means whether enable causal consistency for transactions, default is false
CausalConsistencyOnly bool
// staleReadTS indicates the read ts for the stale read transaction.
//The default value is zero which means not a stale read transaction.
// The default value is zero which means not a stale read transaction.
StaleReadTS uint64
}

Expand Down Expand Up @@ -123,7 +123,7 @@ type TxnContextProvider interface {
GetTxnScope() string
// GetReadReplicaScope returns the read replica scope
GetReadReplicaScope() string
//GetStmtReadTS returns the read timestamp used by select statement (not for select ... for update)
// GetStmtReadTS returns the read timestamp used by select statement (not for select ... for update)
GetStmtReadTS() (uint64, error)
// GetStmtForUpdateTS returns the read timestamp used by update/insert/delete or select ... for update
GetStmtForUpdateTS() (uint64, error)
Expand Down Expand Up @@ -205,6 +205,8 @@ type TxnManager interface {
OnStmtCommit(ctx context.Context) error
// OnStmtRollback is the hook that should be called when a statement fails to execute.
OnStmtRollback(ctx context.Context, isForPessimisticRetry bool) error
// OnStmtEnd is called when a statement ends, together with txn.onStmtEnd()
OnStmtEnd()
// OnLocalTemporaryTableCreated is the hook that should be called when a local temporary table created.
OnLocalTemporaryTableCreated()
// ActivateTxn activates the transaction.
Expand Down
2 changes: 2 additions & 0 deletions util/logutil/log.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ const (
DefaultLogFormat = "text"
// DefaultSlowThreshold is the default slow log threshold in millisecond.
DefaultSlowThreshold = 300
// DefaultSlowTxnThreshold is the default slow txn log threshold in ms.
DefaultSlowTxnThreshold = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to choose a default-enabled value?

Copy link
Member Author

@ekexium ekexium Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default to 0. It's totally dependent on the application logic to determine what is "slow". And defaulting to 0 will prevent noisy logs from inner transactions.
And that's why this should be a SESSION-scoped variable.

// DefaultQueryLogMaxLen is the default max length of the query in the log.
DefaultQueryLogMaxLen = 4096
// DefaultRecordPlanInSlowLog is the default value for whether enable log query plan in the slow log.
Expand Down
1 change: 1 addition & 0 deletions util/sem/sem.go
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ func IsInvisibleSysVar(varNameInLower string) bool {
variable.TiDBRowFormatVersion,
variable.TiDBSlowQueryFile,
variable.TiDBSlowLogThreshold,
variable.TiDBSlowTxnLogThreshold,
variable.TiDBEnableCollectExecutionInfo,
variable.TiDBMemoryUsageAlarmRatio,
variable.TiDBRedactLog,
Expand Down