Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: refine commit txn slow log #8731

Merged
merged 8 commits into from
Dec 25, 2018
Merged

*: refine commit txn slow log #8731

merged 8 commits into from
Dec 25, 2018

Conversation

lysu
Copy link
Contributor

@lysu lysu commented Dec 18, 2018

What problem does this PR solve?

1) commit slow log info is hard to find question:

run this script:

create table tb (a int, b int);
insert into tb (a, b) values (1, 1), (2, 2)...;  // x47100
begin;
update tb set b =  0;
commit;

will got commit log like this:

2018/12/18 15:11:04.482 adapter.go:391: [warning] [SLOW_QUERY] cost_time:2.530186968s succ:true con:1 user:root@127.0.0.1 txn_start_ts:405043163388706817 database:test sql:commit

we are hard to know the real reason make this slow.

2) no latch wait metric

From metric, we can't known how long time used in wait local latch, and we need add it to metrics

What is changed and how it works?

1) refine commit txn slow log

add more detail info in commit slow log, just like this:

2018/12/18 15:11:04.482 adapter.go:391: [warning] [SLOW_QUERY] cost_time:2.530186968s prewrite_time:2.284130378s commit_time:83.495155ms get_commit_ts_time:153.792µs write_keys:47100 write_size:1457201 prewrite_region:1 succ:true con:1 user:root@127.0.0.1 txn_start_ts:405043163388706817 database:test sql:commitTxn

we can got detail info:

  • GetCommitTsTime: get commit_ts time for this txn
  • PrewriteTime: prewrite time for this txn
  • CommitTime: commit time for this txn
  • LocalLatchTime: wait tidb local latch time
  • TotalBackoffTime: total time waste in backoff operation
  • ResolveLockTime: time used to resolve lock
  • WriteKeys: keys be writen in this txn
  • WriteSize: data bytes be writen in this txn
  • PrewriteRegionNum: prewrite region count
  • TxnRetry: auto transaction retried count, so if txn be retried above item is first txn commit value

so we can see TxnRetry to known whether slow by txn-retry, or no retry just slow in caused by latch, prewrite, getTs, commit or resolve; and we also can get write key count, data size and requested region count info which are missed in previous log.

2) add latch wait to metric

Add a prometheus metric item, local_latch_wait_seconds:

TiKVLocalLatchWaitTimeHistogram = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Namespace: "tidb",
			Subsystem: "tikvclient",
			Name:      "local_latch_wait_seconds",
			Help:      "Wait time of a get local latch.",
			Buckets:   prometheus.ExponentialBuckets(0.0005, 2, 20),
		})

to record the time that wait latch when txn could not acquire lock success.

Check List

Tests

  • Exists Test

Code changes

  • Collect stats in 2pc
  • Add log
  • Add metric

Side effects

  • N/A

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note

This change is Reviewable

@tiancaiamao @jackysp PTAL thx~

@lysu
Copy link
Contributor Author

lysu commented Dec 18, 2018

/run-all-tests

@lysu
Copy link
Contributor Author

lysu commented Dec 18, 2018

/run-all-tests

store/tikv/2pc.go Outdated Show resolved Hide resolved
@shenli
Copy link
Member

shenli commented Dec 18, 2018

@tiancaiamao PTAL

@lysu lysu requested a review from tiancaiamao December 20, 2018 03:49
store/tikv/txn.go Outdated Show resolved Hide resolved
@lysu
Copy link
Contributor Author

lysu commented Dec 21, 2018

/run-all-tests

@shenli
Copy link
Member

shenli commented Dec 21, 2018

The metrics is not mentioned in "What is changed and how it works?" section. Please refine the description.

@lysu
Copy link
Contributor Author

lysu commented Dec 21, 2018

/rebuild

@lysu
Copy link
Contributor Author

lysu commented Dec 21, 2018

/run-all-tests

@lysu
Copy link
Contributor Author

lysu commented Dec 21, 2018

@tiancaiamao @jackysp @winkyao PTAL

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu
Copy link
Contributor Author

lysu commented Dec 25, 2018

@tiancaiamao @tiancaiamao

store/tikv/latch/scheduler.go Outdated Show resolved Hide resolved
Ctx: s,
StartTime: time.Now(),
}
var commitDetail *execdetails.CommitDetails
ctx = context.WithValue(ctx, execdetails.CommitDetailCtxKey, &commitDetail)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pointer of pointer? is it better to use commitDetail ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

direct use commitDetail will copy on withValue call and couldn't get modification by callled method,
use pointer of commitDetail can get change result, but hard to distinguish "has commitDetail" and "no commitDetail"

@tiancaiamao
Copy link
Contributor

LGTM

@tiancaiamao tiancaiamao added the status/LGT2 Indicates that a PR has LGTM 2. label Dec 25, 2018
@lysu lysu merged commit 351c4a5 into pingcap:master Dec 25, 2018
@lysu lysu deleted the refine-commit-txn-log-metric branch December 25, 2018 12:49
zz-jason pushed a commit that referenced this pull request Dec 29, 2018
yu34po pushed a commit to yu34po/tidb that referenced this pull request Jan 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants