tikvclient: add metrics for gRPC connection transient failure #12084

Merged
merged 2 commits into pingcap:master
Sep 9, 2019

Conversation

lonng
Contributor

@lonng lonng commented Sep 9, 2019

Signed-off-by: Lonng <heng@lonng.org>

What problem does this PR solve?

TiDB accesses TiKV through gRPC requests. If the underlying socket is disconnected, gRPC tries to reconnect, which can cause request latency to jitter. We need a way to monitor low-level socket state changes.

What is changed and how it works?

This PR adds a metric to monitor the gRPC connection state. The metric records the connection state before each request is sent to TiKV. Once this PR is merged, we can diagnose latency jitter with rate(tidb_grpc_connection_state).
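The check-before-send idea described above can be sketched as follows. This is a minimal, standard-library-only model and not the code in this PR: `ConnState` stands in for grpc-go's `connectivity.State`, `failureCounter` stands in for a labeled Prometheus counter, and `sendRequest`, `fakeConn`, and the per-address label are hypothetical names chosen for illustration.

```go
package main

import "sync"

// ConnState is a stand-in for grpc-go's connectivity.State, modeled here
// so the sketch compiles with the standard library only.
type ConnState int

const (
	Idle ConnState = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// stateGetter is the single method the sketch needs from a connection.
// grpc-go's *grpc.ClientConn exposes a GetState method of a similar shape.
type stateGetter interface {
	GetState() ConnState
}

// failureCounter is a hypothetical stand-in for a labeled counter metric:
// it counts observed transient failures per store address.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]int)}
}

// inc adds one observation for addr.
func (c *failureCounter) inc(addr string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[addr]++
}

// get returns the number of observations recorded for addr.
func (c *failureCounter) get(addr string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[addr]
}

// sendRequest inspects the connection state first, records a transient
// failure if one is observed, and then calls send, which represents
// whatever actually issues the RPC.
func sendRequest(conn stateGetter, addr string, m *failureCounter, send func() error) error {
	if conn.GetState() == TransientFailure {
		m.inc(addr)
	}
	return send()
}

// fakeConn lets the sketch be exercised without a network.
type fakeConn struct{ state ConnState }

func (f fakeConn) GetState() ConnState { return f.state }
```

Because the counter only ever increases, a per-second rate over it shows when reconnects cluster in time. Note that PromQL's rate() takes a range vector, so in practice the query would carry a window, for example rate(tidb_grpc_connection_state[1m]); the one-minute window here is an assumption.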

Check List

Tests

  • No code
  • Manual test (screenshot attached)

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Write release note for bug-fix or new feature.

@codecov

codecov bot commented Sep 9, 2019

Codecov Report

Merging #12084 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12084   +/-   ##
===========================================
  Coverage   82.1858%   82.1858%           
===========================================
  Files           447        447           
  Lines         99567      99567           
===========================================
  Hits          81830      81830           
  Misses        12148      12148           
  Partials       5589       5589

@lysu
Contributor

lysu commented Sep 9, 2019

/bench

store/tikv/client.go (review thread, outdated and resolved)
@AilinKid
Contributor

AilinKid commented Sep 9, 2019

/run-all-tests

@sre-bot
Contributor

sre-bot commented Sep 9, 2019

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 619b9a92b3c26b830687ea06ffe25538c464505d
+++ tidb: 893f2c93ef8154906765e7b96ddc1c075a16c7a0
tikv: b4daac2bb80ade14d1442d6fc21fa9c12efab943
pd: b66ba4482c5dfb3d976461544b5df3b8442d4d37
================================================================================
test-1: < oltp_insert >
    * QPS : 21367.25 ± 0.7889% (std=102.98) delta: -0.14%
    * AvgMs : 11.98 ± 0.8016% (std=0.06) delta: -0.10%
    * PercentileMs99 : 42.61 ± 0.0000% (std=0.00) delta: -0.37%
            
test-2: < oltp_update_non_index >
    * QPS : 29570.68 ± 0.5176% (std=90.24) delta: 0.31%
    * AvgMs : 8.64 ± 0.2315% (std=0.01) delta: -0.46%
    * PercentileMs99 : 30.59 ± 2.5431% (std=0.44) delta: -0.36%
            
test-3: < oltp_read_write >
    * QPS : 36750.93 ± 0.1727% (std=56.07) delta: -0.26%
    * AvgMs : 139.86 ± 0.1698% (std=0.20) delta: 0.26%
    * PercentileMs99 : 257.95 ± 0.0000% (std=0.00) delta: 0.00%
            
test-4: < oltp_point_select >
    * QPS : 76150.13 ± 0.7291% (std=322.87) delta: -0.69%
    * AvgMs : 3.37 ± 0.3712% (std=0.01) delta: 0.90%
    * PercentileMs99 : 7.43 ± 0.0000% (std=0.00) delta: 0.00%
            
test-5: < oltp_update_index >
    * QPS : 16981.31 ± 0.4707% (std=62.66) delta: -0.16%
    * AvgMs : 15.07 ± 0.4512% (std=0.05) delta: 0.16%
    * PercentileMs99 : 47.99 ± 1.0877% (std=0.43) delta: -0.72%
            

https://perf.pingcap.com

@lonng lonng changed the title from "tikvclient: add metrics for gRPC connection state" to "tikvclient: add metrics for gRPC connection transient failure" Sep 9, 2019
@sre-bot
Contributor

sre-bot commented Sep 9, 2019

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 619b9a92b3c26b830687ea06ffe25538c464505d
+++ tidb: 893f2c93ef8154906765e7b96ddc1c075a16c7a0
tikv: b4daac2bb80ade14d1442d6fc21fa9c12efab943
pd: b66ba4482c5dfb3d976461544b5df3b8442d4d37
================================================================================
tidb_max_cpu: 19.21, delta: -0.46%
tikv_max_cpu: 13.72, delta: 3.66%
tidb_max_memory: 1718.83 MiB, delta: -3.11%
tikv_max_memory: 59880.01 MiB, delta: 0.03%

Measured tpmC (NewOrders): 21651.64, delta: -2.49%

https://perf.pingcap.com

@lonng
Contributor Author

lonng commented Sep 9, 2019

/run-all-tests

Contributor

@crazycs520 crazycs520 left a comment

LGTM

@lonng lonng merged commit 7d74402 into pingcap:master Sep 9, 2019
@lonng lonng deleted the grpc-state-metrics branch September 9, 2019 08:23
@lonng
Contributor Author

lonng commented Sep 9, 2019

/run-cherry-picker

@sre-bot
Contributor

sre-bot commented Sep 9, 2019

cherry pick to release-3.0 failed

@sre-bot
Contributor

sre-bot commented Sep 9, 2019

cherry pick to release-2.1 failed

6 participants