Enhancement for tidb query diagnose #28937
Related: tikv/tikv#8942

TL;DR: We are still working on delivering the Top SQL feature, which solves some more urgent diagnostics issues. However, there is a very detailed implementation instruction in tikv/tikv#8942 (comment), and I would appreciate it if someone interested could help bring it into our code base :)

Note: after tikv/tikv#8942 is resolved, we still have more work to do on single-query diagnostics. From my current understanding, a complete solution should at least cover the following information:
@breeswish:
@cfzjywxk Thanks for the recap! Maybe we can separate what you describe into two things that we can work on independently:

a) The troubleshooting of a single SQL execution is not well supported. Currently this missing information can only be inferred from the metrics, which are aggregations that may not be helpful for a single execution. This is planned to be improved by @SunRunAway in the next several releases.

b) Components other than TiDB do not process the SQL request in the same way as TiDB does.

I guess there is no architectural difficulty for (b), as some behavior has already been successfully kept identical. For example, the handling of different SQL modes is well implemented on both the TiKV side and the TiDB side: they follow the same behavior for whatever SQL mode the user sets. This indicates that we can do it right. However, I admit that for now we need to implement these behaviors one by one, which is not a good way.
Feature Request
Is your feature request related to a problem? Please describe:
In the architecture of TiDB, tidb-server manages the user connections and processes incoming queries; these queries are converted into different kv requests and sent to tikv-server through the tidb batch client and gRPC components. The problem is that after turning a query into kv requests, much of the execution context and query context is lost, which makes it difficult to diagnose "query-dimension" issues such as slow queries. For example, we often see slow queries like:
```
4f12266030e202b41c1f0531d03ba799a458f11a88635a3f424506a12e3ee543,SELECT `event_time` FROM `followers` WHERE `uid` = 59322005 AND `target_uid` = 75161335 LIMIT 1;,10.160.32.142:10080,blued,32214282,1,1632294495.143163,1.004865,0.000028,0.000029,0.000053,0.000000,0.000091,0.000406,0.000000,0.000000,0.000000,0.000000,0,0,427896207863981621,,"
id           task  estRows  operator info                                    actRows  execution info                                    memory  disk
Point_Get_1  root  1        table:followers, index:PRIMARY(uid, target_uid)  0        time:1s, loops:1, Get:{num_rpc:1, total_time:1s}  N/A     N/A
",0,,,,blued,10.10.96.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0,0,0,0,0,0,0,,,0,0,0,0,0
```
In the slow log above, we can only tell that the kv Get RPC is slow (1s); there is no useful information about the root cause or how to solve it. Usually we have to check the Grafana metrics for more information, but that is the wrong way to diagnose: troubleshooting query-dimension issues with server-dimension information is inefficient and inappropriate.
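To make the limitation concrete, here is a minimal Go sketch (the name `sendGetRPC` is hypothetical, standing in for the real batch-client/gRPC path) of what tidb-server can observe today: the caller only sees the total RPC latency, so a 1s Get stays a black box.

```go
package main

import (
	"fmt"
	"time"
)

// sendGetRPC stands in for the real batch-client/gRPC call; from the
// caller's point of view it is an opaque box.
func sendGetRPC(key []byte) ([]byte, error) {
	// Where does this second go? Scheduling? Raft? Disk? The caller cannot tell.
	time.Sleep(time.Second)
	return []byte("value"), nil
}

func main() {
	start := time.Now()
	if _, err := sendGetRPC([]byte("followers:59322005:75161335")); err != nil {
		panic(err)
	}
	// This one opaque duration is essentially all the slow log can report today.
	fmt.Printf("Get:{num_rpc:1, total_time:%s}\n", time.Since(start).Round(time.Second))
}
```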
Describe the feature you'd like:
Pass the needed query context and execution context down to tikv-server, and record the duration of each stage there. Then we can see exactly what is slowing down a specific query, without needing to check the aggregated data in Grafana. This is the key point for enhancing the diagnosability of TiDB. A rough sketch of the idea follows.
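As a rough illustration only, here is a minimal Go sketch. All of the types, fields, and stage names (`QueryContext`, `StageTiming`, `KvResponse`, `handleGet`, "sched_wait", etc.) are hypothetical assumptions for this sketch, not actual kvproto or TiKV definitions. The flow it shows: query-level identity travels down with the kv request, the server records a duration per stage, and the breakdown travels back up so tidb-server can print it into the slow log of that one statement.

```go
package main

import (
	"fmt"
	"time"
)

// QueryContext is the query-level context the issue proposes to propagate:
// enough identity to tie a kv request back to one SQL execution.
type QueryContext struct {
	ConnID     uint64
	StmtDigest string // e.g. "4f12266030e2..."
	StartTS    uint64
}

// StageTiming is a hypothetical per-stage breakdown that tikv-server could
// fill in and return with each kv response.
type StageTiming struct {
	Stage    string
	Duration time.Duration
}

// KvResponse carries the recorded stages back to tidb-server.
type KvResponse struct {
	Value  []byte
	Stages []StageTiming
}

// handleGet sketches the server side: each stage records its own duration
// instead of only contributing to an aggregated server-level metric.
func handleGet(qc QueryContext, key []byte) KvResponse {
	var stages []StageTiming
	record := func(stage string, fn func()) {
		start := time.Now()
		fn()
		stages = append(stages, StageTiming{stage, time.Since(start)})
	}
	// Simulated stages; in reality these would wrap scheduler wait,
	// snapshot acquisition, storage reads, and so on.
	record("sched_wait", func() { time.Sleep(5 * time.Millisecond) })
	record("snapshot", func() { time.Sleep(2 * time.Millisecond) })
	record("storage_get", func() { time.Sleep(990 * time.Millisecond) })
	return KvResponse{Value: []byte("v"), Stages: stages}
}

func main() {
	qc := QueryContext{ConnID: 32214282, StmtDigest: "4f12266030e2...", StartTS: 427896207863981621}
	resp := handleGet(qc, []byte("followers:59322005:75161335"))
	// tidb-server can now emit a per-stage breakdown for exactly this
	// statement; no Grafana aggregation needed.
	for _, s := range resp.Stages {
		fmt.Printf("%s: %s\n", s.Stage, s.Duration.Round(time.Millisecond))
	}
}
```

In practice the context and timings would live in the gRPC request/response protos rather than in-process structs, but the flow is the same: attach identity on the way down, attach per-stage durations on the way up.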
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy: