-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: tidb tracing prototype #7016
Conversation
Good job. |
Here is an overview of this PR and I sincerely wish this can help you understand how I implement this PR and speed the review process up. Firstly I add a new executor
In addition, I add a I create You may wonder how You may want to know what is |
If one sql runs 2 minutes, will the trace command run 2 minutes too? |
Yes. tracing traces how query got executed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PLEASE SPLIT THIS PR for reviewer
Gopkg.toml
Outdated
@@ -89,3 +89,7 @@ required = ["github.com/golang/protobuf/jsonpb"] | |||
name = "github.com/sirupsen/logrus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make vendor change in another PR please.
distsql/distsql.go
Outdated
@@ -16,16 +16,20 @@ package distsql | |||
import ( | |||
"time" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove empty line
distsql/distsql.go
Outdated
if !sctx.GetSessionVars().EnableStreaming { | ||
kvReq.Streaming = false | ||
} | ||
|
||
for _, key := range kvReq.KeyRanges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a session variable to control the behavior?
I don't like the performance degrade if the feature is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length of KeyRangs is constant. The complexity is just O(1)
; hence performance degradation not need to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length of KeyRangs is constant.
You sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By saying it is a constant, I mean it will not grow linearly. In general, even it is 1000 or much larger, it is still constant.
executor/aggregate.go
Outdated
@@ -16,7 +16,10 @@ package executor | |||
import ( | |||
"sync" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
executor/aggregate.go
Outdated
@@ -64,6 +68,8 @@ type HashAggPartialWorker struct { | |||
// chk stores the input data from child, | |||
// and is reused by childExec and partial worker. | |||
chk *chunk.Chunk | |||
|
|||
trace opentracing.Span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this kind of things be hidden in context.Context ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be used across different methods. Wrapping into ctx does not resolve this. So, I add this trace as part of the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, the trace related information should come with the channel message for a worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by saying "come with the channel message for a worker"?
It will be really nice if you couple explains it more explicitly.
executor/builder.go
Outdated
@@ -617,17 +619,6 @@ func (b *executorBuilder) buildDDL(v *plan.DDL) Executor { | |||
return e | |||
} | |||
|
|||
func (b *executorBuilder) buildExplain(v *plan.Explain) Executor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move code makes a reviewer waste more time to view a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3202bc5
to
1f7bfda
Compare
plan/trace.go
Outdated
targetPlan, err := Optimize(b.ctx, trace.Stmt, b.is) | ||
if err != nil { | ||
// TODO: imporve this error handle once everything works | ||
b.err = errors.New("fail to optimize") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is really hard to read for users.
distsql/distsql.go
Outdated
for _, key := range kvReq.KeyRanges { | ||
sTid, sIid, _, err := tablecodec.DecodeKeyHead(key.StartKey) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ err/ errors.Trace(err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think add a String method of Key
maybe better?
distsql/distsql.go
Outdated
} | ||
eTid, eIid, _, err := tablecodec.DecodeKeyHead(key.EndKey) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
distsql/distsql.go
Outdated
@@ -54,6 +73,7 @@ func Select(ctx context.Context, sctx sessionctx.Context, kvReq *kv.Request, fie | |||
}, nil | |||
} | |||
|
|||
child.LogKV("event", "finished sending rpc calls") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need it because it can help us know when rpc is finished so we can understand how many millisecond is spent at sending rpc call.
distsql/distsql.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
child.LogKV("sending rpc", fmt.Sprintf("to [/Table/%d/%d, %d/%d)", sTid, sIid, eTid, eIid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why make the string format like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tells us which table id and index id that this query may touch.
executor/trace.go
Outdated
chk.Reset() | ||
for !e.exhausted { | ||
var timeVal string | ||
var baseIdx int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this var can be removed?
plan/trace.go
Outdated
|
||
StmtPlan Plan | ||
|
||
ChildSchema expression.Schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless attribute.
plan/trace.go
Outdated
b.err = errors.New("fail to optimize") | ||
return nil | ||
} | ||
pp, _ := targetPlan.(PhysicalPlan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need this.
util/tracing/tracer.go
Outdated
) | ||
|
||
// TiDBTrace is set as Baggage on traces which are used for snowball tracing. | ||
const TiDBTrace = "tr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not "trace"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the shorter the better.
distsql/distsql.go
Outdated
if !sctx.GetSessionVars().EnableStreaming { | ||
kvReq.Streaming = false | ||
} | ||
|
||
for _, key := range kvReq.KeyRanges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length of KeyRangs is constant.
You sure?
executor/aggregate.go
Outdated
@@ -64,6 +68,8 @@ type HashAggPartialWorker struct { | |||
// chk stores the input data from child, | |||
// and is reused by childExec and partial worker. | |||
chk *chunk.Chunk | |||
|
|||
trace opentracing.Span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, the trace related information should come with the channel message for a worker.
executor/aggregate.go
Outdated
@@ -351,12 +359,14 @@ func (w *HashAggPartialWorker) run(ctx sessionctx.Context, waitGroup *sync.WaitG | |||
if needShuffle { | |||
w.shuffleIntermData(sc, finalConcurrency) | |||
} | |||
w.trace.Finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will w.trace
be nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. All trace is at least a noop trace.
executor/trace.go
Outdated
childrenResults []*chunk.Chunk | ||
} | ||
|
||
// buildTrace builds a TraceExec for future executing. This func will be called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/func/function
executor/trace.go
Outdated
e.children = make([]Executor, 0, len(pp.Children())) | ||
for _, child := range pp.Children() { | ||
switch p := child.(type) { | ||
case *plan.PhysicalTableReader: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if we explain trace select ...
?
executor/trace.go
Outdated
// Open opens a trace executor and it will create a root snowball span which will be | ||
// used for the follwoing span in a relationship of `ChildOf` or `FollowFrom`. | ||
func (e *TraceExec) Open(ctx context.Context) error { | ||
e.rootTrace = tracing.NewRecordedTrace("trace_exec", func(sp basictracer.RawSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After that, global opentracing Tracer is modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I will file another PR to better handle this.
executor/trace.go
Outdated
}) | ||
ctx = opentracing.ContextWithSpan(ctx, e.rootTrace) | ||
for _, child := range e.children { | ||
err := child.Open(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will some node open multiple times?
The children node open
may also call its children's open
? If something like indexReader
open multiple time, would there be resource leak?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, it is not. I just copied the code from basicExecutor with some addition about how to set up tracing.
@zhexuany |
371f35f
to
96f0e8a
Compare
distsql/distsql.go
Outdated
if !sctx.GetSessionVars().EnableStreaming { | ||
kvReq.Streaming = false | ||
} | ||
|
||
for _, key := range kvReq.KeyRanges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to trace every key range.
In index lookup request, there can be thousands of ranges.
This has too much cost for index lookup requests.
c4295c1
to
4889004
Compare
83570ea
to
a8821c8
Compare
Not easy to do so since we remove all |
…o tidb_tracing_prototype
@zhexuany |
@coocood Good point, I will do it in another PR. |
LGTM |
plan/trace.go
Outdated
retFields := []string{"operation", "duration", "spanID"} | ||
schema := expression.NewSchema(make([]*expression.Column, 0, len(retFields))...) | ||
schema.Append(buildColumn("", "operation", mysql.TypeString, mysql.MaxBlobWidth)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this empty line?
plan/trace.go
Outdated
return nil, errors.New("trace only supports select query") | ||
} | ||
|
||
optimizedP, err := Optimize(b.ctx, trace.Stmt, b.is) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/optimizedP/stmtPlan/?
plan/trace.go
Outdated
// rows result. | ||
func (b *planBuilder) buildTrace(trace *ast.TraceStmt) (Plan, error) { | ||
if _, ok := trace.Stmt.(*ast.SelectStmt); !ok { | ||
return nil, errors.New("trace only supports select query") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "only the select statement can be traced"?
executor/trace_test.go
Outdated
type testTraceExec struct{} | ||
|
||
func (s *testTraceExec) SetupSuite(c *C) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this empty line?
executor/trace_test.go
Outdated
|
||
} | ||
|
||
func (s *testSuite) TestTraceExec(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this test file does nothing, can we remove the whole file and add it when adding ut.
And I think it's better to add these integration test in mysql-test
, not in the ut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I leave a todo and I will soon add the actual test in another PR.
executor/trace.go
Outdated
|
||
// buildTrace builds a TraceExec for future executing. This method will be called | ||
// at build(). | ||
func (b *executorBuilder) buildTrace(v *plan.Trace) Executor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to maintain this function in the "builder.go" file, like other functions.
executor/trace.go
Outdated
e.children = make([]Executor, 0, len(pp.Children())) | ||
for _, child := range pp.Children() { | ||
switch p := child.(type) { | ||
case *plan.PhysicalTableReader, *plan.PhysicalIndexReader, *plan.PhysicalIndexLookUpReader, *plan.PhysicalHashAgg, *plan.PhysicalProjection, *plan.PhysicalStreamAgg, *plan.PhysicalSort: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please split the long code line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems these operators are also not supported in this PR?
executor/trace.go
Outdated
// Open opens a trace executor and it will create a root trace span which will be | ||
// used for the following span in a relationship of `ChildOf` or `FollowFrom`. | ||
// for more details, you could refer to http://opentracing.io | ||
func (e *TraceExec) Open(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems that we can not trace the building and optimization time of a query, only the execution time can be traced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only for now. I will file another PR to support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the TraceExec
has no child, I think we can just leave this function to be empty.
func (b *executorBuilder) buildTrace(v *plan.Trace) Executor { | ||
return &TraceExec{ | ||
baseExecutor: newBaseExecutor(b.ctx, v.Schema(), v.ExplainID()), | ||
stmtNode: v.StmtNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we store the plain text of the select
statement to be traced? We can trace the time spent by parser with the help of this.
executor/trace.go
Outdated
// Open opens a trace executor and it will create a root trace span which will be | ||
// used for the following span in a relationship of `ChildOf` or `FollowFrom`. | ||
// for more details, you could refer to http://opentracing.io | ||
func (e *TraceExec) Open(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the TraceExec
has no child, I think we can just leave this function to be empty.
executor/trace.go
Outdated
} | ||
|
||
// append select executor to trace executor | ||
e.children = append(e.children, e.builder.build(pp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better use another variable to store the executor, for example e.stmtExec
.
executor/trace.go
Outdated
}) | ||
// we actually don't care when underlying executor started. We only care how | ||
// much time was spent | ||
for _, child := range e.children { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only need to call e.stmtExec.Open()
executor/trace.go
Outdated
return errors.Trace(err) | ||
} | ||
} | ||
e.childrenResults = make([]*chunk.Chunk, 0, len(e.children)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only one Chunk
is needed for e.stmtExxec
executor/trace.go
Outdated
// store span into context | ||
ctx = opentracing.ContextWithSpan(ctx, e.rootTrace) | ||
|
||
if len(e.children) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be removed.
executor/trace.go
Outdated
if err := e.children[0].Next(ctx, e.childrenResults[0]); err != nil { | ||
return errors.Trace(err) | ||
} | ||
if e.childrenResults[0].NumRows() != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.childrenResults[0].NumRows() == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
Thank you for working on TiDB! Please read TiDB's CONTRIBUTING document BEFORE filing this PR.
What have you changed? (mandatory)
Please explain IN DETAIL what the changes are in this PR and why they are needed:
Syntax support was already covered in another PR. In this PR, I actually added the functionality by taking advantage of opentracing. A detailed design doc will be release in a couple of days.
What are the type of the changes (mandatory)?
New feature.
How has this PR been tested (mandatory)?
Please describe the tests that you ran to verify your changes. Have you finished unit tests, integration tests, or manual tests?
a tracer test is needed and I will add shortly.
Does this PR affect documentation (docs/docs-cn) update? (optional)
If there is document change, please file a PR in (docs and docs-cn) and add the PR number here.
Yes. It will affect the way how ops engineer inverstigate slow query. I will file anothr PR to update.
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
I have not benchmakred my changes but I will add one later
Add a few positive/negative examples (optional)
Suppose you already have tpch data present in your database, runing the following command will give a tace.
The output looks like the following:
Duration is dependent with your data size. Scale of 0.05 was used to generate data and that is why it is kind of fast here.