Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2P Message Execution Tracing #517

Merged
merged 12 commits into from
Sep 20, 2018
Merged

Conversation

d1vyank
Copy link
Contributor

@d1vyank d1vyank commented Sep 14, 2018

  • Adds a reusable tracing adapter that is applied to both beacon-chain and validator.
  • --enable-tracing, trace-sample-fraction, and --tracing-endpoint CLI options added
  • Define trace spans for existing p2p messages for a breakdown on how processing time is spent.

towards #438

@CLAassistant
Copy link

CLAassistant commented Sep 14, 2018

CLA assistant check
All committers have signed the CLA.

return nil, errors.New("tracing service name cannot be empty")
}

// TODO: make sampling fraction configurable?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on this? Currently hard-coded to sampling 25% of requests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be awesome. Feel free to open an issue and change it to TODO(#Issue number)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, made this configurable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format as TODO(#issue number)

Copy link
Member

@terencechain terencechain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Can't wait to try it later!

shared/cmd/flags.go Outdated Show resolved Hide resolved
Name: "disable-tracing",
Usage: "Disable request tracing",
}
// TracingEndpointFlag flag defines the http enpoint for serving traces to Jaeger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

period


var log = logrus.WithField("prefix", "tracer")

// New creates and initializes a new tracing adapter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

period

return nil, errors.New("tracing service name cannot be empty")
}

// TODO: make sampling fraction configurable?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be awesome. Feel free to open an issue and change it to TODO(#Issue number)

shared/p2p/message.go Outdated Show resolved Hide resolved
Copy link
Member

@prestonvanloon prestonvanloon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I can't wait to try it

shared/cmd/flags.go Outdated Show resolved Hide resolved
)

const (
defaultSamplingFraction = 0.25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a flag?

"reflect"

"github.com/golang/protobuf/proto"
)

// Message represents a message received from an external peer.
type Message struct {
// Ctx message context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit this. It doesn't add any additional information

shared/p2p/adapter/tracer/tracer.go Outdated Show resolved Hide resolved
@@ -119,7 +119,7 @@ func (s *Server) RegisterTopic(topic string, message proto.Message, adapters ...
}

var h Handler = func(ctx context.Context, pMsg Message) {
s.emit(feed, msg, msgType)
s.emit(pMsg.Ctx, feed, msg, msgType)
}

pMsg := Message{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assign a context here and remove the context parameter from the adapter functions, as you suggested.

@@ -119,7 +119,7 @@ func (s *Server) RegisterTopic(topic string, message proto.Message, adapters ...
}

var h Handler = func(ctx context.Context, pMsg Message) {
s.emit(feed, msg, msgType)
s.emit(pMsg.Ctx, feed, msg, msgType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pass the whole message?

shared/p2p/service.go Outdated Show resolved Hide resolved
@rauljordan rauljordan changed the title P2P message execution tracing P2P Message Execution Tracing Sep 17, 2018
@d1vyank
Copy link
Contributor Author

d1vyank commented Sep 17, 2018

Addressed feedback. Here's what the traces look like if you're curious:

In Jaeger:
screen shot 2018-09-17 at 1 10 08 pm

In pprof:
screen shot 2018-09-17 at 1 23 37 pm

var messageSpan *trace.Span
msg.Ctx, messageSpan = trace.StartSpan(msg.Ctx, "handleP2pMessage")
next(msg)
messageSpan.End()
Copy link
Member

@prestonvanloon prestonvanloon Sep 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work properly when the message is handled in a go routine? The typical lifecycle for a message is:

  • Receive message
  • Apply adapters
  • Emit message to feed subscribers that are running in other go routines
  • End routine for handling message <- I think your span will end here, no?
  • Message received via feed to subscriber 1
  • Message received via feed to subscriber 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This span only traces the handling of a message in the p2p layer and will end after the emit step as you mentioned. However, since the message context is passed to subscribers 1 and 2, any spans created there will be children of this root and linked to the message trace.
I expect the subscribing routines to define these child spans. (See receiveBlockHash in beacon chain sync service for an example)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That makes sense.

One last request, can you add an example of how to instrument code in your readme?
It would be helpful to point someone to a canonical example in documentation.

After that, I think this is good to go!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated doc

@codecov
Copy link

codecov bot commented Sep 19, 2018

Codecov Report

Merging #517 into master will increase coverage by 0.43%.
The diff coverage is 79.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #517      +/-   ##
==========================================
+ Coverage   76.18%   76.62%   +0.43%     
==========================================
  Files          41       42       +1     
  Lines        2818     2879      +61     
==========================================
+ Hits         2147     2206      +59     
- Misses        472      473       +1     
- Partials      199      200       +1
Impacted Files Coverage Δ
shared/p2p/message.go 100% <ø> (ø) ⬆️
shared/p2p/service.go 67.01% <100%> (+0.34%) ⬆️
shared/p2p/adapter/tracer/tracer.go 100% <100%> (ø)
beacon-chain/node/node.go 59.49% <66.66%> (ø) ⬆️
validator/node/node.go 45.37% <66.66%> (ø) ⬆️
beacon-chain/sync/service.go 75.59% <73.17%> (+7.62%) ⬆️
beacon-chain/node/p2p_config.go 71.42% <75%> (-3.58%) ⬇️
validator/node/p2p_config.go 71.42% <75%> (-3.58%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75d3258...24a52c3. Read the comment docs.

@prestonvanloon
Copy link
Member

@d1vyank please resolve conflicts then we can merge.

Thanks!

@prestonvanloon prestonvanloon merged commit 2a51ed3 into prysmaticlabs:master Sep 20, 2018
houjieth pushed a commit to houjieth/prysm that referenced this pull request Sep 20, 2018
* Request execution tracing initial commit

* Resolve linter issues

* Run gazelle

* Make trace sampling configurable, clean up, update doc

* Document trace span creation

* Fix linter issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants