[agent] debug logs for session, node events on dispatcher, heartbeats #2486
agent/agent.go
Outdated
backoff = initialSessionFailureBackoff + 2*backoff
if backoff > maxSessionFailureBackoff {
	backoff = maxSessionFailureBackoff
}
log.G(ctx).WithError(err).Errorf("agent: session failed. Backoff period: %d", backoff)
Use a field here: log.G(ctx).WithError(err).WithField("backoff", backoff).Errorf("agent: session failed").
Note this is the maximum backoff range for a random backoff. Actual backoff is rand([0, backoff)). ;)
manager/dispatcher/dispatcher.go
Outdated
@@ -1094,6 +1094,7 @@ func (d *Dispatcher) Heartbeat(ctx context.Context, r *api.HeartbeatRequest) (*a
}

period, err := d.nodes.Heartbeat(nodeInfo.NodeID, r.SessionID)
log.G(ctx).WithField("dispatcher", "heartbeat").Infof("agent heartbeat period %v", period)
Might be better to use the module here, rather than a field.
Codecov Report
@@            Coverage Diff             @@
##           master    #2486      +/-   ##
==========================================
+ Coverage   61.23%   61.42%   +0.19%
==========================================
  Files          49      129      +80
  Lines        6890    21313   +14423
==========================================
+ Hits         4219    13092    +8873
- Misses       2241     6812    +4571
- Partials      430     1409     +979
connectionbroker/broker.go
Outdated
@@ -58,6 +60,8 @@ func (b *Broker) Select(dialOpts ...grpc.DialOption) (*Conn, error) {
// connection.
func (b *Broker) SelectRemote(dialOpts ...grpc.DialOption) (*Conn, error) {
	peer, err := b.remotes.Select()
	log.G(context.Background()).Infof("Manager selected by agent for session: %v", peer)
nit: all messages start with lowercase.
5bf93bc
to
8f1bb1b
Compare
e0e280f
to
f822e10
Compare
@anshulpundir CI is failing with a bunch of data races.
Will check it out tomorrow @nishanttotla
agent/session.go
Outdated
resp, err := client.Heartbeat(heartbeatCtx, &api.HeartbeatRequest{
	SessionID: s.sessionID,
})
cancel()
if err != nil {
	if grpc.Code(err) == codes.NotFound {
		log.G(ctx).WithFields(fields).WithError(err).Errorf("heartbeat failed")
Do you want to add the manager details here again, to make the manager easier to identify?
Will do!
manager/dispatcher/dispatcher.go
Outdated
@@ -1094,6 +1108,8 @@ func (d *Dispatcher) Heartbeat(ctx context.Context, r *api.HeartbeatRequest) (*a
}

period, err := d.nodes.Heartbeat(nodeInfo.NodeID, r.SessionID)

log.G(ctx).WithField("method", "(*Dispatcher).Heartbeat").Infof("received heartbeat from worker %v, expect next heartbeat in %v", nodeInfo, period)
Maybe make this one debug level?
It's every 5 seconds; I thought that's not too frequent for logging?
It's per node in the cluster, so it can easily be hundreds every 5 sec :D
Ahh yes, I mistook this for the log on the agent.
f8af5f4
to
fc5a878
Compare
agent/session.go
Outdated
resp, err := client.Heartbeat(heartbeatCtx, &api.HeartbeatRequest{
	SessionID: s.sessionID,
})
cancel()
if err != nil {
	if grpc.Code(err) == codes.NotFound {
		log.G(ctx).WithFields(fields).WithError(err).Errorf("heartbeat to manager %v failed", s.conn.Peer())
Do you want to move this one line up, so it's printed regardless of the gRPC code?
fc5a878
to
bb17218
Compare
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
bb17218
to
6fa4dda
Compare
Added info logs to the agent to track the manager it's connecting to, timeouts, and heartbeats from the dispatcher.
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>