Improve leader node-left logging to indicate timeout/coordination state rejection #1584
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Can one of the admins verify this patch? |
✅ Gradle Wrapper Validation success ab0057e |
✅ Gradle Precommit success ab0057e |
start gradle check |
@@ -299,6 +299,7 @@ FastResponseState getFastResponseState() {
     private void handleDisconnectedNode(DiscoveryNode discoveryNode) {
         FollowerChecker followerChecker = followerCheckers.get(discoveryNode);
         if (followerChecker != null) {
+            logger.info(() -> new ParameterizedMessage("{} disconnected", followerChecker));
It's not consistent throughout this class, but I think it's fine (and more concise) to do
logger.info("{} disconnected", followerChecker);
like is done on line 433. Feel free to simplify the other lines you modified to this style as well.
Changes made as suggested for all the modified logging statements.
✅ Gradle Wrapper Validation success 739d5071e9c360c0212b094934e00d22a7fbb24f |
✅ Gradle Precommit success 739d5071e9c360c0212b094934e00d22a7fbb24f |
✅ Gradle Check success 739d5071e9c360c0212b094934e00d22a7fbb24f |
✅ Gradle Wrapper Validation success 17145dd2282bfc82025ec2611afbfba01fb4fdcd |
❌ Gradle Check failure 17145dd2282bfc82025ec2611afbfba01fb4fdcd |
❌ Gradle Precommit failure 17145dd2282bfc82025ec2611afbfba01fb4fdcd |
Branch force-pushed from 17145dd to ab0057e.
✅ Gradle Wrapper Validation success ab0057e |
✅ Gradle Precommit success ab0057e |
         reason = "health check failed";
     } else if (failureCountSinceLastSuccess >= followerCheckRetryCount) {
-        logger.debug(() -> new ParameterizedMessage("{} failed too many times", FollowerChecker.this), exp);
+        logger.info(() -> new ParameterizedMessage("{} failed too many times", FollowerChecker.this), exp);
         reason = "followers check retry count exceeded";
     } else {
         logger.debug(() -> new ParameterizedMessage("{} failed, retrying", FollowerChecker.this), exp);
I wonder whether retries should be logged at info as well, since it would be helpful to know when single attempts failed too.
The purpose was to troubleshoot the reasons for a node-left event when that failure occurs. Single attempts failing seems like additional information better suited to the debug level, especially in a production system that has not opted into debug logging. My concern was that it might unnecessarily crowd the output, especially in a system with a large number of followers, since all of this would print on the leader node.
It's possible for a node-left due to "followers check retry count exceeded" to have a different reason for each attempt, such as a timeout, CSRejectedException, etc. The logging above will point to the last attempt's failure reason as the culprit, which might be misleading. So we could either aggregate those reasons or log all of them. Thoughts?
Yes, I see your point. In that case, it's simpler to change it to info from debug.
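For reference, the aggregation alternative raised in this thread could look roughly like the sketch below. It is not part of this PR or of OpenSearch: `FailureReasonTracker`, its methods, and the reason strings are hypothetical names chosen for illustration, and the sketch only assumes the `failureCountSinceLastSuccess` / retry-count comparison visible in the diff above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not an OpenSearch class): counts the reason for each
 * failed follower check since the last success, so that a single summary
 * line can name every reason instead of only the last one.
 */
final class FailureReasonTracker {
    private final int retryCount;
    // LinkedHashMap keeps reasons in the order they were first seen.
    private final Map<String, Integer> reasonCounts = new LinkedHashMap<>();
    private int failureCountSinceLastSuccess = 0;

    FailureReasonTracker(int retryCount) {
        this.retryCount = retryCount;
    }

    /** Records one failed check; returns true once the retry count is reached. */
    boolean recordFailure(String reason) {
        failureCountSinceLastSuccess++;
        reasonCounts.merge(reason, 1, Integer::sum);
        return failureCountSinceLastSuccess >= retryCount;
    }

    /** A successful check resets the counter and forgets earlier reasons. */
    void recordSuccess() {
        failureCountSinceLastSuccess = 0;
        reasonCounts.clear();
    }

    /** Builds one line listing every reason and how often it occurred. */
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : reasonCounts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(" x").append(e.getValue());
        }
        return failureCountSinceLastSuccess + " consecutive failures [" + sb + "]";
    }

    public static void main(String[] args) {
        FailureReasonTracker tracker = new FailureReasonTracker(3);
        tracker.recordFailure("timeout");
        tracker.recordFailure("rejected");
        if (tracker.recordFailure("timeout")) {
            // prints: 3 consecutive failures [timeout x2, rejected x1]
            System.out.println(tracker.summary());
        }
    }
}
```

The trade-off is the one discussed above: aggregation keeps the leader's log to one line per removed follower but adds state to the checker, whereas logging every attempt at info needs no extra state and is the simpler change.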
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
✅ Gradle Wrapper Validation success 470b1a2 |
✅ Gradle Precommit success 470b1a2 |
Signed-off-by: Poojita Raj poojiraj@amazon.com
Description
Production systems generally don't have debug logging enabled. The reasons for follower rejection are now logged at the default INFO logging level.
Issues Resolved
#992
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.