Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve leader node-left logging to indicate timeout/coordination state rejection #1584

Merged
merged 2 commits into from
Nov 19, 2021

Conversation

Poojita-Raj
Copy link
Contributor

Signed-off-by: Poojita Raj poojiraj@amazon.com

Description

Production systems generally don't have debug logging enabled. The reasons for follower rejection are now logged at the default INFO logging level.

Issues Resolved

#992

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
@Poojita-Raj Poojita-Raj requested a review from a team as a code owner November 18, 2021 19:51
@opensearch-ci-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Wrapper Validation success ab0057e

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Precommit success ab0057e

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure ab0057e
Log 1133

Reports 1133

@CEHENKLE
Copy link
Member

start gradle check

@@ -299,6 +299,7 @@ FastResponseState getFastResponseState() {
private void handleDisconnectedNode(DiscoveryNode discoveryNode) {
FollowerChecker followerChecker = followerCheckers.get(discoveryNode);
if (followerChecker != null) {
logger.info(() -> new ParameterizedMessage("{} disconnected", followerChecker));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not consistent throughout this class, but I think it's fine (and more concise) to do

logger.info("{} disconnected", followerChecker);

like is done on line 433. Feel free to simplify the other lines you modified to this style as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes made as suggested for all the modified logging statements.

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success ab0057e
Log 1134

Reports 1134

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Wrapper Validation success 739d5071e9c360c0212b094934e00d22a7fbb24f

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Precommit success 739d5071e9c360c0212b094934e00d22a7fbb24f

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 739d5071e9c360c0212b094934e00d22a7fbb24f
Log 1136

Reports 1136

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Wrapper Validation success 17145dd2282bfc82025ec2611afbfba01fb4fdcd

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 17145dd2282bfc82025ec2611afbfba01fb4fdcd
Log 1137

Reports 1137

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Precommit failure 17145dd2282bfc82025ec2611afbfba01fb4fdcd
Log 1591

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Wrapper Validation success ab0057e

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Precommit success ab0057e

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success ab0057e
Log 1138

Reports 1138

reason = "health check failed";
} else if (failureCountSinceLastSuccess >= followerCheckRetryCount) {
logger.debug(() -> new ParameterizedMessage("{} failed too many times", FollowerChecker.this), exp);
logger.info(() -> new ParameterizedMessage("{} failed too many times", FollowerChecker.this), exp);
reason = "followers check retry count exceeded";
} else {
logger.debug(() -> new ParameterizedMessage("{} failed, retrying", FollowerChecker.this), exp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if retries should be info as well, since it would be helpful to know if single attempts failed too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose was to troubleshoot reasons node-left if that failure occurs; Single attempts failing seems like additional information more suited for the debug level - especially in a production system not opted into debugging. My concern was that it might unnecessarily crowd the output - especially in a system with large number of followers since this would all print on the leader node.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its possible for node-left due followers check retry count exceeded to have different reasons per attempt like timeout, CSRejectedException etc. The above logging will point the last attempt failure reason as the culprit, which might be leading. So either we could aggregate those reasons or log all. Thoughts?

Copy link
Contributor Author

@Poojita-Raj Poojita-Raj Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see your point. In that case, it's simpler to change it to info from debug.

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Wrapper Validation success 470b1a2

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Precommit success 470b1a2

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 470b1a2
Log 1149

Reports 1149

@anasalkouz anasalkouz merged commit a15c526 into opensearch-project:main Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants