
Add extra logging for investigation into #52000 #52472

Merged


DaveCTurner
Contributor

It looks like #52000 is caused by a slowdown in cluster state application
(maybe due to #50907) but I would like to understand the details to ensure that
there's nothing else going on here too before simply increasing the timeout.
This commit enables some relevant `DEBUG` loggers and also captures stack
traces from all threads rather than just the three hottest ones.
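
For readers unfamiliar with the difference, here is a minimal, generic sketch of dumping stack traces for every live thread in a JVM, as opposed to sampling only the busiest few. It is not the code from this PR: the class name `AllThreadsDump` and its `dump` method are hypothetical, and it uses only the standard `Thread.getAllStackTraces()` call rather than Elasticsearch's own hot threads machinery.

```java
import java.util.Map;

public class AllThreadsDump {
    /** Formats the current stack trace of every live thread in this JVM. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // getAllStackTraces() returns a snapshot for all live threads,
        // including idle ones that a "hottest N" sample would leave out.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append("\" state=")
              .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A full dump like this is noisier than a hot threads sample, but it also shows blocked or waiting threads, which is often what matters when investigating a timeout.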

@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.7.0 labels Feb 18, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@DaveCTurner
Contributor Author

Note to reviewers: the change to the loggers will be reverted in due course, but I think the change to the hot threads capture is worth keeping. Please review accordingly.

Contributor

@henningandersen henningandersen left a comment


LGTM.

@DaveCTurner DaveCTurner merged commit 34f302b into elastic:master Feb 18, 2020
@DaveCTurner DaveCTurner deleted the 2020-02-18-more-logging-for-52000 branch February 18, 2020 13:02
DaveCTurner added a commit that referenced this pull request Feb 18, 2020
sbourke pushed a commit to sbourke/elasticsearch that referenced this pull request Feb 19, 2020
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue v7.7.0 v8.0.0-alpha1
5 participants