Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: improve node liveness failure log, point to docs #51870

Merged
merged 1 commit into from
Jul 28, 2020

Conversation

nvanbenschoten
Copy link
Member

This commit improves the message logged on node liveness failures, which
is logged roughly every 4.5 seconds during liveness unavailability. The
improved message describes some of the common causes of liveness
unavailability (resource starvation and network connectivity problems)
and then links to our troubleshooting docs about the topic.

This was an action item coming out of a recent support incident postmortem.

This commit improves the message logged on node liveness failures, which
is logged roughly every 4.5 seconds during liveness unavailability. The
improved message describes some of the common causes of liveness
unavailability (resource starvation and network connectivity problems)
and then links to our troubleshooting docs about the topic.

This was an action item coming out of a recent support incident postmortem.
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:, thanks for taking care of this!

@jseldess would you or someone from your team care to give this a quick review? What do you think of this strategy to write more verbose log messages (really, on-line documentation) during common troubleshooting scenarios?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)

@nvanbenschoten
Copy link
Member Author

@jseldess would you or someone from your team care to give this a quick review? What do you think of this strategy to write more verbose log messages (really, on-line documentation) during common troubleshooting scenarios?

Also, while here, could you confirm that we should be linking to https://www.cockroachlabs.com/docs/stable/... from Cockroach itself. Other alternatives would be to link to https://www.cockroachlabs.com/docs/<current version>/ or to link somewhere else that may have a lower chance of rotting.

@jseldess
Copy link
Contributor

This very much LGTM! I love the idea of providing more guidance and links to docs in log messages as well as in error messages. Thanks for writing this, @nvanbenschoten, and for looping me in, @jordanlewis.

@nvanbenschoten, I think the link you're using is best and matches what I see done in other parts of the code base. The stable alias shouldn't rot. If the target url changes somehow, we should always be putting client or server-side redirects in place on our end.

Beyond the scope of this PR: I think texts like these should ultimately be centralized in a file or files somewhere, or even housed outside of the cockroach repo and fetched as a part of the build process. That would make it much easier for writers to know what and when to review log and error texts, and perhaps there'd even be a way to auto-assign docs reviewers in cases where those are touched. In an ideal world, this would apply to:

  • CLI help
  • SQL shell help
  • Session variable, built-in function, cluster setting descriptions
  • Error messages
  • Log messages
  • UI copy and tooltips
    I'm sure I'm missing others. Thoughts on something like this? Wonder if @mjibson has thoughts since he's worked on other docs efficiencies like SQL diagrams and is working on auto-generated API docs, from what I hear.

@nvanbenschoten
Copy link
Member Author

TFTR!

If the target url changes somehow, we should always be putting client or server-side redirects in place on our end.

Got it, that's very helpful to know. And that's even true of section headers like #node-liveness-issues, right?

Thoughts on something like this?

Centralizing string assets like these seems like a natural progression. I think that's actually a common approach, especially when applications start thinking about internationalization. This would also make it much easier to auto-assign docs reviewers because we could match on the package of the files being changed.

bors r+

@jseldess
Copy link
Contributor

Got it, that's very helpful to know. And that's even true of section headers like #node-liveness-issues, right?

Yep! I'm not sure how thoughtful we've been with header-level redirects, but I'll bring it up in the docs meeting.

@maddyblue
Copy link
Contributor

I'm currently work on extracting out our admin ui HTTP endpoints into auto docs. If we want to put this stuff somewhere else it could follow a similar path. In general it's not too much work. Up to you. If users or operators of cockroach would find it useful, ask the dev interfaces team.

@craig
Copy link
Contributor

craig bot commented Jul 27, 2020

Build failed:

@lunevalex
Copy link
Collaborator

drive by kicking bors again, since it failed

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 28, 2020

Build succeeded:

@craig craig bot merged commit 4c24889 into cockroachdb:master Jul 28, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/livenessLog branch August 3, 2020 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants