OCPEDGE-2215: Updated TNF EP to address some drift from original requirements. #1887
jaypoulz wants to merge 1 commit into openshift:master from jaypoulz:tnf
I assume these error events then happen every 30s while the node is unhealthy?
The intent is that it works as follows: every 30 seconds we scan pacemaker for updates related to resources and fencing. If a new event is present (e.g. etcd/fencingAgent/kubelet was started or stopped, or a node was fenced), we check whether it has already been posted and post it if it hasn't been.
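For illustration, a minimal sketch of that polling loop in Go, with the pacemaker scan and the api-server posting injected as functions; the `PacemakerAction` type and the function names are assumptions made for the sketch, not part of the proposal:

```go
import (
	"context"
	"time"
)

// PacemakerAction is a placeholder for one parsed pacemaker history entry:
// which resource changed state (or which node was fenced), where, and when.
type PacemakerAction struct {
	Node, Resource, Action string
	Timestamp              time.Time
}

// runStatusChecker polls pacemaker on a fixed interval and posts anything new.
// scan and post are injected so the sketch stays independent of how the real
// checker parses pacemaker output or talks to the api-server.
func runStatusChecker(ctx context.Context,
	scan func(context.Context) ([]PacemakerAction, error),
	post func(context.Context, PacemakerAction) error) {

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			actions, err := scan(ctx)
			if err != nil {
				continue // transient scan errors: just wait for the next pass
			}
			for _, a := range actions {
				_ = post(ctx, a) // posting is idempotent (see the naming scheme below)
			}
		}
	}
}
```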
The latter part of this implementation is a little tricky. The naive way to do it is to use a {node-name}-{resource-name}-{timestamp-hash} kind of naming scheme for the events. Then I can just blindly try to create them every 30s and ignore the 409s.
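A rough sketch of that naive scheme using client-go, assuming the checker posts plain corev1 Events into a hypothetical openshift-etcd namespace; the name encodes node, resource, and a hash of the action timestamp, so re-creating it on a later pass just returns AlreadyExists:

```go
import (
	"context"
	"crypto/sha256"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// postPacemakerEvent creates an Event whose name is derived from the pacemaker
// action, so the same action always maps to the same object. A 409/AlreadyExists
// on a later 30s pass means it was already posted and can be ignored.
func postPacemakerEvent(ctx context.Context, client kubernetes.Interface,
	nodeName, resourceName, action string, ts time.Time) error {

	hash := sha256.Sum256([]byte(ts.UTC().Format(time.RFC3339)))
	name := fmt.Sprintf("%s-%s-%x", nodeName, resourceName, hash[:4])

	ev := &corev1.Event{
		ObjectMeta:     metav1.ObjectMeta{Name: name, Namespace: "openshift-etcd"},
		InvolvedObject: corev1.ObjectReference{Kind: "Node", Name: nodeName},
		Reason:         action, // e.g. "ResourceStarted", "NodeFenced" (illustrative)
		Message:        fmt.Sprintf("pacemaker: %s %s on %s", resourceName, action, nodeName),
		Type:           corev1.EventTypeNormal,
		Source:         corev1.EventSource{Component: "tnf-status-checker"},
		FirstTimestamp: metav1.NewTime(ts),
		LastTimestamp:  metav1.NewTime(ts),
		Count:          1,
	}

	_, err := client.CoreV1().Events(ev.Namespace).Create(ctx, ev, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // already recorded on a previous pass
	}
	return err
}
```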
The nicer way to do it is probably to get the last n (probably 2-5) minutes' worth of events, filter out the ones created by the status checker, and make sure my names don't conflict prior to creation.
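And a sketch of the list-and-filter alternative, assuming the checker marks its own events with an illustrative source component ("tnf-status-checker") it can recognize when listing:

```go
import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// recentCheckerEventNames lists events from the last few minutes and returns the
// names already posted by the status checker, so new names can be checked for
// conflicts before creation.
func recentCheckerEventNames(ctx context.Context, client kubernetes.Interface,
	window time.Duration) (map[string]struct{}, error) {

	list, err := client.CoreV1().Events("openshift-etcd").List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	cutoff := time.Now().Add(-window)
	seen := map[string]struct{}{}
	for _, ev := range list.Items {
		// Keep only the checker's own recent events; everything else is
		// irrelevant for name-conflict checks.
		if ev.Source.Component == "tnf-status-checker" && ev.LastTimestamp.Time.After(cutoff) {
			seen[ev.Name] = struct{}{}
		}
	}
	return seen, nil
}
```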
Bottom line is - one "action" captured by pacemaker should equate to one "event" recorded by the api-server.
We don't plan on taking action based on events; actions will be taken based on the API conditions. The events are just there to allow a cluster admin to reconstruct a timeline of what might have happened if we've degraded CEO due to pacemaker being unhealthy.
In other APIs, we see events emitted regularly over a period. An `oc describe` will say "x times over y time period" next to the events as it aggregates them. I don't think you necessarily need to do the deduplication you describe. I assume that as an end user, when there's an error I'd be able to see "this status has cleared" because a newer event would have come through showing things returning to normal?
I think in general, the events we'll capture will likely not represent error conditions. But let's say you had an etcd-node0-stop event prior to a reboot or something, and the status starts reporting that the pacemaker cluster is unhealthy: ClusterUnhealthy because NodeUnhealthy, NodeUnhealthy because EtcdUnhealthy. You have the event that tells you that etcd is stopped. There should be an etcd-node0-start event to match everything becoming healthy again.
That said, we can also add events for "etcd is down" that would work like the regularly emitted events you describe. I think the conditions probably already cover that sufficiently though, yeah? Everything else is just a record of "this thing happened at exactly this time".
I don't know if there is a way to detect "fencing completed successfully" events, as an example. We have a record of when the reboot signal was sent and succeeded, but no new event is expected when the node comes back up healthy (besides the resource start events).
To phrase it differently:
If the collection method is "list the things that happened in the pacemaker cluster in the last 5 minutes", then you can potentially end up with some strange windows where you're repeating both events that say the cluster is healthy and events that say it is not.
If the collection method is "(running every 30s) list the things that happened in the pacemaker cluster in the last 30 seconds", you have no duplicate events, but you could miss an event if the status checker ran during a node reboot and had to be rescheduled on the other node (which can take several minutes).
The goal of this API is to try to provide pre-warnings for cluster configuration issues and an accurate reconstructed timeline for when things happened. So I think the former design is better for the latter goal. Both solve the first one just fine.