
Added a spec regarding the rules for eviction & replacement of pods #133

Merged
ewoutp merged 4 commits into master from documentation/eviction-and-replacement-spec on May 14, 2018

Conversation

ewoutp (Contributor) commented May 10, 2018

No description provided.

neunhoef (Member) left a comment:

At least some discussion has to happen and a few typos need to be corrected.


## Replacement

Replacement is the process of replacing a pod an another pod that takes over the responsibilities
neunhoef (Member):

"pod an another" -> "pod by another"


### Image ID Pods

The Image ID pods are starter to fetch the ArangoDB version of a specific
neunhoef (Member):

starter -> started

- Image ID pods can always be restarted on a different node.
There is no need to replace an image ID pod.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set very low (5sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set very low (5sec)
neunhoef (Member):

Add (if true): There is no danger at all if two image ID pods happen to run at the same time.
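For context, these toleration windows map onto standard Kubernetes pod-spec tolerations for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints. A minimal sketch in Go using the `k8s.io/api/core/v1` types; the variable names and the bare pod spec are illustrative, not taken from this PR:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// 5-second NoExecute tolerations, matching the "very low" window the
	// spec proposes for image ID pods.
	seconds := int64(5)

	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			Tolerations: []corev1.Toleration{
				{
					Key:               "node.kubernetes.io/unreachable",
					Operator:          corev1.TolerationOpExists,
					Effect:            corev1.TaintEffectNoExecute,
					TolerationSeconds: &seconds,
				},
				{
					Key:               "node.kubernetes.io/not-ready",
					Operator:          corev1.TolerationOpExists,
					Effect:            corev1.TaintEffectNoExecute,
					TolerationSeconds: &seconds,
				},
			},
		},
	}

	fmt.Printf("tolerations: %+v\n", pod.Spec.Tolerations)
}
```

With such tolerations set, Kubernetes removes the pod roughly 5 seconds after the node is marked unreachable or not ready, instead of waiting for the cluster default (typically 300 seconds).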

- Coordinator pods can always be evicted from any node
- Coordinator pods can always be replaced with another coordinator pod with a different ID on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)
neunhoef (Member):

Add? "There is no danger at all if two coordinator pods with different ID run concurrently.

ewoutp (Contributor, Author):

Done (a bit different)
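As an aside on the "can always be evicted" rules: evictions of this kind are normally issued through the `pods/eviction` subresource so that PodDisruptionBudgets are respected. A minimal client-go sketch, assuming a recent client-go where `EvictV1` is available; the namespace and pod name are placeholders, and this is not code from the operator:

```go
package main

import (
	"context"
	"log"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Build an in-cluster client (placeholder setup; a kubeconfig-based
	// client works just as well).
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Ask the API server to evict the pod via the pods/eviction subresource.
	// This honors any PodDisruptionBudget covering the pod.
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-coordinator-pod", // placeholder pod name
			Namespace: "default",                 // placeholder namespace
		},
	}
	if err := client.CoreV1().Pods("default").EvictV1(context.TODO(), eviction); err != nil {
		log.Fatalf("eviction failed: %v", err)
	}
}
```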

### DBServer Pods

DBServer pods run an ArangoDB dbserver as part of an ArangoDB cluster.
It has persistent state potentially tight to the node it runs on and it has a unique ID.
neunhoef (Member):

"tight" -> "tied"

### Single Server Pods

Single server pods run an ArangoDB server as part of an ArangoDB single server deployment.
It has persistent state potentially tight to the node.
neunhoef (Member):

"tight" -> "tied"

### Single Pods in Active Failover Deployment

Single pods run an ArangoDB single server as part of an ArangoDB active failover deployment.
It has persistent state potentially tight to the node it runs on and it has a unique ID.
neunhoef (Member):

"tight" -> "tied"

- It is a follower of an active-failover deployment (Q: can we trigger this failover to another server?)
- Single pods can always be replaced with another single pod with a different ID on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set high to "wait it out a while" (5min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set high to "wait it out a while" (5min)
neunhoef (Member):

Need to check this, do not know by heart.

- SyncMaster pods can always be evicted from any node
- SyncMaster pods can always be replaced with another syncmaster pod on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)
neunhoef (Member):

Is there any requirement about the same network endpoint or an internal k8s service being set up in case of a replacement?

ewoutp (Contributor, Author):

no

- SyncWorker pods can always be evicted from any node
- SyncWorker pods can always be replaced with another syncworker pod on a different node
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
neunhoef (Member):

Same here about network endpoint.

ewoutp (Contributor, Author):

no

ewoutp merged commit d82bc7f into master on May 14, 2018
ewoutp deleted the documentation/eviction-and-replacement-spec branch on May 14, 2018 at 14:09