Skip to content

Commit

Permalink
Node Drain Request/Status API
Browse files Browse the repository at this point in the history
The base of an upcoming PR to enable deferring node
reboot and shutdown operations.  In an Edge-Node-Cluster
each node can hold a replica copy of a cluster volume.
If the other nodes 1 and 2 in a cluster have recently recovered
from a failure they could be rebuilding their replicas
(1.x, 1.y, 2.x, 2.y) off of the only remaining readable copies
(3.x, 3.y) on node 3.
This API is to enable zedagent,nodeagent,baseosmgr to
request a drain of replicas off of an edge-node (rebuild completion) before allowing
a node outage.

Signed-off-by: Andrew Durbin <andrewd@zededa.com>
  • Loading branch information
andrewd-zededa committed Oct 17, 2024
1 parent 1d2ad7a commit f8ffca4
Show file tree
Hide file tree
Showing 4 changed files with 286 additions and 0 deletions.
134 changes: 134 additions & 0 deletions pkg/pillar/docs/zedkube.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
# Clustered eve nodes (aka zedkube)

## Overview

## Components

### kubenodeop

kubenodeop handles cordoning, uncordoning, and draining of clustered eve-os nodes. Any given node could be hosting
one or more longhorn volume replicas and thus could be the rebuild source for other node replicas. A drain operation should be
performed before any Node Operation / Node Command which can cause an extended outage of a node such as a reboot, shutdown, reset.
kubenodeop handles NodeDrainRequest objects which zedkube subscribes to, initiates the drain, and publishes NodeDrainStatus objects.

### Drain PubSub setup (node reboot/shutdown)

1. zedagent/handlenodedrain.go:initNodeDrainPubSub()
subscribes to NodeDrainStatus from zedkube
creates publication of NodeDrainRequest
1. nodeagent/handlenodedrain.go:initNodeDrainPubSub()
subscribe to NodeDrainStatus from zedkube

### Drain Request path (node reboot/shutdown)

1. zedagent/parseconfig.go:scheduleDeviceOperation()
call GetNodeDrainStatus() to determine if system supports drain
systems not HV=kubevirt will return NOTSUPPORTED
HV=kubevirt will return:
NOTSUPPORTED if in single node.
NOTREQUESTED if in cluster mode
NodeDrainStatus == NOTREQUESTED:
- kubeapi.RequestNodeDrain
- then set appropriate reboot or shutdown cmd deferred state in zedagentContext struct
NodeDrainStatus == REQUESTED: then set appropriate reboot or shutdown cmd deferred state in zedagentContext struct

### Drain Status Handler (node reboot/shutdown)

1. zedagent/handlenodedrain.go:handleNodeDrainStatusImpl()
if drain status FAILEDCORDON or FAILEDDRAIN, unpublish NodeDrainRequest
1. zedagent/zedagent.go:handleNodeAgentStatusImpl()
drainInProgress = getConfigContext.drainInProgress
getConfigContext.drainInProgress = NodeAgentStatus.DrainInProgress
if NodeAgentStatus DrainInProgress cleared then allow deferred reboot/shutdown
1. nodeagent/nodeagent.go:handleNodeDrainStatusImplNA
NodeDrainStatus == REQUESTED
NodeDrainStatus == STARTING
republish nodeagentstatus with drainInProgress set
NodeDrainStatus == COMPLETE
republish nodeagentstatus with drainInProgress cleared

### Drain PubSub setup (node eveimage-update)

1. baseosmgr/baseosmgr.go:setupKubePubSub()
subscribe to NodeDrainStatus from zedkube
setup publication to NodeDrainRequest

### Drain Request path (node eveimage-update)

1. baseosmgr/handlebaseos.go:baseOsHandleStatusUpdateUUID()
New BaseImage coming down, if State == LOADING or LOADED (past verification, so edge-node is ready for it)
if shouldDeferForNodeDrain() return, later NodeDrainStatus will complete this id.
1. baseosmgr/handlenodedrain.go:shouldDeferForNodeDrain()
NodeDrainStatus == NOTREQUESTED: RequestNodeDrain
NodeDrainStatus == REQUESTED-FAILEDDRAIN: defer
NodeDrainStatus == COMPLETE: don't defer

### Drain Status Handler (node eve-image update)

1. baseosmgr/handlenodedrain.go:handleNodeDrainStatusImpl()
NodeDrainStatus == FAILEDCORDON or FAILEDDRAIN: unpublish NodeDrainRequest
NodeDrainStatus == COMPLETE: complete deferred id to baseOsHandleStatusUpdateUUID

### General DrainRequest Processing

1. zedkube/zedkube.go:Run()
sub to NodeDrainRequest from zedagent and baseosmgr
new publication of NodeDrainStatus
Init NodeDrainStatus to NOTSUPPORTED
1. zedkube/zedkube.go:handleEdgeNodeClusterComfigImpl()
On cluster config:
NodeDrainStatus -> NOTREQUESTED
1. zedkube/handlenodedrain.go:handleNodeDrainRequestImpl()
NodeDrainStatus -> REQUESTED
1. zedkube/kubenodeop.go:cordonAndDrainNode()
NodeDrainStatus -> STARTING
Retry Cordon up to 10 times (in case k8s api states object changed)
else NodeDrainStatus -> FAILEDCORDON
NodeDrainStatus -> CORDONED
Retry Drain up to 5 times
between tries: NodeDrainStatus -> DRAINRETRYING
on failure: NodeDrainStatus -> FAILEDDRAIN
NodeDrainStatus -> COMPLETE

## Debugging

### PubSub NodeDrainRequest/NodeDrainStatus

/run/zedagent/NodeDrainRequest/global.json
/run/baseosmgr/NodeDrainRequest/global.json
/run/zedkube/NodeDrainStatus/global.json

The current node drain progress is available from the global NodeDrainStatus object found at
`cat /run/zedkube/NodeDrainStatus/global.json | jq`.

NodeDrainStatus can be forced by writing the object (in pillar svc container fs) to: /tmp/force-NodeDrainStatus-global.json

eg. to force disable drain:
echo '{"Status":1,"RequestedBy":1}' > /persist/force-NodeDrainStatus-global.json

eg. to force deviceop drain complete:
echo '{"Status":9,"RequestedBy":2}' > /persist/force-NodeDrainStatus-global.json

eg. to force baseosmgr drain complete:
echo '{"Status":9,"RequestedBy":3}' > /persist/force-NodeDrainStatus-global.json

"Cannot evict pod as it would violate the pod's disruption budget":
If NodeDrainStatus can get stuck if attempting to drain a node running a pod where the pod has an
explicit spec.nodeName == "drain node". Delete the pod to continue.
If workload is a statefulset declaing spec.nodeName and node is already cordoned. Then deleting the pod is not sufficient
The statefulset must be deleted.

### NodeDrainRequest/NodeDrainStatus log strings

- NodeDrainRequest
- NodeDrainStatus
- cordonNode
- cordonAndDrainNode
- scheduleDeviceOperation
- baseOsHandleStatusUpdateUUID
- nodedrain-step
- kubevirt_node_drain_completion_time_seconds
...
zgrep 'kubevirt_node_drain_completion_time_seconds' /persist/newlog/keepSentQueue/dev.log.1725511530990.gz | jq -r .content | jq -r .msg | cut -d ':' -f 2
34.559219
...
49 changes: 49 additions & 0 deletions pkg/pillar/kubeapi/kubetypes.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
// Copyright (c) 2024 Zededa, Inc.
// SPDX-License-Identifier: Apache-2.0

package kubeapi

import "time"

// DrainStatus tracks progress of draining a node of replica disks and workloads
type DrainStatus uint8

const (
UNKNOWN DrainStatus = iota + 0 // UNKNOWN Unable to determine
NOTSUPPORTED // NOTSUPPORTED System not (HV=kubevirt and clustered)
NOTREQUESTED // NOTREQUESTED Not yet requested
REQUESTED // REQUESTED From zedagent device operation or baseosmgr new update
STARTING // STARTING Zedkube go routine started, not yet cordoned
CORDONED // CORDONED Node Unschedulable set
FAILEDCORDON // FAILEDCORDON Node modification unable to apply
DRAINRETRYING // DRAINRETRYING Drain retry in progress, could be retried replica rebuild
FAILEDDRAIN // FAILEDDRAIN Could be retried replica rebuild
COMPLETE // COMPLETE All node workloads removed from system
)

// DrainRequester is a user initiated edge-node operation from a pillar microservice
type DrainRequester uint8

const (
NONE DrainRequester = iota + 1 // NONE - The default value
DEVICEOP // DEVICEOP - Node Reboot or shutdown
UPDATE // UPDATE - baseos update
)

// NodeDrainRequest is the trigger to NodeDrainStatus
//
// Used by Reboots, Prepare-Shutdown, baseos updates
type NodeDrainRequest struct {
Hostname string
RequestedAt time.Time
RequestedBy DrainRequester
Context string
}

// NodeDrainStatus is a response to NodeDrainRequest
//
// Subscribe to updates to continue NodeDrainRequest operations.
type NodeDrainStatus struct {
Status DrainStatus
RequestedBy DrainRequester
}
90 changes: 90 additions & 0 deletions pkg/pillar/kubeapi/nodedrain.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
// Copyright (c) 2024 Zededa, Inc.
// SPDX-License-Identifier: Apache-2.0

//go:build kubevirt

package kubeapi

import (
"encoding/json"
"fmt"
"os"
"time"

"github.com/lf-edge/eve/pkg/pillar/base"
"github.com/lf-edge/eve/pkg/pillar/pubsub"
)

// RequestNodeDrain generates the NodeDrainRequest object and publishes it
func RequestNodeDrain(pubNodeDrainRequest pubsub.Publication, requester DrainRequester, context string) error {
hostname, err := os.Hostname()
if err != nil {
return fmt.Errorf("RequestNodeDrain: can't get hostname %v", err)
}
drainReq := NodeDrainRequest{
Hostname: hostname,
RequestedAt: time.Now(),
RequestedBy: requester,
Context: context,
}
err = pubNodeDrainRequest.Publish("global", drainReq)
if err != nil {
return fmt.Errorf("RequestNodeDrain: error publishing drain request: %v", err)
}
return nil
}

// GetDrainStatusOverride : an alternate way to set drain status for debug
func GetDrainStatusOverride(log *base.LogObject) *NodeDrainStatus {
// An alternate path to force a drain status in the event of a drain issue.
forceNodeDrainPath := "/persist/force-NodeDrainStatus-global.json"
if _, err := os.Stat(forceNodeDrainPath); err != nil {
return nil
}
b, err := os.ReadFile(forceNodeDrainPath)
if err != nil {
log.Warnf("Unable to read %s:%v", forceNodeDrainPath, err)
return nil
}
cfg := NodeDrainStatus{}
err = json.Unmarshal(b, &cfg)
if err != nil {
log.Warnf("Unable to Unmarshal %s to NodeDrainStatus: %v", forceNodeDrainPath, err)
return nil
}
if cfg.Status == COMPLETE {
err = os.Remove(forceNodeDrainPath)
if err != nil {
log.Warnf("could not remove %s: %v", forceNodeDrainPath, err)
}
}
return &cfg
}

// DrainStatusFaultInjectionWait while this file exists, wait in the drain status goroutine
func DrainStatusFaultInjectionWait() bool {
injectFaultPath := "/tmp/DrainStatus_FaultInjection_Wait"
if _, err := os.Stat(injectFaultPath); err == nil {
return true
}
return false
}

// GetNodeDrainStatus is a wrapper to either return latest NodeDrainStatus
//
// or return a forced status from /persist/force-NodeDrainStatus-global.json
func GetNodeDrainStatus(subNodeDrainStatus pubsub.Subscription, log *base.LogObject) *NodeDrainStatus {
override := GetDrainStatusOverride(log)
if override != nil {
return override
}

items := subNodeDrainStatus.GetAll()
glbStatus, ok := items["global"].(NodeDrainStatus)
if !ok {
// This should only be expected on an HV=kubevirt build
// and only very early in boot (before zedkube starts)
return &NodeDrainStatus{Status: UNKNOWN, RequestedBy: NONE}
}
return &glbStatus
}
13 changes: 13 additions & 0 deletions pkg/pillar/kubeapi/nokube.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
package kubeapi

import (
"fmt"
"time"

"github.com/lf-edge/eve/pkg/pillar/base"
Expand All @@ -27,3 +28,15 @@ func CleanupStaleVMI() (int, error) {
func GetPVCList(*base.LogObject) ([]string, error) {
panic("GetPVCList is not built")
}

// RequestNodeDrain is a stub for non-kubevirt builds
func RequestNodeDrain(pubsub.Publication, DrainRequester, string) error {
// Nothing to do here, just noop
return fmt.Errorf("nokube requested drain, should not get here")
}

// GetNodeDrainStatus is a stub for non-kubevirt builds
func GetNodeDrainStatus(pubsub.Subscription) *NodeDrainStatus {
// No need to query for inprogress operations, just a noop
return &NodeDrainStatus{Status: NOTSUPPORTED}
}

0 comments on commit f8ffca4

Please sign in to comment.