add queue state management design proposal
sivanzcw committed Oct 31, 2019
1 parent c8ef105 commit 9412481
Showing 2 changed files with 220 additions and 0 deletions.
220 changes: 220 additions & 0 deletions docs/design/queue/queue-state-management.md
@@ -0,0 +1,220 @@
# Queue State Management

[@sivanzcw](https://github.com/sivanzcw); Oct 17, 2019

## Table of Contents

- [Queue State Management](#queue-state-management)
  * [Table of Contents](#table-of-contents)
  * [Motivation](#motivation)
  * [Function Detail](#function-detail)
    + [Data Structure](#data-structure)
    + [Queue State](#queue-state)
    + [Queue Lifecycle Management](#queue-lifecycle-management)
    + [Queue Status Refreshment](#queue-status-refreshment)
    + [Queue Placement Restriction](#queue-placement-restriction)
    + [Queue State on The Scheduling Process](#queue-state-on-the-scheduling-process)
    + [Queue State on `vcctl`](#queue-state-on--vcctl-)

## Motivation

Queues currently lack `State Management` capabilities. As long as a queue exists, tasks can be submitted to it, and a
queue can be changed or deleted regardless of its state.

In `Hadoop YARN`, a queue has three states: `STOPPED`, `RUNNING` and `DRAINING`. When a queue is in the `STOPPED`
state, it no longer receives new tasks, but tasks that have already been delivered are not affected. When a queue is
in the `RUNNING` state, it cannot be deleted. `DRAINING` is an intermediate state between `RUNNING` and `STOPPED`.

## Function Detail

### Data Structure

Add `state` to `properties` in the `spec` of the CRD `queues.scheduling.sigs.dev`. The `state` in the `spec` controls
the status of the queue.

```yaml
spec:
  properties:
    ...

    state:
      type: string

    ...
```

Add `state` to `properties` in the `status` of the CRD `queues.scheduling.sigs.dev`. The `state` in the `status`
displays the current status of the queue.

```yaml
status:
  properties:
    ...

    state:
      type: string

    ...
```
### Queue State

Valid queue states are:

* `Open`: the queue is available and accepts new task delivery.
* `Closed`: the queue is unavailable and does not accept new task delivery. Closing a queue does not mean that the
system actively deletes the tasks under it; the queue waits for its tasks to exit gracefully.
* `Closing`: an intermediate state between `Open` and `Closed`. When a queue is `Open`, has tasks running or waiting
to be scheduled, and a user tries to change its state to `Closed`, the state first changes to `Closing` and then
changes to `Closed` once all the tasks under the queue have exited.

The capabilities of a queue in each state are shown in the following table:

| state | default | can be set | receive delivery | can be deleted | can be scheduled | deserved resources |
| :-------: | :-----: | :--------: | :--------------: | :------------: |:---------------: | :----------------: |
| `Open` | Y | Y | Y | N | Y | Normal |
| `Closed` | N | Y | N | Y | Y | Normal |
| `Closing` | N | N | N | N | Y | Normal |

* If no state is specified when a queue is created, the queue uses the default state `Open`
* When creating a new queue, the user can only specify the `Open` or `Closed` state
* Only a queue in the `Open` state accepts new task delivery. A task is rejected when it is submitted to a queue in
the `Closed` or `Closing` state
* Only a queue in the `Closed` state can be deleted
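
The table and rules above can be read as a small set of predicates over the queue state. The following is a minimal sketch only: the type name `QueueState`, the constant names and the method names are hypothetical and are not defined by this proposal.

```go
package main

import "fmt"

// QueueState is a hypothetical type for the three states in the table above.
type QueueState string

const (
	QueueStateOpen    QueueState = "Open"
	QueueStateClosed  QueueState = "Closed"
	QueueStateClosing QueueState = "Closing"
)

// CanBeSet reports whether a user may specify this state ("can be set" column).
func (s QueueState) CanBeSet() bool {
	return s == QueueStateOpen || s == QueueStateClosed
}

// AcceptsDelivery reports whether new tasks may be submitted ("receive delivery" column).
func (s QueueState) AcceptsDelivery() bool {
	return s == QueueStateOpen
}

// CanBeDeleted reports whether the queue may be deleted ("can be deleted" column).
func (s QueueState) CanBeDeleted() bool {
	return s == QueueStateClosed
}

func main() {
	for _, s := range []QueueState{QueueStateOpen, QueueStateClosed, QueueStateClosing} {
		fmt.Printf("%-8s set=%t delivery=%t delete=%t\n",
			s, s.CanBeSet(), s.AcceptsDelivery(), s.CanBeDeleted())
	}
}
```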

### Queue Lifecycle Management

In the lifecycle management of queues, we need to guarantee the following points:

* When creating a new queue, if the user does not specify a state, we need to set the default `Open` state for it.
If the user specifies a state, it must be a valid value. Valid values are `Open` and `Closed`.
* When updating a queue, if the state of the queue is changed, the specified state value must be valid.
* When deleting a queue, only a queue with `Closed` status can be deleted successfully. The `status` here is the
`state` under the `status` of the queue, not the `state` under the `spec` of the queue.
* The `default` queue cannot be deleted.

Add a `validatingwebhookconfiguration` to validate queues when they are created, updated or deleted.

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ .Release.Name }}-validate-queue
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade,post-delete
webhooks:
  - clientConfig:
      caBundle: ""
      service:
        name: {{ .Release.Name }}-admission-service
        namespace: {{ .Release.Namespace }}
        path: /queues
    failurePolicy: Fail
    name: validatequeue.volcano.sh
    namespaceSelector: {}
    rules:
      - apiGroups:
          - "scheduling.sigs.dev"
        apiVersions:
          - "v1alpha2"
        operations:
          - CREATE
          - UPDATE
        resources:
          - queues
```
Add the implementation function `AdmitQueues`:

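```go
func AdmitQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
	...
	// Decode the queue carried by the admission request.
	queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
	reviewResponse := v1beta1.AdmissionResponse{}
	// Validate the queue; validateQueue receives the response to fill in.
	validateQueue(queue, &reviewResponse)
	...
}
```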

The above function performs the following checks:

* When a queue is created or updated, verify that the queue state is valid
* When a queue is deleted, check whether the queue can be deleted
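
The two checks above could be sketched as plain functions, independent of the admission machinery. This is an illustration under stated assumptions, not the implementation of `validateQueue`: the function names `validateSpecState` and `validateDelete` are hypothetical, and treating an empty state as "not specified" (and therefore acceptable) is an assumption based on the defaulting rule.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSpecState checks a user-supplied spec state on create or update.
// Assumption: an empty value means "not specified" and is left to defaulting.
func validateSpecState(state string) error {
	switch state {
	case "", "Open", "Closed":
		return nil
	default:
		return fmt.Errorf("invalid queue state %q: valid values are Open and Closed", state)
	}
}

// validateDelete checks whether a queue may be deleted. statusState is the
// state under the queue status, not the state under the queue spec.
func validateDelete(name, statusState string) error {
	if name == "default" {
		return errors.New("the default queue cannot be deleted")
	}
	if statusState != "Closed" {
		return fmt.Errorf("queue %q is in state %q: only Closed queues can be deleted", name, statusState)
	}
	return nil
}

func main() {
	fmt.Println(validateSpecState("Closing"))      // rejected: users cannot set Closing
	fmt.Println(validateDelete("default", "Closed")) // rejected: default queue
	fmt.Println(validateDelete("q1", "Closed"))      // allowed: prints <nil>
}
```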

We need another `webhook` to set the default state value when a queue is created. Add a `mutatingwebhookconfiguration`
and a `MutateQueues` function:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: {{ .Release.Name }}-mutate-queue
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade,post-delete
webhooks:
  - clientConfig:
      caBundle: ""
      service:
        name: {{ .Release.Name }}-admission-service
        namespace: {{ .Release.Namespace }}
        path: /mutating-queues
    failurePolicy: Fail
    name: mutatequeue.volcano.sh
    namespaceSelector: {}
    rules:
      - apiGroups:
          - "scheduling.sigs.dev"
        apiVersions:
          - "v1alpha2"
        operations:
          - CREATE
        resources:
          - queues
```

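```go
func MutateQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
	...
	// Decode the queue carried by the admission request.
	queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
	reviewResponse := v1beta1.AdmissionResponse{}
	// Build the patch for the queue (for example, the default state).
	createPatch(queue)
	...
}
```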
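
One way the defaulting patch could look is a JSON Patch that adds the state only when none was given. This is a sketch under assumptions: the `patchOperation` type and the `defaultStatePatch` function are hypothetical, and the path `/spec/state` assumes the queue object already has a `spec` and stores the state directly under it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOperation is one JSON Patch (RFC 6902) operation.
type patchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// defaultStatePatch returns a JSON Patch that sets the state to Open when the
// user did not specify one, and nil when no patch is needed.
func defaultStatePatch(specState string) ([]byte, error) {
	if specState != "" {
		return nil, nil
	}
	ops := []patchOperation{{Op: "add", Path: "/spec/state", Value: "Open"}}
	return json.Marshal(ops)
}

func main() {
	patch, err := defaultStatePatch("")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```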

### Queue Status Refreshment

When refreshing the status of a queue, the `state` value under the `spec` and the podgroups under the queue are
considered:

* If the `state` value is empty, the status of the queue is set to `Open`
* If the `state` value is `Open`, the status of the queue is also `Open`
* If the `state` value is `Closed`, we need to further consider whether there is a podgroup under the queue. If
there is a podgroup under the queue, the status of the queue is set to `Closing`; if there is no podgroup
under the queue, the status of the queue is set to `Closed`.
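
The refresh rules above amount to a small pure function. The sketch below is illustrative only: the name `refreshState` and its parameters are hypothetical, and returning an error for an unrecognized spec state is an assumption, since the proposal relies on the validating webhook to reject such values.

```go
package main

import "fmt"

// refreshState derives the status state from the spec state and the number
// of podgroups currently under the queue.
func refreshState(specState string, podGroupCount int) (string, error) {
	switch specState {
	case "", "Open":
		return "Open", nil
	case "Closed":
		if podGroupCount > 0 {
			// Podgroups remain, so the queue is still draining.
			return "Closing", nil
		}
		return "Closed", nil
	default:
		// Assumption: not covered by the proposal.
		return "", fmt.Errorf("unsupported spec state %q", specState)
	}
}

func main() {
	for _, c := range []struct {
		state     string
		podGroups int
	}{{"", 0}, {"Open", 3}, {"Closed", 2}, {"Closed", 0}} {
		status, err := refreshState(c.state, c.podGroups)
		fmt.Printf("spec=%q podgroups=%d -> status=%q err=%v\n", c.state, c.podGroups, status, err)
	}
}
```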

### Queue Placement Restriction

When creating a job, we need to verify the status of the queue specified by the job:

* If the job does not specify a queue name, the job is allowed to be created
* If the job specifies a queue name and the status of the queue is `Open`, the job is allowed to be created
* If the status of the queue is not `Open`, the job creation request is rejected.
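
The placement check above could be sketched as follows. The function name `admitJob` and the map-based lookup are hypothetical stand-ins for however the admission code reads queue status, and rejecting a job whose queue does not exist is an assumption the proposal does not state.

```go
package main

import "fmt"

// admitJob decides whether a job may be created. queueStates maps a queue
// name to the state under that queue's status.
func admitJob(queueName string, queueStates map[string]string) error {
	if queueName == "" {
		// No queue specified: the job is allowed.
		return nil
	}
	state, ok := queueStates[queueName]
	if !ok {
		// Assumption: an unknown queue is rejected.
		return fmt.Errorf("queue %q not found", queueName)
	}
	if state != "Open" {
		return fmt.Errorf("queue %q is in state %q: only Open queues accept jobs", queueName, state)
	}
	return nil
}

func main() {
	queues := map[string]string{"default": "Open", "batch": "Closing"}
	fmt.Println(admitJob("", queues))
	fmt.Println(admitJob("default", queues))
	fmt.Println(admitJob("batch", queues))
}
```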

### Queue State on The Scheduling Process

The three queue states above do not affect the existing scheduling process: there are no pods under a queue in the
`Closed` state, while pods under queues in the `Open` or `Closing` state are scheduled normally.

### Queue State on `vcctl`

We need to add support for `queue state management` in `vcctl`, mainly including the following changes:

* Support passing the state of a queue when creating it
* Display the status of the queue when getting queue details or listing queues
* Provide an update function for queues that supports updating the `weight` or `state` of a queue
* Provide a delete function for queues
* Add a queue operation interface with support for `queue open`, `queue close` and `queue update`
File renamed without changes.
