Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
---
title: OVN Pods Egress DSCP QoS
authors:
  - "@oribon"
reviewers:
  - "@trozet"
  - "@danwinship"
  - "@tssurya"
approvers:
  - TBD
creation-date: 2022-02-16
last-updated: 2022-02-27
status: implementable
---

# OVN Pods Egress DSCP QoS

## Summary

Not all traffic has the same priority, and when there is contention for bandwidth, there should be a mechanism for objects outside the cluster to prioritize the traffic.
To enable this, we will use Differentiated Services Code Point (DSCP), which classifies packets by setting a 6-bit field in the IP header, effectively marking the priority of a given packet relative to other packets as "Critical", "High Priority", "Best Effort" and so on.

By introducing a new CRD `EgressQoS`, users can specify a DSCP value for packets originating from pods in a given namespace and heading to a specified CIDR.
The CRD will be namespaced, with one resource allowed per namespace.
The resources will be watched by ovn-k, which in turn will configure OVN's [QoS Table](https://man7.org/linux/man-pages/man5/ovn-nb.5.html#QoS_TABLE).

## Motivation

Telco customers require DSCP marking support for some of their 5G applications, giving some pods precedence over others.
The QoS markings will be consumed and acted upon by objects outside of the OpenShift cluster to optimize traffic flow throughout their networks.

### Goals

- Provide a mechanism for users to set DSCP on egress traffic coming from specific namespaces.

### Non-Goals

- Ingress QoS.

- Consolidating with the current `kubernetes.io/egress-bandwidth` and `kubernetes.io/ingress-bandwidth` annotations.
  Nonetheless, the work done here does not interfere with the current bandwidth QoS mechanism.

- Handling or acting upon the DSCP marking within OpenShift. The marking only needs to be added to the selected packets' headers.

- Marking East/West traffic, exposing the DSCP value from the inner packet to the outer Geneve packet.

## Proposal

To achieve egress DSCP marking on pods, we introduce a new namespace-scoped CRD `EgressQoS`, which allows specifying a set of QoS rules, each with a DSCP value and a destination CIDR. Traffic coming from pods in the namespace and heading to each destination CIDR will be marked with the corresponding DSCP value.

### Implementation Details/Notes/Constraints

A new API `EgressQoS` under the `k8s.ovn.org/v1` version will be added to `pkg/crd`.

A new controller in OVN-K will watch `EgressQoS` and `Pod` objects. It will create the relevant QoS objects in OVN, which results in the necessary flows being programmed in OVS. By listing the pods in the namespace, the controller attaches these QoS rules only to the relevant node-local switches.

For example, assuming there's a single pod `app1` in namespace `default` on node `node1` and the following `EgressQoS` is created:

```yaml
kind: EgressQoS
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 46
    dstCIDR: 0.0.0.0/0
  - dscp: 30
    dstCIDR: 1.2.3.4/32
```
the equivalent of:
```bash
ovn-nbctl qos-add node1 from-lport 1 "ip4.src == <default_ns_address_set> && ip4.dst == 0.0.0.0/0" dscp=46
ovn-nbctl qos-add node1 from-lport 2 "ip4.src == <default_ns_address_set> && ip4.dst == 1.2.3.4/32" dscp=30
```
will be executed.
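
The mapping from rules to `qos-add` invocations can be sketched in Go as follows. This is only an illustration, not the controller's actual code: the `Rule` type, the `qosMatch` and `qosAddCommand` helpers, and the choice of using the rule's position as the priority are assumptions made here to mirror the example above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Rule mirrors one entry of spec.egress (hypothetical local type for this sketch).
type Rule struct {
	DSCP    int
	DstCIDR string
}

// qosMatch builds an OVN match expression for one rule.
// addrSet stands in for the namespace's address set reference.
func qosMatch(addrSet string, r Rule) (string, error) {
	prefix, err := netip.ParsePrefix(r.DstCIDR)
	if err != nil {
		return "", fmt.Errorf("invalid dstCIDR %q: %w", r.DstCIDR, err)
	}
	family := "ip4"
	if prefix.Addr().Is6() {
		family = "ip6"
	}
	return fmt.Sprintf("%s.src == %s && %s.dst == %s", family, addrSet, family, r.DstCIDR), nil
}

// qosAddCommand renders the ovn-nbctl command line equivalent to one rule.
func qosAddCommand(sw string, priority int, addrSet string, r Rule) (string, error) {
	match, err := qosMatch(addrSet, r)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("ovn-nbctl qos-add %s from-lport %d %q dscp=%d", sw, priority, match, r.DSCP), nil
}

func main() {
	rules := []Rule{
		{DSCP: 46, DstCIDR: "0.0.0.0/0"},
		{DSCP: 30, DstCIDR: "1.2.3.4/32"},
	}
	for i, r := range rules {
		// Assumption: priority follows the rule's position, as in the example above.
		cmd, err := qosAddCommand("node1", i+1, "<default_ns_address_set>", r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cmd)
	}
}
```

Choosing the address family from the parsed prefix keeps one code path for both IPv4 and IPv6 rules.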

In addition, the controller will watch pods to decide if further updates are needed. For example,
when another pod `app2` comes up in the namespace on node `node2`, the controller will
attach the existing `QoS` objects to the local switch of `node2`.
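
The pod-driven update can be pictured as a set difference between the nodes that currently host pods of the namespace and the switches that already carry the QoS rules. The sketch below is a simplified model under that assumption; the `Pod` type and both helper functions are hypothetical and do not come from OVN-K.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a minimal stand-in for the fields this sketch needs.
type Pod struct {
	Name      string
	Namespace string
	Node      string
}

// nodesForNamespace returns the sorted, de-duplicated nodes hosting pods of ns.
func nodesForNamespace(pods []Pod, ns string) []string {
	seen := map[string]struct{}{}
	for _, p := range pods {
		if p.Namespace == ns && p.Node != "" {
			seen[p.Node] = struct{}{}
		}
	}
	nodes := make([]string, 0, len(seen))
	for n := range seen {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	return nodes
}

// switchesToAttach returns the wanted nodes whose switch has no QoS attached yet.
func switchesToAttach(wanted []string, attached map[string]bool) []string {
	var missing []string
	for _, n := range wanted {
		if !attached[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	pods := []Pod{
		{Name: "app1", Namespace: "default", Node: "node1"},
		{Name: "app2", Namespace: "default", Node: "node2"}, // newly created pod
	}
	attached := map[string]bool{"node1": true}
	fmt.Println(switchesToAttach(nodesForNamespace(pods, "default"), attached))
}
```

With `app1` already handled on `node1`, only `node2` is reported as needing the attachment.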

IPv6 will also be supported. Given the following `EgressQoS`:
```yaml
apiVersion: k8s.ovn.org/v1
kind: EgressQoS
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 48
    dstCIDR: 2001:0db8:85a3:0000:0000:8a2e:0370:7330/124
```
and a single pod with the IP `fd00:10:244:2::3` in the namespace, the controller will create the relevant QoS object, which will result in a flow similar to this one on the pod's node:
```bash
cookie=0x6d99cb18, duration=63.310s, table=18, n_packets=0, n_bytes=0, idle_age=63, priority=555,ipv6,metadata=0x4,ipv6_src=fd00:10:244:2::3,ipv6_dst=2001:db8:85a3::8a2e:370:7330/124 actions=mod_nw_tos:192,resubmit(,19)
```
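
The `mod_nw_tos:192` action in the flow follows from the DSCP value of 48: DSCP occupies the upper six bits of the ToS/Traffic Class byte, so the byte value is the DSCP shifted left by two bits. A minimal sketch of that arithmetic, where `dscpToTOS` is a name chosen here for illustration:

```go
package main

import "fmt"

// dscpToTOS places a 6-bit DSCP value into the upper six bits of the
// ToS/Traffic Class byte, leaving the two low-order (ECN) bits at zero.
func dscpToTOS(dscp int) (uint8, error) {
	if dscp < 0 || dscp > 63 {
		return 0, fmt.Errorf("dscp %d out of range 0-63", dscp)
	}
	return uint8(dscp) << 2, nil
}

func main() {
	for _, d := range []int{48, 46, 30} {
		tos, err := dscpToTOS(d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("dscp=%d -> tos=%d\n", d, tos)
	}
}
```

For the values used in this document, DSCP 48 maps to 192, DSCP 46 to 184, and DSCP 30 to 120.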

### User Stories
#### Story 1

As a user of OpenShift, I should be able to mark egress traffic coming from a specific namespace with a valid DSCP value.

### API Extensions

A new namespace-scoped CRD is introduced:

```go
// metav1 refers to k8s.io/apimachinery/pkg/apis/meta/v1.

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +kubebuilder:resource:path=egressqoses,singular=egressqos
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// EgressQoS is a CRD that allows the user to define a DSCP value
// for pods egress traffic on its namespace to specified CIDRs.
// Traffic from these pods will be checked against each EgressQoSRule in
// the namespace's EgressQoS, and if there is a match the traffic is marked
// with the relevant DSCP value.
type EgressQoS struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   EgressQoSSpec   `json:"spec,omitempty"`
	Status EgressQoSStatus `json:"status,omitempty"`
}

// EgressQoSSpec defines the desired state of EgressQoS
type EgressQoSSpec struct {
	// a collection of Egress QoS rule objects
	Egress []EgressQoSRule `json:"egress"`
}

type EgressQoSRule struct {
	// Dscp marking value for matching pods' traffic.
	// +kubebuilder:validation:Maximum:=63
	// +kubebuilder:validation:Minimum:=0
	Dscp int `json:"dscp"`

	// DstCIDR specifies the destination's CIDR. Only traffic heading to this CIDR will be marked with DSCP.
	DstCIDR string `json:"dstCIDR"`
}
```
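
The kubebuilder markers above bound `Dscp` to 0-63, but the type definition places no constraint on `DstCIDR`. As a hedged illustration of the checks a consumer of this API might run on a rule, the sketch below re-declares `EgressQoSRule` locally and adds a hypothetical `validateRule` helper; neither the helper nor the CIDR check is part of the proposal.

```go
package main

import (
	"fmt"
	"net"
)

// EgressQoSRule is a local copy of the rule type, kept here so the sketch compiles on its own.
type EgressQoSRule struct {
	Dscp    int
	DstCIDR string
}

// validateRule checks the DSCP range and that DstCIDR parses as an IPv4 or IPv6 CIDR.
func validateRule(r EgressQoSRule) error {
	if r.Dscp < 0 || r.Dscp > 63 {
		return fmt.Errorf("dscp %d is outside the range 0-63", r.Dscp)
	}
	if _, _, err := net.ParseCIDR(r.DstCIDR); err != nil {
		return fmt.Errorf("dstCIDR %q is not a valid CIDR: %w", r.DstCIDR, err)
	}
	return nil
}

func main() {
	rules := []EgressQoSRule{
		{Dscp: 46, DstCIDR: "0.0.0.0/0"},
		{Dscp: 64, DstCIDR: "1.2.3.4/32"}, // DSCP out of range
		{Dscp: 30, DstCIDR: "1.2.3.4"},    // missing prefix length
	}
	for _, r := range rules {
		if err := validateRule(r); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("valid: dscp=%d dstCIDR=%s\n", r.Dscp, r.DstCIDR)
	}
}
```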

### Test Plan

* Unit test coverage.

* IPv4/IPv6 E2E tests that validate that egress traffic from a namespace is marked with the correct DSCP value, by creating and deleting `EgressQoS` resources and setting up source pods and host-networked destination pods.
  * Traffic to the specified CIDR should be marked.
  * Traffic to an address not contained in the CIDR should not be marked.
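
The two expectations above reduce to a containment check between a destination address and the rule's CIDR. A small sketch of how a test could derive the expected outcome for a destination, using the standard `net/netip` package; the `shouldBeMarked` helper is a name invented here:

```go
package main

import (
	"fmt"
	"net/netip"
)

// shouldBeMarked reports whether traffic to dst falls inside cidr,
// that is, whether an E2E test should expect the DSCP marking.
func shouldBeMarked(cidr, dst string) (bool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return false, fmt.Errorf("invalid address %q: %w", dst, err)
	}
	return prefix.Contains(addr), nil
}

func main() {
	cases := []struct{ cidr, dst string }{
		{"1.2.3.4/32", "1.2.3.4"},
		{"1.2.3.4/32", "1.2.3.5"},
		{"2001:0db8:85a3:0000:0000:8a2e:0370:7330/124", "2001:db8:85a3::8a2e:370:7335"},
		{"2001:0db8:85a3:0000:0000:8a2e:0370:7330/124", "2001:db8:85a3::8a2e:370:7340"},
	}
	for _, c := range cases {
		marked, err := shouldBeMarked(c.cidr, c.dst)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s in %s: %v\n", c.dst, c.cidr, marked)
	}
}
```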

### Risks and Mitigations
N/A
## Design Details
N/A
### Graduation Criteria

#### Dev Preview -> Tech Preview

#### Tech Preview -> GA

#### Removing a deprecated feature
N/A

### Upgrade / Downgrade Strategy
N/A
### Version Skew Strategy
N/A

### Operational Aspects of API Extensions
N/A

#### Failure Modes
N/A

#### Support Procedures
N/A

## Implementation History
N/A

## Drawbacks
N/A
## Alternatives
N/A