
PFC Watchdog Design

sihuihan88 edited this page Aug 23, 2018 · 21 revisions

PFC Watchdog in SONiC

High Level Design Document

Rev 0.1

Revision
Rev Date Author Change Description
0.1 Marian Pritsak Initial version
0.2 Sihui Han Update design with FLEX_COUNTER

About this Manual

This document provides general information about the PFC Watchdog feature implementation in SONiC. PFC watchdog is designed to detect and mitigate PFC storms received on each port. PFC pause frames are used in lossless Ethernet to pause the link partner from sending packets. Such a back-pressure mechanism can propagate through the whole network and cause the network to stop forwarding traffic. PFC watchdog detects abnormal back-pressure caused by receiving excessive PFC pause frames and mitigates the situation by temporarily disabling the pause caused by PFC. PFC watchdog has three functional blocks: detection, mitigation and restoration.

PFC Watchdog only works for lossless queues. It requires QoS (PFC) configuration on the device.

Scope

This document describes the high level design of the PFC WD feature.

Definitions/Abbreviation

Table 2: Abbreviations
Definitions/Abbreviation Description
PFC Priority Flow Control
WD Watchdog
ACL Access Control List

1 PFC Watchdog Requirements Overview

1.1 Overview

PFC watchdog is designed to detect and mitigate PFC storms received on each port. PFC pause frames are used in lossless Ethernet to pause the link partner from sending packets. Such a back-pressure mechanism can propagate through the whole network and cause the network to stop forwarding traffic. PFC watchdog detects abnormal back-pressure caused by receiving excessive PFC pause frames and mitigates the situation by temporarily disabling the pause caused by PFC. PFC watchdog has three functional blocks: detection, mitigation and restoration.

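  • Storm Detection: Detect a PFC pause storm if it persists for a certain time on a non-drop class.
  • Storm Mitigation: Mitigate the storm by dropping all packets of the non-drop class or, with the forward action, by no longer honoring the received pause frames.
  • Storm Restoration: Re-enable PFC on the queue once no PFC frames have been received for a certain time.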

1.2 Functional Specification

1.2.1 PFC storm detection

PFC storm detection lets a switch detect that a lossless queue is receiving a PFC storm from its link partner and that the queue has been in a paused state for longer than detection_time. Even when the queue is empty, the watchdog should detect the storm as soon as the time the queue has spent in the paused state exceeds detection_time. detection_time is a port-level parameter. Detection must be enabled or disabled per port. The detection mechanism is only available for lossless queues and is disabled by default. detection_time should be on the order of hundreds of milliseconds.

1.2.2 PFC storm mitigation

Once a PFC storm is detected on a queue, the watchdog can take one of two actions, drop or forward, at the per-queue level. When the drop action is selected, the following must be implemented.

  • All existing packets in the output queue are discarded.
  • All subsequent packets destined to the output queue are discarded.
  • All subsequent packets received by the corresponding priority group of this queue are discarded, including received pause frames. As a result, the switch should not generate any pause frame to its neighbor due to congestion of this output queue.

When the forward action is selected, the following must be implemented.

  • The queue no longer honors the PFC frames received. All packets destined to the queue are forwarded, as well as the packets that were already in the queue.

The default action is drop.

1.2.3 PFC storm restoration

The watchdog should continue to count the PFC frames received on the queue. If no PFC frame is received over the restoration_time period, it re-enables PFC on the queue and, if the previous mitigation was drop, stops dropping packets. restoration_time is a port-level parameter. restoration_time should be on the order of hundreds of milliseconds.
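The detection and restoration rules above can be read as a small per-queue state machine. The following is a minimal Python sketch under stated assumptions, not the SONiC implementation: the class name `QueueWatchdog`, the `poll` method and its arguments are hypothetical, and the sketch uses "at least detection_time" as the trigger because it only sees the queue once per polling interval.

```python
from dataclasses import dataclass


@dataclass
class QueueWatchdog:
    """Hypothetical per-queue watchdog state; all times are in milliseconds."""
    detection_time: int
    restoration_time: int
    paused_ms: int = 0    # how long the queue has been continuously paused
    quiet_ms: int = 0     # how long no PFC frame was seen while stormed
    stormed: bool = False

    def poll(self, paused: bool, pfc_frames_rx: int, interval_ms: int) -> str:
        """Process one polling interval; return 'storm', 'restore' or 'none'."""
        if not self.stormed:
            # Detection: the queue must stay paused for detection_time,
            # whether or not it currently holds packets.
            self.paused_ms = self.paused_ms + interval_ms if paused else 0
            if self.paused_ms >= self.detection_time:
                self.stormed = True
                self.quiet_ms = 0
                return "storm"
            return "none"
        # Restoration: keep counting PFC frames, restore after a quiet period.
        self.quiet_ms = 0 if pfc_frames_rx > 0 else self.quiet_ms + interval_ms
        if self.quiet_ms >= self.restoration_time:
            self.stormed = False
            self.paused_ms = 0
            return "restore"
        return "none"
```

With `detection_time=200` and a 100 ms polling interval, two consecutive paused polls report a storm; two consecutive polls without PFC frames then report the restoration.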

2 Modules Design

2.1 Config DB

2.1.1 PFC WD Table

; Defines PFC WD configuration on a port
key                              = PFC_WD_TABLE:ifname           ; configuration for watchdog on port
; field                          = value

detection_time                   = 1*3DIGIT                      ; time in msecs a pause storm must persist,
                                                                 ; after which the queue is considered to be in
                                                                 ; a PFC storm and the watchdog action is triggered.
restoration_time                 = 1*3DIGIT                      ; restoration_time in msecs after which
                                                                 ; queue in PFC storm state is
                                                                 ; restored if no pause frames were received.
action                           = "drop"/"forward"              ; action taken by watchdog in case
                                                                 ; of PFC storm detection.
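As an illustration of the schema above, the sketch below validates one PFC_WD_TABLE entry given as a dictionary of strings. The function name `parse_pfc_wd_entry` is hypothetical, and falling back to "drop" when the action field is missing is an assumption based on the documented default action.

```python
import re

# Mirrors the 1*3DIGIT rule in the schema: one to three decimal digits.
TIME_RE = re.compile(r"[0-9]{1,3}")
VALID_ACTIONS = ("drop", "forward")


def parse_pfc_wd_entry(fields):
    """Validate one PFC_WD_TABLE entry and return it with integer timers."""
    parsed = {}
    for name in ("detection_time", "restoration_time"):
        value = fields.get(name, "")
        if not TIME_RE.fullmatch(value):
            raise ValueError(f"{name} must be 1 to 3 digits, got {value!r}")
        parsed[name] = int(value)
    # Assumption: a missing action falls back to "drop", the documented default.
    action = fields.get("action", "drop")
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    parsed["action"] = action
    return parsed


entry = parse_pfc_wd_entry({"detection_time": "300", "restoration_time": "300"})
```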

2.2 FLEX_COUNTER DB

Orchagent can have different strategies or criteria for storm detection, which might require different counters. Hence it needs to tell syncd which counters to poll. To achieve this, it uses the FLEX_COUNTER DB (FLEX_COUNTER_TABLE and FLEX_COUNTER_GROUP_TABLE).

2.2.1 PFC WD FLEX_COUNTER_GROUP_TABLE entry

; Defines schema for PFC WD flex counter group
key                            = "FLEX_COUNTER_GROUP_TABLE:PFC_WD"      ; WD group entry
; field                        = value
QUEUE_PLUGIN_LIST              = 1*64VCHAR                     ; list of ',' separated redis lua scripts plugins
POLL_INTERVAL                  = 1*3DIGIT                      ; counter polling interval
FLEX_COUNTER_STATUS            = 1*8VCHAR                      ; 'enable'/'disable'
                                                               ;  to indicate whether counter poll is enabled

2.2.2 PFC WD FLEX_COUNTER_TABLE entry for each port

; Defines schema for PFC WD flex counter entry
key                            = "FLEX_COUNTER_TABLE:PFC_WD:portID"      ; WD port entry
; field                        = value
PORT_COUNTER_ID_LIST           = 1*64VCHAR                     ; list of ',' separated port counter IDs

2.2.3 PFC WD FLEX_COUNTER_TABLE entry for each lossless queue

; Defines schema for PFC WD flex counter entry
key                            = "FLEX_COUNTER_TABLE:PFC_WD:queueID"      ; WD queue entry
; field                        = value
QUEUE_COUNTER_ID_LIST          = 1*64VCHAR                     ; list of ',' separated queue counter IDs
QUEUE_ATTR_ID_LIST             = 1*64VCHAR                     ; list of ',' separated queue attribute IDs
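To make the three schemas above concrete, the sketch below assembles the entries for one port and its lossless queues as plain dictionaries. It does not talk to Redis. The function name, the object IDs, the plugin name and the attribute name are made-up placeholders; only the key prefixes and field names come from the schemas.

```python
def build_pfc_wd_flex_entries(port_oid, port_counters, queues, plugins, poll_interval_ms):
    """Return {redis_key: {field: value}} for one port and its lossless queues.

    queues maps a queue object ID to a (counter_ids, attr_ids) pair.
    """
    entries = {
        "FLEX_COUNTER_GROUP_TABLE:PFC_WD": {
            "QUEUE_PLUGIN_LIST": ",".join(plugins),
            "POLL_INTERVAL": str(poll_interval_ms),
            "FLEX_COUNTER_STATUS": "enable",
        },
        f"FLEX_COUNTER_TABLE:PFC_WD:{port_oid}": {
            "PORT_COUNTER_ID_LIST": ",".join(port_counters),
        },
    }
    for queue_oid, (counter_ids, attr_ids) in queues.items():
        entries[f"FLEX_COUNTER_TABLE:PFC_WD:{queue_oid}"] = {
            "QUEUE_COUNTER_ID_LIST": ",".join(counter_ids),
            "QUEUE_ATTR_ID_LIST": ",".join(attr_ids),
        }
    return entries


# Placeholder values for illustration only.
entries = build_pfc_wd_flex_entries(
    port_oid="oid:0x1000000000002",
    port_counters=["SAI_PORT_STAT_PFC_3_RX_PKT"],
    queues={
        "oid:0x15000000000003": (
            ["SAI_QUEUE_STAT_PACKETS", "SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES"],
            ["EXAMPLE_QUEUE_ATTR"],
        ),
    },
    plugins=["example_pfc_wd_plugin.lua"],
    poll_interval_ms=100,
)
```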

2.3 COUNTERS DB

2.3.1 COUNTERS table

; Defines schema for queue counters that are updated by PFC WD
key                                            = "COUNTERS:queueId"     ; WD queue entry
; field                                        = value
PFC_WD_QUEUE_STATS_STORM_DETECTED              = 1*4DIGIT               ; deadlock counter
PFC_WD_QUEUE_STATS_STORM_RESTORED              = 1*4DIGIT               ; restoration counter
PFC_WD_QUEUE_STATS_TX_PACKETS                  = 1*20DIGIT              ; total packets transmitted during storm
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS          = 1*20DIGIT              ; total Tx packets dropped due to storm
PFC_WD_QUEUE_STATS_RX_PACKETS                  = 1*20DIGIT              ; total packets received during storm
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS          = 1*20DIGIT              ; total Rx packets dropped due to storm
PFC_WD_QUEUE_STATS_TX_PACKETS_LAST             = 1*20DIGIT              ; packets transmitted during last storm
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS_LAST     = 1*20DIGIT              ; Tx packets dropped due to last storm
PFC_WD_QUEUE_STATS_RX_PACKETS_LAST             = 1*20DIGIT              ; packets received during last storm
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS_LAST     = 1*20DIGIT              ; Rx packets dropped due to last storm
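One possible reading of the relationship between the running totals and the `_LAST` fields is sketched below for the Tx counters. This is an assumption for illustration only: the helper `record_storm_end` is hypothetical, and the idea that totals accumulate the per-storm values is inferred from the field comments rather than stated by the design.

```python
def record_storm_end(stats, tx_packets, tx_dropped):
    """Fold the Tx counters of a finished storm into the per-queue stats.

    Assumption: *_LAST holds the values of the most recent storm, while
    the fields without the suffix accumulate across all storms.
    """
    updated = dict(stats)
    updated["PFC_WD_QUEUE_STATS_STORM_RESTORED"] = (
        stats.get("PFC_WD_QUEUE_STATS_STORM_RESTORED", 0) + 1
    )
    updated["PFC_WD_QUEUE_STATS_TX_PACKETS_LAST"] = tx_packets
    updated["PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS_LAST"] = tx_dropped
    updated["PFC_WD_QUEUE_STATS_TX_PACKETS"] = (
        stats.get("PFC_WD_QUEUE_STATS_TX_PACKETS", 0) + tx_packets
    )
    updated["PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS"] = (
        stats.get("PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS", 0) + tx_dropped
    )
    return updated


stats = record_storm_end({}, tx_packets=120, tx_dropped=120)
stats = record_storm_end(stats, tx_packets=30, tx_dropped=10)
```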

2.4 Criteria for storm detection

As different vendors support different counters, there must be a way for every ASIC vendor to decide how to tell whether a queue is stormed. For instance, a possible criterion based on the pause duration counter could be:

   (SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES > 0 && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0 && SAI_PORT_STAT_PFC_[queue]_RX_PKT.current - SAI_PORT_STAT_PFC_[queue]_RX_PKT.last > 0)
   ||
   (SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES ==  0 && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0 && SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.current == SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.last + t0 * delta)
   // delta is the fraction of time that the queue had to be paused (e.g. 0.9)

These criteria are coded in .lua scripts for Redis DB and run periodically, based on the timers configured by orchagent.
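The expression above can be restated in Python to make the two branches easier to follow. This is a sketch, not the Lua plugin itself. The function name and the dictionary-based counter samples are hypothetical, t0 is assumed to be the polling interval expressed in the same unit as the pause duration counter, and the sketch uses `>=` where the expression uses an exact equality, which is an intentional deviation.

```python
def queue_is_stormed(cur, last, prio, t0, delta=0.9):
    """Evaluate the example criterion for one queue.

    cur and last map counter names to the current and previous samples.
    t0 is assumed to be the polling interval, in the pause-duration unit.
    """
    pfc_rx = f"SAI_PORT_STAT_PFC_{prio}_RX_PKT"
    pause = f"SAI_PORT_STAT_PFC_{prio}_RX_PAUSE_DURATION"
    no_tx = cur["SAI_QUEUE_STAT_PACKETS"] - last["SAI_QUEUE_STAT_PACKETS"] == 0
    occupied = cur["SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES"] > 0

    if occupied:
        # Packets are waiting, none left the queue, and PFC frames keep arriving.
        return no_tx and cur[pfc_rx] - last[pfc_rx] > 0
    # Empty queue: rely on how long the queue was paused during the interval.
    # Assumption: >= replaces the exact equality of the original expression.
    return no_tx and cur[pause] - last[pause] >= t0 * delta
```

The second branch is what allows detection on an empty queue: no packets are available to prove that transmission is blocked, so the pause duration counter is the only evidence.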

2.5 Action Handlers

The PfcWdActionHandler class is defined to provide a common interface for handling a PFC storm. Different platforms can inherit from it to implement different handlers.
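The inheritance idea can be sketched as follows. This is a Python stand-in written for illustration and does not reproduce the actual PfcWdActionHandler class: the class names `ActionHandler` and `LoggingHandler` and the methods `start`, `stop`, `mitigate` and `restore` are all hypothetical.

```python
from abc import ABC, abstractmethod


class ActionHandler(ABC):
    """Hypothetical stand-in for a common storm-handling interface."""

    def __init__(self, port, queue):
        self.port = port
        self.queue = queue
        self.active = False

    def start(self):
        """Called when a storm is detected on the queue."""
        self.mitigate()
        self.active = True

    def stop(self):
        """Called when the storm is over."""
        self.restore()
        self.active = False

    @abstractmethod
    def mitigate(self):
        """Platform-specific mitigation."""

    @abstractmethod
    def restore(self):
        """Platform-specific restoration."""


class LoggingHandler(ActionHandler):
    """Example platform handler that only records what it would do."""

    def __init__(self, port, queue):
        super().__init__(port, queue)
        self.log = []

    def mitigate(self):
        self.log.append(f"disable PFC on {self.port} queue {self.queue}")

    def restore(self):
        self.log.append(f"enable PFC on {self.port} queue {self.queue}")
```

Keeping `start` and `stop` in the base class means each platform only supplies the two hardware-specific steps, while the bookkeeping stays in one place.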

2.6 Events for Resetting PFC WD

There is a set of external events that can fully or partially invalidate the current state of the watchdog.

2.6.1 Counters Reset

If counters are reset, PFC WD is in an undefined state and should skip one polling interval. SONiC does not use the API to reset counters on the ASIC, so this event is ignored.

2.6.2 Port Going Down

If a port's state changes to DOWN, all of its queues marked as stormed are restored.

2.6.3 Queue Reconfiguration or Removal

Queue configuration must go through the PFC WD proxy so that it can update its internal state. If PFC is disabled on a queue that was marked as stormed, the queue is restored.
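The port-down and queue-reconfiguration rules can be sketched as simple bookkeeping over the set of stormed queues. The class and method names below are hypothetical, and "restoring" a queue is reduced to recording it in a list.

```python
class WatchdogState:
    """Hypothetical tracker of stormed queues, keyed by port name."""

    def __init__(self):
        self.stormed = {}    # port name -> set of stormed queue indexes
        self.restored = []   # (port, queue) pairs restored so far

    def mark_stormed(self, port, queue):
        self.stormed.setdefault(port, set()).add(queue)

    def _restore(self, port, queue):
        self.stormed[port].discard(queue)
        self.restored.append((port, queue))

    def on_port_down(self, port):
        """Port went DOWN: restore every stormed queue on that port."""
        for queue in sorted(self.stormed.get(port, ())):
            self._restore(port, queue)

    def on_pfc_disabled(self, port, queue):
        """PFC disabled on a queue: restore it only if it was stormed."""
        if queue in self.stormed.get(port, ()):
            self._restore(port, queue)


state = WatchdogState()
state.mark_stormed("Ethernet0", 3)
state.mark_stormed("Ethernet0", 4)
state.mark_stormed("Ethernet4", 3)
state.on_pfc_disabled("Ethernet4", 3)   # stormed queue, restored
state.on_pfc_disabled("Ethernet4", 4)   # not stormed, nothing happens
state.on_port_down("Ethernet0")         # restores queues 3 and 4
```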

2.7 CLI

To let users set and view PFC WD configuration and statistics, the pfcwatchdog CLI tool should provide the following functionality:

  • Show watchdog configuration (per port): pfcwd show config
  • Show watchdog statistics (per port/queue): pfcwd show stats
  • Enable watchdog on the specified port(s): pfcwd start --action drop ports Ethernet116 detection-time 300 --restoration-time 300
  • Disable watchdog on the specified port(s): pfcwd stop <interfaceName>

3 Flows

3.1 General Overview

The workflow of PFC Watchdog is shown in the figure above.

(1) Users configure PFC Watchdog via the CLI. Users need to specify the detection_time, restoration_time and port name; the command is described in Section 2.7. The configuration is written into the Config DB PFC_WD table.

(2) The PFC Watchdog orchagent subscribes to the Config DB PFC_WD table. Once the configuration is written to Config DB, the orchagent picks up the change and starts the functionality on the intended ports.

(3) PFC storm events are detected via counters. Syncd is the module that polls counters from the ASIC. To start PFC Watchdog on a port, orchagent writes the port/queue counters needed to detect storm start/stop to the Flex Counter DB. It then waits for a notification that a storm start/stop event has happened.

(4)(5) Syncd subscribes to the Flex Counter DB, periodically queries the counters from the ASIC and writes them to the Counters DB.

(6) Lua scripts are embedded in the Counters DB. The Lua script runs periodically and uses the available counters to check whether a storm has started or stopped. If a storm start/stop event happens, it sends a notification to orchagent.

Once orchagent receives a storm start/stop notification from the Lua script, it starts/stops the PfcWdActionHandler accordingly.
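Steps (4) to (6) and the final dispatch can be imitated with three plain functions sharing in-memory structures. This is a toy simulation with hypothetical names, not how syncd, Redis or orchagent are implemented. It also assumes that each polled value is already a boolean "storm condition holds" flag, whereas the real plugin derives that from counters.

```python
def syncd_poll(asic_sample, counters_db):
    """Steps (4)(5): copy the polled values into the counters store."""
    counters_db.update(asic_sample)


def plugin_check(counters_db, stormed, notifications):
    """Step (6): compare counters with the known state, queue notifications.

    Assumption: each value is a ready-made 'storm condition holds' flag.
    """
    for queue_id, storm_condition in counters_db.items():
        if storm_condition and queue_id not in stormed:
            stormed.add(queue_id)
            notifications.append((queue_id, "storm"))
        elif not storm_condition and queue_id in stormed:
            stormed.discard(queue_id)
            notifications.append((queue_id, "restore"))


def orchagent_handle(notifications, active_handlers, action="drop"):
    """Final step: start or stop a handler for each notification."""
    while notifications:
        queue_id, event = notifications.pop(0)
        if event == "storm":
            active_handlers[queue_id] = action
        else:
            active_handlers.pop(queue_id, None)


counters_db, stormed, notifications, active_handlers = {}, set(), [], {}
history = []
for sample in ({"q3": False}, {"q3": True}, {"q3": True}, {"q3": False}):
    syncd_poll(sample, counters_db)
    plugin_check(counters_db, stormed, notifications)
    orchagent_handle(notifications, active_handlers)
    history.append(dict(active_handlers))
```

Because the plugin only notifies on a state change, the third sample (storm still ongoing) produces no new notification and the handler started by the second sample stays in place.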

3.2 Reading SAI Counters

All port counters are stored in Counters DB. To obtain all the required information from hardware, the watchdog extends this by making syncd read values from SAI: it subscribes queues for polling in the WD database. syncd calls the appropriate .lua script upon every counter update to check whether queues changed their state, and notifies orchagent of any change.

3.3 Watchdog Orchagent

Orchagent is subscribed to syncd notifications of queue deadlock. It applies the configured action (drop/forward) upon receiving a notification that a queue became locked, and restores the queue when the opposite notification is received.

3.4 WD Drop Action

3.4.1 Detect Handler

The following mitigation handler disables PFC on the marked queue and sets its reserved buffer to 0.

Another option is for the mitigation handler to disable PFC on the marked queue and bind the port to ACL tables that have ingress/egress ACL rules to drop packets of that particular PG.

3.4.2 Restore Handler

The following restoration handler returns the reserved buffer to its initial value and enables PFC on the unmarked queue.

If the ACL option is chosen, the handler unbinds the port from the ACL tables and enables PFC on the unmarked queue.
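The zero-buffer variant of the drop action amounts to saving the reserved buffer on detection and putting it back on restoration. A minimal sketch, assuming the queue is modelled as a plain dictionary with hypothetical `pfc_enabled` and `reserved_buffer` keys (no SAI calls are made):

```python
class ZeroBufferDropHandler:
    """Hypothetical drop handler working on a dict that models a queue."""

    def __init__(self, queue):
        self.queue = queue
        self.saved_buffer = None

    def mitigate(self):
        """Detect handler: disable PFC and shrink the reserved buffer to 0."""
        self.saved_buffer = self.queue["reserved_buffer"]
        self.queue["pfc_enabled"] = False
        self.queue["reserved_buffer"] = 0

    def restore(self):
        """Restore handler: put the saved buffer back and re-enable PFC."""
        self.queue["reserved_buffer"] = self.saved_buffer
        self.queue["pfc_enabled"] = True
        self.saved_buffer = None


queue = {"pfc_enabled": True, "reserved_buffer": 4096}
handler = ZeroBufferDropHandler(queue)
handler.mitigate()
during_storm = dict(queue)
handler.restore()
```

Saving the original value inside the handler keeps restoration independent of whatever the buffer profile was at configuration time.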

3.5 WD Forward Action

3.5.1 Detect Handler

The following mitigation handler disables PFC on the stormed queue. The queue no longer respects pause frames from the link partner and forwards all packets.

3.5.2 Restore Handler

The following restore handler re-enables PFC on the queue so that it continues to work in lossless mode as configured by the user.
