diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000000..2543955bc6 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +*~ +*.swp +*.un~ diff --git a/MoM.html b/MoM.html new file mode 100644 index 0000000000..7a3dc6a8a1 --- /dev/null +++ b/MoM.html @@ -0,0 +1,323 @@ + + + + + + + + SONiC | Home + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + + +
+
+
+
+
+

SONiC community meeting minutes

+
+
+
+
+
+ + + + +
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DateLinks To Meeting AgendaLinks To Minutes Of The meeting
  DECEMBER 17 2019  PCI-e diag designMoM
  DECEMBER 10 2019  201911 Release Tracking status & OCP planningMoM
  DECEMBER 03 2019  201911 Release Tracking statusMoM
  NOVEMBER 26 2019   201911 Release Tracking statusMoM
  NOVEMBER 19 2019  Thermal control designMoM
  NOVEMBER 12 2019  DPKG Caching FrameworkMoM
  NOVEMBER 05 2019  Ingress Discards Verification & DPKG Caching Framework MoM
OCTOBER 29 2019RADIUS HLD & DPKG local cachingMoM
OCTOBER 22 2019VRRP HLD & Release progressMoM
OCTOBER 15 2019Tech support data export and core file manager HLDMoM
  OCTOBER 08 2019  Review of 201910 release statusMoM
  SEPTEMBER 24 2019  Dynamic Port BreakOut - LKND MoM
  SEPTEMBER 17 2019  Firmware UtilsMoM
  SEPTEMBER 10 2019  Drop Counters HLDMoM
  SEPTEMBER 03 2019  BGP Error handlingMoM
  AUGUST 27 2019  Dynamic Port BreakOut HLDMoM
  AUGUST 20 2019  MC-LAG HLD - NephosMoM
  AUGUST 13 2019  Mgmt FrameworkMoM
  AUGUST 06 2019  Sub port interface high level designMoM
  JULY 30 2019  TAM featuresMoM
  JULY 23 2019  Debug Framework MoM
  JULY 16 2019  Egress Mirror support and ACL MoM
  JULY 09 2019  PDE (Platform Development Environment) /PDDF (Platform Driver Development Framework)MoM
  JULY 02 2019  L3 Performance enhancements - BRCMMoM
  JUNE 25 2019  VRF design discussion - NephosMoM
  JUNE 18 2019  Error Handling - BRCM MoM
  JUNE 11 2019   sFlow HLD MoM
  JUNE 04 2019  STP/PVST and NAT HLDMoM
  MAY 28 2019  MLAG DesignMoM
  MAY 21 2019   L2 Forwarding Enhancements & BFDMoM
  MAY 07 2019  SONiC 201908 release PlanningMoM
  APRIL 23 2019   Platform Support Tests and SSD Diagnostic tool by MLNXMoM
  APRIL 16 2019   ZTP Feature ProposalMoM
  APRIL 09 2019  SONiC test framework enhancementMoM
  MARCH 26 2019   KVM to run SONiC imageMoM
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/Supported-Devices-and-Platforms.html b/Supported-Devices-and-Platforms.html new file mode 100644 index 0000000000..b80c7a2497 --- /dev/null +++ b/Supported-Devices-and-Platforms.html @@ -0,0 +1,985 @@ + + + + +
+
+
+
+
+
+

  Filter your search

+
+
+
+
+
+
+
+ +
+
+ + + + + + + + + + + +
logo       Supported Platforms  
+

       Following is the list of platforms that support SONiC.

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
VendorPlatformASIC VendorSwitch ASICPort ConfigurationImage
AcctonAS4630-54PEBroadcomHelix 548x1G + 4x25G + 2x100GSONiC-ONIE-Broadcom
AcctonAS5712-54XBroadcomTrident 272x10GSONiC-ONIE-Broadcom
AcctonAS5812-54XBroadcomTrident 272x10GSONiC-ONIE-Broadcom
AcctonAS5835-54TBroadcomTrident 348x10G + 6x100GSONiC-ONIE-Broadcom
AcctonAS5835-54XBroadcomTrident 348x10G + 6x100GSONiC-ONIE-Broadcom
AcctonAS6712-32XBroadcomTrident 232x40GSONiC-ONIE-Broadcom
AcctonAS7116-54XNephosTaurus48x25G + 6x100GSONiC-ONIE-Nephos
AcctonAS7312-54XBroadcomTomahawk48x25G + 6x100GSONiC-ONIE-Broadcom
AcctonAS7312-54XSBroadcomTomahawk48x25G + 6x100GSONiC-ONIE-Broadcom
AcctonAS7315-27XBBroadcomQumran20x10G + 4x25G + 3x100GSONiC-ONIE-Broadcom
AcctonAS7326-56XBroadcomTrident 348x25G + 8x100GSONiC-ONIE-Broadcom
AcctonAS7512-32XCaviumXPliantCNX880**32x100GSONiC-ONIE-Cavium
AcctonAS7712-32XBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
AcctonAS7716-32XBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
AcctonAS7716-32XBBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
AcctonAS7726-32XBroadcomTrident 332x100GSONiC-ONIE-Broadcom
AcctonAS7816-64XBroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
AcctonAS9716-32DBroadcomTomahawk 332x400GSONiC-ONIE-Broadcom
AcctonMinipackBroadcomTomahawk 3128x100GSONiC-ONIE-Broadcom
AlphanetworksSNH60A0BroadcomTomahawk32x100GSONiC-ONIE-Broadcom
AlphanetworksSNH60B0BroadcomTomahawk64x100GSONiC-ONIE-Broadcom
Arista7050QX-32BroadcomTrident 232x40GSONiC-Aboot-Broadcom
Arista7050QX-32SBroadcomTrident 232x40GSONiC-Aboot-Broadcom
Arista7060CX-32SBroadcomTomahawk32x100GSONiC-Aboot-Broadcom
Arista7060DX4-32BroadcomTomahawk 332x400G + 2x10GSONiC-Aboot-Broadcom
Arista7060PX4-32BroadcomTomahawk 332x400G + 2x10GSONiC-Aboot-Broadcom
Arista7170-64CBarefootTofino64x100GSONiC-ONIE-Barefoot
Arista7260CX3-64BroadcomTomahawk 264x100GSONiC-Aboot-Broadcom
Arista7280CR3-32D4BroadcomJericho 232x100G + 4x400GSONiC-Aboot-Broadcom
Arista7280CR3-32P4BroadcomJericho 232x100G + 4x400GSONiC-Aboot-Broadcom
BarefootSONiC-P4BarefootP4 EmulatedConfigurableSONiC-P4
BarefootWedge 100BF-32BarefootTofino32x100GSONiC-ONIE-Barefoot
BarefootWedge 100BF-65XBarefootTofino32x100GSONiC-ONIE-Barefoot
CelesticaDX010BroadcomTomahawk32x100GSONiC-ONIE-Broadcom
CelesticaE1031BroadcomHelix448x1G + 4x10GSONiC-ONIE-Broadcom
Celesticamidstone-200iInnoviumTeralynx 7128x100GSONiC-ONIE-Innovium
CelesticaSilverstoneBroadcomTomahawk 332x400GSONiC-ONIE-Broadcom
CentecE582-48X2QCentecGoldengate48x10G + 2x40G + 4x100GSONiC-ONIE-Centec
CentecE582-48X6QCentecGoldengate48x10G + 6x40GSONiC-ONIE-Centec
CigCS6436-56PNephosTaurus48x25G + 8x100GSONiC-ONIE-Nephos
DellS5232F-C32BroadcomTrident 332x100GSONiC-ONIE-Broadcom
DellS6000-ONBroadcomTrident 232x40GSONiC-ONIE-Broadcom
DellS6100-ONBroadcomTomahawk64x40GSONiC-ONIE-Broadcom
DellZ9100-C32BroadcomTomahawk32x100GSONiC-ONIE-Broadcom
DellZ9264BroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
DeltaAG5648BroadcomTomahawk48x25G + 6x100GSONiC-ONIE-Broadcom
DeltaAG6248CBroadcomHelix 448x1G + 2x10GSONiC-ONIE-Broadcom
DeltaAG9032V1BroadcomTomahawk32x100GSONiC-ONIE-Broadcom
DeltaAG9032V2BroadcomTrident 332x100G + 1x10GSONiC-ONIE-Broadcom
DeltaAG9064BroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
Deltaet-c032ifInnoviumTeralynx 732x400GSONiC-ONIE-Innovium
DeltaET-6448MMarvellPrestera 98DX325548xGE + 4x10GSONiC-ONIE-Marvell
EmbedwayES6220 (48x10G)CentecGoldengate48x10G + 2x40G + 4x100GSONiC-ONIE-Centec
EmbedwayES6428A-X48Q2H4CentecGoldengate4x100G + 2x40G + 48x10GSONiC-ONIE-Centec
FacebookWedge 100-32XBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
IngrasysS8810-32QBroadcomTrident 232x40GSONiC-ONIE-Broadcom
IngrasysS8900-54XCBroadcomTomahawk48x25G + 6x100GSONiC-ONIE-Broadcom
IngrasysS8900-64XCBroadcomTomahawk48x25G + 16x100GSONiC-ONIE-Broadcom
IngrasysS9100-32XBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
IngrasysS9130-32XNephosTaurus32x100GSONiC-ONIE-Nephos
IngrasysS9180-32XBarefootTofino32x100GSONiC-ONIE-Barefoot
IngrasysS9200-64XBroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
IngrasysS9230-64XNephosTaurus64x100GSONiC-ONIE-Nephos
IngrasysS9280-64XBarefootTofino64x100GSONiC-ONIE-Barefoot
InventecD6254QSBroadcomTrident 272x10GSONiC-ONIE-Broadcom
InventecD6356BroadcomTrident 348x25G + 8x100GSONiC-ONIE-Broadcom
InventecD6556BroadcomTrident 348x25G + 8x100GSONiC-ONIE-Broadcom
InventecD7032QBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
InventecD7054QBroadcomTomahawk48x25G + 6x100GSONiC-ONIE-Broadcom
InventecD7264QBroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
Juniper NetworksQFX5210-64CBroadcomTomahawk 264x100GSONiC-ONIE-Broadcom
MarvellRD-ARM-48XG6CG-A4MarvellPrestera 98EX54xx6x100G+48x10GSONiC-ONIE-Marvell
MarvellRD-BC3-4825G6CG-A4MarvellPrestera 98CX84xx6x100G+48x25GSONiC-ONIE-Marvell
MellanoxSN2010MellanoxSpectrum4x100G+18x25GSONiC-ONIE-Mellanox
MellanoxSN2100MellanoxSpectrum16x100GSONiC-ONIE-Mellanox
MellanoxSN2410MellanoxSpectrum48x25G+8x100GSONiC-ONIE-Mellanox
MellanoxSN2700MellanoxSpectrum32x100GSONiC-ONIE-Mellanox
MellanoxSN2740MellanoxSpectrum32x100GSONiC-ONIE-Mellanox
MellanoxSN3700MellanoxSpectrum 232x200GSONiC-ONIE-Mellanox
MellanoxSN3700CMellanoxSpectrum 232x100GSONiC-ONIE-Mellanox
MellanoxSN3800MellanoxSpectrum 264x100GSONiC-ONIE-Mellanox
MitacLY1200-B32H0-C3BroadcomTomahawk32x100GSONiC-ONIE-Broadcom
PegatronPorscheNephosTaurus48x25G + 6x100GSONiC-ONIE-Nephos
QuantaT3032-IX7BroadcomTrident 332x100GSONiC-ONIE-Broadcom
QuantaT4048-IX8BroadcomTrident 348x25G + 8x100GSONiC-ONIE-Broadcom
QuantaT4048-IX8CBroadcomTrident 348x25G + 8x100GSONiC-ONIE-Broadcom
QuantaT7032-IX1BBroadcomTomahawk32x100GSONiC-ONIE-Broadcom
QuantaT9032-IX9BroadcomTomahawk 332x400GSONiC-ONIE-Broadcom
WncOSW1800BarefootTofino48x25G + 6x100GSONiC-ONIE-Barefoot
+

 

+

+Note: +

    +
  1. Dell S6100-ON is a modular switch that has different port configurations. Currently only the 64x40G port configuration is supported.
  2. +
  3. Arista devices use the Aboot boot loader instead of ONIE. It is normally pre-installed. To learn more about Aboot, please refer to their documentation here.
  4. +
  5. ONIE images are normally pre-installed. For DELL switches, you can find their ONIE images and instructions on Dell's website: S6000-ON, S6100-ON, Z9100-ON.
  6. +
  7. Please contact Marvell for details of the SKU information.
  8. +
  9. Please contact vendors for support and SKU information.
  10. +
      +
    1. Inventec swsp@inventec.com
    2. +
    3. Edgecore sales@edge-core.com
    4. +
    +
+
+
+

 

+

 

+ + + diff --git a/assets/css/style.css b/assets/css/style.css index f9eeac4390..8948f27abb 100644 --- a/assets/css/style.css +++ b/assets/css/style.css @@ -62,7 +62,7 @@ a { a:hover, a:focus { outline: none; - text-decoration: none; + text-decoration: underline; } h1, h2, h3, h4, h5, h6 { diff --git a/assets/img/all_partners2_1920x1320.jpg b/assets/img/all_partners2_1920x1320.jpg index f78b8ad266..dede91eb2b 100644 Binary files a/assets/img/all_partners2_1920x1320.jpg and b/assets/img/all_partners2_1920x1320.jpg differ diff --git a/contact.html b/contact.html index 5cd989f292..b1024c516f 100644 --- a/contact.html +++ b/contact.html @@ -47,61 +47,6 @@
-
@@ -178,7 +123,10 @@

Contact

- + + diff --git a/doc/DUT_monitor_HLD.md b/doc/DUT_monitor_HLD.md new file mode 100644 index 0000000000..bd17ca8b81 --- /dev/null +++ b/doc/DUT_monitor_HLD.md @@ -0,0 +1,252 @@ +Table of Contents + +- [Scope](#scope) +- [Overview](#overview) +- [Quality Objective](#quality-objective) +- [Module design](#module-design) + - [Overall design](#overall-design) + - [Updated directory structure](#updated-directory-structure) + - [Thresholds overview](#thresholds-overview) + - [Thresholds configuration file](#thresholds-configuration-file) + - [Thresholds template](#thresholds-template) + - [Preliminary defaults](#preliminary-defaults) +- [Pytest plugin overview](#pytest-plugin-overview) + - [Pytest option](#pytest-option) + - [Pytest hooks](#pytest-hooks) + - [Classes](#classes) +- [Interaction with dut](#interaction-with-dut) +- [Tests execution flow](#tests-execution-flow) +- [Extended info to print for error cases](#extended-info-to-print-for-error-cases) +- [Commands to fetch monitoring data](#commands-to-fetch-monitoring-data) +- [Possible future expansion](#possible-future-expansion) + + + +### Scope + +This document describes the high level design of verifying the hardware resources consumed by a device. The hardware resources which are currently verified are CPU, RAM and HDD. + +This implementation will be integrated into test cases written with the Pytest framework. + +### Overview + +During a test run, test cases perform many manipulations on the DUT, including different Linux and SONiC configuration changes and traffic generation. + +To make sure that CPU, RAM and HDD utilization on the DUT is not increasing over a test run, those parameters can be checked after each finished test case. + +The purpose of this feature is to verify that the resources listed above are not increasing during a test run. This is achieved by performing the verification after each test case. + +### Quality Objective ++ Ensure CPU consumption on the DUT does not exceed the threshold ++ Ensure RAM consumption on the DUT does not exceed the threshold ++ Ensure used space in the partition mounted to the HDD "/" root folder does not exceed the threshold + +### Module design +#### Overall design +The following figure depicts how this feature integrates with the existing Pytest framework. + +![](https://github.com/yvolynets-mlnx/SONiC/blob/dut_monitor/images/dut_monitor_hld/Load_flaw.jpg) + +The newly introduced feature consists of: ++ Pytest plugin – pytest_dut_monitor.py. The plugin defines: + + pytest hooks: pytest_addoption, pytest_configure, pytest_unconfigure + + pytest fixtures: dut_ssh, dut_monitor + + DUTMonitorPlugin – class to be registered as a plugin. Defines the pytest fixtures described above + + DUTMonitorClient - class to control DUT monitoring over SSH ++ The Pytest plugin registers new options: "--dut_monitor", "--thresholds_file" ++ Python module - dut_monitor.py, which runs on the DUT, collects CPU, RAM and HDD data and writes it to log files. Three new files will be created: cpu.log, ram.log, hdd.log. + +#### Updated directory structure ++ ./sonic-mgmt/tests/plugins/\_\_init__.yml ++ ./sonic-mgmt/tests/plugins/dut_monitor/thresholds.yml ++ ./sonic-mgmt/tests/plugins/dut_monitor/pytest_dut_monitor.py ++ ./sonic-mgmt/tests/plugins/dut_monitor/dut_monitor.py ++ ./sonic-mgmt/tests/plugins/dut_monitor/errors.py ++ ./sonic-mgmt/tests/plugins/dut_monitor/\_\_init__.py + +#### Thresholds overview +To be able to verify that CPU, RAM or HDD utilization is not critical on the DUT, specific thresholds need to be defined. 
+ +List of thresholds: ++ Total system CPU consumption ++ Separate process CPU consumption ++ Time duration of CPU monitoring ++ Average CPU consumption during test run ++ Peak RAM consumption ++ RAM consumption delta before and after test run ++ Used disk space + +```Total system CPU consumption``` - integer value (percentage). Triggers when total peak CPU consumption is >= to defined value during “Peak CPU monitoring duration” seconds. + +```Separate process CPU consumption``` - integer value (percentage). Triggers when peak CPU consumption of any separate process is >= to defined value during “Peak CPU measurement duration” seconds. + +```Time duration of CPU monitoring``` - integer value (seconds). Time frame. Used together with total or process peak CPU consumption verification. + +```Average CPU consumption during test run``` - integer value (percentage). Triggers when the average CPU consumption of the whole system between start/stop monitoring (between start/end test) is >= to defined value. + +```Peak RAM consumption``` – integer value (percentage). Triggers when RAM consumption of the whole system is >= to defined value. + +```RAM consumption delta before and after test``` – integer value (percentage). Difference between consumed RAM before and after test case. Triggers when the difference is >= to defined value. + +```Used disk space``` - integer value (percentage). Triggers when used disk space is >= to defined value. + +#### Thresholds configuration file +Default thresholds are defined in ./sonic-mgmt/tests/plugins/dut_monitor/thresholds.yml file. + +The proposal is to define thresholds for specific platform and its hwsku. Below is template of "thresholds.yml" file, which has defined: general default thresholds, platform default thresholds, specific HWSKU thresholds. + +If HWSKU is not defined for current DUT - platform thresholds will be used. + +If platform is not defined for current DUT - default thresholds will be used. + + +##### Thresholds template example: +```code +# All fields are mandatory +default: + cpu_total: x + cpu_process: x + cpu_measure_duration: x + cpu_total_average: x + ram_peak: x + ram_delta: x + hdd_used: x + + +# Platform inherits thresholds from 'default' section +# Any threshold field can be redefined per platform specific +# In below example all defaults are redefined +platform X: + hwsku: + [HWSKU]: + cpu_total: x + cpu_process: x + cpu_measure_duration: x + cpu_total_average: x + ram_peak: x + ram_delta: x + hdd_used: x + default: + cpu_total: x + cpu_process: x + cpu_measure_duration: x + cpu_total_average: x + ram_peak: x + ram_delta: x + hdd_used: x +``` +##### Preliminary defaults +Note: need to be tested to define accurately. + + cpu_total: 90 + cpu_process: 60 + cpu_measure_duration: 10 + cpu_total_average: 90 + ram_peak: 80 + ram_delta: 1 + hdd_used: 80 + +##### How to tune thresholds +1. User can pass its own thresholds file for test run using "--thresholds_file" pytest option. For example: +```code +py.test TEST_RUN_OPTIONS --thresholds_file THRESHOLDS_FILE_PATH +``` +2. User can update thresholds directly in test case by using "dut_monitor" fixture. +For example: +```code +dut_monitor["cpu_total"] = 80 +dut_monitor["ram_peak"] = 90 +... +``` +3. Define thresholds for specific test groups. +For specific test groups like scale, performance, etc. thresholds can be common. In such case "thresholds.yml" file can be created and placed next to the test module file. 
Pytest framework will automatically discover the "thresholds.yml" file and apply the defined thresholds for the current tests. + + +### Pytest plugin overview + +#### Pytest option +To enable DUT monitoring for each test case, the following pytest console option should be used - "--dut_monitor" + +#### Pytest hooks +The pytest_dut_monitor.py module defines the following hooks: +##### pytest_addoption(parser) +Registers the "--dut_monitor" option. This option is used to trigger device monitoring. + +Registers the "--thresholds_file" option. This option takes the path to the thresholds file. + +##### pytest_configure(config) +Checks whether the "--dut_monitor" option is used; if so, registers the DUTMonitorPlugin class as a pytest plugin. +##### pytest_unconfigure(config) +Unregisters the DUTMonitorPlugin plugin. + +### Classes +#### DUTMonitorClient class +Defines API for: + ++ Start monitoring on the DUT ++ Stop monitoring on the DUT. Compare measurements with defined thresholds ++ Execute remote commands via SSH ++ Track SSH connection with DUT ++ Automatically restore SSH connection with DUT while in monitoring mode + +#### DUTMonitorPlugin class +Defines the following pytest fixtures: + +##### dut_ssh(autouse=True, scope="session") +Establishes an SSH connection with a device and keeps this connection during the whole test run. + +If the connection to the DUT is broken during the monitoring phase (the test performed a DUT reboot), it will automatically try to restore the connection for some time (for example 5 minutes). + +If the connection is restored, monitoring is automatically restored as well and the dut_monitor fixture will have monitoring results even if a reboot occurred. So, monitoring results will not be lost if the DUT is rebooted. + +If the connection is not restored, an exception will be raised stating that the DUT became inaccessible. + +##### dut_monitor(dut_ssh, autouse=True, scope="function") +- Starts DUT monitoring before the test starts +- Stops DUT monitoring after the test finishes +- Gets the measured values and compares them with the defined thresholds +- A pytest error will be generated if any of the resources exceeds the defined threshold. + + +### Interaction with dut + +![](https://github.com/yvolynets-mlnx/SONiC/blob/dut_monitor/images/dut_monitor_hld/Dut_monitor_ssh.jpg) + +### Tests execution flow + ++ Start the pytest run with the "--dut_monitor" option added ++ Before each test case - initialize DUT monitoring ++ Start reading CPU, RAM and HDD values every 2 seconds ++ Start the test case ++ Wait for the test case to finish ++ Stop reading CPU, RAM and HDD values ++ Display a logging message with the measured parameters ++ After the end of each test case, compare the obtained values with the defined thresholds ++ A pytest error will be generated if any of the resources exceeds the defined threshold. The error message will also show extended output about consumed CPU, RAM and HDD, which is described below. The test case status (pass/fail) will still be shown separately. This makes it possible to have separate results for test cases (pass/fail) and errors if resource consumption exceeds the threshold. 
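For illustration only, the following is a minimal sketch of how the hooks and fixtures described above could be wired together in pytest_dut_monitor.py. It is not the actual sonic-mgmt implementation; the threshold values and the commented-out monitoring calls are placeholders standing in for the real DUTMonitorClient start/stop logic.

```python
# Minimal sketch of the plugin wiring described above -- not the actual
# sonic-mgmt code. Threshold values and monitoring calls are placeholders.
import pytest


def pytest_addoption(parser):
    # Options described in the "Pytest hooks" section.
    parser.addoption("--dut_monitor", action="store_true", default=False,
                     help="Enable CPU/RAM/HDD monitoring of the DUT for each test case")
    parser.addoption("--thresholds_file", action="store", default=None,
                     help="Path to a custom thresholds.yml")


def pytest_configure(config):
    # Register DUTMonitorPlugin only when monitoring was requested.
    if config.getoption("--dut_monitor"):
        config.pluginmanager.register(DUTMonitorPlugin(), name="dut_monitor")


def pytest_unconfigure(config):
    plugin = config.pluginmanager.get_plugin("dut_monitor")
    if plugin is not None:
        config.pluginmanager.unregister(plugin)


class DUTMonitorPlugin(object):
    @pytest.fixture(autouse=True, scope="session")
    def dut_ssh(self):
        # Placeholder for establishing the session-wide SSH connection to the DUT.
        yield "ssh-connection-handle"

    @pytest.fixture(autouse=True, scope="function")
    def dut_monitor(self, dut_ssh):
        # Start monitoring before the test, yield thresholds so a test can tune them,
        # then stop monitoring and compare measurements after the test finishes.
        thresholds = {"cpu_total": 90, "ram_peak": 80, "hdd_used": 80}
        # start of monitoring would launch dut_monitor.py on the DUT over SSH (omitted)
        yield thresholds
        # stop of monitoring would fetch cpu.log/ram.log/hdd.log and
        # raise a pytest error if any value exceeds its threshold (omitted)
```

With such wiring, running py.test with "--dut_monitor" enables monitoring for every test case, while omitting the option leaves the test run unchanged.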
+ + +#### Extended info to print for error cases +Display the output of the following commands: + ++ df -h --total /* ++ ps aux --sort rss ++ docker stats --all --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" + +### Commands to fetch monitoring data + +##### Fetch CPU consumption: +ps -A -o pcpu | tail -n+2 | python -c "import sys; print(sum(float(line) for line in sys.stdin))" +##### Fetch RAM consumption: +show system-memory +OR +ps -A -o rss | tail -n+2 | python -c "import sys; print(sum(float(line) for line in sys.stdin))" + +##### Fetch HDD usage: +df -hm / + + +### Possible future expansion + +Later this functionality can be integrated with a UI that displays the consumed resources and device health during a regression run. Grafana can be used as the UI dashboard. + +It can be useful for DUT health debugging and for load/stress testing analysis. diff --git a/doc/L3_performance_and_scaling_enchancements_HLD.md b/doc/L3_performance_and_scaling_enchancements_HLD.md new file mode 100644 index 0000000000..c01435cb37 --- /dev/null +++ b/doc/L3_performance_and_scaling_enchancements_HLD.md @@ -0,0 +1,228 @@ +# L3 Scaling and Performance Enhancements +Layer 3 Scaling and Performance Enhancements +# High Level Design Document +#### Rev 0.1 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +| :---: | :-----------: | :------------------: | ----------------------------------- | +| 0.1 | 06/04/2019 | Arvind | Initial version | +| | | | | + +# About this Manual +This document provides information about the Layer 3 performance and scaling improvements done in SONiC 201908. +# Scope +This document describes the high level design of the Layer 3 performance and scaling improvements. + +# 1 Requirement Overview +## 1.1 Functional Requirements + + - __Scaling improvements__ + 1. ARP/ND + - Support for 32k IPv4 ARP entries + - Support for 16k IPv6 ND entries + 2. Route scale + - 200k IPv4 routes + - 65k IPv6 routes + + 3. ECMP + - 512 groups X 32 paths + - 256 groups X 64 paths + - 128 groups X 128 paths + + The route scale and ECMP items mentioned above will be tested to verify they are met. + A Broadcom Tomahawk-2 platform will be used for this testing, but the focus is on SONiC behavior. + + - __Performance improvements__ + + 4. Reduce the IPv4 and IPv6 route programming time + 5. Reduce unknown ARP/ND learning time + 6. Reduce the time taken to display output with the following *"show commands"* + - **_show arp_** + - **_show ndp_** + + + +## 1.2 Configuration and Management Requirements +No new configuration or show commands are introduced. + +## 1.3 Scalability Requirements +Covered in Functional Requirements. + + +## 1.4 Warm Boot Requirements + +There are no specific changes done for warm boot in this feature; however, testing will be done to make sure no change affects the warm boot time. + +# 2 Design + +## 2.1 Scaling improvements + +### 2.1.2 Improvement in number of ARP Entries + +SONiC currently supports around 2400 host entries. + +In our testing we found that, when sending a lot of ARP/ND requests in a burst, ARP entries were getting purged from the kernel while the later set of ARP entries was still being added. +The sequence of add/remove is in such a way that we were never able to cross ~2400 entries. 
+ +In the kernel ARP module, the following attributes govern how many ARP entries are kept in the kernel ARP cache: +``` +gc_thresh1 (since Linux 2.2) +The minimum number of entries to keep in the ARP cache. The garbage collector will not run if there are fewer than this number of entries in the cache. Defaults to 128. + +gc_thresh2 (since Linux 2.2) +The soft maximum number of entries to keep in the ARP cache. The garbage collector will allow the number of entries to exceed this for 5 seconds before collection will be performed. Defaults to 512. + +gc_thresh3 (since Linux 2.2) +The hard maximum number of entries to keep in the ARP cache. The garbage collector will always run if there are more than this number of entries in the cache. Defaults to 1024. +``` + +To increase the number of ARP/ND entries, these attributes will be changed to the following values for IPv4 and IPv6: +``` +net.ipv4.neigh.default.gc_thresh1=16000 +net.ipv4.neigh.default.gc_thresh2=32000 +net.ipv4.neigh.default.gc_thresh3=48000 + +net.ipv6.neigh.default.gc_thresh1=8000 +net.ipv6.neigh.default.gc_thresh2=16000 +net.ipv6.neigh.default.gc_thresh3=32000 +``` +The rate of ARP/ND packets coming to the CPU also needs to be increased. Currently the max rate for ARP/ND is 600 packets; we will be increasing it to a higher number (8000) in the CoPP file to improve the learning time. + +## 2.2 Performance Improvements + +This section elaborates the changes done to improve L3 performance in SONiC. + +### 2.2.1 Route installation time + +The tables below capture the baseline route programming time for IPv4 and IPv6 prefixes in SONiC. +To measure route programming time, BGP routes were advertised to a SONiC router and the time taken for these routes to be installed in the ASIC was measured. + +### Table 2: IPv4 prefix route programming time + +| Routes | Time taken on AS7712 (Tomahawk) | +| ----------------------- | ------------------------------ | +| 10k IPv4 prefix routes | 11 seconds | +| 30k IPv4 prefix routes | 30 seconds | +| 60k IPv4 prefix routes | 48 seconds | +| 90k IPv4 prefix routes | 68 seconds | + + + +### Table 3: IPv6 prefix route programming time + +| Routes | Time taken on AS7712 (Tomahawk) | +| ---------------------------------- | ----------------------------- | +| 10k IPv6 routes with prefixes > 64b | 11 seconds | +| 30k IPv6 routes with prefixes > 64b | 30 seconds | + + +#### 2.2.1.1 Proposed optimizations for reducing the route programming time + +- Using sairedis bulk route APIs + + In the SONiC architecture, routeorch in Orchagent processes route table updates from the APP_DB and calls the sairedis APIs to put these routes in the ASIC_DB. + Currently Orchagent processes each route and puts it in the ASIC_DB one at a time. Redis pipelining allows for some level of bulking when putting entries in the ASIC_DB, but still one DB message is generated for every route. + + Further bulking can be done by using the sairedis bulk APIs for route creation and deletion. + + By using the sairedis bulk APIs, orchagent will call these APIs with a list of routes and their attributes. + The meta_sai layer in sairedis iterates over this route list and creates the meta objects for every route, but only one Redis DB message will be generated for the route list. + Therefore using the sairedis bulk APIs reduces the number of Redis messages. + + The ASIC doesn't support bulk route creation/deletion, so syncd still processes one route at a time and updates the ASIC. + + So, the saving achieved by using the bulk APIs is in the number of Redis messages generated. 
+ + Bulking of route updates will be enabled in Orchagent. Orchagent will bulk 64 updates and send them to sairedis. + A new timer will be introduced in orchagent to flush the outstanding updates every second. + +- Optimization in Fpmsyncd + + Fpmsyncd listens to netlink messages for route add/delete and updates the APP_DB. + Current behaviour in Fpmsyncd: + - When fpmsyncd gets a route, it first tries to get the master device name from the rt_table attribute of the route object. + This is done to check if the route belongs to VNET_ROUTE_TABLE. + - To get the master device name, fpmsyncd does a lookup in its local link cache. + - If the lookup in the local link cache fails, fpmsyncd updates the cache by getting the configured links from the kernel. + + The problem here is that if there is no VNET present on the system, this lookup will always fail and the cache is updated for every route. + This slows down the rate at which routes are programmed into the global route table. + + To fix this, we will skip the lookup for the master device name if the route object table value is zero, i.e. the route needs to be put in the global routing table. + + In our testing, we found that with this change the time taken to add 10k routes to the APP_DB was reduced from 7-8 seconds to 4-5 seconds. + +- Optimization in sairedis + + In sairedis every SAI object is serialized in JSON format while creating the meta SAI objects and updating the ASIC_DB. For serialization it uses the json dump functionality to convert the objects into JSON format. + This json dump function is provided by an open source JSON library. [[link to the open-source project](https://github.com/nlohmann/json)] + + Currently SONiC uses version 2.0 of this library; the latest version, however, is 3.6.1. + There have been a few bug fixes/improvements done to *dump()* from v2.0 to v3.6. + + We will be upgrading this library to the latest version to pick up all the fixes. + + +With the above mentioned optimizations we target a 30% reduction in the route programming time in SONiC. + +### 3.3 show CLI command enhancements + +#### 3.3.1 show arp/ndp command improvement + +The current implementation of the CLI script for "show arp" or "show ndp" fetches the whole FDB table to get the outgoing interface in case the L3 interface is a VLAN L3 interface. + +This slows down the show command. We will make changes to the CLI script to get FDB entries only for the specific ARP/ND entry instead of getting the whole FDB table. + +These changes will improve the performance of the command significantly. + +# 4 Warm Boot Support +No specific changes are planned for warm boot support as these are existing features. + +However, testing will be done to make sure the changes done for scaling or performance improvements won't affect the warm boot functionality. + + + +# 5 Unit Test +## 5.1 IPv4 Testcases +| Testcase number | Testcase | Result | Time taken | +| --------------- | ----------------------------------------------------------------------------- | ------ | ---------- | +| 1. | Verify 10k IPv4 routes are installed and measure the route programming time | | | +| 2. | Verify 60k IPv4 routes are installed and measure the route programming time | | | +| 3. | Verify 90k IPv4 routes are installed and measure the route programming time | | | +| 4. | Verify 128k IPv4 routes are installed and measure the route programming time | | | +| 5. | Verify 160k IPv4 routes are installed and measure the route programming time | | | +| 6. 
| Verify 200k IPv4 routes are installed and measure the route programming time | | | +| 7. | Verify 8k IPv4 ARP entries are learnt and measure the learning time | | | +| 8. | Verify 16k IPv4 ARP entries are learnt and measure the learning time | | | +| 9. | Verify 32k IPv4 ARP entries are learnt and measure the learning time | | | +## 5.2 IPv6 Testcases +| Testcase number | Testcase | Result | Time taken | +| --------------- | ---------------------------------------------------------------------------------------------- | ------ | ---------- | +| 1. | Verify 10k IPv6 routes with prefix > 64b are installed and measure the route programming time | | | +| 2. | Verify 25k IPv6 routes with prefix > 64b are installed and measure the route programming time | | | +| 3. | Verify 40k IPv6 routes with prefix > 64b are installed and measure the route programming time | | | +| 4. | Verify 8k IPv6 ND entries are learnt and measure the learning time | | | +| 5. | Verify 16k IPv6 ND entries are learnt and measure the learning time | | | +| | | | | +## 5.3 Regression Testcases +| Testcase number | Testcase | Result | Time taken | +| --------------- | ----------------------------------------------------------- | ------ | ---------- | +| 1. | Measure the convergence time with link flaps | | | +| 2. | Measure the convergence time with link flaps on ECMP paths | | | +| 3. | Clear BGP neighbors and check all routes and forwarding | | | +| 4. | Clear the neighbor table and check all routes and forwarding | | | +| 5. | Clear the MAC table and check all routes and forwarding | | | +| 6. | Test across warm reboot, Orchagent/Syncd restart and upgrade | | | + + diff --git a/doc/Optional-Feature-Control.md b/doc/Optional-Feature-Control.md new file mode 100644 index 0000000000..116cab0431 --- /dev/null +++ b/doc/Optional-Feature-Control.md @@ -0,0 +1,19 @@ +# SONiC Optional Feature Control Enhancement # + +## Revision ## + +| Rev | Date | Author | Change Description | +|:---:|:--------:|:-----------:|--------------------| +| 0.1 | 10/10/19 | Pradnya Mohite | Initial version | + +## Scope ## +Add support to enable/disable features in SONiC. Features like the telemetry agent can be optional, and this enhancement will provide a way to control that. + +### Implementation Details ### +* Add a FEATURE table in Config DB. + * Modify the sonic-cfggen tool to add the table and enable the telemetry feature by default. + * For each feature, the key is FEATURE|feature name and the status is enabled/disabled. +* Add a "config feature enable|disable [feature name]" command line. + * Add support for show and config commands. +* Add a feature in hostcfgd to listen for Config DB FEATURE table entry changes, and enable & start or stop & disable the respective service as appropriate. + * When hostcfgd first starts, it reads all entries in the FEATURE table and compares them with the current status of each service. If there is a mismatch, hostcfgd will enable & start or stop & disable as appropriate. 
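For illustration only, a Config DB entry matching the key/status scheme described above might look like the following config_db.json fragment; the exact field names and defaults are set by the implementation, not by this sketch.

```json
{
    "FEATURE": {
        "telemetry": {
            "status": "enabled"
        }
    }
}
```

With such an entry present, running "config feature disable telemetry" would be expected to flip the status to disabled, and hostcfgd would then stop & disable the telemetry service accordingly.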
\ No newline at end of file diff --git a/doc/SONIC_Test_Ingress_Discards_HLD.md b/doc/SONIC_Test_Ingress_Discards_HLD.md new file mode 100644 index 0000000000..d388c8e63a --- /dev/null +++ b/doc/SONIC_Test_Ingress_Discards_HLD.md @@ -0,0 +1,995 @@ +- [Overview](#overview) + - [Scope](#scope) + - [Supported topologies](#supported-topologies) + - [Discard groups covered by test cases](#discard-groups-covered-by-test-cases) + - [Related DUT CLI commands](#related-dut-cli-commands) + - [SAI attributes](#sai-attributes) +- [General test flow](#general-test-flow) +- [Run test](#run-test) +- [Test cases](#test-cases) + - [Test case #1](#test-case-1) + - [Test case #2](#test-case-2) + - [Test case #3](#test-case-3) + - [Test case #4](#test-case-4) + - [Test case #5](#test-case-5) + - [Test case #6](#test-case-6) + - [Test case #7](#test-case-7) + - [Test case #8](#test-case-8) + - [Test case #9](#test-case-9) + - [Test case #10](#test-case-10) + - [Test case #11](#test-case-11) + - [Test case #12](#test-case-12) + - [Test case #13](#test-case-13) + - [Test case #14](#test-case-14) + - [Test case #15](#test-case-15) + - [Test case #16](#test-case-16) + - [Test case #17](#test-case-17) + - [Test case #18](#test-case-18) + - [Test case #19](#test-case-19) + - [Test case #20](#test-case-20) + - [Test case #21](#test-case-21) + +#### Overview +The purpose is to test that drop counters trigger when the DUT receives specific packets. + +The test assumes control plane traffic is disabled before the test run by disabling the VMs. + +The destination IP address of the injected packet must be routable, to ensure that the packet would normally be routed via a specific interface and is dropped instead. + +##### For Ethernet drop reasons: +```portstat -j``` - check ```RX_DRP``` + +##### For IP drop reasons: +```intfstat -j``` - check ```RX_ERR``` + +##### For ACL drop reasons: +```aclshow -a``` - check ```PACKETS COUNT``` + +#### Scope +The purpose of the test cases is to verify that: +- appropriate packet drop counters on the SONiC system increment by the expected value +- specific traffic is dropped correctly, according to the sent packet and the configured packet discards + +#### Supported topologies: +``` +t0 +t1 +t1-lag +ptf32 +``` + +#### Discard groups covered by test cases +Please refer to the test cases for a detailed description. 
+ +| Test case ID| Drop reason | Group type| +|-------------|-------------|-----------| +| 1 | SMAC and DMAC are equal |Ethernet | +| 2 | Not allowed VLAN TAG| Ethernet| +| 3 | Multicast SMAC | Ethernet| +| 4 | Reserved DMAC | Ethernet| +| 5 | Loop-back filter | IP| +| 6 | Packet exceeds router interface MTU| IP| +| 7 | Time To Live (TTL) Expired | IP| +| 8 | Discard at router interface for non-routable packets | IP| +| 9 | Absent IP header | IP| +| 10 | Broken IP header - header checksum or IPver or IPv4 IHL too short | IP| +| 11 | Unicast IP with multicast DMAC or broadcast DST MAC | IP| +| 12 | DST IP is loopback address | IP| +| 13 | SRC IP is loopback address | IP| +| 14 | SRC IP is multicast address | IP| +| 15 | SRC IP address is in class E | IP| +| 16 | SRC IP address is not specified | IP| +| 17 | DST IP address is not specified | IP| +| 18 | SRC IP address is link-local | IP| +| 19 | DST IP address is link-local | IP| +| 20 | ACL SRC IP DROP| IP| +| 21 | ERIF interface disabled | IP| + +#### Related DUT CLI commands +| **Command** | **Comment** | +|------------------------------------------------------------------|-------------| +| counterpoll port enable | Enable port counters | +| counterpoll rif enable | Enable RIF counters | +| portstat -j | Check ```RX_DRP``` | +| intfstat -j | Check ```RX_ERR``` | +| aclshow -a | Check ```PACKETS COUNT``` | +| sonic-clear counters | Clear counters | +| sonic-clear rifcounters | Clear RIF counters | + +Different vendors can calculate drop counters differently; for example, L2 and L3 drop counters can be combined, and the L2 drop counter will then be increased for all ingress discards. +So, for valid drop counter verification there is a need to distinguish whether drop counters are combined or not for the current vendor. +This can be done by checking the platform name of the DUT. + +##### Work to be done based on this +Create a yml file which will contain a list of regular expressions that match the platform names of vendors with combined drop counter calculation. Internally the test framework will use these regular expressions to match the current DUT platform name and determine whether drop counters are combined or not. + +##### Example of file content: +tests/drop_counters/combined_drop_counters.yml +``` +- "[REGEXP FOR VENDOR X]" +- "[REGEXP FOR VENDOR Y]" +``` + +#### SAI attributes +```SAI_PORT_STAT_IF_IN_DISCARDS``` - number of L2 discards + +```SAI_ROUTER_INTERFACE_STAT_IN_ERROR_PACKETS``` - number of L3 discards + +#### General test flow +##### Each test case will use the following port types: +- VLAN (T0) +- LAG (T0, T1-LAG) +- Router (T1, T1-LAG, PTF32) + +##### Drop counters which are going to be checked +This depends on the test objective. + +One of the following drop counters will be verified: +- Drop counter for L2 discards +- Drop counter for L3 discards +- Drop counter for ACL discards + +##### Sent packet number: +N=5 + +##### step #1 - Disable VMs +Before the test suite run - disable control plane traffic generation. Use the "testbed-cli.sh" script with the "disconnect-vms" option. + +##### step #2 - Execute test scenario for all available port types depending on the run topology (VLAN, LAG, Router) + +- Select PTF ports to send and sniff traffic +- Clear counters on DUT. 
Use CLI command "sonic-clear counters" +- Inject N packets via the PTF port selected for TX +- Check that the specific drop counter incremented by N (depends on test case) + - If the counter was not incremented by N, the test fails with the expected message +- Check that other counters were not incremented by N, based on the sent packet type, sent port and expected drop reason (depends on test case) + - If such a counter was incremented by N, the test fails with the expected message +- Check that the packet was dropped by sniffing for the packet's absence on the PTF port selected for RX + +##### step #3 - Enable VMs +Enable the previously disabled VMs using the "testbed-cli.sh" script with the "connect-vms" option. + +#### Run test +``` +py.test --inventory ../ansible/inventory --host-pattern [DEVICE] --module-path ../ansible/library/ --testbed [DEVICE-TOPO] --testbed_file ../ansible/testbed.csv --junit-xml=./target/test-reports/ --show-capture=no --log-cli-level debug -ra -vvvvv ingress_discard/test_ingress_discard.py +``` + +#### Test cases +Each test case will be additionally validated by the loganalyzer utility. + +Each test case will run specific traffic to trigger a specific discard reason. + +The pytest fixture "ptfadapter" is used to construct and send packets. + +After the packet is sent using the source port index, the test framework waits 5 seconds to verify that the packet does not appear, due to the ingress packet drop, on any of the destination port indices. + +#### Test case #1 +##### Test objective + +Verify packet drops when SMAC and DMAC are equal + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [DUT_MAC_ADDR] + src = [DUT_MAC_ADDR] + type = 0x800 +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet specifying identical src and dst MAC. +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L2 drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented + +#### Test case #2 +##### Test objective + +Verify VLAN tagged packet drops when packet VLAN ID does not match ingress port VLAN ID + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = [auto] + type = 0x8100 + vid = 2 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet specifying a VID different than the port VLAN +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L2 drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented + +#### Test case #3 +##### Test objective + +Verify packet with multicast SMAC drops + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = 01:00:5e:00:01:02 + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet specifying multicast SMAC. 
+- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L2 drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented + +#### Test case #4 +##### Test objective + +Verify packet with reserved DMAC drops + +Packet1 to trigger drop (use reserved for future standardization MAC address) +``` +###[ Ethernet ]### + dst = 01:80:C2:00:00:05 + src = [auto] + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` +Packet2 to trigger drop (use provider Bridge group address) +``` +###[ Ethernet ]### + dst = 01:80:C2:00:00:08 + src = [auto] + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet1 specifying reserved DMAC +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L2 drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented +--- +- PTF host will send IP packet2 specifying reserved DMAC +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L2 drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented + +#### Test case #5 +##### Test objective + +Verify packet drops by loop-back filter. Loop-back filter means that route to the host with DST IP of received packet exists on received interface + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = [auto] + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [auto] + dst = [known_bgp_neighboar_ip] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet specifying DST IP of VM host. Port to send is port which IP interface is in VM subnet. +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #6 +##### Test objective + +Verify packet which exceed router interface MTU (for IP packets) drops + +Note: make sure that configured MTU on testbed server and fanout are greater then DUT port MTU + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = [auto] + type = 0x800 +... +###[ TCP ]### + sport = [auto] + dport = [auto] + data = [max_mtu + 1] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet which exceed router interface MTU +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #7 +##### Test objective + +Verify packet with TTL expired (ttl <= 0) drops + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = [auto] + type = 0x800 +... +###[ IP ]### + version = 4 + ttl = 0 + proto = tcp + src = [auto] + dst = [auto] +... 
+``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IP packet with TTL = 0 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #8 +##### Test objective + +Verify non-routable packets discarded at router interface +Packet list: +- IGMP v1 v2 v3 membership query +- IGMP v1 membership report +- IGMP v2 membership report +- IGMP v2 leave group +- IGMP v3 membership report + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send IGMP v1 v2 v3 membership query +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send IGMP v1 membership report +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send IGMP v2 membership report +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send IGMP v2 leave group +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send IGMP v3 membership report +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #9 +##### Test objective + +Verify packet with no ip header available - drops + +Packet to trigger drop +``` +###[ Ethernet ]### + dst = [auto] + src = [auto] + type = 0x800 +###[ TCP ]### + sport = [auto] + dport = [auto] +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send packet without IP header +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #10 +##### Test objective + +Verify DUT drop packet with broken ip header due to header checksum or IPver or IPv4 IHL too short + +Packet1 to trigger drop (Incorrect checksum) +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [auto] + dst = [auto] + checksum = [generated_value] +... +``` + +Packet2 to trigger drop (Incorrect IP version) +``` +... +###[ IP ]### + version = 1 + ttl = [auto] + proto = tcp + src = [auto] + dst = [auto] +... +``` + +Packet3 to trigger drop (Incorrect IPv4 IHL) +``` +... +###[ IP ]### + version = 4 + ihl = 1 + ttl = [auto] + proto = tcp + src = [auto] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. 
+ +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet2 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet3 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #11 +##### Test objective + +Verify DUT drops unicast IP packet sent via: +- multicast DST MAC +- broadcast DST MAC + +Packet1 to trigger drop +``` +###[ Ethernet ]### + dst = 01:00:5e:00:01:02 + src = [auto] + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` + +Packet2 to trigger drop +``` +###[ Ethernet ]### + dst = ff:ff:ff:ff:ff:ff + src = [auto] + type = 0x800 +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = 10.0.0.2 + dst = [get_from_route_info] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #12 +##### Test objective + +Verify DUT drops packet where DST IP is loopback address + +For ipv4: dip==127.0.0.0/8 + +For ipv6: dip==::1/128 OR dip==0:0:0:0:0:ffff:7f00:0/104 + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [auto]] + dst = [127.0.0.1] +... +``` + +Packet2 to trigger drop +``` +... +###[ IP ]### + version = 6 + ttl = [auto] + proto = tcp + src = [auto]] + dst = [::1/128] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet2 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #13 +##### Test objective + +Verify DUT drops packet where SRC IP is loopback address + +For ipv4: dip==127.0.0.0/8 + +For ipv6: dip==::1/128 OR dip==0:0:0:0:0:ffff:7f00:0/104 + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [127.0.0.1] + dst = [auto] +... +``` + +Packet2 to trigger drop +``` +... +###[ IP ]### + version = 6 + ttl = [auto] + proto = tcp + src = [::1/128] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. 
+ +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet2 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #14 +##### Test objective + +Verify DUT drops packet where SRC IP is multicast address + +For ipv4: sip = 224.0.0.0/4 + +For ipv6: sip == FF00::/8 + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [224.0.0.5] + dst = [auto] +... +``` + +Packet2 to trigger drop +``` +... +###[ IP ]### + version = 6 + ttl = [auto] + proto = tcp + src = [ff02::5] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet2 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #15 +##### Test objective + +Verify DUT drops packet where SRC IP address is in class E + +SIP == 240.0.0.0/4 + +SIP != 255.255.255.255 + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [240.0.0.1] + dst = [auto] +... +``` + +Packet2 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [255.255.255.254] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +--- +- PTF host will send packet2 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented + +#### Test case #16 +##### Test objective + +Verify DUT drops packet where SRC IP address is not specified + +IPv4 sip == 0.0.0.0/32 + +Note: for IPv6 (sip == ::0) + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [0.0.0.0] + dst = [auto] +... +``` + +Packet2 to trigger drop +``` +... +###[ IP ]### + version = 6 + ttl = [auto] + proto = tcp + src = [::0] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. 
+
+##### Test steps
+- PTF host will send packet1
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+---
+- PTF host will send packet2
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+
+#### Test case #17
+##### Test objective
+
+Verify DUT drops packets where the DST IP address is not specified
+
+IPv4 dip == 0.0.0.0/32
+
+Note: for IPv6 (dip == ::0)
+
+Packet1 to trigger drop
+```
+...
+###[ IP ]###
+  version = 4
+  ttl = [auto]
+  proto = tcp
+  src = [auto]
+  dst = [0.0.0.0]
+...
+```
+
+Packet2 to trigger drop
+```
+...
+###[ IP ]###
+  version = 6
+  ttl = [auto]
+  proto = tcp
+  src = [auto]
+  dst = [::0]
+...
+```
+
+##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces.
+
+##### Test steps
+- PTF host will send packet1
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+---
+- PTF host will send packet2
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+
+#### Test case #18
+##### Test objective
+
+Verify DUT drops packets where the SRC IP is a link-local address
+
+For ipv4: sip==169.254.0.0/16
+
+Packet1 to trigger drop
+```
+...
+###[ IP ]###
+  version = 4
+  ttl = [auto]
+  proto = tcp
+  src = [169.254.10.125]
+  dst = [auto]
+...
+```
+
+##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces.
+
+##### Test steps
+- PTF host will send packet1
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+
+#### Test case #19
+##### Test objective
+
+Verify DUT drops packets where the DST IP is a link-local address
+
+For ipv4: dip==169.254.0.0/16
+
+Packet1 to trigger drop
+```
+...
+###[ IP ]###
+  version = 4
+  ttl = [auto]
+  proto = tcp
+  src = [auto]
+  dst = [169.254.10.125]
+...
+```
+
+##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces.
+
+##### Test steps
+- PTF host will send packet1
+- When packet reaches SONIC DUT, it should be dropped according to the test objective
+- Get L3 drop counter
+- Verify drop counter incremented
+- Get L2 drop counter
+- Verify L2 drop counter is not incremented
+
+#### Test case #20
+##### Test objective
+
+Verify DUT drops packets when an ACL DROP rule is configured for SRC IP 20.0.0.0/24
+
+Packet1 to trigger drop
+```
+...
+###[ IP ]###
+  version = 4
+  ttl = [auto]
+  proto = tcp
+  src = [20.0.0.10]
+  dst = [auto]
+...
+```
+
+##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces.
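+
+The steps below check the ACL drop counter rather than the L2/L3 counters. As an illustration, here is a minimal, hypothetical sketch (assuming it is executed on the DUT) of reading a rule's packet counter with the `aclshow` utility; the column layout is an assumption and the real test may parse the output differently.
+
+```python
+# Minimal sketch: read the packet counter of one ACL rule via "aclshow -a".
+import subprocess
+
+def get_acl_rule_packet_count(rule_name, table_name):
+    out = subprocess.check_output(["aclshow", "-a"]).decode()
+    for line in out.splitlines():
+        cols = line.split()
+        # assumed layout: RULE NAME, TABLE NAME, PRIO, PACKETS COUNT, BYTES COUNT
+        if len(cols) >= 5 and cols[0] == rule_name and cols[1] == table_name:
+            return int(cols[3]) if cols[3].isdigit() else 0
+    return 0
+```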
+ +##### Test steps +- PTF host will send packet1 +- When packet reaches SONIC DUT, it should be dropped according to the test objective +- Get ACL drop counter +- Verify drop counter incremented +- Get L3 drop counter +- Verify L3 drop counter is not incremented + +#### Test case #21 +##### Test objective + +Verify egress RIF drop counter incremented while sending packets that are destined for a neighboring device but the egress link is down + +Packet1 to trigger drop +``` +... +###[ IP ]### + version = 4 + ttl = [auto] + proto = tcp + src = [auto] + dst = [auto] +... +``` + +##### Get interfaces which are members of LAG, RIF and VLAN. Repeat defined test steps for each of those interfaces. + +##### Test steps +- Disable egress interface on DUT which is linked with neighbouring device +- PTF host will send packet1 +- Verify that no packets appeared/captured on disabled link +- Get L3 drop counter +- Verify drop counter incremented +- Get L2 drop counter +- Verify L2 drop counter is not incremented +- Enable back egress interface on DUT which is linked with neighbouring device diff --git a/doc/SONiC-User-Manual.md b/doc/SONiC-User-Manual.md index 648621bf11..68b2ac6f71 100644 --- a/doc/SONiC-User-Manual.md +++ b/doc/SONiC-User-Manual.md @@ -49,7 +49,7 @@ Connect the console port of the device and use the 9600 baud rate to access the Users shall use the default username/password "admin/YourPaSsWoRd" to login to the device through the console port. After logging into the device, SONiC software can be configured in following three methods. - 1) [Command Line Interface](https://github.com/Azure/SONiC/wiki/Command-Reference) + 1) [Command Line Interface](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md) 2) [config_db.json](https://github.com/Azure/SONiC/wiki/Configuration) 3) [minigraph.xml](https://github.com/Azure/SONiC/wiki/Configuration-with-Minigraph-(~Sep-2017)) @@ -285,7 +285,7 @@ Before Sep 2017, we were using an XML file named minigraph.xml to configure SONi SONiC includes commands that allow user to show platform, transceivers, L2, IP, BGP status, etc. -- [Command Reference](Command-Reference.md) +- [Command Reference](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md) Note that all the configuration commands need root privileges to execute them and the commands are case-sensitive. Show commands can be executed by all users without the root privileges. 
@@ -579,11 +579,11 @@ Basic cable connectivity shall be verified by configuring the IP address for the | # | Module | CLI Link | ConfigDB Link | Remarks | | --- | --- | --- | --- | --- | -| 1 | Interface |[Interface CLI](Command-Reference.md#Interface-Configuration-And-Show-Commands) | [Interface ConfigDB](Configuration.md)| To view the details about the interface | -| 2 | BGP |[BGP CLI](Command-Reference.md#BGP-Configuration-And-Show-Commands) | [BGP ConfigDB](Configuration.md)| To view the details about the BGP | -| 3 | ACL |[ACL CLI](Command-Reference.md#ACL-Configuration-And-Show) | [ACL ConfigDB](Configuration.md)| To view the details about the ACL | +| 1 | Interface |[Interface CLI](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#Interface-Configuration-And-Show-Commands) | [Interface ConfigDB](Configuration.md)| To view the details about the interface | +| 2 | BGP |[BGP CLI](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#BGP-Configuration-And-Show-Commands) | [BGP ConfigDB](Configuration.md)| To view the details about the BGP | +| 3 | ACL |[ACL CLI](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#ACL-Configuration-And-Show) | [ACL ConfigDB](Configuration.md)| To view the details about the ACL | | 4 | COPP |COPP CLI Not Available | [COPP ConfigDB](Configuration.md)| To view the details about the COPP | -| 5 | Mirroring |[Mirroring CLI](Command-Reference.md#Mirroring-Configuration-And-Show) | [Mirroring ConfigDB](Configuration.md)| To view the details about the Mirroring | +| 5 | Mirroring |[Mirroring CLI](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#mirroring-configuration-and-show) | [Mirroring ConfigDB](Configuration.md)| To view the details about the Mirroring | diff --git a/doc/Sonic BSL Test plan.md b/doc/Sonic BSL Test plan.md new file mode 100644 index 0000000000..f4131648b1 --- /dev/null +++ b/doc/Sonic BSL Test plan.md @@ -0,0 +1,80 @@ +#SONiC BSL Test Plan + + +## Overview +This document outlines the Sonic BSL test plan. In BSL mode, Sonic device is brought up as an L2 switch. This test plan validates the functionality by running the tests as described in below sections. More details on BSL can be found in this [document](https://github.com/Azure/SONiC/wiki/L2-Switch-mode#3-generate-a-configuration-for-l2-switch-mode). This must be followed to configure a Sonic device in L2 switch and verify the associated commands before running the below test cases. + +### Scope +--------- +This is limited to Sonic device in BSL mode with the minimal functional verification. + + +## Test structure + +### Setup configuration +------------------- +L2 configuration on a T0 topology + +### Configuration scripts +------------------------- + +Configuration is created from https://github.com/Azure/SONiC/wiki/L2-Switch-mode#3-generate-a-configuration-for-l2-switch-mode. After applying configuration, this also has basic verifications of interfaces and oper status. + +The following is an example script to create config file for Mellanox platform +``` +sonic-cfggen -t /usr/local/lib/python2.7/dist-packages/usr/share/sonic/templates/l2switch.j2 -p -k Mellanox-SN2700-D48C8 +``` +The config contains all the ports set to admin-up and configure as untagged member ports of Vlan 1000. 
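+
+As an illustration of the "basic verifications" mentioned above, the following is a minimal, hypothetical sketch (assuming it is executed on the DUT) that checks every port is an untagged member of Vlan1000 by dumping config_db with `sonic-cfggen`; the exact checks performed by the test may differ.
+
+```python
+# Minimal sketch: verify the generated L2 switch configuration was applied.
+import json
+import subprocess
+
+def verify_l2_vlan_membership(vlan="Vlan1000"):
+    dump = lambda var: json.loads(subprocess.check_output(
+        ["sonic-cfggen", "-d", "--var-json", var]).decode())
+    ports = dump("PORT")
+    members = dump("VLAN_MEMBER")
+    for port in ports:
+        key = "{}|{}".format(vlan, port)
+        assert key in members, "{} is not a member of {}".format(port, vlan)
+        assert members[key].get("tagging_mode") == "untagged", key
+```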
+ +Test cases +---------- + +### Test case \#1 + +#### Test objective +Verify basic sanity + +#### Test description +Run sanity test - https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/base_sanity.yml + +### Test case \#2 + +#### Test objective +Verify FDB learning happens on all ports. + +#### Test description +Run fdb test - https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/fdb.yml + +### Test case \#3 + +#### Test objective +Verify Vlan configurations, ARP and PING. The current vlan test (vlantb) has two parts - vlan_configure and vlan_test. vlan_configure cannot be run on BSL without modification as it takes into account port-channels. This must be modified to run on BSL configuration. Test must also configure an IP address on Vlan interface. + +This test covers Vlan, ARP and PING tests. + +#### Test description +Run fdb test - https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/vlantb.yml + + +### Test case \#4 + +#### Test objective +Verify SNMP. This [document](https://github.com/Azure/SONiC/wiki/How-to-Check-SNMP-Configuration) can be referred for basic SNMP verification and configuring public community string + +The current SNMP test must be modified for BSL. The BSL test can cover to get + 1. MAC table + 2. Interface table + 3. CPU + 4. PSU + +#### Test description +Run SNMP test - https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/tasks/snmp.yml + +| **\#** | **Test Description** | **Expected Result** | +|--------|----------------------|---------------------| +| 1. | Sanity | Pass | +| 2. | FDB | MAC Learn | +| 3. | VLAN | PING,ARP succeeds | +| 4. | SNMP | Walk succeeds | +| 5. | | | + diff --git a/doc/acl/Everflow-test-plan.md b/doc/acl/Everflow-test-plan.md new file mode 100644 index 0000000000..b465acea69 --- /dev/null +++ b/doc/acl/Everflow-test-plan.md @@ -0,0 +1,1213 @@ + +- [Overview](#overview) + - [Scope](#scope) + - [Summary of the existing everflow test plan](#summary-of-the-existing-everflow-test-plan) + - [Extend the test plan to cover both ingress and egress mirroring](#extend-the-test-plan-to-cover-both-ingress-and-egress-mirroring) + - [What new enhancements need to be covered?](#what-new-enhancements-need-to-be-covered) + - [Egress ACL table](#egress-acl-table) + - [Egress mirroring](#egress-mirroring) + - [Some existing areas not covered by the existing scripts](#some-existing-areas-not-covered-by-the-existing-scripts) + - [ACL rule for matching "IN_PORTS"](#acl-rule-for-matching-in_ports) + - [ACL rule for matching ICMP type and code](#acl-rule-for-matching-icmp-type-and-code) + - [IPv6 everflow](#ipv6-everflow) + - [How to extend the testing](#how-to-extend-the-testing) + - [Combine the existing test cases](#combine-the-existing-test-cases) + - [Test configurations](#test-configurations) + - [ACL table configurations](#acl-table-configurations) + - [Related **DUT** CLI commands](#related-dut-cli-commands) + - [`sonic-cfggen` Advanced config_db updating tool](#sonic-cfggen-advanced-config_db-updating-tool) + - [`config acl add table ` Add ACL table](#config-acl-add-table-table_name-table_type-add-acl-table) + - [`config acl remove table ` Remove ACL table](#config-acl-remove-table-table_name-remove-acl-table) + - [`config acl update` Update ACL rules](#config-acl-update-update-acl-rules) + - [`acl-loader` Update ACL rules](#acl-loader-update-acl-rules) + - [`aclshow` Show ACL rule counters](#aclshow-show-acl-rule-counters) + - [`config mirror_session` Configure everflow mirror 
session](#config-mirror_session-configure-everflow-mirror-session) +- [Test structure](#test-structure) + - [Overall structure](#overall-structure) + - [Prepare some variables for testing](#prepare-some-variables-for-testing) + - [Add everflow configuration](#add-everflow-configuration) + - [ACL tables](#acl-tables) + - [Mirror sessions](#mirror-sessions) + - [ACL rules](#acl-rules) + - [Run test](#run-test) + - [PTF Test](#ptf-test) +- [Test cases](#test-cases) + - [Test case \#1 - Packets mirrored to best match resolved route](#test-case-1---packets-mirrored-to-best-match-resolved-route) + - [Test objective](#test-objective) + - [Test steps](#test-steps) + - [Test case \#2 - Change neighbor MAC address.](#test-case-2---change-neighbor-mac-address) + - [Test objective](#test-objective-1) + - [Test steps](#test-steps-1) + - [Test case \#3 - ECMP route change (remove next hop not used by session).](#test-case-3---ecmp-route-change-remove-next-hop-not-used-by-session) + - [Test objective](#test-objective-2) + - [Test steps](#test-steps-2) + - [Test case \#4 - ECMP route change (remove next hop used by session).](#test-case-4---ecmp-route-change-remove-next-hop-used-by-session) + - [Test objective](#test-objective-3) + - [Test steps](#test-steps-3) + - [Test case \#5 - Policer enforced DSCP value/mask test.](#test-case-5---policer-enforced-dscp-valuemask-test) + - [Test objective](#test-objective-4) + - [Test steps](#test-steps-4) +- [TODO](#todo) +- [Open Questions](#open-questions) + +## Overview + +This document is an updated version of the existing everflow test plan: https://github.com/Azure/SONiC/wiki/Everflow-test-plan + +The purpose is to test functionality of Everflow on the SONIC switch DUT with and without LAGs configured, closely resembling production environment. +The test assumes all necessary configuration, including Everflow session and ACL rules, LAG configuration and BGP routes, are already pre-configured on the SONIC switch before test runs. + +### Scope +The test is targeting a running SONIC system with fully functioning configuration. +The purpose of the test is not to test specific SAI API, but functional testing of Everflow on SONiC system, making sure that traffic flows correctly, according to BGP routes advertised by BGP peers of SONIC switch, and the LAG configuration. + +NOTE: Everflow+LAG test will be able to run **only** in the testbed specifically created for LAG. + +### Summary of the existing everflow test plan + +The existing everflow scripts: +``` +ansible/ + roles/ + test/ + tasks/ + everflow_testbed.yml + everflow_testbed/ + apply_config/ + acl_rule_persistent.json + expect_messages.txt + del_config/ + acl_rule_persistent-del.json + acl_rule_persistent.json + acl_table.json + expect_messages.txt + session.json + apply_config.yml + del_config.yml + get_neighbor_info.yml + get_port_info.yml + get_session_info.yml + run_test.yml + testcase_1.yml + testcase_2.yml + testcase_3.yml + testcase_4.yml + testcase_5.yml + testcase_6.yml + testcase_7.yml + testcase_8.yml + files/ + acstests/ + everflow_tb_test.py + everflow_policer_test.py +``` + +The existing everflow test plan only covers ingress mirroring. ACL rules are added to ACL table of type "MIRROR". And by default, packets are only checked against the ACL rules on ingress stage. On ports that are bound to everflow ACL table, any ingress packets hitting the ACL rules are copied to associated mirror destination in GRE tunnel. 
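+
+To illustrate what the PTF script looks for on the mirror destination port, below is a minimal, hypothetical scapy sketch of the expected GRE-encapsulated copy; the session parameters shown are placeholders, and some ASICs add extra ERSPAN-style headers, so real tests typically match only selected outer-header fields.
+
+```python
+# Minimal sketch: expected mirrored packet = outer Ethernet/IP/GRE + original frame.
+from scapy.all import Ether, IP, GRE
+
+def expected_mirror_packet(inner_pkt, dut_mac, nexthop_mac,
+                           session_src="1.1.1.1", session_dst="2.2.2.2",
+                           dscp=8, gre_type=0x6558):
+    return (Ether(src=dut_mac, dst=nexthop_mac) /
+            IP(src=session_src, dst=session_dst, tos=dscp << 2) /
+            GRE(proto=gre_type) /
+            inner_pkt)
+```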
+ +Two packet directions are covered in the exist testing: SPINE -> TOR ports and TOR -> SPINE ports. When the injected packets hit any of the configured ACL rules in the ingress stage, the packets will be mirrored to configured mirror destination. GRE tunnel is used for sending mirrored packets with src, dst IP and other parameters configured in the mirror session. Below test cases are used for verifying that the DUT switch can properly forwarded the mirrored packets according to different routing configurations for mirror session destination. + +Example ACL table in config_db for everflow testing, type of the table is `MIRROR`: +``` +{ + "ACL_TABLE": { + "EVERFLOW": { + "policy_desc": "EVERFLOW", + "ports": [ + "Ethernet100", + "Ethernet104", + "Ethernet92", + "Ethernet96", + "Ethernet84", + "Ethernet88", + "Ethernet76", + "Ethernet80", + "Ethernet108", + "Ethernet112", + "Ethernet64", + "Ethernet60", + "Ethernet52", + "Ethernet48", + "Ethernet44", + "Ethernet40", + "Ethernet36", + "Ethernet120", + "Ethernet116", + "Ethernet56", + "Ethernet124", + "Ethernet72", + "Ethernet68", + "Ethernet24", + "Ethernet20", + "Ethernet16", + "Ethernet12", + "Ethernet8", + "Ethernet4", + "Ethernet0", + "Ethernet32", + "Ethernet28" + ], + "type": "MIRROR" + } + } +} +``` + +Example ACL rules in config_db for everflow testing, action field and value of the ACL rules is `MIRROR_ACTION: `: +``` +{ + "ACL_RULE": { + "EVERFLOW|RULE_1": { + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9999", + "SRC_IP": "20.0.0.10/32" + }, + "EVERFLOW|RULE_2": { + "DST_IP": "30.0.0.10/32", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9998" + }, + "EVERFLOW|RULE_3": { + "L4_SRC_PORT": "4661", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9997" + }, + "EVERFLOW|RULE_4": { + "L4_DST_PORT": "4661", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9996" + }, + "EVERFLOW|RULE_5": { + "ETHER_TYPE": "4660", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9995" + }, + "EVERFLOW|RULE_6": { + "IP_PROTOCOL": "126", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9994" + }, + "EVERFLOW|RULE_7": { + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9993", + "TCP_FLAGS": "0x12/0x12" + }, + "EVERFLOW|RULE_8": { + "L4_SRC_PORT_RANGE": "4672-4681", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9992" + }, + "EVERFLOW|RULE_9": { + "L4_DST_PORT_RANGE": "4672-4681", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9991" + }, + "EVERFLOW|RULE_10": { + "DSCP": "51", + "MIRROR_ACTION": "test_session_1", + "PRIORITY": "9990" + } + } +} +``` + +Examples of show the ACL table and rules configuration from command line: +``` +$ show acl table EVERFLOW +Name Type Binding Description +-------- ------ ----------- ------------- +EVERFLOW MIRROR Ethernet0 EVERFLOW + Ethernet4 + Ethernet8 + Ethernet12 + Ethernet16 + Ethernet20 + Ethernet24 + Ethernet28 + Ethernet32 + Ethernet36 + Ethernet40 + Ethernet44 + Ethernet48 + Ethernet52 + Ethernet56 + Ethernet60 + Ethernet64 + Ethernet68 + Ethernet72 + Ethernet76 + Ethernet80 + Ethernet84 + Ethernet88 + Ethernet92 + Ethernet96 + Ethernet100 + Ethernet104 + Ethernet108 + Ethernet112 + Ethernet116 + Ethernet120 + Ethernet124 +$ show acl rule +Table Rule Priority Action Match +-------- ------- ---------- ---------------------- ---------------------------- +EVERFLOW RULE_1 9999 MIRROR: test_session_1 SRC_IP: 20.0.0.10/32 +EVERFLOW RULE_2 9998 MIRROR: test_session_1 DST_IP: 30.0.0.10/32 +EVERFLOW RULE_3 9997 MIRROR: test_session_1 L4_SRC_PORT: 4661 +EVERFLOW RULE_4 9996 MIRROR: 
test_session_1 L4_DST_PORT: 4661
+EVERFLOW RULE_5 9995 MIRROR: test_session_1 ETHER_TYPE: 4660
+EVERFLOW RULE_6 9994 MIRROR: test_session_1 IP_PROTOCOL: 126
+EVERFLOW RULE_7 9993 MIRROR: test_session_1 TCP_FLAGS: 0x12/0x12
+EVERFLOW RULE_8 9992 MIRROR: test_session_1 L4_SRC_PORT_RANGE: 4672-4681
+EVERFLOW RULE_9 9991 MIRROR: test_session_1 L4_DST_PORT_RANGE: 4672-4681
+EVERFLOW RULE_10 9990 MIRROR: test_session_1 DSCP: 51
+```
+
+Existing test cases:
+* testcase 1 - Resolved route
+* testcase 2 - Longer prefix route with resolved next hop
+* testcase 3 - Remove longer prefix route
+* testcase 4 - Change neighbor MAC address
+* testcase 5 - Resolved ECMP route
+* testcase 6 - ECMP route change (remove next hop not used by session)
+* testcase 7 - ECMP route change (remove next hop used by session)
+* testcase 8 - Policer enforced DSCP value/mask test
+
+### Extend the test plan to cover both ingress and egress mirroring
+
+#### What new enhancements need to be covered?
+
+##### Egress ACL table
+In Jan 2019, egress ACL table support was added (https://github.com/Azure/SONiC/pull/322, https://github.com/Azure/sonic-swss/pull/741) to SONiC. An ACL table can now have an extra field `stage` indicating at which stage the ACL rules are checked against packets. If `stage` is omitted or set to 'ingress', the behavior is the same as before: ingress packets are checked against the ACL rules. If the `stage` field is set to 'egress', then on ports bound to the ACL table, egress packets are checked against the ACL rules and handled according to the action configured for the ACL rules. The action could be `PACKET_ACTION` or `MIRROR_ACTION`.
+
+##### Egress mirroring
+Besides the egress ACL table support, a recent enhancement (Design: https://github.com/Azure/SONiC/pull/411 Implementations: https://github.com/Azure/sonic-swss/pull/963 https://github.com/Azure/sonic-utilities/pull/575) added egress mirroring support. This enhancement added two ACL rule action types based on the existing mirroring action `MIRROR_ACTION`:
+* MIRROR_INGRESS_ACTION
+* MIRROR_EGRESS_ACTION
+
+The `MIRROR_INGRESS_ACTION` type is new, but its behavior is the same as the existing ingress mirroring: packets hitting an ACL rule are mirrored at the ingress stage.
+The `MIRROR_EGRESS_ACTION` is a new action type for egress mirroring. It means that on ports bound to the everflow ACL table, packets hitting ACL rules of that table are mirrored at the egress stage. The original `MIRROR_ACTION` is kept for backward compatibility and is implicitly treated as "ingress" by default.
+
+Combining these two enhancements, there are 4 scenarios for everflow:
+1. ACL table of type `MIRROR` has its `stage` field omitted or set to "ingress". Action type of ACL_RULE is `MIRROR_ACTION` or `MIRROR_INGRESS_ACTION`.
+2. ACL table of type `MIRROR` has its `stage` field omitted or set to "ingress". Action type of ACL_RULE is `MIRROR_EGRESS_ACTION`.
+3. ACL table of type `MIRROR` has its `stage` field set to "egress". Action type of ACL_RULE is `MIRROR_ACTION` or `MIRROR_INGRESS_ACTION`.
+4. ACL table of type `MIRROR` has its `stage` field set to "egress". Action type of ACL_RULE is `MIRROR_EGRESS_ACTION`.
+
+Expected behaviors for the combinations:
+
+| - | ACL table stage: ingress | ACL table stage: egress |
+| ---------------------------------- | -------------------------------------------------------- | ------------------------------------------------------ |
+| Action type: MIRROR_INGRESS_ACTION | Ingress packets hit ACL rules, mirrored at ingress stage | Not applicable |
+| Action type: MIRROR_EGRESS_ACTION | Ingress packets hit ACL rules, mirrored at egress stage | Egress packets hit ACL rules, mirrored at egress stage |
+
+Since not all the combinations are supported by all vendors, the enhancement also added ACL capability detection. The ACL action types supported at each stage are detected and stored in redis. Below is an example showing the detected capabilities:
+```
+$ redis-cli -n 6 hgetall 'SWITCH_CAPABILITY|switch'
+ 1) "MIRROR"
+ 2) "true"
+ 3) "MIRRORV6"
+ 4) "true"
+ 5) "ACL_ACTIONS|INGRESS"
+ 6) "PACKET_ACTION,REDIRECT_ACTION,MIRROR_INGRESS_ACTION"
+ 7) "ACL_ACTIONS|EGRESS"
+ 8) "PACKET_ACTION,MIRROR_EGRESS_ACTION"
+ 9) "ACL_ACTION|PACKET_ACTION"
+10) "DROP,FORWARD"
+```
+In the above example output, two everflow combinations are supported on the platform being checked:
+* INGRESS stage, MIRROR_INGRESS_ACTION
+* EGRESS stage, MIRROR_EGRESS_ACTION
+
+This test plan needs to be extended to cover both the existing everflow function and the newly added capabilities.
+
+#### Some existing areas not covered by the existing scripts
+
+##### ACL rule for matching "IN_PORTS"
+SONiC ACL rules now support matching "IN_PORTS". A new ACL rule matching "IN_PORTS" needs to be added and covered.
+
+##### ACL rule for matching ICMP type and code
+The SONiC ACL rules also support matching ICMP type and code. New ACL rules matching ICMP type and code need to be added and covered. The acl-loader utility does not support ICMP type and code yet. The sonic-cfggen tool will be used for directly loading such ACL rules to config_db.
+
+##### IPv6 everflow
+The existing scripts only cover IPv4. IPv6 is also supported by SONiC now, so the scripts need to be extended to cover IPv6 everflow too. To cover IPv6:
+* Everflow ACL table of type "MIRRORV6" needs to be defined and loaded during testing.
+* Different stages (ingress & egress) also need to be covered.
+* Different ACL rule mirror actions also need to be covered.
+* A new set of IPv6 ACL rules needs to be defined and loaded during testing.
+* The PTF script needs to be extended to inject and monitor IPv6 packets.
+
+#### How to extend the testing
+
+To cover the new enhancements, the existing scripts need to be extended:
+* The existing structure, sub-tests and PTF scripts can be reused.
+* Add new sets of configurations.
+* We can refactor run_test.yml to run the existing sub-tests in multiple iterations. Each iteration loads a different set of ACL table and ACL rules configuration.
+* Adjust initialization of variables used in testing if needed.
+* Add a new class in the existing PTF script to inject and monitor IPv6 packets.
+* Add a sub-test and a new class in the PTF script to cover the ACL rule matching "IN_PORTS".
+
+In summary, the extended scripts need to do the below work:
+1. First, get the expected capability info of the DUT from a hard-coded resource, then check it against the detected capabilities in the DB. Fail the test if the capabilities do not match.
+2. Create everflow ACL tables with different stage settings. Load ACL rules with different action types.
+3. 
Run test cases in the existing scripts (8 test cases at the time of writing). +4. Ensure that each combination of the supported ACL table stages and ACL rule action types are covered. + +Summary of the possible combinations: + +| Combinations | ACL table stage: ingress | ACL table stage: egress | +| ------------------------------ | ------------------------ | ----------------------- | +| ACL Rule MIRROR_INGRESS_ACTION | [x] | N/A | +| ACL Rule MIRROR_EGRESS_ACTION | [x] | [x] | + +Totally there are 3 possible combinations. Not all the combinations are supported by all platforms. The actual combinations to be tested are determined the actual DUT platform. + +Switch's ACL capability can be queried from DB table `SWITCH_CAPABILITY|switch`. The testing script can query ACL capability first. And then run the supported combinations, skip the unsupported combinations. + +For example, if query SWITCH_CAPABILITY got below results: +``` +$ redis-cli -n 6 hgetall "SWITCH_CAPABILITY|switch" +1) "ACL_ACTIONS|INGRESS" +2) "PACKET_ACTION,REDIRECT_ACTION,MIRROR_ACTION_INGRESS" +3) "ACL_ACTIONS|EGRESS" +4) "PACKET_ACTION,MIRROR_ACTION_EGRESS" +... +``` +Then the platform only supports two combinations: +* ACL table stage ingress + ACL Rule MIRROR_INGRESS_ACTION +* ACL table stage egress + ACL Rule MIRROR_EGRESS_ACTION + +The third combination would be skipped on this platform. + +#### Combine the existing test cases + +Some of the existing test cases are similar and doing repetitive testing. We remove and combine them to have a shorter list of test cases: +* Test case #1 is covered in #2, #3 and #4. It can be removed. +* Test case #2 and #3 can be combined to one case. +* Test case #5 is covered in #6. It can be removed. + +### Test configurations + +#### ACL table configurations + +New ACL tables of type MIRROR and MIRRORv6 need to be created in testing. The new ACL tables also need to set different values for their "stage" attribute. + +### Related **DUT** CLI commands + +Summary of the CLI commands that will be used for configuring DUT. + +#### `sonic-cfggen` Advanced config_db updating tool + +This is the advanced tool for updating the config_db. It can be used for adding/removing ACL tables, ACL rules, mirror sessions and many more other configurations. + +Some example usages: + +* `sonic-cfggen -j --write-to-db`: Load configuration in json format to config_db. The json file could be ACL table configuration. +* `sonic-cfggen -d -v ACL_TABLE`: Dump current ACL_TABLE configuration from config_db. +* `sonic-cfggen -d -v ACL_RULE`: Dump current ACL_RULE configuration from config_db. + +#### `config acl add table ` Add ACL table + +Usage: `config acl add table [OPTIONS] ` + +This is the formal command for adding ACL table. ACL table added using this command is associated with all interfaces. If ACL table associated with a fraction of the interfaces is needed, the above `sonic-cfggen` method can be used. On versions that this formal command is not supported yet, the `sonic-cfggen` tool can be used. + +#### `config acl remove table ` Remove ACL table + +Usage: `config acl remove table [OPTIONS] ` + +This is the formal command for removing ACL table. + +#### `config acl update` Update ACL rules + +Usages: +* `config acl update full [OPTIONS] FILE_NAME` +* `config acl update incremental [OPTIONS] FILE_NAME` + +This is the formal command for loading ACL rules configuration from file specified by the FILE_NAME. 
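+
+Before loading the configuration for a given combination, the scripts can perform the capability check described earlier in this section. Below is a minimal, hypothetical sketch that reads `SWITCH_CAPABILITY|switch` from STATE_DB (redis DB 6) on the DUT; note that the action names stored in the DB may appear under slightly different spellings (this document shows both `MIRROR_INGRESS_ACTION` and `MIRROR_ACTION_INGRESS`), so the real script should normalize them.
+
+```python
+# Minimal sketch (assumption: executed on the DUT): decide which everflow
+# stage/action combinations are supported before generating configuration.
+import subprocess
+
+def supported_everflow_combinations():
+    def actions(stage):
+        out = subprocess.check_output(
+            ["redis-cli", "-n", "6", "HGET",
+             "SWITCH_CAPABILITY|switch", "ACL_ACTIONS|" + stage]).decode()
+        return set(a.strip() for a in out.strip().split(","))
+    combos = []
+    if "MIRROR_INGRESS_ACTION" in actions("INGRESS"):
+        combos.append(("ingress", "MIRROR_INGRESS_ACTION"))
+    if "MIRROR_EGRESS_ACTION" in actions("INGRESS"):
+        combos.append(("ingress", "MIRROR_EGRESS_ACTION"))
+    if "MIRROR_EGRESS_ACTION" in actions("EGRESS"):
+        combos.append(("egress", "MIRROR_EGRESS_ACTION"))
+    return combos
+```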
+ +#### `acl-loader` Update ACL rules + +Under the hood, the `config acl update` command called this `acl-loader` tool to load ACL rules configurations. For example: +* `acl-loader update full [--session_name= --mirror_stage=]`: Load acl rules specified in a json file to config_db. + +On versions that the formal `config acl update` is not supported yet, this `acl-loader` tool or the `sonic-cfggen` tool can be used. + +The acl-loader utility does not support load ACL rules matching ICMP type and code yet as the time of writing. The `sonic-cfggen` tool will be used for loading ACL rules matching ICMP type&code. + + +#### `aclshow` Show ACL rule counters + +This tool is for collecting ACL rule counters. For example: +* `aclshow -a` + +#### `config mirror_session` Configure everflow mirror session + +This tool is for configuring everflow mirror session. For example: +* `config mirror_session add [gre_type] [queue]` + +## Test structure + +### Overall structure + +The extended ansible test playbook will have below parts: +1. Prepare some variables for testing +2. Add everflow configuration +3. Run everflow sub-tests +4. Clear everflow configuration +5. Repeat steps 1-3 for other configuration scenarios + +The subsequent sections will have more detailed description of part + +### Prepare some variables for testing + +Firstly, some variables need to be prepared for testing. For example, the source ports for injecting traffic. The expected destination ports for mirrored packets. + +### Add everflow configuration + +Before run the sub-tests for each scenario, the scripts need to setup by loading configurations to DUT for the scenario to be covered. + +There will be j2 template files for generating ACL tables and ACL rules configurations. Ansible playbook will generate ACL tables and ACL rules json configuration files to DUT based on these templates, switch capability and running topology. Then commands `sonic-cfggen` and `acl-loader` can be used for loading the configurations. + +Different sets of configuration files will be generated for different test scenarios: +* IPv4 in IPv4 + * ACL table: MIRROR, ingress; IPv4 ACL rules, MIRROR_INGRESS_ACTION; Mirror session IPv4 src & dst IP address + * ACL table: MIRROR, ingress; IPv4 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv4 src & dst IP address + * ACL table: MIRROR, egress; IPv4 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv4 src & dst IP address +* IPv4 in IPv6 + * ACL table: MIRROR, ingress; IPv4 ACL rules, MIRROR_INGRESS_ACTION; Mirror session IPv6 src & dst IP address + * ACL table: MIRROR, ingress; IPv4 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv6 src & dst IP address + * ACL table: MIRROR, egress; IPv4 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv6 src & dst IP address +* IPv6 in IPv4 + * ACL table: MIRRORV6, ingress; IPv6 ACL rules, MIRROR_INGRESS_ACTION; Mirror session IPv4 src & dst IP address + * ACL table: MIRRORV6, ingress; IPv6 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv4 src & dst IP address + * ACL table: MIRRORV6, egress; IPv6 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv4 src & dst IP address +* IPv6 in IPv6 + * ACL table: MIRRORV6, ingress; IPv6 ACL rules, MIRROR_INGRESS_ACTION; Mirror session IPv6 src & dst IP address + * ACL table: MIRRORV6, ingress; IPv6 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv6 src & dst IP address + * ACL table: MIRRORV6, egress; IPv6 ACL rules, MIRROR_EGRESS_ACTION; Mirror session IPv6 src & dst IP address + +Totally there are 12 scenarios to cover. 
To make the scripts flexible, command line options should be added for selecting which scenarios to run. + +#### ACL tables + +For each test scenario, an ACL table configuration need to be created and loaded to DUT. Command `sonic-cfggen` can be used to load the ACL table configuration into config_db. Command syntax: `sonic-cfggen -j --write-to-db`. + +Example ACL table configuration files to be generated for each scenario: +* ACL table: MIRROR, ingress +``` +{ + "ACL_TABLE": { + "EF_INGRESS": { + "policy_desc": "EVERFLOW ingress", + "ports": [ + "Ethernet100", "Ethernet104", "Ethernet92", "Ethernet96", "Ethernet84", "Ethernet88", "Ethernet76", "Ethernet80", "Ethernet108", "Ethernet112", "Ethernet64", "Ethernet60", "Ethernet52", "Ethernet48", "Ethernet44", "Ethernet40", "Ethernet36", "Ethernet120", "Ethernet116", "Ethernet56", "Ethernet124", "Ethernet72", "Ethernet68", "Ethernet24", "Ethernet20", "Ethernet16", "Ethernet12", "Ethernet8", "Ethernet4", "Ethernet0", "Ethernet32", "Ethernet28" + ], + "type": "MIRROR", + "stage": "ingress" + } + } +} +``` +* ACL table: MIRROR, egress +``` +{ + "ACL_TABLE": { + "EF_EGRESS": { + "policy_desc": "EVERFLOW egress", + "ports": [ + "Ethernet100", "Ethernet104", "Ethernet92", "Ethernet96", "Ethernet84", "Ethernet88", "Ethernet76", "Ethernet80", "Ethernet108", "Ethernet112", "Ethernet64", "Ethernet60", "Ethernet52", "Ethernet48", "Ethernet44", "Ethernet40", "Ethernet36", "Ethernet120", "Ethernet116", "Ethernet56", "Ethernet124", "Ethernet72", "Ethernet68", "Ethernet24", "Ethernet20", "Ethernet16", "Ethernet12", "Ethernet8", "Ethernet4", "Ethernet0", "Ethernet32", "Ethernet28" + ], + "type": "MIRROR", + "stage": "egress" + } + } +} +``` +* ACL table: MIRRORV6, ingress +``` +{ + "ACL_TABLE": { + "EFV6_INGRESS": { + "policy_desc": "EVERFLOW IPv6 ingress", + "ports": [ + "Ethernet100", "Ethernet104", "Ethernet92", "Ethernet96", "Ethernet84", "Ethernet88", "Ethernet76", "Ethernet80", "Ethernet108", "Ethernet112", "Ethernet64", "Ethernet60", "Ethernet52", "Ethernet48", "Ethernet44", "Ethernet40", "Ethernet36", "Ethernet120", "Ethernet116", "Ethernet56", "Ethernet124", "Ethernet72", "Ethernet68", "Ethernet24", "Ethernet20", "Ethernet16", "Ethernet12", "Ethernet8", "Ethernet4", "Ethernet0", "Ethernet32", "Ethernet28" + ], + "type": "MIRRORV6", + "stage": "ingress" + } + } +} +``` +* ACL table: MIRRORV6, egress +``` +{ + "ACL_TABLE": { + "EFV6_EGRESS": { + "policy_desc": "EVERFLOW IPv6 egress", + "ports": [ + "Ethernet100", "Ethernet104", "Ethernet92", "Ethernet96", "Ethernet84", "Ethernet88", "Ethernet76", "Ethernet80", "Ethernet108", "Ethernet112", "Ethernet64", "Ethernet60", "Ethernet52", "Ethernet48", "Ethernet44", "Ethernet40", "Ethernet36", "Ethernet120", "Ethernet116", "Ethernet56", "Ethernet124", "Ethernet72", "Ethernet68", "Ethernet24", "Ethernet20", "Ethernet16", "Ethernet12", "Ethernet8", "Ethernet4", "Ethernet0", "Ethernet32", "Ethernet28" + ], + "type": "MIRRORV6", + "stage": "egress" + } + } +} +``` + +#### Mirror sessions + +For each scenario, a mirror session is required. Totally two types of mirror sessions are required for all the scenarios: +* Mirror session using IPv4 source and destination IP addresses. +* Mirror session using IPv6 source and destination IP addresses. + +The script will configure appropriate mirror session using `config mirror_session` while testing each of the scenario. 
+ +Add mirror_session using IPv4 source and destination IP addresses: +``` +$ config mirror_session add session_v4 1.1.1.1 2.2.2.2 8 64 0x6558 0 +$ acl-loader show session +Name Status SRC IP DST IP GRE DSCP TTL Queue +---------- -------- --------- -------- ------ ------ ----- ------- +session_v4 inactive 1.1.1.1 2.2.2.2 0x6558 8 64 0 +``` + +Add mirror_session using IPv6 source and destination IP addresses: +``` +$ config mirror_session add session_v6 2000::1:1:1:1 2000::2:2:2:2 8 64 0x6558 0 +$ acl-loader show session +Name Status SRC IP DST IP GRE DSCP TTL Queue +---------- -------- ------------- ------------- ------ ------ ----- ------- +session_v6 inactive 2000::1:1:1:1 2000::2:2:2:2 0x6558 8 64 0 +``` + +#### ACL rules + +Generate different sets of ACL rules from template. Load the ACL rules using below command: +`acl-loader update full --session_name= --mirror_stage=` + +For IPv4 testing, the ACL rules template: +``` +{ + "acl": { + "acl-sets": { + "acl-set": { + "{{ acl_table_name }}": { + "acl-entries": { + "acl-entry": { + "1": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 1 + }, + "ip": { + "config": { + "source-ip-address": "20.0.0.10/32" + } + } + }, + "2": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 2 + }, + "ip": { + "config": { + "destination-ip-address": "192.168.0.10/32" + } + } + }, + "3": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 3 + }, + "transport": { + "config": { + "source-port": "4661" + } + } + }, + "4": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 4 + }, + "transport": { + "config": { + "destination-port": "4661" + } + } + }, + "5": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 5 + }, + "l2": { + "config": { + "ethertype": "4660" + } + } + }, + "6": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 6 + }, + "ip": { + "config": { + "protocol": 126 + } + } + }, + "7": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 7 + }, + "transport": { + "config": { + "tcp-flags": ["TCP_ACK", "TCP_SYN"] + } + } + }, + "8": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 8 + }, + "transport": { + "config": { + "source-port": "4672..4681" + } + } + }, + "9": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 9 + }, + "transport": { + "config": { + "destination-port": "4672..4681" + } + } + }, + "10": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 10 + }, + "ip": { + "config": { + "dscp": "51" + } + } + }, + "11": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 10 + }, + "input_interface": { + "interface_ref": { + "config": { + "interface": "{{ acl_in_ports }}" + } + } + } + } + } + } + } + } + } + } +} +``` + +For IPv6 testing, the ACL rules template: +``` +{ + "acl": { + "acl-sets": { + "acl-set": { + "{{ acl_table_name }}": { + "acl-entries": { + "acl-entry": { + "1": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 1 + }, + "ip": { + "config": { + "source-ip-address": "2000::20:0:0:10/64" + } + } + }, + "2": { + 
"actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 2 + }, + "ip": { + "config": { + "destination-ip-address": "fe80::192:168:0:10/64" + } + } + }, + "3": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 3 + }, + "transport": { + "config": { + "source-port": "4661" + } + } + }, + "4": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 4 + }, + "transport": { + "config": { + "destination-port": "4661" + } + } + }, + "5": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 5 + }, + "l2": { + "config": { + "ethertype": "4660" + } + } + }, + "6": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 6 + }, + "ip": { + "config": { + "protocol": 126 + } + } + }, + "7": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 7 + }, + "transport": { + "config": { + "tcp-flags": ["TCP_ACK", "TCP_SYN"] + } + } + }, + "8": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 8 + }, + "transport": { + "config": { + "source-port": "4672..4681" + } + } + }, + "9": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 9 + }, + "transport": { + "config": { + "destination-port": "4672..4681" + } + } + }, + "10": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 10 + }, + "ip": { + "config": { + "dscp": "51" + } + } + }, + "11": { + "actions": { + "config": { + "forwarding-action": "ACCEPT" + } + }, + "config": { + "sequence-id": 10 + }, + "input_interface": { + "interface_ref": { + "config": { + "interface": "{{ acl_in_ports }}" + } + } + } + } + } + } + } + } + } + } +} +``` + +For example, if loaded IPv4 ACL rules into ACL table EF_EGRESS, used MIRROR_EGRESS_ACTION, used mirror session session_v4, they should be like below in config_db: +``` +{ + "ACL_RULE": { + "EF_EGRESS|RULE_1": { + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9999", + "SRC_IP": "20.0.0.10/32" + }, + "EF_EGRESS|RULE_2": { + "DST_IP": "30.0.0.10/32", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9998" + }, + "EF_EGRESS|RULE_3": { + "L4_SRC_PORT": "4661", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9997" + }, + "EF_EGRESS|RULE_4": { + "L4_DST_PORT": "4661", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9996" + }, + "EF_EGRESS|RULE_5": { + "ETHER_TYPE": "4660", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9995" + }, + "EF_EGRESS|RULE_6": { + "IP_PROTOCOL": "126", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9994" + }, + "EF_EGRESS|RULE_7": { + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9993", + "TCP_FLAGS": "0x12/0x12" + }, + "EF_EGRESS|RULE_8": { + "L4_SRC_PORT_RANGE": "4672-4681", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9992" + }, + "EF_EGRESS|RULE_9": { + "L4_DST_PORT_RANGE": "4672-4681", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9991" + }, + "EF_EGRESS|RULE_10": { + "DSCP": "51", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9990" + }, + "EF_EGRESS|RULE_11": { + "IN_PORTS": "Ethernet4,Ethernet8", + "MIRROR_EGRESS_ACTION": "session_v4", + "PRIORITY": "9989" + } + } +} +``` + +To cover ACL rule matching ICMP type and code, additional ACL configuration is required. 
Since the acl-loader utility does not support parsing and loading ACL rules matching ICMP type&code, the advanced configuration tool `sonic-cfggen` will be used. + +Firstly, scripts need to prepare json file for ACL rules matching ICMP type&code from j2 template. To cover ICMPv4 and ICMPv6, two templates are required. + +For matching ICMPv4: +``` +{ + "ACL_RULE": { + "{{ ACL_TABLE_NAME }}|RULE_12": { + "MIRROR_EGRESS_ACTION": "{{ MIRROR_SESSION_NAME }}", + "PRIORITY": "9988", + "ICMP_TYPE": "8", + "ICMP_CODE": "0" + } +} +``` + +For matching ICMPv6: +``` +{ + "ACL_RULE": { + "{{ ACL_TABLE_NAME }}|RULE_12": { + "MIRROR_EGRESS_ACTION": "{{ MIRROR_SESSION_NAME }}", + "PRIORITY": "9988", + "ICMPV6_TYPE": "128", + "ICMPV6_CODE": "0" + } +} +``` + +Then use the `sonic-cfggen` tool to dump the current ACL rules configuration from config_db: +`$ sonic-cfggen -d -v ACL_RULE`. + +Generate appropriate ICMP ACL rule configuration json file from j2 templates according to current testing scenario. Combine the dumped ACL rules with the ICMP ACL rules. Load the combined ACL rules into config_db using `sonic-cfggen` again: +`$ sonic-cfggen -j --write-to-db` + +### Run test + +For each configuration scenario, we need to run all the sub-tests. Everflow sub-tests consists of a number of test cases. Each of the test case is executed with log analyzer enabled, for example: + +1. Run loganalyzer 'init' phase +2. Run a everflow test case +3. Run loganalyzer 'analyze' phase + +Each test case may involve with with one or more classes defined in the PTF script. + +#### PTF Test + +The everflow test cases eventually call the ptf scripts to do the actual testing. The PTF scripts inject packets into DUT and validate traffic forwarded by DUT. + +PTF test will generate traffic between ports and make sure it mirrored according to the configured Everflow session and ACL rules. Depending on the testbed topology and the existing configuration (e.g. ECMP, LAGS, etc) packets may arrive to different ports. Therefore ports connection information will be generated from the minigraph and supplied to the PTF script. + +The `EverflowTest` class in everflow_tb_test.py need to be extended to cover IPv6 testing. Need some new methods for sending and validating IPv6 packets. + +## Test cases + +Each test case will be additionally validated by the loganalyzer utility. + +Each test case will add dynamic Everflow ACL rules at the beginning and remove them at the end. + +Each test case will run traffic for persistent and dynamic Everflow ACL rules. + +Each test case will analyze Everflow packet header and payload (if mirrored packet is equal to original). In case of egress mirroring, verify that TTL of the mirrored packet in GRE tunnel is decremented comparing with the injected packet. + +### Test case \#1 - Packets mirrored to best match resolved route + +#### Test objective + +Verify that mirrored packets are forwarded to the best match route for the session destination IP. + +#### Test steps + +- Create route with next hop on port dst_port_1. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. + +- Create another route with unresolved next hop. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. + +- Remove the route with unresolved next hop. Create another route with best match prefix and resolved next hop on dst_port_2 +- Send packets that hit each Everflow ACL rule. 
+- Verify that the packets are mirrored to dst_port_2 with correct Everflow header. + +- Remove the best match route +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. + +- Cleanup all the added routes + +While checking mirrored packets: +- Verify that packets are mirrored to appropriate port. +- Verify that mirrored packets payload is equal sent packets. +- Analyze mirrored packets header. +- In case of egress mirroring, verify that TTL of the mirrored packets is decremented comparing with the injected packets. + +### Test case \#2 - Change neighbor MAC address. + +#### Test objective + +Verify that session destination MAC address is changed after neighbor MAC address update. + +#### Test steps + +- Create route with next hop on port dst_port_1. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. + +- Change neighbor MAC address of the next hop on dst_port_1. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are still mirrored to dst_port_1 with correct Everflow header. +- Verify that DST MAC address in mirrored packet header is changed accordingly. + +- Cleanup all the added routes + +While checking mirrored packets: +- Verify that packets are mirrored to appropriate port. +- Verify that mirrored packets payload is equal sent packets. +- Analyze mirrored packets header. +- In case of egress mirroring, verify that TTL of the mirrored packets is decremented comparing with the injected packets. + +### Test case \#3 - ECMP route change (remove next hop not used by session). + +#### Test objective + +Verify that mirror session is still active after removal of next hop that was not used by mirror session. + +#### Test steps + +- Create ECMP route with next hops on dst_port_1 and dst_port_2. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 or dst_port_2 with correct Everflow header. + +- Add next hop on dst_port_3 to ECMP route +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 or dst_port_2 with correct Everflow header. +- Verify that the packets are not mirrored to dst_port_3 + +- Remove the added ECMP next hop on dst_port_3. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 or dst_port_2 with correct Everflow header. +- Verify that the packets are not mirrored to dst_port_3 + +- Cleanup all the added routes + +While checking mirrored packets: +- Verify that packets are mirrored to appropriate port. +- Verify that mirrored packets payload is equal sent packets. +- Analyze mirrored packets header. +- In case of egress mirroring, verify that TTL of the mirrored packets is decremented comparing with the injected packets. + +### Test case \#4 - ECMP route change (remove next hop used by session). + +#### Test objective + +Verify that mirror session is still active after removal of next hop that was used by mirror session when there are other ECMP next hops available. + +#### Test steps + +- Create route with next hop on dst_port_1. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. + +- Add next hops on dst_port_2 and dst_port_3 to route. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are mirrored to dst_port_1 with correct Everflow header. 
+- Verify that the packets are not mirrored to dst_port_2 or dst_port_3. + +- Remove the ECMP next hop on dst_port_1. +- Send packets that hit each Everflow ACL rule. +- Verify that the packets are not mirrored to dst_port_1. +- Verify that the packets are mirrored to dst_port_2 or dst_port_3 with correct Everflow header. + +- Cleanup all the added routes + +While checking mirrored packets: +- Verify that packets are mirrored to appropriate port. +- Verify that mirrored packets payload is equal sent packets. +- Analyze mirrored packets header. +- In case of egress mirroring, verify that TTL of the mirrored packets is decremented comparing with the injected packets. + +### Test case \#5 - Policer enforced DSCP value/mask test. + +#### Test objective + +#### Test steps + + +## TODO +- Everflow+VLAN test configuration and test cases (Add VLAN, move destination port in VLAN, test everflow; move destination port out of VLAN, test everflow) +- Everflow+LAG test configuration and test cases (separate ansible playbook) + +## Open Questions diff --git a/doc/acl/acl_stage_capability.md b/doc/acl/acl_stage_capability.md new file mode 100644 index 0000000000..45494143d0 --- /dev/null +++ b/doc/acl/acl_stage_capability.md @@ -0,0 +1,226 @@ +# Egress mirroring support and ACL action capability check + +# Table of Contents + +#### Revision +| Rev | Date | Author | Change Description | +|:---:|:-------:|:------------------:|:------------------:| +| 0.1 | 2019-05 | Blyschak Stepan | Initial Version | + +## Motivation +Not all ASICs support all actions on ingress, egress stages + +E.g.: Egress mirror action on ingress stage or vice versa might be not supported + +## Design + +## 1. Egress mirroring support + +SAI API has two mirror action types - SAI_ACL_ACTION_TYPE_MIRROR_INGRESS, SAI_ACL_ACTION_TYPE_MIRROR_EGRESS which can be set on ingress or egress table. +So SONiC will not restrict setting egress mirror rule on ingress table or vice versa. +To check wheter such combination is supported by the ASIC application should look into SWITCH_CAPABILITY table which is described in part 2 of this document. + +The proposed new schema: + +### ACL_RULE_TABLE +``` +mirror_action = 1*255VCHAR ; refer to the mirror session (implicitly ingress for backward compatibility) +mirror_ingress_action = 1*255VCHAR ; refer to the mirror session +mirror_egress_action = 1*255VCHAR ; refer to the mirror session +``` + +e.g.: +``` +{ + "ACL_RULE": { + "EVERFLOW_INGRESS|RULE_1": { + "MIRROR_EGRESS_ACTION": "everflow0", + "PRIORITY": "9999", + "SRC_IP": "20.0.0.10/32" + } +} +``` + +The above example shows setting an egress mirror action on ingress everflow table. + +mirror_action should be implicitly set to "ingress" by default to be backward compatible + + +### orchagent + +- AclRuleMirror adds processing of new schema and convert to SAI_ACL_ACTION_TYPE_MIRROR_INGRESS/EGRESS based on action key; +- By default mirror action is considered "ingress" to be backward compatible; + +### acl-loader + +- By default acl-loader with ```--session_name``` parameter will produce ingress mirror rule; +- A new parameter ```--mirror_stage=ingress|egress``` will be added; + +e.g.: + +``` +admin@sonic:~$ acl-loader update incremental --session_name=everflow0 --mirror_stage=egress rules.json +``` + +## 2. 
ACL action capability check + +### orchagent + +AclOrch on initialization will query ACL stage capabilities and store them in internal map: + +| SAI attribute | Comment | +|:-----------------------------------------------------:|:-----------------------------------------------:| +|SAI_SWITCH_ATTR_MAX_ACL_ACTION_COUNT | max acl action count | +|SAI_SWITCH_ATTR_ACL_STAGE_INGRESS | list of action types supported on ingress stage | +|SAI_SWITCH_ATTR_ACL_STAGE_EGRESS | list of action types supported on egress stage | + +For those ACL entry attributes which have isenum == true set in sai_attr_metadata_t we will query supported list of actions using ``` sai_query_attribute_enum_values_capability ``` + +E.g for SAI_ACL_ENTRY_ATTR_ACTION_PACKET_ACTION: + +```c++ +status = sai_query_attribute_enum_values_capability(gSwitchId, + SAI_OBJECT_TYPE_SWITCH, + SAI_ACL_ENTRY_ATTR_ACTION_PACKET_ACTION, + &enum_values_capability); +if (status != SAI_STATUS_SUCCESS) +{ + SWSS_LOG_THROW("sai_query_attribute_enum_values_capability failed"); +} +``` + +The above query will return a list of supported actions from ```sai_packet_action_t``` (DROP/FORWARD/COPY/TRAP etc.) + +**NOTE**: sai_query_attribute_enum_values_capability does not return values supported per stage + +**TODO**: sai_query_attribute_enum_values_capability not yet supported by libsairedis implementation + +#### aclorch.cpp + +```c++ +class AclOrch +{ +public: + ... + // return true if action in attr is supported at stage otherwise return false + bool isActionSupported(acl_stage_type_t stage, sai_acl_entry_attr_t attr) const; + ... +private: + + // query SAI_SWITCH_ATTR_ACL_STAGE_INGRESS/SAI_SWITCH_ATTR_STAGE_EGRESS + // will be called from AclOrch::init(); + void queryAclCapabilities(); + + std::map> m_aclStageCapabilities; + std::map> m_aclEnumActionCapabilities; +... +}; +``` + +and in AclRule + +```c++ +class AclRule +{ +public: + ... + // generic validation of ACL action based on m_aclStageCapabilities + // if some of sai_acl_entry_attr_t values in m_actions keys are enums (isenum == true) + // validate based on m_aclEnumActionCapabilities + virtual bool validateAddAction(string attr_name, string attr_value); + ... +}; +``` + +AclRule derivatives will call base class method ```AclRule::validateAddAction```, e.g. AclRuleMirror: + +```c++ +AclRuleMirror::validateAddAction(string attr_name, string attr_value) +{ + ... // + ... // validate + + ... // fill in m_actions map + + return AclRule::validateAddAction(attr_name, attr_value); +} +``` + +### VS test + +Test case 1: + +VS test cases update to check for differnt combinations ingress/egress table and ingress/egress mirror rule creation + +### system level testing + +TBD + +### Switch capability table + +We will put ACL capabilities in state DB table: + +``` +SWITCH_CAPABILITY|switch +``` + +e.g: +``` +127.0.0.1:6379[6]> hgetall "SWITCH_CAPABILITY|switch" +1) "ACL_ACTIONS|INGRESS" +2) "PACKET_ACTION,REDIRECT_ACTION,MIRROR_ACTION_INGRESS" +3) "ACL_ACTIONS|EGRESS" +4) "PACKET_ACTION,MIRROR_ACTION_EGRESS" +... +``` + +For those action keys which are enums we will put queried supported enum values in DB +``` +5) "ACL_ACTION|PACKET_ACTION" +6) "DROP,FORWARD,COPY,TRAP" +``` + +Producer for ACL_RULE table like acl-loader will look at "ACL_ACTIONS|table-stage" to get a list of supported action keys. 
+If key is in the list of supported, it will look if "ACL_ACTION|action-key" exists, if it doesn't exist we cannot validate action value (e.g value is not an enum but object like in redirect or mirror key), otherwise acl-loader gets a list of supported values and checks if value is in the list. + +#### NOTE + +To be consistent with SAI data type 'redirect:' will be moved out from PACKET_ACTION key to own REDIRECT_ACTION key. +Old config like ```"PACKET_ACTION": "redirect:Ethernet8"``` should still work for backward compatibility. + +##### ACL_RULE_TABLE +``` +redirect_action = 1*255VCHAR ; refer to the redirect object +``` + + +### libsairedis + +Implement SAI_ATTR_VALUE_TYPE_ACL_CAPABILITY deserialization as it is missing +**DONE** + +### vslib + +Add support for SAI_SWITCH_ATTR_ACL_STAGE_INGRESS, SAI_SWITCH_ATTR_ACL_STAGE_EGRESS to VS. + +Two options: +1. Return all actions as supported +2. Return per VS simulating specific device. Currently there are MLNX and BRCM + +Second options is harder to maintain: +1. every SAI (mlnx/brcm) update will require to check if list of supported actions were updated and update a VS library correspondingly and also update VS test. +2. VS tests need to check which acl test cases to run for which vs simulating device + +So, we prefer to go with first option to make all actions returned as supported by VS. + +### VS test + +Check negative flow in case action is not supported. + +For ingress and egress tables: + - Set custom SAI_SWITCH_ATTR_ACL_STAGE_$STAGE attribute using setReadOnlyAttribute mechanism in VS test infrastructure and restart orchagent to make it reconstruct its capability map; + - Create ACL rule wich is not supported and verify no entry in ASIC DB; + +### system level testing + +TBD diff --git a/doc/bfd/BFD_Enhancement_HLD.md b/doc/bfd/BFD_Enhancement_HLD.md new file mode 100644 index 0000000000..8c580dd019 --- /dev/null +++ b/doc/bfd/BFD_Enhancement_HLD.md @@ -0,0 +1,453 @@ +# Feature Name +Bidirectional Forwarding Detection +# High Level Design Document +#### Rev 0.2 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 05/15/2019 | Sumit Agarwal | Initial version | +| 1.0 | 18/06/2019 | Sumit Agarwal | Updated community review comments | + +# About this Manual +This document provides general information about the BFD feature implementation in SONiC. +# Scope +This document describes the high level design of BFD, with software implementation.In this implementation, the BFD state machines and session termination happens on the Host CPU, specifically in FRR. + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| BFD | Bidirectional Forwarding Detection | +| BGP | Border Gateway Protocol | +| GR | Graceful Restart | + +# 1 Requirement Overview +## 1.1 Functional Requirements + 1. Support monitoring of forwarding path failure for BGP neighbor. + 2. Support BFD single hop sessions. + 3. Support BFD multi hop sessions. + 4. Support Asynchronous mode of operation. + 5. Support Echo mode of operation. + 6. Support IPv4 address family. + 7. Support IPv6 address family. + 8. 
Support LAG interface.
+ 9. Support ECMP paths for multi hop session.
+ 10. Support FRR container warm reboot.
+ 11. Support 64 BFD sessions.
+ 12. Support minimum timeout interval of 300 milliseconds.
+## 1.2 Configuration and Management Requirements
+BFD will support CLI in the FRR vtysh shell.
+## 1.3 Scalability Requirements
+Support 64 BFD sessions with a timer of 100 * 3 milliseconds, i.e. a detection time of 300 milliseconds.
+## 1.4 Warm Boot Requirements
+BFD should support planned/unplanned restart of the BGP container.
+
+# 2 Functionality
+## 2.1 Target Deployment Use Cases
+BFD supports creation of single hop and multi hop sessions to monitor forwarding path failure.
+Single hop sessions are created for iBGP.
+Multi hop sessions are usually created for protocols like eBGP where the neighbors are multiple hops apart.
+## 2.2 Functional Description
+This document provides the functional design and specification of the BFD protocol as defined in RFC 5880, 5881, 5882 and 5883.
+
+Bidirectional Forwarding Detection (BFD) is a protocol defined by the BFD working group at IETF. The protocol defines a method of rapid detection of the failure of a forwarding path by checking that the next hop router is alive. Depending on the actual configuration, the protocol is able to detect a forwarding path failure in milliseconds. A routing protocol on its own takes a few seconds (from 3 seconds to 180 seconds or even more) to detect that the neighbouring router, the next hop router, is not operational, causing packet loss due to incorrect routing information. BFD is designed to provide a rapid forwarding path failure detection service to a routing protocol within a few milliseconds.
+
+# 3 Design
+
+
+
+![BFD](images/BFD_Block_Diagram.png "Figure 1: BFD in SONiC Architecture")
+
+__Figure 1: BFD in SONiC Architecture__
+
+
+BFD is part of the FRR BGP container in the SONiC system. The BFD daemon communicates with the Linux kernel using a UDP socket to send and receive BFD packets, and relies on the Linux kernel for routing of BFD packets to their destination.
+BFD communicates with applications like BGP through Zebra: BGP sends BFD session create/delete events through Zebra, and in case of a session timeout BFD informs BGP through Zebra.
+## 3.1 Overview
+### 3.1.1 Packet Tx
+In the current FRR BFD implementation, the BFD packet is constructed every time a packet has to be sent. This is an overhead, considering BFD needs to send a packet every few milliseconds. A better approach is to store the BFD packet in memory for each session and keep replaying the packet as per the BFD transmission interval.
+
+**Stored packets are updated in the below circumstances:**
+
+ 1. **Local configuration change:**
+When the BFD timer configuration is changed, the packet stored in memory will be flushed and a new packet will be constructed until the new timer negotiation is complete.
+
+ 2. **Received Rx packet with poll bit set:**
+When an Rx packet is received with the poll bit set, BFD will flush the stored Tx packet and a fresh packet will be sent until the negotiation is complete.
+
+### 3.1.3 LAG support:
+When deploying BFD over a LAG interface it is expected that the BFD session does not flap when a LAG member port flaps. BFD packets are sent over a member port in the LAG based on the hashing in the kernel. When the port on which BFD packets are being sent goes down, BFD packets should seamlessly switch over to the next available member port in the LAG, as decided by hashing in the kernel.
+One BFD session will be created per LAG irrespective of the number of member ports in the LAG. 
RFC 7130 does specify creation of a BFD session for each member port of a LAG, but this will not be implemented.
+
+Supporting LAG is challenging in BFD due to the time it may take for a member port down event to reach the control plane. In SONiC, when a port goes DOWN, the down event has to traverse up to the orchagent and then back to the kernel, which may take considerable time.
+
+In the current SONiC implementation BFD relies on the kernel network stack to switch the BFD packets to the next available active port when a member port in the LAG goes DOWN. The BFD timers in this case are directly proportional to the time it takes for the kernel to get the port down event. The faster the kernel learns that a port is down, the more aggressive the BFD timers can be. In this case it is suggested to configure the BFD timer values to have a timeout value of at least 600 msec.
+
+### 3.1.4 ECMP Support:
+For a BFD multihop session there could be multiple nexthops to reach the destination. It is expected that the BFD session does not flap when an active nexthop goes down; the BFD session should seamlessly switch over to the next available nexthop without bringing down the BFD session.
+
+Supporting ECMP is challenging in BFD due to the time it may take for the control plane to know that the active nexthop went down. In SONiC this information has to traverse all the way down to the kernel after traversing all the DBs, which may take considerable time.
+
+In the current SONiC implementation BFD relies on the kernel network stack to switch the BFD packets to the next available nexthop when an active nexthop goes down. The BFD timers in this case are directly proportional to the time it takes for the kernel to get the nexthop down event. The faster the kernel learns that the active nexthop is down, the more aggressive the BFD timers can be. In this case it is suggested to configure the BFD timer values to have a timeout of at least 600 msec.
+
+## 3.2 CLI
+### 3.2.1 Data Models
+NA
+### 3.2.2 Configuration Commands
+
+Config commands in the FRR BGP container are described at the below link.
+[Config commands](http://docs.frrouting.org/en/latest/bfd.html)
+### 3.2.3 Show Commands
+
+Show commands in the FRR BGP container are described at the below link.
+[Show commands](http://docs.frrouting.org/en/latest/bfd.html)
+**show bfd peer [{multihop|local-address |interface IFNAME ifname|vrf NAME vrfname}]**
+
+This command is enhanced to add the detect-multiplier; the enhanced show output is as below:
+```
+sonic# show bfd peer
+  BFD Peers:
+    peer 10.0.0.103 interface Ethernet204
+      ID: 2
+      Remote ID: 5
+      Status: up
+      Uptime: 30 second(s)
+      Diagnostics: ok
+      Remote diagnostics: ok
+      Local timers:
+        Detect-multiplier: 3
+        Receive interval: 300ms
+        Transmission interval: 300ms
+        Echo transmission interval: disabled
+      Remote timers:
+        Detect-multiplier: 3
+        Receive interval: 200ms
+        Transmission interval: 200ms
+        Echo transmission interval: 0ms
+```
+
+A new show command is added as below:
+
+**show bfd peers brief**
+Show all the BFD peers in brief; sample output is as below.
+This command is available in the FRR vtysh shell.
+
+```
+Session count: 1
+SessionId   LocalAddr     NeighAddr     State
+=========   =========     =========     =====
+1           192.168.0.1   192.168.0.2   UP
+```
+### 3.2.4 Debug Commands
+Debug commands in the FRR BGP container are described at the below link.
+[Debug commands](http://docs.frrouting.org/en/latest/bfd.html)
+
+In the current FRR implementation of BFD, debugs are reported only in error cases. To debug issues it is necessary to have a few debugs in the positive code flow to trace the code flow. 
A few debugs is added in the positive code path and a new command is added to enable/disable debugs in positive code flow. These debugs will be reported only when enabled through this new command. Error debugs will continue to be reported always as in the current FRR BFD implementation. + +This new command will be available in FRR vtysh shell. +Command syntax as below : +**[no] debug bfd** + +### 3.2.5 REST API Support +NA + +# 4 Flow Diagrams +NA + +# 5 Serviceability and Debug +**Configuring BFD** +In order to monitor a forwarding path BFD need triggers from the application to create BFD session with the remote peer. Below are the steps to enable BFD in BGP. + +``` + sonic# + sonic# conf t + sonic(config)# router bgp + sonic(config-router)# neighbor 1.1.1.1 remote-as 7 + sonic(config-router)# neighbor 1.1.1.1 bfd +``` + +Different timer values can be configured to achieve the desired failure detection time. Below configurations can be used to configure BFD timers. + +``` + sonic(config)# bfd + sonic(config-bfd)# peer 1.1.1.1 + sonic(config-bfd-peer)# detect-multiplier 3 + sonic(config-bfd-peer)# receive-interval 200 + sonic(config-bfd-peer)# transmit-interval 200 +``` +**Show Output** +Below output shows BFD session in UP state established for BGP protocol. + +``` + sonic# show bfd peer + BFD Peers: + peer 10.0.0.103 interface Ethernet204 + ID: 2 + Remote ID: 5 + Status: up + Uptime: 30 second(s) + Diagnostics: ok + Remote diagnostics: ok + Local timers: + Receive interval: 300ms + Transmission interval: 300ms + Echo transmission interval: disabled + Remote timers: + Receive interval: 200ms + Transmission interval: 200ms + Echo transmission interval: 0ms +``` +**BFD Counters** +Below output shows BFD counter for particular BFD session +``` +sonic# show bfd peer 192.168.0.1 counters + peer 192.168.0.1 + Control packet input: 126 packets + Control packet output: 247 packets + Echo packet input: 2409 packets + Echo packet output: 2410 packets + Session up events: 1 + Session down events: 0 + Zebra notifications: 4 +``` +# 6 Warm Boot Support +Planned/unplanned warm-boot of a BGP container can be achieved by running BGP in GR mode. + +When BGP container is rebooted, remote BGP neighbour enters GR helper mode when BFD indicate timeout and GR helper mode is enabled. If the flag in BFD control packet indicate that the BFD in remote neighbour is not control plane independent, BFD session down event can be a trigger to BGP to enter helper mode. In GR helper mode remote BGP neighbor should not delete the routes learnt through BGP neighbour and keep the forwarding plane intact until GR timeout. +After warm-boot is completed BGP will re-establish all the sessions and trigger BFD to establish corresponding BFD sessions. + +# 7 BFD packet trapping to CPU + + +BFD packets are classified based on packet pattern defined in RFC 5880. Trapping of Single-Hop and Multi-hop BFD packets for IPv4 and IPv6 address family is supported. + + + +BFD related control plane QoS can be configured by user via 00-copp.config.json file. User can specify which CPU queue a protocol (e.g. BFD) is trapped to and policing parameters to rate limit the protocol packet. 
Below is an example of the BFD protocol trapping COPP (Control Plane Policing) configuration in 00-copp.config.json file: + +``` +{ +"COPP_TABLE:trap.group.bfd": { +"trap_ids": "bfd,bfdv6", +"trap_action":"trap", +"trap_priority":"5", +"queue": "5", +"meter_type":"packets", +"mode":"sr_tcm", +"cir":"60000", +"cbs":"60000", +"red_action":"drop" +}, +"OP": "SET" +} +``` + +# 7 Scalability + - No of sessions: 64 + - Transmit-interval: 100 milliseconds + - Receive-interval: 100 milliseconds + - Detect-multiplier: 3 + +Goal is to support 64 BFD session with timer of 100 * 3 milliseconds, i.e. minimum detection time of 300 milliseconds. +These timer values are subject to revision based on the BFD sessiom stability observed during multi-dimensional scale test. + +# 8 Enable/Disable BFD daemon +BFD daemon can be enabled and disabled at compile time as well as in the switch. + +**Enable/Disable at Compile time** + +To enable BFD add the below text in corresponding files as below: + +**../dockers/docker-fpm-frr/supervisord.conf** +``` +[program:bfdd] +command=/usr/lib/frr/bfdd -A 127.0.0.1 +priority=4 +stopsignal=KILL +autostart=false +autorestart=false +startsecs=0 +stdout_logfile=syslog +stderr_logfile=syslog +``` + +**../dockers/docker-fpm-frr/start.sh** +``` +supervisorctl start bfdd +``` + +**Enable/Disable in the switch** +BFD daemon can be enabled/disabled in the same way as done during compile time. The file path on the switch are as below. After modifying these files BGP container should be restarted. + +./etc/supervisor/conf.d/supervisord.conf +./usr/bin/start.sh + +# 9 Unit Test +Unit test cases for this specification are as listed below: + +|**Test-Case ID**|**Test Title**|**Test Scenario**| +|----------------|--------------|-----------------| +| | BFD for BGP IPv4 single-hop | +1| | Verify BFD session establishment.| +2| | Verify BFD packet transmission as per configured interval| +3| | Verify BFD state notification to BGP +4| | Verify BFD session establishment trigger from BGP +5| | Verify change of BFD transmission configuration at run time. +6| | Verify deletion of BFD session by BGP config +7| | Verify timeout of BGP session. +8| | Verify timeout notification to BGP. +9| | Verify deletion of BFD session after timeout. | + ||BFD for BGP IPv4 single-hop over LAG | +10| | Verify BFD session establishment. +11||Verify session flap on LAG down +12||Verify session flap on LAG member down. +13||Verify session flap on LAG member UP. + ||BFD for BGP IPv4 multi-hop | +14| | Verify BFD session establishment. +15| |Verify BFD packet transmission as per configured interval +16| |Verify BFD state notification to BGP +17| |Verify BFD session establishment trigger from BGP +18| |Verify change of BFD transmission configuration at run time. +19| |Verify deletion of BFD session by BGP config. +20| |Verify timeout of BGP session. +21| |Verify timeout notification to BGP. +22||Verify deletion of BFD session after timeout.Verify BFD session establishment. + ||BFD for BGP IPv4 Multi-hop over LAG| +23| |Verify BFD session establishment. +24| |Verify session flap on LAG down +25| |Verify session flap on LAG member down. +26| |Verify session flap on LAG member UP. + ||BFD for BGP IPv4 Multi-hop with ECMP| +27| | Verify BFD session establishment for BGP neighbour having ECMP paths. +28| |Verify BFD session switch to next available ECMP path on active path DOWN +29| |Verify BFD session timeout on all ECMP path DOWN. +30| |Verify BFD session timeout when an intermediate path is DOWN. 
+ ||BFD for BGP IPv6 multi-hop| +31| |Verify BFD session establishment with link-local address. +32| |Verify BFD session establishment with global address. +33| |Verify BFD packet transmission as per configured interval +34| |Verify BFD state notification to BGP +35| |Verify BFD session establishment trigger from BGP +36| |Verify change of BFD transmission configuration at run time. +37| |Verify deletion of BFD session by BGP config. +38| |Verify timeout of BGP session. +39| |Verify timeout notification to BGP. +40| |Verify deletion of BFD session after timeout. + ||BFD for BGP IPv6 Multi-hop over LAG| +41| |Verify BFD session establishment. +42| |Verify session flap on LAG down +43| |Verify session flap on LAG member down. +44| |Verify session flap on LAG member UP. + ||BFD for BGP IPv6 Multi-hop with ECMP| +45| |Verify BFD session establishment for BGP neighbour having ECMP paths. +46| |Verify BFD session switch to next available ECMP path on active path DOWN +47| |Verify BFD session timeout on all ECMP path DOWN. +48| |Verify BFD session timeout when a intermediate path is DOWN. + ||BFD CLI| +49| |Verify CLI to cofigure BFD for BGP +50| |Verify CLI to configure transmit interval +51| |Verify CLI to configure receive interval +52| |Verify CLI to configure detection multiplier +53| |Verify CLI to configure echo multiplier +54| |Verify CLI to enable echo mode +55| |Verify CLI to disable echo mode +56| |Verify CLI to shutdown BFD peer. +57| |Verify CLI to configure static IPv4 single hop peer. +58| |Verify CLI to configure static IPv4 multi hop peer. +59| |Verify CLI to configure static IPv4 single hop peer with local address +60| |Verify CLI to configure static IPv4 single hop peer with interface. +61| |Verify CLI to configure static IPv4 single hop peer. +62| |Verify CLI to configure static IPv4 multi hop peer. +63| |Verify CLI to configure static IPv4 single hop peer with local address +64| |Verify CLI to configure static IPv4 single hop peer with interface. +65| |Verify CLI to configure static IPv6 single hop peer. +66| |Verify CLI to configure static IPv6 multi hop peer. +67| |Verify CLI to configure static IPv6 single hop peer with local address +68| |Verify CLI to configure static IPv6 single hop peer with interface. +69| |Verify CLI to configure static IPv6 single hop peer. +70| |Verify CLI to configure static IPv6 multi hop peer. +71| |Verify CLI to configure static IPv6 single hop peer with local address +72| |Verify CLI to configure static IPv6 single hop peer with interface. +73| |Verify CLI to configure static IPv4 single hop peer. +74| |Verify CLI to un-configure static IPv4 multi hop peer. +75| |Verify CLI to un-configure static IPv4 single hop peer with local address +76| |Verify CLI to un-configure static IPv4 single hop peer with interface. +77| |Verify CLI to un-configure static IPv4 single hop peer. +78| |Verify CLI to un-configure static IPv4 multi hop peer. +79| |Verify CLI to un-configure static IPv4 single hop peer with local address +80| |Verify CLI to un-configure static IPv4 single hop peer with interface. +81| |Verify CLI to un-configure static IPv6 single hop peer. +82| |Verify CLI to un-configure static IPv6 multi hop peer. +83| |Verify CLI to un-configure static IPv6 single hop peer with local address +84| |Verify CLI to un-configure static IPv6 single hop peer with interface. +85| |Verify CLI to un-configure static IPv6 single hop peer. +86| |Verify CLI to un-configure static IPv6 multi hop peer. 
+87| |Verify CLI to un-configure static IPv6 single hop peer with local address
+88| |Verify CLI to un-configure static IPv6 single hop peer with interface.
+89| |Verify CLI to display IPv4 Peer.
+90| |Verify CLI to display IPv4 peer with local address
+91| |Verify CLI to display IPv4 peer with interface.
+92| |Verify CLI to display IPv6 Peer.
+93| |Verify CLI to display IPv6 peer with local address
+94| |Verify CLI to display IPv6 peer with interface.
+95| |Verify CLI to display IPv4 multihop Peer.
+96| |Verify CLI to display IPv6 multihop Peer.
+97| |Verify config save and reload of BFD configuration.
+98| |Verify unsaved config loss after reload.
+ ||BFD static peer|
+99| |Verify BFD static IPv4 single hop peer establishment.
+100| |Verify BFD static IPv4 multi hop peer establishment.
+101| |Verify BFD static IPv4 single hop peer establishment with local address
+102| |Verify BFD static IPv4 single hop peer with interface.
+103| |Verify BFD static IPv6 single hop peer establishment.
+104| |Verify BFD static IPv6 multi hop peer establishment.
+105| |Verify BFD static IPv6 single hop peer establishment with local address
+106| |Verify BFD static IPv6 single hop peer with interface.
+ ||BFD echo mode|
+107| |Verify BFD session with echo mode.
+108| |Verify BFD echo mode packet transmission as per configured interval
+109| |Verify echo mode timeout of BGP session.
+110| |Verify echo mode timeout notification to BGP.
+111| |Verify echo mode deletion of BFD session after timeout.
+ ||BFD scale|
+112| |Verify 64 BFD IPv4 single hop sessions with 100 * 3 msec timer
+113| |Verify 64 BFD IPv4 multi hop sessions with 200 * 3 msec timer
+114| |Verify 64 BFD IPv6 multi hop sessions with 200 * 3 msec timer
+115| |Verify 64 BFD IPv6 single hop sessions with 100 * 3 msec timer
+116| |Verify LAG member down with 64 BFD IPv4 single hop sessions with 100 * 3 msec timer
+117| |Verify LAG member down with 64 BFD IPv4 multi hop sessions with 200 * 3 msec timer
+118| |Verify LAG member down with 64 BFD IPv6 multi hop sessions with 200 * 3 msec timer
+119| |Verify LAG member down with 64 BFD IPv6 single hop sessions with 100 * 3 msec timer
+120| |Verify active ECMP path down with 64 IPv4 multi hop sessions.
+121| |Verify active ECMP path down with 64 IPv6 multi hop sessions.
+122| |Verify echo mode with 64 IPv4 single hop sessions.
+123| |Verify echo mode with 64 IPv6 single hop sessions.
+124| |Verify 64 IPv4 single-hop static BFD sessions.
+125| |Verify 64 IPv4 multi-hop static BFD sessions.
+126| |Verify 64 IPv6 single-hop static BFD sessions.
+127| |Verify 64 IPv6 multi-hop static BFD sessions.
+ ||BFD LOG|
+128| |Verify log generation on BFD session UP.
+129| |Verify log generation on BFD session DOWN.
+130| |Verify log generation on BFD session DOWN with DOWN reason ADMIN_DOWN.
+131| |Verify log generation on BFD session DOWN with DOWN reason "detection timeout"
+132| |Verify log generation on BFD session DOWN with DOWN reason "neighbour signalled session down"
+ ||BFD INTEROPERABILITY|
+133| |Verify BFD IPv4 single-hop session establishment with third party device.
+134| |Verify BFD IPv6 single-hop session establishment with third party device.
+135| |Verify BFD IPv4 multi-hop session establishment with third party device.
+136| |Verify BFD IPv6 multi-hop session establishment with third party device.
+137| |Verify run time timer config change for BFD sessions with third party device. 
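+
+For illustration only, the following is a minimal, hypothetical sketch of how a session-establishment case such as test case 1 could be automated, assuming the check runs on the DUT where FRR bfdd is enabled and ```vtysh -c``` is available; the peer address and timeout are placeholders, and the parsed row format follows the ```show bfd peers brief``` sample shown earlier in this document.
+
+```python
+#!/usr/bin/env python
+# Hypothetical helper for "Verify BFD session establishment":
+# poll "show bfd peers brief" until the given peer reports state UP.
+import subprocess
+import time
+
+def wait_bfd_session_up(peer_ip, timeout=10):
+    """Return True once the peer appears with state UP, False on timeout."""
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        out = subprocess.check_output(
+            ["vtysh", "-c", "show bfd peers brief"]).decode()
+        for line in out.splitlines():
+            fields = line.split()
+            # Expected row format: SessionId LocalAddr NeighAddr State
+            if len(fields) == 4 and fields[2] == peer_ip and fields[3] == "UP":
+                return True
+        time.sleep(1)
+    return False
+
+if __name__ == "__main__":
+    assert wait_bfd_session_up("192.168.0.2"), "BFD session did not come up"
+```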
+ + diff --git a/doc/bfd/images/BFD_Block_Diagram.png b/doc/bfd/images/BFD_Block_Diagram.png new file mode 100644 index 0000000000..2ff9406c8b Binary files /dev/null and b/doc/bfd/images/BFD_Block_Diagram.png differ diff --git a/doc/bfd/images/BFD_Block_Diagram.xml b/doc/bfd/images/BFD_Block_Diagram.xml new file mode 100644 index 0000000000..80ca5b71a1 --- /dev/null +++ b/doc/bfd/images/BFD_Block_Diagram.xml @@ -0,0 +1,2 @@ + +3Vldc9soFP01foxHAn35sXGa7LTpNttkpu2+7GCBLSZIeBCOpf31CxGyhZDjeCt3u3mKOFxfiXMP917IBM7z6kagdfaJY8ImwMPVBF5NAPB9D6o/GqkbJIpnDbASFBujPXBP/yYG9Ay6oZiUlqHknEm6tsGUFwVJpYUhIfjWNltyZr91jVbEAe5TxFz0K8Uya9AExHv8N0JXWftmPzLry1FrbFZSZgjzbQeC7ydwLjiXzVNezQnT5LW8NL+7PjC7+zBBCvmaH3z8EH+4fbgkj1X4x0Nef56BOb4wXp4Q25gFfyGYlhclEU9EmA+XdcuGWsNaP6Y1owVWBvBym1FJ7tco1fhWiUBhmcyZGvnqccE3yhDfLnYASh9XQqOfN1J5IQZfUsbmnHHx/Ca4TFKSpgovpeCPpDOzSMIg9NSMS0C7GiIkqTqQIeSG8JxIUSsTMxu1MjPqBIkZb/ex9j2DZd04hwZERl+rne99CNSDicIJEQFORJwoiIZTQ9yRAGBUZjvbESgDsU2ZD1zKgnCIMnAuyqBD2S0tNpWCfidyy8XjM31Kdz/GZE+hOCQJDoYUmoAFjKLz0B0MKDSIBuiOzsV2eIpAvZNpXS5JNLzxcTxbeCNtfDDrqRi6tM4GWA3OxWrksHp5ffUWiB3Q608lNnaI/ZMsBHKoVa5UT0F+UVoDkPT0mriFCgzwmpyL18ThdbnOy7pI8f+M2tBJBfF/TG3blLyUYkmB3+n2VjdjDJUlbXhCQrpwh2FFk6i/6fwxhVHcAt+7s1eVSS/NqO6O7oigao268WvAisrGmw8iM9bOLryp58cG2DvUg7oz6Ls7GM2Sb0RKjrcAav0rIo+nWYKtrt/VRif2Q71MiwnCkKRP9llhSA/mDXecqpUdLO67drR10azb/Krb2/ccQWA7CvvVvyHGcfQsz92yf0CxbjZwFNts4vYQBWxdtgeLvFrpc+R0yfg2zZSap6gouFQk8+IvbYgYXRXKkJGl1N0tFerUpyZ1PiGlHCclwKSXEsKhY8FhWYx/KHBb3JcywoJx1ez28kELKrtrynbEa5POuBOTg0Sesnv8aJytAg/1FadulQAecXTmrQJe0T+/7VCG4UihdCr3K0OpaER1x2ytDcoXPrh/5E2smxb10HgcVyezn9EEJLFvNQFTT+nzXzQCBwV2tCKDX6oiRxGwQu00ea/VZtTTpnPAOXOage490pdKeXqoHBWpoih71djqtwuuL+ms5txAbTFOVdSfLwN1iaUpYu/MRE4xZocaf/swO0LZBnFocQ5h6JTtYEA757uYcsv25c3d2w0A9O0A+IHbN40UADXcX503m2b/Dwj4/h8= \ No newline at end of file diff --git a/doc/bgp_error_handling/BGP_Route_Error_Handling_Arlo.md b/doc/bgp_error_handling/BGP_Route_Error_Handling_Arlo.md new file mode 100644 index 0000000000..99c5361093 --- /dev/null +++ b/doc/bgp_error_handling/BGP_Route_Error_Handling_Arlo.md @@ -0,0 +1,251 @@ + + + + +# BGP Route Install Error Handling +# High Level Design Document +#### Rev 0.1 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 05/07/2019 | Sudhanshu Kumar | Initial version | + +# About this Manual +This document provides information about how to handle the "route add failure in hardware" related errors in BGP in SONIC. +# Scope +This document describes the high level design of BGP route install error handling feature. Implementation for warm reboot and GR for BGP is out of scope for this feature. When route installation fails in hardware due to table full, BGP may retry again when some routes get deleted. This Retry mechanism in BGP for failed routes will not be implemented in this release. 
+ +# Definition/Abbreviation +### Table 1: Abbreviations + +| **Term** | ***Meaning*** | +|-------------------|-------------------------| +| BGP | Border Gateway Protocol | +| GR | Graceful Restart | +| SONIC | Software for Open Networking in the Cloud | +| FRR | FRRouting | +| FPM | Forwarding Plane Manager | +| SwSS | SONiC Switch State Service | +# 1 Requirement Overview + When BGP learns a prefix, it sends it to route table manager(Zebra). The routes are installed in kernel and sent to APP_DB via fpmsyncd. +The Orchagent reads the route from APP_DB, creates new resources like nexthop or nexthop group Id and installs the route in ASIC_DB. The syncd triggers the appropriate SAI API and route is installed in hardware. The CRM manages the count of critical resources allocated by orchagent through SAI API. +Due to resource allocations failures in hardware, SAI API calls can fail and these failures should be notified to Zebra and BGP. +On learning the prefix, BGP can immediately advertise the prefix to its neighbors. However, if the error-handling feature is enabled, BGP waits for success notification from hardware before advertising the same to its peers. If the hardware returns error, the routes are not advertised to the peers. + +## 1.1 Functional Requirements + + + + 1. BGP should not advertise the routes which have failed to be installed in hardware. + 1. BGP should mark the routes which are not installed in hardware as "FIB-install pending" routes in its RIB-IN table. + 1. Zebra should mark the routes which are not successfully installed in hardware as failed routes. + +## 1.2 Configuration and Management Requirements +## 1.3 Scalability Requirements +## 1.4 Warm Boot Requirements + There is no change needed in BGP warm reboot for supporting this feature. +# 2 Functionality +Refer to section 1 + +## 2.1 Target Deployment Use Cases + +## 2.2 Functional Description +Refer to section 1.1 + +# 3 Design +## 3.1 Overview +On enabling the error-handling feature, fpmsyncd subscribes to the changes in the ERROR_ROUTE_TABLE entries. Whenever the error status in ERROR_ROUTE_TABLE is updated, fpmsyncd is notified. It then sends a message to Zebra's routing table to take appropriate action. +Zebra should lookup the route and mark it as not installed in hardware. It should create a route-netlink message to withdraw this state in kernel. Also, it sends message to the source protocol for the route (BGP). +BGP marks the route as not installed in hardware and does not advertise the route to its peers. It should mark the route in RIB-IN as not installed in hardware and remove it from RIB-OUT list, if any. +For ECMP case, BGP sends route with list of nexthops to Zebra for programming. In fpmsyncd, as per the route table schema, the route is received with a list of nexthops. If the nexthop group programming fails, it is treated as route add failure in BGP. +If the error-handling feature is disabled, fpmsyncd does not receive any notification from ERROR_ROUTE_TABLE. +## 3.2 DB Changes +### 3.2.1 CONFIG DB +A new table BGP_ERROR_CFG_TABLE has been introduced in CONFIG DB (refer section 3.7). +### 3.2.2 APP DB +### 3.2.3 STATE DB +### 3.2.4 ASIC DB +### 3.2.5 COUNTER DB + +## 3.3 FRRouting Design +### 3.3.1 Zebra changes +Zebra, on receiving the message containing route install success, will notify BGP so that it can advertise the route to its peers. +Zebra, on receiving the message containing failed route notification, will withdraw the route from kernel. 
It will also mark the route with flag as "Not installed in hardware" and store the route. It will not send the next best route to fpmsyncd. At this stage, route is present in Zebra. It will NOT notify BGP of the route add failure. + +### 3.3.2 BGP changes +When BGP learns a route, it marks the route as "pending FIB install" and sends the route to Zebra. The route may or may not be successfully installed in hardware. On receiving route add sucess notification message, BGP will remove the "pending FIB install" flag and advertise the route to its peers. + +In case user wants to retry the installation of failed routes, he/she can issue the command in Zebra. The command will reprogram the failed route in kernel and send that route to hardware. If the route is successfully programmed in hardware, it will notify Zebra. Zebra will, in turn, notify BGP and route will be advertised to its neighbors. + +## 3.4 SwSS Design + +### 3.4.1 fpmsyncd changes +A new class is added in fpmsyncd to subscribe to ERROR_ROUTE_TABLE present inside the ERROR_DB. Subscription to this table is sufficient to handle the errors in route installation. +Currently, fpmsyncd has a TCP socket with Zebra listening on FPM_DEFAULT_PORT. This socket is used by Zebra to send route add/delete related messages to fpmsyncd. We will reuse the same socket to send information back to Zebra. +fpmsyncd will convert the ERROR_ROUTE_TABLE entry to Zebra common header format and send the message. Zebra will send a delete route message to clean the route from APP_DB so that OrchAgent can process it. If processing this results in a further error, then fpmsyncd silently ignores this. + +## 3.5 SyncD + +## 3.6 SAI + + +## 3.7 CLI +### 3.7.1 Data Models +A new table is added in CONFIG_DB to enable and disable error_handling feature. +BGP_ERROR_CFG_TABLE +``` +key = BGP_ERROR_CFG_TABLE:config +``` +The above key has field-value pair as {"enable", "true"/"false"} based on configuration. +### 3.7.2 Configuration Commands +A command is provided in SONIC to enable or disable this feature. + +``` +root@sonic:~# config bgp error-handling --help +Usage: config bgp error-handling [OPTIONS] COMMAND [ARGS]... + + Handle BGP route install errors + +Options: + --help Show this message and exit. + +Commands: + disable Administratively Disable BGP error-handling + enable Administratively Enable BGP error handling + ``` + When the error-handling is disabled, fpmsyncd will not subcribe to any notification from ERROR_ROUTE_TABLE. By default, the error-handling feature is disabled. During system reload, config replay for this feature is possible when the docker routing config mode is unified or split. + This feature can be turned off on demand. But it can affect the system stability. When the config was turned on, there may be some routes in BGP, for which, it is waiting for update from hardware. When the feature is turned off, we will unsubscribe from ERROR_DB and will no longer receive any notifications from hardware. Hence, some of the routes may not receive any notification from hardware. +It is recommended to restart the BGP docker when the config state is changed to disable from enable. By default, this config is disabled. If the config is changed from disable to enable, we do not need to restart the docker. But the feature will be affecting only those routes which will be learnt after enabling the feature. 
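+
+For illustration only, the following is a minimal sketch of what the enable/disable command could write to CONFIG_DB, assuming the sonic-py-swsssdk ```ConfigDBConnector``` is used by the CLI handler; the table name, key and field follow the BGP_ERROR_CFG_TABLE schema above, and the helper name is hypothetical.
+
+```python
+#!/usr/bin/env python
+# Hypothetical sketch: write BGP_ERROR_CFG_TABLE|config with enable=true/false.
+from swsssdk import ConfigDBConnector
+
+def set_bgp_error_handling(enable):
+    """Persist the error-handling flag in CONFIG_DB."""
+    config_db = ConfigDBConnector()
+    config_db.connect()
+    config_db.set_entry('BGP_ERROR_CFG_TABLE', 'config',
+                        {'enable': 'true' if enable else 'false'})
+
+if __name__ == '__main__':
+    # Roughly what "config bgp error-handling enable" would write.
+    set_bgp_error_handling(True)
+```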
+ +### 3.7.3 Show Commands +``` +sonic(config-router-af)# do show bgp ipv4 unicast +BGP table version is 1, local router ID is 10.1.0.1, vrf id 0 +Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,# FIB install pending. + i internal, r RIB-failure, S Stale, R Removed +Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self +Origin codes: i - IGP, e - EGP, ? - incomplete + + Network Next Hop Metric LocPrf Weight Path +*># 21.21.21.21/32 4.1.1.2 0 0 101 ? + +Displayed 1 routes and 1 total paths + ``` + + + + +``` +sonic(config-router-af)# do show ip route +Codes: K - kernel route, C - connected, S - static, R - RIP, + O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, + T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, + F - PBR, + > - selected route, * - FIB route, # - Not installed in hardware + +K>* 0.0.0.0/0 [0/0] via 10.59.128.1, eth0, 09:44:37 +C>* 4.1.1.0/24 is directly connected, Ethernet4, 00:01:48 +C>* 10.1.0.1/32 is directly connected, lo, 09:44:37 +C>* 10.59.128.0/20 is directly connected, eth0, 09:44:37 +B># 21.21.21.21/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:07 +``` +A new command has been introduced for seeing the failed routes as follows + show {ip | ipv6} route not-installed [prefix/mask] +``` +sonic# show ip route not-installed +Codes: K - kernel route, C - connected, S - static, R - RIP, + O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, + T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, + F - PBR, + > - selected route, * - FIB route # - not installed in hardware +B> # 22.1.1.1/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 22.1.1.2/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.1/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.2/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.3/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.4/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.5/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.6/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.7/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +B> # 30.1.1.8/32 [20/0] via 4.1.1.2, Ethernet4, 00:00:20 +``` +### 3.7.4 Debug Commands +In order to retry the installation of failed routes from Zebra, a clear command has been provided. + clear {ip | ipv6} route {not-installed | } + ``` +sonic# clear ip route + not-installed not installed in hardware +sonic# clear ip route not-installed + + A.B.C.D/M ipv4 prefix with mask + X:X::X:X/M ipv6 prefix with mask +``` +The above command will send route add message for the failed route from Zebra to fpmsyncd. + +### 3.7.5 REST API Support + +# 4 Flow Diagrams + ![BGP](images/bgp_error_handling_flow1.png "Figure 1: High level module interaction for route install error notification") + +__Figure 1: High level module interaction for route install error notification__ + +![BGP](images/bgp_error_handling_flow2.png "Figure 2: Module flow for route add error notification") + +__Figure 2: Module flow for route add error notification__ + +![BGP](images/bgp_error_handling_flow3.png "Figure 3: Module flow for route install success notification") + +__Figure 3: Module flow for route add success notification__ + ![BGP](images/bgp_error_handling_flow4.png "Figure 4: Module flow for route delete success/fail notification") + +__Figure 4: Module flow for route delete success/fail notification__ + +# 5 Serviceability and Debug + + +# 6 Warm Boot Support +There are two scenarios here. 
One is warm-reboot case and another is unplanned reboot (like bgp restart or docker restart due to cold reboot). Note that in current sonic code, we don't retain routes learnt by Zebra across +warm-reboot. +During warm reboot, fpmsyncd supports syncing of existing routes in APP_DB with newly learnt routes. After warm reboot of BGP docker, BGP sends newly learnt routes to fpmsyncd. Since warm reboot is enabled, fpmsyncd will mark the existing db routes and send only the newly learnt routes to APP_DB. If BGP error-handling is enabled, for the routes which were same as before, we will send an implicit positive ACK to Zebra. +Before warm reboot, suppose, we had 5 routes sent by BGP to fpmsynd, out of them., 5th route failed to be installed in hardware. Zebra will delete the 5th route from kernel, APP_DB, ASIC_DB and ERROR_DB. After warm reboot, when fpmsyncd receives the 5 routes again from BGP, it will send add for the 5th route again (since 5th route was never present in APP_DB prior to warm reboot). +During unplanned reboot, for the same 5 routes, suppose, fpmsyncd crashed before processing the add failure for the 5th route. Now, all the 5 routes are present in APP_DB and ASIC_DB. Since +the 5th route failed to be installed in hardware, it is present in ERROR_DB. When docker comes up again, orchagent will send the notification for the 5th route failure to fpmsyncd. fpmsyncd will send this message to Zebra. Zebra cleans the route from APP_DB and ASIC_DB. After the warm reboot timer expires, fpmsyncd will +receive the implicit ACK for the remaining routes (routes which did not change during warm-reboot). If any route changes during warm-reboot, the route will be sent to orchagent for installation in hardware and ACK will take its normal flow. + +# 7 Scalability + +# 8 Unit Test + +The UT testcases are as follows: + +|**Test-Case ID**|**Description**|**Status**|**Comments**| +|----------------|---------------|----------|------------| +1 | Send an iBGP route from the traffic generator and see that route is learnt in BGP. | | Check the route in zebra. fpmsyncd should send route to APP_DB. Check APP_DB. Check that orchagent should send this route to ASIC_DB. Check that syncd should send the route to ASIC. | + 2 | Send an eBGP route from the traffic generator and see that route is learnt in BGP. | | | + 3 | Install a route and check that error status is present in "show ip route" in zebra. | | | + 4 | Install a route and check that route is present in kernel. | | | + 5 | Install a route and check that route is present in APP_DB and ASIC_DB. | | | + 6 | Execute the command "show bgp ipv4". Check that the error status (installed in hardware flag) is shown as 0. | | | + 7 | Check that routes with installed flag as TRUE is sent to eBGP peers. Also, BGP ribout list should have this route. | | | + 8 | Check that routes with installed flag as TRUE is sent to iBGP peers. (Rules for iBGP and route reflector will apply). | | | + 9 | Send an iBGP route from the traffic generator and see that route is learnt in BGP. But this route is not installed in ASIC_DB. Check that error status is correctly shown in APP_DB. Please note that route will be present in BGP/Zebra/APP_DB. | | Check the route in zebra. fpmsyncd should send route to APP_DB. Check APP_DB. Check that orchagent should send this route to ASIC_DB. Check that syncd should send the route to ASIC. | + 10 | Send an eBGP route from the traffic generator and see that route is learnt in BGP. But this route is not installed in ASIC_DB. 
Check that error status is correctly shown in APP_DB. Check different kinds of errors like nexthop group-id add failed, nexthop add failed, route add failed, route table/nexthop table is full etc. | | | + 11 | Install a route and check that route is present in zebra, but not in ASIC_DB and kernel. | | | + 12 | Install a route and check that route is not present in APP_DB and ASIC_DB. (Can be due to send error from fpmsyncd). | | | + 13 | Execute the command "show bgp ipv4". Check that the error status (installed flag) is shown as some non-zero value. Also check the rib failure flag. In case of error, check if BGP ribout message does not contain this route. | | | + 14 | Check that routes with installed flag as FALSE is not sent to eBGP peers. Also, BGP ribout list should not have this route. If route has already been sent by BGP, it should be withdrawn. | | | + 15 | Check that routes with installed flag as FALSE is not sent to iBGP peers. (Rules for iBGP and route reflector will apply). If route has already been sent by BGP, it should be withdrawn. | | | + 16 | Send a route. Execute "show bgp ipv4". Check that this command shows the error status (installed flag). See that it shows error. Also, check the rib-failure flag. | | | +17 | Send a route. Execute "show bgp ipv4". Check that this command shows the error status (installed flag). See that it shows error. Also, check the rib-failure flag. Now,send the same route again so that it in installed successfully. Check all the flags. | | | +# 9 Internal Design Information \ No newline at end of file diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow1.png b/doc/bgp_error_handling/images/bgp_error_handling_flow1.png new file mode 100644 index 0000000000..810a9ca6bf Binary files /dev/null and b/doc/bgp_error_handling/images/bgp_error_handling_flow1.png differ diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow1.xml b/doc/bgp_error_handling/images/bgp_error_handling_flow1.xml new file mode 100644 index 0000000000..bbeb117c54 --- /dev/null +++ b/doc/bgp_error_handling/images/bgp_error_handling_flow1.xml @@ -0,0 +1,2 @@ + 
+tLzHruQ60y34NN+wAXkzlE9573Im771JSU/f4j7n63t/XDTQk65C7cxNSRRFRqxYK4Kq/6DccElrMtf6lBf9fxAov/6D8v9BEBilsfcDtNz/tJAw/k9DtTb5vyf9rwa3eYp/G6F/W48mL7b/ceI+Tf3ezP+zMZvGscj2/9GWrOv0+5+nlVP/P+86J1XxfzS4WdL/n61hk+/1P60UQv6v9k/RVPV/7wwT9D9HhuS/J//7JFud5NPvf2tChf+g3DpN+z/fhosrejB5/52Xf64T/1+O/j8DW4tx//9yAcZDNC642//Fh623+OZ6Mt7/9W8vZ9If/z7wv4Pd7//OwDvuGXxthr+pYs9i3Zt3grQkLXpr2pq9mcb3eDrt+zS8J/TgAJtkXbVOx5hzUz+tf12h5d+f/60Ppm8qcO0+zW9rss3/LGHZXMU7avbvlsx/W6H/trzf82RP/oMy//yKiPNY/QfhmoA1nR+kStXEvH8M168Fv3q/6eCHmHNM/H6yUs1pyfulYoResAMHY4qbZnEnu9Bx/A/CksecvX2WEC67mf1eIgmsquj+5C/K1HyVyU3j/sa/XXFFIbRKPZ0SaEHjJHroVYhs+GqUq2wcDXP5s0vIk+43ncQInMKrTuVLya9wv+NMmgYJeztRfBiE+f1M77AD+FdI4NbhOwxpLen3UySWY1mJyYfRIMc53+mYRm8whiD7E/cbuRJKxu4Y12F9odHfu7yXVl4p7vfoqTjzIJxaDX0tse/yBBFWfouPLTUlGcH8V9NFQ4B64dS8kEX2UHxMurqv96YZ6iFKIk7NFAwY+RX264qs39v15/33Ho+3SZMKo9HwU+1jMukGOQeHsdS1tKhVJTT//a4h1dTBplh7Yd5rVkMPeVSjYd7fUoww3bGgLep8jg71ID43va0bR81zgtldKt85nSOyPH9NjvB5lrOXi7zGiBunqfJ5e7uGcf6N/uQUYMm6QHQlllzm5bc6hW6oGk28a9akM7nGReLy9IwJbHZy83v2BsFUGu+a6gx0AU3DmKhqdZIS2jZ5mZg8Jje1yBiYii1cg/yoobaMZJZjHCGLD58T10CQs0jy/TUOb3d0Id88avsZHFPvwARGEVnnNHEEuaOgM/OarOcSBpb1jcj1LJtPjGwPS2/IrRQ+VbpBsf80zdLnL0xSgZdjlut8GHWE+sBhjR7dpm18wQQRe0d4ewnlbP+gddS832H2B6WHe36urZLj6HUy0ayP5/sjgFeEdP2cuEEXliBZomskEmR5hlM9CVuYatscLrwtd56/HRWz/d4DGVQisknIlQ3PrZZYmdazKFm8aQR1WL5tk2LlgMjl6gGva6nf845e1L8vQjSqSGpy1taIIleKo+pnfNy4KUDUKjecpCXGTmNkBQwB3RGTiI/2va2zKU50CXkGay3XPtXp4/fbrGDGdpbWJ4zHuLSy1H6uL5vYeDQZZGWug2ldmby+JzKqHvO1+n2hj+V8GuUvZY8EZvpYoarjMCZgDCVQZcd/zHcmwGK9/075BSFELJj90lWWo8zwvGKBOQ2p0z81L1sKb8K0ay9D5D5TpbbdMBzNbjhS4+f8Z6st0nskgdxJX1I0zuqhQAjEXGXz6/rAB1oo+UP8GempibFGFVYGfEoTk6kN6kOpYhklBc/hhUeNmC2p9zB+7wueu3A9Gv04tm+vQkKPX4T6Qn+rKqck2c+e2t5ZIlS6mQlEMVvbmQvK/eeSLJIN7PazJmulULu3OVjajJXRIg3DH5T4htmSGjEBlvpH1Iedl3Dcbb+0Ego/9n02gMTHyTpJ7YokFFWDW0X9SI+janMdfvLKFKxI6bIJxvNcgMQIc+nUQzuqidnz+RCBh2084w09x5LtGj3kTWUYzKLtwHTe8Dhh7fSOTpVXYF2BKvKN+0Gsz+DbYxLH6m0uXLqknwK6nBouT4lX4Unsg5pW8fWg4CVpcER/p8GFE7n3H199v3NkAJwC63tZ+FUCwS7caxIijCkAiejUhdbwmI68+b6twcNwGSTbCPp9GQk7QV75OVP1qwmL1M9UYM8f/uMXlvib3sOHAcVFY8ROs3+bb+ypStTKfBS7iwSLwoX2uvP2ubukHLyfvA95awxPpvudrOxLjC2vbzb5sbTjgIm5j2TrTL1T6QosywlixSkxhS9Guz1SkVR7oUfB5EzthibhKS0rAMZs861v879QAbjhr32w1I521pzveXz0Nv3Gwoxot8Y/1CAvbISpiNsSuinGfBy/Y3KUcsbIT3S0Ywq6EFCURL6F4EB0XSJ5MMRi2rqB6WLlVl4GgZFMsWHpkCVk0I23IT1n+aQPRjscb7sCD5yNZdGBcuQRlQ6irZN9SL2SytSUiHLOyS3K9jUwtXxexlXAn4YWVgD29bIlH4hBBGYlvu2qbr/a4nFVU4/J1sdlWXWDEM4iH2cFSwk6J+mTGCcZUjLth7kDLrJe8OKuLkLLz1WUyKAAzFrnMRoq5NJQIml1P5x6HTaCqZhPQQdOK9Df2xQdvZkWe6kLHn2NNfBoWqnftTAJEn6NgjV70f8Al7BSU7dl10L3t2emAD9je1lsYHLWZtGrtVHFo7CofV0OgqCiE3zRAW/aE2Xr3ocfSPh90stH+8K0fOG9jM1vj63MtjHtPghaEeDO90I1J9wuYSI4G5mqK6NMXlMfmJJDRen6RorXSADRN6Ap6qZe9sZ2MANFOyaWXMqKLqLOiygSY41Ev/zznJ75NetewOMJlpdmI2z81uREdPMZmrcNUswPcUzTh6x37ktn+F0IMrWAwcEDUl2HSnGbpZn664htM+6xHfhChPzeuVbruRnf1RSRq7wOZ2sMczILZ4fQkMWjmOt1t9LFojbFrflK3RHBpTWzePlQuS+9Vl9As5mrCTusdnh9cuGyMHRfb7XiasKceMfr5vwxVbL9hlfSlBxPZmW8adf1LB4cyBTe6vD1sVOyOx9Np6EMchwfJkJin+LzTA5lR5AF3zuvsMkx8l1twuIWLJaaP94NhaalD7ZUIMZpkZWUYq/lhNrKznsIr3rK+qeYsOE/uMm9o/7Yf4G8t0uAFdByVKax4Ct2pvtyfPZV/pVWXrLJ2PfOHiY7W3SFq3xkcWdpuy3K7fqtA/aa38Lj8urAytb2HKoPAeDjLq4jpsFY8+HpvZC40McoSbDMiEhd8EwvX/1Rm/KFOzYLULfdhB2mo4tsfOkTZ6/QYEuzfGO0+Hm04UNjovr7WGpY4ai2gYNMkZLPd55ZH0zAsEEr08Yon9WP+fkL03WIpZbDDlGiv3N8q2rMurFiYg88adk6kVUCxYmu2eS10Iag8yFRtCXeBFoaPFWnnZksnWd0A6I1tWxZHCuSV4lSL/EePq/AsNUvJUJ3DHCmrmu/KfuJN6cXhlj3ydeW7TJmfAfHimMgbZr8x1La6R4T/eJ/ET2jh3nLYqKKwgfK2USyD5lQByWHM7014hN6Eqoy5tTq0tx/agXxul46OrdRZ5jU+6hyzrfzZJba3VK8MRvCj4ZVDbrK7dW2wcVM8Q/69l83c/ygaSec/sLZdxHmhV9XsXYebx
of0jzpjz6RCFdDrHZtqWLOOfed7rzFjiWZIe3bdygynUIUBLrSmui3obv7uoMo05tlEiszcGHH6L9z99Ll/CUhaqGVj/gO7XFq3smmWkqfRBjHLv42uQC/cT5ijzIueWC5tp4R/q/l5XmkrtVEXPcxZonBZ7Eu4JfhvR5Fz48G7WAiPb1evTAxlW/SC/IlgajO0777AiMb3AjCDRhi4/Qu7v7h3HpCGWZlA/Nw47b6SM3rg6tyl55uae6SEEwzs+yHMH9ffG5gY4dP35q8qwgg6tL2Y1mMj4+1YqleyUcq+jg1Hue1SzG6YHFMkgu9qtFFgOt0SACNRTiWqqfMfdBuyBLKG8+H4reUiJbFPVOt4wNEuOFn4+Nd/8qmttSDJrR4OMQi85dNl+ovZ27SEt8+Ua/rpvNEhGFpLHwSBcmvhwsCn9q1MOFpApfU4FdvuWV+G6QC4icYvl9Mq8hvM6U8X3fbCvgAh13yLzeVuvU/L1+WFyXA8eo5wuXi5qb82KQnbBFziXHKro2eI/yq7amcUFOrpHRG6ILv6t/zqXpXB9NcTt+uI5hTPHjSvmW1EkXg05FocOBDs7FIrXNT7iA4EA/dSqZHoJNeb+pifqSSkNJ5qSt2pIjEOQUUWIm1zuj8nJ96EwwDWWxyRjX5NQZ0xZr7BF7z4SMmtz5pG+9m7hJu+Tin/eHOjmIxrDfVam20LwOneqZlTIkGkWHjZ3t+bmTEr737qSF8HnNzN5nnTkLwLREbRiksEsTlNaqdcE2Phr/ee6eGi4NNBB6tuRlKMBeMFbN+jNPeHURyKSm1cvA7FWHxDE9Vs6vygSVgxxmEbanYWkNZqardJxsMs2L87XRcVjy+CkmTZ7ZvN9PxWdsQVn25Ce878lMMrZtEei3CkHY1kA+R2sV7NdvX9YZ34be28dSaw64MXg/iMI/OHrPVuoR3wMR0n+RYC4B2QzinIduy0rxEomLAW8SpFjMSxqZ+K1VS9nZD8bb65bU7h/AUxMsZXib+GQsZcPTliGXgTptLAp0rx0mXFDvRGL98+GWagRQuewsB15xKebM+hn7Hehf/aGfN5XkO25avFPjnkkqISAtEBaKYGncgABor8qCYCr7cL+HQqDNkKX4CcYI0hQYw0QhEKlXXa61iKscpTDlf/+Ar94yF6fxROgqjoqAN8UbkX4N15BtDPgnzcmAWnzErTj5+pL+KZ6i4Aw8cZ9MhGllkekrt/VXXIK/wyh52tPvoMRCrJBYdH0l4FP500foNAj5T7Ey/UtZkBMWzzMFGyE6D9x9nxwazyRxCagrhLuB5q/v8dN0PY744MN44Ift6w7QJSVJzKmt/24rJ0Pq8baD18yXawhYMJsCOAG+b+7Hc9eU3D1xKcl5I1hnZ2ctqRf9+9dWtG+MQd5ceV5IvVuuPnI8EEdW14w6MEJOiUCUtTbQXGVHGH9DTVB3jXaJRPRjUsH6JyExmW7Hebapy+7GcMlEHKeuFmxLceuaeuSfwr1wSUAnX9h7As1x8kEiCGiw/n0s33sfJii4f4yK5jQeC1WBmyy/5MpxVqkzZWyepwiVZZw3AGuZ4Xres5C1FtwvOmjc+LWHqpFJrRe4NL3ULRL7+zhQGrIEX4w7iLve2hLoKEaLsSb48V+MBjslTE4vGDJ/MnvIOc5cG3szLMF53dBLGBXTBZYLp9jQK/Vo5CZjmrEnDjM2J3fTyxDb0UkBSF3GtUDxfcSkLfsjGB8hekW0H34AbShOodkiK7ZKLesSgJR97Y/baLUnwNNi0QKlvKUpqGvIIvQde/Srb1H/Jhjgy+mM3VSFHnwdiQ5wM+C/5ooSI71FbEKe/Yws3JbOA0REFn3vzFWwiYhplaBYPAyuMsmd7VJ57MIM2GfNjUeV9eLs6SwdeiMWHNuD3fpCOPSu0Sw4uw4/5VczN06BVDRxddmkY+yF0kIFU1GrLHSBB6fl5JkuafsYIvzheirxcxgGzf/imnG9u9VnWlYpLo66PrBgYCpmYq34ZB8OUWCIrOgbezkJJALs3scgXSIlQocTnR1vtlj2ao7szlt7QBhZghO9KuagHJN25P0Dsc2Xvku8QxXOGfOMyUkHejjvjyuOub5/HkdJInO2qVW7R92+qpjZE+eSxNGQfSISo+jojvOiwp1FGUO+lhD1qdbnaSa1w95x9cXBXFJLB7xglIF/Nx5pkEmXIru0c0EOBU9Dd3dVgi8QINmRlMzYUzcek5ZPgVZiqOOz7Gla2cw0Z3diXOmuJ/Ix2rJ2/RB1fpw5NcmyQ1+2NDTHjMr7lcxht9HNI7KKPaPSgFuAd2SYu8IFZJgqPRoqhvWvm51rIe+ISdYykl4Gc7yO/6hLZBPY8FUrDxqQ48TV1oiRUzkXKnJww3d69otdjVtbn5qhY/YpcND2EkyJ4AfUsPmlz6X09l3SxirxVJmloQvPrqL2AAFWEzFlihLafVo5fNykB0HOakFLFeSDZNu+zLVnOnc/YrXqDcI04dWKrTUK5yWfgD9ehXHQGIJfd5kThXdEVmTE2M2o6am4lEQzZ2aD/uEXaW6wGJQjVpKMPycyMNKptCIcruQ18/168lspqPm41bmAlHLyy/fAmqVXfdWmie3DsDolXR+pUyqcHmhaPSWbbpNOj6XztXBSzJ3En2Ujn+pGuEebryku2U5FWtF6mX/tqABxaaBwVm+M7R6+adcTvcZvJVYYUYrg/RbBbMtQItLaRg486qIX7GROrli+8ufULPBVVG/4MHt01M1A8kZq59nsTj3oVi7q98TPeyEU1bOF6Xd3tUUkxhPQbqMhTSlAAU5lRzUaw2V7r7K64QL17x0g8siY8kb+Fx8R5d4HzjJTEGzfQGz+GMjHBeR9hrodGNsvNqIGdYofdTy1JzrRQ0NdPv8FS5uJT5LfdUhJ0p9z+kmbP0tbPdb3izZJ6OmF8TyQLNNMd3d+pROaMnfZuKsrDrmiNjGfrmO6GGqOsrdWKGwRcKrlG/nQ7M1+cWUOrfb/q/DSV4JaC38qorGs8qWiqavgzjZprMzXBDYVro0WkHJxXIJ17bYb/tiAUyxLc2qKZ85EDwTnAMDe8omn93TdCPqu1rqg4edjlD2nC7fOvaXKkSfu4YSuVAHoS8IihHCstg4APtVR8fGPeLIjGdS1w7IkHGAr2nm0KA27TVjNAevKhRJ5yjbSLBa41eHQhZ//OdEgNXs+y8K9Ch1a3TcUkpbApbxFiZL5uL3PQC7Sy9W2qoD4pXUmrRxBBYjElMegDkae/8UegKLU6DQfx7aeaw51wc5O5GFsqSjGYeZrVXJSCo/h1kjEO5NpWP1ZY5rkQufCTP5a9ss3Yefr1Oc9PTGnRWcoQ9gi8Bv804rc7r8cf80oJvverEU2Q5ixSyJIwR3a7kV1ZDxsktCnak1AOy0tYtPE5opoGtP58DroT2dUjWMVXHNuMl9IFBkQDWIbmV9ZsN7dTCFARHJKTnK9SRAuyNniTq5RHbh6SU/ktH3oCgi9mg7S/8+lUFFvzJlBJgpasRViWdDZX6SWFWB4mC/vUnBvxkG/fFDKGRpbF/
siH5PmJIJiCOhiXeaZlWVEJ8z65qUXSBSXSCNebP6vVHT/ci5ZfgnjPvm3RbKkM1iydwERFbo+RYSqJso7+T27r36eefWrss82FeZbGLnJZOetK8kWrig4G+iIUfqtHyojpbdZHNUWMMiGNpaygGr4tuQ9faLKP6L43imoNyRAjbhh3tLlGj31ZCF8WKPaD/dZ8GrP1eXbnHshjqfIMqeDG5+UaCVDkQopTdQgLtvQ63AXKjfzsI6APvHpWCHXfmrAfe+YLR55Ksq3FYILcAA0yvbBmFUMTkS++oPQmTt97N+NiblhBLvRSm2ep8Iks7EUOiTjM55J1+bQgVuK1tZ9hG6Rx9yMiTYjd730EbuODfLnwmYEBMHSIkMiHhRSapglDn+Zj2OMvDWFDylT0qw6Qu9OqRHL3QhfeADmagYA/PEs0PjYBnlLTSTyiQI5MXODPOSS4Kp6zS7KiNu9ryhgfbsaq1CmSwWm+PAC/l2So1XoanGoBYEWz04kA4bi9YNjH7J7lJ+BvlG2k5Y6diRQI1BlizaljjfZwjry5w210buaqM7bIymNywueet/73LF6owzXVoJB3azJjNWPRLw0eyO/9fOn3qhll0zr44Qw0laK8O5fbBVlJ2gBlnObZUG0egst6Bb/QmNa/JThwirmui951gNc7J9AYqzXeZzLq1+SIWTrwFeVAkU7i4SG5L2VVdftHlBw5NHeIHSGrDtABs9EVv1A+93UaNnJ9TFMOgxt3wRlq1/RFG/CbyL4mDKhW+LEIVbUe8oSEm/DGB4CYDoELwPyYw1Nk76UgvAaVU1UuV87zWuHOLflXKWnGj1JOx7fV2gP9rty7eii5Mo4zu4+G6ytwZAt9ngFkz3ATlf3z3LBIy9lUi7CXVeKlsjhxToRHsyDrUtWW90y7qz2DguENPOn0PSPjERjwV858SMxJtoRmlFL/yrogfX6SCYzwKFmd5VOYp+S+dEqifzc9DcyAmdD98z0szC/IM8rdEz2n003qHEQbeT40zLWqNM50FRUHwyLa70sca/Wz8xp2JyynyJRU9axroVadv/Itw2sICssi3HglSr9gGMKMSG5qu4eQO+l4jKX2XUnMVF3coX24qB9uuxEtRzOxOI8yMt3Jb9Itt+BDXA/mxN2lDvv2I8aeZfUBRWLlYQ1BNWSMupRHKJfv9KsYkJFha5q4xuUDWSOOYMw1BMqDi9dlzFRmbSWr+CmUu0fcbt9C25A6UnyXLMyPYvIENToa+ynIXvwrczohGlzR/mtN43Ouk19FTHwpILCH/fIS+Po4dy+3QBHgq8eqnL9It+U0ljFnpHmjRsMv3O+F3RDnMnbBqsyotPtQs2eIgFWIZ3LXQ9eMu/cDEpa8Hkd3LcmiTMfO3bwE7YGYGyn2HY/Gsj/CR6FvWNTUmAZs28RuIKEcnZps37v/SmamFexhZp6EU/yIfAZuPdkXDiEa2oSYSROb8KdJPHFTFZmVxXB0TRCxMfef3PA3ETVltvdIBGVmkVPOljMV1zlTOPOPDioGSdZ8SDrRB2UnGSTMLXcaMhPPOjPJYtWfmBYv0wdoqCHsR/om1hmUL4ADU/TOUIYAyjWfBgt+whTkwyfD2gPKEe+XEMyPeoOGOABXcjzm5zaCfeVgxlEoljFWyGwnvLm+Q0ZV8lzY6u/TOqXv/ifJ0VX4yS4lthi3Qa86JMQU7l8Hu8jsFosS0fPKbwQXlJTTElt4UyUzeTp8O8T3QRhRzB4EDgqnDDz6Mmiq9sa2RGURPCbfWFQQ9drpGNsQt7j/WG3WpVMlPUHkk62x+GMgS50IBUxE9KmkKJoZDUlQJKh1DxCUDY3ngFOgtHJk5Aa5S4J/ACW2nT7cqWmslz7lFKrsHf6L6hTHErIp5xma1GrrKBTrBhGPi+c3HfeH/s101nz4NX/yoWDlseaNgSISKOfsjF9F2L3+knIuEAem58kFkd6N8K29qmq66yN2J6jnAV5z0YdJxVoN8KxmD1OQz0/jdu4F0asceOv+85QH3TprXb4tWEat635WZ7cfLogiz7Ry+5cqm2tF9+/FBykIo7U/w8ZqvB5yq1rnMgFyhFO78XqnCUJIi319UZQE6n6ojBGBOZ2RscLmSyo6dvtmbg+MOyDf2ReHNMYWcylaF81KyshL4P2fxCYh3uHMyy+2lyQ3QdPXPYlonXumP5+EKYszvPRZ8TmMcncCHJbxhtpluHwU/zF14zhCaiUUG4V4NcYKNYlAobT4EMLdmQYEu9zsUVqy0kzYSKoU78DHTizGa3LdWSzeykMR/pJMF8xXmC5MqwAJW6oJqD3eg1kJIK8WCyrx0KAUL+pvtHOblHP7r6hvHpO9rKoBSbgM69x4bkLSQUCpj600hWpxECt4kqFCxQki+FG4T/frEjr8Z0cKi+WncvAeCJsquo/6wX3js20p+7J4vTeoZD79QSERxXKOK9gsHYAjE7DjONU1KFWUViZdpjuc+sRUblH84SrGou19e7KNd7KDL5B4BIzufyWNLGC4nZ/9m72/i1vEwkem3LZ0bE6SKbu8KtND2tZmGZopo/mB60d5yvmIPi2boaf1qlkTaqmuHmqa04wji4ZOvyH/Ihz1JiXiFXM18uTXTa4hEdJQjgocblJi4/NfQXArEMqSkwSCIgXP/GMvk9iw6iZfQhxH5c0yBsyHPh0IrYWSJLxbxZ8yckAZoHxpUwHhUPdy+MzghT9MieJCg6Z8jjtjQ9fZScn7keX2Jht3Wqu6bjbEuBBhOrwwmYYPd32uiHO6dfPNGt4kVGDctZh3pkXkWjEfEVdAqLrZr9GZAJr39o4yWPeKsMXwA+ZViM0aZghEwRCvFQAx4ZZJ6UTGeoafzG6+yqFNylOUaD3E+V/WXZ2bRfe3kpr6HyjB2+t3JytZtPrnrlr9/iy2mgGV0GW5N8Z6RmZkoz8OVOIsMK1Ec7x8FliUqGYcIVfirK821LRTnktoEr5VwW7DzNIzOsUiTQ7nKi5U1x3dOOo4phRtjFfSPhR5MYgtCf+4bwO86ayfHzbpvavYzMBujSqbgk520ORCunuQzUhh4bzBtx/SlnQVcPlgKrzO5avo2h4fL/31sVUM9zlzMbT89bE+jeE+SHERPkF8kLXUDYbZTKaOS7JwT1NlEPjx/jXuq91IIiQPqkwxNG/bDDpvf6ABjTYMsAkIa1l9Vi9H9krzX+aIGehuGT9zKxYn2zH6RtYkbrVvAj8BSwwiBdsiVCBpYRmz2eXuYiOg/p/lOUI0SbTUz/IysvcpbUnp3Sf6oyQG+nDbpJ+aIuPsjlIgTwmwhTqpAuySKHG8VQoKBhvcWhEc7G5w5egWcF8gy4LTKvwy07CPe1/+9pruXszPTD+rXuLIF6AN2N6xUn9W9W/f4DEBloAYmruDV8nyP8EVPHkJAGw4eECZmqln8x7GFJ8bWfdZBXnRnOMv3PxzdgpG9Hz8FCQlhPkyIfi0NAiM9tomOXtwzPk6VhDzV/RoTO/qQR88k7LAmrNhShxajOHdEHwRamFB+MukNa/45XB6xaV6cg6rtlZ59/yi8SDW
kOBHk/1z79N4Bt5Gk99Iu2D/qKiV4PAEhmSphL5vv+LSpvvVg4rOuV6TSzYfUaxu1L9DiKl0Jkq5+HtccB1O0XhqZBW+OQWYqGRVhPb77ZCec9IO72i6/ubmWGj+4qXIeTLCiYZ/RUj9n/EI+z+GQgID+r2Q/Y2rzlXD4EeqvgGdZOMh1KnmT6Ctye4c+BeF/woc0wPr9K92FIHuRGmDbZCqZhFW+RxgtV5uTIGUOdhyoYcW+peBGxiJBwvRsgZ90LLxB1MPvK0G25tfufJvKoCObGL2YCuLFgzTWFZA4hPwU+zKCegEns7LB3FpqgPLTV+M5bcrsa9MrnT6qq6PGsBd6eFTiXeAHdFNAPYmiw4zL2glf1GUdweC+npIcPKh9Aqtr7j2qigcZ5R5OQnjcUP3nUJCrkuofpFhYNflVyfq3lFCuXoFoFIkSf8ZTUFhhMEzp6/vBo3MdC4XDcPs4E/N2azSPcsrHfu4YRSh6l8p/v9rX7YyxAdEQ61moKKUvafufVQx8sPN7EcvX5/4bWD6UlOpqfgZVaTA2mLybBG0CiYSPqf7mP57I/33M65lNIIbVB3cummb8PtbYIexttSuvdGHKjWZCWWR5maKPvqAMJbWI93CYRj7Hacz/OK/6o7cEPPvIntgfjRrHVIiOdOm1c1pW+rLAPrALDcRqg7LQmSEn+UVYymsllcZP+gXX7OUGz0yBEnxDCHUS9nW5qha4z013PPUDljz1fFyJVZ4Ef7CDeqNNLID7dJHHo9yUN3BLwhBP9OnrgdZGerVs9j2sPIPP5GcsrBEtdfqtSSC722Da5tw4ARsKZ25PPDmy7o0Df7XiLGwRfXXQNgrc65L01wtyuo6fHr/ouz9q3NXmFLTs5+hMa6jZKTCr3VCh0TC+QDJhHDFrLk0M9IhmS2EL3JMcVsRZVukylWc4NdqFX2IJnY7kuRXfZrqEGfotgMu/fwCnX4Zb3ThuMeQHjyLImRBMVY3VGzTOqSRUou4TPGScsmxEGOhgrEW7daOh5RGyhTnDeX52C5MEdfUfLh+8OFUvBKAGHPf+g4vpisAWSPisipALXpGBv4f3H2OTQC+/XFiUj0xKTisZVsacFQb5hPTalNrMDUAvNXqPdI/iVZmGI5hXN9hA4MUSeeIigUb1lCB6ab85qmbM0j2KUy0E5S0uQAdNv09crEZGhuRK05ZJv1x1n79A4EUj0MkY+ms2jvZKG+BDSNCPAc28QthUpyfH/5kB8VY9AWP/R6NkvVQ/XxEhqFmCLppAHojSNSJgdLLBbGu+/Sxlw23dakA4vJScXOgmuW7WDpVIt129s4QYsmkZgQ0gsSwSXgkf3iY2Jxciv22cu4Tba3H8HMckxzTQ8uAWGXY8kTuicPXGRvVLh8i31Tdy5jdZ4oA/WywaTT3ge1YGqxqp//VyeIN91Z1avqXW8KkRHGDW1xcmyYiKlLNnACiC5VdqaosK7S+/g8JLud7zEGus9VsMZGFZduSmjtisq0C5FBdwoW7TxCo6yqRzD4g5qzIOcKTHJnGzCeFSUArvWKnhZS7W9/wFxJ3Qpar6h8OUToata1AeYS2il/x5CRuGSa/8T7lmvld9THyyNglWOWuZVgyV76ewXGtc3SlT1xa1H2i6eAZ2DorLNfkUbPJrbDZr8IE07Qr9biP6bMwnuxwhxJOccPOX8JKguI3wtMC6tzO2yk3+dU/4Wnxxs9I5Q4Zb1NloeiYEybL4UkCYhEXoilTpkHJzzGVPTdsHe6vcW/8BDwS1YFsMIfaLq4KKGCJQWlLsz5hbxn6T0l6oajusj2INfaqCBv3k1Z4ZYie7TxAoo29Z1yjmPnnScfrRtvgFTnkDeurE2dCW9T0kiUXIxq/P3uZkLFczTZPLShn/p3eoYusPvYc+ryOiFL0h3Ej4mYr8bNVIP5uSfah8i1iPpXN2/5Nd4CkHG90QujTWGmQ+Cbwrms09qV7IOwbq4dxaM3XdToejSgKL9e8BjOg4C2kf6Mb4vQO32ZXo3+aWz4wVDjv6KW52Bil158gFzJJhMBSfNeeYxv63MtyXUoc+yvuZFna4MYaD94irbWQvEYgzZ9EGXFqY8/oL5afFsruHeUgsuOqVDGihoBjrWWs2kX2K0Ynh4HJSOCH4iZ9qCMS2Hj8cZKSGCBzeJj67P2t65T0z/JEX/LqqMgU0cqCnwgQh84KnvKa3qVu5NBqT0dRFqA5NY6TAybScZ4vywzsNS4oUBY+aFzrTCkAhXMeuVWw74h5CP/26z0BqZImdyayRaPXOspi9gtbRgoA7KV08IMCr54h8ZPILl/ZrvR7+MbWRDNjDi+0KaCFpTCLgwxJgX5Ck/f3BUldsBv7q2R3sJkFRfSW9a7vHASEFHby4eg1IW1OaGzMeSAurECEfVjyJynAsn971Ua1hh/8UsBbjN7fqZm5HN5OQHy9xJQXblM2u3RSLV9FSWRoh20dg8rvd6Uhmf5F2i8VP+40J7l6RNyEyRD6kZGoihdKpbX4NkTm4q+AF2hcwJzQbvtnZkfORwP0jKhf6JEIRNq5bABhGT0PVU2932AVJjOh99OW6VIoOlLzLkAj8EoC+3lRdiXk2sxYCxlgCrqRbZ12QPeEetVTEobqD6hPfiD5SBa9pd6Hdc/Pj/qVUqGJ2xsN6zbP/QD/XhqLqVQLdpY5Fcw56IrUdRsYoDTA4k53x2wJv6pBep9askQRy3A8Q4NAyr1SGdPfdV0N+drek9r2SRTlmYQDftKFEH3626uoTCpKUegbCXZVxdEEe3TaAIPpmCCH9prSxjYBuxQEKPM++rawvB/dcDgnydrqdPYleQr66T1jOP+grdbx9r0qu/O3u4IXG2y3P8W5EC+A5lOjeEMq7y2LnngRHUWvync/3G0GBHhjg/AfmBn3erApEHCpM3yfecLZz/WRIiEDzF2X0iR1Cakmi1Rjz+JHk9tYM4XkRLaV6SL1QZIpaEiXXj6GVHBPXjgXW8+LVfXR7ti4/06vS4XVbsxJd1zMPAr6IRJ5mMpUg2snbrca2AfywxHqUJ0Y3Ux+FPbZeJnZqnVZnXzOXmS/DBpFDuxZPixMrh1Gd3kfL51WyylMPX/MOvDWEcDV3jSIPC6OW7LANtGvswOKrlJjPWfsc4cT9zH24lbD9fqNJk2aUFR6WRAgjmroDp8C0tlb/qa3ws2i86Wt3AfEtimWCWT526q95e5Xbp+4+hmLv1wjTejCTLu6kGQjyzPFnNrLPYH0XVsCvwmeQqun8fAtNm8gS1/dThbzTxJ9fYtfzQElhZ+TAhX/DFEZbk3RdlA7uPu23ETLpsSJCkBtjBWV63i2tOLWmkHL8FBPUxTwAdUqapN5BG6MJOcIgtDGR9ddxiY2AVJPvxZ/xkHH/dCYLkGXfNvaqq6Pri5wxEyueZDB+U20nLG7jrQdLfqBu1wGDEvC/eWAaj+mWrX680pzNdha6MXzLxP9sMBjTdFNbyJZBiERIhIAXDtSF4QiO5F9737E8m9YTdEkWPvgYYC4orkTpHcrsZy2Cvaz1ncMAmZ7CicDNocrEuP
QxOIS1Typv2dMfXinjm5FN338w+eDDoBg+wPKdfbmKf7tYqfRWf+9h+1MCIkmMAX6mFB15+b3biapxbNsXll4DQ1LU3QsCXaQf2vDyb1wPr6YxjrPSJ8/8O6AF9DYrkMBkh5hUyMWJgMhP+4ggZ8epeI14bYNUtU+FN2DfH0W+nuO0wqEk3InOsHrvcOKLJBRRfGeFrkFKo/lrxTWryYjZxP8QyB4WHnv2jAqZYmGnoevhQN2ix+Y/v7+JU9tZBAMfQXi4HrRRz3JZ3UA+UhOEJBFf8q1lchh46aEko9A+v/b0uYXsLrSmoXx646+hr22/gKsITHeNsrUsZiWV8lUNlBPAt8/vQOuTsGLOy8wjLLVxXHbUsU813h/b40VWcsmU3ihdQcD1lKU9Pn9XdKDGLMOeKCilfYJ/3aVy63YpIwdTUvW5wNh5ZF4q9mp5ebVBu5Jx5/SgEhRWRVIdveyZu5GDTRkPHsMpFAg5wSv1EB/9tA97BB4pVdW68CQ8ymiIChckdiRwbqYHFwuYOSW/nIn/HxJBihyNGCNAH9mnuBC+8m9IfgHWRRdYl1qh8GmtpOmFMtQcehqHy3dZBg0ozw1HIKRg2yL+uF+WjwG0pDyUwPeFyOEi8RZJuRDF6CmhSDlGOsI1yT2IqBXQiBygcfhVoRwxVqj3z0DW7iqbPvgYf62s3o3YvnSZ174XgS7YJyXVugaSt/QGS8X/nL+t3GOXLe8cy7f4bP1iTSM5Zuf4jEsV+LTx909AJ66BcssahQuE0GT/71PpWlJoHFN28W4isOB56RSgOFDGOH0q66+YqiWD7H6A8ocXgl8vTh507HJn7JU4/V9Gp4mypov7wbwsEyEYBeieiTlEfukymiOOg1IgmTCUXmxd3oB6+DUhCa6SPCM9d+7fs2obwWmP9cvw8D7BjzKs5nM9sWeyK3pCt9c1TUfnSaroEid+/7SO+rOKHqhzIYOgUbu40sfh12SNe5Ki2gIXFTDuANePOD08g3zyVCYhDw8JOrfOH4ML5XSRMorEsjWeXcBS6doQhFGZIXeDbxczz0ukWChcVLbn1J3clqD2sCuvsKPcMxwUHOMatQJp9VcL0b1p75x1oDtDNj0xjxpOpNb/3N7SenbKZg6eTcNG67F8eXcQv44UghLS9x3yh7KoCg+JB2W5yZEufvig07wQAmLl74odDl2iy1thcyWIftpIlDYHyaDqMxCHvF22SJErGY5w/vabmVupn9Pb47Zju+9VhTakuJYxKt28klNhQ7zOUxkt2E31AGbvaHSo5c4Pi+6PkDhUNeE5s9ThKrlah0DTPT6rlJnFSgN32qPYf3CYEk0RBUcV1Exk1FEqGKCoeQRwr0YHOD1j+PIawclhLmDtW8t6Vrv+CuBAoqCVEx+PSseB0e4CtSab8rHb8ZJtmQqvOvfDvKiFqXXJ0ma6S+SM7Pd1NBYxGTlpwBP/bgX6jAmNDWqSHyEFNeQGuC/SIikDaSpZAv9Q3KmL4T6uMHhGlNaFXHOq/VdDSXjtMr/XelWcOV8pdmeMl4r6hwJzT2zEWYHWc+wU5SwtymAQ2CyvFPO+waKzjvYUcAe7Uu2YIb2gXX3j4fpuv+bzONqhSaUvBUx12WpFpzXzLnEQHKhhpdDY5OY9iOlYXTgS5QYJm1GDO2H8ia4DPe4XKQnF/TPCWHPj+IAlh0vKfP/MoZcrUQ3ayiWN00bL08i1gr5sTZS+DG1M//B97KwrhGqik/cUc1vny1RKUgpOwkAaP6FaVZ52gSOpOuTM0YbV9e3dbxHLeJ4sdUWuUM+CNgxnyr8s2jFNeiqjUWz/+IkCM5ZKetVTZnripPt50cPBAdhX8uWI7TGVJv6XJQD/73UDlA0Z6M9cB5TEUXaDkZNs8RvNbE28UFJhrNYQqHICRu888Yqokx+49X8LrAF7mgcK3iXCkx9bLEwFHkO9Z2dxYmcXEoZpUpBplj4gTt5kJwClDQvfxCpEgN55/PFgcKrjpTTg/VqioiW+7y4vnIwEWHYSoz+XdMJxwU7vLdhFq6NFn4gFisvCXuyM0oJAv4Jg+flFwyhBqpom856Gurd9lwYGtHYOpUhiIvx0vflmzaG7oL3PblWpr76ENWEan9cHp4npmhS8HJs+LtqXxZfVSZedA4io7UWbwzHWbhEswh+ajENvtaBawbyxZp2xD3GqaivM0Lb0SP5GeKgzhdS4agaLnr1lQPfwvNS+BChEzQzG9G/uPn4/rYOrc7WyDO+fiZcvVJZPaaRJ79XToMYwNoZSDKbI3vWiAHcu9lETvvwk6oN3XM9OllQ4/AClzYpB7UtujUlJsHEUT2F0j/cg1i5Sqd732qt0wQkudRxUfdhBIT4p2/MXi9/eOcJq1gp5PDV+jmI+kNpWQlmtuK6SeQoeF75UHJC77GL/6yVtV+VcsL3tWIoqzicsLFqUWG0QmuU2Vw8v5n0KsFb/EHRdND3YntyBk8xdtnSCrN/hzYiFP1TehQCNrQiIygIfFqvQH1tm54UpAIatDYa8UjtywZOjLZ4Ksd1cPkAPhSCwAs40a/lLLhgoDvEzOscywQT1FlYvLzMyeQe3Y2lYB34oGYDuTeTHIXJ4kgLIdUuZPT1tgV5iI1mSG9muRG6NHArw0BUv7l/7SuCoJ8P2VRoXfHwVNdzqQ0j2lfxTM53PDr49L8WmFT4Rb51ubmPrX/sFWea0sQoBOQqjgvQtfhwYANQFl7GCOax5jm2V24AduLRvAGShIdUR+GWuYMSoAbo05sJtJFG1b2SYEOb5ppdPbWKJImV8ePN1n5q6HO1VPFiX34j07xkxotvOv6Y4Kk8k4nizuG2U5lZNuQxfTTHFqWwyfrUvGVTSLNhFA1s5MRXMWdec1KtyoDcbH5mZ6Tab2N/OF+GVCO/UPA/kbBoNaKXUUoOPGr7GrmPEZcly3jz+ttflTAj5LA5xTUhaJLpaEkSfzQEneQehO1LBCb8IhUKU9caZ4tN/PEsFqiXDQ+tE9UvHc7Why9YdATbszSUJO1fofUJym/AVmh+Dqs36gqRYtslSEHqWzTteDO49aEfOu42wsVyp6rchUJn6bcj+5rrbd/XbUROKXWd2+6uGVVlPGha4a0ffhL7rZvkk01TuksZLJXMEkoagTR7GinOTXN+zgNl8PWsV9GEYtJtT3JoR+pHhoIDFXG2LauuuCXcOew/5qdsvJIoeeRlKPMZFipBfY6KHl6zBOYuvniHmYK6zelY/waPAdv4hz2In9Z5iHmpo/yFtaC6eSloP/oOoIDkLt1ip1O2q7tt0qGxVbjuTmvQ4unmVEER1QqGGzEe+9Sninu6mjlShsWRjtZueElU/HCZHF0I8Qala+8TipLuytX35dm67cXXLAJKcR7vYcFA8m6JJAmoba88nS5JFoVKRtKAuV3phtFv2r93D2K0+xFX1ULT0n0JUwi9rki9Fyqt0rvcXzJ8mVyi7Z6CZNaTmsKiuUjVT9v+XuuUrjhL1JYoB65j3Sn82TWJbeLv6jCpBZtNA2xHHTfdtmWc6HXNdYRAbC/5mj9/IyvaMD
kZaB3OJqcRpPG67jcKMzHXhMctI/dxj+U++Z8uwEQFmBJFo2uoqcRJsY5ChRlXBRLX6AyyImv99+pUYmuvmCwZO+K2vUlU5yOs7syFFC3RMdzaHqJPP7FiOPFihdksCdGHKaJE0iRXPd4K0Dg6HhfPsPDHKjQ9nw9dXVBUFNerSyL0wnQ62+OPgt0XR9E+0igwK5XYospoqqm2+a0FOM/iXtriBiQDOiwSrQZ2X6RNge0QA1AVdKE0eRpmf69wRRhQip/mlp3PwBxZQGPWHNZ3B9Bn8AOVPiUDFwEs0t8H0aRlWrUb8TrjFfCIMIA6aL3f2PDee8RGHhmr2ia0+fUTOtNTFN/oQU1Nya4bXM+dg/WPJs2Gp/BAEp8noCgu/gk13cXOULUC4ni8oqb6zgldZO+zTVtcNE8S1JEQnMfXDmfKjEiYXKNsV5PAN5ndF9ZK0xxS7Y+qvNu+TQbTKI7vgzrEPmbnGXs3KdB/JIux8DGyYlHKHCCw/+/uvlvJYmVJ7mvoQwsTWmsND1ofaPn1BObe3X0OTUYwGDFjnMYMDtAiK7Oquhowa6THvhkBibbmTJSdpbfKkyCCXWrcHonjMxJSXDUhtQr2Z3GyrbKs6RXA36r2j25te08DgkhXXxnz+W7Z7RM7T/4jjsk3C/Tk6NWHt4/LIINpEkRpFnimm0Fhsw3R8wXv8QrckTBbqtzWaJnhkpwLRr8FiSR6NQtWfWm65hl1/yq0evbVY8hueL8mm8VGmeYDqu205kekU/xOzxCz/ZA4tyNc02uKks3NkB+yGTKGnBCCXqg0FezzQ4qjrH2hb2tTIc+YaM+pmtwGp0nlswB3oR9Jie0+cgBdM0HqN1P4iqEsleaUeeChsqMDgZvA/jWr8ZDawcX0Ik32s8JKftZ3H+rZAED5WzjBVaTh+ITyeXDNyec7VWe/hIdMonxuN49eLkzXjkZW7VcBPqoAhtuJ5bxvjg5MZ0+0vZ2pG+UHtcrkgmM0bGTY5w14sD+SpSz7dRFakVgX+ykCbIPgyVG227Ne/rA6FW/gPIm9E0Ycl5TGlKsymyMI4kHymLhFVXXa6LXU1sWCzk2LRYt0TWwNQVj9sxMhXu6GZk8e25cuYhUmCO5/DhuBnrBvLkdHaX1PQlAkDVFBcLreVo23b4LZydkb9TOeEYG/KVhvJplkOIDarTIItlaDVPVt//eik6pap7c+c/f5l2lzr6Ej1fukTtS1ugUnKfxzaSHNvwX78zy9QIwtLN0cY9VAM6Mqs76qCtgO4StR3Y6f5XbFGuK3epP/fTXKiM/xwonyl4vCtjDC5eR1IeX2ba9s/RZMxrE19lD/vW/NSw1AVoQJvPahW0/fGdybxdwrr66fQQrh6DgJUT3vogJs8zUG/QEFk2FFEpC3CBtBSaMTqZ7rfh8lM6Ud80nawowNJm11fOnZbPl58p4yg9J6fOhlP8sCXji8xs+bMS3AEn4uYeT5FacdiRYclePtohkxhrt6oG237km/FQgP6nO+avDPDYjo6zup1dcGuj7BNiKU1TANr68VjLa5+vluDHbduUPGF8+YeHsRZ/jy8aruViIwdoelvc8t+0+OyrKWRN0OnEu8y90UvJNvwVbNWh/lSgsHTOL21kZi04swEY5XA5CjaurxzhWatjili5HNflrV2OwAedylOZun/BRCljgl07XOUXz2VBzJCFweVBytsylutrQXS94LiORIiueCX32v0aK8wdLW01N8FZUYSvZa9L3gTdZpdVY2nh5mS2hkUDL3XQi8iTrtjrKmE7ht6ZV11CRVaKyF3iTdNke9OHc+dvReqGqpmmwz7Gr5JTa0E1P3Q1uTd3SVRPWO+XsvOD1LGzFgei8of4lHis5lEi+92sneFIlAH9Lua2zals9MltkDioSHgMLW0K7t4dh+j1wo7BFHfg7Z6y/JBofjQOVhBXjBbC/CP6cpZ/6T2BOD9To9TFbxWCxxu0V9vuXAGLisAQ5CDssD+Fgx45b3ApjmWm3fNDpGUzXcf9K5+PWTlM3CE4THYlqR4jDoUut9lVRnJZH0M20ot5pxtF/b/Kmaz1gWBEH82cQP4nuzML4VUMZJerAwnh9ilwGl1ASsMjDMzcLy56V2rty1BTrS9AQ+9mlLvketQM+W+pCSdDfsf8Qf2EXVCHY/EYpKKtpJ9R+TNxTsYWUfWpmwqinpuZvajqmOpGrl/I53dFaA+VO+dzL8f17Me5kSiJnHt1WNR3RRX3PxubXVR5mu6YPdSE0K5cZvaxb1vRXGbSVKZXkR4Dj83eC8vpWOtsl447/dQ+Rsc6hOLmIip/lbHL1vEM7wn2+Dv+XyuOf7nH/Jke+PdXy31f95gTOfLDql2Z9s4QbWvkjqMbblrMcCE1fwUYjrLykf1em/Wl8fyvAouGD6cgk1BFXYCK9mEvi+E54WpvTEejgxq0qOWT5nxdwpkpsm6ax2WXzsX8NLI83hq3PxVSPSr6AGZcNPPxPvLB4fWT06hPiqLL1SAoTx34sXsXEnQ67RjmajL+FKAwaA1WEl2P6OjfV2pctspYXUOYn/29X1xb0+WLcDw09NJUdvuNR3KAkSjeRRmpHtqwQBZ5JUpbNtEK8Fi5Z7/BECjY+cL8OJj+PIywvE/jKcCuLqWzrgvmwpOggCXgx3Lqa4ITy0ODr9+suWovy+r/+fulcoYcyC6GN43JAKw7Je9H13kqekxeeWqSjMaMXlaVN3QXXouS8TbzmpY39uW6gGfEaG2mBYutGJeJedCspsloNoL9QvFARabDm0jGcl4v3jvPtswExHVPPNFWeXosfkPDtoavu+rBEzHPrxhz9PaUg7aylF/65xO4fVV3s5xZByooBuEVMCaL8OjR30U7CsxvWosrwk6xxIk7I+X9gumZJZQE90H8lVb3eldVxJoa0PTwDe53nX/orDBcWPaHpgsBELUeaRuacmFwfvVxbiY83xKXyo2Z2safQg4mgVk6onATZDRo8vassa9TxTXnCtRA03YerR9KMShmsYl9IILAxr6abk9WCfQPm/c69DOPgZzqylCKyHHzo3Lm8zVJ6e0T70jWQ2siKuL9Jd9p7PSZRKEkp/OaTRjPerLW3eczH7TzvvepzCCYl349XQbG5xt2j3xnRBkd1qJ/vR3fDyPzgIOEoy1ITlSvkBiQjTuzyvq4XnO+fPtawv3l/gVCu6vHQ+0h9GXoqEAaFxaflnVpBEEeurFDyY3Y0GCnFIHafn59ts5tkMNPzKtQtiY4lC+7P4tidoNQXNulT+4L2t6gdIdBe7xg3kFyYLdZHKad/3tgRT8UiOBiYo6o/zXyQPX8wyMNMojaeOL53ny2HVMmq0hNLY+YiELQW1h+IH5ySjkJJUC1GiBkq6ockLgkoEO+Cy71Z3L4Huf7k56BPY7W5UNAwRqaex+pMu4Zedf0TTFdY4/9F/1+Iq7FYEy9IHQhMN+Aesmq4nhLwGtNDfJW3zRHt9Tlwg5jcyO+/lJeA++QHnD3KiLF2+TqxedllL48x0H1e5OrIQZn9LHbxZms8fUupSdyy7HvbYQGTtK
xOwSMrDL4EluTYQNzjN/ctguMqIBuj5S5z2CNjPtYRmjRTN+9A4BlQo3Fwn7adEDWPArdHmT4z291/hdX6CjpQ6uMc8yw0KwEVKH91TIMR3K3hCEm//PA+LYKg593morvGCv0hc7rAiUoFgSkGfo05YGpiP/t0tGIOEqnN/4bTC98w5Yz7ricFptDGxkN3tq6c/j/+6AZ/feQ959sCv3hKh2D7vfcWD2OaZaZ2Ty3v7cfWtYPi21fhf3wsfO3+wjRRZhzFVp26mdkLeB4kqy6ngp53+XIXHYpRlMOQCmApFuOeXrx2/wrjSaZem2qATixgdSVr/UvVpOLXkPIsvwRt9x7Dyoiof0gjeh3DbzdEmdgLV7CW0g2GyVBx7SsIkIjuSj4FB71LBidoR6t9vB1kWf+Zw8kpg3X7mKsL2lIzTleBQOeqVoUDbPbcwa7Hu97VrVM2IZMTweudGVA/EcT46UkMr1hm5Yk9/eU7R5gg7i47fB1lrKj03PtXxfvrBgnMdmk7j3fAcjyoej8cZrNx5usoJJbyEuQmaKvy+8y1ZgO40+EwuzDLC2ezbApiQ0Pbz82BkI7FLVbyJ2KaZsWV4rpg5g8P6vKy1R8OFw4tBicIBCQM8kiOkVgbaq95mgZSJkBbHAN+IRB6u9PPXhb972NWtiwP4TLIpYbBoPIimHlCwTShh2yFrO+36fNd0KSVK0u4IFRmIC3il4LzPsDZ39yA5qSD2ASkh+5HKV90b3L3LjXEvqr+maEeLQnjpdRZi1u88Z9UsJprnLdvLc+/84U0bIGOzILnltaLBvmtWCGjwWQ/ObyxGK73mB6Du4GfmwAvYiKyzoXpnJ4YA1SO8eTXvvKE9MGTKEWwVrkvPyycwVGeDFix8ZbhTvFUiFELTfrMeqDx46KdfdZN5bftjmEVHzpOM7tdEdli+FZwLCgiDdsQOIFq3dh/V1dzA8ubCQgHD8nTWOeEvt0fjaKg/WZjsa2Jqu9KadvCiTVZePUpYkQvaRDec8jbw60/JT9aaO4Wpq+wHz/mWa0UtQ+C9bWVXfpZYEfusRYw0zt5nKQmRJzg+BB82rWvgy6AOv6BrZgc11CvF2AYiIyduJELHgrJk2F/bWI050sEkvAPTdnmqO8Wx1t2me/5VHnm/wvcEK5fQXH6kcR/nuvpSZi2ldAFCA2APckmhE0p7c4/MmNau1r6ylyIw7xnCmbBVy+z5V1jljHKam4NtN77yVu87M8g0dylHKcKwuSmyUokI5fo+1lViiH0BoSRLYD+k1NnYyE7UJ7gQ4Lm3Iz+RDF10Mzw/e698QvihWl7DY6we7fs+PktSGu+9C69Jlq6h1GOsTRMRima2ghmSRR3u0W8XCN2plmxTKJmoEs28i5arPBtQayw+52D0o4AatNOCrb/A1swC5W7vCbXkuDUUqVqOmbTPP54xqANrxEuAljpTD7eQpc2CtaS/T9/r7tWvPzcf3TEt8ekaClrC1fa93Gjnp6E7PEKsDCB89sIEWCZ0FCjHaPzCJTRs079nIbpbu+oig17q14CJFR2nVdpMWSYzKzU10DaPEgN0J7iW13F3xb36kL5v6tWdZNd8+nDru/YlZSLN2PTdcOFEXX2DvDxU74fW/rtg0cwghjJ19wzwihS6Hx7/vSBTFq10oijT7wXoYzS9AAeOJiqUxcud+Pbbqztr65iary62Iss2x3uQI1WMr0DvH/LKmXEyJyqSXXSsEKgVaweqvq+DF9KXHOESeeyPIK29XQKG/DKj3i70Do2jhfv5mXArcRUsA6gb8c1jPX/x3KPa+yP66dhZr1AOKRMOGJu7uL7Ja5RTkpznoYA5Cprv46jlqtlwZgeJphUrfMBl5AaybvRoFpCYpBEMRwxflqx1SPo3StAayBYnvNbjs7VIo0kKJWqBY/9bEfudTW7F/JWPtV5NH1hxR9XaK8K7+71wc4RDeV9z9B/N7d0BNsV1TkBFP4qm/tgf1R5n93H5LvApqX2bX+3PIVRrAt3bX3wnxJTkfk4E7W3WavOVBn/N8n80E5POvQKC6ytBPinrHyYq4YRMf7qC7y1GepsVR6INSnpI+Rv/mB4shrupinHfZlZyr3cIv+bmP5pNxf38D7Yc0U32vvYfk/2yx3jG+pSJHXvU/+3+OBTtrH5UF5MDO4oQ/gFZF3ylHaJn+70c8Gm6v51C9ZCWD/XJlObm5Wo9m52VR+erz8z/vrQ9AptKFO/NFAoziI1WbC+8EO6J6+26uW8efkUK6heosPdtGm6BVAdhszxpuBcFpVEQvfChaHvqzghetVZbgn9GF21V5HMCX3Sr3bNPbg/OX1EKPplph5hx7urnQNHvT6uihvbOJHq2oT5qLIqAptZkkYykM1XiB8uwYtuujCsMcunqhsaRVBVwqrdDPxVI0Q+aFJHlaSlwo8FlQtHe9J38Dk/nD4xf+ZhmMZZ30jvhVQCDFCz/Iw7o23mlYOngzTXkqbPtrqpla8aEkwr8/n4UzRWehx3wzexMCl4FUc1dm4eJT7iVpQvvKunnhqiSqNIGRpDq50ZpY/79J/xX6W27nrCftUhqrL90b1p0yLOjqdL1t/4HpHE12STfB3fFoEE8NoUtJJlNOZJOWDJWBBvq7Uwr6+9S9LlRsqkiN2BZCQjndAfmD2E6JuziVLD+kEEwu25C3294EQ6lelQIru6b9Z7dorJSZPLngSssubD7Nf5mrc/9lQ51fr//r++mNZt9r+3vnhhnzS9FrqDoKKExdIJD6FEZGHsNHoQV7rzrpX/l5E3ZT7R9OZgzibZOjb5CRarN+8/zSrneVRDP0ird7A7uy3GSdCvWBZF7IM4lDX1k4Nk9Kf4ZxeeDig+uw1LQOEXf8ope44HKS0M5564miVcspj6iqchpPDceDUANZXspvc3SmVyVhFQuyVqM2fvnyFFdypR730gFY1PvXORc5wTw8HYChDQEQuEFgNkj8YvvpGhjHI1gfJQ0e3ptwmoXizPgN6S2HNfAQ8tBhkFM9VdKEwHcUlOWeUEtnrMXnQZFHyqLDjjYhjld2ky3jrYzAVDBu99LAUr4o2eoLzw5IHpW8z0j067bWBF0jjZ7Y7jr+bhAzPE8YFXzRdyGPhDeATKYBOLwvIbCbUVocvHBrKgBlhhWO48dQWtQWZQUvZmFl15EZk+Bf/Wwl2HEBgxr5nv0fkn0sp1Mx0VTx52WmsVtUMK1mJNxqVg13iJwDfohbCA2MSKIC1yDhD2xGNmm/d72r0QRmCSXvN5B74FrqKNS+eIlcmUvA0m8wDrEDurf8Zs0wjBeZtfGmscagFcnxvATv3oBYtV77qtQJX8U8WskvS4ajURaNB1mzEtH+LG771m1PezKbGZtG8NM9Cp4KkEKG2MMuAzoUN5ACg9YyA8YkVafIoGD8Ey/FG6wBmKOmBOoe2K5rkUyvv3sPFQiq+FCynyGNvvFM5NwOHNbyNNeBHmFAx6gWjQOpJUsbCk+07hmVFuIr4sE0mveWkZ/eodiCFlkGIv95/kzOC2E/ePxpetu9umh8BL+4qHAVx3wQzos3gHS
zk8KFjSLrnMAFD0tmdQgtZQVaZ2EWlt3p8GD/XLR9oaf1OxIMTvEl58xk3xnOfHlnuOFsaWhL4b5XPigoEwjMoDQQZZ10M1om40zdXXQHZTeQPftontYuO36j2cN597VrJwCXKlY+bdzgPrBf0XdvpAnL7zzy1DQVsVY7/Fth39pkPxZHu9xoE2oS8gRopmteyNwlBOFK7HGt8cWdB95G5XCJJjbx5Tqk1oJA5oHyQ2M+7nTBFeGuamBXDcCW2FPR23GiFCVGgQlJHfV2saTs3aw+F3GUwxn1IkqcC3V4vSSdYhwIu8Wc0f424m7ODJLfZCW7f07vu6R4+PPtcDdWQB+jFoQ5qRrWL50Z0dw4KA8a0zjpDYY67+6ixi9xu/6G4z9bian6YPSFRfxDnCeKcEnU323cNyTqJlUOu8D9ydA9Rk+9MdyZVBZJVkxACPL+sIbjOzDq1MpY4xfMXFysMjbCoNbVPXEjWy3fIVGe5w/6pwGZpftBfeVPuAjUlBgWsR6gU+ia2lWOTu8PkEEhJTih7c+usGp0M7n2y6xBOGoXyALQDeOwyBVkr6MLlrnufuRGeF9Mru7Xiomch878JWfynOc2Ae+DH2rNI5O2xa4TDsZvxFMP6sr9AXU2bvPuUJLhU9WgfsG+TUJloVHFYO5mftEsZyuk1tm9jwgDf8B9ulLF6F33ecz8JhoGXbMcxxpnvnZPgS4YP5qOm2v2mo/pw3nmpscGm5X7tFQls1EqvDtjRjyQzt9eRAMUPBFUWZZRUaxEhT71S6rSK0cuc6Fav5+k9je7GZTf+dS4DGYtvQl8tEXMD5Wso/hHCIH5Tgi7Mlr6ZcFt0hJHbN7NSdRtXwZnt4ZjO8TlusolBT+BrqjqqlhT1HyKYfrkUoKc6lSvWp46isVir9qbeS2tMBj5pDuHqFW/CJiW0vhXNF9exw89MEczIPfs5EgNl3uh4EXUB+ulR5eDQmNr4tpG0a12FEwk+qZe1mnTaCz3KvKrOFWCmNHszv+X26mjPn2CA65uySTbS9M1Qwq001dQWmORs1Rt0sao57M51Ntk4iBjBeEpTMK837Tv1xmmYM/6lmHjj9Rkv35rDgHSyvas7Xo8SBLkvRgSZzk6r4sPbmUX0PJX6uMXGJH/LYNQ/31cfvOjfK6y9oO5PP42Kjbquqx/EvLLDcx3JOIiGHQvGQvkbQnEplfQo9n8DkcUZMW1tJy4wlOVFsWFQdwuZUhjBOg7Qq3QNuLsc1euiHR3HZDs5buqsPXZ4O8a6FZGuIq1lupm4HCOsjwyUryyasrwP7w0qe5ocbNKexzTHup6b5cFJQXrZrihJVghtN6bPGFzwHYHUUw7yDz23VLeGlpACLjVr7ofWf2wuePNXEa8ASTCeSLPDZDx1tCEbIxCpbsidtQdWTY76dGZKPkBZOKzkwr3imZznGIOcXKtIonX8ECvkG8hOUPWsYNgVyi3TWX4XOmfv7dKg1LuuTZ9E54qaKzNpjEv+k7zH/3WI7xeFUvqjOrRe3x4rUoST6+Vv1muyRAEMjtJYq72xA2Cgh/eZE+uRwyzUHaSYoPHdwwK24X+442pqmsEhTjTq9v3g+8SSeGvvkbgH7uQgaVM6FKLsgIWNu7aGbiQPeuUynx7qsoFRtef4Bbp85NWxW5f+HCC7pJMEJ5cOaF6miORJpBr2oafOwO/EwZj720Jj1dBWxE0GamcvZN796kwciXh/aeu4NCwQxos48Imtu4D8bmdZQY/s4lPDsmQyQbX9pUByLd+Qupi0YxklH1xTRroG/43pZc65vPrPYXKj1JTDUhsQkWEgAvtQqUNYI3+ovFFsMg62SU1uIkuaGBfD5xSmJB8kCQo/BJPTLT8DtwhyfdeAaEHY7Mn1oQmbj+rHOX3PyoG5TIEexTGK4LulhzHosp4OO9Ez7Kjw16PLkLV0fSPSar+9PQLmbZx1doWAI9pwrDEAOx9pk18n/FL14E/sfM6ivplp9ZHVJgpSuy6zgAYzcEN4i8Wpjr3qbyh4l2l50c0kPfRg4hx8oP1TP/b5v3BcKFYLvneSZS4rFRieMX/jPIknY+j79JaK5KPCcYODIeSc9Ls/qGvv8iFNmSCnvxzTnkm5Rg2Vh/Bhk4J+34mlV1I1OpRfBv23y5i7vL+cgmKp/3KL0O4rkvUqN/OPgK4kNwjalIySQWGCaFGN1gKI8xIXuDA3/eVtce/ux7ENLxPB4UzysbyHZXCMn3vj6pc2Hoj4ONjiERRw/BgUfWVyFgj7w/eS7HDfb+Gbx256m3kkOaz2Fm9uMPPggZP/dhFwQNBH2Fb4FXLZ0h//I5/F6GDeDx9AfKwO+pQySbAhTA1m/O7KMz4RJrqtRjuaM051mqKuAn281c69yvd5VFi7D90gSwrmUuKSZy1m9ckDZP+BBPsXkq7kyUi57SQKAOvisAq79subKQlQUpx9lzyvcLOK27umcTolAVYqeTIBsSEmpQq6fsCsZZsrAxr2N/Yg9HjWMXOcyeeSf+CVMzVImuzjpjnd3NA0Bxt0YG8IN/A/wX/VpWReMhFOHUMvkmUv8yE3/1iNI7fWDN7i8z4RLAlv9BqlgB2vWL+6Kkz+EH4y+6Bnd37v4rl4LB/tk+7Ggeg3GCn4bQZPxaM+bV3IR0m1EUTPyx7Y6MWb0P+vPBW0WdSOcVyPZXG84u7WKm5UNoGQFSm/Rjij8WlNMj/ro+7xa6VIrZbmDEKr0yAgOSV+rVRqnR6JD1V5vxVuSoT2ydQsjfAJWFdEvO7PYGzxappWuRsC+tuRm/tBPqqTWCjW2/TYrNUpGrz83S0ZMk9vlRNWyzLG4DDlkrcZZ70L/ctMQ0X5rSLhjxxFAb1eOXXeDv7sR/5uker3IY7DVDtSOErDEbZqHlnfWfknb0JJcHq6mfzyminhtRAe5mlQbruDZGeYneki/CPFpn1EaXrB52FNT+EdoAdgNEl5mOBdev/cxGRYe4u6fuCaX0+RiRL7bY3aRG7O2D4t0XxuM1bQ8BgmRsVtGFSi/YFNK07uWCYF8Cqmf7CqSoUHlOeoJ/ObSIPNj8qkCa5zwofO8lfFK0PrONusyu5bTv+AZ7Gn9bDSYNlpe+o7V7A+0G6trnfUxcsLok2VdORcVGgTOj64GFGJFUMYPTVI0PDhigabGBd3Ow9IJoXI/5Co5S1l/pfQzGi+r2t2R+CXb/xgonWUgdE2P2XlJhBdUvnq/0I4RIrQnL58kXEWhKWotprqQz9gPqRJhcL5uBPUA9sQ+P0af6ixFfvGNnLltd5stYnszIjHJkXrU2+UzXboHe6abI3E+mr0uO3Jv2y5YHFNYBfmbgrztpbbWxWRATY8ffbiSyGzeo64f2CLt/JkUGWwW4c5ZRcJcqrjhBvOEv4AY/HQ7Qx4A2O+T/d9qBdH8y3CZzJus0fbnTaKsXwlLxy44Y0VnFMREN0x5pUoCzQ50O4EVKbn9weUAcBzyxI9mJwdcn6CsYRXeX83e8jGRu3LqAfnwFLleFTzySFmWRpsz6M4e+gAAIFjK907y4rE6tAawCO3UG8Z/LBBX
SafsA2cf14gX/e+ETPAMzpXvi2kRb0Z8b3UHhMHpLA3p98UO0M4QzayqLwG3QijCtIpMH37P72Mapoinp2U8jJV8RJ+epwS7GCusiG/vzpd7AIErI8RkyTOuDbEB4IvY25xTrNlCue/iYLnHvyTQiNtj2Pr2ZVq9UTt3iT3xNn0QZ2EorIAbIFjaGhn50JISZjcFl2uEdif2XwKEmes5rNYbwCF4Fce4H3manSmt5deL3Ft/bYANoHy4qc4WDsLo9YGvr/cQ0g91PQ2w+Wh+vBZEEoYr5kxYvQG+4ZGclkXfeWVxOApyMUjcYj83PXtNwFeg66Fdia4b+jrbgRRZaGxWsmESrMuSB8lTQ8325Nv2JzI9AkxJRGsRZgLsJBZWmijdyfiM6KLOkQDC1JdkWTGiiyTIQqjeXv2qHtpyGDj8B9OUh9ZXETCGz3KBq8szzB6sGg1iKUKyQQpHc/Re5SxbbEmIe97/6nHrHcb1X12uoLo388NTS1jJyzodOeRzNA93CHK/cec1ywo5YP+5HXbsRv1kSlUrb3g4YE/JpwzbO7wHsir9O6aSBet+21c6drBT0WaZSMDcTs0M6Xvlq+2bvm9HETxyX5vCv0cTzn3z6qNxEfrCRvZ6K/D3dSy6ZnuLWO+neH2d4Jnc7Gkq47XIGJ7KMDmiX0r+shMo4XN+alwZ/l9Z5SUcO8IwH8FCdTitozUjhfVZDd5Ar/Tux9GkLNI1eNTU1OgwgGyGV4eeXEA4sUVV1DbAU0e7QKoejaWiA81B1WKKfp3KewkTOAQ4KAgFNFxCVA2q/A0diLP2kfzsfVi4bfaR1L/agcUkoA2m4pQmt/mC8Ih9BuDM/AcxxzMB2lNjThMpffDiW5MVmGxxl8tyibC8PD5cHvrD54mO9fy+Nr414gBi0UZ/Gx6yW33FT38YH1TfbqYkJL5DTL8f+sy/aCxIPmckY2rvqixoWshZOI9gxNH+BV9UdKJTxHhd+wUa6Zf+GJ7hNfrze3KKb7fLEmedBL+DcHZUfiqqdSxDMqsFF2NLQAl9f5ReIXYigYf2zM++wsF0HpUgYixJ2pK0Aiwpd556DE/gw+CgUvDi58ygV3q+Z0ZZCQPBX0xWAKLouLiwcBLkRhDqIa04QolLt7Iw+H6bsjHX5FFgl90DjfMPBPOMvJ5tq//47d6YG4nT7Th500wv28HV/JPID19NyeUrtmgKPITScL0vRs68go5KOLzt15SeBguFB7Vjt41hJa+/bAUhzCMTG+sgPHPC55zajDrxlaJkz9P0Bu/VM7cVFRt3bTBau9ABcrj/v7yVKS/XESiCFWB7MoAqbVIpz1d2lWSIC+1m4qva8lHtdOWK7jMk0rAeQva4vBlJlZVbFU3sLViTkk/aHPD4xWeirtJWAqL3ARw5tiaPRL9ZoX//Kn7rPIseZ1aY+078dmm1ixZCgpKyxHpTZBctE0+HyRFM8bp9lpMW0B2GZHk/qKq7zKaFrA6ElpVMPl5AN5DKPmwjoASfwAM84yaMhxxdg/uMT/DsgmOJ63u2c3RoY5n/B9L9HIRfLVlz/xzOWwf8+uXm4hGIcim15VzHw7z/A/3Vs9b+nXYPIv5/P/zk7GgL+bav/49xoGPn32Ojk3/Oqq/++9/8c6fzZhH9Odf6vj/9zevTftf84gxvm/jc= \ No newline at end of file diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow2.png b/doc/bgp_error_handling/images/bgp_error_handling_flow2.png new file mode 100644 index 0000000000..60a68ff85b Binary files /dev/null and b/doc/bgp_error_handling/images/bgp_error_handling_flow2.png differ diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow2.xml b/doc/bgp_error_handling/images/bgp_error_handling_flow2.xml new file mode 100644 index 0000000000..3e092ccc59 --- /dev/null +++ b/doc/bgp_error_handling/images/bgp_error_handling_flow2.xml @@ -0,0 +1,2 @@ + 
+7Vttc+I2EP41zNx9CGNbfgkfAwm5XK8Xhsxd23zpCFs2amSLChGgv76SLRvbIiH4TCAzZDLBWsnrZffZF62VDhjEq1sGZ9PfaYBIxzKCVQdcdyzLtC2rI3+NYJ1RPNPJCBHDgVq0ITzg/5AiGoq6wAGaVxZySgnHsyrRp0mCfF6hQcbosrospKT61BmMkEZ48CHRqX/ggE8z6qXlbehfEI6m+ZNNt5fNxDBfrL7JfAoDuiyRwE0HDBilPLuKVwNEpPJyvWT3DV+YLQRjKOFvuWHUx1/v3ODCDO8mj+OR/ePxJrmwMy7PkCzUF+7fjpS8fJ0rgdFFEiDJx+yA/nKKOXqYQV/OLoXZBW3KY6KmdbmUqM+IcbQqkZSct4jGiLO1WKJmHaUyhRlHQWZZMoBaMS3pHigaVCaPCr4brYgLpZg9lORpStI0JIw7k5cEJ09VdRRmN8SAwAkifeg/RalKB5RQljIAvd5gMBRC9jlkEeIjxLAQHjGpZ5xEYpErmdEF89ELk0LfbP2nfFLXyYd/KSnSwfVKiZGN1mr0osEyUXajBwUVH9LNWjKcs8VwOY0hAjl+rnreNmuqJ4woFhIXqOlVUQOsGh6U8rKbyo5S43NZ5XPpVNkoA9XZpMgqvnNzsPV+BWytQyDT2SviAu+NWMkD+omApUgwysq22xAtRdhZV1H4TnDJv0YJL49owuDRY7jpnVYQN62Tcqzd/gJOyl+A3VJ0Be5x/QVoMAhn8Xyd+MHRXQYYJ+YyeuFzNRr9fd1/0XH8tXCdALHd6ppkuv02KQhFVXS/4IILak+vtr1Tr+42vTqH0utp5fgTiTC211KEsd1e1zlmjLH0GGN2xfgBJSLIGJ+Y0M0gMT9LiQI97IhVV3LnKv2JwPkc+412FMP05wMYvla3m2bTSszcwejQZnfObq1bxTWNmjc2dmzXPGrpYLlb7OsSoa5+SFM5N4Z2/13QfOJinna1rsQC05itNpPiKpKfN+Px/ThNqxk3IV3GMJv+YNnWqe2fHa975Hxr6XXMPfOnMEIKKMcs+jyrqq5cM8cq+qzLk4piO/dJuXFPJNx5bktVjAtqNcz7tqEsvUa1SjXMTZJj7ZCVy5wz+oRKM2bPMAD4AFlPq0Uab5ftHYwOjIM8zpRwAM44aNw2Metx+82bGm8Ho0PjwNRwYEscfKcch2cEvIwAzXB2QwTUM4vG6NAIsDQEOBIBY7rgSHzeJXMOUzZDiMkZGvvXCmbTdxD1jZHG6NDQ0BseblcBYcFQjgXsC23R5IyI1zbLVUN6bZUNdUaHRoR+sKAnEXGNCEqjxblw2K+AtJrGhjoSNEaHRoLeFesdvnAI058PjYB64WAZLRUOGqNDI0Dvm3nn7NDGZqLeAG3cKX/nTirYdqJqv05qb1sj9TKtR1FMn0spxhgyGouP3xBL5OHEt/VYOVrxGs7quCkQFWJCttEhwVEiYSvwlHZpZVdQwJxcqYkYB4F84NY246YR+WqDbI8TGXa1o1ScTCyht9hWluFbbzm01msEp/UidGevEZxWr7Huxo17jXVGrR15+zFH7H7yjzyYaxlpbM4YioCv/DjfPeJi9xgWaQEnUm+QBUuYjoOF/MuptB6jEYNxLE8/WsZE6lyd4RC6T9lM5NokTSwoyG+jedv/E0OhcElhSpFaWPqYJCAZty/frj93FUiIcMkpJQFi1XPECYyV1X5ChuFEoDWlv37Y1inZUvnHHiEnofI1TSXaKFIeaAgK+b5hRsRKFpI0gU7FOpS0dZalCk1bjzVeS7FGDEs42x5+LMASGH3tfXdABGxrNWTuzwu9pbX3+zxzWxoyjW1bnbflnV3VzkerUYqXV7/a8NQYNQ5KYrg5dZ8t3/zvArj5Hw== \ No newline at end of file diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow3.png b/doc/bgp_error_handling/images/bgp_error_handling_flow3.png new file mode 100644 index 0000000000..be435a6a19 Binary files /dev/null and b/doc/bgp_error_handling/images/bgp_error_handling_flow3.png differ diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow3.xml b/doc/bgp_error_handling/images/bgp_error_handling_flow3.xml new file mode 100644 index 0000000000..930be9b53e --- /dev/null +++ b/doc/bgp_error_handling/images/bgp_error_handling_flow3.xml @@ -0,0 +1,2 @@ + 
+7Vtbc6s2EP41nmkfnEGIS3j0LWl6ek489nTak5eODDJmgpErcByfX18JBAaEDw42sZPaL4GVWKTd71vtSqQDB8vXe4pWi6/EwX5HVZzXDhx2VPbTLPaHS7aJxAR6InCp5yQisBNMvR9YCBUhXXsODgsdI0L8yFsVhTYJAmxHBRmilGyK3ebEL751hVwsCaY28mXpX54TLRLprWru5L9hz12kbwaGmPASpZ3FTMIFcsgmJ4KjDhxQQqLkavk6wD43XmqX5Lm7Pa3ZwCgOokMeGPe93x8MpwvmD7OnyVj782kUdLVEywvy12LC/fuxGG+0TY1AyTpwMNcDOrC/WXgRnq6QzVs3zO1MtoiWvmiWxyWG+oJphF9zIjHOe0yWOKJb1kW06rfJEwIzuoDMJucAYdRFzvZQyJBwuZvp3VmFXQjDvMFIpmQkyULMuSt+6XvBc9EczAp0+ze7UW709PZ7vm3IJ61kd1txl8GF3/hohv0+sp/d2BUD4hMavxha1mBwxybXjxB1cTTG1GOTxpT7xwtc1sngysia2nhP416HJSrr0YOdAodkt+Ycp1c4LpVR7KPIeykyr8qb4g1j4rERZ6ixiqjJMJJqEEZIHsoTpaQHgKIiq6RHWLqsJ4ZWNunmaLPOgbbmGEgD9YWAQIUlFOgNUVAOQu+MgtSsORg84RlFZ4/NwLqs4AzUD8YXeFF8gcaJoia8PS9foASD+WoZbgPbOTtloHphlJETmt54/M+wv5c49pZRx8G03lyzxLZ/zDJBlrU8riOmBZ/OrppRa1ejyq56W3a9rKX7QiKMdqq8TLPMG/2cMUaVYwy4YfdTHLAgo/xCmW0GAfiVj8iRww7r1eMVKeeTj8LQs4sAODDjv4t/vH9EyTPOtQBLUSD8AJAoIQKUI93BORqsUdQ2IPQr4WWvGKmeYwlvaNaZCW9U+Nfwmbn6cxKPdOdo4981SRu6YbyP1WMdgLJ63TWyK5f/HU0mj5N4wU20sdElCpPmD7YOG2rR4emie651WJXzm0dqL5CLBUzOmQyasNZY75oMprnphcSw2vopde6FBDvTPFWwM2Ax1JXY0Xaok3NXNZfbjIIUa9eM5qBMpOm2U7kelxS1jIM0zuRwAK84aLydAszGxU6NorZxACQcaBwH30jkza8I2I8AqdxVGiJAWlnKitpGwHU345DzhabrvaRIe1/3avLR71trG6uqtLF4lBhjVqOoSo/vghxW4VwDyL5jzDevIPsePB2ghsHLnWLcG+Ppw7fJly9fF93J9668YLzZyZV+qPDVxST7TTeupC2S9jauKn2lHs396n0NnZP/IQgjFI8hXNs2DsNc2nBYMGCldFREBsXs1WgWd+ARYMUtE9tK73f0IZMg33MDDiwGlXiDhJfkno38nmhYeo7Dn/9J9JjHv2xEUlGffWYkRtLJPu7JQ3Q/O/ZuCig3KjDMAiRSHx0JWVCsKtXi82Q+D3ErCJO3yj9dNJBIXF7BD675ymGlvVSg0ldyJmAUeTyVePz5CQvrCKuboIpaR4Kqa1RGgfYJe8BRxkcnrMSzxt8GaTWKWiasvL9t/qQ2/4Tk1GvICfRSdZVmp00R1D7/qo6aPhn/JNoYDfkHlBpFLfNPPjK5/X/xz6jjn5aeJ5xmcTyGfux295F90n33rwpw9B8= \ No newline at end of file diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow4.png b/doc/bgp_error_handling/images/bgp_error_handling_flow4.png new file mode 100644 index 0000000000..aa18f18ba8 Binary files /dev/null and b/doc/bgp_error_handling/images/bgp_error_handling_flow4.png differ diff --git a/doc/bgp_error_handling/images/bgp_error_handling_flow4.xml b/doc/bgp_error_handling/images/bgp_error_handling_flow4.xml new file mode 100644 index 0000000000..b06c7e9f64 --- /dev/null +++ b/doc/bgp_error_handling/images/bgp_error_handling_flow4.xml @@ -0,0 +1,2 @@ + 
+7Vrfk6I4EP5rrLp70AIiODz6a+Zm63bHdepubuflKkLE1CDxQhx1//pNICAQOJTV1anSF6QTOkn393U6DS0wXG4fKFwtPhMX+S1Dc7ctMGoZ4qfzi5DsYklPN2OBR7Ebi/S94Bl/R1KoSekauyjMdWSE+Ayv8kKHBAFyWE4GKSWbfLc58fOjrqCHFMGzA31V+oJdtoild0ZvL/8DYW+RjKxbdtyyhElnuZJwAV2yyYjAuAWGlBAW/1tuh8gXxkvsEj93X9GaToyigB3ywGSAPz1ablufP85ep5PuX6/joN2NtbxDfy0XPHiYyPmyXWIEStaBi4QevQUGmwVm6HkFHdG64W7nsgVb+rJZnZec6juiDG0zIjnPB0SWiNEd7yJbzbv4CYkZU0Jmk3GANOoiY3sgZVC63Ev17q3C/0jDHGGknmIkxULcuSvx18fBW94c3Ap09w+/0Tpmcvst2zYSi9bSu528S+Eibnw4Q/4AOm9e5Ioh8QmNBga2PRze88UNGKQeYhNEMV80osI/OPB4J0soI2vqoIrGSofFKuvRg9wch1S3ZhxnljgukVHkQ4bf88wr86YcYUIwn3GKGjuPmhQjiQZphPihLFEKenQ9r8gu6JGWLuqJoJUuujna7EugrTkGkkB9JSAwQAEFZkMUFIPQL0ZBYtYMDF7RjMKLx2bdvq7grBsfjC/gqvgCrAJftIZ8AXeX5QtQYDBfLcNd4LgXpwwwrowyakLTn0z+HQ0qiePsOHVcROvNNYtt++csFaRZy9OacS3odHbtWrV2tcrsap7Lrte1dV9JhOmeKi/r2r2OeckYY6gxxujw+2cU8CCj/Ua5bYaB/rs47opjcMH1vFdfnEgFn3wYhtjJA+DAjP8++on+jJI3lGnRbU0D4ANAooAI3Wiao4EaRecGhHkjvOoVK9m4fpbwlmFfmPBWiX8tn5trMCfRTPeOtv5bk6ShHUZ1rD7voGur7b6R//PEdTydPk2jDTfWxmcXK4ybP9g+bGl5h5tpnL7UTmyoGc4TdRbQQxIol0wHe91ezlyJZS6VDibZ6ZVEsdoTVOLcKwl3vbu8O5uHOwvkg12BHecOdmr2CjLZzThIsHbLaQ7KRbpND9JWjaIz4yCJMxkcdG84aFxQ0a3Gx50aRefGga7gwBQ4+EIYnt8QUI0As5AO6U03hJ5do6gCAdwPcJfpthIdwuoJW2b5OHtAxRpPC69bseSQ1xdNsaMo+sW7SPe6ksorca9SbW/q3tqy/bndq7LXEpvDlKwZ4tfHIGQwUjOKPgiZQ+zf9o4DDhOFkN/4dYxShDkQINV7x6khZKp5Zq+TAuY+AxjscJOS4Aab6h286O2mFVXQq1F05rhiqklnUiNz8XurrODG0Ja1oY+9IK64OdxHiKo1t0RPuIJBqaKDKnd3EUQpWfHLZxSG4vOxfREv1p0fj4ujqVeU9sT0C7DNwTAgolQ3mGPfL8OsXPhIrhoMRB2KE8bvy4Yldl0xVmlha1/6+t/ds/mrzjTtqCluFWHWpLhlPnwbvXz6+n37ZbF2vr69euP+3+0yPB0ZRkqZXmKnS5FfL543jt4zqh48HdlLfaN+fHhsdd0uo6guKDpBIgZoL5gtXAo3hxbaVTZSxIeDs6iDIIncEHlvc9AyR0dRsHoPmke/dEYK4dIPXOVMWiljsjCsZkAlYbVO4aCQMOZnX/HlIZl/nMznIToSUvx2/61s3H3/xTEY/wA= \ No newline at end of file diff --git a/doc/watermarks_HLD.md b/doc/buffer-watermark/watermarks_HLD.md similarity index 100% rename from doc/watermarks_HLD.md rename to doc/buffer-watermark/watermarks_HLD.md diff --git a/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md b/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md new file mode 100644 index 0000000000..be51319390 --- /dev/null +++ b/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md @@ -0,0 +1,312 @@ +# Critical resource monitoring in SONiC +# High Level Design Document +### Rev 0.1 + +# Table of Contents + * [List of Tables](#list-of-tables) + + * [Revision](#revision) + + * [About this Manual](#about-this-manual) + + * [Scope](#scope) + + * [Definitions/Abbreviation](#definitionsabbreviation) + + * [1 Subsystem Requirements Overview](#1-subsystem-requirements-overview) + * [1.1 Functional requirements](#11-functional-requirements) + * [1.1.1 Resources to be monitored](#111-resources-to-be-monitored) + * [1.1.2 Monitoring process requirements](#112-monitoring-process-requirements) + * [1.2 CLI requirements](#12-cli-requirements) + + * [2 Modules Design](#2-modules-design) + * [2.1 Config DB](#21-config-db) + * [2.1.1 CRM Table](#211-crm-table) + * [2.2 Counters DB](#22-counters-db) + * [2.2.1 CRM_STATS Table](#221-crm_stats_table) + * [2.2.2 CRM_ACL_GROUP_STATS](#222-crm_acl_group_stats) + * [2.4 Orchestration Agent](#24-orchestration-agent) + * [2.6 SAI](#26-sai) + * [2.7 CLI](#27-cli) + * [2.7.1 CRM utility interface](#271-crm-utility-interface) + * [2.7.1.1 CRM utility config syntax](#2711-crm-utility-config-syntax) + * [2.7.1.2 CRM utility show syntax](#2712-crm-utility-show-syntax) + * [2.7.2 Config CLI command](#272-config_cli_command) + * [2.7.3 Show CLI command](#273-show_cli_command) + + * [3 
Flows](#3-flows) + * [3.1 CRM monitoring](#31-crm-monitoring) + * [3.2 CRM CLI config](#32-crm-cli-config) + * [3.3 CRM CLI show](#33-crm-cli-show) + + * [4 Open Questions](#4-open-questions) + +# List of Tables +* [Table 1: Revision](#revision) +* [Table 2: Abbreviations](#definitionsabbreviation) +* [Table 3: CRM SAI attributes](#crn-sai-attributes) + +###### Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | | Volodymyr Samotiy | Initial version | + +# About this Manual +This document provides general information about the Critical Resource Monitoring feature implementation in SONiC. +# Scope +This document describes the high level design of the Critical Resource Monitoring feature. +# Definitions/Abbreviation +###### Table 2: Abbreviations +| Definitions/Abbreviation | Description | +|--------------------------|--------------------------------------------| +| CRM | Critical Resource Monitoring | +| API | Application Programmable Interface | +| SAI | Swich Abstraction Interface | +# 1 Subsystem Requirements Overview +## 1.1 Functional requirements +Detailed description of the Critical Resource Monitoring feature requirements is here: [CRM Requirements](https://github.com/Azure/SONiC/blob/gh-pages/doc/CRM_requirements.md). + +This section describes the SONiC requirements for Critical Resource Monitoring (CRM) feature. CRM should monitor utilization of ASIC resources by polling SAI attributes. + +At a high level the following should be supported: + +- CRM should log a message if there are any resources that exceed defined threshold value. +- CLI commands to check current usage and availability of monitored resources. + +### 1.1.1 Resources to be monitored +1. **IPv4 routes:** query currently used and available number of entries +2. **IPv6 routes:** query currently used and available number of entries +3. **IPv4 Nexthops:** query currently used available number of entries +4. **IPv6 Nexthops:** query currently used and available number of entries +5. **IPv4 Neighbors:** query currently used and available number of entries +6. **IPv6 Neighbors:** query currently used and available number of entries +7. **Next-hop group member:** query currently used and available number of entries +8. **Next-hop group objects:** query currently used and available number of entries +9. **ACL:** query currently used and available number of entries + - ACL Table + - ACL Group + - ACL Entries + - ACL Counters/Statistics +10. **FDB entries:** query currently used and available entries +## 1.2 Monitoring process requirements +Monitoring process should periodically poll SAI counters for all required resources, then it should check whether retrieved values exceed defined thresholds and log appropriate SYSLOG message. + +- User should be able to configure LOW and HIGH thresholds. +- User should be able to configure thresholds in the following formats: + - percentage + - actual used count + - actual free count +- CRM feature should log "SYSLOG" message if there are any resources that exceed LOW or HIGH threshold. +- CRM should support two types of "SYSLOG" messages: + - EXCEEDED for high threshold. + - CLEAR for low threshold. +- "SYSLOG" messages should be in the following format: + +```" WARNING : THRESHOLD_EXCEEDED for <%> Used count free count "``` + +```" NOTICE : THRESHOLD_CLEAR for <%> Used count free count "``` + +``` = ``` + +- Default polling interval should be set to 5 minutes. 
+- Default HIGH threshold should be set to 85%. +- Default LOW threshold should be set to 70%. +- CRM feature should suppress SYSLOG messages after printing for 10 times. +## 1.3 CLI requirements +- User should be able to query usage and availability of monitored resources. +- User should be able to configure thresholds values. + +# 2 Modules Design +## 2.1 Config DB +### 2.1.1 CRM Table +New "CRM" table should be added to ConfigDB in order to store CRM related configuration: polling interval and LOW/HIGH threshold values. +``` +; Defines schema for CRM configuration attributes +key = CRM ; CRM configuration +; field = value +polling_interval = 1*4DIGIT ; CRM polling interval +ipv4_route_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 route' resource +ipv6_route_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 route' resource +ipv4_nexthop_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 next-hop' resource +ipv6_nexthop_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 next-hop' resource +ipv4_neighbor_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 neighbor' resource +ipv6_neighbor_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 neighbor' resource +nexthop_group_member_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'next-hop group member' resource +nexthop_group_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'next-hop group object' resource +acl_table_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'acl table' resource +acl_group_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'acl group' resource +acl_entry_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'acl entry' resource +acl_counter_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'acl counter' resource +fdb_entry_threshold_type = "percentage" / "used" / "free" ; CRM threshold type for 'fdb entry' resource +ipv4_route_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv4 route' resource +ipv6_route_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv6 route' resource +ipv4_nexthop_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv4 next-hop' resource +ipv6_nexthop_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv6 next-hop' resource +ipv4_neighbor_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv4 neighbor' resource +ipv6_neighbor_low_threshold = 1*4DIGIT ; CRM low threshold for 'ipv6 neighbor' resource +nexthop_group_member_low_threshold = 1*4DIGIT ; CRM low threshold for 'next-hop group member' resource +nexthop_group_low_threshold = 1*4DIGIT ; CRM low threshold for 'next-hop group object' resource +acl_table_low_threshold = 1*4DIGIT ; CRM low threshold for 'acl table' resource +acl_group_low_threshold = 1*4DIGIT ; CRM low threshold for 'acl group' resource +acl_entry_low_threshold = 1*4DIGIT ; CRM low threshold for 'acl entry' resource +acl_counter_low_threshold = 1*4DIGIT ; CRM low threshold for 'acl counter' resource +fdb_entry_low_threshold = 1*4DIGIT ; CRM low threshold for 'fdb entry' resource +ipv4_route_high_threshold = 1*4DIGIT ; CRM high threshold for 'ipv4 route' resource +ipv6_route_high_threshold = 1*4DIGIT ; CRM high threshold for 'ipv6 route' resource +ipv4_nexthop_high_threshold = 1*4DIGIT ; CRM high threshold for 'ipv4 next-hop' resource +ipv6_nexthop_high_threshold = 
1*4DIGIT ; CRM high threshold for 'ipv6 next-hop' resource
+ipv4_neighbor_high_threshold = 1*4DIGIT ; CRM high threshold for 'ipv4 neighbor' resource
+ipv6_neighbor_high_threshold = 1*4DIGIT ; CRM high threshold for 'ipv6 neighbor' resource
+nexthop_group_member_high_threshold = 1*4DIGIT ; CRM high threshold for 'next-hop group member' resource
+nexthop_group_high_threshold = 1*4DIGIT ; CRM high threshold for 'next-hop group object' resource
+acl_table_high_threshold = 1*4DIGIT ; CRM high threshold for 'acl table' resource
+acl_group_high_threshold = 1*4DIGIT ; CRM high threshold for 'acl group' resource
+acl_entry_high_threshold = 1*4DIGIT ; CRM high threshold for 'acl entry' resource
+acl_counter_high_threshold = 1*4DIGIT ; CRM high threshold for 'acl counter' resource
+fdb_entry_high_threshold = 1*4DIGIT ; CRM high threshold for 'fdb entry' resource
+```
+## 2.2 Counters DB
+Two new tables should be added to the CountersDB in order to represent currently used and available entries for the CRM resources.
+### 2.2.1 CRM_STATS Table
+This table should store all global CRM stats.
+```
+; Defines schema for CRM counters attributes
+key = CRM_STATS ; CRM stats entry
+; field = value
+CRM_STATS_IPV4_ROUTE_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 route' resource
+CRM_STATS_IPV6_ROUTE_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 route' resource
+CRM_STATS_IPV4_NEXTHOP_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 next-hop' resource
+CRM_STATS_IPV6_NEXTHOP_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 next-hop' resource
+CRM_STATS_IPV4_NEIGHBOR_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 neighbor' resource
+CRM_STATS_IPV6_NEIGHBOR_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 neighbor' resource
+CRM_STATS_NEXTHOP_GROUP_MEMBER_AVAILABLE = 1*20DIGIT ; number of available entries for 'next-hop group member' resource
+CRM_STATS_NEXTHOP_GROUP_OBJECT_AVAILABLE = 1*20DIGIT ; number of available entries for 'next-hop group object' resource
+CRM_STATS_ACL_TABLE_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl table' resource
+CRM_STATS_ACL_GROUP_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl group' resource
+CRM_STATS_FDB_ENTRY_AVAILABLE = 1*20DIGIT ; number of available entries for 'fdb entry' resource
+CRM_STATS_IPV4_ROUTE_USED = 1*20DIGIT ; number of used entries for 'ipv4 route' resource
+CRM_STATS_IPV6_ROUTE_USED = 1*20DIGIT ; number of used entries for 'ipv6 route' resource
+CRM_STATS_IPV4_NEXTHOP_USED = 1*20DIGIT ; number of used entries for 'ipv4 next-hop' resource
+CRM_STATS_IPV6_NEXTHOP_USED = 1*20DIGIT ; number of used entries for 'ipv6 next-hop' resource
+CRM_STATS_IPV4_NEIGHBOR_USED = 1*20DIGIT ; number of used entries for 'ipv4 neighbor' resource
+CRM_STATS_IPV6_NEIGHBOR_USED = 1*20DIGIT ; number of used entries for 'ipv6 neighbor' resource
+CRM_STATS_NEXTHOP_GROUP_MEMBER_USED = 1*20DIGIT ; number of used entries for 'next-hop group member' resource
+CRM_STATS_NEXTHOP_GROUP_OBJECT_USED = 1*20DIGIT ; number of used entries for 'next-hop group object' resource
+CRM_STATS_ACL_TABLE_USED = 1*20DIGIT ; number of used entries for 'acl table' resource
+CRM_STATS_ACL_GROUP_USED = 1*20DIGIT ; number of used entries for 'acl group' resource
+CRM_STATS_FDB_ENTRY_USED = 1*20DIGIT ; number of used entries for 'fdb entry' resource
+```
+### 2.2.2 CRM_ACL_GROUP_STATS
+This table should store all "per ACL group" CRM stats.
+``` +; Defines schema for CRM counters attributes +key = CRM_ACL_GROUP_STATS:OID ; CRM ACL group stats entry +; field = value +CRM_STATS_ACL_ENTRY_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl entry' resource +CRM_STATS_ACL_COUNTER_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl counter' resource +CRM_STATS_ACL_ENTRY_USED = 1*20DIGIT ; number of used entries for 'acl entry' resource +CRM_STATS_ACL_COUNTER_USED = 1*20DIGIT ; number of used entries for 'acl counter' resource +``` +## 2.4 Orchestration Agent +New "CrmOrch" class should be implemented and it should run new CRM thread for all monitoring logic. + +CRM thread should check whether some threshold is exceeded and log appropriate (CLEAR/EXCEEDED) SYSLOG message. Also number of already logged EXCEEDED messages should be tracked and once it reached the pre-defined value all CRM SYSLOG messages should be suppressed. When CLEAR message is logged then counter for number of logged messages should be cleared. + +CLI show command should be able to display currently USED and AVAILABLE number of entries, but SAI provides API to query the current AVAILABLE entries. So, OrchAgent (appropriate agent for each resource) should track respective entries that are programmed and update appropriate counter in "CrmOrch" cache. Also, "CrmOrch" should provide public API in order to allow other agents update local cache and then CRM thread should periodically update CountersDB from the cache. +## 2.6 SAI +Shown below table represents all the SAI attributes which should be used to get required CRM counters. +###### Table 3: CRM SAI attributes +| CRM resource | SAI attribute | +|--------------------------|-------------------------------------------------------| +| IPv4 routes | SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY | +| IPv6 routes | SAI_SWITCH_ATTR_AVAILABLE_IPV6_ROUTE_ENTRY | +| IPv4 next-hops | SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEXTHOP_ENTRY | +| IPv6 next-hops | SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEXTHOP_ENTRY | +| IPv4 neighbors | SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEIGHBOR_ENTRY | +| IPv6 neighbors | SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEIGHBOR_ENTRY | +| Next-hop group members | SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_MEMBER_ENTRY | +| Next-hop group objects | SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY | +| ACL tables | SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE | +| ACL groups | SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE_GROUP | +| ACL entries | SAI_ACL_TABLE_ATTR_AVAILABLE_ACL_ENTRY | +| ACL counters | SAI_ACL_TABLE_ATTR_AVAILABLE_ACL_COUNTER | +| FDB entries | SAI_SWITCH_ATTR_AVAILABLE_FDB_ENTRY | +## 2.7 CLI +New CRM utility script should be implement in "sonic-utilities" in order to configure and display all CRM related information. +### 2.7.1 CRM utility interface +``` +crm +Usage: crm [OPTIONS] COMMAND [ARGS]... + + Utility to operate with CRM configuration and resources. + +Options: + --help Show this message and exit. + +Commands: + config Set CRM configuration. + show Show CRM information. 
+``` +#### 2.7.1.1 CRM utility config syntax +* ```polling interval ``` +* ```thresholds all type [percentage|used|count]``` +* ```thresholds all [low|high] ``` +* ```thresholds [ipv4|ipv6] route type [percentage|used|count]``` +* ```thresholds [ipv4|ipv6] route [low|high] ``` +* ```thresholds [ipv4|ipv6] neighbor type [percentage|used|count]``` +* ```thresholds [ipv4|ipv6] neighbor [low|high] ``` +* ```thresholds [ipv4|ipv6] nexthop type [percentage|used|count]``` +* ```thresholds [ipv4|ipv6] nexthop [low|high] ``` +* ```thresholds nexthop group [member|object] type [percentage|used|count]``` +* ```thresholds nexthop group [member|object] [low|high] ``` +* ```thresholds acl [table|group] type [percentage|used|count]``` +* ```thresholds acl [table|group] [low|high] ``` +* ```thresholds acl group [entry|counter] type [percentage|used|count]``` +* ```thresholds acl group [entry|counter] [low|high] ``` +* ```thresholds fdb type [percentage|used|count]``` +* ```thresholds fdb [low|high] ``` +#### 2.7.1.2 CRM utility show syntax +* ```summary``` +* ```[resources|thresholds] all``` +* ```[resources|thresholds] [ipv4|ipv6] [route|neighbor|nexthop]``` +* ```[resources|thresholds] nexthop group [member|object]``` +* ```[resources|thresholds] acl [table|group]``` +* ```[resources|thresholds] acl group [entry|counter]``` +* ```[resources|thresholds] fdb``` +### 2.7.2 Config CLI command +Config command should be extended in order to add "crm" alias to the CRM utility. +``` +Usage: config [OPTIONS] COMMAND [ARGS]... + + SONiC command line - 'config' command + +Options: + --help Show this message and exit. + +Commands: +... + crm CRM related configuration. +``` +### 2.7.3 Show CLI command +Show command should be extended in order to add "crm" alias to the CRM utility. +``` +show +Usage: show [OPTIONS] COMMAND [ARGS]... + + SONiC command line - 'show' command + +Options: + -?, -h, --help Show this message and exit. + +Commands: + ... + crm Show CRM related information +``` +# 3 Flows +## 3.1 CRM monitoring +![](https://github.com/Azure/SONiC/blob/gh-pages/images/crm_hld/crm_monitoring_flow.png) +## 3.2 CRM CLI config +![](https://github.com/Azure/SONiC/blob/gh-pages/images/crm_hld/crm_cli_config_flow.png) +## 3.3 CRM CLI show +![](https://github.com/Azure/SONiC/blob/gh-pages/images/crm_hld/crm_cli_show_flow.png) +# 4 Open Questions diff --git a/doc/debug_framework_design_spec.md b/doc/debug_framework_design_spec.md new file mode 100644 index 0000000000..7567e26d99 --- /dev/null +++ b/doc/debug_framework_design_spec.md @@ -0,0 +1,394 @@ +# Debug Framework in SONiC +# Design Specification +#### Rev 0.3 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + * [1. Requirements ](#1-requirements) + * [2. Functional Description](#2-functional-description) + * [3. Design](#3-design) + * [4. Flow Diagrams](#4-flow-diagrams) + * [5. Serviceability and Debug](#5-serviceability-and-debug) + * [6. Warm Boot Support](#6-warm-boot-support) + * [7. Scalability](#7-scalability) + * [8. 
Unit Test](#8-unit-test) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) +[Table 2: Configuration options and defaults](#table-2-defaults) + +# Revision +| Rev | Date | Author | Change Description | +|-----|------------|-----------------------------|----------------------------| +| 0.1 | 05/19/2019 | Anil Sadineni, Laveen T | Initial version | +| 0.2 | 05/31/2019 | Anil Sadineni, Laveen T | Address comments from Ben | +| 0.3 | 07/22/2019 | Anil Sadineni, Laveen T | Internal review comments | + +# About this Manual +This document provides general information about the debug framework and additional debug features implementation in SONiC. + +# Scope +Current implementation of SONiC offers logging utility and utilities to display the contents of Redis. In an effort to enhance debug ability, A new debug framework is added with below functionality: + * Provide a framework that allows components to register and dump running snapshots of component internals using dump routines. + * Handle assert conditions to collect more info. + * Implement dump routines in OrchAgent using debug framework. + +Additionally, below debug features are done to enhance debug ability. + * Enhance existing show tech-support utility. + * Add additional scripts to enforce policies on debug related files. + +This document describes the above scope. + +# Definition/Abbreviation + +# Table 1: Abbreviations + +| **Term** | **Meaning** | +|-------------------|-------------------------------------------------------------------------------------------------------| +| Singleton | The singleton pattern is a design pattern that restricts the instantiation of a class to one object. | | + +# 1 Requirements +## 1.1 Functional Requirements +1. A framework to allow component registration and subsequently trigger dump routines. Triggers are from administrative CLI commands or from assert for this release. +2. Assert routines for production environment to delay the reload for gathering information or notify admin and delegate the decision of reload to admin. +3. Enhance tech-support to collect comprehensive info. +4. Add a helper script to facilitate an upload of the debug info in a non-intrusive way. +5. Rate limit and redirect critical logs for better debug ability. + +## 1.2 Configuration and Management Requirements +1. New CLI for triggering dump routines of Daemons. +2. New CLI to display drop/error interface counters. +3. New CLI to trigger upload of debug information. +4. Existing CLI is extended to filter the display counters on a specific interface. + +## 1.3 Scalability Requirements +None + +## 1.4 Warm Boot Requirements +None + +# 2 Functionality +## 2.1 Functional Description + +**1. Dump Routines** +- Framework facilitates to collect developer-level component internal data state under certain system conditions. +- The triggers to dump information might come from administrative commands, from assert or in response to some critical notifications. +- Framework provides interface for components to register and an interface for utilities to trigger the components to invoke dump routines. + +**2. Assert utility** +- This utility adds some data collection of certain system information when an assert condition is hit. + +**3. Tech-support enhancements** +- Current tech-support collects exhaustive information. The tech-support is enhanced to additionally collect STATE_DB dump, dump of ASIC specifics, filter and redirect critical logs to persistent log file for quick reference. + +**4. 
Helper Scripts** +New utility scripts are provided to: +- Print "headline" information for a quick summary of the system state. +- Facilitate upload of the collected information. +- Enforce policy on the number of debug-related files. + +# 3 Design +## 3.1 Overview +### 3.1.1 Framework for dump routines + +**Block Diagram** +![Debug Framework Block Diagram](https://github.com/anilsadineni/SONiC/blob/debug_framework_HLD/images/debug_framework_block_diagram.png) + + +Debug framework provides an API for components to register with the framework. It also provides interface for administrator, assert utilities, error handling functions to invoke framework actions. Framework uses Redis notifications for communicating the trigger message from entity invoking the trigger to the registered components. Framework actions are configurable. These actions are applicable uniformly to all registered components. + +Debug Framework itself will not have an executing context. Framework actions are performed in the registered component's thread context. Trigger APIs are executed in the context of invoking utility or invoking process. +Please refer [Flow Diagram](#4-flow-diagrams) for execution sequence. + +**Framework Responsibilities** + +Each registered component provides the routines to dump the component-specific state via callback functions. It is then the responsibility of the framework to +- Monitor for specific system triggers or listen on framework trigger APIs invocation +- Process the trigger and call appropriate dump routines +- Manage the output location +- Follow up with post actions like redirect summary to console/ compress the files and upload the file. + +**Component Responsibilities** + +Components implement dump routines following below guidelines +- Implement a function of type DumpInfoFunc. + `typedef void (*DumpInfoFunc)(std::string, KeyOpFieldsValuesTuple);` +- Within the function, dump the information into a string buffer and use a macro `SWSS_DEBUG_PRINT`. Macro will further process the information based on intended framework action. Wrap the function that uses multiple invocations of `SWSS_DEBUG_PRINT` with `SWSS_DEBUG_PRINT_BEGIN` and `SWSS_DEBUG_PRINT_END`. +- Handle KeyOpFieldsValuesTuple as argument. Arguments like dumpType and pass thru arguments that instruct the level of info dump ( short summary/specific info/filtered output/full info) should be handled by components. These arguments are opaque to framework. Additional optional arguments like output location and post-actions are for the framework consumption. +- Interpreting arguments is at discretion of components. Components may choose to dump same info for all trigger types ignoring input arguments. +- Dump routines registered with framework should be thread safe. Necessary synchronization needs to be ensured. + +**Component registration options** + +Framework only manages component registration and invocation of callback functions. Framework itself does not have an executing context. To execute the callback function, components have an option to instruct framework either to create an additional thread that runs in the registrant context or not create a thread but merely publish an event to Redis and expect component to handle the event. +Components have an option to register with debug framework using any one of the below APIs: +Option #1. `Debugframework::linkWithFramework()` +Option #2. `Debugframework::linkWithFrameworkNoThread()` + +Option #1 is preferred. 
However, if a component already handles RedisDB events and doesn't want an additional thread (to avoid thread synchronization issues), it might opt for option #2.
+
+Option #1 (Framework creates a Thread)
+---------
+Components register their dump routine with the debug framework using the API:
+`Debugframework::linkWithFramework(std::string &componentName, const DumpInfoFunc funcPtr);`
+
+Framework does the following:
+- Update framework internal data structure to save the registration information of components.
+- Create a `swss::Selectable` object.
+- Create a subscriber to a table/channel in APPL_DB in RedisDB with a unique channel name for receiving triggers.
+- Create a thread and wait on events/triggers.
+- Invoke the component-specific dump routine in this thread context on receipt of a trigger.
+- Handle the configured framework action within SWSS_DEBUG_PRINT. For example: redirect the output to syslog or a component-specific log file.
+- Update Redis DB to indicate callback action completion.
+- Post-processing of information based on the configured framework action. For example: bundle the information and upload it to a server.
+
+Option #2 (Framework does not create a Thread)
+---------
+Components register their dump routine with the debug framework using the API:
+`Debugframework::linkWithFrameworkNoThread(std::string &componentName);`
+Framework will limit its job to the following:
+- Update framework internal data structure to save the registration information of components.
+- Handle the configured framework action within SWSS_DEBUG_PRINT. For example: redirect the output to syslog or a component-specific log file.
+- Post-processing of information based on the configured framework action. For example: bundle the information and upload it to a server.
+
+The following activity is delegated to the component:
+- Optional creation of a swss::Selectable object.
+- Create a subscriber instance to a table/channel in APPL_DB in RedisDB for receiving triggers.
+- Optionally create a thread to wake on triggers.
+- Invoke the component-specific dump routine on receipt of a trigger.
+- Update Redis DB to indicate callback action completion.
+
+
+**Implementation brief**
+The framework will be implemented in C++ as a class in the SWSS namespace.
+
+```
+    class Debugframework
+    {
+        public:
+            static Debugframework &getInstance(); // To have only one instance aka singleton
+
+            typedef void (*DumpInfoFunc)(std::string, KeyOpFieldsValuesTuple);
+
+            // Component registration option 1 API
+            static void linkWithFramework(std::string &componentName, const DumpInfoFunc funcPtr);
+
+            // Component registration option 2 API
+            static void linkWithFrameworkNoThread(std::string &componentName);
+
+            // Interface to invoke triggers
+            static void invokeTrigger(std::string componentName, std::string args);
+
+        private:
+            Debugframework();
+            std::map<std::string, DumpInfoFunc> m_registeredComps;
+            std::map<std::string, std::string> m_configParams;
+
+            // Thread in while(1) for handling triggers and invoking callbacks
+            [[ noreturn ]] void runnableThread();
+    };
+```
+#### 3.1.1.1 Integration of OrchAgent with framework
+##### 3.1.1.1.1 Integration of OrchAgent
+
+DebugDumpOrch is added as an interface for orchagent modules to register with the debug framework. DebugDumpOrch uses linkWithFrameworkNoThread(). Orchagent modules (e.g. RouteOrch/NeighOrch) call DebugDumpOrch::addDbgCompMap() to register.
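
Before looking at DebugDumpOrch's specifics, the following is a minimal sketch of what the component side of registration Option #1 above could look like. It relies only on the `Debugframework` API declared earlier; the header paths, the `myorch` component name, the dumped fields, and the exact arguments of the `SWSS_DEBUG_PRINT*` macros are illustrative assumptions rather than part of this design.

```
#include <sstream>
#include <string>

#include "debugframework.h"   // assumed header exposing swss::Debugframework
#include "table.h"            // swss::KeyOpFieldsValuesTuple (sonic-swss-common)

// Dump routine matching the DumpInfoFunc signature declared above.
static void myOrchDumpRoutine(std::string key, swss::KeyOpFieldsValuesTuple trigger)
{
    // DUMP_TYPE / PASSTHRU_ARGS arrive in 'trigger'; this sketch ignores them
    // and always emits a short summary.
    std::ostringstream out;
    out << "myorch: cached entries = " << 42;   // illustrative state only

    // Hand the buffer to the configured framework action (syslog, file, ...).
    // The macro arity shown here is an assumption.
    SWSS_DEBUG_PRINT_BEGIN("myorch");
    SWSS_DEBUG_PRINT("myorch", out.str());
    SWSS_DEBUG_PRINT_END("myorch");
}

void myOrchRegisterWithDebugFramework()
{
    std::string component = "myorch";
    // Option #1: the framework creates the thread that waits for triggers and
    // invokes myOrchDumpRoutine in that context.
    swss::Debugframework::linkWithFramework(component, myOrchDumpRoutine);
}
```

A module that already has its own Redis event loop (as DebugDumpOrch does) would instead call `linkWithFrameworkNoThread()` and subscribe to the trigger channel itself, which is the Option #2 path described next.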
+ +DebugDumpOrch +- listens for notification events from debugframework +- invokes corresponding component's CLI callback +- notifies debugframework after processing the debug request + +##### 3.1.1.1.2 Triggers to OrchAgent + +show commands will act as triggers for OrchAgent. +Syntax: `show debug ... ` + +Definition of command and required options are left to the component module owners. Sample CLI section descripes few examples on how the "show debug" CLI command can be used. + +##### 3.1.1.1.3 Sample CLI +*RouteOrch:* + +| Syntax | Description | +|---------------------------------------------------------|---------------------------------------------------------------------------------------------| +|show debug routeOrch routes -v -p | Dump all Routes or routes specific to a prefix | +|show debug routeOrch nhgrp | NexthopGroup/ECMP info from RouteOrch::m_syncdNextHopGroups | +|show debug routeOrch all | Translates to list of APIs to dump, which can be used for specific component based triggers | + +*NeighborOrch:* + +| Syntax | Description | +|---------------------------|--------------------| +|show debug NeighOrch nhops | Dump Nexthops info | +|show debug NeighOrch neigh | Dump Neighbor info | + +##### 3.1.1.1.4 Sample Output + +``` +root@sonic:~# show debug routeorch routes -v VrfRED +------------IPv4 Route Table ------------ + +VRF_Name = VrfRED VRF_SAI_OID = 0x30000000005b1 +Prefix NextHop SAI-OID +100.100.4.0/24 Ethernet4 0x60000000005b3 +33.33.33.0/24 0x55c1ca5b3b98 (ECMP) 0x50000000005db +33.33.44.0/24 100.120.120.11|Ethernet8 0x40000000005d9 +33.33.55.0/24 100.120.120.12|Ethernet8 0x40000000005da +100.102.102.0/24 Ethernet12 0x60000000005d3 +100.120.120.0/24 Ethernet8 0x60000000005b4 + +------------IPv6 Route Table ------------ + +VRF_Name = VrfRED VRF_SAI_OID = 0x30000000005b1 +Prefix NextHop SAI-OID +2001:100:120:120::/64 Ethernet8 0x60000000005b4 + +root@sonic:~# show debug routeorch nhgrp + ax Nexthop Group - 512 +NHGrpKey SAI-OID NumPath RefCnt +0x55c1ca5b7bd0 0x50000000005db 3 1 + 1: 100.120.120.10|Ethernet8 + 2: 100.120.120.11|Ethernet8 + 3: 100.120.120.12|Ethernet8 + +root@sonic:~# show debug neighorch nhops + +NHIP Intf SAI-OID RefCnt Flags +100.120.120.10 Ethernet8 0x40000000005d8 1 0 +100.120.120.11 Ethernet8 0x40000000005d9 2 0 +100.120.120.12 Ethernet8 0x40000000005da 2 0 + +NHIP Intf SAI-OID RefCnt Flags +fe80::648a:79ff:fe5d:6b2a Ethernet4 0x40000000005df 0 0 +fe80::fc54:ff:fe44:de2 Ethernet12 0x40000000005d4 0 0 +fe80::fc54:ff:fe78:5fac Ethernet8 0x40000000005d2 0 0 +fe80::fc54:ff:fe88:6f80 Ethernet4 0x40000000005d0 0 0 +fe80::fc54:ff:fe8e:d91f Ethernet0 0x40000000005d1 0 0 + +root@sonic:~# +root@sonic:~# show debug neighorch neigh +NHIP Intf MAC +100.120.120.10 Ethernet8 00:00:11:22:00:10 +100.120.120.11 Ethernet8 00:00:11:22:00:11 +100.120.120.12 Ethernet8 00:00:11:22:00:12 + +NHIP Intf MAC +fe80::648a:79ff:fe5d:6b2a Ethernet4 fe:54:00:35:18:bb +fe80::fc54:ff:fe44:de2 Ethernet12 fe:54:00:44:0d:e2 +fe80::fc54:ff:fe78:5fac Ethernet8 fe:54:00:78:5f:ac +fe80::fc54:ff:fe88:6f80 Ethernet4 fe:54:00:88:6f:80 +fe80::fc54:ff:fe8e:d91f Ethernet0 fe:54:00:8e:d9:1f + +``` + +#### 3.1.1.3 Configuring Framework +Debug framework will initialize with below default parameters and shall be considered if options are not specified as arguments in the triggers. 
+
+# Table 2: Configuration Options and Defaults
+
+| **Parameter** | **Options** | **Default** |
|-------------------------------|-------------------------------------|-------------------------------|
| DumpLocation | "syslog" / "filename" | /var/log/_dump.log |
| TargetComponent | "all" / "componentname" | all |
| Post-action | "upload" / "compress-rotate-keep" | compress-rotate-keep |
| Server-location | ipaddress | 127.0.0.1 |
| Upload-method | "scp" / "tftp" | tftp |
| Upload-directory | "dir_path" / "default_dir" | default_dir |
+
+
+### 3.1.2 Assert Framework
+
+#### 3.1.2.1 Overview
+Asserts are added in the program execution sequence to confirm that the data/state at a certain point is valid/true. During development, if the program fails an assert condition, execution is stopped by a crash/exception. In production code, asserts are normally removed. This framework enhances/extends the assert to provide more debug details when an assert fails.
+
+Assert failure conditions are classified based on the following types; assert() will have the type and the module as additional arguments:
+- DUMP: Invokes the debug framework registered callback API corresponding to the module
+- BTRACE: Prints a backtrace and continues
+- SYSLOG: Updates syslog with the assert failure
+- ABORT: Stops/throws an exception
+
+
+#### 3.1.2.2 Pseudocode:
+
+```
+    static void custom_assert(bool exp, const char *func, const unsigned line);
+
+    #ifdef assert
+    #undef assert
+    #define assert(exp) Debugframework::custom_assert(exp, __PRETTY_FUNCTION__, __LINE__)
+    #endif
+```
+
+## 3.2 DB Changes
+
+### 3.2.1 APP DB
+
+**Dump table**
+
+For triggering dump routines
+
+```
+; Defines schema for DEBUG Event attributes
+key = DAEMON:daemon_name ; daemon_session_name is unique identifier;
+; field = value
+DUMP_TYPE = "short" / "full" ; summary dump or full dump
+DUMP_TARGET = "default" / "syslog" ; syslog or specified file
+PASSTHRU_ARGS = arglist ; zero or more strings separated by ","
+
+```
+**Dump done table**
+```
+; Defines schema for DEBUG response attributes
+key = DAEMON:daemon_name ; daemon_session_name is unique identifier;
+; field = value
+RESPONSE_TYPE = < enum DumpResponse::result > ; pending/failure/successful
+```
+
+## 3.3 CLI
+
+### 3.3.1 Show Commands
+
+|Syntax | Description |
|-------------------------------|-------------------------------------------------------------------------------|
|show debug all | This command will invoke dump routines for all components with the default action |
|show debug < component > | This command will invoke the dump routine for a specific component |
|show debug actions < options > | This command will display the configured framework actions |
+
+### 3.3.2 Debug/Error Interface counters
+`show interfaces pktdrop` is added to display debug/error counters for all interfaces.
+
+# 4 Flow Diagrams
+
+![Framework and component interaction](https://github.com/anilsadineni/SONiC/blob/debug_framework_HLD/images/debug_framework_flow_diagram.png)
+
+# 5 Serviceability and Debug
+None
+
+# 6 Warm Boot Support
+
+No change.
+
+
+
+# 7 Scalability
+No change.
+
+
+# 8 Unit Test
+### 8.1.1 Debug Framework
+1. Verify that the framework provides a shared library (a) for components to link and register with the framework and (b) for the utilities to issue triggers.
+2. Verify that on a running system, without any triggers, the framework does not add any CPU overhead.
+3. 
Verify that dump routines are invoked when a trigger is issued with all arguments from the trigger utilities/ show debug commands. +4. Verify that number of subscribers is incremented using redis-cli after successful registration of component. +5. Verify that SWSS_DEBUG_PRINT macro writes to dump location specified . +6. Verify that SWSS_DEBUG_PRINT macro writes to SYSLOG when DUMP_TARGET in trigger is mentioned as syslog. +7. Verify that if the utility function triggers dump for all, framework loops through the registrations and updates the table for each component. +8. Verify that framework executes the configured post action. +9. Verify the behaviour of framework when some component doesnt send a DONE message. +10. Verify that framework handles multiple consecutive triggers and handles triggers independently. + +Go back to [Beginning of the document](#Debug-Framework-in-SONiC). + diff --git a/doc/drop_counters/drop_counters_HLD.md b/doc/drop_counters/drop_counters_HLD.md new file mode 100644 index 0000000000..bd0cae6127 --- /dev/null +++ b/doc/drop_counters/drop_counters_HLD.md @@ -0,0 +1,397 @@ +# Configurable Drop Counters in SONiC + +# High Level Design Document +#### Rev 1.0 + +# Table of Contents +* [List of Tables](#list-of-tables) +* [List of Figures](#list-of-figures) +* [Revision](#revision) +* [About this Manual](#about-this-manual) +* [Scope](#scope) +* [Defintions/Abbreviation](#definitionsabbreviation) +* [1 Overview](#1-overview) + - [1.1 Use Cases](#11-use-cases) + - [1.1.1 A flexible "drop filter"](#111-a-flexible-"drop-filter") + - [1.1.2 A helpful debugging tool](#112-a-helpful-debugging-tool) + - [1.1.3 More sophisticated monitoring schemes](#113-more-sophisticated-monitoring-schemes) +* [2 Requirements](#2-requirements) + - [2.1 Functional Requirements](#21-functional-requirements) + - [2.2 Configuration and Management Requirements](#22-configuration-and-management-requirements) + - [2.3 Scalability Requirements](#23-scalability-requirements) + - [2.4 Supported Debug Counters](#24-supported-debug-counters) +* [3 Design](#3-design) + - [3.1 CLI (and usage example)](#31-cli-and-usage-example) + - [3.1.1 Displaying available counter capabilities](#311-displaying-available-counter-capabilities) + - [3.1.2 Displaying current counter configuration](#312-displaying-current-counter-configuration) + - [3.1.3 Displaying the current counts](#313-displaying-the-current-counts) + - [3.1.4 Clearing the counts](#314-clearing-the-counts) + - [3.1.5 Configuring counters from the CLI](#315-configuring-counters-from-the-CLI) + - [3.2 Config DB](#32-config-db) + - [3.2.1 DEBUG_COUNTER Table](#321-debug_counter-table) + - [3.2.2 PACKET_DROP_COUNTER_REASON Table](#322-packet_drop_counter_reason-table) + - [3.3 State DB](#33-state-db) + - [3.3.1 DEBUG_COUNTER_CAPABILITIES Table](#331-debug-counter-capabilities-table) + - [3.3.2 SAI APIs](#332-sai-apis) + - [3.4 Counters DB](#34-counters-db) + - [3.5 SWSS](#35-swss) + - [3.5.1 SAI APIs](#351-sai-apis) + - [3.6 syncd](#34-syncd) +* [4 Flows](#4-flows) + - [4.1 General Flow](#41-general-flow) +* [5 Warm Reboot Support](#5-warm-reboot-support) +* [6 Unit Tests](#6-unit-tests) +* [7 Platform Support](#7-platform-support) + - [7.1 Known Limitations](#7.1-known-limitations) +* [8 Open Questions](#8-open-questions) +* [9 Acknowledgements](#9-acknowledgements) +* [10 References](#10-references) + +# List of Tables +* [Table 1: Abbreviations](#definitionsabbreviation) + +# List of Figures +* [Figure 1: General Flow](#41-general-flow) + +# Revision 
+| Rev | Date | Author | Change Description | +|:---:|:--------:|:-----------:|---------------------------| +| 0.1 | 07/30/19 | Danny Allen | Initial version | +| 0.2 | 09/03/19 | Danny Allen | Review updates | +| 0.3 | 09/19/19 | Danny Allen | Community meeting updates | +| 1.0 | 11/19/19 | Danny Allen | Code review updates | + +# About this Manual +This document provides an overview of the implementation of configurable packet drop counters in SONiC. + +# Scope +This document describes the high level design of the configurable drop counter feature. + +# Definitions/Abbreviation +| Abbreviation | Description | +|--------------|-----------------| +| RX | Receive/ingress | +| TX | Transmit/egress | + +# 1 Overview +The main goal of this feature is to provide better packet drop visibility in SONiC by providing a mechanism to count and classify packet drops that occur due to different reasons. + +The other goal of this feature is for users to be able to track the types of drop reasons that are important for their scenario. Because different users have different priorities, and because priorities change over time, it is important for this feature to be easily configurable. + +We will accomplish both goals by adding support for SAI debug counters to SONiC. +* Support for creating and configuring port-level and switch-level debug counters will be added to orchagent and syncd. +* A CLI tool will be provided for users to manage and configure their own drop counters + +## 1.1 Use Cases +There are a couple of potential use cases for these drop counters. + +### 1.1.1 A flexible "drop filter" +One potential use case is to use the drop counters to create a filter of sorts for the standard STAT_IF_IN/OUT_DISCARDS counters. Say, for example: +- Packets X, Y, and Z exist in our system +- Our switches should drop X, Y, and Z when they receive them + +We can configure a drop counter (call it "EXPECTED_DROPS", for example) that counts X, Y, and Z. If STAT_IF_IN_DISCARDS = EXPECTED_DROPS, then we know our switch is healthy and that everything is working as intended. If the counts don't match up, then there may be a problem. + +### 1.1.2 A helpful debugging tool +Another potential use case is to configure the counters on the fly in order to help debug packet loss issues. For example, if we're consistently experiencing packet loss in your system, we might try: +- Creating a counter that tracks L2_ANY and a counter that tracks L3_ANY +- L2_ANY is incrementing, so we delete these two counters and create MAC_COUNTER that tracks MAC-related reasons (SMAC_EQUALS_DMAC, DMAC_RESERVED, etc.), VLAN_COUNTER that tracks VLAN related reasons, (INGRESS_VLAN_FILTER, VLAN_TAG_NOT_ALLOWED), and OTHER_COUNTER that tracks everything else (EXCEEDS_L2_MTU, FDB_UC_DISCARD, etc.) +- OTHER_COUNTER is incrementing, so we delete the previous counters and create a counter that tracks the individual reasons from OTHER_COUNTER +- We discover that the EXCEEDS_L2_MTU counter is increasing. There might be an MTU mismatch somewhere in our system! + +### 1.1.3 More sophisticated monitoring schemes +Some have suggested other deployment schemes to try to sample the specific types of packet drops that are occurring in their system. Some of these ideas include: +- Periodically (e.g. every 30s) cycling through different sets of drop counters on a given device +- "Striping" drop counters across different devices in the system (e.g. these 3 switches are tracking VLAN drops, these 3 switches are tracking ACL drops, etc.) 
+- An automatic version of [1.1.2](#112-a-helpful-debugging-tool) that adapts the drop counter configuration based on which counters are incrementing + +# 2 Requirements + +## 2.1 Functional Requirements +1. CONFIG_DB can be configured to create debug counters +2. STATE_DB can be queried for debug counter capabilities +3. Users can access drop counter information via a CLI tool + 1. Users can see what capabilities are available to them + 1. Types of counters (i.e. port-level and/or switch-level) + 2. Number of counters + 3. Supported drop reasons + 2. Users can see what types of drops each configured counter contains + 3. Users can add and remove drop reasons from each counter + 4. Users can read the current value of each counter + 5. Users can assign aliases to counters + 6. Users can clear counters + +## 2.2 Configuration and Management Requirements +Configuration of the drop counters can be done via: +* config_db.json +* CLI + +## 2.3 Scalability Requirements +Users must be able to use all debug counters and drop reasons provided by the underlying hardware. + +Interacting with debug counters will not interfere with existing hardware counters (e.g. portstat). Likewise, interacting with existing hardware counters will not interfere with debug counter behavior. + +## 2.4 Supported Debug Counters +* PORT_INGRESS_DROPS: port-level ingress drop counters +* PORT_EGRESS_DROPS: port-level egress drop counters +* SWITCH_INGRESS_DROPS: switch-level ingress drop counters +* SWITCH_EGRESS_DROPS: switch-level egress drop counters + +# 3 Design + +## 3.1 CLI (and usage example) +The CLI tool will provide the following functionality: +* See available drop counter capabilities: `show dropcounters capabilities` +* See drop counter config: `show dropcounters configuration` +* Show drop counts: `show dropcounters counts` +* Clear drop counters: `sonic-clear dropcounters` +* Initialize a new drop counter: `config dropcounters install` +* Add drop reasons to a drop counter: `config dropcounters add_reasons` +* Remove drop reasons from a drop counter: `config dropcounters remove_reasons` +* Delete a drop counter: `config dropcounters delete` + +### 3.1.1 Displaying available counter capabilities +``` +admin@sonic:~$ show dropcounters capabilities +Counter Type Total +-------------------- ------- +PORT_INGRESS_DROPS 3 +SWITCH_EGRESS_DROPS 2 + +PORT_INGRESS_DROPS: + L2_ANY + SMAC_MULTICAST + SMAC_EQUALS_DMAC + INGRESS_VLAN_FILTER + EXCEEDS_L2_MTU + SIP_CLASS_E + SIP_LINK_LOCAL + DIP_LINK_LOCAL + UNRESOLVED_NEXT_HOP + DECAP_ERROR + +SWITCH_EGRESS_DROPS: + L2_ANY + L3_ANY + A_CUSTOM_REASON +``` + +### 3.1.2 Displaying current counter configuration +``` +admin@sonic:~$ show dropcounters configuration +Counter Alias Group Type Reasons Description +-------- -------- ----- ------------------ ------------------- -------------- +DEBUG_0 RX_LEGIT LEGIT PORT_INGRESS_DROPS SMAC_EQUALS_DMAC Legitimate port-level RX pipeline drops + INGRESS_VLAN_FILTER +DEBUG_1 TX_LEGIT None SWITCH_EGRESS_DROPS EGRESS_VLAN_FILTER Legitimate switch-level TX pipeline drops + +admin@sonic:~$ show dropcounters configuration -g LEGIT +Counter Alias Group Type Reasons Description +-------- -------- ----- ------------------ ------------------- -------------- +DEBUG_0 RX_LEGIT LEGIT PORT_INGRESS_DROPS SMAC_EQUALS_DMAC Legitimate port-level RX pipeline drops + INGRESS_VLAN_FILTER +``` + +### 3.1.3 Displaying the current counts + +``` +admin@sonic:~$ show dropcounters counts + IFACE STATE RX_ERR RX_DROPS TX_ERR TX_DROPS RX_LEGIT +--------- ------- 
-------- ---------- -------- ---------- ---------
+Ethernet0 U 10 100 0 0 20
+Ethernet4 U 0 1000 0 0 100
+Ethernet8 U 100 10 0 0 0
+
+DEVICE TX_LEGIT
+------ --------
+sonic 1000
+
+admin@sonic:~$ show dropcounters counts -g LEGIT
+ IFACE STATE RX_ERR RX_DROPS TX_ERR TX_DROPS RX_LEGIT
+--------- ------- -------- ---------- -------- ---------- ---------
+Ethernet0 U 10 100 0 0 20
+Ethernet4 U 0 1000 0 0 100
+Ethernet8 U 100 10 0 0 0
+
+admin@sonic:~$ show dropcounters counts -t SWITCH_EGRESS_DROPS
+DEVICE TX_LEGIT
+------ --------
+sonic 1000
+```
+
+### 3.1.4 Clearing the counts
+```
+admin@sonic:~$ sonic-clear dropcounters
+Cleared drop counters
+```
+
+### 3.1.5 Configuring counters from the CLI
+```
+admin@sonic:~$ sudo config dropcounters install DEBUG_2 PORT_INGRESS_DROPS [EXCEEDS_L2_MTU,DECAP_ERROR] -d "More port ingress drops" -g BAD -a BAD_DROPS
+admin@sonic:~$ sudo config dropcounters add_reasons DEBUG_2 [SIP_CLASS_E]
+admin@sonic:~$ sudo config dropcounters remove_reasons DEBUG_2 [SIP_CLASS_E]
+admin@sonic:~$ sudo config dropcounters delete DEBUG_2
+```
+
+## 3.2 Config DB
+Two new tables will be added to Config DB:
+* DEBUG_COUNTER to store general debug counter metadata
+* DEBUG_COUNTER_DROP_REASON to store drop reasons for debug counters that have been configured to track packet drops
+
+### 3.2.1 DEBUG_COUNTER Table
+Example:
+```
+{
+    "DEBUG_COUNTER": {
+        "DEBUG_0": {
+            "alias": "PORT_RX_LEGIT",
+            "type": "PORT_INGRESS_DROPS",
+            "desc": "Legitimate port-level RX pipeline drops",
+            "group": "LEGIT"
+        },
+        "DEBUG_1": {
+            "alias": "PORT_TX_LEGIT",
+            "type": "PORT_EGRESS_DROPS",
+            "desc": "Legitimate port-level TX pipeline drops",
+            "group": "LEGIT"
+        },
+        "DEBUG_2": {
+            "alias": "SWITCH_RX_LEGIT",
+            "type": "SWITCH_INGRESS_DROPS",
+            "desc": "Legitimate switch-level RX pipeline drops",
+            "group": "LEGIT"
+        }
+    }
+}
+```
+
+### 3.2.2 DEBUG_COUNTER_DROP_REASON Table
+Example:
+```
+{
+    "DEBUG_COUNTER_DROP_REASON": {
+        "DEBUG_0|SMAC_EQUALS_DMAC": {},
+        "DEBUG_0|INGRESS_VLAN_FILTER": {},
+        "DEBUG_1|EGRESS_VLAN_FILTER": {},
+        "DEBUG_2|TTL": {}
+    }
+}
+```
+
+## 3.3 State DB
+State DB will store information about:
+* What types of drop counters are available on this device
+* How many drop counters are available on this device
+* What drop reasons are supported by this device
+
+### 3.3.1 DEBUG_COUNTER_CAPABILITIES Table
+Example:
+```
+{
+    "DEBUG_COUNTER_CAPABILITIES": {
+        "SWITCH_INGRESS_DROPS": {
+            "count": "3",
+            "reasons": "[L2_ANY, L3_ANY, SMAC_EQUALS_DMAC]"
+        },
+        "SWITCH_EGRESS_DROPS": {
+            "count": "3",
+            "reasons": "[L2_ANY, L3_ANY]"
+        }
+    }
+}
+```
+
+This information will be populated by the orchestrator (described later) on startup.
+
+### 3.3.2 SAI APIs
+We will use the following SAI APIs to get this information:
+* `sai_query_attribute_enum_values_capability` to query support for different types of counters
+* `sai_object_type_get_availability` to query the number of available debug counters
+
+## 3.4 Counters DB
+The contents of the drop counters will be added to Counters DB by flex counters.
+
+Additionally, we will add a mapping from debug counter names to the appropriate port or switch stat index, called COUNTERS_DEBUG_NAME_PORT_STAT_MAP and COUNTERS_DEBUG_NAME_SWITCH_STAT_MAP respectively.
+
+## 3.5 SWSS
+A new orchestrator will be created to handle debug counter creation and configuration. 
Specifically, this orchestrator will support: +* Creating a new counter +* Deleting existing counters +* Adding drop reasons to an existing counter +* Removing a drop reason from a counter + +### 3.5.1 SAI APIs +This orchestrator will interact with the following SAI Debug Counter APIs: +* `sai_create_debug_counter_fn` to create/configure new drop counters. +* `sai_remove_debug_counter_fn` to delete/free up drop counters that are no longer being used. +* `sai_get_debug_counter_attribute_fn` to gather information about counters that have been configured (e.g. index, drop reasons, etc.). +* `sai_set_debug_counter_attribute_fn` to re-configure drop reasons for counters that have already been created. + +## 3.6 syncd +Flex counter will be extended to support switch-level SAI counters. + +# 4 Flows +## 4.1 General Flow +![alt text](./drop_counters_general_flow.png) +The overall workflow is shown above in figure 1. + +(1) Users configure drop counters using the CLI. Configurations are stored in the DEBUG_COUNTER Config DB table. + +(2) The debug counts orchagent subscribes to the Config DB table. Once the configuration changes, the orchagent uses the debug SAI API to configure the drop counters. + +(3) The debug counts orchagent publishes counter configurations to Flex Counter DB. + +(4) Syncd subscribes to Flex Counter DB and sets up flex counters. Flex counters periodically query ASIC counters and publishes data to Counters DB. + +(5) CLI uses counters DB to satisfy CLI requests. + +(6) (not shown) CLI uses State DB to display hardware capabilities (e.g. how many counters are available, supported drop reasons, etc.) + +# 5 Warm Reboot Support +On resource-constrained platforms, debug counters can be deleted prior to warm reboot and re-installed when orchagent starts back up. This is intended to conserve hardware resources during the warm reboot. This behavior has not been added to SONiC at this time, but can be if the need arises. + +# 6 Unit Tests +This feature comes with a full set of virtual switch tests in SWSS. 
+
+```
+=============================================================================================== test session starts ===============================================================================================
+platform linux2 -- Python 2.7.15+, pytest-3.3.0, py-1.8.0, pluggy-0.6.0 -- /usr/bin/python2
+cachedir: .cache
+rootdir: /home/daall/dev/sonic-swss/tests, inifile:
+collected 14 items
+
+test_drop_counters.py::TestDropCounters::test_deviceCapabilitiesTablePopulated remove extra link dummy
+PASSED [ 7%]
+test_drop_counters.py::TestDropCounters::test_flexCounterGroupInitialized PASSED [ 14%]
+test_drop_counters.py::TestDropCounters::test_createAndRemoveDropCounterBasic PASSED [ 21%]
+test_drop_counters.py::TestDropCounters::test_createAndRemoveDropCounterReversed PASSED [ 28%]
+test_drop_counters.py::TestDropCounters::test_createCounterWithInvalidCounterType PASSED [ 35%]
+test_drop_counters.py::TestDropCounters::test_createCounterWithInvalidDropReason PASSED [ 42%]
+test_drop_counters.py::TestDropCounters::test_addReasonToInitializedCounter PASSED [ 50%]
+test_drop_counters.py::TestDropCounters::test_removeReasonFromInitializedCounter PASSED [ 57%]
+test_drop_counters.py::TestDropCounters::test_addDropReasonMultipleTimes PASSED [ 64%]
+test_drop_counters.py::TestDropCounters::test_addInvalidDropReason PASSED [ 71%]
+test_drop_counters.py::TestDropCounters::test_removeDropReasonMultipleTimes PASSED [ 78%]
+test_drop_counters.py::TestDropCounters::test_removeNonexistentDropReason PASSED [ 85%]
+test_drop_counters.py::TestDropCounters::test_removeInvalidDropReason PASSED [ 92%]
+test_drop_counters.py::TestDropCounters::test_createAndDeleteMultipleCounters PASSED [100%]
+
+=========================================================================================== 14 passed in 113.65 seconds ===========================================================================================
+```
+
+A separate test plan will be uploaded and reviewed by the community. It will consist of system tests written in pytest that send traffic to the device and verify that the drop counters are updated correctly.
+
+# 7 Platform Support
+In order to make this feature platform independent, we rely on the SAI query APIs (described above) to check which counter types and drop reasons are supported on a given device. As a result, to stay on the safe side, drop counters are only available on platforms that support both the SAI drop counter API and the query APIs.
+
+## 7.1 Known Limitations
+* BRCM SAI:
+  - ACL_ANY, DIP_LINK_LOCAL, SIP_LINK_LOCAL, and L3_EGRESS_LINK_OWN are all based on the same underlying counter in hardware, so enabling any one of these reasons on a drop counter will (implicitly) enable all of them.
+
+# 8 Open Questions
+- How common of an operation is configuring a drop counter? Is this something that will usually only be done on startup, or something people will be updating frequently?
+
+# 9 Acknowledgements
+I'd like to thank the community for all their help designing and reviewing this new feature! Special thanks to Wenda, Ying, Prince, Guohan, Joe, Qi, Renuka, and the team at Microsoft, Madhu and the team at Aviz, Ben, Vissu, Salil, and the team at Broadcom, Itai, Matty, Liat, Marian, and the team at Mellanox, and finally Ravi, Tony, and the team at Innovium. 
+ +# 10 References +[1] [SAI Debug Counter Proposal](https://github.com/itaibaz/SAI/blob/a612dd21257cccca02cfc6dab90745a56d0993be/doc/SAI-Proposal-Debug-Counters.md) diff --git a/doc/drop_counters/drop_counters_general_flow.png b/doc/drop_counters/drop_counters_general_flow.png new file mode 100644 index 0000000000..c662080360 Binary files /dev/null and b/doc/drop_counters/drop_counters_general_flow.png differ diff --git a/doc/error-handling/error_handling_design_spec.md b/doc/error-handling/error_handling_design_spec.md new file mode 100755 index 0000000000..65308e2f4d --- /dev/null +++ b/doc/error-handling/error_handling_design_spec.md @@ -0,0 +1,429 @@ +# Error Handling Framework in SONiC +# High Level Design Document +#### Rev 0.1 + +# Table of Contents + * [Revision](#revision) + * [About this Manual](#about-this-manual) + * [Scope](#scope) + * [Overview](#overview) + * [Definitions/Abbreviation](#definitionsabbreviation) + * [1 Requirements Overview](#1-requirements-overview) + * [1.1 Functional Requirements](#11-functional-requirements) + * [1.2 Configuration and Management Requirements ](#12-Configuration-and-Management-Requirements) + * [1.3 Supported Objects](#13-Supported-Objects) + * [2 Use cases](#2-Use-cases) + * [3 Design](#3-Design) + * [3.1 Error Database](#31-Error-Database) + * [3.2 Error codes](#32-Error-codes) + * [3.3 OrchAgent Changes](#33-OrchAgent-changes) + * [3.3.1 Event processing](#331-Event-Processing) + * [3.3.2 Application registration](#332-Application-Registration) + * [3.3.3 Clearing ERROR_DB](#333-Clearing-ERROR-DB) + * [3.4 DB changes](#34-DB-Changes) + * [3.4.1 Config DB](#341-Config-DB) + * [3.4.2 App DB](#342-App-DB) + * [3.4.3 Error DB](#343-Error-DB) + * [3.4.3.1 Error Tables](#3431-Error-Tables) + * [3.4.3.2 Error DB Schemas](#3432-Error-DB-Schemas) + * [3.5 CLI](#35-CLI) + * [3.5.1 Data Models](#351-Data-Models) + * [3.5.2 Config commands](#352-Config-commands) + * [3.5.3 Show commands](#353-Show-commands) + * [3.5.4 Clear commands](#354-Clear-commands) + * [4 Flow Diagrams](#4-Flows-diagrams) + * [4.1 Route Add Failure](#41-Route-Add-Failure) + * [5 Serviceability and Debug](#5-Serviceability-and-Debug) + * [6 Warm Boot Support](#6-Warm-Boot-Support) + * [7 Scalability](#7-Scalability) + * [8 Unit Tests](#8-Unit-Tests) + * [9 Unsupported features](#9-Unsupported-features) + +# Revision + +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:-------------------------|:----------------------| +| 0.1 | 05/6/2019 | Siva Mukka, Santosh Doke | Initial version | + + +# About this Manual +This document describes the design details of error handling framework in SONiC. The framework is responsible for notifying ASIC/SAI programming failures back to applications. It can also be extended to notify other failures in the system. + +# Scope +This document describes the high level design details of error handling framework in SONiC. + +# Overview +SONIC currently does not have a mechanism to propagate ASIC/SAI programming errors back to application. If the SAI CREATE/SET method fails: + +- Syncd treats all such failures as fatal irrespective of reason code. +- Syncd sends a switch shutdown request to Orchestration Agent. +- Syncd waits indefinitely to restart, switch is no longer manageable/operational. + +To address this, a generic framework is introduced to notify errors, instead of sending shutdown request to OrchAgent. 
+ +# Definitions/Abbreviation +###### Table 1: Abbreviations +| Abbreviation | Description | +| ------------ | ------------------------------------------------------------ | +| SAI | Switch Abstraction Interface | +| APP DB | Application Database | +| ASIC DB | ASIC Database | +| ERROR DB | Error Database | +| SWSS | Switch State Service | +| Syncd | Daemon responsible for invoking SAI functions based on ASIC DB updates | + +In this document, the term '**object**' refers to an entry in the table OR a specific attribute in the table. + +# 1 Requirements Overview +## 1.1 Functional Requirements + +The requirements for error handling framework are: + +1.1.1 Provide registration/de-registration mechanism for applications to enable/disable error notifications on a specific table. More than one application can register for notifications on a given table. + +1.1.2 Provide notifications for failed objects and also for the objects that were successfully installed. By default, only failed operations are notified. A configuration parameter to enable notifications for objects that are processed without error. + +1.1.3 Provide support to get all the failed objects at any given instance. + +1.1.4 Provide error notifications for CREATE/DELETE/UPDATE operations on objects. Framework does not report errors for GET operation. + +1.1.5 Provide all the required key fields as part of notifications so that applications can map the error to an object. For example, framework provides subnet/length/nexthop fields as part of failed route notification along with operation and actual error code. + +1.1.6 Provide error codes to application when reporting failures. Framework defines various error codes and maps them to underlying SAI/ASIC errors. For example, TABLE_FULL, NOT_FOUND and OUT_OF_MEMORY. + +1.1.7 In some cases, a given object may fail multiple times. The framework reports each failure as a separate notification, but retains the last-known error on that object. For example, a route modify operation may fail multiple times with different set of next hops, in which case multiple failure notifications are reported for that route entry. + +1.1.8 Provide configuration commands to clear the failed objects. + +1.1.9 Framework is only responsible for reporting errors. Any retry/rollback operation is the responsibility of applications. + +## 1.2 Configuration and Management Requirements +Provide CLI commands to display and clear ERROR_DB tables. + +## 1.3 Supported objects +Error handling framework supports notifications for the following tables defined in APP_DB: + +- ROUTE_TABLE +- NEIGH_TABLE + +The framework can be extended for other tables depending on application requirements. Currently, only ROUTE/NEIGH tables are supported to address the BGP use case. + +# 2 Use cases + +BGP application relies on RIB failure status to withdraw or advertise the routes. The error handling framework notifies the BGP application about route programming failures. This helps BGP application to withdraw the failed routes that are advertised to the neighbors. + +Following diagram describes a high level overview of the BGP use case: + +![BGP use case](images/bgp_use_case.png) + +# 3 Design + +## 3.1 Error Database + +As SONIC architecture relies on the use of centralized Redis-database as means of multi-process communication among all subsystems, the framework re-uses the same mechanism to notify errors back to applications. 
+
+A new database, ERROR_DB, is introduced to store the details of failed entries/objects corresponding to various tables. The ERROR_DB tables are defined in an application-friendly format. Applications can register as consumers of an ERROR_DB table to receive error notifications, whereas OrchAgent is registered as the producer of the ERROR_DB tables. If the SAI CREATE/SET method fails, Syncd informs OrchAgent using the notification channel of ASIC_DB. OrchAgent is responsible for translating the ASIC_DB notification and storing it in ERROR_DB format. It is also responsible for mapping the SAI-specific error codes to SWSS error codes.
+
+For some objects, a notification may be needed even when the object is successfully programmed (SAI API reports SUCCESS). In such scenarios, the framework does not store the object in ERROR_DB (to reduce memory usage), but does notify the registered applications using the ERROR_DB notification channel.
+
+Defining a separate error database has the following advantages over extending existing APP_DB tables for error notifications:
+
+- Applications do not register as both consumer and producer of the same table, which avoids redundant notifications.
+- Flexible, not tied to any other table format.
+- Extensible to all types of errors in the system, not restricted to APP_DB definitions.
+- Efficient, as notifications are limited to failures in the DB.
+- Notification for delete failures can be supported even when the corresponding objects are deleted from APP_DB.
+
+## 3.2 Error codes
+As per the SONiC architecture, applications do not invoke SAI APIs directly and are not aware of the underlying SAI error codes. A new set of generic error codes is defined for applications, which is internally mapped to the underlying SAI error codes.
+
+As the SWSS submodule already provides a common library accessible to all SONiC applications, the new common error codes are defined as part of this library (applications need not include any new header files). The following table shows the mapping of SWSS error codes to SAI error codes.
+
+| SWSS error code | SAI error code |
+|-----------------------|--------------------------------|
+| SWSS_RC_SUCCESS | SAI_STATUS_SUCCESS |
+| SWSS_RC_INVALID_PARAM | SAI_STATUS_INVALID_PARAMETER |
+| SWSS_RC_UNAVAIL | SAI_STATUS_NOT_SUPPORTED |
+| SWSS_RC_NOT_FOUND | SAI_STATUS_ITEM_NOT_FOUND |
+| SWSS_RC_NO_MEMORY | SAI_STATUS_NO_MEMORY |
+| SWSS_RC_EXISTS | SAI_STATUS_ITEM_ALREADY_EXISTS |
+| SWSS_RC_FULL | SAI_STATUS_TABLE_FULL |
+| SWSS_RC_IN_USE | SAI_STATUS_OBJECT_IN_USE |
+
+## 3.3 OrchAgent changes
+
+OrchAgent is the only component responsible for adding/modifying/deleting objects in the ERROR_DB database. Since the error handling framework is part of the orchestration agent, error notifications are reported only when the orchestration agent is running.
+
+OrchAgent defines the following new classes to handle error reporting/listening:
+
+1. **Error reporter class** - defines functions to write to ERROR_DB and publish the notifications.
+2. **Error listener class** - defines functions to register/de-register for notifications on a specific table. Applications need to specify the table name, the operation (CREATE/DELETE/UPDATE), the notification type (success/failure/both) and a callback function when instantiating the error listener class. Applications can instantiate multiple listeners to receive notifications from different tables. 
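+
+To make the mapping above concrete, the following is a minimal, illustrative sketch (not the actual OrchAgent code) of how OrchAgent could translate a SAI status code into the generic SWSS error code before writing an entry to ERROR_DB. The `SwssRc` enum, the `saiToSwssRc()` helper and the `SWSS_RC_UNKNOWN` fallback are hypothetical names introduced only for this example; only the `sai_status_t` type and the SAI_STATUS_* constants come from the SAI headers.
+
+```
+// Illustrative sketch only -- assumes the SAI development headers are available.
+extern "C" {
+#include <sai.h>    // provides sai_status_t and the SAI_STATUS_* codes
+}
+
+#include <map>
+
+// Generic SWSS error codes proposed in this document. SWSS_RC_UNKNOWN is an
+// extra catch-all added for this example only.
+enum SwssRc
+{
+    SWSS_RC_SUCCESS,
+    SWSS_RC_INVALID_PARAM,
+    SWSS_RC_UNAVAIL,
+    SWSS_RC_NOT_FOUND,
+    SWSS_RC_NO_MEMORY,
+    SWSS_RC_EXISTS,
+    SWSS_RC_FULL,
+    SWSS_RC_IN_USE,
+    SWSS_RC_UNKNOWN
+};
+
+// Translate a SAI status code into the SWSS error code that would be stored
+// in ERROR_DB and reported to the registered applications.
+static SwssRc saiToSwssRc(sai_status_t status)
+{
+    static const std::map<sai_status_t, SwssRc> rcMap = {
+        { SAI_STATUS_SUCCESS,             SWSS_RC_SUCCESS       },
+        { SAI_STATUS_INVALID_PARAMETER,   SWSS_RC_INVALID_PARAM },
+        { SAI_STATUS_NOT_SUPPORTED,       SWSS_RC_UNAVAIL       },
+        { SAI_STATUS_ITEM_NOT_FOUND,      SWSS_RC_NOT_FOUND     },
+        { SAI_STATUS_NO_MEMORY,           SWSS_RC_NO_MEMORY     },
+        { SAI_STATUS_ITEM_ALREADY_EXISTS, SWSS_RC_EXISTS        },
+        { SAI_STATUS_TABLE_FULL,          SWSS_RC_FULL          },
+        { SAI_STATUS_OBJECT_IN_USE,       SWSS_RC_IN_USE        },
+    };
+
+    auto it = rcMap.find(status);
+    return (it != rcMap.end()) ? it->second : SWSS_RC_UNKNOWN;
+}
+```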
+ +### 3.3.1 Event processing + +Below diagram shows how framework reports notifications to the registered listeners: + +![Framework high level design](images/error_handling_overview.png) + +The following is the sequence of events involved in reporting a failure notification from Syncd process to application: + +1. Syncd reports errors using a single notification channel to OrchAgent. Using a single notification channel ensures that order of the notifications is retained. +2. After receiving the notification from Syncd, OrchAgent: + - Translates it from SAI data types to ERROR_DB data types + - Adds an entry in to error database. If the entry already exists, the corresponding failure code is updated. + - Publishes the notifications to respective error listeners. +3. Error listener waits for the incoming notifications, filters them and invokes the application callback. + + + +The following is the sequence of events involved in reporting successful programming of an entry from Syncd process to application: + +1. Syncd reports the successful programming of an entry to OrchAgent using a notification channel. + +2. After receiving the notification from Syncd, OrchAgent: + + - Translates it from SAI data types to ERROR_DB data types. + + - Removes the entry from database. if present. + - Publishes the notifications to respective error listeners. + +3. Error listener waits for the incoming notifications, filters them and invokes the application callback. + + + +The following table describes how framework handles some of the notifications: + +| Previous Notification | Current Notification | Framework Action | +| --------------------- | -------------------- | ------------------------------------------------------------ | +| Create failure | Update failure | Update the entry in the database and notify the registered applications | +| Create failure | Delete failure | Remove the entry from database and notify the registered applications | +| Create failure | Update success | Remove the entry from the database and notify the registered applications | +| Create success | Delete failure | Add the entry to the database and notify the registered applications | +| Delete failure | Create success | Remove the entry from the database and notify the registered applications | + +### 3.3.2 Application registration +``` +ErrorListener fpmErrorListener(APP_ROUTE_TABLE_NAME, (ERR_NOTIFY_FAIL | ERR_NOTIFY_POSITIVE_ACK)); + +Select s; +s.addSelectable(&fpmErrorListener); +``` + +### 3.3.3 Clearing ERROR_DB +ERROR_DB contents can be cleared using CLI command. The clear command can be invoked for all objects in a table or all tables. OrchAgent gets notified about clear operation via the notification channel and it deletes the corresponding objects from ERROR_DB without notifying the registered applications. + +## 3.4 DB Changes +### 3.4.1 Config DB +None. + +### 3.4.2 App DB +None. + +### 3.4.3 Error DB + +#### 3.4.3.1 ERROR Tables + +``` +ERROR_ROUTE_TABLE|prefix + "opcode": {{method}} + "nexthop": {{list_of_nexthops}} + "intf": ifindex ? 
PORT_TABLE.key + "status": {{return_code}} +``` + +``` +ERROR_NEIGH_TABLE|INTF_TABLE.name/ VLAN_INTF_TABLE.name / LAG_INTF_TABLE.name|prefix + "opcode": {{method}} + "neigh": {{mac_address}} + "family": {{ip_address_family}} + "status": {{return_code}} +``` + +### 3.4.3.2 ERROR_DB Schemas + +``` +;Defines schema for ERROR_ROUTE_TABLE that stores error status while programming the routes +;Status: Mandatory +key = ERROR_ROUTE_TABLE|prefix +; field = value +operation = opcode ; method CREATE/SET/DELETE +nexthop = *prefix, ; IP addresses separated by “,” +intf = ifindex, ; zero or more separated by "," +rc = SWSS Code ; status code +``` + +``` +;Defines schema for ERROR_NEIGH_TABLE that stores error status while programming the neighbor table entries +;Status: Mandatory +key = ERROR_NEIGH_TABLE|INTF_TABLE.name / VLAN_INTF_TABLE.name / LAG_INTF_TABLE.name|prefix +operation = opcode ; method CREATE/SET/DELETE +neigh = 12HEXDIG ; mac address of the neighbor +family = "IPv4" / "IPv6" ; address family +rc = SWSS code ; status code for each neighbour +``` + +Please refer to the [schema](https://github.com/Azure/sonic-swss/blob/master/doc/swss-schema.md) document for details on value annotations. + + +## 3.5 CLI + +### 3.5.1 Data Models + +Commands summary : + + - show error-database [TableName] + - sonic-clear error-database [TableName] + +### 3.5.2 Config commands +None + +### 3.5.3 Show commands +show command to display the error database entries + +``` +show error-database [TableName] + +Usage: show [OPTIONS] COMMAND [ARGS]... + + SONiC command line - 'show' command + +Options: + + -?, -h, --help Show this message and exit. + +Commands: + + error-database Show error database entries + +``` + +Example: +``` +Router#show error-database route +Route Nexthop Operation Failure +-------------- --------------------- --------- -------------- +2.2.2.0/24 10.10.10.2 Create TABLE FULL +192.168.10.12/24 12.12.10.2,11,11,11,2 Update PARAM +... +``` + +### 3.5.4 Clear commands +Clear command to clear error database entries +``` +sonic-clear error-database [TableName] + +Usage: sonic-clear [OPTIONS] COMMAND [ARGS]... + +Options: + + -?, -h, --help Show this message and exit. + +Commands: + + error-database Clear error database entries + +``` + + +# 4 Flows Diagrams + +## 4.1 Route add failure +The following flow diagram depicts the flow of route failure notification. + +![Route add failure](images/route_add_failure_flow.png) + + + +Syncd sends the following data to the framework through a notification channel as part of the error notification. + + +``` +key:"SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"20.20.20.0/24","vr":"oid:0x300000000004a"}" + +1) "opcode" +2) "CREATE" +3) "SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID" +4) "oid:0x1000000000001" +5) "rc" +6) "SAI_STATUS_TABLE_FULL" +``` + +The framework records the following entry into ERROR_ROUTE_TABLE, which gets notified to the registered applications. + +``` +"ERROR_ROUTE_TABLE:20.20.20.0/24" +1) "opcode" +2) "CREATE" +3) "nexthop" +4) "10.10.10.2" +5) "intf" +6) "Vlan10" +7) "rc" +8) "SWSS_RC_TABLE_FULL" +``` + +# 5 Serviceability and Debug + +The logging utility `swssloglevel` is used to set the log level for various operations involved in error handling. 
+
+- When an application registers/de-registers for notifications on ERROR_DB
+- When the framework notifies errors to applications
+- When the framework receives error notifications from Syncd
+- When the framework adds an entry to ERROR_DB
+- When the framework deletes an entry from ERROR_DB
+- When the framework receives the 'clear' command
+
+
+
+# 6 Warm Boot Support
+The error database is not persistent across warm reboots.
+
+# 7 Scalability
+
+None.
+
+# 8 Unit Tests
+
+The unit test plan for the error handling framework is documented below:
+
+| Error Handling | S.No | Test description |
+| -------------- | ---- | ------------------------------------------------------------ |
+| Route Table | 1.1 | Verify application can register successfully for ROUTE table notifications. Generate failure event and verify application is notified. |
+| | 1.2 | Verify application can de-register successfully for ROUTE table notifications. Generate failure event and verify application is NO LONGER notified. |
+| | 1.3 | Verify multiple applications registering for ROUTE table notifications. Generate failure event and verify all registered applications are notified. |
+| | 1.4 | Verify multiple applications de-register for ROUTE table notifications. Generate failure event and verify only de-registered application is NO LONGER notified. Other registered applications continue to get notified. |
+| | 1.5 | Verify that the notification for IPv4/IPv6 ROUTE entry contains all the required parameters as defined by the schema - Prefix/Nexthops/Opcode/Failure code. |
+| | 1.6 | Verify error is notified in case of IPv4/IPv6 ROUTE add failure due to TABLE full condition. Verify entry exists in ERROR_DB for failed route with Opcode=Add and Error=Table Full. |
+| | 1.7 | Verify error is notified in case of IPv4/IPv6 ROUTE add failure due to ENTRY_EXISTS condition. Verify entry exists in ERROR_DB for failed route with Opcode=Add and Error=Entry Exists. |
+| | 1.8 | Verify application is notified even when an IPv4/IPv6 ROUTE is successfully programmed (NO_ERROR). Verify that there is NO entry for the route in ERROR_DB. |
+| | 1.9 | Verify error is notified in case of IPv4/IPv6 ROUTE deletion failure due to NOT_FOUND. Verify that there is NO entry for the failed route in ERROR_DB. |
+| | 1.10 | Verify that the failed IPv4/IPv6 ROUTE entry in ERROR_DB is cleared, when application deletes that entry. Verify other failed entries in ERROR_DB are retained. |
+| | 1.11 | Verify multiple add failures on the ECMP IPv4/IPv6 ROUTE entry. Verify each failure is notified individually along with set of nexthops. Verify ERROR_DB entry reflects the last known failure status. |
+| Neighbor Table | 2.1 | Verify application can register successfully for Neighbor table notifications. Generate failure event and verify application is notified. |
+| | 2.2 | Verify application can de-register successfully for Neighbor table notifications. Generate failure event and verify application is NO LONGER notified. |
+| | 2.3 | Verify multiple applications registering for Neighbor table notifications. Generate failure event and verify all registered applications are notified. |
+| | 2.4 | Verify multiple applications de-register for Neighbor table notifications. Generate failure event and verify only de-registered application is NO LONGER notified. Other registered applications continue to get notified. |
+| | 2.5 | Verify that the notification for IPv4/IPv6 Neighbor entry contains all the required parameters as defined by the schema - Ifname/Prefix/Opcode/Failure code. |
+| | 2.6 | Verify error is notified in case of IPv4/IPv6 NEIGHBOR add failure due to TABLE full condition. Verify entry exists in ERROR_DB for failed neighbor with Opcode=Add and Error=Table Full. |
+| | 2.7 | Verify error is notified in case of IPv4/IPv6 NEIGHBOR add failure due to ENTRY_EXISTS condition. Verify entry exists in ERROR_DB for failed neighbor with Opcode=Add and Error=Entry Exists. |
+| | 2.8 | Verify application is notified even when an IPv4/IPv6 NEIGHBOR is successfully programmed (NO_ERROR). Verify that there is NO entry for the neighbor in ERROR_DB in this case. |
+| | 2.9 | Verify error is notified in case of IPv4/IPv6 NEIGHBOR deletion failure due to NOT_FOUND. Verify that there is NO entry in ERROR_DB in this case. |
+| | 2.10 | Verify that the failed IPv4/IPv6 NEIGHBOR entry in ERROR_DB is cleared, when application deletes that entry. Verify other failed entries in ERROR_DB are retained. |
+| CLI | 3.1 | Verify that the 'show' command displays all the failed entries in the ERROR_DB. Verify that the output matches with redis-cli query command. |
+| | 3.2 | Verify that the 'clear' command removes all the failed entries in the ERROR_DB. Verify using redis-cli query command that there are no entries in ERROR_DB. |
+
+
+# 9 Unsupported features
+
+- Add transaction ID support to track the notifications back to the application. This helps applications to correlate when multiple operations fail on the same object.
+
+  Typically, the following steps are required to handle a request that includes a transaction ID:
+
+  - Application adds transaction ID (Tid) to the APP_DB entry
+  - OrchAgent passes Tid to ASIC_DB
+  - Syncd includes Tid in the notification after invoking the SAI API
+  - OrchAgent passes the Tid to the applications along with the other error details.
+
+  If Tid is included in the application request, the framework includes the same in the notification. Otherwise, the notification is still sent, but without Tid. This enables migration of applications to start using Tid on a need basis.
+
+- Extend error handling to other tables in the system (VLAN/LAG/Mirror/FDB etc). 
diff --git a/doc/error-handling/images/bgp_use_case.png b/doc/error-handling/images/bgp_use_case.png new file mode 100755 index 0000000000..6cb81b850b Binary files /dev/null and b/doc/error-handling/images/bgp_use_case.png differ diff --git a/doc/error-handling/images/error_handling_overview.png b/doc/error-handling/images/error_handling_overview.png new file mode 100755 index 0000000000..9bd72264b5 Binary files /dev/null and b/doc/error-handling/images/error_handling_overview.png differ diff --git a/doc/error-handling/images/route_add_failure_flow.png b/doc/error-handling/images/route_add_failure_flow.png new file mode 100755 index 0000000000..7881ac0e1a Binary files /dev/null and b/doc/error-handling/images/route_add_failure_flow.png differ diff --git a/doc/everflow/SONiC Everflow Always-on HLD.pdf b/doc/everflow/SONiC Everflow Always-on HLD.pdf new file mode 100644 index 0000000000..9b7035bbd9 Binary files /dev/null and b/doc/everflow/SONiC Everflow Always-on HLD.pdf differ diff --git a/doc/SONiC_EVERFLOW_IPv6.pdf b/doc/everflow/SONiC_EVERFLOW_IPv6.pdf similarity index 100% rename from doc/SONiC_EVERFLOW_IPv6.pdf rename to doc/everflow/SONiC_EVERFLOW_IPv6.pdf diff --git a/doc/fastreboot.pdf b/doc/fast-reboot/fastreboot.pdf similarity index 100% rename from doc/fastreboot.pdf rename to doc/fast-reboot/fastreboot.pdf diff --git a/doc/fwutil/fwutil.md b/doc/fwutil/fwutil.md new file mode 100755 index 0000000000..9f524880f3 --- /dev/null +++ b/doc/fwutil/fwutil.md @@ -0,0 +1,462 @@ +# SONiC FW utility + +## High Level Design document + +## Table of contents +- [About this manual](#about-this-manual) +- [Revision](#revision) +- [Abbreviations](#abbreviations) +- [1 Introduction](#1-introduction) + - [1.1 Feature overview](#11-feature-overview) + - [1.2 Requirements](#12-requirements) + - [1.2.1 Functionality](#121-functionality) + - [1.2.2 Command interface](#122-command-interface) + - [1.2.3 Error handling](#123-error-handling) + - [1.2.4 Event logging](#124-event-logging) +- [2 Design](#2-design) + - [2.1 Overview](#21-overview) + - [2.2 FW utility](#22-fw-utility) + - [2.2.1 Command structure](#221-command-structure) + - [2.2.2 Command interface](#222-command-interface) + - [2.2.2.1 Show commands](#2221-show-commands) + - [2.2.2.1.1 Overview](#22211-overview) + - [2.2.2.1.2 Description](#22212-description) + - [2.2.2.2 Install commands](#2222-install-commands) + - [2.2.2.2.1 Overview](#22221-overview) + - [2.2.2.2.2 Description](#22222-description) + - [2.2.2.3 Update commands](#2223-update-commands) + - [2.2.2.3.1 Overview](#22231-overview) + - [2.2.2.3.2 Description](#22232-description) +- [3 Flows](#3-flows) + - [3.1 Show components status](#31-show-components-status) + - [3.2 Install component FW](#32-install-component-fw) + - [3.2.1 Non modular chassis platform](#321-non-modular-chassis-platform) + - [3.2.2 Modular chassis platform](#322-modular-chassis-platform) +- [4 Tests](#4-tests) + - [4.1 Unit tests](#41-unit-tests) + +## About this manual + +This document provides general information about FW utility implementation in SONiC. 
+ +## Revision + +| Rev | Date | Author | Description | +|:---:|:----------:|:--------------:|:----------------------------------| +| 0.1 | 21/08/2019 | Nazarii Hnydyn | Initial version | +| 0.2 | 10/09/2019 | Nazarii Hnydyn | Review feedback and other changes | +| 0.3 | 17/09/2019 | Nazarii Hnydyn | Align flows with the platform API | +| 0.4 | 18/12/2019 | Nazarii Hnydyn | CLI review feedback | + +## Abbreviations + +| Term | Meaning | +|:-------|:----------------------------------------------------| +| FW | Firmware | +| SONiC | Software for Open Networking in the Cloud | +| PSU | Power Supply Unit | +| QSFP | Quad Small Form-factor Pluggable | +| EEPROM | Electrically Erasable Programmable Read-Only Memory | +| I2C | Inter-Integrated Circuit | +| SPI | Serial Peripheral Interface | +| JTAG | Joint Test Action Group | +| BIOS | Basic Input/Output System | +| CPLD | Complex Programmable Logic Device | +| FPGA | Field-Programmable Gate Array | +| URL | Uniform Resource Locator | +| API | Application Programming Interface | +| N/A | Not Applicable/Not Available | + +## List of figures + +[Figure 1: FW utility High Level Design](#figure-1-fw-utility-high-level-design) +[Figure 2: Show components status flow](#figure-2-show-components-status-flow) +[Figure 3: FW install (non modular) flow](#figure-3-fw-install-non-modular-flow) +[Figure 4: FW install (modular) flow](#figure-4-fw-install-modular-flow) + +## List of tables + +[Table 1: Event logging](#table-1-event-logging) + +# 1 Introduction + +## 1.1 Feature overview + +A modern network switch is a sophisticated equipment which consists of many auxiliary components +which are responsible for managing different subsystems (e.g., PSU/FAN/QSFP/EEPROM/THERMAL) +and providing necessary interfaces (e.g., I2C/SPI/JTAG). + +Basically these components are complex programmable logic devices with it's own HW architecture +and software. The most important are BIOS/CPLD/FPGA etc. + +It is very important to always have the latest recommended software version to improve device stability, +security and performance. Also, software updates can add new features and remove outdated ones. + +In order to make software update as simple as possible and to provide a nice user frindly +interface for various maintenance operations (e.g., install a new FW or query current version) +we might need a dedicated FW utility. + +## 1.2 Requirements + +### 1.2.1 Functionality + +**This feature will support the following functionality:** +1. Manual FW installation for particular platform component +2. Automatic FW installation for all available platform components +3. Querying platform components and FW versions + +### 1.2.2 Command interface + +**This feature will support the following commands:** +1. show: display FW versions +2. install: manual FW installation +3. update: automatic FW installation + +### 1.2.3 Error handling + +**This feature will provide error handling for the next situations:** +1. Invalid input +2. Incompatible options/parameters +3. Invalid/nonexistent FW URL/path + +**Note:** FW binary validation (checksum, format, etc.) should be done by SONiC platform API + +### 1.2.4 Event logging + +**This feature will provide event logging for the next situations:** +1. FW binary downloading over URL: start/end +2. FW binary downloading over URL: error +3. FW binary installation: start/end +4. 
FW binary installation: error + +###### Table 1: Event logging + +| Event | Severity | +|:------------------------------------------|:---------| +| FW binary downloading over URL: start/end | NOTICE | +| FW binary downloading over URL: error | ERROR | +| FW binary installation: start/end | INFO | +| FW binary installation: error | ERROR | + +**Note:** Some extra information also will be logged: +1. Component location (e.g., Chassis1/Module1/BIOS) +2. Operation result (e.g., success/failure) + +# 2 Design + +## 2.1 Overview + +![FW utility High Level Design](images/fwutil_hld.svg "Figure 1: FW utility High Level Design") + +###### Figure 1: FW utility High Level Design + +In order to improve scalability and performance a modern network switches provide different architecture solutions: +1. Non modular chassis platforms +2. Modular chassis platforms + +Non modular chassis platforms may contain only one chassis. +A chassis may contain it's own set of components. + +Modular chassis platforms may contain only one chassis. +A chassis may contain one or more modules and it's own set of components. +Each module may contain it's own set of components. + +Basically each chassis/module may contain one or more components (e.g., BIOS/CPLD/FPGA). + +SONiC platform API provides an interface for FW maintenance operations for both modular and +non modular chassis platforms. Both modular and non modular chassis platforms share the same platform API, +but may have different implementation. + +SONiC FW utility uses platform API to interact with the various platform components. + +## 2.2 FW utility + +### 2.2.1 Command structure + +**User interface**: +``` +fwutil +|--- show +| |--- status +| |--- version +| +|--- install +| |--- chassis +| | |--- component +| | |--- fw -y|--yes +| | +| |--- module +| |--- component +| |--- fw -y|--yes +| +|--- update -y|--yes -f|--force -i|--image= +``` + +**Note:** can be absolute path or URL + +### 2.2.2 Command interface + +#### 2.2.2.1 Show commands + +##### 2.2.2.1.1 Overview + +The purpose of the show commands group is to provide an interface for: +1. FW utility related information query (version, etc.) +2. Platform components related information query (fw, etc.) + +##### 2.2.2.1.2 Description + +**The following command displays FW utility version:** +```bash +root@sonic:~# fwutil show version +fwutil version 1.0.0.0 +``` + +**The following command displays platform components and FW versions:** +1. Non modular chassis platform +```bash +root@sonic:~# fwutil show status +Chassis Module Component Version Description +-------- ------- ---------- ------------------ ------------ +Chassis1 N/A BIOS 0ACLH003_02.02.007 Chassis BIOS + CPLD 5 Chassis CPLD + FPGA 5 Chassis FPGA +``` + +2. Modular chassis platform +```bash +root@sonic:~# fwutil show status +Chassis Module Component Version Description +-------- ------- ---------- ------------------ ------------ +Chassis1 BIOS 0ACLH004_02.02.007 Chassis BIOS + CPLD 5 Chassis CPLD + FPGA 5 Chassis FPGA + Module1 CPLD 5 Module CPLD + FPGA 5 Module FPGA +``` + +#### 2.2.2.2 Install commands + +##### 2.2.2.2.1 Overview + +The purpose of the install commands group is to provide an interface +for manual FW update of various platform components. + +##### 2.2.2.2.2 Description + +**The following command installs FW:** +1. Non modular chassis platform +```bash +root@sonic:~# fwutil install chassis component BIOS fw --yes /bios.bin +... +FW update in progress ... +... +Warning: Cold reboot is required! 
+root@sonic:~# fwutil install chassis component CPLD fw --yes /cpld.bin +... +FW update in progress ... +... +Warning: Power cycle is required! +root@sonic:~# fwutil install chassis component FPGA fw --yes /fpga.bin +... +FW update in progress ... +... +Warning: Power cycle is required! +``` + +2. Modular chassis platform +```bash +root@sonic:~# fwutil install chassis component BIOS fw /bios.bin +New FW will be installed, continue? [y/N]: N +Aborted! +root@sonic:~# fwutil install chassis component CPLD fw /cpld.bin +New FW will be installed, continue? [y/N]: N +Aborted! +root@sonic:~# fwutil install chassis component FPGA fw /fpga.bin +New FW will be installed, continue? [y/N]: N +Aborted! +root@sonic:~# fwutil install module Module1 component CPLD fw /cpld.bin +New FW will be installed, continue? [y/N]: N +Aborted! +root@sonic:~# fwutil install module Module1 component FPGA fw /fpga.bin +New FW will be installed, continue? [y/N]: N +Aborted! +``` + +**Supported options:** +1. -y|--yes - automatic yes to prompts. Assume "yes" as answer to all prompts and run non-interactively + +#### 2.2.2.3 Update commands + +##### 2.2.2.3.1 Overview + +The purpose of the update commands group is to provide an interface +for automatic FW update of all available platform components. + +Automatic FW update requires platform_components.json to be created and placed at: +_sonic-buildimage/device///platform_components.json_ + +**Example:** +1. Non modular chassis platform +```json +{ + "chassis": { + "Chassis1": { + "component": { + "BIOS": { + "firmware": "/etc//fw//chassis1/bios.bin", + "version": "0ACLH003_02.02.010", + "info": "Cold reboot is required" + }, + "CPLD": { + "firmware": "/etc//fw//chassis1/cpld.bin", + "version": "10", + "info": "Power cycle is required" + }, + "FPGA": { + "firmware": "/etc//fw//chassis1/fpga.bin", + "version": "5", + "info": "Power cycle is required" + } + } + } + } +} +``` + +2. Modular chassis platform +```json +{ + "chassis": { + "Chassis1": { + "component": { + "BIOS": { + "firmware": "/etc//fw//chassis1/bios.bin", + "version": "0ACLH003_02.02.010", + "info": "Cold reboot is required" + }, + "CPLD": { + "firmware": "/etc//fw//chassis1/cpld.bin", + "version": "10", + "info": "Power cycle is required" + }, + "FPGA": { + "firmware": "/etc//fw//chassis1/fpga.bin", + "version": "5", + "info": "Power cycle is required" + } + } + } + }, + "module": { + "Module1": { + "component": { + "CPLD": { + "firmware": "/etc//fw//module1/cpld.bin", + "version": "10", + "info": "Power cycle is required" + }, + "FPGA": { + "firmware": "/etc//fw//module1/fpga.bin", + "version": "5", + "info": "Power cycle is required" + } + } + } + } +} +``` + +**Note:** FW update will be skipped if component definition is not provided (e.g., 'BIOS': { }) + +##### 2.2.2.3.2 Description + +**The following command updates FW of all available platform components:** +1. Non modular chassis platform +```bash +root@sonic:~# fwutil update --image=next +Chassis Module Component Firmware Version Status Info +-------- ------- ---------- --------------------- --------------------------------------- ------------------ ----------------------- +Chassis1 N/A BIOS /bios.bin 0ACLH004_02.02.007 / 0ACLH004_02.02.010 update is required Cold reboot is required + CPLD /cpld.bin 5 / 10 update is required Power cycle is required + FPGA /fpga.bin 5 / 5 up-to-date Power cycle is required +New FW will be installed, continue? [y/N]: y + +... +FW update in progress ... +... 
+ +Summary: + +Chassis Module Component Status +-------- ------- ---------- ---------- +Chassis1 N/A BIOS success + CPLD failure + FPGA up-to-date +``` + +2. Modular chassis platform +```bash +root@sonic:~# fwutil update --image=next +Chassis Module Component Firmware Version Status Info +-------- ------- ---------- --------------------- --------------------------------------- ------------------ ----------------------- +Chassis1 BIOS /bios.bin 0ACLH004_02.02.007 / 0ACLH004_02.02.010 update is required Cold reboot is required + CPLD /cpld.bin 5 / 10 update is required Power cycle is required + FPGA /fpga.bin 5 / 5 up-to-date Power cycle is required + Module1 CPLD /cpld.bin 5 / 10 update is required Power cycle is required + FPGA /fpga.bin 5 / 5 up-to-date Power cycle is required +New FW will be installed, continue? [y/N]: y + +... +FW update in progress ... +... + +Summary: + +Chassis Module Component Status +-------- ------- ---------- ---------- +Chassis1 BIOS success + CPLD success + FPGA up-to-date + Module1 CPLD failure + FPGA up-to-date +``` + +**Supported options:** +1. -y|--yes - automatic yes to prompts. Assume "yes" as answer to all prompts and run non-interactively +2. -f|--force - install FW regardless the current version +3. -i|--image - update FW using current/next SONiC image + +**Note:** the default option is _--image=current_ + +# 3 Flows + +## 3.1 Show components status + +![Show components status flow](images/show_status_flow.svg "Figure 2: Show components status flow") + +###### Figure 2: Show components status flow + +## 3.2 Install component FW + +### 3.2.1 Non modular chassis platform + +![FW install (non modular) flow](images/install_non_modular_flow.svg "Figure 3: FW install (non modular) flow") + +###### Figure 3: FW install (non modular) flow + +### 3.2.2 Modular chassis platform + +![FW install (modular) flow](images/install_modular_flow.svg "Figure 4: FW install (modular) flow") + +###### Figure 4: FW install (modular) flow + +# 4 Tests + +## 4.1 Unit tests + +1. Show FW utility version +2. Show components status +3. Install new BIOS/CPLD/FPGA FW on non modular chassis +4. Install new BIOS/CPLD/FPGA FW on modular chassis +5. 
Update FW on all available platform components
diff --git a/doc/fwutil/images/fwutil_hld.svg b/doc/fwutil/images/fwutil_hld.svg
new file mode 100755
index 0000000000..7948300504
--- /dev/null
+++ b/doc/fwutil/images/fwutil_hld.svg
[fwutil hld block diagram: FW Utility on top of the platform DEVICE/CHASSIS/MODULE/COMPONENT APIs (platform_base.py, chassis_base.py, module_base.py, component_base.py) down to HW, with methods such as get_chassis, get_all_modules, get_all_components, get_name, get_firmware_version, get_description and install_firmware; SVG text content omitted]
diff --git a/doc/fwutil/images/install_modular_flow.svg b/doc/fwutil/images/install_modular_flow.svg
new file mode 100755
index 0000000000..0ac6d3f780
--- /dev/null
+++ b/doc/fwutil/images/install_modular_flow.svg
[fwutil flows sequence diagram: FW install on a modular chassis platform (FW Utility -> PlatformBase -> ChassisBase -> ModuleBase -> ComponentBase.install_firmware, with handle_error/show_result handling); SVG text content omitted]
diff --git a/doc/fwutil/images/install_non_modular_flow.svg b/doc/fwutil/images/install_non_modular_flow.svg
new file mode 100755
index 0000000000..c0f25a64fc
--- /dev/null
+++ b/doc/fwutil/images/install_non_modular_flow.svg
[fwutil flows sequence diagram: FW install on a non modular chassis platform (FW Utility -> PlatformBase -> ChassisBase -> ComponentBase.install_firmware, with handle_error/show_result handling); SVG text content omitted]
diff --git a/doc/fwutil/images/show_status_flow.svg b/doc/fwutil/images/show_status_flow.svg
new file mode 100755
index 0000000000..3f355dfc37
--- /dev/null
+++ b/doc/fwutil/images/show_status_flow.svg
[fwutil flows sequence diagram: show components status (FW Utility collects get_name/get_firmware_version/get_description results into chassis and module component maps, then calls show_status); SVG text content omitted]
+ + + + + + + + + + + + module_component_map[<chassis_name>][<module_name>][<component_name>] = (<firmware_version>,<description>) + + Message.1088 + <component>.get_firmware_version() + + + + + + + + + + + + <component>.get_firmware_version() + + Return Message.1089 + return <firmware_version> + + + + + + + + + + + + return <firmware_version> + + Activation.1090 + + + + + + + + Message.1091 + <component>.get_description() + + + + + + + + + + + + <component>.get_description() + + Return Message.1092 + return <description> + + + + + + + + + + + + return <description> + + Activation.1093 + + + + + + + + Activation.1094 + + + + + + + + Activation.1095 + + + + + + + + Message.1096 + <component>.get_name() + + + + + + + + + + + + <component>.get_name() + + Return Message.1097 + return <component_name> + + + + + + + + + + + + return <component_name> + + Activation.1098 + + + + + + + + Activation.1099 + + + + + + + + diff --git a/doc/incremental-update-ip-lag/Incremental IP LAG Update.md b/doc/incremental-update-ip-lag/Incremental IP LAG Update.md new file mode 100644 index 0000000000..87342f6905 --- /dev/null +++ b/doc/incremental-update-ip-lag/Incremental IP LAG Update.md @@ -0,0 +1,135 @@ +# SONiC IP/LAG Incremental Update + +# Table of Contents +* [Revision](#revision) +* [About](#about) +* [Requirements Overview](#1-requirements-overview) +* [Database Design](#2-database-design) +* [Daemon Design](#3-daemon-design) +* [Flows](#4-flows) + +##### Revision +| Rev | Date | Author | Change Description | +|:---:|:--------|:------------------:|--------------------| +| 0.1 | 2018-09 | Shuotian Cheng | Initial Version | + +# About +This document provides the general information about basic SONiC incremental configuration support including IP addresses configuration changes, and port channel configuration changes. + +# 1. Requirements Overview +## 1.1 Functional Requirements +#### Phase #0 +- Should be able to boot directly into working state given a working minigraph +1. All IPs are assigned correctly to each router interfaces +2. All port channel interfaces are created with correct members enslaved +3. All VLAN interfaces are created with correct members enslaved +4. All configured ports are set to admin status UP +5. All configured ports are set to desired MTU +#### Phase #1 +- Should not have static front panel interface configurations in `/etc/network/interfaces` file +- Should not have static teamd configurations in `/etc/teamd/` folder. +- Should be able to use command line to execute incremental updates including: +1. Bring up/down all ports/port channels/VLANs +2. Assign/remove IPs towards non-LAG-member/non-VLAN-member front panel ports, and port channels +3. Create/remove port channels +4. Add/remove members of port channels +- Should be able to restart docker swss and the system recovers to the state before the restart + +*Note:* +1. *Conflicting configurations that cannot be directly resolved are **NOT** supported in this phrase, including:* +- *moving a port with IP into a port channel* +- *assign an IP to a port channel member* +- *adding/removing non-existing ports towards port channels, etc.* +2. *Port channel and port channel members' admin status are controlled separately, indicating that a port channel's admin status DOWN will NOT affect its members' admin status to be brought down as well.* +3. *Admin status and MTU are must have attributes for ports and port channels, and the default values are UP and 9100.* +4. 
*MTU will be changed to the port channel's MTU once a port is enslaved into the port channel. However, the value will be automatically reset to its original one after the port is removed from the port channel.* +#### Phase #2 +- Should be able to move loopback interface out of `/etc/network/interfaces` file and managed by `portmgrd`. +- Should be able to restart docker teamd and all port channel configurations are reapplied + +*Note:* +*The reason of moving this request into phase 2 is due to unrelated issues encountered while removing and recreating router interfaces, including IPv6 removal and potential SAI implementation issues.* + +#### Future Work +TBD +## 1.2 Orchagent Requirements +The gap that orchagent daemon needs to fill is mostly related to MTU: +- Should be able to change router interface MTU +- Should be able to change LAG MTU + +## 1.3 \*-syncd Requirements +- `portsyncd`: Should not be listening to netlink message to get port admin status and MTU + +## 1.4 \*-mgrd Requirements +- `portmgrd`: Should be responsible for admin status and MTU configuration changes. Related tables: `PORT` +- `intfsmgrd`: Should be responsible for port/port channel/VLAN IP configuraton changes. Related tables: `PORT_INTERFACE`, `PORTCHANNEL_INTERFACE`, `VLAN_INTERFACE`. +- `teammgrd`: Should be responsible for port channel and port channel member configuration changes. Related tables: `PORTCHANNEL` AND `PORTCHANNEL_MEMBER`. + +## 1.4 Utility Requirements +``` +config interface add ip +config interface remove ip +config interface mtu + +config port_channel add +config port_channel remove +config port_channel member add +config port_channel member remove +``` + +# 2. Database Design +## 2.1 CONF_DB +#### 2.1.1 PORT Table +``` +PORT|{{port_name}} + "admin_status": {{UP|DOWN}} + "mtu": {{mtu_value}} +``` +#### 2.1.2 INTERFACE Table +``` +INTERFACE|{{port_name}}|{{IP}} +``` +#### 2.1.3 PORTCHANNEL Table +``` +PORTCHANNEL|{{port_channel_name}} + "admin_status": {{UP|DOWN}} + "mtu": {{mtu_value}} + "min_links": {{min_links_value}} + "fall_back": {{true|false}} +``` +#### 2.1.4 PORTCHANNEL_INTERFACE Table +``` +PORTCHANNEL_INTERFACE|{{port_channel_name}}|{{IP}} +``` +#### 2.1.5 PORTCHANNEL_MEMBER Table +``` +PORTCHANNEL_MEMBER|{{port_channel_name}}|{{port_name}} +``` + +# 3. Daemon Design +## 3.1 `orchagent` +- When LAG MTU is updated, all LAG members' MTUs are updated. +- When port/LAG MTU is updated, the associated router interface MTU is updated. +## 3.2 `portmgrd` +- Monitor `PORT` table +- Should be responsible for admin status changes and MTU changes +## 3.3 intfsyncd +- Monitor `PORT_INTERFACE`, `PORTCHANNEL_INTERFACE`, `VLAN_INTERFACE` tables +- Should be responsible for IP changes +## 3.4 teamsyncd +- Monitor `PORTCHANNEL` and `PORTCHANNEL_MEMBER` tables +- Should be responsible for port channel changes and member changes + +# 4. 
Flows +## 4.1 Admin Status/MTU Configuration Flow +![Image](https://github.com/stcheng/SONiC/blob/gh-pages/doc/admin_status.png) +## 4.2 Port Channel and Member Configuration Flow +![Image](https://github.com/stcheng/SONiC/blob/gh-pages/doc/port_channel.png) +## 4.3 IP Configuration Flow +![Image](https://github.com/stcheng/SONiC/blob/gh-pages/doc/ip.png) +## 4.4 Device Start Flow +![Image](https://github.com/stcheng/SONiC/blob/gh-pages/doc/device_start.png) +## 4.5 Docker swss Restart +TBD +## 4.6 Docker teamd Restart +TBD diff --git a/doc/incremental-update-ip-lag/admin_status.png b/doc/incremental-update-ip-lag/admin_status.png new file mode 100644 index 0000000000..8185e7770c Binary files /dev/null and b/doc/incremental-update-ip-lag/admin_status.png differ diff --git a/doc/incremental-update-ip-lag/device_start.png b/doc/incremental-update-ip-lag/device_start.png new file mode 100644 index 0000000000..3bbaf219a4 Binary files /dev/null and b/doc/incremental-update-ip-lag/device_start.png differ diff --git a/doc/incremental-update-ip-lag/ip.png b/doc/incremental-update-ip-lag/ip.png new file mode 100644 index 0000000000..6880dc8476 Binary files /dev/null and b/doc/incremental-update-ip-lag/ip.png differ diff --git a/doc/incremental-update-ip-lag/port_channel.png b/doc/incremental-update-ip-lag/port_channel.png new file mode 100644 index 0000000000..81a49c26c0 Binary files /dev/null and b/doc/incremental-update-ip-lag/port_channel.png differ diff --git a/doc/lag/LACP Fallback Feature for SONiC_v0.5.md b/doc/lag/LACP Fallback Feature for SONiC_v0.5.md index b37adefffa..1eab42c948 100644 --- a/doc/lag/LACP Fallback Feature for SONiC_v0.5.md +++ b/doc/lag/LACP Fallback Feature for SONiC_v0.5.md @@ -1,72 +1,119 @@ # Introduction ## Overview -The LACP Fallback Feature allows an active LACP interface to establish a Link Aggregation (LAG) before it receives LACP PDUs from its peer. - -This feature is useful in environments where customers have Preboot Execution Environment (PXE) Servers connected with a LACP Port Channel to the switch. Since PXE images are very small, many operating systems are unable to leverage LACP during the preboot process. The server’s NICs do not have the capability to run LACP without the assistance of a fully functional OS; during the PXE process, they are unaware of the other NIC and don’t have a method to form a LACP connection. Both the NIC’s on the server will be active and are sourcing frames from their respective MAC addresses during the initial boot process. Simply keeping both ports in the LAG active will not solve the problem because packets sourced from the MAC address of NIC-1 can be returned to the port on which NIC-2 is attached, which will cause NIC-2 to drop the packets (due to MAC mismatch). +The LACP Fallback Feature allows an active LACP interface to establish a Link +Aggregation (LAG) before it receives LACP PDUs from its peer. + +This feature is useful in environments where customers have Preboot Execution +Environment (PXE) Servers connected with a LACP Port Channel to the switch. +Since PXE images are very small, many operating systems are unable to leverage +LACP during the preboot process. The server’s NICs do not have the +capability to run LACP without the assistance of a fully functional OS; during +the PXE process, they are unaware of the other NIC and don't have a method to +form a LACP connection. Both the NIC's on the server will be active and are +sourcing frames from their respective MAC addresses during the initial boot +process. 
Simply keeping both ports in the LAG active will not solve the +problem because packets sourced from the MAC address of NIC-1 can be returned +to the port on which NIC-2 is attached, which will cause NIC-2 to drop the +packets (due to MAC mismatch). ![lag.png](https://github.com/Azure/SONiC/blob/gh-pages/images/lacp_fallback_hld/lag.png) -With the LACP fallback feature, the switch allows the server to bring up the LAG (before receiving any LACP PDUs from the server) and keeps a single port active until it receive the LACP PDUs from the server. This allows the PXE boot server to establish a connection over one Ethernet port, download its boot image and then continue the booting process. When the server boot process is complete, the server fully forms an LACP port-channel. +With the LACP fallback feature, the switch allows the server to bring up the +LAG (before receiving any LACP PDUs from the server) and keeps a single port +active until it receive the LACP PDUs from the server. This allows the PXE boot +server to establish a connection over one Ethernet port, download its boot +image and then continue the booting process. When the server boot process is +complete, the server fully forms an LACP port-channel. ## Requirements -a) LACP fallback feature can be enabled / disabled per LAG. -b) Only one member port will be selected as active per LAG during fallback mode -c) The member port will be moved out of the fallback state if it receives any LACP PDU from its peer. -d) Interoperability with other devices running standard 802.3ad LACP protocol. -e) The LACP runner behavior is not changed if fallback feature is disabled +- LACP fallback feature can be enabled / disabled per LAG. +- Only one member port will be selected as active per LAG during fallback mode +- The member port will be moved out of the fallback state if it receives any + LACP PDU from its peer. +- Interoperability with other devices running standard 802.3ad LACP protocol. +- The LACP runner behavior is not changed if fallback feature is disabled ## Assumptions -a) The LACP fallback feature is implemented on top of the open source libteam (https://github.com/jpirko/libteam) adopted by SONiC -b) The server is supposed to use only the member port in fallback mode to communicate with switch during the fallback mode. -c) The changes are limited to the libteam library only, the APP DB/SAI DB is not aware of the fallback state. +- The LACP fallback feature is implemented on top of the open source libteam + (https://github.com/jpirko/libteam) adopted by SONiC +- The server is supposed to use only the member port in fallback mode to + communicate with switch during the fallback mode. +- The changes are limited to the libteam library only, the APP DB/SAI DB is not + aware of the fallback state. ## Limitations -LACP fallback mode may also kick in during the normal LACP negotiation process due to the timing, which might cause some unexpected traffic loss. For example, if the LACP PDUs sent by peer are dropped completely, local member port with fallback enabled may still enter fallback mode, which might end up with data traffic loss. -  +LACP fallback mode may also kick in during the normal LACP negotiation process +due to the timing, which might cause some unexpected traffic loss. For example, +if the LACP PDUs sent by peer are dropped completely, local member port with +fallback enabled may still enter fallback mode, which might end up with data +traffic loss. 
+ # Background -LACP fallback feature is implemented on the receiver side to establish a LAG before it receives LACP PDUs from its peer. So this section presents a formal description of the standard LACP Receive Machine. +LACP fallback feature is implemented on the receiver side to establish a LAG +before it receives LACP PDUs from its peer. So this section presents a formal +description of the standard LACP Receive Machine. ## Receive Machine States and Timer The receive machine has four states: -• Rxm_current -• Rxm_expired -• Rxm_defaulted -• Rxm_disabled +- Rxm\_current +- Rxm\_expired +- Rxm\_defaulted +- Rxm\_disabled -One timer: -Current while timer that is started in the Rxm_current and Rxm_expired states with two timeout: Short timeout (3s) and Long timeout (180s) depending on the value of the Actor’s Operational Status LACP_Timeout, as transmitted in LACPDUs. +One timer: Current while timer that is started in the Rxm\_current and +Rxm\_expired states with two timeout: Short timeout (3s) and Long timeout +(180s) depending on the value of the Actor's Operational Status LACP\_Timeout, +as transmitted in LACPDUs. ![Current_LACP_State_Machine.png](https://github.com/Azure/SONiC/blob/gh-pages/images/lacp_fallback_hld/Current_LACP_State_Machine.png) ## Receive Machine Events The following events can occur: -• Participant created or reinitialized -• Received LACP PDU -• Physical MAC enabled -• Physical MAC disabled -• Current while timer expiry -The physical MAC disabled event indicates that either or both of the physical MAC transmission or reception for the physical port associated with the actor have become non-operational. The received LACPDU event only occurs if both physical transmission and reception are operational, so far as the actor is aware. +- Participant created or reinitialized +- Received LACP PDU +- Physical MAC enabled +- Physical MAC disabled +- Current while timer expiry + +The physical MAC disabled event indicates that either or both of the physical +MAC transmission or reception for the physical port associated with the actor +have become non-operational. The received LACPDU event only occurs if both +physical transmission and reception are operational, so far as the actor is +aware. ![rxm.png](https://github.com/Azure/SONiC/blob/gh-pages/images/lacp_fallback_hld/rxm.png) # LACP Fallback Design -With the standard rx state machine described above, the member port will be put into defaulted state if the member port never receives LACP PDUs from remote end. And the member port is not selectable in defaulted state, thus the member port cannot be aggregated to the LAG. +With the standard rx state machine described above, the member port will be put +into defaulted state if the member port never receives LACP PDUs from remote +end. And the member port is not selectable in defaulted state, thus the member +port cannot be aggregated to the LAG. -In order to support LACP fallback feature, we need to make the port selectable in defaulted state if fallback is enabled. Hence we'd like to introduce the fallback mode in defaulted state. +In order to support LACP fallback feature, we need to make the port selectable +in defaulted state if fallback is enabled. Hence we'd like to introduce the +fallback mode in defaulted state. ![LACP_Defaulted.png](https://github.com/Azure/SONiC/blob/gh-pages/images/lacp_fallback_hld/LACP_Defaulted.png) -Fallback Mode: -In this mode, the port selected bit is being set, which means the port is selectable and can be aggregated into the LAG. 
If any LACP PDU is being received over the LAG during this mode, the port will move to expired state, and restart the LACP negotiation with peer. +- Fallback Mode: + +In this mode, the port selected bit is being set, which means the port is +selectable and can be aggregated into the LAG. If any LACP PDU is being +received over the LAG during this mode, the port will move to expired state, +and restart the LACP negotiation with peer. + +- Fallback Eligible: -Fallback Eligible: -This checks whether LACP fallback feature is configured on this LAG. One and only one member port can be put into fallback mode per LAG. And the server is supposed to use only the member port in fallback mode to communicate with switch. +This checks whether LACP fallback feature is configured on this LAG. One and +only one member port can be put into fallback mode per LAG. And the server is +supposed to use only the member port in fallback mode to communicate with +switch. To summarize, in the defaulted state, we have ``` @@ -78,7 +125,10 @@ Else # LACP Fallback Config ## JSON Config -teamd is configured using JSON config string. This can be passed to teamd either on the command line or in a file. JSON format was chosen because it's easy to specify (and parse) hierarchic configurations using it. + +teamd is configured using JSON config string. This can be passed to teamd +either on the command line or in a file. JSON format was chosen because it's +easy to specify (and parse) hierarchic configurations using it. Example teamd config (teamd1.conf): ``` @@ -89,19 +139,21 @@ Example teamd config (teamd1.conf): "name":"lacp", "active": true, "fast_rate": true, - "fallback": true, + "fallback": true, "tx_hash": ["eth", "ipv4"] }, "link_watch":{"name":"ethtool"}, "ports": { "Ethernet30":{}, - "Ethernet31":{}, - "Ethernet32":{} + "Ethernet31":{}, + "Ethernet32":{} } } ``` + ## Minigraph Config + ``` @@ -112,6 +164,7 @@ Example teamd config (teamd1.conf): ``` + The following set of Show commands relevant for LACP will be supported: ``` Teamshow @@ -120,7 +173,6 @@ The following set of Show commands relevant for LACP will be supported: # References -a) SONiC Configuration Management -b) Open Source libteam https://github.com/jpirko/libteam -c) IEEE 802.3ad Standard for LACP http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf - +- SONiC Configuration Management +- Open Source libteam https://github.com/jpirko/libteam +- IEEE 802.3ad Standard for LACP http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf diff --git a/doc/layer2-forwarding-enhancements/SONiC Layer 2 Forwarding Enhancements HLD.md b/doc/layer2-forwarding-enhancements/SONiC Layer 2 Forwarding Enhancements HLD.md new file mode 100644 index 0000000000..75e4225885 --- /dev/null +++ b/doc/layer2-forwarding-enhancements/SONiC Layer 2 Forwarding Enhancements HLD.md @@ -0,0 +1,444 @@ +# Layer 2 Forwarding Enhancements +#### Rev 1.0 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + * [1. Requirements Overview](#1-requirement-overview) + * [1.1 Functional Requirements](#11-functional-requirements) + * [1.2 Configuration and Management Requirements](#12-configuration-and-management-requirements) + * [1.3 Scalability Requirements](#13-scalability-requirements) + * [1.4 Warm Boot Requirements](#14-warm-boot-requirements) + * [2. 
Functionality](#2-functionality) + * [2.1 Functional Description](#21-functional-description) + * [3. Design](#3-design) + * [3.1 Overview](#31-overview) + * [3.2 DB Changes](#32-db-changes) + * [3.2.1 CONFIG DB](#321-config-db) + * [3.3 Switch State Service Design](#33-switch-state-service-design) + * [3.3.1 Orchestration Agent](#331-orchestration-agent) + * [3.3.2 Other Process](#332-other-process) + * [3.4 Syncd](#34-syncd) + * [3.5 SAI](#35-sai) + * [3.6 CLI](#36-cli) + * [3.6.1 Configuration Commands](#361-configuration-commands) + * [3.6.2 Show Commands](#362-show-commands) + * [4. Flow Diagrams](#4-flow-diagrams) + * [5. Serviceability and Debug](#5-serviceability-and-debug) + * [6. Warm Boot Support](#6-warm-boot-support) + * [7. Scalability](#7-scalability) + * [8. Unit Test](#8-unit-test) + + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|--------------------------------------------| +| 1.0 | 04/30/2019 | Anil Pandey | Added detailed requirements | +| | | | Added detailed Unit Test cases | +| 0.3 | 04/29/2019 | Pankaj Jain | Modified static-mac and aging commands | +| 0.2 | 04/26/2019 | Anil Pandey | Merged VLAN Range FS contents | +| 0.1 | 04/11/2019 | Anil Pandey | Initial version | + + +# About this Manual +This document provides general information about the Layer 2 Forwarding Enhancements feature implementation in SONiC. +# Scope +This document describes the high level design of Layer 2 Forwarding Enhancements feature. + + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| FDB | Forwarding Database | + + +# 1 Requirement Overview +## 1.1 Functional Requirements + + 1. FDB Flush Support + - FDB entries should be flushed per Port when the operational state of the port goes down. Only dynamic entries should be flushed. + - FDB entries should be flushed per Port per VLAN when a port is removed from a VLAN. Both dynamic and static entries should be flushed. Static FDB configuration should not be removed. + - FDB entries should be flushed per Port or per Port per VLAN when triggered by Layer 2 Protocol upon topology change. Only dynamic entries should be flushed. + - FDB entries should be flushed per Portchannel when the admin or operational status of the Portchannel goes down. + - FDB entry should be removed from FDB_TABLE in STATE_DB and ASIC_SB and internal orchagent FDB data structure and Hardware. + +2. Handle MAC move event generated by hardware. + - Existing FDB entry should be replaced with one having new Port in ASIC_DB and STATE_DB when a MAC move event is received. + +3. Configuration CLI for FDB aging time. + - The FDB aging time should be configurable in hardware to a desired value from CLI. + - There should be an option to disable aging by setting the aging time to 0. + - The default FDB aging time should be set to 600 seconds. + +4. Configuration CLI for Static FDB entry. + - A Static FDB entry should be configurable in hardware from CLI. + - Static FDB entry should be set in CONFIG_DB and APP_DB even if the port is not member of VLAN. + - Static FDB entry should be added in saved FDB in Orchagent if port is not member of VLAN. + - Static FDB entry should be added in saved FDB in Orchagent after the entry is deleted due to FDB flush when port is removed from VLAN. + - Static FDB entry should be added in STATE_DB. + +5. 
Should have per Port, per VLAN and per Port per VLAN FDB clear options in CLI command "sonic-clear fdb". + +6. VLAN range CLI support + - Should be able to create a range of VLANs in a single CLI command. + - Should be able to delete a range of VLANs in a single CLI command. + - Should be able to add a port to a range of VLANs in a single CLI command. + - Should be able to delete a port from a range of VLANs in a single CLI command. + + +## 1.2 Configuration and Management Requirements +- New CLI is added for configuring FDB aging time. +- New CLI is added to display current FDB aging time. +- New CLI is added for add/delete of Static FDB entry. +- Existing CLI 'sonic-clear fdb' is extended to clear FDB per port or per VLAN or per Port per VLAN. +- Existing CLI tree is extended to include support for keyword 'range' in each of the VLAN create and delete and VLAN member create and delete commands. + + +## 1.3 Scalability Requirements +- Up to 4094 VLAN will be supported. +- VLAN range commands are invalidated for VLANs that fall outside the range 1 through 4094 + (Aforementioned range should be configurable and valid). + + +## 1.4 Warm Boot Requirements +Warm boot support already exists. + + +# 2 Functionality + +## 2.2 Functional Description +**1. FDB Flush suport** +- When a port operational state goes down, all dynamic FDB entries will be flushed on the port. The entries will be removed from FDB_TABLE in ASIC_DB, STATE_DB and Hardware and Orchagent data structures. Portchannel admin or operational state down will also be handled similarly. +- When a port is removed from VLAN, all static and dynamic FDB entries will be flushed on the (port,VLAN). Static FDB entries will still be preserved in the FDB_TABLE in CONFIG_DB and APP_DB. When port is added back to VLAN, Orchagent will reprogram the FDB entries. +- Spanning Tree Protocol to flush dynamic FDB entries either on a Port or on (Port,VLAN) when topology change occurs. API will be provided in Orchagent for L2 protocol component to flush FDB entries accordingly. + + +**2. Handle MAC move events.** +- MAC move event generated by some hardware (e.g. DNX) is currently not handled in SAI and SONiC. Will add support in Orchagent for moving the FDB entry to new port upon receipt of such event. + + +**3. Configuration for FDB aging time.** +- CLI configuration to be added to change the FDB aging time. By default, the aging time in hardware is 0 and set to 600 seconds in SONiC. It can be changed to a desired value if CLI is available. Currently, it can be done only by setting in APP_DB. The configuration range will be 0-1,000,000 seconds. Setting aging time to 0 will disable aging. + + +**4. Configuration for static FDB entry.** +- Currently, static FDB can only be added only by setting in APP_DB. CLI configuration will be added for this. +- If a dynamic FDB already exists with the same (MAC, VLAN), it will be replaced with the static entry. +- If the port is not member of VLAN, the static entry will be present in CONFIG_DB and APP_DB. It will also be saved in the orchagent saved FDB until the port is added to the VLAN. + + +**5. FDB clear options for per port and per VLAN ans per Port per VLAN clear.** + + - Currently, "sonic-clear fdb" command has only the option "ALL" supported. Will add options for "PORT" and "VLAN" to clear FDB entries either on a port or on a VLAN or on PORT and VLAN. Only dynamic FDB will be cleared. + + **6. 
VLAN Range CLI.** +- In current SONiC release, support for VLAN commands through CLI is limited to creating and deleting a single VLAN and adding/removing a port to/from a single VLAN. Support for VLAN range will be added in CLI to enable administrator to create and delete a range of VLANs and add/remove a port to/from a range of VLANs. +- When a large number of interfaces are made members of a huge set of VLANs, deletion of VLANs takes significant time due to the reason mentioned in [Scalability](#7-scalability) section. +- In SONiC, VLAN operations are performed either through config_db.json file or through CLI commands. In config_db.json file, administrator is required to update the desired VLAN and/or member port in each VLAN as per the JSON file format. This process is overburdening when it is required to manually add/delete VLANs in bulk and add/delete ports to/from VLANs in bulk. Administrator can perform the same operations through CLI commands which, though, is easy, becomes extremely taxing when multiple commands need to be executed to perform bulk operations, like creating/deleting 4094 VLANs and associating/removing ports from each of these VLANs through CLI commands that provide an option to enter only one VLAN. + - Administrator's task is greatly simplified with provision for VLAN range configuration. And time taken to perform the operations is drastically reduced. + + +# 3 Design +## 3.1 Overview +Code changes are confined to the components marked in RED. + + + **Overview of components involved:** + + + + + +For FDB flush and FDB aging and Static FDB commands, the design is described in details in the SWSS section. +Overview of changes for VLAN range support is provided below. + +**VLAN Range Support:** + +VLAN Range Support: For each of the new commands, a loop is iterated to run though the two configured parameters, first vlan-id and last vlan-id, (as specified in commands listed in section 3.6.2) and CONFIG_DB is updated through a single push operation using redisDB pipelining mechanism. Pipeline mechanism ensures the DB is updated very quick. + +Range command issued takes only two VLAN identifiers as arguments. List of VLANs are not configurable. For example, 'config vlan range add 2 10' is allowed and valid, but 'config vlan range add 2 10,3 100' is not valid. + +For VLAN range creation, VLAN existence for each VLAN identifier is checked in CONFIG_DB through GET operation prior to adding the redisDB SET to the pipeline. If few VLANs are already created and 'range create' command issued involves such VLANs, a message of severity 'warning' is logged, if the option '-w' is provided while configuring, and non-existent VLANs are created. + +For VLAN range deletion, existence of each VLAN is checked and delete operations are added to pipeline and executed at once after iterating through the entire range. For all non-existent VLANs, a single warning message is logged (if the optional argument, '-w', is enabled) after pipeline execution is completed. + +For VLAN member range creation and deletion, similar logic is added. + +VLAN and port identifier validation are added as part of the logic added in the functions for the new commands. + + +## 3.2 DB Changes +### 3.2.1 CONFIG DB +**FDB table for Static MAC** + +FDB_TABLE will be set in CONFIG_DB to store static MAC configuration. + +**SWITCH table for Aging time configuration** + +SWITCH_TABLE will be set in CONFIG_DB to store FDB aging time configuration. 
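
For illustration only, the corresponding CONFIG_DB entries could take roughly the following shape; the key layout and field names below are a sketch derived from the description above, not the final schema:

```
FDB_TABLE|{{vlan_name}}|{{mac_address}}
    "port": {{port_name}}
    "type": "static"

SWITCH_TABLE|switch
    "fdb_aging_time": {{aging_time_in_seconds}}
```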
+ +## 3.3 Switch State Service Design +### 3.3.1 Orchestration Agent + + +**Data Structure changes** + + +The internal FDB data structure will be enhanced to store the FDB type (static/dynamic). Currently, FDB is stored as c++ set. It will be changed to c++ map to store (key, value), where key is (MAC, bv_id) and value is FDB type. +FDB type is needed to add Static FDB to saved FDB after flush. + + + **FDB Data structure in Orchagent:** + + + + +In current Sonic implementation, when a FDB event is received with VLAN SAI object ID, the VLAN ID is retrieved by traversing through all the ports and VLANs in the orchagent DB and finding the one that matches the object ID. Also, the Port details are retrieved by traversing all the port and VLANs. This approach is inefficient as it may traverse through the number of VLANs configured for each FDB learn event. Worst case time is O(M*N) where M is the number of MAC events and N is the number of entries in port data structure. + + +A new unordered_map containing a mapping between SAI object ID (can be either Port object ID or Bridge Port object ID or VLAN object ID) and Name (port or VLAN alias) will be added which can be looked up in O(1) time. The Port or VLAN alias retrieved from this will be looked up in the Port data structure. + + + **Object ID to alias mapping.** + + + + + +**Changes for FDB flush.** +When a FDB flush request is received, Orchagent will send a bulk flush request to syncd. Individual FDB delete response from SAI will be used to delete FDB entries in ASIC_DB, STATE_DB and orchagent data structure. The delete response will be received as AGED event to syncd, which will delete the entry in ASIC_DB and notify Orchagent. + + +When a port operational status goes down, portsorch will call fdborch API to trigger FDB flush. Only dynamic FDB will be flushed. + + +When a port is removed from VLAN, portsorch will call fdborch API to trigger FDB flush. Both static and dynamic FDB will be flushed. Also, the static FDB will be added to the temporary DB in orchagent when delete response is received, so that it can programmed back when port is added back to VLAN. + + +When Spanning tree state changes, protocol component within orchagent will call fdborch API to flush FDB by either per Port or per Port per VLAN. Only dynamic FDB will be flushed. + + + **FDB Flush due to Port operational state down:** + + + + + + **FDB Flush due to Port removal from VLAN:** + + + + + + **FDB Flush due to spanning tree state change:** + + + + + +**Changes for handling MAC move events.** +Hardware can generate a single MAC move event instead of generating 2 events (del-old mac and add-new mac). Orchagent code changes will be done to replace the existing FDB entry with the new port in STATE_DB. + + +**Changes for FDB aging time configuration.** +A new CLI commands will be added for configuring the FDB aging time. +FDB aging time configuration will be set in the SWITCH table in CONFIG_DB. Vlanmgr will populate it in the APP_DB. Orchagent will get notified and handling will be SwitchOrch. SwitchOrch will call sai_redis API to send request to syncd via ASIC_DB. + + + + + +**Changes for Static FDB configuration.** +New CLI will be added for configuring Static FDB entry. It will be set in the FDB_TABLE in CONFIG_DB. Vlanmgr will handle the CONIG_DB changes, do necessary validations and then populate in APP_DB. FdbOrch already has all the necessary handling for Static FDB changes from APP_DB. 
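
Pulling together the data-structure changes described at the beginning of this section (3.3.1), a minimal sketch of the new FDB map and the SAI-object-ID-to-alias map is shown below; the type and variable names are assumptions for illustration, not the actual orchagent code.

```cpp
// Illustrative sketch only: type and variable names are placeholders,
// not the actual orchagent implementation.
#include <array>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <unordered_map>

using ObjectId   = uint64_t;                // stand-in for sai_object_id_t
using MacAddress = std::array<uint8_t, 6>;  // stand-in for orchagent's MAC type

enum class FdbEntryType { Dynamic, Static };

struct FdbKey
{
    MacAddress mac;    // MAC address of the entry
    ObjectId   bv_id;  // bridge/VLAN object id the entry was learned on

    bool operator<(const FdbKey& o) const
    {
        return std::tie(mac, bv_id) < std::tie(o.mac, o.bv_id);
    }
};

// Was a std::set of entries; becomes a map so the entry type (static/dynamic)
// is kept and static entries can be re-added to the saved FDB after a flush.
std::map<FdbKey, FdbEntryType> fdbEntries;

// O(1) lookup from a SAI object id (port, bridge port or VLAN) to its alias,
// replacing the linear walk over all ports/VLANs on every FDB event.
std::unordered_map<ObjectId, std::string> oidToAlias;
```

On an FDB event, the object ids carried in the notification would be resolved through the alias map before the FDB map, STATE_DB and ASIC_DB are updated.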
+When a port is removed from VLAN, the corresponding Static FDB entries will also be flushed from all databases except APP_DB and CONFIG_DB. Flushed static FDB will be stored in the saved FDB in orchagent for retrieval and programming later when port is added back to VLAN. +If a dynamic FDB is already learnt and a static FDB is configured with same (MAC, VLAN), the existing dynamic FDB entry will be replaced with the static entry. + + + + + + + + + +### 3.3.2 Other Process +**Vlanmgr changes** + +Vlanmgr will handle Aging time configuration changes from SWITCH table in CONFIG_DB. It will set the new aging time in SWITCH_TABLE in APP_DB, which will be processed by SwitchOrch. + + +Vlanmgr will handle Static FDB configuration from FDB table in CONFIG_DB. It will do validation and then set the static FDB in FDB_TABLE in APP_DB, which will be processed by FdbOrch. + + +## 3.4 SyncD +No change. + + +## 3.5 SAI +No changes are being made in SAI. The following SAI attributes are used by the changes being made to SONiC: + +1. SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID +2. SAI_FDB_FLUSH_ATTR_BV_ID +3. SAI_FDB_FLUSH_ATTR_ENTRY_TYPE +4. SAI_FDB_ENTRY_ATTR_TYPE +5. SA_FDB_EVENT_MOVE +6. SAI_SWITCH_ATTR_FDB_AGING_TIME + + + +## 3.6 CLI + +### 3.6.1 Configuration Commands +**FDB Aging time configuration** + +root@sonic:/# config mac aging-time +- To set the mac aging time to a value. + + +**Static MAC configuration** + + +root@sonic:/# config mac add 00:10:3a:2b:05:67 100 Ethernet2 +- To add a static mac on vlan 100 and port Ethernet2. + + +root@sonic:/# config mac del 00:10:3a:2b:05:67 100 +- To delete a static mac on vlan 100. + +**VLAN Range configuration** + +root@sonic:/# config vlan range add <-w, optional argument> + +root@sonic:/# config vlan range del <-w, optional argument> + +root@sonic:/# config vlan member range add <-w, optional argument> + +root@sonic:/# config vlan member range del <-w, optional argument> + + + +### 3.6.2 Show Commands + + +root@sonic:/# show mac aging-time +- To display the current configured mac aging time. + + +# 4 Flow Diagrams +Flow diagrams are provided in the [Design](#3-design) section. + +# 5 Serviceability and Debug +Debug counters will be added for all operations and events related to L2 forwarding like the following: +In Orchagent: +- Number of FDB learn/aged events received from ASIC_DB. +- Number of FDB add/delete request received from APP_DB. +- Number vlan add/delete received from APP_DB. +- Number of FDB entries inserted in the saved FDB database. +- Number of FDB add/delete requests sent to syncd. +- Counters for various failure conditions. + + +NOTICE/INFO level logs will be added for major operations and events in vlanmgr and orchagent. + + +Logic is added in the functions for VLAN range commands to append warning messages to a list and are displayed if the optional argument is provided. Iteration in the loops is not intentionally broken, if an existing/non-existing VLAN is trying to be operated upon, to ensure configuration for other VLANs goes through successfully. + +# 6 Warm Boot Support + +No change. + +# 7 Scalability + +If the number of VLANs in the range commands is high, the time it takes to perform the operation increases, though it is significantly lower compared to the same operation being performed using existing commands. + + +# 8 Unit Test + +**Data structure changes** + +1. Verify that FDB internal map is updated properly when a FDB entry is added/deleted in hardware or when a static FDB entry is added/deleted. + +2. 
Verify that SAI Object ID to port/VLAN name mapping table is updated properly when a port/port-channel or VLAN is added/removed + +**FDB flush** + +3. Verify that dynamic fdb entries are flushed in hardware, orchagent data structure and other DBs when a port/portchannel operational state goes down. + +4. Verify that dynamic fdb entries are flushed in hardware, orchagent data structure and other DBs when a port/portchannel admin state goes down. + +5. Verify that FDB entries are added back properly when the port comes back operationally up. + +6. Verify that both static and dynamic fdb entries are flushed in hardware, orchagent data structure, STATE_DB and ASIC_DB when a port is removed from VLAN. + +7. Verify that the flushed static FDB entries are stored in saved FDB in orchagent. + +8. Verify that FDB entries are added back properly when the port is added back to the vlan. + +9. Verify that dynamic fdb entries are flushed in hardware, orchagent data structure and other DBs when 'sonic-clear fdb all" command is issued. + +10. Verify that dynamic fdb entries on a port are flushed in hardware, orchagent data structure and other DBs when 'sonic-clear fdb port" command is issued. + +11. Verify that dynamic fdb entries on a VLAN are flushed in hardware, orchagent data structure and other DBs when 'sonic-clear fdb vlan" command is issued. + +12. Verify that mac are learnt properly again after flush due to 'sonic-clear' command. + +13. Send traffic from same MAC, VLAN to a different port and verify that mac is updated in hardware, ASIC_DB, STATE_DB and orchagent data structure. + +**FDB Aging time configuration** + +14. Set FDB aging time from CLI and verify that FDB entries age out after that interval. + +15. Verify that FDB aging time takes effect after configuration is saved and switch is rebooted. + +**Static FDB entry configuration** + +16. Verify that Static FDB entry configured from CLI is added in hardware and all other DBs. + +18. Verify that if port is not member of VLAN, Static FDB entry is added in CONFIG_DB, APP_DB and saved FDB in orchagent. + +19. Verify that Static FDB entry is effective after configuration is saved and switch is rebooted.. + +**Warm restart** + +20. Verify that Aging time and static MAC configuration are re-applied after switch, swss docker, syncd and orchagent warm reboot. + +21. Verify that FDb entries are synchronized after Orchagent unplanned reboot if the reboot happens while FDB flush is in progress. + +**Scale test** + +22. Verify 8K FDB entry learning. All entries should be added to various DBs, hardware and Orchagent data structure. + +23. Verify 8K FDB entry aging. All entries should be deleted from various DBs, hardware and Orchagent data structures. + +24. Verify 8K FDB entry flush per port, per VLAN and per port per VLAN. + +26. Learn 8K FDB entries on 4094 VLANs and verify the above cases. + +**VLAN Range:** + +27. Validate VLAN identifier in VLAN range commands. +28. Check log messages (when '-w' option is provided) for overlapping and non-existent VLANs provided in VLAN range commands. Any conflicting configuration (for example: portchannel having an IP address configured being configured to participate in a VLAN) results in warning messages. Messages are grouped and displayed after the execution of the commands. +29. Validate the port data provided in VLAN member range command. +30. Validate the traffic flow after the VLANs are created and member ports are added. +31. 
Ensure the ports removed from the VLANs are excluded from the hardware by sending matching traffic flows. +32. Ensure the deletion of range of VLANs is successful. +33. Create a range of VLANs and save the configuration. Check for the existence of VLANs after reload operation. +34. Remove few VLANs randomly after executing the command to create a range of VLANs. Check for the existence of the rest of the VLANs. + + diff --git a/doc/layer2-forwarding-enhancements/images/agingTimeConfig.jpg b/doc/layer2-forwarding-enhancements/images/agingTimeConfig.jpg new file mode 100644 index 0000000000..6922b9853b Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/agingTimeConfig.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/fdbDataStructure.jpg b/doc/layer2-forwarding-enhancements/images/fdbDataStructure.jpg new file mode 100644 index 0000000000..bd8702a743 Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/fdbDataStructure.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/oidToName.jpg b/doc/layer2-forwarding-enhancements/images/oidToName.jpg new file mode 100644 index 0000000000..f972266458 Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/oidToName.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/overall.jpg b/doc/layer2-forwarding-enhancements/images/overall.jpg new file mode 100644 index 0000000000..190118a23e Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/overall.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/portDownFlush.jpg b/doc/layer2-forwarding-enhancements/images/portDownFlush.jpg new file mode 100644 index 0000000000..04018a5db1 Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/portDownFlush.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/portVlanRemoveFlush.jpg b/doc/layer2-forwarding-enhancements/images/portVlanRemoveFlush.jpg new file mode 100644 index 0000000000..9afffdbd6c Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/portVlanRemoveFlush.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/staticMacConfig.jpg b/doc/layer2-forwarding-enhancements/images/staticMacConfig.jpg new file mode 100644 index 0000000000..e857a50645 Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/staticMacConfig.jpg differ diff --git a/doc/layer2-forwarding-enhancements/images/stpFlush.jpg b/doc/layer2-forwarding-enhancements/images/stpFlush.jpg new file mode 100644 index 0000000000..81ab313b59 Binary files /dev/null and b/doc/layer2-forwarding-enhancements/images/stpFlush.jpg differ diff --git a/doc/macsec-gearbox/dummy b/doc/macsec-gearbox/dummy new file mode 100644 index 0000000000..e69de29bb2 diff --git a/doc/media-settings/Media-based-Port-settings.md b/doc/media-settings/Media-based-Port-settings.md index 8e71f1dad4..7417a82216 100644 --- a/doc/media-settings/Media-based-Port-settings.md +++ b/doc/media-settings/Media-based-Port-settings.md @@ -136,9 +136,9 @@               This mechanism is also very helpful in supporting new media types without upgrading the Operating system. If a new media type need to be supported the only change that needs to be done is modify media_settings.json to add the new media type. 
-![](https://github.com/dgsudharsan/SONiC/blob/hld_media/doc/media-settings/event_flow.png) +![](event_flow.png) -![](https://github.com/dgsudharsan/SONiC/blob/hld_media/doc/media-settings/key_selection_flow.png) +![](key_selection_flow.png) ## Breakout Scenario: diff --git a/doc/mgmt/CVL_Yang_as_NB_YANG_v2.pptx b/doc/mgmt/CVL_Yang_as_NB_YANG_v2.pptx new file mode 100644 index 0000000000..a007d08416 Binary files /dev/null and b/doc/mgmt/CVL_Yang_as_NB_YANG_v2.pptx differ diff --git a/doc/mgmt/Jun2-26-2019-Sonic-Mgmt-Framework-Demo-Workflow.pptx b/doc/mgmt/Jun2-26-2019-Sonic-Mgmt-Framework-Demo-Workflow.pptx new file mode 100644 index 0000000000..994b8322ec Binary files /dev/null and b/doc/mgmt/Jun2-26-2019-Sonic-Mgmt-Framework-Demo-Workflow.pptx differ diff --git a/doc/mgmt/Management Framework.md b/doc/mgmt/Management Framework.md new file mode 100644 index 0000000000..155e99236a --- /dev/null +++ b/doc/mgmt/Management Framework.md @@ -0,0 +1,2077 @@ +# SONiC Management Framework + +## High level design document + +### Rev 0.11 + +## Table of Contents + +* [List of Tables](#list-of-tables) +* [Revision](#revision) +* [About this Manual](#about-this-manual) +* [Scope](#scope) +* [Definition/Abbreviation](#definitionabbreviation) +* [Table 1: Abbreviations](#table-1-abbreviations) +* [1 Feature Overview](#1-feature-overview) + * [1.1 Requirements](#11-requirements) + * [1.2 Design Overview](#12-design-overview) + * [1.2.1 Basic Approach](#121-basic-approach) + * [1.2.2 Container](#122-container) +* [2 Functionality](#2-functionality) + * [2.1 Target Deployment Use Cases](#21-target-deployment-use-cases) +* [3 Design](#3-design) + * [3.1 Overview](#31-overview) + * [3.1.1 Build time flow](#311-build-time-flow) + * [3.1.2 Run time flow](#312-run-time-flow) + * [3.1.2.1 CLI](#3121-cli) + * [3.1.2.2 REST](#3122-REST) + * [3.1.2.3 gNMI](#3123-gnmi) + * [3.2 SONiC Management Framework Components](#32-sonic-management-framework-components) + * [3.2.1 Build time components](#321-build-time-components) + * [3.2.1.1 Yang to OpenAPI converter](#3211-yang-to-openapi-converter) + * [3.2.1.1.1 Overview](#32111-overview) + * [3.2.1.1.2 Supported HTTP verbs](#32112-supported-http-verbs) + * [3.2.1.1.3 Supported Data Nodes](#32113-supported-data-nodes) + * [3.2.1.1.4 Data Type Mappings](#32114-data-type-mappings) + * [3.2.1.1.5 Future enhancements](#32115-future-enhancements) + * [3.2.1.2 swagger generator](#3212-swagger-generator) + * [3.2.1.3 YGOT generator](#3213-YGOT-generator) + * [3.2.1.4 pyang compiler](#3214-pyang-compiler) + * [3.2.2 Run time components](#322-run-time-components) + * [3.2.2.1 CLI](#3221-cli) + * [3.2.2.2 REST Client SDK](#3222-REST-client-sdk) + * [3.2.2.3 gNMI Client](#3223-gnmi-client) + * [3.2.2.4 REST server](#3224-REST-server) + * [3.2.2.4.1 Transport options](#32241-transport-options) + * [3.2.2.4.2 Translib linking](#32242-Translib-linking) + * [3.2.2.4.3 Media Types](#32243-media-types) + * [3.2.2.4.4 Payload Validations](#32244-payload-validations) + * [3.2.2.4.5 Concurrency](#32245-concurrency) + * [3.2.2.4.6 API Versioning](#32246-api-versioning) + * [3.2.2.4.7 RESTCONF Entity-tag](#32247-restconf-entity-tag) + * [3.2.2.4.8 RESTCONF Discovery](#32248-restconf-discovery) + * [3.2.2.4.9 RESTCONF Query Parameters](#32249-restconf-query-parameters) + * [3.2.2.4.10 RESTCONF Operations](#322410-restconf-operations) + * [3.2.2.4.11 RESTCONF Notifications](#322411-restconf-notifications) + * [3.2.2.4.12 Authentication](#322412-authentication) + * [3.2.2.4.13 Error 
Response](#322413-error-response) + * [3.2.2.4.14 DB Schema](#322414-db-schema) + * [3.2.2.4.15 API Documentation](#322415-api-documentation) + * [3.2.2.5 gNMI server](#3225-gnmi-server) + * [3.2.2.5.1 Files changed/added](#32251-files-changed/added) + * [3.2.2.5.2 Sample Requests](#32253-sample-requests) + * [3.2.2.6 Translib](#3226-Translib) + * [3.2.2.6.1 App Interface](#32261-app-interface) + * [3.2.2.6.2 Translib Request Handler](#32262-Translib-request-handler) + * [3.2.2.6.3 YGOT request binder](#32263-YGOT-request-binder) + * [3.2.2.6.4 DB access layer](#32264-db-access-layer) + * [3.2.2.6.5 App Modules](#32265-app-modules) + * [3.2.2.7 Transformer](#3227-transformer) + * [3.2.2.7.1 Components](#32271-components) + * [3.2.2.7.2 Design](#32272-design) + * [3.2.2.7.3 Process](#32273-process) + * [3.2.2.7.4 Common App](#32274-common-app) + * [3.2.2.7.5 YANG Extensions](#32275-yang-extensions) + * [3.2.2.7.6 Public Functions](#32276-public-functions) + * [3.2.2.7.7 Overloaded Modules](#32277-overloaded-modules) + * [3.2.2.7.8 Utilities](#32278-utilities) + * [3.2.2.8 Config Validation Library (CVL)](#3228-config-validation-library-cvl) + * [3.2.2.8.1 Architecture](#32281-architecture) + * [3.2.2.8.2 Validation types](#32282-validation-types) + * [3.2.2.8.3 CVL APIs](#32283-cvl-apis) + * [3.2.2.9 Redis DB](#3229-redis-db) + * [3.2.2.10 Non DB data provider](#32210-non-db-data-provider) +* [4 Flow Diagrams](#4-flow-diagrams) + * [4.1 REST SET flow](#41-REST-set-flow) + * [4.2 REST GET flow](#42-REST-get-flow) + * [4.3 Translib Initialization flow](#43-Translib-initialization-flow) + * [4.4 gNMI flow](#44-gNMI-flow) + * [4.5 CVL flow](#45-CVL-flow) +* [5 Developer Work flow](#5-Developer-Work-flow) + * [5.1 Developer work flow for custom (SONiC/CVL) YANG](#51-Developer-work-flow-for-custom-SONiCCVL-YANG) + * [5.1.1 Define Config Validation YANG schema](#511-Define-Config-Validation-YANG-schema) + * [5.1.2 Generation of REST server stubs and Client SDKs for YANG based APIs](#512-Generation-of-REST-server-stubs-and-Client-SDKs-for-YANG-based-APIs) + * [5.1.3 Config Translation App (Go language)](#513-Config-Translation-App-Go-language) + * [5.1.4 IS CLI](#514-IS-CLI) + * [5.1.5 gNMI](#515-gNMI) + * [5.2 Developer work flow for standard (OpenConfig/IETF) YANG](#52-Developer-work-flow-for-standard-OpenConfigIETF-YANG) + * [5.2.1 Identify the standard YANG module for the feature for northbound APIs](#521-Identify-the-standard-YANG-module-for-the-feature-for-northbound-APIs) + * [5.2.2 Define the Redis schema for the new feature. 
(not applicable for legacy/existing feature)](#522-Define-the-Redis-schema-for-the-new-feature-not-applicable-for-legacyexisting-feature) + * [5.2.3 Define Config Validation YANG schema](#523-Define-Config-Validation-YANG-schema) + * [5.2.4 Generation of REST server stubs and Client SDKs for YANG based APIs](#524-Generation-of-REST-server-stubs-and-Client-SDKs-for-YANG-based-APIs) + * [5.2.5 Config Translation App (Go language)](#525-Config-Translation-App-Go-language) + * [5.2.6 IS CLI](#526-IS-CLI) + * [5.2.7 gNMI](#527-gNMI) +* [6 Error Handling](#6-error-handling) +* [7 Serviceability and Debug](#7-serviceability-and-debug) +* [8 Warm Boot Support](#8-warm-boot-support) +* [9 Scalability](#9-scalability) +* [10 Unit Test](#10-unit-test) +* [11 Appendix A](#11-appendix-a) +* [12 Appendix B](#11-appendix-b) + + +## List of Tables + +[Table 1: Abbreviations](#table-1-abbreviations) + +## Revision + +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:-----------------------:|-----------------------------------| +| 0.1 | 06/13/2019 | Anand Kumar Subramanian | Initial version | +| 0.2 | 07/05/2019 | Prabhu Sreenivasan | Added gNMI, CLI content from DELL | +| 0.3 | 08/05/2019 | Senthil Kumar Ganesan | Updated gNMI content | +| 0.4 | 08/07/2019 | Arun Barboza | Clarifications on Table CAS | +| 0.5 | 08/07/2019 | Anand Kumar Subramanian | Translib Subscribe support | +| 0.6 | 08/08/2019 | Kwangsuk Kim | Updated Developer Workflow and CLI sections | +| 0.7 | 08/09/2019 | Partha Dutta | Updated Basic Approach under Design Overview | +| 0.8 | 08/15/2019 | Anand Kumar Subramanian | Addressed review comments | +| 0.9 | 08/19/2019 | Partha Dutta | Addressed review comments related to CVL | +| 0.10 | 09/25/2019 | Kwangsuk Kim | Updated Transformer section | +| 0.11 | 09/30/2019 | Partha Dutta | Updated as per SONiC YANG guideline | +| 0.12 | 10/19/2019 | Senthil Kumar Ganesan | Added Appendix B | + +## About this Manual + +This document provides general information about the Management framework feature implementation in SONiC. + +## Scope + +This document describes the high level design of Management framework feature. + +## Definition/Abbreviation + +### Table 1: Abbreviations + +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| CVL | Config Validation Library | +| NBI | North Bound Interface | +| ABNF | Augmented Backus-Naur Form | +| YANG | Yet Another Next Generation | +| JSON | Java Script Object Notation | +| XML | eXtensible Markup Language | +| gNMI | gRPC Network Management Interface | +| YGOT | YANG Go Tools | + +## 1 Feature Overview + +Management framework is a SONiC application which is responsible for providing various common North Bound Interfaces (NBIs) for the purposes of managing configuration and status on SONiC switches. The application manages coordination of NBI’s to provide a coherent way to validate, apply and show configuration. + +### 1.1 Requirements + +* Must provide support for: + + 1. Standard [YANG](https://tools.ietf.org/html/rfc7950) models (e.g. OpenConfig, IETF, IEEE) + 2. Custom YANG models ([SONiC YANG](https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md)) + 3. Industry-standard CLI / Cisco like CLI + +* Must provide support for [OpenAPI spec](https://swagger.io/specification/) to generate REST server side code +* Must provide support for NBIs such as: + + 1. CLI + 2. gNMI + 3. REST/RESTCONF + +* Must support the following security features: + + 1. 
Certificate-based authentication + 2. User/password based authentication + 3. Role based authorization + +* Ease of use for developer workflow + + 1. Specify data model and auto-generate as much as possible from there + +* Must support Validation and Error Handling - data model, platform capability/scale, dynamic resources +* SNMP integration in SONiC is left for future study + +### 1.2 Design Overview + +Management framework makes use of the translation library (Translib) written in golang to convert the data models exposed to the management clients into the Redis ABNF schema format. Supported management servers can make use of the Translib to convert the incoming payload to SONiC ABNF schema and vice versa depending on the incoming request. Translib will cater to the needs of REST and gNMI servers. Later the Translib can be enhanced to support other management servers if needed. This framework will support both standard and custom YANG models for communication with the corresponding management servers. Management framework will also take care of maintaining data consistency, when writes are performed from two different management servers at the same time. Management framework will provide a mechanism to authenticate and authorize any incoming requests. Management framework will also take care of validating the requests before writing them into the Redis DB. Config Validation Library is used for syntactic and semantic validation of ABNF JSON based on YANG derived from Redis ABNF schema. + +#### 1.2.1 Basic Approach + +* Management framework takes comprehensive approach catering: + * Standard based YANG Models and custom YANG + * Open API spec + * Industry standard CLI + * Config Validation +* REST server, gNMI server, App module and Translib - all in Go +* Translation by using the Translib Library and application specific modules +* Marshalling and unmarshalling using YGOT +* Redis updated using CAS(Check-and-Set) trans. (No locking, No rollback) +* Config Validation by using YANG model from ABNF schema +* CLI with Klish framework + +#### 1.2.2 Container + +The management framework is designed to run in a single container named “sonic-mgmt-framework”. The container includes the REST server linked with Translib, and CLI process. +The gNMI support requires the gNMI server which is provided as a part of sonic-telemetry container. We would like to rename this container as the sonic-gnmi container as now it can perform configurations as well through the gNMI server. + +## 2 Functionality + +### 2.1 Target Deployment Use Cases + +1. Industry Standard CLI which will use REST client to talk to the corresponding servers to send and receive data. +2. REST client through which the user can perform POST, PUT, PATCH, DELETE, GET operations on the supported YANG paths. +3. gNMI client with support for capabilities, get, set, and subscribe based on the supported YANG models. + +## 3 Design + +### 3.1 Overview + +The SONiC management framework comprises two workflows: + +1. Build time flow +2. Run time flow + +as show in the architecture diagram below. + +![Management Framework Architecture diagram](images/Mgmt_Frmk_Arch.jpg) + +#### 3.1.1 Build time flow + +The Developer starts by defining the desired management objects and the access APIs to provide for the target application. This can be done in one of the two ways: - +1) A YANG data model +2) An OpenAPI spec + +This can be an independent choice on an application by application basis. 
However note that using YANG allows for richer data modelling, and therefore superior data validation. + +1. In case of YANG, if the developer chooses standard YANG model (Openconfig, IETF etc.), a separate SONiC YANG model has to be written based on Redis ABNF schema for validating Redis configuration and transformer hints should be written in a deviation file for standard YANG model to Redis DB coversion and vice versa (refer to [3.2.2.7 Transformer](#3227-transformer) for details). However, if custom SONiC YANG model is written based on guidelines, CVL YANG is automatically derived from it and the same is used for validation purpose and there is no need of writing any deviation file for transformer hints. Based on the given YANG model as input, the pyang compiler generates the corresponding OpenAPI spec which is in turn given to the Swagger generator to generate the REST client SDK and REST server stubs in golang. The YANG data model is also provided to the [YGOT](https://github.com/openconfig/YGOT) generator to create the YGOT bindings. These are used on the interface between Translib and the selected App module. Specifically, Translib populates the binding structures based upon the incoming server payload, and the App module processes the structure accordingly. Additionally, a YANG annotation file must also be provided, for data models that do not map directly to the SONiC YANG structure. The requests in this case will be populated into the YGOT structures and passed to App module for conversion. The App module uses the YANG annotations to help convert and map YANG objects to DB objects and vice-versa. + +2. In case of OpenAPI spec, it is directly given to the [Swagger](https://swagger.io) generator to generate the REST client SDK and REST server stubs in golang. In this case the REST server takes care of validating the incoming request to be OpenAPI compliant before giving the same to Translib. There is no YANG, and therefore no YGOT bindings are generated or processed, and so the Translib infra will invoke the App module functions with the path and the raw JSON for App modules to convert. For configuration validation purpose, SONiC YANG model has to be written based on Redis ABNF schema. + +#### 3.1.2 Run time flow + +##### 3.1.2.1 CLI + +1. CLI uses the KLISH framework to provide a CLI shell. The CLI request is converted to a corresponding REST client SDK request that was generated by the Swagger generator, and is given to the REST server. +2. The Swagger generated REST server handles all the REST requests from the client SDK and invokes a common handler for all the create, update, replace, delete and get operations along with path and payload. This common handler converts all the requests into Translib arguments and invokes the corresponding Translib provided APIs. +3. Translib uses the value of the input (incoming) path/URI to determine the identity of the appropriate App module. It then calls that App module, passing the request as either a populated YGOT structure or as JSON, depending upon the data modelling method in use for the application (see section [3.1.1](#311-build-time-flow)). +4. Further processing of CLI commands will be handled by Translib componenets that will be discussed in more detail in the later sections. + +##### 3.1.2.2 REST + +1. REST client will use the Swagger generated client SDK to send the request to the REST server. +2. From then on the flow is similar to the one seen in the CLI. 
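+
+To make the handoff in steps 2 and 3 above more concrete, the following is a minimal, illustrative Go sketch (not the actual generated code) of how a Swagger-generated stub could pass a RESTCONF GET to the Translib `Get` API described in section 3.2.2.6. The import path, handler name and route registration are placeholders; only the `GetRequest`/`GetResponse` types and the `Get` signature follow this document.
+
+```go
+// Illustrative sketch only. The translib import path, handler name and route
+// registration are placeholders; GetRequest/GetResponse and translib.Get follow
+// the API description in section 3.2.2.6.
+package main
+
+import (
+    "net/http"
+
+    "example.org/sonic-mgmt-framework/translib" // placeholder import path
+)
+
+// handleGet stands in for the common handler invoked by every generated GET stub.
+func handleGet(w http.ResponseWriter, r *http.Request) {
+    // Translib resolves the owning App module from the request path.
+    resp, err := translib.Get(translib.GetRequest{Path: r.URL.Path})
+    if err != nil {
+        // The real server maps errors to the RESTCONF error payload (section 3.2.2.4.13).
+        http.Error(w, err.Error(), http.StatusInternalServerError)
+        return
+    }
+    w.Header().Set("Content-Type", "application/yang-data+json")
+    w.Write(resp.Payload) // RFC7951-encoded JSON produced by the App module
+}
+
+func main() {
+    http.HandleFunc("/restconf/data/", handleGet)
+    http.ListenAndServe(":8443", nil) // the actual server uses HTTPS, default port 443
+}
+```
+
+Write operations follow the same pattern, with POST, PATCH, PUT and DELETE mapped to Translib Create, Update, Replace and Delete respectively, as listed in section 3.2.2.4.2.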
+ +##### 3.1.2.3 gNMI + +GNMI service defines a gRPC-based protocol for the modification and retrieval of configuration from a target device, as well as the control and generation of telemetry streams from a target device to a data collection system. Refer [GNMI spec](https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-specification.md) + +![GNMI Service High level diagram (Proposed)](images/GNMI_Server.png) + +1. Existing SONiC telemetry framework has been extended to support the new GNMI services. +2. All 4 GNMI services are supported: Get, Set, Capabilities and Subscribe. +3. A new transl data client is added to process the incoming YANG-based request (either standard or proprietary) +4. The new transl data client relies on Translib infra provided to translate, get, set of the YANG objects. + +More details onthe GNMI server, Client and workflow provided later in the document. + +### 3.2 SONiC Management Framework Components + +Management framework components can be classified into + +1. Build time components +2. Run time components + +#### 3.2.1 Build time components + +Following are the build time components of the management framework + +1. Pyang compiler (for YANG to OpenAPI conversion) +2. Swagger generator +3. YGOT generator +4. pyang compiler (for YANG to YIN conversion) + +##### 3.2.1.1 YANG to OpenAPI converter + +##### 3.2.1.1.1 Overview + +Open source Python-based YANG parser called pyang is used for YANG parsing and building a Python object dictionary. A custom plugin is developed to translate this Python object dictionary into an OpenAPI spec. As of now OpenAPI spec version, 2.0 is chosen considering the maturity of the toolset available in the open community. + +URI format and payload is RESTCONF complaint and is based on the [RFC8040](https://tools.ietf.org/html/rfc8040). The Request and Response body is in JSON format in this release. + +##### 3.2.1.1.2 Supported HTTP verbs + +Following are the HTTP methods supported in first release. + +POST, PUT, PATCH, GET and DELETE. + +##### 3.2.1.1.3 Supported Data Nodes + +For each of the below-listed Data keywords nodes in the YANG model, the OpenAPI (path) will be generated in version 1 + +* Container +* List +* Leaf +* Leaf-list + +##### 3.2.1.1.4 Data Type Mappings + + YANG Type | OpenAPI Type +-------------|------------------ +int8 | Integer +int16 | Integer +int32 | Integer +int64 | Integer +uint8 | Integer +uint16 | Integer +uint32 | Integer +uint64 | Integer +decimal64 | Number +String | string +Enum | Enum +Identityref | String (Future can be Enum) +long | Integer +Boolean | Boolean +Binary | String with Format as Binary () +bits | integer + +* All list keys will be made mandatory in the payload and URI +* YANG mandatory statements will be mapped to the required statement in OpenAPI +* Default values, Enums are mapped to Default and Enums statements of OpenAPI +* Currently, Swagger/OpenAPI 2.0 Specification does NOT support JSON-schema anyOf and oneOfdirectives, which means that we cannot properly treat YANG choice/case statements during conversion. As a workaround, the current transform will simply serialize all configuration nodes from the choice/case sections into a flat list of properties. + +##### 3.2.1.1.5 Future enhancements + +* Support for additional Data nodes such as RPC, Actions, and notifications(if required). +* Support for RESTCONF query parameters such as depth, filter, etc +* Support for other RESTCONF features such as capabilities. +* Support for HTTPS with X.509v3 Certificates. 
+* Support for a pattern in string, the range for integer types and other OpenAPI header objects defined in https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#header-object +* Other misc OpenAPI related constraints will be added + +##### 3.2.1.2 Swagger generator + +Swagger-codegen tool (github.com/Swagger-api/Swagger-codegen) is used to generate REST server and client code from the OpenAPI definitions. It consumes the OpenAPI definitions generated from YANG files and any other manually written OpenAPI definition files. + +REST server is generated in Go language. Customized Swagger-codegen templates are used to make each server stub invoke a common request handler function. The common request handler will invoke Translib APIs to service the request. + +REST client is generated in Python language. Client applications can generate the REST client in any language using standard Swagger-codegen tool. + +##### 3.2.1.3 YGOT generator + +YGOT generator generates Go binding structures for the management YANG. The generated Go binding structures are consumed by the Translib to validate the incoming payload and help in conversion of Redis data to management YANG specific JSON output payload. + +##### 3.2.1.4 pyang compiler + +Open source pyang tool is used to compile YANG models and generate output file in required format. For example SONiC YANG model is compiled and YIN schema file is generated for validation purpose. Similarly, after compiling YANG model, OpenAPI spec file is generated for generating REST client SDK and server code using Swagger-codegen. + +#### 3.2.2 Run time components + +Following are the run time components in the management framework + +1. CLI +2. REST Client SDK +3. REST server +4. gNMI Client +5. gNMI server +6. Translib +7. Config Validation Library (CVL) +8. Redis DB +9. Non DB data provider + +##### 3.2.2.1 CLI + +Open source Klish is integrated into sonic-mgmt-framework container to provide the command line interface tool to perform more efficient network operations more efficiently in SONiC. Klish will provide the core functionality of command parsing, syntax validation, command help and command auto-completion. The following diagram shows how the CLI commands are built, processed, and executed. + +![CLI components Interaction Diagram](images/cli_interactions.jpg) + +1. CLI command input from user +2. Klish invokes the actioner script +3. Actioner script invokes the Swagger Client SDK API to make a REST API call. +4. Receive response fromSswagger client API and pass it to renderer scripts. +5. Renderer scripts processes the JSON response from REST Client and optionally formats the output using a Jinja template +6. CLI output is rendered to the console. + +###### 3.2.2.1.1 CLI components + +CLI consists of the following components. + +* **Open source Klish** - CLI parser framework to support Command Line Interface Shell +* **XML files** to define CLI command line options and actions + * Klish uses XML to define CLI commands to build the command tree. Klish provides modular CLI tree configurations to build command trees from multiple XML files. XML elements can be defined with macros and entity references, which are then preprocessed by utility scripts to generate the expanded XML files that are ultimately used by Klish. +* **Actioner** - Python scripts defined as a command `action`, to form the request body and invoke the swagger client API +* **Renderer** - Python scripts defined with Jinja templates. 
Receives the JSON response from Swagger API and use the jinja2 template file to render and format CLI output. +* **Preprocess scripts** - Validates XML files and applies some processing from a developer-friendly form into a "raw" form that is compatible with Klish. + +###### 3.2.2.1.2 Preprocessing XML files + +Multiple scripts are executed at build time to preprocess special XML tags/attributes - MACRO substitution, adding pipe processing, etc. - and generate target XML files in a form that is consumable by Klish. + +The XML files are also validated as part of compilation. `xmllint` is used to validate all the processed XML files after macro substitution and pipe processing against the detailed schema defined in `sonic-clish.xsd`. Once the XML files are fully validated and preprocessed, the target XML files are generated in the folder `${workspace}/sonic-mgmt-framework/build/cli/target/command-tree`. + +The following preprocessing scripts are introduced: +* `klish_ins_def_cmd.py` - append the "exit" and "end" commands to the views of the Klish XML files +* `klish_insert_pipe.py` - extend every show and get COMMAND with pipe option +* `klish_platform_features_process.sh` - validate all platform XML files. Generate the entity.XML files. +* `klish_replace_macro.py` – perform macro substitution on the Klish XML files + +###### 3.2.2.1.3 MACROs + +There are some CLI commands that can have the same set of options, where the set of XML tags would need to be repeated in all those CLI commands. Macros can be used to avoid such repetitions and to keep the options in one place, so that it is possible to make a reference to a macro in multiple command definitions. There are cases where we may need to use variations in the values of these macro options. In such cases, it is also possible to pass those XML attributes as an argument to macros and substitute those values inside the macro definition. The macro definition is referred as the ```` and ```` tags. `klish_replace_macro.py` is used to process macros at the compile time to expand the references to the target XML files. +The macro definition files are located in the folder `${workspace}/sonic-mgmt-framework/src/CLI/clitree/macro`. + +Example: + +Before macro substitution: + +```XML + + + + + ... + +``` + +After macro substitution: + +```XML + + + + + + + + + ... + +``` + +###### 3.2.2.1.4 ENTITY +XML files can include an ENTITY that refers to a predefined value. Entity is typically used to define platform specific values and processed by `klish_platform_features_process.sh` to prepend the ENTITY values to the target XML files. By default, there is a default file called `platform_dummy.XML` that defines a platform default ENTITY list. Note that the platform specific is not supported yet. + +Example: `platform_dummy.XML` + +```XML + + START_PORT_ID + MAX_PORT_ID + START_SUB_PORT_ID + MAX_SUB_PORT_ID + MAX_MTU + +``` + +The ENTITY name can be referenced in command definitions. For example, PTYPE for RANGE_MTU, used for interface commands: + +```XML + +``` + +###### 3.2.2.1.5 Actioner scripts + +The Actioner script is used to invoke the swagger client API. The script can be defined in the `` tag and run with shell commands. Klish spawns a sub-shell to interpret the instructions defined in a command's `` tag. +The sub-shell runs the wrapper script `sonic_cli_.py` +``` + sonic_cli_.py [parameters . . .] +``` +The `sonic_cli_.py` has a dispatch function to call a Swagger client method with parameters passed from user input. 
+ +Example: +```XML + + + + + + if test "${direction-switch}" = "in"; then + Python $SONIC_CLI_ROOT/target/sonic-cli.py post_list_base_interfaces_interface ${access-list-name} ACL_IPV4 ${iface} ingress + else + Python $SONIC_CLI_ROOT/target/sonic-cli.py post_list_base_interfaces_interface ${access-list-name} ACL_IPV4 ${iface} egress + fi + + + ... +``` + +###### 3.2.2.1.6 Renderer scripts + +The actioner script receives the JSON output from the swagger client API and invokes the renderer script. The renderer script will send the JSON response to the jinja2 template file to parse the response and generate the CLI output. + +Example: "show acl" + +``` +{\% set acl_sets = acl_out['openconfig_aclacl']['acl_sets']['acl_set'] \%} + {\% for acl_set in acl_sets \%} + Name: {{ acl_set['state']['description'] }} + {\% endfor \%} + NOTE: An extra backslash is added in front of % in the above code snippet. Remove the backslash while using the actual jinja2 code in SONiC. +``` + +###### 3.2.2.1.7 Workflow (to add a new CLI) + +The following steps are to be followed when a new CLI is to be added. +1. Create an XML file that defines CLI command and parameters that the command requires. +2. Define the CLI help string to be displayed and datatype for the parameters. New parameter types (PTYPES), macros, and entities can be defined and used in the XML files. Valid XML tags are defined in the `sonic-clish.xsd` file. +3. Add the shell commands to `` tag to run the wrapper script with the Swagger client method name and parameters +4. Add the code to the wrapper script to construct the payload in `generate_body()` and handle the response +5. For ‘show’ commands, create a Jinja template to format the output + + +##### 3.2.2.2 REST Client SDK + +Framework provides swagger-codegen generated Python client SDK. Developers can generate client SDK code in other programming languages from the OpenAPI definitions as required. + +Client applications can use swagger generated client SDK or any other REST client tool to communicate with the REST server. + +##### 3.2.2.3 gNMI Client + +SONiC Teleletry service provides the gNMI server, while the client must be provided by the user. + +GNMI client developed by JipanYANG.(github.com/jipanYANG/gnxi/gnmi_get, github.com/jipanYANG/gnxi/gnmi_set) +is used for testing. gnmi_get and gnmi_set code has been changed to handle module name. + +Note: Although the GRPC protocol allows for many encodings and models to be used, our usage is restricted to JSON encoding. + +Supported RPC Operations: +------------------------- +- Get: Get one or more paths and have value(s) returned in a GetResponse. +- Set: Update, replace or delete objects + + Update: List of one or more objects to update + + Replace: List of one or objects to replace existing objects, any unspecified fields wil be defaulted. + + Delete: List of one or more object paths to delete +- Capabilities: Return gNMI version and list of supported models +- Subscribe: + + Subscribe to paths using either streaming or poll, or once based subscription, with either full current state or updated values only. + * Once: Get single subscription message. + * Poll: Get one subscription message for each poll request from the client. + * Stream: Get one subscription message for each object update, or at each sample interval if using sample mode. target_defined uses the values pre-configured for that particular object. + + +Example Client Operations: +-------------------------- +Using opensource clients, these are example client operations. 
The .json test payload files are available here: https://github.com/project-arlo/sonic-mgmt-framework/tree/master/src/Translib/test
+
+Get:
+----
+`./gnmi_get -xpath /openconfig-acl:acl/interfaces -target_addr 127.0.0.1:8080 -alsologtostderr -insecure true -pretty`
+
+Set:
+----
+Replace:
+--------
+ `./gnmi_set -replace /openconfig-acl:acl/:@./test/01_create_MyACL1_MyACL2.json -target_addr 127.0.0.1:8080 -alsologtostderr -insecure true -pretty`
+Delete:
+-------
+ `./gnmi_set -delete /openconfig-acl:acl/ -target_addr 127.0.0.1:8080 -insecure`
+
+Subscribe:
+----------
+Streaming sample based:
+-----------------------
+`./gnmi_cli -insecure -logtostderr -address 127.0.0.1:8080 -query_type s -streaming_sample_interval 3000000000 -streaming_type 2 -q /openconfig-acl:acl/ -v 0 -target YANG`
+
+Poll based:
+-----------
+`./gnmi_cli -insecure -logtostderr -address 127.0.0.1:8080 -query_type p -polling_interval 1s -count 5 -q /openconfig-acl:acl/ -v 0 -target YANG`
+
+Once based:
+-----------
+`./gnmi_cli -insecure -logtostderr -address 127.0.0.1:8080 -query_type o -q /openconfig-acl:acl/ -v 0 -target YANG`
+
+##### 3.2.2.4 REST Server
+
+The management REST server is an HTTP server implemented in the Go language.
+It supports the following operations:
+
+* RESTCONF APIs for YANG data
+* REST APIs for manual OpenAPI definitions
+
+###### 3.2.2.4.1 Transport options
+
+REST server supports only HTTPS transport and listens on default port 443.
+The server port can be changed through an entry in the CONFIG_DB REST_SERVER table.
+Details are in the [DB Schema](#3_2_2_4_14-db-schema) section.
+
+HTTPS certificates are managed similarly to the existing gNMI Telemetry program.
+Server key, certificate and CA certificate file paths are configured in the CONFIG_DB
+DEVICE_METADATA table entry. The same certificate is used by both gNMI Telemetry and
+the REST server.
+
+###### 3.2.2.4.2 Translib linking
+
+REST server will statically link with Translib. For each REST request, the server
+invokes a Translib API, which then invokes the appropriate App module to process the request.
+Below is the mapping of HTTP operations to Translib APIs:
+
+ HTTP Method | Translib API     | Request data  | Response data
+-------------|------------------|---------------|---------------
+ GET         | Translib.Get     | path          | status, payload
+ POST        | Translib.Create  | path, payload | status
+ PATCH       | Translib.Update  | path, payload | status
+ PUT         | Translib.Replace | path, payload | status
+ DELETE      | Translib.Delete  | path          | status
+
+More details about Translib APIs are in section [3.2.2.6](#3_2_2_6-Translib).
+
+###### 3.2.2.4.3 Media Types
+
+YANG defined RESTCONF APIs support the **application/yang-data+json** media type.
+Request and response payloads follow the encoding rules defined in [RFC7951](https://tools.ietf.org/html/rfc7951).
+The media type **application/yang-data+xml** is not supported in the first release.
+
+OpenAPI defined REST APIs can use any media type depending on the App module
+implementation. However, content type negotiation is not supported by the REST server.
+A REST API should not be designed to consume or produce multiple content types.
+The OpenAPI definition for each REST API should specify at most one media type in
+its "consumes" and "produces" statements.
+
+###### 3.2.2.4.4 Payload Validations
+
+REST server does not validate request payloads for YANG defined RESTCONF APIs.
+The payload will be validated automatically in lower layers when it gets loaded
+into the YGOT bindings.
+
+For OpenAPI defined REST APIs, the REST server will provide limited payload
+validation. 
JSON request payloads (content type **application/json**) will be +validated against the schema defined in OpenAPI. Response data and non-JSON +request data are not validated by the REST server - this is left to the App module. + +###### 3.2.2.4.5 Concurrency + +REST server will accept concurrent requests. Translib provides appropriate locking mechanism - parallel reads and sequential writes. + +###### 3.2.2.4.6 API Versioning + +REST server will allow clients to specify API version through a custom HTTP header "Accept-Version". However API versioning feature will be supported only in a future release. The server will ignore the version information in current release. + + Accept-Version: 2019-06-20 + Accept-Version: 1.0.3 + +REST server will extract version text from the request header and pass it to the Translib API as metadata. App modules can inspect the version information and act accordingly. + +For YANG defined RESTCONF APIs, the version is the latest YANG revision date. For manual OpenAPI definitions developer can define version text in any appropriate format. + +###### 3.2.2.4.7 RESTCONF Entity-tag + +REST server will support RESTCONF entity-tag and last-modified timestamps in next release. server will not process or send corresponding request, response headers in first release. +Note that entity-tag and last-modified timestamps will be supported only for top level datastore node (/restconf/data). Per resource entity tags and timestamps will not be supported. Global entity tag and timestamp are used for configuration resources. + +###### 3.2.2.4.8 RESTCONF Discovery + +server will support RESTCONF root resource discovery as described in [RFC8040, section 3.1](https://tools.ietf.org/html/rfc8040#page-18). RESTCONF root resource will be "/restconf". + +YANG module library discovery as per [RFC7895](https://tools.ietf.org/html/rfc7895) will be supported in a future release. + +###### 3.2.2.4.9 RESTCONF Query Parameters + +RESTCONF Query Parameters will be supported in future release. All query parameters will be ignored by REST server in this release. + +###### 3.2.2.4.10 RESTCONF Operations + +RESTCONF operations via YANG RPC are not supported in this release. They can be supported in future releases. + +###### 3.2.2.4.11 RESTCONF Notifications + +RESTCONF Notification are not supported by framework. Clients can use gNMI for monitoring and notifications. + +###### 3.2.2.4.12 Authentication + +REST server will support below 3 authentication modes. + +* No authentication +* TLS Certificate authentication +* Username/password authentication + +Only one mode can be active at a time. Administrator can choose the authentication mode through ConfigDB REST_SERVER table entry. See [DB Schema](#3_2_2_4_14-db-schema) section. + +###### 3.2.2.4.12.1 No Authentication + +This is the default mode. REST server will not authenticate the client; all requests will be processed. It should not be used in production. + +###### 3.2.2.4.12.2 Certificate Authentication + +In this mode TLS public certificate of the client will be used to authenticate the client. Administrator will have to pre-provision the CA certificate in ConfigDB DEVICE_METADATA|x509 entry. REST server will accept a request only if the client TLS certificate is signed by that CA. + +###### 3.2.2.4.12.3 User Authentication + +In this mode REST server expects the client to provide user credentials in every request. server will support HTTP Basic Authentication method to accept user credentials. 
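+
+As an illustration of this mode, the sketch below shows a hypothetical external Go client sending a RESTCONF GET with HTTP Basic Authentication over HTTPS. The host, credentials and path are placeholders, and TLS verification is left to the system trust store.
+
+```go
+// Illustrative sketch only: host, credentials and path are placeholders.
+package main
+
+import (
+    "fmt"
+    "io"
+    "net/http"
+)
+
+func main() {
+    req, err := http.NewRequest("GET",
+        "https://sonic-switch.example.com/restconf/data/openconfig-acl:acl", nil)
+    if err != nil {
+        panic(err)
+    }
+    req.SetBasicAuth("admin", "password")                   // verified on the switch via PAM
+    req.Header.Set("Accept", "application/yang-data+json")
+
+    resp, err := http.DefaultClient.Do(req)
+    if err != nil {
+        panic(err)
+    }
+    defer resp.Body.Close()
+    body, _ := io.ReadAll(resp.Body)
+    fmt.Println(resp.Status) // 401 is returned when authentication fails
+    fmt.Println(string(body))
+}
+```
+
+In the certificate authentication mode described above, the client would instead present its TLS certificate via the transport configuration, and no Authorization header would be required.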
+
+REST server will integrate with Linux PAM to authenticate and authorize the user. PAM may internally use the native user database or a TACACS+ server, based on system configuration. REST write requests will be allowed only if the user belongs to the admin group. Only read operations will be allowed for other users.
+
+Performing TACACS+ authentication for every REST request can slow down the APIs. This will be optimized through JSON Web Token (JWT) or a similar mechanism in a future release.
+
+###### 3.2.2.4.13 Error Response
+
+REST server sends back an HTTP client error (4xx) or server error (5xx) status when request processing
+fails. The response status and payload will be as per the RESTCONF specification - [RFC8040, section 7](https://tools.ietf.org/html/rfc8040#page-73).
+Error response data will be a JSON object with the below structure. The response Content-Type will be
+"application/yang-data+json".
+
+    +---- errors
+         +---- error*
+              +---- error-type       "protocol" or "application"
+              +---- error-tag        string
+              +---- error-app-tag?   string
+              +---- error-path?      xpath
+              +---- error-message?   string
+
+Note: REST server will not populate the error-app-tag and error-path fields in this release. It can be
+enhanced in a future release. A sample error response:
+
+    {
+      "ietf-restconf:errors" : {
+        "error" : [
+          {
+            "error-type" : "application",
+            "error-tag" : "invalid-value",
+            "error-message" : "VLAN 100 not found"
+          }
+        ]
+      }
+    }
+
+**error-type** can be either "protocol" or "application", indicating the origin of the error.
+RESTCONF defines two more error-type enums, "transport" and "rpc"; they are not used by the REST server.
+
+**error-tag** indicates the nature of the error as described in [RFC8040, section 7](https://tools.ietf.org/html/rfc8040#page-74).
+
+**error-message** carries a human friendly error message that can be displayed to the end
+user. This is an optional field; system errors do not include an error-message, or have generic
+messages like "Internal error". App module developers should use human friendly messages while
+returning application errors. In case of a CVL constraint violation, the REST server will pick
+the error message from the YANG "error-message" statement of the CVL schema YANG.
+
+The table below lists possible error conditions along with the response status and data returned by the REST server.
+
+ +Method | Error condition | Status | error-type | error-tag | error-message +--------|--------------------------|--------|-------------|------------------|---------------------- +*any* | Incorrect request data | 400 | protocol | invalid-value | +*write* | Bad content-type | 415 | protocol | invalid-value | Unsupported content-type +*write* | OpenAPI schema validation fails | 400 | protocol| invalid-value | Content not as per schema +*write* | YGOT schema validation fails | 400 | protocol| invalid-value | *YGOT returned message* +*any* | Invalid user credentials | 401 | protocol | access-denied | Authentication failed +*write* | User is not an admin | 403 | protocol | access-denied | Authorization failed +*write* | Translib commit failure | 409 | protocol | in-use | +*any* | Unknown HTTP server failure | 500 | protocol | operation-failed | Internal error +*any* | Not supported by App module | 405 | application | operation-not-supported | *App module returned message* +*any* | Incorrect payload | 400 | application | invalid-value | *App module returned message* +*any* | Resource not found | 404 | application | invalid-value | *App module returned message* +POST | Resource exists | 409 | application | resource-denied | *App module returned message* +*any* | Unknown error in Translib | 500 | application | operation-failed | Internal error +*any* | Unknown App module failure | 500 | application | operation-failed | *App module returned message* +*any* | CVL constraint failure | 500 | application | invalid-value | *error-message defined in SONiC YANG* + + +###### 3.2.2.4.14 DB Schema + +A new table "REST_SERVER" will be introduced in ConfigDB for maintaining REST server configurations. Below is the schema for this table. + + key = REST_SERVER:default ; REST server configurations. + ;field = value + port = 1*5DIGIT ; server port - defaults to 443 + client_auth = "none"/"user"/"cert" ; Client authentication mode. + ; none: No authentication, all clients + ; are allowed. Should be used only + ; for debugging. Default. + ; user: Username/password authentication + ; via PAM. + ; cert: Certificate based authentication. + ; Client's public certificate should + ; be registered on this server. + log_level = DIGIT ; Verbosity for glog.V logs + +###### 3.2.2.4.15 API Documentation + +REST server will provide [Swagger UI](https://github.com/swagger-api/swagger-ui) based online +documentation and test UI for all REST APIs it supports. Documentation can be accessed by launching +URL **https://*REST_SERVER_IP*/ui** in a browser. This page will list all supported OpenAPI +definition files (both YANG generated and manual) along with link to open Swagger UI for them. + + +##### 3.2.2.5 gNMI server + +1. gNMI server is part of the telemetry process that supports telemtry as well as gNMI. +2. The gRPC server opens a TCP port and allows only valid mutually authenticated TLS connections, which requires valid Client, server and CA Certificates be installed as well a properly configured DNS. Multiple simultaneous connections are allowed to gNMI server. +3. The gNMI Agent uses the db client, as well as the non-db client to access and modify data directly in the Redis DB. +4. The Translib client is used to provide alternative models of access such as Openconfig models as opposed to the native Redis schema, as long as the Translib supports these models. 
Translib offers bidirectional translation between the native Redis model and the desired north bound model, as well as notifications/updates on these model objects to support telemetry and asynchronous updates, alarms and events. Translib should also provide information about what models it supports so that information can be returned in gNMI Capabilities response. +5. The gNMI server defines the four RPC functions as required by the gNMI Specification: Get, Set, Capabilities and Subscribe. +6. Since the db, non-db and Translib clients offer the functionality to support these functions, gNMI only has to translate the paths and object payloads into the correct parameters for the client calls and package the results back into the response gNMI objects to return to the gNMI Client, which is a straightforward operation, since no additional processing of the data is expected to be done in the gNMI server itself. When new models are added to Translib, no additional work should be required to support them in gNMI server. +7. All operations in a Set request are processed in a single transaction that will either succeed or fail as one operation. The db, non-db and Translib clients must support a Bulk operation in order to achieve the transactional behavior. gNMI server then must use this Bulk operation for Set requests. +8. Subscribe operations: Once, Poll and Stream require that the gRPC connection remain open until the subscription is completed. This means many connections must be supported. Subscribe offers several options, such as only sending object updates (not the whole object) which requires support form the db clients. Subscribe also allows for periodic sampling defined by the client. This must be handled in the gNMI agent itself. This requires a timer for each subscribe connection of this type in order to periodically poll the db client and return the result in a Subscribe Response. These timers should be destroyed when the subscription gRPC connection is closed. + +###### 3.2.2.5.1 Files changed/added: + + |-- gnmi_server + | |-- client_subscribe.go + | |-- server.go ------------------- MODIFIED (Handles creation of transl_data_client for GET/SET/CAPABILITY) + | |-- server_test.go + |-- sonic_data_client + | |-- db_client.go ---------------- MODIFIED (Common interface Stub code for new functions as all data clients implement common interface functions) + | |-- non_db_client.go ------------ MODIFIED (Common interface Stub code for new functions as all data clients implement common interface functions) + | |-- transl_data_client.go ------- ADDED (Specific processing for GET/SET/CAPABILITY for transl data clients) + | |-- trie.go + | |-- virtual_db.go + | + |-- transl_utils -------------------- ADDED + |-- transl_utils.go ------------- ADDED (Layer for invoking Translib API's) + +###### 3.2.2.5.2 Sample Requests + +go run gnmi_get.go -xpath /openconfig-acl:acl/acl-sets/acl-set[name=MyACL4][type=ACL_IPV4]/acl-entries/acl-entry[sequence-id=1] -target_addr 10.130.84.34:8081 -alsologtostderr -insecure true -pretty + +go run gnmi_set.go -replace /openconfig-acl:acl/acl-sets/acl-set[name=MyACL4][type=ACL_IPV4]/acl-entries/acl-entry=2/actions/config:@openconfig.JSON -target_addr 10.130.84.34:8081 -alsologtostderr -insecure true -pretty + +go run gnmi_capabilities.go -target_addr 10.130.84.34:8081 -alsologtostderr -insecure true -pretty + +##### 3.2.2.6 Translib + +Translib is a library that adapts management server requests to SONiC data providers and vice versa. 
Translib exposes the following APIs for the management servers to consume. + + func Create(req SetRequest) (SetResponse, error) + func Update(req SetRequest) (SetResponse, error) + func Replace(req SetRequest) (SetResponse, error) + func Delete(req SetRequest) (SetResponse, error) + func Get(req GetRequest) (GetResponse, error) + func Subscribe(paths []string, q *queue.PriorityQueue, stop chan struct{}) ([]*IsSubscribeResponse, error) + func IsSubscribeSupported(paths []string) ([]*IsSubscribeResponse, error) + func GetModels() ([]ModelData, error) + + Translib Structures: + type ErrSource int + + const( + ProtoErr ErrSource = iota + AppErr + ) + + type SetRequest struct{ + Path string + Payload []byte + } + + type SetResponse struct{ + ErrSrc ErrSource + } + + type GetRequest struct{ + Path string + } + + type GetResponse struct{ + Payload []byte + ErrSrc ErrSource + } + + type SubscribeResponse struct{ + Path string + Payload []byte + Timestamp int64 + SyncComplete bool + IsTerminated bool + } + + type NotificationType int + + const( + Sample NotificationType = iota + OnChange + ) + + type IsSubscribeResponse struct{ + Path string + IsOnChangeSupported bool + MinInterval int + Err error + PreferredType NotificationType + } + + type ModelData struct{ + Name string + Org string + Ver string + } + +Translib has the following sub modules to help in the translation of data + +1. App Interface +2. Translib Request Handlers +3. YGOT request binder +4. DB access layer +5. App Modules +6. Transformer + +###### 3.2.2.6.1 App Interface + +App Interface helps in identifying the App module responsible for servicing the incoming request. It provides the following APIs for the App modules to register themselves with the App interface during the initialization of the app modules. + + func Register(path string, appInfo *AppInfo) error + This method can be used by any App module to register itself with the Translib infra. + Input Parameters: + path - base path of the model that this App module services + appInfo - This contains the reflect types of the App module structure that needs to be instantiated for each request, corresponding YGOT structure reflect type to instantiate the corresponding YGOT structure and boolean indicating if this is native App module to differentiate between OpenAPI spec servicing App module and the YANG serving App module. + Returns: + error - error string + func AddModel(model *gnmi.ModelData) error + This method can be used to register the models that the App module supports with the Translib infra. + Input Parameters: + model - Filled ModelData structure containing the Name, Organisation and version of the model that is being supported. 
+ Returns: + error - error string + + App Interface Structures: + //Structure containing App module information + type AppInfo struct { + AppType reflect.Type + YGOTRootType reflect.Type + IsNative bool + tablesToWatch []*db.TableSpec + } + + Example Usages: + func init () { + log.Info("Init called for ACL module") + err := register("/openconfig-acl:acl", + &appInfo{appType: reflect.TypeOf(AclApp{}), + ygotRootType: reflect.TypeOf(ocbinds.OpenconfigAcl_Acl{}), + isNative: false, + tablesToWatch: []*db.TableSpec{&db.TableSpec{Name: ACL_TABLE}, &db.TableSpec{Name: RULE_TABLE}}}) + + if err != nil { + log.Fatal("Register ACL App module with App Interface failed with error=", err) + } + + err = appinterface.AddModel(&gnmi.ModelData{Name:"openconfig-acl", + Organization:"OpenConfig working group", + Version:"1.0.2"}) + if err != nil { + log.Fatal("Adding model data to appinterface failed with error=", err) + } + } + + type AclApp struct { + path string + YGOTRoot *YGOT.GoStruct + YGOTTarget *interface{} + } + +Translib request handlers use the App interface to get all the App module information depending on the incoming path as part of the requests. + +###### 3.2.2.6.2 Translib Request Handler + +These are the handlers for the APIs exposed by the Translib. Whenever a request lands in the request handler, the handler uses the App interface to get the App module that can process the request based on the incoming path. It then uses the YGOT binder module, if needed, to convert the incoming path and payload from the request into YGOT structures. The filled YGOT structures are given to the App Modules for processing. The Translib also interacts with the DB access layer to start, commit and abort a transaction. + +###### 3.2.2.6.3 YGOT request binder + +The YGOT request binder module uses the YGOT tools to perform the unmarshalling and validation. YGOT (YANG Go Tools) is an open source tool and it has collection of Go utilities which are used to + + 1. Generate a set of Go structures for bindings for the given YANG modules at build time + 2. Unmarshall the given request into the Go structure objects. These objects follows the same hierarchical structure defined in the YANG model, and it's simply a data instance tree of the given request, but represented using the generated Go structures + 3. Validate the contents of the Go structures against the YANG model (e.g., validating range and regular expression constraints). + 4. Render the Go structure objects to an output format - such as JSON. + +This RequestBinder module exposes the below mentioned APIs which will be used to unmarshall the request into Go structure objects, and validate the request + + func getRequestBinder(uri *string, payload *[]byte, opcode int, appRootNodeType *reflect.Type) *requestBinder + This method is used to create the requestBinder object which keeps the given request information such as uri, payload, App module root type, and unmarshall the same into object bindings + Input parameters: + uri - path of the target object in the request. 
+ payload - payload content of given the request and the type is byte array + opcode - type of the operation (CREATE, DELETE, UPDATE, REPLACE) of the given request, and the type is enum + appRootNodeType - pointer to the reflect.Type object of the App module root node's YGOT structure object + Returns: + requestBinder - pointer to the requestBinder object instance + + func (binder *requestBinder) unMarshall() (*YGOT.GoStruct, *interface{}, error) + This method is be used to unmarshall the request into Go structure objects, and validates the request against YANG model schema + Returns: + YGOT.GoStruct - root Go structure object of type Device. + interface{} - pointer to the interface type of the Go structure object instance of the given target path + error - error object to describe the error if the unmarshalling fails, otherwise nil + +Utilities methods: +These utilities methods provides below mentioned common operations on the YGOT structure which are needed by the App module + + func getParentNode(targetUri *string, deviceObj *ocbinds.Device) (*interface{}, *YANG.Entry, error) + This method is used to get parent object of the given target object's uri path + Input parameters: + targetUri - path of the target URI + deviceObj - pointer to the base root object Device + Returns + interface{} - pointer to the parent object of the given target object's URI path + YANG.Entry - pointer to the YANG schema of the parent object + error - error object to describe the error if this methods fails to return the parent object, otherwise nil + + func getNodeName(targetUri *string, deviceObj *ocbinds.Device) (string, error) + This method is used to get the YANG node name of the given target object's uri path. + Input parameters: + targetUri - path of the target URI + deviceObj - pointer to the base root object Device + Returns: + string - YANG node name of the given target object + error - error object to describe the error if this methods fails to return the parent object, otherwise nil + + func getObjectFieldName(targetUri *string, deviceObj *ocbinds.Device, YGOTTarget *interface{}) (string, error) + This method is used to get the go structure object field name of the given target object. + Input parameters: + targetUri - path of the target URI + deviceObj - pointer to the base root object Device + YGOTTarget - pointer to the interface type of the target object. + Returns: + string - object field name of the given target object + error - error object to describe the error if this methods fails to perform the desired operation, otherwise nil + +###### 3.2.2.6.4 DB access layer + +The DB access layer implements a wrapper over the [go-redis](https://github.com/go-redis/redis) package +enhancing the functionality in the following ways: + + * Provide a sonic-py-swsssdk like API in Go + * Enable support for concurrent access via Redis CAS (Check-And-Set) + transactions. + * Invoke the CVL for validation before write operations to the Redis DB + +The APIs are broadly classified into the following areas: + + * Initialization/Close: NewDB(), DeleteDB() + * Read : GetEntry(), GetKeys(), GetTable() + * Write : SetEntry(), CreateEntry(), ModEntry(), DeleteEntry() + * Transactions : StartTx(), CommitTx(), AbortTx() + * Map : GetMap(), GetMapAll() + * Subscriptions : SubscribeDB(), UnsubscribeDB() + +Detail Method Signature: + Please refer to the code for the detailed method signatures. + +Concurrent Access via Redis CAS transactions: + + Upto 4 levels of concurrent write access support. + + 1. 
Table based watch keys (Recommended): + At App module registration, the set of Tables that are to be managed by + the module are provided. External (i.e. non-Management-Framework) + applications may choose to watch and set these same table keys to detect + and prevent concurrent write access. The format of the table key is + "CONFIG_DB_UPDATED_". (Eg: CONFIG_DB_UPDATED_ACL_TABLE) + + 2. Row based watch keys: + For every transaction, the App module provides a list of keys that it + would need exclusive access to for the transaction to succeed. Hence, + this is more complex for the app modules to implement. The external + applications need not be aware of table based keys. However, the + concurrent modification of yet to be created keys (i.e. keys which are + not in the DB, but might be created by a concurrent application) may not + be detected. + + 3. A combination of 1. and 2.: + More complex, but easier concurrent write access detection. + + 4. None: + For applications not needing concurrent write access protections. + + +DB access layer, Redis, CVL Interaction: + + DB access | PySWSSSDK API | RedisDB Call at | CVL Call at + | | at CommitTx | invocation + ----------------|------------------|-------------------|-------------------- + SetEntry(k,v) | set_entry(k,v) | HMSET(fields in v)|If HGETALL=no entry + | | HDEL(fields !in v | ValidateEditConfig + | | but in | (OP_CREATE) + | | previous HGETALL)| + | | |Else + | | | ValidateEditConfig( + | | | OP_UPDATE) + | | | ValidateEditConfig( + | | | DEL_FIELDS) + ----------------|------------------|-------------------|-------------------- + CreateEntry(k,v)| none | HMSET(fields in v)| ValidateEditConfig( + | | | OP_CREATE) + ----------------|------------------|-------------------|-------------------- + ModEntry(k,v) | mod_entry(k,v) | HMSET(fields in v)| ValidateEditConfig( + | | | OP_UPDATE) + ----------------|------------------|-------------------|-------------------- + DeleteEntry(k,v)|set,mod_entry(k,0)| DEL | ValidateEditConfig( + | | | OP_DELETE) + ----------------|------------------|-------------------|-------------------- + DeleteEntryField| none | HDEL(fields) | ValidateEditConfig( + (k,v) | | | DEL_FIELDS) + + +##### 3.2.2.7 Transformer + +Transformer provides the underlying infrastructure for developers to translate data from YANG to ABNF/Redis schema and vice versa, using YANG extensions annotated to YANG paths to provide translation methods. At run time, the YANG extensions are mapped to an in-memory Transformer Spec that provides two-way mapping between YANG and ABNF/Redis schema for Transformer to perform data translation while processing SET/GET operations. + +In case that SONiC YANG modules are used by NBI applications, the Transformer performs 1:1 mapping between a YANG object and a SONiC DB object without a need to write special translation codes. If the openconfig YANGs are used by NBI applications, you may need special handling to translate data between YANG and ABNF schema. In such case, you can annotate YANG extensions and write callbacks to perform translations where required. + +For special handling, a developer needs to provide: +1. An annotation file to define YANG extensions on YANG paths where translation required +e.g. In openconfig-acl.yang, ACL FORWARDING_ACTION "ACCEPT" mapped to "FORWARD", ACL sequence id ‘1’ mapped to ‘RULE_1’ in Redis DB etc. +2. 
Transformer callbacks to perform translation + +###### 3.2.2.7.1 Components + +Transformer consists of the following components and data: +* **Transformer Spec:** a collection of translation hints +* **Spec Builder:** loads YANG and annotation files to dynamically build YANG schema tree and Transformer Spec. +* **Transformer Core:** perform main transformer tasks, i.e. encode/decode YGOT, traverse the payload, lookup Transformer spec, call Transformer methods, construct the results, error reporting etc. +* **Built-in Default Transformer method:** perform static translation +* **Overloaded Transformer methods:** callback functions invoked by Transformer core to perform complex translation with developer supplied translation logic +* **Ouput framer:** aggregate the translated pieces returned from default and overloaded methods to construct the output payload +* **Method overloader:** dynamically lookup and invoke the overloaded transformer methods during data translation +* **YANG schema tree:** provides the Transformer with the schema information that can be accessed by Transformer to get node information, like default values, parent/descendant nodes, etc. + +![Transformer Components](images/transformer_components_v1.png) + +###### 3.2.2.7.2 Design + +Requests from Northbound Interfaces (NBI) are processed by Translib public APIs - Create, Replace, Update, Delete, (CRUD) and Get - that call a specific method on the common app module. The common app calls Transformer APIs to translate the request, then use the translated data to proceed to DB/CVL layer to set or get data in Redis DB. The **common app** as a default application module generically handles both SET (CRUD) and GET, and Subscribe requests with Transformer. Note that a specific app module can be registered to the Translib to handle the requests if needed. + +![Transformer Design](images/transformer_design.PNG) + +At Transformer init, it loads YANG modules pertaining to the applications. Transformer parses YANG modules with the extensions to dynamically build an in-memory schema tree and transformer spec. + +Below structure is defined for the transformer spec: + +```YANG +type yangXpathInfo struct { + yangDataType string + tableName *string + childTable []string + dbEntry *yang.Entry + yangEntry *yang.Entry + keyXpath map[int]*[]string + delim string + fieldName string + xfmrFunc string + xfmrKey string + dbIndex db.DBNum + keyLevel int +} +``` + +When a request lands at the common app in the form of a YGOT structure from the Translib request handler, the request is passed to Transformer that decodes the YGOT structure to read the request payload and look up the spec to get translation hints. Note that the Transformer cannot be used for OpenAPI spec. The Transformer Spec is structured with a two-way mapping to allow Transformer to map YANG-based data to ABNF data and vice-versa via reverse lookup. The reverse mapping is used to populate the data read from Redis DB to YANG structure for the response to get operation. + +Transformer has a built-in default transformer method to perform static, simple translation from YANG to ABNF or vice versa. It performs simple mapping - e.g. a direct name/value mapping, table/key/field name - which can be customized by a YANG extension. + +Additionally, for more complex translations of non-ABNF YANG models, i.e. 
OpenConfig models, Transformer also allows developers to overload the default method by specifying a callback function in YANG extensions, to perform translations with developer-supplied translation code as callback functions. Transformer dynamically invokes those functions instead of using the default method. Each transformer callback must be defined to support two-way translation, i.e. YangToDb_ and DbToYang_, which are invoked by the Transformer core.
+
+###### 3.2.2.7.3 Process
+
+CRUD requests (configuration) are processed via the following steps:
+
+1. App module calls transformer, passing it a YGOT populated Go structure to translate YANG to ABNF
+2. App module calls the CVL API to get the list of dependent tables to watch, and to get the ordered table list
+3. Transformer allocates buffer with 3-dimensional map: `[table-name][key-values][attributes]`
+4. Transformer decodes YGOT structure and traverses the incoming request to get the YANG node name
+5. Transformer looks up the Transformer Spec to check if a translation hint exists for the given path
+6. If no spec or hint is found, the name and value are copied as-is
+7. If a hint is found, check the hint to perform the action, either simple data translation or invoke external callbacks
+8. Repeat steps 4 through 7 until traversal is completed
+9. Invoke any annotated post-Transformer functions
+10. Transformer aggregates the results to return to the App module
+11. App module proceeds to update the DB, ensuring the DB is updated in the order learnt from step 2
+
+GET requests are processed via the following steps:
+1. App module asks the transformer to translate the URL to the keyspec of the query target
+   ```YANG
+   type KeySpec struct {
+       dbNum db.DBNum
+       Ts    db.TableSpec
+       Key   db.Key
+       Child []KeySpec
+   }
+   ```
+2. Transformer proceeds to traverse the DB with the keyspec to get the results
+3. Transformer translates the results from ABNF to YANG, with the default transformer method or callbacks
+4. Transformer aggregates the translated results and returns them to the App module to construct the JSON payload
+
+###### 3.2.2.7.4 Common App
+
+The Common App is a default app that handles the GET/SET/Subscribe requests for SONiC or OpenConfig YANG modules unless an app module is registered to Translib.
+
+Here is a diagram to show how the common app supports SET(CRUD)/GET requests.
+
+![sequence diagram_for_set](images/crud_v1.png)
+
+![sequence_diagram_for_get](images/get_v1.png)
+
+If a request is associated with multiple tables, the common app module processes the DB updates in the table order learned from the CVL layer.
+e.g. if sonic-acl.yang is used by the NBI and the payload of a CREATE operation includes data for both ACL_TABLE and ACL_RULE, the common app updates CONFIG_DB to create the ACL_TABLE instance first, followed by the ACL_RULE entry creation. DELETE operates in the reverse order, i.e. ACL_RULE followed by ACL_TABLE.
+
+###### 3.2.2.7.5 YANG Extensions
+
+The translation hints are defined as YANG extensions to support simple table/field name mapping or more complex data translation by external callbacks.
+
+----------
+
+1. `sonic-ext:table-name [string]`:
+Map a YANG container/list to TABLE name, processed by the default transformer method. Argument is a table name statically mapped to the given YANG container or list node.
+The table-name is inherited to all descendant nodes unless another one is defined.
+
+2. 
`sonic-ext:field-name [string]`: +Map a YANG leafy - leaf or leaf-list - node to FIELD name, processed by the default transformer method + +3. `sonic-ext:key-delimiter [string]`: +Override the default key delimiters used in Redis DB, processed by the default transformer method. +Default delimiters are used by Transformer unless the extension is defined - CONFIG_DB: "|", APPL_DB: ":", ASIC_DB: "|", COUNTERS_DB: ":", FLEX_COUNTER_DB: "|", STATE_DB: "|" + +4. `sonic-ext:key-name [string]`: +Fixed key name, used for YANG container mapped to TABLE with a fixed key, processed by the default transformer method. Used to define a fixed key, mainly for container mapped to TABLE key +e.g. Redis can have a hash “STP|GLOBAL” +```YANG +container global + sonic-ext:table-name “STP” + sonic-ext:key-name “GLOBAL” +``` +5. `sonic-ext:key-transformer [function]`: +Overloading default method with a callback to generate DB keys(s), used when the key values in a YANG list are different from ones in DB TABLE. +A pair of callbacks should be implemented to support 2 way translation - **YangToDB***function*, **DbToYang***function* + +6. `sonic-ext:field-transformer [function]`: +Overloading default method with a callback to generate FIELD value, used when the leaf/leaf-list values defined in a YANG list are different from the field values in DB. +A pair of callbacks should be implemented to support 2 way translation - **YangToDB***function*, **DbToYang***function* + +7. `sonic-ext:subtree-transformer [function]`: +Overloading default method with a callback for the current subtree, allows the sub-tree transformer to take full control of translation. Note that, if any other extensions, e.g. table-name etc., are annotated to the nodes on the subtree, they are not effective. +The subtree-transformer is inherited to all descendant nodes unless another one is defined, i.e. the scope of subtree-transformer callback is limited to the current and descendant nodes along the YANG path until a new subtree transformer is annotated. +A pair of callbacks should be implemented to support 2 way translation - **YangToDB***function*, **DbToYang***function* + +8. `sonic-ext:db-name [string]`: +DB name to access data – “APPL_DB”, “ASIC_DB”, “COUNTERS_DB”, “CONFIG_DB”, “FLEX_COUNTER_DB”, “STATE_DB”. The default db-name is CONFIG_DB, Used for GET operation to non CONFIG_DB, applicable only to SONiC YANG. Processed by Transformer core to traverse database. +The db-name is inherited to all descendant nodes unless another one. Must be defined with the table-name + +9. `sonic-ext:post-transformer [function]`: +A special hook to update the DB requests right before passing to common-app, analogous to the postponed YangToDB subtree callback that is invoked at the very end by the Transformer. +Used to add/update additional data to the maps returned from Transformer before passing to common-app, e.g. add a default acl rule +Note that the post-transformer can be annotated only to the top-level container(s) within each module, and called once for the given node during translation + +10. `sonic-ext:table-transformer [function]`: +Dynamically map a YANG container/list to TABLE name(s), allows the table-transformer to map a YANG list/container to table names. +Used to dynamically map a YANG list/container to table names based on URI and payload. +The table-transformer is inherited to all descendant nodes unless another one is defined + +11. 
`sonic-ext:get-validate [function]`: +A special hook to validate YANG nodes, to populate data read from database, allows developers to instruct Transformer to choose a YANG node among multiple nodes, while constructing the response payload. +Typically used to check the “when” condition to validate YANG node among multiple nodes to choose only valid nodes from sibling nodes. + +---------- + + +Note that the key-transformer, field-transformer and subtree-transformer have a pair of callbacks associated with 2 way translation using a prefix - **YangToDB***function*, **DbToYang***function*. It is not mandatory to implement both functions. E.g. if you need a translation for GET operation only, you can implement only **DbToYang***function*. + +The template annotation file can be generated and used by the developers to define extensions to the yang paths as needed to translate data between YANG and ABNF format. Refer to the 3.2.2.7.8 Utilities. + +Here is the general guide you can check to find which extensions can be annotated in implementing your model. +```YANG +1) If the translation is simple mapping between YANG container/list and TABLE, consider using the extensions - table-name, field-name, optionally key-delimiter +2) If the translation requires a complex translation with your codes, consider the following transformer extensions - key-transformer, field-transformer, subtree-transformer to take a control during translation. Note that multiple subtree-transformers can be annotated along YANG path to divide the scope +3) If multiple tables are mapped to a YANG list, e.g. openconfig-interface.yang, use the table-transformer to dynamically choose tables based on URI/payload +4) In Get operation access to non CONFIG_DB, you can use the db-name extension +5) In Get operation, you can annotate the subtree-transformer on the node to implement your own data access and translation with DbToYangxxx function +6) In case of mapping a container to TABLE/KET, you can use the key-name along with the table-name extension +``` + +###### 3.2.2.7.6 Public Functions + +`XlateToDb()` and `GetAndXlateFromDb` are used by the common app to request translations. + +```go +func XlateToDb(path string, opcode int, d *db.DB, yg *ygot.GoStruct, yt *interface{}) (map[string]map[string]db.Value, error) {} + +func GetAndXlateFromDB(xpath string, uri *ygot.GoStruct, dbs [db.MaxDB]*db.DB) ([]byte, error) {} +``` + +###### 3.2.2.7.7 Overloaded Methods + +The function prototypes for external transformer callbacks are defined in the following- + +```go +type XfmrParams struct { + d *db.DB + dbs [db.MaxDB]*db.DB + curDb db.DBNum + ygRoot *ygot.GoStruct + uri string + oper int + key string + dbDataMap *map[db.DBNum]map[string]map[string]db.Value + param interface{} +} + +/** + * KeyXfmrYangToDb type is defined to use for conversion of Yang key to DB Key + * Transformer function definition. + * Param: XfmrParams structure having Database info, YgotRoot, operation, Xpath + * Return: Database keys to access db entry, error + **/ +type KeyXfmrYangToDb func (inParams XfmrParams) (string, error) +/** + * KeyXfmrDbToYang type is defined to use for conversion of DB key to Yang key + * Transformer function definition. 
+ * Param: XfmrParams structure having Database info, operation, Database keys to access db entry + * Return: multi dimensional map to hold the yang key attributes of complete xpath, error + **/ +type KeyXfmrDbToYang func (inParams XfmrParams) (map[string]interface{}, error) + +/** + * FieldXfmrYangToDb type is defined to use for conversion of yang Field to DB field + * Transformer function definition. + * Param: Database info, YgotRoot, operation, Xpath + * Return: multi dimensional map to hold the DB data, error + **/ +type FieldXfmrYangToDb func (inParams XfmrParams) (map[string]string, error) +/** + * FieldXfmrDbtoYang type is defined to use for conversion of DB field to Yang field + * Transformer function definition. + * Param: XfmrParams structure having Database info, operation, DB data in multidimensional map, output param YgotRoot + * Return: error + **/ +type FieldXfmrDbtoYang func (inParams XfmrParams) (map[string]interface{}, error) + +/** + * SubTreeXfmrYangToDb type is defined to use for handling the yang subtree to DB + * Transformer function definition. + * Param: XfmrParams structure having Database info, YgotRoot, operation, Xpath + * Return: multi dimensional map to hold the DB data, error + **/ +type SubTreeXfmrYangToDb func (inParams XfmrParams) (map[string]map[string]db.Value, error) +/** + * SubTreeXfmrDbToYang type is defined to use for handling the DB to Yang subtree + * Transformer function definition. + * Param : XfmrParams structure having Database pointers, current db, operation, DB data in multidimensional map, output param YgotRoot, uri + * Return : error + **/ +type SubTreeXfmrDbToYang func (inParams XfmrParams) (error) +/** + * ValidateCallpoint is used to validate a YANG node during data translation back to YANG as a response to GET + * Param : XfmrParams structure having Database pointers, current db, operation, DB data in multidimensional map, output param YgotRoot, uri + * Return : bool + **/ +type ValidateCallpoint func (inParams XfmrParams) (bool) + +/** + * PostXfmrFunc type is defined to use for handling any default handling operations required as part of the CREATE + * Transformer function definition. + * Param: XfmrParams structure having database pointers, current db, operation, DB data in multidimensional map, YgotRoot, uri + * Return: multi dimensional map to hold the DB data, error + **/ +type PostXfmrFunc func (inParams XfmrParams) (map[string]map[string]db.Value, error) + +``` + +###### 3.2.2.7.8 Utilities + +The goyang package is extended to generate the template annotation file for any input yang files. A new output format type "annotate" can be used to generate the template annotation file.The goyang usage is as below: + +``` +Usage: goyang [-?] [--format FORMAT] [--ignore-circdep] [--path DIR[,DIR...]] [--trace TRACEFILE] [FORMAT OPTIONS] [SOURCE] [...] + -?, --help display help + --format=FORMAT + format to display: annotate, tree, types + --ignore-circdep + ignore circular dependencies between submodules + --path=DIR[,DIR...] + comma separated list of directories to add to search path + --trace=TRACEFILE + write trace into to TRACEFILE + +Formats: + annotate - generate template file for yang annotations + + tree - display in a tree format + + types - display found types + --types_debug display debug information + --types_verbose + include base information +``` +The $(SONIC_MGMT_FRAMEWORK)/gopkgs/bin is added to the PATH to run the goyang binary. 
+ +For example: + +``` +goyang --format=annotate --path=/path/to/yang/models openconfig-acl.yang > openconfig-acl-annot.yang + +Sample output: +module openconfig-acl-annot { + + yang-version "1" + + namespace "http://openconfig.net/yang/annotation"; + prefix "oc-acl-annot" + + import openconfig-packet-match { prefix oc-pkt-match } + import openconfig-interfaces { prefix oc-if } + import openconfig-yang-types { prefix oc-yang } + import openconfig-extensions { prefix oc-ext } + + deviation oc-acl:openconfig-acl { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:state { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:state/oc-acl:counter-capability { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:acl-sets { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:acl-sets/oc-acl:acl-set { + deviate add { + } + } + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:acl-sets/oc-acl:acl-set/oc-acl:type { + deviate add { + } + } +... +... + deviation oc-acl:openconfig-acl/oc-acl:acl/oc-acl:config { + deviate add { + } + } +} +``` + + +##### 3.2.2.8 Config Validation Library (CVL) + +Config Validation Library (CVL) is an independent library to validate ABNF schema based SONiC (Redis) configuration. This library can be used by component like [Cfg-gen](https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-config-engine/sonic-cfggen), Translib, [ZTP](https://github.com/Azure/SONiC/blob/master/doc/ztp/ztp.md) etc. to validate SONiC configuration data before it is written to Redis DB. Ideally, any component importing config_db.json file into Redis DB can invoke CVL API to validate the configuration. + +CVL uses SONiC YANG models written based on ABNF schema along with various constraints. These native YANG models are simple and have a very close mapping to the associated ABNF schema. Custom YANG extensions (annotations) are used for custom validation purpose. Specific YANG extensions (rather metadata) are used to translate ABNF data to YANG data. In the context of CVL these YANG models are called CVL YANG models and generated from SONiC YANG during build time. Opensource [libyang](https://github.com/CESNET/libyang) library is used to perform YANG data validation. + +SONiC YANG can be used as Northbound YANG for management interface by adding other data definitions such as state data (read only data), RPC or Notification as needed.Such YANG models are called SONiC NBI YANG models. Since CVL validates configuration data only, these data definition statements are ignored by CVL. During build time CVL YANG is actually generated from SONiC NBI YANG models with the help of CVL specific pyang plugin. + +###### 3.2.2.8.1 Architecture + +![CVL architecture](images/CVL_Arch.jpg) + +1. During build time, developer writes SONiC YANG schema based on ABNF schema following [SONiC YANG Guidelines](https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md) and adds metadata and constraints as needed. Custom YANG extensions are defined for this purpose. +2. The YANG models are compiled using Pyang compiler. CVL specific YIN schema files are derived from SONiC YANG with the help of CVL specific pyang plugin. Finally, generated YIN files are packaged in the build. +3. During boot up/initialization sequence, YIN schemas generated from SONiC YANG models are parsed and schema tree is build using libyang API. +4. 
Application calls CVL APIs to validate the configuration. In the case of the Translib library, the DB Access layer calls the appropriate CVL APIs to validate the configuration.
+5. ABNF JSON goes through a translator and YANG data is generated. Metadata embedded in the YANG schema is used to help this translation process.
+6. The YANG data is then fed to libyang for syntax validation first. If an error occurs, CVL returns the appropriate error code and details to the application without proceeding further.
+7. If syntax validation is successful, CVL uses dependent data from the translated YANG data or, if needed, fetches the dependent data from the Redis DB. Dependent data refers to the data needed to validate constraints expressed in YANG syntax such as 'leafref', 'must', 'when' etc.
+8. Finally, the translated YANG data and dependent data are merged and fed to libyang for semantic validation. If an error occurs, CVL returns the appropriate error code and details to the application, else success is returned.
+9. Platform validation is platform specific syntax and semantic validation, performed with the help of dynamic platform data as input.
+
+###### 3.2.2.8.2 Validation types
+
+The Config Validator performs syntactic, semantic and platform validation as per the YIN schema with metadata.
+
+###### 3.2.2.8.2.1 Syntactic Validation
+
+Following are some of the syntactic validations supported by the config validation library:
+
+* Basic data type
+* Enum
+* Ranges
+* Pattern matching
+* Check for mandatory field
+* Check for default field
+* Check for number of keys and their types
+* Check for table size etc.
+
+###### 3.2.2.8.2.2 Semantic Validation
+
+* Check for key reference existence in other tables
+* Check any conditions between fields within the same table
+* Check any conditions between fields across different tables
+
+###### 3.2.2.8.2.3 Platform specific validation
+
+There can be two types of platform constraint validation:
+
+###### 3.2.2.8.2.3.1 Static Platform Constraint Validation
+
+* Platform constraints (range, enum, ‘must’/’when’ expression etc.) are expressed in a SONiC YANG deviation model for each feature.
+
+Example of 'deviation':
+
+	deviation /svlan:sonic-vlan/svlan:VLAN/svlan:name {
+	  deviate replace {
+	    type string {
+	      // Supports 3K VLANs in a specific platform
+	      pattern "Vlan([1-3][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[1-9])";
+	    }
+	  }
+	}
+
+* Deviation models are compiled along with the corresponding SONiC YANG model and new constraints are added or overwritten in the compiled schema.
+
+###### 3.2.2.8.2.3.2 Dynamic Platform Constraint Validation
+
+###### 3.2.2.8.2.3.2.1 Platform data is available in Redis DB table
+
+* Based on Redis DB, platform specific YANG data definitions can be added in SONiC YANG models. Constraints like ‘must’ or ‘when’ are used in the feature YANG by cross-referencing platform YANG models.
+
+###### 3.2.2.8.2.3.2.2 Platform data is available through APIs
+
+* If constraints cannot be expressed using YANG syntax, or platform data is available through a feature/component API (APIs exposed by a feature to query platform specific constants, resource limitations etc.), custom validation needs to be hooked up in the SONiC YANG model through a custom YANG extension.
+* CVL will generate stub code for custom validation. The feature developer populates the stub functions with functional validation code. The validation function should call the feature/component API and fetch the required parameters for checking constraints. 
+* Based on YANG extension syntax, CVL will call the appropriate custom validation function along with the YANG instance data to be validated.
+
+###### 3.2.2.8.3 CVL APIs
+
+	//Structure for key and data in API
+	type CVLEditConfigData struct {
+		VType CVLValidateType //Validation type
+		VOp CVLOperation //Operation type
+		Key string //Key format : "PORT|Ethernet4"
+		Data map[string]string //Value : {"alias": "40GE0/28", "mtu" : 9100, "admin_status": down}
+	}
+
+	/* CVL Error Structure. */
+	type CVLErrorInfo struct {
+		TableName string /* Table having error */
+		ErrCode CVLRetCode /* Error Code describing type of error. */
+		Keys []string /* Keys of the Table having error. */
+		Value string /* Field Value throwing error */
+		Field string /* Field Name throwing error. */
+		Msg string /* Detailed error message. */
+		ConstraintErrMsg string /* Constraint error message. */
+	}
+
+	/* Error code */
+	type CVLRetCode int
+	const (
+		CVL_SUCCESS CVLRetCode = iota
+		CVL_SYNTAX_ERROR /* Generic syntax error */
+		CVL_SEMANTIC_ERROR /* Generic semantic error */
+		CVL_ERROR /* Generic error */
+		CVL_SYNTAX_MISSING_FIELD /* Missing field */
+		CVL_SYNTAX_INVALID_FIELD /* Invalid Field */
+		CVL_SYNTAX_INVALID_INPUT_DATA /* Invalid Input Data */
+		CVL_SYNTAX_MULTIPLE_INSTANCE /* Multiple Field Instances */
+		CVL_SYNTAX_DUPLICATE /* Duplicate Fields */
+		CVL_SYNTAX_ENUM_INVALID /* Invalid enum value */
+		CVL_SYNTAX_ENUM_INVALID_NAME /* Invalid enum name */
+		CVL_SYNTAX_ENUM_WHITESPACE /* Enum name with leading/trailing whitespaces */
+		CVL_SYNTAX_OUT_OF_RANGE /* Value out of range/length/pattern (data) */
+		CVL_SYNTAX_MINIMUM_INVALID /* min-elements constraint not honored */
+		CVL_SYNTAX_MAXIMUM_INVALID /* max-elements constraint not honored */
+		CVL_SEMANTIC_DEPENDENT_DATA_MISSING /* Dependent Data is missing */
+		CVL_SEMANTIC_MANDATORY_DATA_MISSING /* Mandatory Data is missing */
+		CVL_SEMANTIC_KEY_ALREADY_EXIST /* Key already existing. */
+		CVL_SEMANTIC_KEY_NOT_EXIST /* Key is missing. */
+		CVL_SEMANTIC_KEY_DUPLICATE /* Duplicate key. */
+		CVL_SEMANTIC_KEY_INVALID /* Invalid key */
+		CVL_NOT_IMPLEMENTED /* Not implemented */
+		CVL_INTERNAL_UNKNOWN /* Internal unknown error */
+		CVL_FAILURE /* Generic failure */
+	)
+
+1. Initialize() - Initializes the library. It needs to be done only once; subsequent calls have no effect once the library is already initialized. This is called automatically when the ‘cvl’ package is imported.
+2. Finish() - Cleans up the library resources. This should ideally be called when no more validation is needed or the process is about to exit.
+3. ValidateConfig(jsonData string) - Validates a JSON buffer containing multiple row instances of the same table and/or data instances from different tables. All dependencies are provided in the payload. This is useful for bulk data validation.
+4. ValidateEditConfig(cfgData []CVLEditConfigData) - Validates the JSON data for a create/update/delete operation. Syntax and semantic validation can be done separately or together. Related data should be given as dependent data for validation to be successful.
+5. ValidateKey(key string) - Validates the key and checks if it exists in the DB. It checks whether the key value follows the schema format. The key should have the table name as a prefix.
+6. ValidateField(key, field, value string) - Validates the field:value pair in a table. The key should have the table name as a prefix. 
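+
+A minimal usage sketch of these APIs is given below. It is illustrative only: the import path, the `VALIDATE_ALL`/`OP_CREATE` constant names and the `(CVLErrorInfo, CVLRetCode)` return shape are assumptions inferred from the structures above; the `cvl` package sources are the authoritative reference for the exact signatures.
+
+```go
+package main
+
+import (
+    "fmt"
+
+    "cvl" // assumed import path of the CVL package within sonic-mgmt-framework
+)
+
+// validateAclCreate validates a single ACL_TABLE CREATE request with CVL
+// before it would be written to CONFIG_DB.
+func validateAclCreate() error {
+    cfgData := []cvl.CVLEditConfigData{
+        {
+            VType: cvl.VALIDATE_ALL, // assumed constant: syntax + semantic validation
+            VOp:   cvl.OP_CREATE,    // assumed constant: create operation
+            Key:   "ACL_TABLE|MyACL1_ACL_IPV4",
+            Data: map[string]string{
+                "stage": "INGRESS",
+                "type":  "L3",
+            },
+        },
+    }
+
+    // On failure, CVLErrorInfo describes the offending table, keys and constraint.
+    errInfo, ret := cvl.ValidateEditConfig(cfgData)
+    if ret != cvl.CVL_SUCCESS {
+        return fmt.Errorf("CVL validation failed: %s (table %s, keys %v)",
+            errInfo.Msg, errInfo.TableName, errInfo.Keys)
+    }
+    return nil
+}
+```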
+ +##### 3.2.2.9 Redis DB + +Please see [3.2.2.6.4 DB access layer](#3_2_2_6_4-db-access-layer) + +##### 3.2.2.10 Non DB data provider + +Currently, it is up to each App module to perform the proprietary access +mechanism for the app specific configuration. + +## 4 Flow Diagrams + +### 4.1 REST SET flow + +![REST SET flow](images/write.jpg) + +1. REST client can send any of the write commands such as POST, PUT, PATCH or DELETE and it will be handled by the REST Gateway. +2. All handlers in the REST gateway will invoke a command request handler. +3. Authentication and authorization of the commands are done here. +4. Request handler invokes one of the write APIs exposed by the Translib. +5. Translib infra populates the YGOT structure with the payload of the request and performs a syntactic validation +6. Translib acquires the write lock (mutex lock) to avoid another write happening from the same process at the same time. +7. Translib infra gets the App module corresponding to the incoming URI. +8. Translib infra calls the initialize function of the App module with the YGOT structures, path and payload. +9. App module caches the incoming data into the app structure. +10. App module calls Transformer function to translate the request from cached YGOT structures into Redis ABNF format. It also gets all the keys that will be affected as part of this request. +11. App module returns the list of keys that it wants to keep a watch on along with the status. +12. Translib infra invokes the start transaction request exposed by the DB access layer. +13. DB access layer performs a WATCH of all the keys in the Redis DB. If any of these keys are modified externally then the EXEC call in step 26 will fail. +14. Status being returned from Redis. +15. Status being returned from DB access layer. +16. Translib then invokes the processWrite API on the App module. +17. App modules perform writes of the translated data to the DB access layer. +18. DB access layer validates the writes using CVL and then caches them. +19. Status being returned from DB access layer. +20. Status being returned from App module. +21. Translib infra invokes the commit transaction on the DB access layer. +22. DB access layer first invokes MULTI request on the Redis DB indicating there are multiple writes coming in, so commit everything together. All writes succeed or nothing succeeds. +23. Status returned from Redis. +24. Pipeline of all the cached writes are executed from the DB access layer. +25. Status retuned from Redis. +26. EXEC call is made to the Redis DB. Here if the call fails, it indicates that one of the keys that we watched has changed and none of the writes will go into the Redis DB. +27. Status returned from Redis DB. +28. Status returned from DB access layer. +29. Write lock acquired in Step 6 is released. +30. Status returned from the Translib infra. +31. REST Status returned from the Request handler. +32. REST response is sent by the REST gateway to the REST client. + +### 4.2 REST GET flow + +![REST GET flow](images/read.jpg) + +1. REST GET request from the REST client is sent to the REST Gateway. +2. REST Gateway invokes a common request handler. +3. Authentication of the incoming request is performed. +4. Request handler calls the Translib exposed GET API with the uri of the request. +5. Translib infra gets the App module corresponding to the incoming uri. +6. Translib infra calls the initialize function of the App module with the YGOT structures and path. App module caches them. +7. Status retuned from App module. 
+8. App module queries Transformer to translate the path to the Redis keys that need to be queried. +9. Status returned from App module. +10. Translib infra calls the processGet function on the App module +11. App modules calls read APIs exposed by the DB access layer to read data from the Redis DB. +12. Data is read from the Redis DB is returned to the App module +13. App module fills the YGOT structure with the data from the Redis DB and validated the filled YGOT structure for the syntax. +14. App module converts the YGOT structures to JSON format. +15. IETF JSON payload is returned to the Translib infra. +16. IETF JSON payload is returned to the request handler. +17. Response is returned to REST gateway. +18. REST response is returned to the REST client from the REST gateway. + +### 4.3 Translib Initialization flow + +![Translib Initialization flow](images/Init.jpg) + +1. App module 1 `init` is invoked +2. App module 1 calls `Register` function exposed by Translib infra to register itself with the Translib. +3. App module 2 `init` is invoked +4. App module 2 calls `Register` function exposed by Translib infra to register itself with the Translib. +5. App module N `init` is invoked +6. App module N calls `Register` function exposed by Translib infra to register itself with the Translib. + +This way multiple app modules initialize with the Translib infra during boot up. + +### 4.4 gNMI flow + +![GNMI flow](images/GNMI_flow.jpg) + +1. GNMI requests land in their respective GET/SET handlers which then redirect the requests to corresponding data clients. +2. If user does not provide target field then by default the request lands to the transl_data_client. +3. Next, the transl_data_client provides higher level abstraction along with collating the responses for multiple paths. +4. Transl Utils layer invokes Translib API's which in turn invoke App-Module API's and data is retrieved and modified in/from Redis Db/non-DB as required. + +### 4.5 CVL flow + +![CVL flow](images/CVL_flow.jpg) + +Above is the sequence diagram explaining the CVL steps. Note that interaction between DB Access layer and Redis including transactions is not shown here for brevity. + +1. REST/GNMI invokes one of the write APIs exposed by the Translib. +2. Translib infra populates the YGOT structure with the payload of the request and performs a syntactic validation. +3. Translib acquires the write lock (mutex lock) to avoid another write happening from the same process at the same time. +4. Translib infra gets the App module corresponding to the incoming uri. +5. Translib infra calls the initialize function of the App module with the YGOT structures, path and payload. +6. App module calls Transformer to translate the request from cached YGOT structure into Redis ABNF format. It also gets all the keys that will be affected as part of this request. +7. App modules returns the list of keys that it wants to keep a watch on along with the status. +8. Translib infra invokes the start transaction request exposed by the DB access layer. +9. Status being returned from DB access layer. +10. Translib then invokes the processWrite API on the App module. +11. App modules perform writes of the translated data to the DB access layer. +12. DB access layer calls validateWrite for CREATE/UPDATE/DELETE operation. It is called with keys and Redis/ABNF payload. +13. validateSyntax() feeds Redis data to translator internally which produces YANG XML. This is fed to libyang for validating the syntax. +14. 
If it is successful, control goes to the next step, else an error is returned to the DB access layer. The next step is to ensure that keys are present in the Redis DB for an Update/Delete operation, but keys should not be present for a Create operation.
+15. Status is returned after checking keys.
+16. CVL gets dependent data from the incoming Redis payload, for example if ACL_TABLE and ACL_RULE are getting created in a single request.
+17. Otherwise the dependent data should be present in the Redis DB, and a query is sent to Redis to fetch it.
+18. Redis returns the response to the query.
+19. Finally, the request data and dependent data are merged and validateSemantics() is called.
+20. If the above step is successful, success is returned, else failure is returned with error details.
+21. DB Access layer forwards the status response to the App module.
+22. App module forwards the status response to the Translib infra.
+23. Translib infra invokes the commit transaction on the DB access layer.
+24. Status is returned from the DB access layer after performing the commit operation.
+25. Write lock acquired in Step 3 is released.
+26. Final response is returned from the Translib infra to REST/GNMI.
+
+## 5 Developer Work flow
+The developer work flow differs for standard YANG (IETF/OpenConfig) vs proprietary YANG used for a feature. When a standards-based YANG model is chosen for a new feature, the associated Redis DB design should take the design of this model into account - the closer the mapping between these, the less translation logic is required in the Management path. This simplifies the work flow, as translation intelligence can be avoided when both the Redis schema and the NB YANG schema are aligned.
+
+Where the YANG does not map directly to the Redis DB, the management framework provides mechanisms to represent complex translations via developer written custom functions.
+
+SONiC YANG should always be developed for a new feature, and should be purely based on the schema of config objects in the DB.
+
+For the case where a developer prefers to write a non-standard YANG model for a new or existing SONiC feature, the YANG should be written aligned to the Redis schema such that the same YANG can be used for both northbound and CVL. This simplifies the developer work flow (explained in section 5.1).
+
+
+### 5.1 Developer work flow for non Standards-based SONiC YANG
+
+Typical steps for writing a non-standard YANG for the management framework are given below.
+- Write the SONiC YANG based upon the Redis DB design.
+- Add 'must', 'when' expressions to capture the dependency between config objects.
+- Add required non-config, notification and RPC objects to the YANG.
+- Add meta data for the transformer.
+
+#### 5.1.1 Define SONiC YANG schema
+The Redis schema needs to be expressed in a SONiC proprietary YANG model with all data types and constraints. Appropriate custom YANG extensions need to be used for expressing metadata. The YANG model is used by the Config Validation Library (CVL) to provide automatic syntactic and semantic validation, and by Translib for the northbound management interface.
+
+Custom validation code needs to be written if some of the constraints cannot be expressed in YANG syntax.
+
+Please refer to [SONiC YANG Guidelines](https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md) for detailed guidelines on writing SONiC YANG.
+
+#### 5.1.2 Generation of REST server stubs and Client SDKs for YANG based APIs
+
+* Place the main YANG modules under the sonic-mgmt-framework/models/yang directory. 
+ * By placing YANG module in this directory, on the next build, OpenAPI YAML (swagger spec) is generated for the YANG. + * If there is YANG which is augmenting the main YANG module, this augmenting YANG should also be placed in sonic-mgmt-framework/models/yang directory itself. +* Place all dependent YANG modules which are imported into the main YANG module such as submodules or YANGs which define typedefs, etc under sonic-mgmt-framework/models/yang/common directory. + * By placing YANG module in this directory, OpenAPI YAML (swagger spec) is not generated for the YANG modules, but the YANGs placed under sonic-mgmt-framework/models/yang can utilize or refer to types, and other YANG constraints from the YANG modules present in this directory. + * Example: ietf-inet-types.yang which mainly has typedefs used by other YANG models and generally we won't prefer having a YAML for this YANG, this type of YANG files can be placed under sonic-mgmt-framework/models/yang/common. +* Generation of REST server stubs and client SDKs will automatically happen when make command is executed as part of the build. + +#### 5.1.3 Config Translation App (Go language) +Config Translation App consists of two parts - Transformer and App module. They translate the data in Northbound API schema to the native Redis schema (refer to section 5.1.1) and vice versa. All Northbound API services like REST, GNMI, NETCONF invoke this App to read and write data through Translib API calls. + +Key features: + +* Go language. +* YANG to Redis and vice-versa data translation is handled by Transformer. The data translation happens based on the YANG model developed as per section 5.1.1. +* The processing of data is taken care by App module + * App consumes/produces YANG data through [YGOT](https://github.com/openconfig/YGOT) structures. + * Framework provides Go language APIs for Redis DB access. APIs are similar to existing Python APIs defined in sonic-py-swsssdk repo. + * The Transformer converts between these formats. + + * For read operation + * App module receives the YANG path to read in a Go struct + * App module uses the Transformer to get a DB entry(s) reference from this path + * App module reads the Redis entry(s) + * App module uses the Transformer to convert the returned DB object(s) to a Go struct + * App module returns this to TransLib + * For write operations + * App module receives the target YANG path and data in a Go struct + * App module uses the Transformer to translates the Go struct into appropriate Redis calls + * DB Access layer takes care of DB transactions - write everything or none + +* REST server provides a test UI for quick UT of App modules. This UI lists all REST APIs for a YANG and provides option to try them out. +* Pytest automation integration can make use of direct REST calls or CLI (which also makes use of REST APIs internally). Framework generates REST client SDK to facilitate direct REST API calls. + +#### 5.1.4 IS CLI +IS CLI is achieved using KLISH framework. + +* CLI tree is expressed in the XML file with node data types and hierarchy along with different modes. +* Actioner handler needs to be hooked up in XML for corresponding CLI syntax. Actioner handler should be developed to call client SDK APIs (i.e one action handler might need to call multiple client SDK APIs.) +* Show command output formatting is achieved using [Jinja](http://jinja.pocoo.org/) templates. So, the developer needs to check if an existing template can be used or new template needs to be written. 
+ +#### 5.1.5 gNMI +There is no specific steps required for gNMI. + + +### 5.2 Developer work flow for standard-based (e.g. OpenConfig/IETF) YANG + +#### 5.2.1 Identify the standard YANG module for the feature for northbound APIs. +The Developer starts by selecting the standard YANG to be the basis for the Management data model. For SONiC, OpenConfig is the preferred source. However, in the absence of a suitable model there, other standards can be considered (e.g. IETF, IEEE, OEM de-facto standards). + +The SONiC feature implementation may not exactly match the chosen model - there may be parts of the YANG that are not implemented by (or have limitations in) the feature, or the Developer may wish to expose feature objects that are not covered by the YANG. In such cases, the Developer should extend the model accordingly. The preferred method is to write a deviation file and then use modification statements there (e.g. deviate, augment) accordingly. + +#### 5.2.2 Define the Redis schema for the new feature. (not applicable for legacy/existing feature) +The Redis DB design should take the YANG design into account, and try to stay as close to it as possible. This simplifies the translation processes within the Management implementation. Where this is not possible or appropriate, custom translation code must be provided. + + +#### 5.2.3 Define SONiC YANG schema +Redis schema needs to be expressed in SONiC YANG model with all data types and constraints. Appropriate custom YANG extensions need to be used for expressing this metadata. The YANG model is used by Config Validation Library(CVL)to provide automatic syntactic and semantic validation. + +Custom validation code needs to be written if some of the constraints cannot be expressed in YANG syntax. + +Please refer to [SONiC YANG Guidelines](https://github.com/Azure/SONiC/blob/master/doc/mgmt/SONiC_YANG_Model_Guidelines.md) for detiailed guidelines on writing SONiC YANG. + +#### 5.2.4 Generation of REST server stubs and Client SDKs for YANG based APIs + +* Place the main YANG modules under sonic-mgmt-framework/models/yang directory. + * By placing YANG module in this directory, YAML (swagger spec) is generated for the YANG. + * If there is YANG which is augmenting the main YANG module, this augmenting YANG should also be placed in sonic-mgmt-framework/models/yang directory itself. +* Place all dependent YANG modules such as submodules or YANGs which define typedefs, etc under sonic-mgmt-framework/models/yang/common directory. + * By placing YANG module in this directory, YAML (swagger spec) is not generated for the YANG modules, but the YANGs placed under sonic-mgmt-framework/models/yang can utilize or refer to types, and other YANG constraints from the YANG modules present in this directory. + * Example: ietf-inet-types.yang which mainly has typedefs used by other YANG models and generally we won't prefer having a YAML for this YANG, this type of YANG files can be placed under sonic-mgmt-framework/models/yang/common. +* Generation of REST-server stubs and client SDKs will automatically happen when make command is executed as part of the build. + + +#### 5.2.5 Config Translation App (Go language) +Config Translation App consists of two parts - Transformer and App module. They translate the data in Northbound API schema (defined in step#1) to the native Redis schema (defined in step#2) and vice versa. All Northbound API services like REST, GNMI, NETCONF invoke this App to read and write data. + +Key features: + +* Go language. 
+* YANG to Redis and vice-versa data translation is handled by Transformer. In order to facilitate data translation, the developer needs to provide:
+    * The YANG file for the data model
+    * Optionally, a YANG annotation file (refer to [3.2.2.7.8 Utilities](#32278-utilities)) to define translation hints to map YANG objects to DB objects. These translation hints are external callbacks for performing complex translation, whereas simple translations are handled by Transformer's built-in methods. The annotation file is also placed in `sonic-mgmt-framework/models/yang`
+    * Code to define the translation callbacks, in `sonic-mgmt-framework/src/translib/transformer`
+* The processing of data is taken care of by the App module
+    * App consumes/produces YANG data through [YGOT](https://github.com/openconfig/YGOT) structures.
+    * Framework provides Go language APIs for Redis DB access. APIs are similar to existing Python APIs defined in the sonic-py-swsssdk repo.
+    * For read operations
+        * App module receives the YANG path to read in a Go struct
+        * App module uses the Transformer to get a DB entry(s) reference from this path
+        * App module reads the Redis entry(s)
+        * App module uses the Transformer to convert the returned DB object(s) to a Go struct
+        * App module returns this to TransLib
+    * For write operations
+        * App module receives the target YANG path and data as a YGOT tree
+        * App module translates the YGOT tree data into appropriate Redis calls using references from Transformer
+        * App module also handles additional complex logic like transaction ordering or checking for dependencies
+        * Translation Framework takes care of DB transactions - write everything or none
+* REST server provides a test UI for quick UT of the translation app. This UI lists all REST APIs for a YANG and provides an option to try them out. REST server invokes Translation Apps.
+* Spytest automation integration can make use of direct REST calls or CLI (which also makes use of REST internally - step#5). Framework generates a REST client SDK to facilitate direct REST calls.
+
+
+#### 5.2.6 IS CLI
+IS CLI is achieved using the KLISH framework and the steps are the same as for the SONiC YANG model. Please refer to section [5.1.4 IS CLI](#514-IS-CLI).
+
+#### 5.2.7 gNMI
+There are no specific steps required for gNMI.
+
+
+## 6 Error Handling
+
+Validation is done both at the northbound interface and against the database schema. An appropriate error code is returned for invalid configuration.
+All application errors are logged into syslog.
+
+## 7 Serviceability and Debug
+
+1. Detailed syslog messages to help trace a failure.
+2. Debug commands will be added when the debug framework becomes available.
+3. CPU profiling can be enabled/disabled with the SIGUSR1 signal.
+
+## 8 Warm Boot Support
+
+Management Framework does not disrupt data plane traffic during warmboot. No special handling is required for warmboot.
+
+## 9 Scalability
+
+The management framework will be scalable to handle large payloads conforming to the standard/SONiC YANG models.
+
+## 10 Unit Test
+
+#### GNMI
+1. Verify that gnmi_get is working at the top-level module
+2. Verify that gnmi_get is working for each ACL Table
+3. Verify that gnmi_get is working for each ACL Rule
+4. Verify that gnmi_get is working for all ACL interfaces
+5. Verify that gnmi_get is working for each ACL interface name
+6. Verify that gnmi_get fails for non-existent ACL name and type
+7. Verify that the TopLevel node can be deleted
+8. Verify that a particular ACL Table can be deleted
+9. Verify that an ACL rule can be deleted
+10. 
Verify that ACL table can be created +11. Verify that ACL rule can be created +12. Verify that ACL binding can be created +13. Verify that creating rule on non existent ACL gives error +14. Verify that giving invalid interface number is payload gives error. +15. Verify that GNMI capabalities is returning correctly. + +#### Request Binder (YGOT) +1. create a YGOT object binding for the uri ends with container +2. create a YGOT object binding for the uri ends with leaf +3. create a YGOT object binding for the uri ends with list +4. create a YGOT object binding for the uri ends with leaf-list +5. create a YGOT object binding for the uri which has keys +6. create a YGOT object binding for the uri which has keys and ends with list with keys +7. validate the uri which has the correct number of keys +8. validate the uri which has the invalid node name +9. validate the uri which has the invalid key value +10. validate the uri which has the incorrect number of keys +11. validate the uri which has the invalid leaf value +12. validate the payload which has the incorrect number of keys +13. validate the payload which has the invalid node name +14. validate the payload which has the invalid leaf value +15. validate the uri and the payload with the "CREATE" operation +16. validate the uri and the payload with the "UPDATE" operation +17. validate the uri and the payload with the "DELETE" operation +18. validate the uri and the payload with the "REPLACE" operation +19. validate the getNodeName method for LIST node +20. validate the getNodeName method for leaf node +21. validate the getNodeName method for leaf-list node +22. validate the getParentNode method for LIST node +23. validate the getParentNode method for leaf node +24. validate the getParentNode method for leaf-list node +25. validate the getObjectFieldName method for LIST node +26. validate the getObjectFieldName method for leaf node +27. validate the getObjectFieldName method for leaf-list node + +#### DB access layer +1. Create, and close a DB connection. (NewDB(), DeleteDB()) +2. Get an entry (GetEntry()) +3. Set an entry without Transaction (SetEntry()) +4. Delete an entry without Transaction (DeleteEntry()) +5. Get a Table (GetTable()) +6. Set an entry with Transaction (StartTx(), SetEntry(), CommitTx()) +7. Delete an entry with Transaction (StartTx(), DeleteEntry(), CommitTx()) +8. Abort Transaction. (StartTx(), DeleteEntry(), AbortTx()) +9. Get multiple keys (GetKeys()) +10. Delete multiple keys (DeleteKeys()) +11. Delete Table (DeleteTable()) +12. Set an entry with Transaction using WatchKeys Check-And-Set(CAS) +13. Set an entry with Transaction using Table CAS +14. Set an entry with Transaction using WatchKeys, and Table CAS +15. Set an entry with Transaction with empty WatchKeys, and Table CAS +16. Negative Test(NT): Fail a Transaction using WatchKeys CAS +17. NT: Fail a Transaction using Table CAS +18. NT: Abort an Transaction with empty WatchKeys/Table CAS +19. NT: Check V logs, Error logs +20. NT: GetEntry() EntryNotExist. + +#### ACL app (via REST) +1. Verify that if no ACL and Rules configured, top level GET request should return empty response +2. Verify that bulk request for ACLs, multiple Rules within each ACLs and interface bindings are getting created with POST request at top level +3. Verify that all ACLs and Rules and interface bindings are shown with top level GET request +5. Verify that GET returns all Rules for single ACL +6. Verify that GET returns Rules details for single Rule +7. 
Verify that GET returns all interfaces at top level ACL-interfaces
+8. Verify that GET returns one interface binding
+9. Verify that single or multiple new Rule(s) can be added to an existing ACL using a POST/PATCH request
+10. Verify that single or multiple new ACLs can be added using a POST/PATCH request
+11. Verify that single or multiple new interface bindings can be added to an existing ACL using a POST/PATCH request
+12. Verify that a single Rule is deleted from an ACL with a DELETE request
+13. Verify that a single ACL along with all its Rules and bindings is deleted with a DELETE request
+14. Verify that a single interface binding is deleted with a DELETE request
+15. Verify that all ACLs, Rules and interface bindings are deleted with a top level DELETE request
+16. Verify that CVL throws an error if an ACL is created with the same name and type as an existing ACL with a POST request
+17. Verify that CVL throws an error if a RULE is created with the same SeqId, ACL name and type as an existing Rule with a POST request
+18. Verify that GET returns an error for a non-existing ACL or Rule
+19. Verify that CVL returns errors on creating a rule under a non-existent ACL using a POST request
+20. Verify that CVL returns an error on giving an invalid interface number in the payload during binding creation
+
+#### CVL
+1. Check if CVL validation passes when data is given as a JSON file
+2. Check if CVL validation passes for Tables with repeated keys like QUEUE, WRED_PROFILE and SCHEDULER
+3. Check if CVL throws an error when a bad schema is passed
+4. Check if the debug trace level is changed as per the updated conf file on receiving SIGUSR2
+5. Check that the must constraint for DELETE throws a failure if the condition fails (if an acl is bound to a port, deleting the acl rule throws an error due to the must constraint)
+6. Check if CVL validation passes when data has cascaded leafref dependency (Vlan Member->Vlan->Port)
+7. Check if the proper Error Tag is returned when a must condition is not satisfied
+8. Check if CVL validation passes if Redis is loaded with dependent data for UPDATE operation.
+9. Check if a CVL Error is returned when any mandatory node is not provided.
+10. Check if CVL validation passes when the global cache is updated for PORT Table for "must" expressions.
+11. Check if CVL is able to validate JSON data given in a JSON file for VLAN, ACL models
+12. Check if CVL initialization is successful
+13. Check if CVL is able to validate JSON data given in string format for CABLE LENGTH
+14. Check if CVL failure is returned if input JSON data has an incorrect key
+15. Check if CVL is returning CVL_SUCCESS for Create operation if Dependent Data is present in Redis
+16. Check if CVL is returning CVL_FAILURE for Create operation with an invalid field for CABLE_LENGTH.
+17. Check if a CVL Error is returned for any invalid field in a leaf
+18. Check if CVL_SUCCESS is returned for a valid field for ACL_TABLE when data is given in the Test Structure
+19. Check if CVL_SUCCESS is returned for a valid field for ACL_RULE where dependent data is provided in the same session
+20. Check if CVL is returning CVL_FAILURE for Create operation with an invalid enum value
+21. Check if CVL validation fails when an incorrect IP address prefix is provided.
+22. Check if CVL validation fails when an incorrect IP address is provided.
+23. Check if CVL validation fails when out of bound values are provided.
+24. Check if CVL validation fails when an invalid IP protocol is provided.
+25. Check if CVL validation fails when out of range values are provided.
+26. Check if CVL validation fails when an incorrect key name is provided.
+27. 
+#### CVL
+1. Check if CVL validation passes when data is given as a JSON file
+2. Check if CVL validation passes for tables with repeated keys like QUEUE, WRED_PROFILE and SCHEDULER
+3. Check if CVL throws an error when a bad schema is passed
+4. Check if the debug trace level is changed as per the updated conf file on receiving SIGUSR2
+5. Check that the must constraint for DELETE throws a failure if the condition fails (if an ACL is bound to a port, deleting the ACL rule throws an error due to the must constraint)
+6. Check if CVL validation passes when data has a cascaded leafref dependency (Vlan Member -> Vlan -> Port)
+7. Check if the proper error tag is returned when a must condition is not satisfied
+8. Check if CVL validation passes if Redis is loaded with dependent data for the UPDATE operation
+9. Check if a CVL error is returned when any mandatory node is not provided
+10. Check if CVL validation passes when the global cache is updated for the PORT table for "must" expressions
+11. Check if CVL is able to validate JSON data given in a JSON file for the VLAN and ACL models
+12. Check if CVL initialization is successful
+13. Check if CVL is able to validate JSON data given in string format for CABLE_LENGTH
+14. Check if a CVL failure is returned if the input JSON data has an incorrect key
+15. Check if CVL returns CVL_SUCCESS for the Create operation if dependent data is present in Redis
+16. Check if CVL returns CVL_FAILURE for the Create operation with an invalid field for CABLE_LENGTH
+17. Check if a CVL error is returned for any invalid leaf field
+18. Check if CVL_SUCCESS is returned for a valid field for ACL_TABLE when data is given in the test structure
+19. Check if CVL_SUCCESS is returned for a valid field for ACL_RULE where dependent data is provided in the same session
+20. Check if CVL returns CVL_FAILURE for the Create operation with an invalid enum value
+21. Check if CVL validation fails when an incorrect IP address prefix is provided
+22. Check if CVL validation fails when an incorrect IP address is provided
+23. Check if CVL validation fails when out-of-bound values are provided
+24. Check if CVL validation fails when an invalid IP protocol is provided
+25. Check if CVL validation fails when out-of-range values are provided
+26. Check if CVL validation fails when an incorrect key name is provided
+27. Check if CVL validation passes if any allowed special character is in the list name
+28. Check if CVL validation fails when key names contain junk characters
+29. Check if CVL validation fails when an additional extra node is provided
+30. Check if CVL validation passes when JSON data is given as a buffer for DEVICE_METADATA
+31. Check if CVL validation fails when the key name does not contain separators
+32. Check if CVL validation fails when one of the keys is missing for the Create operation
+33. Check if CVL validation fails when there are no keys between separators for the Create operation
+34. Check if CVL validation fails when dependent data is missing for the Update operation in the same transaction
+35. Check if CVL validation fails when dependent data is missing for the Create operation in the same transaction
+36. Check if CVL validation fails when there are no keys between separators for the DELETE operation
+37. Check if CVL validation fails when there are no keys between separators for the UPDATE operation
+38. Check if CVL validation fails when invalid key separators are provided for the Delete operation
+39. Check if CVL validation fails if the UPDATE operation is given with an invalid enum value
+40. Check if CVL validation fails if the UPDATE operation is given with an invalid key containing missing keys
+41. Check if CVL validation passes with dependent data present in the same transaction for the DELETE operation
+42. Check if CVL validation fails if the DELETE operation is given with a missing key
+43. Check if CVL validation fails if the UPDATE operation is given with a missing key
+44. Check if CVL validation fails when an existing key is provided in the CREATE operation
+45. Check if CVL validation passes for the INTERFACE table
+46. Check if CVL validation fails when configuration that does not satisfy a must constraint is provided
+47. Check if CVL validation passes when Redis has valid dependent data for the UPDATE operation
+48. Check if CVL validation fails when two different sequences are passed (Create and Update in the same transaction)
+49. Check if CVL validation fails for the UPDATE operation when Redis does not have the dependent data
+50. Check if CVL validation passes with valid dependent data given for the CREATE operation
+51. Check if CVL validation fails when the user tries to delete a non-existent key
+52. Check if CVL validation passes if the cache contains dependent data populated in the same session but in a separate transaction
+53. Check if CVL validation passes if the cache contains dependent data that is populated across sessions
+54. Check if CVL validation fails when incorrect dependent data is provided for the CREATE operation
+55. Check if CVL validation passes when valid dependent data is provided for the CREATE operation
+56. Check if the proper error tag is returned when a must condition is not satisfied in "range"
+57. Check if the proper error tag is returned when a must condition is not satisfied in "length"
+58. Check if the proper error tag is returned when a must condition is not satisfied in "pattern"
+59. Check if DELETE fails when the ACL table is tied to a Rule, or when DELETE tries to delete a table with a non-empty leafref
+60. Check if validation fails when non-existent dependent data is provided
+61. Check if CVL validation fails when DELETE tries to delete the leafref of another table (delete an ACL table referenced by an ACL rule)
+62. Check if CVL validation fails when unrelated chained dependent data is given
+63. Check if CVL validation fails when the VLAN range is out of bounds and a proper error message is returned
+64. Check if logs are printed as per the configuration in the log configuration file
+65. Check if the DELETE operation is performed on a single field
+66. Check if CVL validation passes when valid dependent data is provided using a JSON file
+67. Check if CVL validation passes when delete is performed on a table and then on the connected leafref
+68. Check if CVL validation passes when JSON data is given in file format
+69. Check if the CVL Finish operation is successful
+70. Check if CVL validation passes when an entry is deleted and created in the same transaction
+71. Check if CVL validation passes when two UPDATE operations are given
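+
+A minimal Go sketch of how one of the CVL cases above might be driven is shown below. The cvl package entry points, struct fields and return handling are assumptions made for illustration; only the operation names and the CVL_SUCCESS/CVL_FAILURE codes come from the test list above.
+
+```
+// Illustrative only: the cvl API names and signatures below are assumptions.
+package cvl_test
+
+import (
+    "testing"
+
+    cvl "cvl" // hypothetical import path
+)
+
+func TestCreateAclTable(t *testing.T) {
+    c, status := cvl.ValidationSessOpen() // CVL initialization (case 12)
+    if status != cvl.CVL_SUCCESS {
+        t.Fatalf("CVL init failed: %v", status)
+    }
+    defer cvl.ValidationSessClose(c) // CVL finish (case 69)
+
+    cfg := []cvl.CVLEditConfigData{{
+        VType: cvl.VALIDATE_ALL,
+        VOp:   cvl.OP_CREATE,
+        Key:   "ACL_TABLE|MyACL1",
+        Data:  map[string]string{"stage": "INGRESS", "type": "L3"},
+    }}
+
+    // Valid fields and dependent data should yield CVL_SUCCESS (cases 15, 18);
+    // an invalid enum value should yield CVL_FAILURE instead (case 20).
+    if status := c.ValidateEditConfig(cfg); status != cvl.CVL_SUCCESS {
+        t.Fatalf("validation failed: %v", status)
+    }
+}
+```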
+
+## 11 Appendix A
+
+Following is the list of open source tools used in the Management framework:
+
+1. [Gorilla/mux](https://github.com/gorilla/mux)
+2. [Go datastructures](https://github.com/Workiva/go-datastructures/tree/master/queue)
+3. [Swagger](https://swagger.io)
+4. [gNMI client](https://github.com/jipanyang/gnxi)
+5. [goyang](https://github.com/openconfig/goyang)
+6. [YGOT](https://github.com/openconfig/ygot/ygot)
+7. [GO Redis](https://github.com/go-redis/redis)
+8. [Logging](https://github.com/google/glog)
+9. [Profiling](https://github.com/pkg/profile)
+10. [Validation](https://gopkg.in/go-playground/validator.v9)
+11. [JSON query](https://github.com/antchfx/jsonquery)
+12. [XML query](https://github.com/antchfx/xmlquery)
+13. [Sorting](https://github.com/facette/natsort)
+14. [pyangbind](https://github.com/robshakir/pyangbind)
+15. [libYang](https://github.com/CESNET/libyang)
+
+
+## 12 Appendix B
+
+Following is the list of open source libraries used in the telemetry container.
+Always refer to the [Makefile](https://github.com/Azure/sonic-telemetry/blob/master/Makefile) of the sonic-telemetry container for the current package list.
+
+
+1. [GRPC](https://google.golang.org/grpc)
+2. [GNMI](https://github.com/openconfig/gnmi/proto/gnmi)
+3. [Protobuf](https://github.com/golang/protobuf/proto)
+4. [goyang](https://github.com/openconfig/goyang)
+5. [GO Redis](https://github.com/go-redis/redis)
+6. [YGOT](https://github.com/openconfig/ygot/ygot)
+7. [Logging](https://github.com/google/glog)
+8. [GO Context](https://golang.org/x/net/context)
+9. [Credentials](https://google.golang.org/grpc/credentials)
+10. [Validation](https://gopkg.in/go-playground/validator.v9)
+11. [GNXI utils](https://github.com/google/gnxi/utils)
+12. [Gorilla/mux](https://github.com/gorilla/mux)
+13. [jipanyang/xpath](https://github.com/jipanyang/gnxi/utils/xpath)
+14. [c9s/procinfo](https://github.com/c9s/goprocinfo/linux)
+15. [Workiva/Queue](https://github.com/Workiva/go-datastructures/queue)
+16. [jipanyang/gnmi client](https://github.com/jipanyang/gnmi/client/gnmi)
+17. 
[xeipuuv/gojsonschema](https://github.com/xeipuuv/gojsonschema) diff --git a/doc/mgmt/SONiC_Management_Proposal_v1.3.pptx b/doc/mgmt/SONiC_Management_Proposal_v1.3.pptx new file mode 100644 index 0000000000..cad2029868 Binary files /dev/null and b/doc/mgmt/SONiC_Management_Proposal_v1.3.pptx differ diff --git a/doc/mgmt/SONiC_YANG_Model_Guidelines.md b/doc/mgmt/SONiC_YANG_Model_Guidelines.md new file mode 100644 index 0000000000..8f7057ec0f --- /dev/null +++ b/doc/mgmt/SONiC_YANG_Model_Guidelines.md @@ -0,0 +1,868 @@ + +# SONiC YANG MODEL GUIDELINES + + +## Revision + + | Rev | Date | Author | Change Description | + |:---:|:-----------:|:------------------:|-----------------------------------| + | 1.0 | 22 Aug 2019 | Praveen Chaudhary | Initial version | + | 1.0 | 11 Sep 2019 | Partha Dutta | Adding additional steps for SONiC YANG | + +## References +| References | Date/Version | Link | +|:-------------------------:|:-------------------:|:-----------------------------------:| +| RFC 7950 | August 2016 | https://tools.ietf.org/html/rfc7950 | +| Management Framework | 0.9 | https://github.com/Azure/SONiC/pull/436 | + +## Terminology and Acronyms +| Acronyms | Description/Expansion | +|:-------------------------:|:-------------------:| +| ABNF | Augmented Backus-Naur Form | +| XPath | XML Path Language | +| CVL | Configuration Validation Library | + + +## Overview + This document lists the guidelines, which will be used to write YANG Modules for SONiC. These YANG Modules (called SONiC YANG models) will be primarily based on or represent the ABNF.json of SONiC, and the syntax of YANG models must follow RFC 7950 ([https://tools.ietf.org/html/rfc7950](https://tools.ietf.org/html/rfc7950)). Details of config in format of ABNF.json can be found at https://github.com/Azure/SONiC/wiki/Configuration. + +These YANG models will be used to verify the configuration for SONiC switches, so a library which supports validation of configuration on SONiC Switch must use these YANG Models. List of such Libraries are: 1.) Configuration Validation Library. (CVL). YANG models, which are written using these guidelines can also be used as User End YANG Models, i.e North Bound configuration tools or CLI can provide config data in sync with these YANG models. For example [SONiC Management Framework](https://github.com/Azure/SONiC/pull/436) uses SONiC YANG models as Northbound Management YANG and for configuration validation purpose also. + +## Guidelines + + +### 1. All schema definitions related to a feature should be written in a single YANG model file. YANG model file is named as 'sonic-{feature}.yang', e.g. for ACL feature sonic-acl.yang is the file name. It is best to categorize YANG Modules based on a networking components. For example, it is good to have separate modules for VLAN, ACL, PORT and IP-ADDRESSES etc. + +``` +sonic-acl.yang +sonic-interface.yang +sonic-port.yang +sonic-vlan.yang +``` + +### 2. It is mandatory to define a top level YANG container named as 'sonic-{feature}' i.e. same as YANG model name. For example, sonic-acl.yang should have 'sonic-acl' as top level container. All other definition should be written inside the top level container. + +Example : +#### YANG +``` +module sonic-acl { + container sonic-acl { + ..... + ..... + } +} +``` + +### 3. Define namespace as "http://github.com/Azure/{model-name}". + +Example : +#### YANG +``` +module sonic-acl { + namespace "http://github.com/Azure/sonic-acl"; + ..... + ..... +} +``` + +### 4. 
Use 'revision' to record revision history whenever YANG model is changed. Updating revision with appropriate details helps tracking the changes in the model. + +Example : + +#### YANG +``` +module sonic-acl { + revision 2019-09-02 { + description + "Added must expression for ACL_RULE_LIST."; // Updated with new change details + } + + revision 2019-09-01 { + description + "Initial revision."; + } + ..... + ..... +} +``` + +### 5. Each primary section of ABNF.json (i.e a dictionary in ABNF.json) for example, VLAN, VLAN_MEMBER, INTERFACE in ABNF.json will be mapped to a container in YANG model. + +Example: Table VLAN will translate to container VLAN. + +#### ABNF + +``` +"VLAN": { + "Vlan100": { + "vlanid": "100" + } + } +``` +will translate to: + +#### YANG +-- +``` +container VLAN { //"VLAN" mapped to a container + list VLAN_LIST { + key name; + leaf vlanid { + type uint16; + } + } +} +``` + + +### 6. Each leaf in YANG module should have same name (including their case) as corresponding key-fields in ABNF.json. + +Example: +Leaf names are same PACKET_ACTION, IP_TYPE and PRIORITY, which are defined in ABNF. + +#### ABNF +``` + "NO-NSW-.....|.....": { + "PACKET_ACTION": "FORWARD", + "IP_TYPE": "IPv6ANY", + "PRIORITY": "955520", + ..... + ..... + }, +``` + +#### YANG +``` + leaf PACKET_ACTION { + ..... + } + leaf IP_TYPE { + ..... + } + leaf PRIORITY { + ..... + } +``` + +### 7. Use IETF data types for leaf type first if applicable (RFC 6021) . Declare new type (say SONiC types) only if IETF type is not applicable. All SONiC types must be part of same header type or common YANG model. +Example: + +#### YANG +``` + leaf SRC_IP { + type inet:ipv4-prefix; <<<< + } + + leaf DST_IP { + type inet:ipv4-prefix; + } +``` + +### 8. Data Node/Object Hierarchy of the an objects in YANG models will be same as for all the fields at same hierarchy in Config DB. If any exception is created then it must be recorded properly with comment under object level in YANG models. To see an example of a comment, please refer to the step with heading as "Comment all must, when and patterns conditions." below. + +For Example: + +"Family" of VLAN_INTERFACE and "IP_TYPE" of ACL_RULE should be at same level in YANG model too. + +#### ABNF +``` +"VLAN_INTERFACE": { + "Vlan100|2a04:f547:45:6709::1/64": { + "scope": "global", + "family": "IPv6" + } +} +"ACL_RULE": { + "NO-NSW-PACL-V4|DEFAULT_DENY": { + "PACKET_ACTION": "DROP", + "IP_TYPE": "IPv4ANY", + "PRIORITY": "0" + } +} +``` +#### YANG +In YANG, "Family" of VLAN_INTERFACE and "IP_TYPE" of ACL_RULE is at same level. +``` +container VLAN_INTERFACE { + description "VLAN_INTERFACE part of config_db.json"; + list VLAN_INTERFACE_LIST { + ...... + ...... + leaf family { + type sonic-head:ip-family; + } + } +} + +container ACL_RULE { + description "ACL_RULE part of config_db.json"; + list ACL_RULE_LIST { + ...... + ...... + leaf IP_TYPE { + type sonic-head:ip_type; + } + } +} +``` + +### 9. If an object is part of primary-key in ABNF.json, then it should be a key in YANG model. In YANG models, a primary-key from ABNF.json can be represented either as name of a Container object or as a key field in List object. Exception must be recorded in YANG Model with a comment in object field. To see an example of a comment, please refer to the step with heading as "Comment all must, when and patterns conditions." below. Though, key names are not stored in Redis DB, use the same key name as defined in ABNF schema. 
+ +Example: VLAN_MEMBER dictionary in ABNF.json has both vlan-id and ifname part of the key. So YANG model should have the same keys. + +#### ABNF +``` + "VLAN_MEMBER": { + "Vlan100|Ethernet0": { //<<<< KEYS + "tagging_mode": "untagged" + } + } +``` +;Defines interfaces which are members of a vlan +key = VLAN_MEMBER_TABLE:"Vlan"vlanid:ifname ; + +#### YANG +``` +container VLAN_MEMBER { + description "VLAN_MEMBER part of config_db.json"; + list ..... { + key "vlan-name ifname";//<<<< KEYS + } +} +``` + +### 10. If any key used in current table refers to other table, use leafref type for the key leaf definition. + +Example : +#### ABNF + +``` +key: ACL_RULE_TABLE:table_name:rule_name +..... +..... +``` +#### YANG +``` +..... +container ACL_TABLE { + list ACL_TABLE_LIST { + key table_name; + + leaf table_name { + type string; + } + ..... + ..... + } +} +container ACL_RULE { + list ACL_RULE_LIST { + key "table_name rule_name"; + + leaf table_name { + type leafref { + path "../../../ACL_TABLE/ACL_TABLE_LIST/aclname"; //Refers to other table 'ACL_TABLE' + } + } + + leaf rule_name { + type string; + } + ..... + ..... + } +} +``` + +### 11. Mapping tables in Redis are defined using nested 'list'. Use 'sonic-ext:map-list "true";' to indicate that the 'list' is used for mapping table. The outer 'list' is used for multiple instances of mapping. The inner 'list' is used for mapping entries for each outer list instance. + +Example : + +#### ABNF +``` +; TC to queue map +;SAI mapping - qos_map with SAI_QOS_MAP_ATTR_TYPE == SAI_QOS_MAP_TC_TO_QUEUE. See saiqosmaps.h +key = "TC_TO_QUEUE_MAP_TABLE:"name ;field +tc_num = 1*DIGIT ;values +queue = 1*DIGIT; queue index + +``` + +#### YANG +``` + container TC_TO_QUEUE_MAP { + list TC_TO_QUEUE_MAP_LIST { + key "name"; + sonic-ext:map-list "true"; //special annotation for map table + + leaf name { + type string; + } + + list TC_TO_QUEUE_MAP_LIST { //this is list inside list for storing mapping between two fields + key "tc_num qindex"; + leaf tc_num { + type string { + pattern "[0-9]?"; + } + } + leaf qindex { + type string { + pattern "[0-9]?"; + } + } + } + } + } +``` + + +### 12. 'ref_hash_key_reference' in ABNF schema is defined using 'leafref' to the referred table. + +Example : + +#### ABNF +``` +; QUEUE table. Defines port queue. +; SAI mapping - port queue. + +key = "QUEUE_TABLE:"port_name":queue_index +queue_index = 1*DIGIT +port_name = ifName +queue_reference = ref_hash_key_reference ;field value +scheduler = ref_hash_key_reference; reference to scheduler key +wred_profile = ref_hash_key_reference; reference to wred profile key + +``` + +#### YANG +``` +container sonic-queue { + container QUEUE { + list QUEUE_LIST { + ..... + leaf scheduler { + type leafref { + path "/sch:sonic-scheduler/sch:SCHEDULER/sch:name"; //Reference to SCHEDULER table + } + } + + leaf wred_profile { + type leafref { + path "/wrd:sonic-wred-profile/wrd:WRED_PROFILE/wrd:name"; // Reference to WRED_PROFILE table + } + } + } + } +} +``` + +### 13. To establish complex relationship and constraints among multiple tables use 'must' expression. Define appropriate error message for reporting to Northbound when condition is not met. For existing feature, code logic could be reference point for deriving 'must' expression. +Example: + +#### YANG +``` + must "(/sonic-ext:operation/sonic-ext:operation != 'DELETE') or " + + "count(../../ACL_TABLE[aclname=current()]/ports) = 0" { + error-message "Ports are already bound to this rule."; + } +``` + +### 14. 
Define appropriate 'error-app-tag' and 'error' messages for in 'length', 'pattern', 'range' and 'must' statement so that management application can use it for error reporting. + +Example: + +#### YANG +``` +module sonic-vlan { + .... + .... + leaf vlanid { + mandatory true; + type uint16 { + range "1..4095" { + error-message "Vlan ID out of range"; + error-app-tag vlanid-invalid; + } + } + } + .... + .... +} + +``` + +### 15. All must, when, pattern and enumeration constraints can be derived from .h files or from code. If code has the possibility to have unknown behavior with some config, then we should put a constraint in YANG models objects. Also, Developer can put any additional constraint to stop invalid configuration. For new features, constraints may be derived based on low-level design document. + +For Example: Enumeration of IP_TYPE comes for aclorch.h +``` +#define IP_TYPE_ANY "ANY" +#define IP_TYPE_IP "IP" +#define IP_TYPE_NON_IP "NON_IP" +#define IP_TYPE_IPv4ANY "IPV4ANY" +#define IP_TYPE_NON_IPv4 "NON_IPv4" +#define IP_TYPE_IPv6ANY "IPV6ANY" +#define IP_TYPE_NON_IPv6 "NON_IPv6" +#define IP_TYPE_ARP "ARP" +#define IP_TYPE_ARP_REQUEST "ARP_REQUEST" +#define IP_TYPE_ARP_REPLY "ARP_REPLY" +``` +Example of When Statement: Orchagent of SONiC will have unknown behavior if below config is entered, So YANG must have a constraint. Here SRC_IP is IPv4, where as IP_TYPE is IPv6. + +#### ABNF: +``` + "ACL_RULE": { + "NO-NSW-PACL-V4|Rule_20": { + "PACKET_ACTION": "FORWARD", + "DST_IP": "10.186.72.0/26", + "SRC_IP": "10.176.0.0/15", + "PRIORITY": "999980", + "IP_TYPE": "IPv6" + }, + +``` +#### YANG: +``` +choice ip_prefix { + case ip4_prefix { + when "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV4' or .='IPV4ANY' or .='ARP'])"; + leaf SRC_IP { + type inet:ipv4-prefix; + } + leaf DST_IP { + type inet:ipv4-prefix; + } + } + case ip6_prefix { + when "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV6' or .='IPV6ANY'])"; + leaf SRC_IPV6 { + type inet:ipv6-prefix; + } + leaf DST_IPV6 { + type inet:ipv6-prefix; + } + } + } +``` +Example of Pattern: If PORT Range should be "<0-65365> - <0-65365>" +``` +leaf L4_DST_PORT_RANGE { + type string { + pattern '([0-9]{1,4}|[0-5][0-9]{4}|[6][0-4][0-9]{3}|[6][5][0-2][0-9]{2}|[6][5][3][0-5]{2}|[6][5][3][6][0-5])-([0-9]{1,4}|[0-5][0-9]{4}|[6][0-4][0-9]{3}|[6][5][0-2][0-9]{2}|[6][5][3][0-5]{2}|[6][5][3][6][0-5])'; + } +} +``` +### 16. Comment all must, when and patterns conditions. See example of comment below. +Example: + +#### YANG +``` +leaf family { + /* family leaf needed for backward compatibility + Both ip4 and ip6 address are string in IETF RFC 6020, + so must statement can check based on : or ., family + should be IPv4 or IPv6 according. + */ + must "(contains(../ip-prefix, ':') and current()='IPv6') or + (contains(../ip-prefix, '.') and current()='IPv4')"; + type sonic-head:ip-family; + } +``` + + +### 17. If a List object is needed in YANG model to bundle multiple entries from a Table in ABNF.json, but this LIST is not a valid entry in config data, then we must define such list as _LIST . + +For Example: Below entries in PORTCHANNEL_INTERFACE Table must be part of List Object in YANG model, because variable number of entries may be present in config data. But there is no explicit list in config data. To support this, a list object with name PORTCHANNEL_INTERFACE_LIST should be added in YANG model. 
+#### ABNF: +``` +"PORTCHANNEL_INTERFACE": { + "PortChannel01|10.0.0.56/31": {}, + "PortChannel01|FC00::71/126": {}, + "PortChannel02|10.0.0.58/31": {}, + "PortChannel02|FC00::75/126": {} + ... + } +``` + +#### YANG +``` +container PORTCHANNEL_INTERFACE { + + description "PORTCHANNEL_INTERFACE part of config_db.json"; + + list PORTCHANNEL_INTERFACE_LIST {<<<<< + ..... + ..... + } +} +``` + +### 18. In some cases it may be required to split an ABNF table into multiple YANG lists based on the data stored in the ABNF table. + +Example : "INTERFACE" table stores VRF names to which an interface belongs, also it stores IP address of each interface. Hence it is needed to split them into two different YANG lists. + +#### ABNF +``` +"INTERFACE" : { + "Ethernet1" : { + "vrf-name": "vrf1" + } + "Ethernet1|10.184.230.211/31": { + } +} +``` +#### YANG +``` +...... +container sonic-interface { + container INTERFACE { + list INTERFACE_LIST { // 1st list + key ifname; + + leaf ifname { + type leafref { + ...... + } + } + leaf vrf-name { + type leafref { + ...... + } + } + ...... + } + + list INTERFACE_IPADDR_LIST { //2nd list + key ifname, ip_addr; + + leaf ifname { + type leafref { + ...... + } + } + leaf ip_addr { + type inet:ipv4-prefix; + } + ...... + } + } +} +...... +``` + +### 19. Add read-only nodes for state data using 'config false' statement. Define a separate top level container for state data. If state data is defined in other DB than CONFIG_DB, use extension 'sonic-ext:db-name' for defining the table present in other Redis DB. The default separator used in table key is "|", if it is different, use 'sonic-ext:key-delim {separator};' YANG extension. This step applies when SONiC YANG is used as Northbound YANG. + +Example: + +#### YANG +``` +container ACL_RULE { + list ACL_RULE_LIST { + .... + .... + container state { + sonic-ext:db-name "APPL_DB"; //For example only + sonic-ext:key-delim ":"; //For example only + + config false; + description "State data"; + + leaf MATCHED_PACKETS { + type yang:counter64; + } + + leaf MATCHED_OCTETS { + type yang:counter64; + } + } + } +} +``` + +### 20. Define custom RPC for executing command like clear, reset etc. No configuration should change through such RPCs. Define 'input' and 'output' as needed, however they are optional. This step applies when SONiC YANG is used as Northbound YANG. + +Example: + +#### YANG +``` +container sonic-acl { + .... + .... + rpc clear-stats { + input { + leaf aclname { + type string; + } + + leaf rulename { + type string; + } + } + } +} +``` + +### 21. Define Notification for sending out events generated in the system, e.g. link up/down or link failure event. This step applies when SONiC YANG is used as Northbound YANG. + +Example: + +#### YANG +``` +module sonic-port { + .... + .... 
+ notification link_event { + leaf port { + type leafref { + path "../../PORT/PORT_LIST/ifname"; + } + } + } +} +``` + + + + + + + +## APPENDIX + +### Sample SONiC ACL YANG + +``` +module sonic-acl { + namespace "http://github.com/Azure/sonic-acl"; + prefix sacl; + yang-version 1.1; + + import ietf-yang-types { + prefix yang; + } + + import ietf-inet-types { + prefix inet; + } + + import sonic-common { + prefix scommon; + } + + import sonic-port { + prefix prt; + } + + import sonic-portchannel { + prefix spc; + } + + import sonic-mirror-session { + prefix sms; + } + + import sonic-pf-limits { + prefix spf; + } + + organization + "SONiC"; + + contact + "SONiC"; + + description + "SONiC YANG ACL"; + + revision 2019-09-11 { + description + "Initial revision."; + } + + container sonic-acl { + container ACL_TABLE { + list ACL_TABLE_LIST { + key "table_name"; + + leaf table_name { + type string; + } + + leaf policy_desc { + type string { + length 1..255 { + error-app-tag policy-desc-invalid-length; + } + } + } + + leaf stage { + type enumeration { + enum INGRESS; + enum EGRESS; + } + } + + leaf type { + type enumeration { + enum MIRROR; + enum L2; + enum L3; + enum L3V6; + } + } + + leaf-list ports { + type union { + type leafref { + path "/prt:sonic-port/prt:PORT/prt:PORT_LIST/prt:ifname"; + } + type leafref { + path "/spc:sonic-portchannel/spc:PORTCHANNEL/spc:PORTCHANNEL_LIST/spc:name"; + } + } + } + } + } + + container ACL_RULE { + list ACL_RULE_LIST { + key "table_name rule_name"; + leaf table_name { + type leafref { + path "../../../ACL_TABLE/ACL_TABLE_LIST/table_name"; + } + must "(/scommon:operation/scommon:operation != 'DELETE') or " + + "(current()/../../../ACL_TABLE/ACL_TABLE_LIST[table_name=current()]/type = 'L3')" { + error-message "Type not staisfied."; + } + } + + leaf rule_name { + type string; + } + + leaf PRIORITY { + type uint16 { + range "1..65535"{ + error-message "Invalid ACL rule priority."; + } + } + } + + leaf RULE_DESCRIPTION { + type string; + } + + leaf PACKET_ACTION { + type enumeration { + enum FORWARD; + enum DROP; + enum REDIRECT; + } + } + + leaf MIRROR_ACTION { + type leafref { + path "/sms:sonic-mirror-session/sms:MIRROR_SESSION/sms:MIRROR_SESSION_LIST/sms:name"; + } + } + + leaf IP_TYPE { + type enumeration { + enum ANY; + enum IP; + enum IPV4; + enum IPV4ANY; + enum NON_IPV4; + enum IPV6ANY; + enum NON_IPV6; + } + } + + leaf IP_PROTOCOL { + type uint8 { + range "1|2|6|17|46|47|51|103|115"; + } + } + + leaf ETHER_TYPE { + type string { + pattern "(0x88CC)|(0x8100)|(0x8915)|(0x0806)|(0x0800)|(0x86DD)|(0x8847)" { + error-message "Invalid ACL Rule Ether Type"; + error-app-tag ether-type-invalid; + } + } + } + + choice ip_src_dst { + case ipv4_src_dst { + when "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV4' or .='IPV4ANY'])"; + leaf SRC_IP { + mandatory true; + type inet:ipv4-prefix; + } + leaf DST_IP { + mandatory true; + type inet:ipv4-prefix; + } + } + case ipv6_src_dst { + when "boolean(IP_TYPE[.='ANY' or .='IP' or .='IPV6' or .='IPV6ANY'])"; + leaf SRC_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + leaf DST_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + } + } + + choice src_port { + case l4_src_port { + leaf L4_SRC_PORT { + type uint16; + } + } + case l4_src_port_range { + leaf L4_SRC_PORT_RANGE { + type string { + pattern "[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + choice dst_port { + case l4_dst_port { + leaf L4_DST_PORT { + type uint16; + } + } + case l4_dst_port_range { + leaf L4_DST_PORT_RANGE { + type string { + pattern 
"[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + leaf TCP_FLAGS { + type string { + pattern "0[xX][0-9a-fA-F]{2}[/]0[xX][0-9a-fA-F]{2}"; + } + } + + leaf DSCP { + type uint8; + } + } + } + } +} + +``` + diff --git a/doc/mgmt/Sample_Transformer_v1.0.pptx b/doc/mgmt/Sample_Transformer_v1.0.pptx new file mode 100644 index 0000000000..30fff438ff Binary files /dev/null and b/doc/mgmt/Sample_Transformer_v1.0.pptx differ diff --git a/doc/mgmt/Transformer_Create_v1.pptx b/doc/mgmt/Transformer_Create_v1.pptx new file mode 100644 index 0000000000..58c220f879 Binary files /dev/null and b/doc/mgmt/Transformer_Create_v1.pptx differ diff --git a/doc/mgmt/images/CVL_Arch.jpg b/doc/mgmt/images/CVL_Arch.jpg new file mode 100644 index 0000000000..b3fb56ac6b Binary files /dev/null and b/doc/mgmt/images/CVL_Arch.jpg differ diff --git a/doc/mgmt/images/CVL_flow.jpg b/doc/mgmt/images/CVL_flow.jpg new file mode 100644 index 0000000000..25d8c7a2de Binary files /dev/null and b/doc/mgmt/images/CVL_flow.jpg differ diff --git a/doc/mgmt/images/GNMI_Server.png b/doc/mgmt/images/GNMI_Server.png new file mode 100644 index 0000000000..f5886e35f8 Binary files /dev/null and b/doc/mgmt/images/GNMI_Server.png differ diff --git a/doc/mgmt/images/GNMI_flow.jpg b/doc/mgmt/images/GNMI_flow.jpg new file mode 100644 index 0000000000..150c0b064c Binary files /dev/null and b/doc/mgmt/images/GNMI_flow.jpg differ diff --git a/doc/mgmt/images/Init.jpg b/doc/mgmt/images/Init.jpg new file mode 100644 index 0000000000..23e1796864 Binary files /dev/null and b/doc/mgmt/images/Init.jpg differ diff --git a/doc/mgmt/images/Mgmt_Frmk_Arch.jpg b/doc/mgmt/images/Mgmt_Frmk_Arch.jpg new file mode 100644 index 0000000000..dec92ecb97 Binary files /dev/null and b/doc/mgmt/images/Mgmt_Frmk_Arch.jpg differ diff --git a/doc/mgmt/images/cli_interactions.jpg b/doc/mgmt/images/cli_interactions.jpg new file mode 100644 index 0000000000..c6f16c8f79 Binary files /dev/null and b/doc/mgmt/images/cli_interactions.jpg differ diff --git a/doc/mgmt/images/crud_v1.png b/doc/mgmt/images/crud_v1.png new file mode 100644 index 0000000000..c9a5fcfd68 Binary files /dev/null and b/doc/mgmt/images/crud_v1.png differ diff --git a/doc/mgmt/images/dial_in_out.png b/doc/mgmt/images/dial_in_out.png new file mode 100644 index 0000000000..8148cf72e9 Binary files /dev/null and b/doc/mgmt/images/dial_in_out.png differ diff --git a/doc/mgmt/images/get_v1.png b/doc/mgmt/images/get_v1.png new file mode 100644 index 0000000000..4b8af01237 Binary files /dev/null and b/doc/mgmt/images/get_v1.png differ diff --git a/doc/mgmt/images/read.jpg b/doc/mgmt/images/read.jpg new file mode 100644 index 0000000000..da4357a69f Binary files /dev/null and b/doc/mgmt/images/read.jpg differ diff --git a/doc/mgmt/images/transformer_components_v1.png b/doc/mgmt/images/transformer_components_v1.png new file mode 100644 index 0000000000..2453d217c9 Binary files /dev/null and b/doc/mgmt/images/transformer_components_v1.png differ diff --git a/doc/mgmt/images/transformer_design.PNG b/doc/mgmt/images/transformer_design.PNG new file mode 100644 index 0000000000..80673116f2 Binary files /dev/null and b/doc/mgmt/images/transformer_design.PNG differ diff --git a/doc/mgmt/images/write.jpg b/doc/mgmt/images/write.jpg new file mode 100644 index 0000000000..0658a1096f Binary files /dev/null and b/doc/mgmt/images/write.jpg differ diff --git a/doc/mgmt/sonic-acl.yang b/doc/mgmt/sonic-acl.yang new file mode 100644 index 0000000000..4081d1d8ad --- /dev/null +++ b/doc/mgmt/sonic-acl.yang @@ -0,0 +1,252 @@ +module sonic-acl { 
+ namespace "http://github.com/Azure/sonic-acl"; + prefix sacl; + yang-version 1.1; + + import ietf-yang-types { + prefix yang; + } + + import ietf-inet-types { + prefix inet; + } + + import sonic-common { + prefix scommon; + } + + import sonic-port { + prefix prt; + } + + import sonic-portchannel { + prefix spc; + } + + import sonic-mirror-session { + prefix sms; + } + + import sonic-pf-limits { + prefix spf; + } + + organization + "BRCM"; + + contact + "BRCM"; + + description + "SONIC ACL"; + + revision 2019-05-15 { + description + "Initial revision."; + } + + container sonic-acl { + scommon:db-name "CONFIG_DB"; + + list ACL_TABLE { + key "aclname"; + scommon:key-delim "|"; + scommon:key-pattern "ACL_TABLE|{aclname}"; + + /* must "count(/prt:sonic-port/prt:PORT) > 0"; */ + + leaf aclname { + type string; + } + + leaf policy_desc { + type string { + length 1..255 { + error-app-tag policy-desc-invalid-length; + } + } + } + + leaf stage { + type enumeration { + enum INGRESS; + enum EGRESS; + } + } + + leaf type { + type enumeration { + enum MIRROR; + enum L2; + enum L3; + enum L3V6; + } + } + + leaf-list ports { + /*type union { */ + type leafref { + path "/prt:sonic-port/prt:PORT/prt:ifname"; + } + /* type leafref { + path "/spc:sonic-portchannel/spc:PORTCHANNEL/spc:name"; + } + }*/ + } + } + + list ACL_RULE { + key "aclname rulename"; + scommon:key-delim "|"; + scommon:key-pattern "ACL_RULE|{aclname}|{rulename}"; + scommon:pf-check "ACL_CheckAclLimits"; + + /* Limit for number of dynamic ACL rules */ + /*must "count(../ACL_RULE) > /spf:sonic-pf-limits/acl/MAX_ACL_RULES" { + error-message "Number of ACL rules reached max platform limit."; + } + must "PRIORITY > /spf:sonic-pf-limits/acl/MAX_PRIORITY" { + error-message "Invalid ACL rule priority."; + } + must "count(../ACL_TABLE) > 0 and count(/prt:sonic-port/prt:PORT) > 0"; //Temporary work-around + */ + + leaf aclname { + type leafref { + path "../../ACL_TABLE/aclname"; + } + must "(/scommon:operation/scommon:operation != 'DELETE') or " + + "count(../../ACL_TABLE[aclname=current()]/ports) = 0" { + error-message "Ports are already bound to this rule."; + } + } + + leaf rulename { + type string; + } + + leaf PRIORITY { + type uint16 { + range "1..65535"; + } + } + + leaf RULE_DESCRIPTION { + type string; + } + + leaf PACKET_ACTION { + type enumeration { + enum FORWARD; + enum DROP; + enum REDIRECT; + } + } + + leaf MIRROR_ACTION { + type leafref { + path "/sms:sonic-mirror-session/sms:MIRROR_SESSION/sms:name"; + } + } + + leaf IP_TYPE { + type enumeration { + enum any; + enum ip; + enum ipv4; + enum ipv4any; + enum non_ipv4; + enum ipv6any; + enum non_ipv6; + } + } + + leaf IP_PROTOCOL { + type uint8 { + range "1|2|6|17|46|47|51|103|115"; + } + } + + leaf ETHER_TYPE { + type string{ + pattern "(0x88CC)|(0x8100)|(0x8915)|(0x0806)|(0x0800)|(0x86DD)|(0x8847)"; + } + } + + choice ip_src_dst { + case ipv4_src_dst { + leaf SRC_IP { + mandatory true; + type inet:ipv4-prefix; + } + leaf DST_IP { + mandatory true; + type inet:ipv4-prefix; + } + } + case ipv6_src_dst { + leaf SRC_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + leaf DST_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + } + } + + choice src_port { + case l4_src_port { + leaf L4_SRC_PORT { + type uint16; + } + } + case l4_src_port_range { + leaf L4_SRC_PORT_RANGE { + type string { + pattern "[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + choice dst_port { + case l4_dst_port { + leaf L4_DST_PORT { + type uint16; + } + } + case l4_dst_port_range { + leaf L4_DST_PORT_RANGE { + 
type string { + pattern "[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + leaf TCP_FLAGS { + type string { + pattern "0[xX][0-9a-fA-F]{2}[/]0[xX][0-9a-fA-F]{2}"; + } + } + + leaf DSCP { + type uint8; + } + } + + container state { + config false; + + leaf MATCHED_PACKETS { + type yang:counter64; + } + + leaf MATCHED_OCTETS { + type yang:counter64; + } + } + } +} diff --git a/doc/nat/images/Nat_block_diagram.png b/doc/nat/images/Nat_block_diagram.png new file mode 100644 index 0000000000..968b68d097 Binary files /dev/null and b/doc/nat/images/Nat_block_diagram.png differ diff --git a/doc/nat/images/dynamic_napt_entry_creation_flow.png b/doc/nat/images/dynamic_napt_entry_creation_flow.png new file mode 100644 index 0000000000..eb7aa4b3d5 Binary files /dev/null and b/doc/nat/images/dynamic_napt_entry_creation_flow.png differ diff --git a/doc/nat/images/nat_dc_deployment_usecase.png b/doc/nat/images/nat_dc_deployment_usecase.png new file mode 100644 index 0000000000..81d12e04ad Binary files /dev/null and b/doc/nat/images/nat_dc_deployment_usecase.png differ diff --git a/doc/nat/images/nat_enterprise_deployment_usecase.png b/doc/nat/images/nat_enterprise_deployment_usecase.png new file mode 100644 index 0000000000..b60aa1edca Binary files /dev/null and b/doc/nat/images/nat_enterprise_deployment_usecase.png differ diff --git a/doc/nat/images/nat_entry_aging_flow.png b/doc/nat/images/nat_entry_aging_flow.png new file mode 100644 index 0000000000..590e4b2335 Binary files /dev/null and b/doc/nat/images/nat_entry_aging_flow.png differ diff --git a/doc/nat/images/static_napt_config_flow.png b/doc/nat/images/static_napt_config_flow.png new file mode 100644 index 0000000000..96c42174b1 Binary files /dev/null and b/doc/nat/images/static_napt_config_flow.png differ diff --git a/doc/nat/nat_design_spec.md b/doc/nat/nat_design_spec.md new file mode 100644 index 0000000000..b012659ab3 --- /dev/null +++ b/doc/nat/nat_design_spec.md @@ -0,0 +1,1572 @@ +# NAT in SONiC +# High Level Design Document +#### Rev 0.1 + +# Table of Contents + * [Revision](#revision) + * [About this Manual](#about-this-manual) + * [Scope](#scope) + * [Definitions/Abbreviations](#definitionsabbreviations) + * [1 Requirements Overview](#1-requirements-overview) + * [1.1 Functional Requirements](#11-functional-requirements) + * [1.2 Configuration and Management Requirement](#12-configuration-and-management-requirements) + * [1.3 Scalability Requirements](#13-scalability-requirements) + * [2 Functionality](#2-functionality) + * [2.1 Target Deployment Use Cases](#21-target-deployment-use-cases) + * [2.2 Functional Description](#22-functional-description) + * [2.2.1 SNAT and DNAT](#221-snat-and-dnat) + * [2.2.2 Static NAT/NAPT](#222-static-nat-and-napt) + * [2.2.3 Dynamic NAT/NAPT](#223-dynamic-nat-and-napt) + * [2.2.4 NAT zones](#224-nat-zones) + * [2.2.5 Twice NAT/NAPT](#225-twice-nat-and-napt) + * [2.2.6 VRF support](#226-vrf-support) + * [3 Design](#3-design) + * [3.1 Design overview](#31-design-overview) + * [3.2 DB Changes](#32-db-changes) + * [3.2.1 Config DB](#321-config-db) + * [3.2.2 ConfigDB Schemas](#322-configdb-schemas) + * [3.2.3 APP DB](#323-app-db) + * [3.2.4 APP DB Schemas](#324-app-db-schemas) + * [3.2.5 COUNTERS DB](#325-counters-db) + * [3.3 Switch State Service Design](#33-switch-state-service-design) + * [3.3.1 NatMgr daemon](#331-natmgr-daemon) + * [3.3.2 Natsync daemon](#332-natsync-daemon) + * [3.3.3 NatOrch Agent](#333-natorch-agent) + * [3.4 Linux Integration](#34-linux-integration) + * [3.4.1 
IPtables](#341-iptables) + * [3.4.2 Connection tracking](#342-connection-tracking) + * [3.4.3 Interactions between Kernel and Natsyncd](#343-interactions-between-kernel-and-natsyncd) + * [3.5 Docker for NAT](#35-docker-for-nat) + * [3.6 SAI](#35-sai) + * [3.7 Statistics](#36-statistics) + * [3.8 CLI](#37-cli) + * [3.8.1 Data Models](#381-data-models) + * [3.8.2 Config CLI commands](#382-config-cli-commands) + * [3.8.3 Show CLI commands](#383-show-cli-commands) + * [3.8.4 Clear commands](#384-clear-commands) + * [3.8.5 Debug commands](#385-debug-commands) + * [3.8.6 REST API Support](#386-rest-api-support) + * [3.8.7 Example configuration](#387-example-configuration) + * [4 Flow Diagrams](#4-flow-diagrams) + * [4.1 Static NAPT configuration flow](#41-static-napt-configuration-flow) + * [4.2 Dynamic NAPT configuration flow](#42-dynamic-napt-configuration-flow) + * [4.3 Dynamic NAPT entry aging flow](#43-dynamic-napt-entry-aging-flow) + * [5 Serviceability and Debug](#5-serviceability-and-debug) + * [6 Warm Boot Support](#6-warm-boot-support) + * [7 Scalability](#7-scalability) + * [8 Unit Test](#8-unit-test) + * [9 To be done in future release](#9-to-be-done-in-future-release) + +# Revision + +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:-------------------|:-----------------------------------| +| 0.2 | 09/09/2019 | Kiran Kella,
Akhilesh Samineni | Initial version | + + +# About this Manual +This document describes the design details of the Network Address Translation (NAT) feature. +NAT router enables private IP networks to communicate to the public networks (internet) by translating the private IP address to globally unique IP address. It also provides security by hiding the identity of the host in private network. For external hosts to be able to access the services hosted in the internal network, port address translation rules are added to map the incoming traffic to the internal hosts. + +# Scope +This document describes the high level design details about how NAT works. + +# Definitions/Abbreviations +###### Table 1: Abbreviations +| Abbreviation | Full form | +|--------------------------|----------------------------------| +| NAT | Network Address Translation | +| SNAT | Source NAT | +| DNAT | Destination NAT | +| PAT | Port Address Translation | +| NAPT | Network Address Port Translation
NAPT and PAT mean the same and are used interchangeably in this document. |
+| DIP | Destination IP address |
+| SIP | Source IP address |
+| DC | Data Center |
+| ToR | Top of the Rack |
+| ACL | Access Control List |
+
+# 1 Requirements Overview
+## 1.1 Functional Requirements
+The requirements for NAT are:
+
+1.0 NAT/NAPT is a method by which many network addresses and their TCP/UDP ports are translated into a single or multiple network address(es) and TCP/UDP ports, deployed in Enterprise and Data Center scenarios. It can be viewed as a privacy mechanism since the actual identity of the private hosts is not visible to external hosts.
+
+1.1 NAT is standardized through RFC 2663, RFC 3022 and RFC 4787.
+
+2.1.0 Provide the ability to create/delete Static Basic NAT entries (one-to-one IP address translation) mapping from a public IP address to an internal host's IP address.
+
+2.1.1 Provide the ability to create/delete Static NAPT (PAT) entries that map an L4 port on the Router's public IP address to an internal host's IP address + L4 port.
+
+2.1.2 Provide the ability to do dynamic NAT from internal host IP addresses to the available range of public IP addresses.
+
+2.1.3 Provide the ability to do dynamic NAPT (or PAT) from internal host IP addresses to the public IP address + L4 port range.
+
+2.1.4 PAT entries are configurable per IP protocol type. Allowed IP protocols are TCP and UDP.
+
+2.1.5 Configure a NAT pool that specifies the range of IP addresses and the range of L4 ports to do dynamic network address translation to.
+
+2.1.6 More than one NAT pool can be created, limited to a maximum of 16 pools. The same applies to NAT bindings.
+
+2.1.7 Access lists are used to define the set of hosts that are subjected to dynamic NAT/NAPT, by binding an ACL and a NAT pool together.
+
+2.1.8 A NAT pool binding with no associated ACL allows all hosts to be subjected to dynamic NAT/NAPT.
+
+2.1.9 Static NAT/NAPT entries are not timed out. They have to be unconfigured explicitly.
+
+2.1.10 If a Static NAPT entry is the same as a dynamic NAPT entry, the entry is retained as the Static NAPT entry.
+
+2.1.11 For the NAT/NAPT entries created statically or dynamically, bi-directional NAT translations can be performed.
+
+2.1.12 A dynamic NAPT entry is timed out if it is inactive in the hardware for more than the configurable age timeout period.
+
+2.1.13 Dynamic and Static NAT/NAPT entries should persist across warm reboot with no traffic disruption to the active flows.
+
+2.1.14 The NAT feature can be enabled or disabled globally. All the NAT configuration and functionality works only if the NAT feature is enabled.
+
+2.2.0 Provide the ability for Twice NAT that translates both Source and Destination IP addresses when crossing the zones.
+
+2.2.1 Provide the ability for Twice NAPT that translates both Source and Destination IP addresses and L4 ports when crossing the zones.
+
+2.3.0 ICMP packets are translated via NAT or NAPT rules in the Linux kernel.
+
+2.4.0 The hardware NAT table full condition is handled gracefully by logging an error message.
+
+2.5.0 The SNAT or DNAT miss packets are rate limited to the CPU.
+
+2.5.1 The NAT miss packets are processed from a higher priority CPU COS queue than the Broadcast/Unknown Multicast packets.
+
+2.6.0 Support for the Full Cone NAT/NAPT functionality.
+
+3.0 ACL and NAT pool binding is applicable on the valid L3 interfaces (VLAN, Ethernet, PortChannel) defined by the ACL TABLE entry. 
+
+3.1 The L3 ports on which the NAT ACL is applied are in a different NAT zone from the port on which the NAT pool IP address is based.
+
+3.2 Provide configuration of zones on L3 interfaces.
+
+4.0 Provide a configurable age timeout interval for the inactive dynamic UDP NAPT entries (in seconds). The default is 300 secs; the range is 120 secs to 600 secs.
+
+4.1 Provide a configurable age timeout interval for the inactive dynamic TCP NAPT entries (in seconds). The default is 86400 secs; the range is 300 secs to 432000 secs (5 days).
+
+4.2 Provide a configurable age timeout interval for the inactive dynamic NAT entries (in seconds). The default is 600 secs; the range is 300 secs to 432000 secs (5 days).
+
+5.0 Should be able to ping from an internal host to an outside host via NAPT.
+
+5.1 Should be able to traceroute from an internal host to an outside host via NAPT.
+
+5.2 Provide support for NAT translation statistics.
+
+5.3 Ability to clear the NAT translation table entries.
+
+5.4 Ability to clear the NAT translation statistics.
+
+5.5 Ability to stop and start the NAT docker service.
+
+5.6 Ability to enable logging at different severity levels of the NAT module.
+
+## 1.2 Configuration and Management Requirements
+Configuration of the NAT feature can be done via:
+
+- JSON config input
+- Incremental CLI
+
+## 1.3 Scalability Requirements
+Ability to support a scale of up to 40K bi-directional NAT entries; this is SAI capability dependent.
+
+# 2 Functionality
+## 2.1 Target Deployment Use Cases
+The NAT feature is targeted at DC and Enterprise deployments.
+
+In DC deployments, we can have the following different use cases:
+1. Traffic between the Server connected to the ToR and the Internet. The NAT-enabled ToR does DNAT for the southbound traffic from the Internet to the Server and SNAT for the northbound traffic to the Internet.
+
+2. Intra-DC traffic between the nodes connected to different ToRs in the same DC. This is the case of internal communication with private IP addresses between the nodes in the DC that do not need to be NAT'ted at the ToRs, thereby saving the ASIC NAT table resources. In such scenarios, ACLs are used (see section 3.3 for more details) to avoid NAT actions on selected traffic flows.
+
+3. Traffic between two Servers/hosts connected to the same ToR (also referred to as hairpinning traffic). The administrator may or may not want to NAT the hairpinning traffic. To avoid NAT in this case, ACLs can be used. Support for NAT on the hairpinning traffic is a future enhancement.
+
+![DC deployment use case](images/nat_dc_deployment_usecase.png)
+
+In Enterprise deployments, the Customer Edge switch or other Customer premises equipment implements the NAT firewall functionality for translating the traffic between the internal enterprise hosts and the public domain.
+
+![Enterprise deployment use case](images/nat_enterprise_deployment_usecase.png)
+
+## 2.2 Functional Description
+### 2.2.1 SNAT and DNAT
+Source NAT translation involves translating the Source IP address (optionally along with the L4 port) in the IP packets crossing from the private network to the public network. Destination NAT translation involves translating the Destination IP address (optionally along with the L4 port) in the IP packets crossing from the public network to the private network. The IP header checksum and the L4 header checksum are recalculated as the header contents are changed due to the NAT translation.
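+
+As a concrete illustration (using the static NAPT mapping 65.55.42.1:1024/TCP <-> 20.0.0.1:6000 from the example in section 3.3.1; the external host 100.1.1.2 is an arbitrary address chosen for this sketch), the header rewrite looks as follows:
+
+```
+Outbound (inside -> outside), SNAT on the public-facing interface:
+  before: SIP=20.0.0.1   SPORT=6000  DIP=100.1.1.2  DPORT=80
+  after : SIP=65.55.42.1 SPORT=1024  DIP=100.1.1.2  DPORT=80
+
+Inbound (outside -> inside), DNAT of the reply traffic:
+  before: SIP=100.1.1.2  SPORT=80    DIP=65.55.42.1 DPORT=1024
+  after : SIP=100.1.1.2  SPORT=80    DIP=20.0.0.1   DPORT=6000
+
+The IP and TCP/UDP checksums are recomputed after each rewrite.
+```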
+ +### 2.2.2 Static NAT and NAPT +User can configure a static binding of IP address + optional L4 port between different zones (like for eg., binding from a [Public IP address + optional L4 port] to a designated [Private host IP address + optional L4 port]). With no optional L4 port configuration, the translation is only from private IP to public IP and vice versa, which is referred as basic NAT. Any conflicts should be avoided between the IP addresses in the Static NAT config and the Static NAPT config as there is no order or priority ensured in the processing of the conflicting rules. + +Static NAT/NAPT entries are not timed out from the translation table. + +NOTE: Overlap in the configured Global IP address between Static NAT and Static NAPT is not allowed. + +### 2.2.3 Dynamic NAT and NAPT +On the NAT router, when the public addresses are limited and less in number than the internal network addresses, the many-to-one mapping is needed from the [Private IP address + L4 port] to a [Public IP address + L4 port]. This is where multiple internal IPs map to a single public IP by overloading with different L4 ports. When the internally originated outbound traffic hits the NAT router, if a matching SNAT entry exists it is translated. Else the packet is trapped to CPU so that a new SNAT mapping is allocated for the [Private IP address + L4 port]. + +NAT pool is configured by the user to specify the public IP address (optionally an IP address range) and the L4 port range to overload the private IP address + L4 port to. + +ACL is configured to classify a set of hosts to apply the NAT translation on. ACL is bound to a NAT pool to NAT translate the ACL allowed hosts to the public IP + L4 port defined by the pool. Packets not matching the ACL are L3 forwarded in the regular fashion. + +Dynamically created NAT/NAPT entries are timed out after a period of inactivity in the hardware. The inactivity timeout periods are independently configurable for TCP and UDP NAPT entries. +The inactivity timeout period is configurable for the basic NAT entries. + +When the pool is exhausted, new incoming connections are no longer NAT'ted in the hardware. In such scenario, the new incoming traffic are reported as NAT miss and dropped. Only when the inactive entries are released are the new incoming connections dynamically mapped to the freed up ports. + +Dynamic NAT/NAPT is supported only for the traffic of IP protocol types TCP/UDP/ICMP. Other IP protocol type traffic are not dynamically NAT'ted but are dropped in the hardware. + +NOTE: Overlap in the configured Global IP address and Port range between Static NAT/NAPT and Dynamic NAT/NAPT is not allowed. + +#### 2.2.3.1 Full Cone NAT +When the first connection from a particular internal IP + L4 port is dynamically SNAT mapped to an external IP + L4 port, the SNAT and the corresponding DNAT entries are added for that mapping in the hardware. +Subsequently any new connections from the same internal IP + L4 port to new destination end points use the same mapping. As well, traffic sent by any external host to this external IP + L4 port is NAT'ted back to the same internal IP + L4 port. This behavior model is referred to as Full Cone NAT. + +### 2.2.4 NAT zones +NAT zones refer to different network domains between which the NAT translation happens when the packet crosses between them. NAT zones are also referred to as NAT realms. +NAT zones are created by configuring zone-id per L3 interface. 
The L3 interface referred to for NAT purposes can be an Ethernet, VLAN or PortChannel or Loopback interface that are configured with IP address(es). + +In this document, the interface that is towards the private networks (private realm) on the NAT router is referred to as an inside interface. +And the interface that is towards the public network (public realm) on the NAT router is referred to as outside interface. +By default, L3 interface is in NAT zone 0 which we refer to as an inside interface. + +NAT/NAPT is performed when packets matching the configured NAT ACLs cross between different zones. +The source zone of a packet is determined by the zone of the interface on which the packet came on. And the destination zone of the packet is determined by the zone of the L3 next-hop interface from the L3 route lookup of the destination. + +Currently only 2 zones are supported, which correspond to the inside interfaces and the outside interfaces. + +Any inbound traffic ingressing on the outside interface that is L3 forwarded on to the inside interface, is configured by user via Static NAT/NAPT entries to be DNAT translated. +Any outbound traffic ingressing on the inside interface is configured to be dynamically SNAT translated. + +The public IP address of the interface deemed to be outside interface, is referred to in the NAT pool configuration. + +The allowed range of configurable zone values is 0 to 3. + +In typical DC deployments, the Loopback interface IP address can be used as the public IP address in the static or dynamic NAT configurations. In such a case, Loopback interface as well should be assigned a zone value that is same as the zone assigned to the physical outside interfaces. Though the zone value is not configured for the Loopback interface in the hardware, it is required by the application layer for the NAT functionality. + +More details on the configuration of the Loopback IP as public IP is in the section 3.4 below. + +### 2.2.5 Twice NAT and NAPT +Twice NAT or Double NAT is a NAT variation where both the Source IP and the Destination IP addresses are modified as a packet crosses the address zones. It is typically used in the communication between networks with overlapping private addresses. + +The configuration for Twice NAT/NAPT is achieved in 2 ways: +- Putting two Static NAT/NAPT entries (one SNAT and one DNAT) in the same Group ('twice_nat_id' value). +- Putting a Dynamic Source NAT/NAPT Binding and a Static Source NAT/NAPT entry in the same Group ('twice_nat_id' value). + +When a host matching a dynamic NAT pool binding sends traffic to host with a matching DNAT Static NAT/NAPT entry in the same 'twice_nat_id' group, a bi-directional Twice NAT/NAPT entry is created for the traffic flow. + +NOTE: The Static NAT/NAPT entry that is part of a Twice NAT group is not added used for single NAT'ting in the hardware. + +### 2.2.6 VRF support +NAT is supported in the default Virtual Routing and Forwarding (VRF) domain only. +It is assumed that the Static or Dynamic NAT configurations do not include the interfaces in non-default VRFs. + +# 3 Design +## 3.1 Design overview +The design overview at a high level is as below. The details are explained in the following subsections. +- Configuration tables and schema for provisioning the static and dynamic NAT/NAPT entries. +- Design is centered around the iptables and connection tracking module in the kernel. +- NAT application configures the NAT configuration rules using iptables in the kernel. 
+- The iptables and connection tracking modules in the kernel are leveraged to handle the NAT miss packets to create the NAT/NAPT entries, that are added to the hardware. + +## 3.2 DB Changes +### 3.2.1 Config DB +Following table changes are done in Config DB. + +#### 3.2.1.1 STATIC_NAPT Table +``` +STATIC_NAPT|{{global_ip}}|{{ip_protocol}}|{{global_l4_port}} + "local_ip": {{ip_address}} + "local_port" :{{l4_port}} + "nat_type" :{{snat-or-dnat}} (OPTIONAL) + "twice_nat_id" :{{twice-nat-id}} (OPTIONAL) +``` + +``` +This config tells to do: +- If the "nat_type" is 'dnat': + - DNAT translation of the DIP/DPORT in the IP packet from 'global_ip' address and 'global_port' to 'local_ip' address and 'local_l4_port'. + - SNAT translation of the SIP/SPORT in the IP packet from 'local_ip' address and 'local_port' + to 'global_ip' address and 'global_l4_port' when the packet crosses the zones. +- If the "nat_type" is 'snat': + - SNAT translation of the SIP/SPORT in the IP packet from 'global_ip' address and 'global_port' to 'local_ip' address and 'local_l4_port' when the packet crosses the zones. + - DNAT translation of the DIP/DPORT in the IP packet from 'local_ip' address and 'local_port' + to 'global_ip' address and 'global_l4_port'. +- The default value of nat_type is 'dnat' if the option is not given. +``` + +#### 3.2.1.2 STATIC_NAT Table +``` +STATIC_NAT|{{global_ip}} + "local_ip": {{ip_address}} + "nat_type" :{{snat-or-dnat}} (OPTIONAL) + "twice_nat_id" :{{twice-nat-id}} (OPTIONAL) +``` + +``` +This config tells to do: +- If the "nat_type" is 'dnat': + - DNAT translation of the DIP in the IP packet from 'global_ip' address to 'local_ip'. + - SNAT translation of the SIP in the IP packet from 'local_ip' address to 'global_ip' address when the packet crosses the zones. +- If the "nat_type" is 'snat': + - SNAT translation of the SIP in the IP packet from 'global_ip' address to 'local_ip' address when the packet crosses the zones. + - DNAT translation of the DIP in the IP packet from 'local_ip' address to 'global_ip' address. +- The default value of nat_type is 'dnat' if the option is not given. +``` +#### 3.2.1.3 NAT_GLOBAL Table +``` +NAT_GLOBAL|Values + "admin_mode" : {{enable-or-disable}} + "nat_timeout" : {{timeout_in_secs}} + "nat_tcp_timeout": {{timeout_in_secs}} + "nat_udp_timeout": {{timeout_in_secs}} +``` + +#### 3.2.1.4 NAT_POOL Table +``` +NAT_POOL|{{pool_name}} + "nat_ip": {{ip_address}} (or) {{ip_addr_start}}-{{ip_addr_end}} + "nat_port": {{start_l4_port}}-{{end_l4_port}} (OPTIONAL) +``` + +#### 3.2.1.5 ACL to Pool Bindings +``` +NAT_BINDINGS|{{binding-name}} + "nat_pool": {{pool-name}} + "access_list": {{access-list-name}} (OPTIONAL) + "nat_type": {{snat-or-dnat}} (OPTIONAL) + "twice_nat_id" :{{twice-nat-id}} (OPTIONAL) +``` +``` +This config tells to do: +- The hosts that are denied the NAT action by the ACL are not subjected to NAT but follow + the normal L3 forwarding. +- The hosts that are permitted by the ACL are subjected to dynamic NAT or NAPT as per the pool. If no ACL is provided, all hosts will be subjected to NAT. +- Based on the nat_type, dynamic source NAPT or dynamic destination NAPT is to be done for the hosts allowed by the ACL. Currently only 'snat' nat_type is supported, which means the NAT_BINDINGS is used to do the dynamic source NAPT translation only. Default nat_type is 'snat' if the attribute is not specified. +- NAT_BINDING entry and a STATIC NAT entry that are in the same twice_nat_id are used to create a double NAT entry. 
+``` + +#### 3.2.1.6 Zone configuration +``` +VLAN_INTERFACE|{{vlan-name}} + "nat_zone": {{zone-value}} + +INTERFACE|{{ethernet-name}} + "nat_zone": {{zone-value}} + +PORTCHANNEL_INTERFACE|{{portchannel-name}} + "nat_zone": {{zone-value}} + +LOOPBACK_INTERFACE|{{loopback-name}} + "nat_zone": {{zone-value}} +``` + +### 3.2.2 ConfigDB Schemas + +``` +; Defines schema for STATIC_NAPT configuration attributes +key = STATIC_NAPT:global_ip:ip_protocol:global_l4_port ; Static NAPT mapping configuration +; field = value +LOCAL_IP = ipv4 ; Local private IP address +LOCAL_PORT = port_num ; Local tcp/udp port number +NAT_TYPE = SNAT/DNAT ; Type of NAT to be done +TWICE_NAT_ID = twice_nat_id ; Group id used for twice napt +; value annotations +ipv4 = dec-octet "." dec-octet "." dec-octet "." dec-octet +dec-octet = DIGIT ; 0-9 + / %x31-39 DIGIT ; 10-99 + / "1" 2DIGIT ; 100-199 + / "2" %x30-34 DIGIT ; 200-249 + +port_num = 1*5DIGIT ; a number between 1 and 65535 +twice_nat_id = 1*4DIGIT ; a number between 1 and 9999 +``` + +``` +; Defines schema for STATIC_NAT configuration attributes ; Static NAT mapping configuration +key = STATIC_NAT:global_ip +; field = value +LOCAL_IP = ipv4 ; Local private IP address +NAT_TYPE = SNAT/DNAT ; Type of NAT to be done +TWICE_NAT_ID = twice_nat_id ; Group Id used for twice nat + +; value annotations +ipv4 = dec-octet "." dec-octet "." dec-octet "." dec-octet +dec-octet = DIGIT ; 0-9 + / %x31-39 DIGIT ; 10-99 + / "1" 2DIGIT ; 100-199 + / "2" %x30-34 DIGIT ; 200-249 +twice_nat_id = 1*4DIGIT ; a number between 1 and 9999 +``` + +``` +; Defines schema for NAT_GLOBAL configuration attributes ; Global attributes for NAT +key = Values +; field = value +ADMIN_MODE = ENABLE/DISABLE ; If the NAT feature is enabled or disabled +NAT_TIMEOUT = 1*6DIGIT ; Timeout in secs (Range: 300 sec - 432000 sec) +NAT_TCP_TIMEOUT = 1*3DIGIT ; Timeout in secs (Range: 300 sec - 432000 sec) +NAT_UDP_TIMEOUT = 1*6DIGIT ; Timeout in secs (Range: 120 sec - 600 sec) +``` + +``` +; Defines schema for NAT_POOL table +key = NAT_POOL:pool_name ; NAT Pool configuration +; field = value +NAT_IP = ipv4 (or) ipv4_L - ipv4_H ; range of IPv4 addresses +NAT_PORT = port_num_L - port_num_H ; range of L4 port numbers + +; value annotations +port_num_L = 1*5DIGIT ; a number between 1 and 65535 + ; port_num_L < port_num_H +port_num_H = 1*5DIGIT ; a number between 1 and 65535 + ; port_num_L < port_num_H +``` + +``` +; Defines schema NAT Bindings configuration attributes +key = NAT_BINDINGS:bindings-name +; field = value +ACCESS_LIST = 1*64VCHAR ; Name of the ACL +NAT_POOL = 1*64VCHAR ; Name of the NAT pool +NAT_TYPE = SNAT/DNAT ; Type of NAT to be done +TWICE_NAT_ID = 1*4DIGIT ; a number between 1 and 9999 + +``` +``` +; Defines schema for NAT zone configuration +key = VLAN_INTERFACE:vlan-name +; field = value +NAT_ZONE = 1*1DIGIT ; a number in the range 0-3 + +key = INTERFACE:port-name +; field = value +NAT_ZONE = 1*1DIGIT ; a number in the range 0-3 + +key = PORTCHANNEL_INTERFACE:portchannel-name +; field = value +NAT_ZONE = 1*1DIGIT ; a number in the range 0-3 + +key = LOOPBACK_INTERFACE:loopback-name +; field = value +NAT_ZONE = 1*1DIGIT ; a number in the range 0-3 +``` + +Please refer to the [schema](https://github.com/Azure/sonic-swss/blob/master/doc/swss-schema.md) document for details on value annotations. + +### 3.2.3 APP DB +New tables are introduced to specify NAT translation entries. 
+ +``` +NAPT_TABLE:{{ip_protocol}}:{{ip}}:{{l4_port}} + "translated_ip" :{{ip-address}} + "translated_l4_port":{{l4_port}} + "nat_type" :((snat-or-dnat}} + "entry_type" :{{static_or_dynamic}} + +NAT_TABLE:{{ip}} + "translated_ip" :{{ip-address}} + "nat_type" :((snat-or-dnat}} + "entry_type" :{{static_or_dynamic}} + +NAT_TWICE_TABLE:{{src_ip}}:{{dst_ip}} + "translated_src_ip" : {{ip-address}} + "translated_dst_ip" : {{ip-address}} + "entry_type" : {{static_or_dynamic}} + +NAPT_TWICE_TABLE:{{ip_protocol}}:{{src_ip}}:{{src_l4_port}}:{{dst_ip}}:{{dst_l4_port}} + "translated_src_ip" : {{ip-address}} + "translated_src_l4_port": {{l4_port}} + "translated_dst_ip" : {{ip-address}} + "translated_dst_l4_port": {{l4-port}} + "entry_type" : {{static_or_dynamic}} + +NAT_GLOBAL_TABLE:Values + "admin_mode" : {{enable_or_disable}} + "nat_timeout" : {{timeout_in_secs}} + "nat_tcp_timeout": {{timeout_in_secs}} + "nat_udp_timeout": {{timeout_in_secs}} +``` + +### 3.2.4 APP DB Schemas + +``` +; Defines schema for the NAPT translation entries +key = NAPT_TABLE:ip_protocol:ip:l4_port ; NAPT table +; field = value +TRANSLATED_IP = ipv4 +TRANSLATED_L4_PORT = port_num +NAT_TYPE = "snat" / "dnat" +ENTRY_TYPE = "static" / "dynamic" + +; value annotations +port_num = 1*5DIGIT ; a number between 1 and 65535 +``` + +``` +; Defines schema for the NAT translation entries +key = NAT_TABLE:ip ; NAT table +; field = value +TRANSLATED_IP = ipv4 +NAT_TYPE = "snat" / "dnat" +ENTRY_TYPE = "static" / "dynamic" +``` + +``` +; Defines schema for the Twice NAT translation entries + +key = NAT_TWICE_TABLE:src_ip:dst_ip +TRANSLATED_SRC_IP = ipv4 +TRANSLATED_DST_IP = ipv4 +ENTRY_TYPE = "static" / "dynamic"  +``` + +``` +; Defines schema for the Twice NAPT translation entries + +key = NAPT_TWICE_TABLE:ip_protocol:src_ip:src_l4_port:dst_ip:dst_l4_port +TRANSLATED_SRC_IP = ipv4 +TRANSLATED_SRC_L4_PORT = port_num +TRANSLATED_DST_IP = ipv4 +TRANSLATED_DST_L4_PORT = port_num +ENTRY_TYPE = "static" / "dynamic"  +``` + +``` +; Defines schema for the NAT global table +key = NAT_GLOBAL_TABLE:Values ; NAT global table +; field = value +ADMIN_MODE = "enable" / "disable" ; If the NAT feature is enabled or disabled globally +NAT_TIMEOUT = 1*6DIGIT ; Timeout in secs (Range: 300 sec - 432000 sec) +NAT_TCP_TIMEOUT = 1*3DIGIT ; Timeout in secs (Range: 300 sec - 432000 sec) +NAT_UDP_TIMEOUT = 1*6DIGIT ; Timeout in secs (Range: 120 sec - 600 sec) +``` + +### 3.2.5 COUNTERS DB +The following new counters are applicable per zone based on the support in the hardware. +``` +COUNTERS_NAT_ZONE: + SAI_NAT_DNAT_DISCARDS + SAI_NAT_SNAT_DISCARDS + SAI_NAT_DNAT_TRANSLATION_NEEDED + SAI_NAT_SNAT_TRANSLATION_NEEDED + SAI_NAT_DNAT_TRANSLATIONS + SAI_NAT_SNAT_TRANSLATIONS + +``` +The following new counters are available per NAT/NAPT entry: +``` +The counters in the COUNTERS_DB are updated every 5 seconds. The key for the entry in the COUNTERS_DB is same as the key in the APP_DB. 
+ +COUNTERS_NAT:ip + DNAT_TRANSLATIONS_PKTS : + DNAT_TRANSLATIONS_BYTES : + SNAT_TRANSLATIONS_PKTS : + SNAT_TRANSLATIONS_BYTES : + +COUNTERS_NAPT:ip_protocol:ip:l4_port + DNAT_TRANSLATIONS_PKTS : + DNAT_TRANSLATIONS_BYTES : + SNAT_TRANSLATIONS_PKTS : + SNAT_TRANSLATIONS_BYTES : + +COUNTERS_NAT_TWICE:src_ip:dst_ip + DNAT_TRANSLATIONS_PKTS : + DNAT_TRANSLATIONS_BYTES : + SNAT_TRANSLATIONS_PKTS : + SNAT_TRANSLATIONS_BYTES : + +COUNTERS_NAPT_TWICE:ip_protocol:src_ip:src_l4_port:dst_ip:dst_l4_port + DNAT_TRANSLATIONS_PKTS : + DNAT_TRANSLATIONS_BYTES : + SNAT_TRANSLATIONS_PKTS : + SNAT_TRANSLATIONS_BYTES : +``` + +## 3.3 Switch State Service Design +Following changes are done in the orchagent. + +### 3.3.1 NatMgr daemon + NatMgrd gets the STATIC_NAPT, STATIC_NAT, NAT_POOL, NAT_GLOBAL, NAT_BINDINGS config changes from CONFIG_DB. + NatMgr is responsible for pushing the Static NAT/NAPT entries and the NAT_GLOBAL configuration into the APP_DB. It also programs the Static NAT/NAPT entries and the NAT_POOL to ACL binding configuration as iptable rules in the kernel. + +Before acting upon the Static NAPT configuration, NatMgrd checks with the STATE_DB that the matching global IP interface is configured in the system (state == ok). + +For a STATIC_NAPT entry and the interface configuration as below: +``` +STATIC_NAPT|65.55.42.1|TCP|1024 + "local_ip": 20.0.0.1 + "local_port" :6000 + "nat_type": "dnat" + +INTERFACE|Ethernet15|65.55.42.1/24 +... + +INTERFACE|Ethernet15 + "nat_zone": 1 + +``` +the following iptable rules are added for inbound and outbound directions in the nat table as below: +``` +iptables -t nat -A PREROUTING -m mark --mark 2 -p tcp -j DNAT -d 65.55.42.1 --dport 1024 --to-destination 20.0.0.1:6000 +iptables -t nat -A POSTROUTING -m mark --mark 2 -p tcp -j SNAT -s 20.0.0.1 --sport 6000 --to-source 65.55.42.1:1024 +``` +They essentially tell the kernel to do the DNAT port translation for any incoming packets, and the SNAT port translation for the outgoing packets. + +If there are any ACL to NAT pool bindings configured, the NatMgrd listens to the notifications from the ACL tables and the ACL Rule configuration tables. On start-up, the NatMgrd queries the ACL Configuration for the ACLs bound to NAT pools. Once the ACL rules are retrieved, they are updated as iptables filtering options for the SNAT configuration. + +Before acting upon the Dynamic NAPT Pool configuration on an interface, NatMgrd checks with the STATE_DB that the matching outside IP interface is created in the system (state == ok). + +For the NAT_POOL configuration as below that is configured to match an ACL, +``` +NAT_POOL|pool1 + "nat_ip": 65.55.42.1 + "nat_port": 1024-65535 + +INTERFACE|Ethernet15|65.55.42.1/24 +... 
+ +ACL_TABLE|10 + "stage": "INGRESS", + "type": "L3", + "policy_desc": "nat-acl", + "ports": "Vlan2000" + +ACL_RULE|10|1 + "priority": "20", + "src_ip": "20.0.1.0/24", + "packet_action": "do_not_nat" + +ACL_RULE|10|2 + "priority": "10", + "src_ip": "20.0.0.0/16", + "packet_action": "forward" + +NAT_BINDINGS|nat1 + "access_list": "10" + "nat_pool": "pool1 + "nat_type": "snat" + +INTERFACE|Ethernet15 + "nat_zone": 1 + +``` +the following iptable rules for udp, tcp, icmp protocol types are added as SNAT rules in the nat table: +``` +iptables -t nat -A POSTROUTING -p tcp -s 20.0.1.0/24 -j RETURN +iptables -t nat -A POSTROUTING -p udp -s 20.0.1.0/24 -j RETURN +iptables -t nat -A POSTROUTING -p icmp -s 20.0.1.0/24 -j RETURN + +iptables -t nat -A POSTROUTING -p tcp -s 20.0.0.0/16 -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +iptables -t nat -A POSTROUTING -p udp -s 20.0.0.0/16 -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +iptables -t nat -A POSTROUTING -p icmp -s 20.0.0.0/16 -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +``` +They tell the kernel to do the dynamic SNAT L4 port mapping or icmp query-id mapping dynamically for any incoming packets permitted by the ACL (20.0.0.0/16 subnet hosts excepting 20.0.1.0/24 subnet hosts), that are routed and before being sent out on the interface Ethernet15. + +The above configured iptables rules in the kernel look like: +``` +root@sonic:/home/admin# iptables -t nat -v -L +Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes) + pkts bytes target prot opt in out source destination + 0 0 DNAT tcp -- Ethernet15 any anywhere 65.55.42.1 tcp dpt:1024 to:20.0.0.1:6000 + +Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes) + pkts bytes target prot opt in out source destination + 0 0 SNAT tcp -- any Ethernet15 20.0.0.1 anywhere tcp spt:x11 to:65.55.42.1:1024 + 0 0 RETURN icmp -- any Ethernet15 20.0.1.0/24 anywhere + 0 0 RETURN udp -- any Ethernet15 20.0.1.0/24 anywhere + 0 0 RETURN tcp -- any Ethernet15 20.0.1.0/24 anywhere + 0 0 SNAT icmp -- any Ethernet15 20.0.0.0/16 anywhere to:65.55.42.1:1024-65535 + 0 0 SNAT udp -- any Ethernet15 20.0.0.0/16 anywhere to:65.55.42.1:1024-65535 + 0 0 SNAT tcp -- any Ethernet15 20.0.0.0/16 anywhere to:65.55.42.1:1024-65535 +``` +In the NAT pool config, user can also give the NAT IP address range as below: +``` +NAT_POOL|pool1 + "nat_ip": 65.55.42.1-65.55.42.3 + "nat_port": 1024-65535 +``` +In which case, the Source NAPT translations are done to pick the outside SIP + L4 port from the given IP address range and the given L4 port range. +It is the responsibility of the administrator to ensure that the addresses in the IP address range are assigned on the outside IP interface. + +#### 3.3.1.1 ACL usage +ACLs are installed in the hardware independently by the ACL OrchAgent module. + +ACL rules are added with 'do_not_nat' action for the hosts for which the NAT is to be avoided, and with 'forward' action for the hosts for which the NAT is to be performed. + +The action 'do_not_nat' tells the hardware to skip doing NAT zone checks and NAT processing for the packet, and instead the packet is L3 forwarded. For the rest of the permitted packets, the NAT processing is done in the hardware. +If a matching SNAT entry or DNAT entry does not exist in the hardware, the packets are trapped to CPU as NAT miss packets to be processed by the kernel software forwarding path. 
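+
+As an illustration of this ACL-to-iptables mapping (described in the next paragraph), the following minimal Python sketch derives the iptables rule strings from the example ACL rules and NAT pool shown above. It is illustrative only and is not the natmgrd implementation; the helper name `iptables_snat_rules` and the exact rule ordering are assumptions made for the example.
+
+```
+# Illustrative sketch only -- not the actual natmgrd code. It shows how ACL
+# rules bound to a NAT pool could be mapped to the iptables rules shown above.
+
+# Example ACL rules and pool, taken from the configuration shown earlier.
+acl_rules = [
+    {"priority": 20, "src_ip": "20.0.1.0/24", "packet_action": "do_not_nat"},
+    {"priority": 10, "src_ip": "20.0.0.0/16", "packet_action": "forward"},
+]
+pool = {"nat_ip": "65.55.42.1", "nat_port": "1024-65535"}
+zone_mark = 2  # mark = nat_zone + 1
+
+
+def iptables_snat_rules(acl_rules, pool, mark):
+    """Return the iptables command strings for the given ACL-to-pool binding."""
+    rules = []
+    # Higher-priority ACL rules are evaluated first.
+    for rule in sorted(acl_rules, key=lambda r: r["priority"], reverse=True):
+        for proto in ("tcp", "udp", "icmp"):
+            if rule["packet_action"] == "do_not_nat":
+                # Denied hosts bypass NAT and take the normal L3 path.
+                rules.append(
+                    f"iptables -t nat -A POSTROUTING -p {proto} "
+                    f"-s {rule['src_ip']} -j RETURN")
+            else:
+                # Permitted hosts are dynamically SNAT'ted from the pool.
+                rules.append(
+                    f"iptables -t nat -A POSTROUTING -p {proto} "
+                    f"-s {rule['src_ip']} -j SNAT -m mark --mark {mark} "
+                    f"--to-source {pool['nat_ip']}:{pool['nat_port']} --fullcone")
+    return rules
+
+
+if __name__ == "__main__":
+    print("\n".join(iptables_snat_rules(acl_rules, pool, zone_mark)))
+```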
+
+Corresponding to the ACL, the equivalent deny and permit rules are added in the iptables so that the intended NAT miss packets are dynamically SNAT'ted in the kernel.
+
+When the ACL that is bound to the NAT pool is deleted, the corresponding iptables SNAT rules are deleted in the kernel.
+When the ACL rules are created or modified, the corresponding iptables SNAT rules are updated in the kernel.
+Multiple ACLs can be bound to a single NAT pool, but a given ACL can be bound to at most one NAT pool.
+
+If there is no matching ACL configured, it is treated as an 'implicit permit all' and hence the corresponding NAT pool in the binding is applied to all the traffic.
+
+The following ACL match fields are used to match the traffic for dynamic SNAT mapping:
+- Source IP address or subnet
+- Destination IP address or subnet
+- Source L4 port or L4 port range
+- Destination L4 port or L4 port range
+- IP protocol
+
+Limitation: When an ACL rule is changed from the 'forward' action to the 'do_not_nat' action, the traffic flows matching the NAT entries that were created earlier due to the 'forward' action continue to be translated until the NAT entries are timed out. The conntrack entries in the kernel and the entries in the hardware continue to exist until they become inactive and time out. Flushing the conntrack entries and the hardware entries on an action change from 'forward' to 'do_not_nat' is a future item.
+
+#### 3.3.1.2 IP Interface config events
+When the IP address on the outside NAT zone interface is deleted, the corresponding iptables SNAT rules and the static DNAT rules for any matching NAT pool and matching Static NAT/NAPT entries are deleted in the kernel.
+The conntrack entries in the kernel and the APP_DB entries are aged out eventually by the inactivity timeouts, since there will not be any NAT traffic crossing the deleted IP interfaces.
+
+#### 3.3.1.3 Interface link events
+When the outside NAT zone interface's link goes down, the iptables NAT rules, the conntrack entries and the APP_DB dynamic entries are not removed immediately.
+
+This is done for the reasons below:
+
+ - The temporary link flap events happening on the NAT outside interface should not
+   affect the NAT translation entries (corresponding to the end to end TCP/UDP sessions).
+ - To avoid performance issues during link flaps.
+ - The dynamic entries are aged out eventually due to the inactivity timeout in case
+   the NAT outside interface stays link down for more than the timeout period.
+
+### 3.3.2 Natsync daemon
+
+NatSyncd listens to the conntrack netlink notification events from the kernel for the creation, update and removal of connections in the conntrack table.
+Once the SNAT/DNAT notifications are received, they are pushed into the NAT_TABLE/NAPT_TABLE/NAPT_TWICE_TABLE in the APP_DB to be consumed by the NatOrch agent. More details are in section 3.4 below.
+
+### 3.3.3 NatOrch Agent
+The NAT feature is disabled by default.
+When the NAT feature is enabled in the configuration, NatOrch enables the feature in the hardware.
+The SNAT or DNAT miss packets are by default rate limited to the CPU at 600 pps. The NAT miss trap action and the rate limiting and queuing policies on the NAT miss packets are managed via the CoPP configuration.
+
+NatOrch is responsible for the following activities:
+
+ - Listens on the notifications in the NAT, NAPT, NAPT_TWICE tables in the APP_DB, picks the notifications, translates them to SAI objects and pushes them to the ASIC_DB to be consumed by Syncd.
For every single NAT/NAPT entry, NatOrch pushes 1 SNAT entry and 1 DNAT entry (for the inbound and outbound directions) to the ASIC_DB via sairedis.
+ - Monitors the translation activity status every sampling period for each of the NAT/NAPT entries in the hardware. The sampling period is chosen to be an optimal value of 30 seconds. If a NAPT entry is inactive for the duration of the NAT timeout period, the entry is removed. Static NAT/NAPT entries are not monitored for inactivity; they have to be unconfigured explicitly to be removed from the hardware.
+
+#### 3.3.3.1 Interaction with NeighOrch and RouteOrch
+NatOrch registers as an observer with NeighOrch and RouteOrch, so that NatOrch pushes the DNAT/DNAPT entries to the ASIC_DB only if the corresponding translated IP is reachable via a neighbor entry or a route entry.
+
+When the neighbor or route no longer resolves the translated IP, all the corresponding DNAT/DNAPT entries are removed from the hardware.
+
+#### 3.3.3.2 Aging inactive entries
+Since the active data traffic for the TCP/UDP connections is not flowing through the kernel, timing out the connection tracking entries is the responsibility of the application.
+The NatOrch agent maintains an internal cache of all the NAT entries that are received from the APP_DB.
+NatOrch queries the translation activity status of each of the dynamic entries every sampling period. Static entries are not monitored for inactivity. The timeout of the active entries is reset every sampling period. If a dynamic entry is not active for the configured timeout, the entry is timed out from the connection tracking table in the kernel. That in turn triggers the removal of the inactive dynamic entries from the APP_DB and the ASIC_DB.
+
+#### 3.3.3.3 Handling the failed NAT entries
+Once the error handling framework is in place, the following changes handle the failed NAT entries from SAI:
+NatOrch listens on the notifications of the failed NAT entries from Syncd.
+For the failed NAT entries, NatOrch removes the corresponding entries from the internal cache and from the conntrack table. That will remove the entry from the APP_DB.
+
+#### 3.3.3.4 Clear command
+When the administrator issues the clear command, NatOrch flushes the conntrack table, which in turn results in the Natsyncd daemon deleting the entries in the APP_DB.
+
+#### 3.3.3.5 Max limit on the NAT entries
+NatOrch queries the maximum capacity of the NAT entries in the hardware on startup. It keeps track of the total number of static + dynamic entries created and limits the NAT'ted conntrack entries in the kernel to the supported maximum limit in the hardware. When a new translation notification is received beyond the maximum limit (due to new traffic that is being NAT'ted), the conntrack entry is removed in the kernel.
+
+#### 3.3.3.6 Block diagram
+The high level block diagram for NAT in SONiC is captured below.
+
+![NAT block diagram](images/Nat_block_diagram.png)
+
+## 3.4 Linux Integration
+### 3.4.1 IPtables
+In SONiC, the core NAT logic that performs the dynamic (IP + port) translation is the iptables module in the kernel.
+iptables uses the Netfilter hooks in the kernel to execute its NAT rules at different points in the packet forwarding path.
+
+The important hooks that iptables registers with Netfilter are the ones for PREROUTING and POSTROUTING. The DNAT rules, if present, are applied in the PREROUTING chain and the SNAT rules, if present, are applied in the POSTROUTING chain.
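+
+The sketch below is a conceptual Python model of this ordering for the NAT-miss packets that are software forwarded, as described in the rest of this section: DNAT is applied before the routing decision and SNAT after it. It is illustrative only; the Packet type, the rule dictionary and the external host address 100.0.0.1 are assumptions for the example, and the real translation is performed by netfilter/iptables and conntrack in the kernel.
+
+```
+# Conceptual model of the order in which the kernel applies the NAT chains to a
+# NAT-miss packet. Illustration only; not kernel or SONiC code.
+
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class Packet:
+    proto: str
+    sip: str
+    sport: int
+    dip: str
+    dport: int
+
+
+def prerouting_dnat(pkt, dnat_rules):
+    """PREROUTING: rewrite DIP/DPORT before the routing decision."""
+    key = (pkt.proto, pkt.dip, pkt.dport)
+    if key in dnat_rules:
+        new_dip, new_dport = dnat_rules[key]
+        return replace(pkt, dip=new_dip, dport=new_dport)
+    return pkt
+
+
+def postrouting_snat(pkt, snat_ip, snat_port):
+    """POSTROUTING: rewrite SIP/SPORT after the routing decision."""
+    return replace(pkt, sip=snat_ip, sport=snat_port)
+
+
+# Inbound static NAPT example from section 3.3.1: 65.55.42.1:1024/tcp -> 20.0.0.1:6000
+dnat_rules = {("tcp", "65.55.42.1", 1024): ("20.0.0.1", 6000)}
+
+inbound = Packet("tcp", "100.0.0.1", 5550, "65.55.42.1", 1024)
+after_dnat = prerouting_dnat(inbound, dnat_rules)   # applied before routing
+# ... the routing lookup then happens on after_dnat.dip ...
+print(after_dnat)
+
+outbound = Packet("tcp", "20.0.0.1", 6000, "100.0.0.1", 5550)
+# ... routing happens first, then SNAT on the way out ...
+after_snat = postrouting_snat(outbound, "65.55.42.1", 1024)
+print(after_snat)
+```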
+ +The packets that are going from one zone to another zone in the hardware are trapped to CPU as NAT miss packets if there are no matching NAT entries in the hardware. Such packets are received by the KNET kernel module and passed on to the netdev interface (corresponding to the interface the packet came on) in the Network stack. + +Before the L3 forwarding is done in the Kernel, the iptables rules in PREROUTING chain are applied. If there are any DNAT rules that the packets match against, they translate the DIP and L4 port to the IP and L4 port as applicable. In this process, the DNAT state and DIP of the connection entry in the connection tracking table are updated. + +After the L3 forwarding is done and before the packet is about to be sent out, the rules in the POSTROUTING chain are applied which do the SNAT (translate the SIP + L4 port to the public IP + L4 port). In this process, the SNAT state and SIP of the connection entry in the connection tracking table are updated. + +As long as the hardware entry is not installed, the NAT translation and forwarding is done in the Linux kernel using the iptables and conntrack entries. + +#### 3.4.1.1 No NAT Zones in the Kernel +Similar to the zone per L3 interface that is programmed in the hardware, there is no concept of NAT zone on the interface in the Kernel. To achieve zone checks in the kernel, rules are added in the iptables 'mangle' tables to put a MARK on the packet at the ingress and the egress. The MARK value is derived from the zone value (mark = zone+1 to avoid using default 0 value as the MARK value). When the packet traverses the kernel, 'mangle' tables rules are executed before the 'nat' tables rules at every stage. Hence we can match on the MARK value in the 'nat' table rules after it is set by the 'mangle' table rule. + +Effectively the POSTROUTING/SNAT happens for packet that is going out on any interface in the same outbound zone, and the PREROUTING/DNAT happens for packet that is coming in on any interface in the same inbound zone +For the static NAT/NAPT configurations, the iptables rules do the SNAT by matching against the outbound zone and do the DNAT by matching against the inbound zone. +For the dynamic SNAT/SNAPT configuration, the iptables rules do the SNAT by matching against the outbound zone. + +#### 3.4.1.2 Loopback IP as Public IP +In typical DC deployments, the Loopback interface IP addresses can be used as public IP in the static NAT/NAPT and dynamic Pool binding configurations. Reason being that the Loopback IP is always UP and reachable through one of the uplink physical interfaces and does not go down unlike the physical interfaces. To be used as the public IP, the Loopback interface should be configured with a zone value that is same as the zone value configured on the outside physical interfaces. Loopback interface zone configuration is not propagated to the hardware. + +For example, if user configures the Loopback0 interface with the public IP to be source translated to, with 2 uplink outside interfaces Ethernet28, Vlan100 in the same zone as below: + +``` +LOOPBACK_INTERFACE|Loopback0|65.55.42.1/24 + +LOOPBACK_INTERFACE|Loopback + "nat_zone": 1 + +INTERFACE|Ethernet28 + "nat_zone": 1 + +VLAN_INTERFACE|Vlan100 + "nat_zone": 1 + +NAT_POOL|pool1 + "nat_ip": 65.55.42.1 + "nat_port": 1024-65535 + +NAT_BINDINGS|nat1 + "nat_pool": "pool1 + "nat_type": "snat" +``` + +The iptables 'mangle' table rules are added on each L3 interface (other than loopback) to set the MARK value. 
+``` +iptables -t mangle -A PREROUTING -i Ethernet28 -j MARK --set-mark 2 +iptables -t mangle -A PREROUTING -i Vlan100 -j MARK --set-mark 2 +iptables -t mangle -A POSTROUTING -o Ethernet28 -j MARK --set-mark 2 +iptables -t mangle -A POSTROUTING -o Vlan100 -j MARK --set-mark 2 +``` + +The following iptables rules are added in the kernel to match on the MARK value for SNAT purpose: +``` +iptables -t nat -A POSTROUTING -p tcp -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +iptables -t nat -A POSTROUTING -p udp -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +iptables -t nat -A POSTROUTING -p icmp -j SNAT -m mark --mark 2 --to-source 65.55.42.1:1024-65535 --fullcone +``` + +### 3.4.2 Connection tracking +Connection tracking module in the kernel creates the connection entries and maintains their states as and when the packet traverses the forwarding path in the kernel. It keeps track of all the connections created in the system. IPtables module consults the connections tuples tracked in the system during NAT process and updates the connections. + +Connections when added, deleted or updated in the connection tracking system are notified via netlink interface to interested listeners (natsyncd in our case). + +#### 3.4.2.1 Handling NAT model mismatch between the ASIC and the Kernel +The kernel's conntrack subsystem is the source for creating the NAT translation mappings from a dynamic Pool binding configuration. + +The NAT model in the ASIC does SNAT/DNAT translations by doing 3-tuple match (matching against Protocol, SIP, SPORT (or) Protocol, DIP, DPORT) in the packet. Unlike in the ASIC, the traditional Linux iptables/conntrack model for NAT does the translations by 5-tuple match (matching against Protocol, SIP, SPORT, DIP, DPORT) in the kernel. + +By default the kernel SNATs to the same translated SIP+SPORT from the given pool range, for the traffic flows from different sources to different destinations. As a result the conntrack NAT 5-tuple cannot be mapped directly to the ASIC NAT 3-tuple. + +For example., the traffic flow +[SIP=1.0.0.1, SPORT=100 ==> DIP=2.0.0.2, DPORT=200] is SNAT'ted as [SIP=65.55.45.1, SPORT=600 ==> DIP=2.0.0.2, DPORT=200] as well as the traffic flow +[SIP=1.0.0.2, SPORT=120 ==> DIP=2.0.0.3, DPORT=300] is SNAT'ted as [SIP=65.55.45.1, SPORT=600 ==> DIP=2.0.0.3, DPORT=300] +since the translated 5-tuples are unique in each case. + +For the first traffic flow above, the 3-tuple entry is added in the hardware to +[SIP=1.0.0.1, SPORT=100] SNAT to [SIP=65.55.45.1, SPORT=600] + +The second traffic flow [SIP=1.0.0.2, SPORT=120] cannot be added in the hardware to translate to the same IP/PORT [SIP=65.55.45.1, SPORT=600], since the reverse traffic flows cannot be uniquely translated to the original Source endpoints. + +This mismatch in the NAT models between the ASIC and the Kernel is addressed by: +- Changes in the Linux kernel to do 3-tuple unique translation and full cone NAT functionality in the outbound (SNAT) direction. +- Full cone NAT functionality in the inbound (DNAT) direction. +- Change in iptables utility to pass the fullcone option to the kernel while creating the PREROUTING and POSTROUTING rules. + +With those changes, for the above flows, 3-tuple unique translations are achieved. 
+[SIP=1.0.0.1, SPORT=100] SNAT to [SIP=65.55.45.1, SPORT=600]
+[SIP=1.0.0.2, SPORT=120] SNAT to [SIP=65.55.45.1, SPORT=601]
+
+### 3.4.3 Interactions between Kernel and Natsyncd
+The following sections explain how the NAT entries corresponding to the connections originated from the private networks are created in the hardware.
+
+#### 3.4.3.1 NAT entries for TCP traffic
+For the TCP traffic initiated by the hosts in the private zone, when the TCP SYN packet reaches the NAT router and there is no matching NAT entry, a NAT miss is reported and the packet is trapped to the CPU. The connection entry is created in the conntrack table with the NEW conntrack state. The packet is L3 forwarded if a matching route and nexthop are found. Once the packet is forwarded and ready to be sent out on the NAT outside interface, the packet's SIP and L4 port are translated and the connection entry is updated. The connection's state is also updated with the SNAT and/or DNAT status flag.
+
+Since the dynamic NAPT entry creation logic is driven by the kernel's iptables rules, the 3-way handshake TCP SYN/ACK packets are allowed to be NAT translated and software forwarded in the Linux kernel. After the 3-way handshake, the conntrack status of the connection is set to ASSURED.
+
+Only the TCP connection entries that are marked as ASSURED in the conntrack table are considered by Natsyncd to be added to the APP_DB. This is done for the reasons below:
+1. To avoid filling up the NAT table space in the hardware with uni-directional SYN flood traffic.
+2. So that the kernel does not time out the unidirectional SYN-SENT connection state entries early (if only the SYN packet is software forwarded in the kernel and the SYN+ACK/ACK packets are hardware forwarded).
+3. To keep the conntrack TCP entries and the hardware entries in sync.
+
+The conntrack entry notified by the kernel has the SIP and DIP fields in both directions of the flow. If only the SIP or the DIP is modified in any direction, it is a case of single NAT/NAPT. If both the SIP and the DIP are modified in any direction, it is a case of twice NAT/NAPT.
+
+The conntrack netlink DESTROY events result in the deletion of the NAT entry from the APP_DB. The DESTROY events are received on the timeouts of the TCP connections in the connection tracking table.
+
+The TCP FIN flagged packets are not trapped to the CPU. Hence the NAT entries for the closed TCP connections are not removed immediately from the hardware. They are timed out eventually based on translation inactivity and removed.
+
+#### 3.4.3.2 NAT entries for UDP traffic
+Unlike TCP traffic, UDP traffic has no connection establishment phase.
+The first UDP packet in a session originated in the private zone raises a NAT miss on reaching the NAT router. The UDP connection entry is created in the tracking table and the SNAT translation is applied on the connection.
+NatSyncd considers the UDP connections that have the conntrack entry state as SNAT and adds them to the APP_DB.
+Timing out the UDP connections in the conntrack table is the responsibility of NatOrch.
+
+#### 3.4.3.3 NAT entries for ICMP traffic
+ICMP query messages and responses (like echo requests/responses) resulting in a NAT miss are also subjected to dynamic SNAT, with a translation from the local ICMP identifier and private IP to the overloaded ICMP identifier on the public IP.
+
+NAT translation of the ICMP traffic is always done in the Linux kernel. ICMP traffic crossing the zones is trapped to the CPU.
ICMP NAT/NAPT entries hence are not added to hardware. + +Using tools like Ping and Traceroute from internal hosts destined to external public IP addresses should work with NAT translations. + +The ICMP error messages received from the external public IP networks in response to the packets from internal hosts, are NAT/NAPT translated back to the internal hosts. +The NAT translation for the ICMP error messages is based on RFC 5508, Section 4.2, where the contents of the embedded ICMP payload's IP header and transport headers are inspected to do DNAT translation of the outer IP header and the embedded IP and transport headers. + +The ICMP NAT session timeout is 30 seconds as maintained by the Kernel. + +## 3.5 Docker for NAT +NatSyncd and NatMgrd daemons run in a separate docker named 'nat'. It can be stopped/started/restarted independently. When the NAT docker is stopped, the following cleanup is done: +- The iptable rules in the nat table and the conntrack tables are cleared. +- The NAT/NAPT entries in the APP_DB and the ASIC_DB are cleared. + +``` +root@sonic:/home/admin# docker ps +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +9ff843cc79fa docker-syncd-brcm:latest "/usr/bin/supervisord" About a minute ago Up About a minute syncd +042faedb7b4b docker-dhcp-relay:latest "/usr/bin/docker_iniâ¦" About a minute ago Up About a minute dhcp_relay +210f0ad0b776 docker-router-advertiser:latest "/usr/bin/supervisord" About a minute ago Up About a minute radv +b45498e6e705 docker-orchagent-brcm:latest "/usr/bin/supervisord" About a minute ago Up About a minute swss +a77dbe67649a docker-nat:latest "/usr/bin/supervisord" 2 minutes ago Up About a minute nat +0d3b86e300f9 docker-lldp-sv2:latest "/usr/bin/supervisord" 2 minutes ago Up About a minute lldp +f65865e90fd1 docker-platform-monitor:latest "/usr/bin/supervisord" 2 minutes ago Up About a minute pmon +4374ebb6e0e1 docker-teamd:latest "/usr/bin/supervisord" 2 minutes ago Up About a minute teamd +6244613730fb docker-fpm-frr:latest "/bin/sh -c '/usr/biâ¦" 2 minutes ago Up About a minute bgp +9ba9d3e63426 docker-database:latest "/usr/bin/supervisord" 2 minutes ago Up 2 minutes database +``` +## 3.6 SAI +Table shown below represents the SAI attributes which shall be used for NAT. + +SAI Spec @ [ NAT SAI Spec ](https://github.com/opencomputeproject/SAI/blob/master/doc/NAT/SAI-NAT-API.md) + +###### Table 2: NAT table SAI attributes +| NAT component | SAI attributes | +|--------------------------|-------------------------------------------------------| +| NAT feature | SAI_SWITCH_ATTR_NAT_ENABLE | +| NAT Entry | SAI_NAT_ENTRY_ATTR_NAT_TYPE
SAI_NAT_ENTRY_ATTR_SRC_IP
SAI_NAT_ENTRY_ATTR_DST_IP
SAI_NAT_ENTRY_ATTR_L4_SRC_PORT
SAI_NAT_ENTRY_ATTR_L4_DST_PORT
SAI_NAT_ENTRY_ATTR_ENABLE_PACKET_COUNT
SAI_NAT_ENTRY_ATTR_ENABLE_BYTE_COUNT | +| NAT Counter | SAI_NAT_ZONE_COUNTER_ATTR_NAT_TYPE
SAI_NAT_ZONE_COUNTER_ATTR_ZONE_ID
SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_DISCARD
SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_TRANSLATION_NEEDED
SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_TRANSLATIONS
SAI_NAT_ENTRY_ATTR_BYTE_COUNT
SAI_NAT_ENTRY_ATTR_PACKET_COUNT | +| NAT Hitbit | SAI_NAT_ENTRY_ATTR_HIT_BIT | + +NAT feature can be enabled/disabled at the switch level. By default the NAT feature is enabled in SONiC. +SAI_SWITCH_ATTR_NAT_ENABLE attribute can be set in the switch attributes using __set_switch_attribute__ API. + +The attribute SAI_ROUTER_INTERFACE_NAT_ZONE_ID is set in the sai_router_interface_attr_t using the __set_router_interface_attribute__ API to set the Zone ID value on the routing interface. + +The traps SAI_HOSTIF_TRAP_TYPE_SNAT_MISS and SAI_HOSTIF_TRAP_TYPE_DNAT_MISS are enabled on the host interfaces to trap the NAT miss packets to the CPU: + +For example, to create a bi-directional NAPT entry, NAT orchagent invokes the following SAI APIs with the necessary SAI attributes: + +``` +STATIC_NAPT|65.55.42.1|TCP|1024 + "local_ip" : "20.0.0.1" + "local_port" : "6000" + "nat_type" : "dnat" +``` +``` +/* Step 1: Create a Source NAPT Entry object: + * ---------------------------------------- */ +sai_attribute_t nat_entry_attr[10]; +nat_entry_t snat_entry; + +nat_entry_attr[0].id = SAI_NAT_ENTRY_ATTR_NAT_TYPE; +nat_entry_attr[0].value = SAI_NAT_TYPE_SOURCE_NAT; + +nat_entry_attr[1].id = SAI_NAT_ENTRY_ATTR_SRC_IP; +nat_entry_attr[1].value.u32 = 65.55.42.1; /* Corresponding address value */ + +nat_entry_attr[2].id = SAI_NAT_ENTRY_ATTR_L4_SRC_PORT; +nat_entry_attr[2].value.u16 = 1024; + +nat_entry_attr[3].id = SAI_NAT_ENTRY_ATTR_ENABLE_PACKET_COUNT; +nat_entry_attr[3].value.booldata = true; + +nat_entry_attr[4].id = SAI_NAT_ENTRY_ATTR_ENABLE_BYTE_COUNT; +nat_entry_attr[4].value.booldata = true; + +attr_count = 5; + +memset(&snat_entry, 0, sizeof(nat_entry)); + +snat_entry.data.key.src_ip = 20.0.0.1; +snat_entry.data.mask.src_ip = 0xffffffff; +snat_entry.data.key.l4_src_port = 6000; +snat_entry.data.mask.l4_src_port = 0xffff; +snat_entry.data.key.proto = 17; /* TCP */ +snat_entry.data.mask.proto = 0xff; + + +create_nat_entry(&snat_entry, attr_count, nat_entry_attr); + + +/* Step 2: Create a Destination NAPT Entry object: + * --------------------------------------------- */ +sai_attribute_t nat_entry_attr[10]; +nat_entry_t dnat_entry; + +nat_entry_attr[0].id = SAI_NAT_ENTRY_ATTR_NAT_TYPE; +nat_entry_attr[0].value = SAI_NAT_TYPE_DESTINATION_NAT; + +nat_entry_attr[1].id = SAI_NAT_ENTRY_ATTR_DST_IP; +nat_entry_attr[1].value.u32 = 20.0.0.1; /* Corresponding address value */ + +nat_entry_attr[2].id = SAI_NAT_ENTRY_ATTR_L4_DST_PORT; +nat_entry_attr[2].value.u16 = 6000; + +nat_entry_attr[3].id = SAI_NAT_ENTRY_ATTR_ENABLE_PACKET_COUNT; +nat_entry_attr[3].value.booldata = true; + +nat_entry_attr[4].id = SAI_NAT_ENTRY_ATTR_ENABLE_BYTE_COUNT; +nat_entry_attr[5].value.booldata = true; + +attr_count = 5; + +memset(&dnat_entry, 0, sizeof(nat_entry)); + +dnat_entry.data.key.dst_ip = 65.55.42.1; +dnat_entry.data.mask.dst_ip = 0xffffffff; +dnat_entry.data.key.l4_dst_port = 1024; +dnat_entry.data.mask.l4_dst_port = 0xffff; +dnat_entry.data.key.proto = 17; /* TCP */ +dnat_entry.data.mask.proto = 0xff; + +create_nat_entry(&dnat_entry, attr_count, nat_entry_attr); + +``` + +For example, to create a Twice NAPT entry, NAT orchagent invokes the following SAI APIs with the necessary SAI attributes: + +``` +STATIC_NAPT|138.76.28.1|TCP|1024 + "local_ip" : "200.200.200.1" + "local_port" : "6000" + "nat_type" : "dnat" + “twice_nat_id”: "100" + +STATIC_NAPT|200.200.200.100|TCP|3000 + "local_ip" : "172.16.1.100" + "local_port" : "5000" + "nat_type" : "snat" + “twice_nat_id”: "100" +``` +``` +/* Step 1: Create a Double NAPT entry in the 
forward direction: +* ----------------------------------------------------------- */ +sai_attribute_t nat_entry_attr[10]; +nat_entry_t dbl_nat_entry; + +nat_entry_attr[0].id = SAI_NAT_ENTRY_ATTR_NAT_TYPE; +nat_entry_attr[0].value = SAI_NAT_TYPE_DOUBLE_NAT; + +nat_entry_attr[1].id = SAI_NAT_ENTRY_ATTR_SRC_IP; +nat_entry_attr[1].value.u32 = 138.76.28.1; + +nat_entry_attr[2].id = SAI_NAT_ENTRY_ATTR_L4_SRC_PORT; +nat_entry_attr[2].value.u16 = 1024; + +nat_entry_attr[3].id = SAI_NAT_ENTRY_ATTR_DST_IP; +nat_entry_attr[3].value.u32 = 200.200.200.100; + +nat_entry_attr[4].id = SAI_NAT_ENTRY_ATTR_L4_DST_PORT; +nat_entry_attr[4].value.u16 = 3000; + +nat_entry_attr[5].id = SAI_NAT_ENTRY_ATTR_ENABLE_PACKET_COUNT; +nat_entry_attr[5].value.booldata = true; + +nat_entry_attr[6].id = SAI_NAT_ENTRY_ATTR_ENABLE_BYTE_COUNT; +nat_entry_attr[6].value.booldata = true; + +attr_count = 7; + +memset(&dbl_nat_etnry, 0, sizeof(nat_etnry)); + +dbl_nat_entry.data.key.src_ip = 200.200.200.1; +dbl_nat_entry.data.mask.src_ip = 0xffffffff; +dbl_nat_entry.data.key.proto = 17; /* TCP */ +dbl_nat_entry.data.key.l4_src_port = 6000; +dbl_nat_entry.data.mask.l4_src_port = 0xffff; +dbl_nat_entry.data.key.dst_ip = 172.16.1.100; +dbl_nat_entry.data.mask.dst_ip = 0xffffffff; +dbl_nat_entry.data.key.l4_dst_port = 5000; +dbl_nat_entry.data.mask.l4_dst_port = 0xffff; + +create_nat_entry(&dbl_nat_entry, attr_count, nat_entry_attr); + +/* Step 2: Create a Double NAPT entry for the reverse direction: +* ------------------------------------------------------------ */ +sai_attribute_t nat_entry_attr[10]; +nat_entry_t dbl_nat_entry; + +nat_entry_attr[0].id = SAI_NAT_ENTRY_ATTR_NAT_TYPE; +nat_entry_attr[0].value = SAI_NAT_TYPE_DOUBLE_NAT; + +nat_entry_attr[1].id = SAI_NAT_ENTRY_ATTR_SRC_IP; +nat_entry_attr[1].value.u32 = 172.16.1.100; + +nat_entry_attr[2].id = SAI_NAT_ENTRY_ATTR_L4_SRC_PORT; +nat_entry_attr[2].value.u16 = 5000; + +nat_entry_attr[3].id = SAI_NAT_ENTRY_ATTR_DST_IP; +nat_entry_attr[3].value.u32 = 200.200.200.1; + +nat_entry_attr[4].id = SAI_NAT_ENTRY_ATTR_L4_DST_PORT; +nat_entry_attr[4].value.u16 = 6000; + +nat_entry_attr[5].id = SAI_NAT_ENTRY_ATTR_ENABLE_PACKET_COUNT; +nat_entry_attr[5].value.booldata = true; + +nat_entry_attr[6].id = SAI_NAT_ENTRY_ATTR_ENABLE_BYTE_COUNT; +nat_entry_attr[6].value.booldata = true; + +attr_count = 7; + +memset(&dbl_nat_etnry, 0, sizeof(nat_etnry)); + +dbl_nat_entry.data.key.src_ip = 200.200.200.100; +dbl_nat_entry.data.mask.src_ip = 0xffffffff; +dbl_nat_entry.data.key.proto = 17; /* TCP */ +dbl_nat_entry.data.key.l4_src_port = 3000; +dbl_nat_entry.data.mask.l4_src_port = 0xffff; +dbl_nat_entry.data.key.dst_ip = 138.76.28.1; +dbl_nat_entry.data.mask.dst_ip = 0xffffffff; +dbl_nat_entry.data.key.l4_dst_port = 1024; +dbl_nat_entry.data.mask.l4_dst_port = 0xffff; + +create_nat_entry(&dbl_nat_entry, attr_count, nat_entry_attr); + +``` +For example, to create a Counter entry per Zone per NAT type, NAT orchagent invokes the following SAI APIs with the necessary SAI attributes: + +``` +/* Step 1: Create a NAT Zone Counter Object for Source NAT Zone 1: +* ------------------------------------------------------------------- */ +sai_attribute_t nat_zone_counter_attr[10]; + +nat_zone_counter_attr[0].id = SAI_NAT_ZONE_COUNTER_ATTR_NAT_TYPE; +nat_zone_counter _attr[0].value = SAI_NAT_TYPE_SOURCE_NAT; + +nat_zone_counter_attr[1].id = SAI_NAT_ZONE_COUNTER_ATTR_ZONE_ID; +nat_zone_counter_attr[1].value.u32 = 1; + +nat_zone_counter_attr[2].id = SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_TRANSLATION_NEEDED; 
+nat_zone_counter_attr[2].value.booldata = true; + +nat_zone_counter_attr[3].id = SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_DISCARD; +nat_zone_counter_attr[3].value.booldata = true; + +nat_zone_counter_attr[4].id = SAI_NAT_ZONE_COUNTER_ATTR_ENABLE_TRANSLATIONS; +nat_zone_counter_attr[4].value.booldata = true; + +attr_count = 5; + +nat_zone100_counter_id = +create_nat_zone_counter(switch_id, attr_count, nat_zone_counter_attr); +``` +To read the hit-bit status and counters for a given NAT entry, the following SAI API is called. +``` +/* Step 1: Read SNAT entry hit bit: +* ----------------------------------------------------------- */ + +sai_attribute_t nat_entry_attr[10]; + +nat_entry_attr[0].id = SAI_NAT_ENTRY_ATTR_HIT_BIT; + +nat_entry_attr[1].id = SAI_NAT_ENTRY_ATTR_BYTE_COUNT; + +nat_entry_attr[2].id = SAI_NAT_ENTRY_ATTR_PACKET_COUNT; + +attr_count = 3; + +get_nat_entry_attributes(snat_entry, attr_count, nat_entry_attr); + +``` + +## 3.7 Statistics +The following NAT counters are applicable per zone: + +DNAT_DISCARDS/SNAT_DISCARDS – If Packet is not TCP/UDP and/or is a fragmentated IP packet. +DNAT_TRANSLATION_NEEDED/SNAT_TRANSLATION_NEEDED – If there is NAT table lookup miss for TCP/UDP packets. +DNAT_TRANSLATIONS/SNAT_TRANSLATIONS – If NAT table lookup is hit. + +The following counters are provided per NAT/NAPT entry: + +Translated packets - Number of packets translated using the NAT/NAPT entry. +Translated bytes - Number of bytes translated using the NAT/NAPT entry. + +## 3.8 CLI + +### 3.8.1 Data Models +N/A + +### 3.8.2 Config CLI commands +| Command | Description | +|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------| +| config nat add static basic {global-ip} {local-ip} -nat_type {snat/dnat} -twice_nat_id {value} | Use this command to add basic static NAT entry | +| config nat remove static basic {global-ip} {local-ip} | Use this command to remove basic static NAT entry | +| config nat add static {tcp \| udp} {global-ip} {global-port} {local-ip} {local-port} -nat_type {snat/dnat} -twice_nat_id {value} | Use this command to add a static NAPT entry | +| config nat remove static {tcp \| udp} {global-ip} {global-port} {local-ip} {local-port} | Use this command to remove a static NAPT entry | +| config nat remove static all | Use this command to remove all the static NAT/NAPT configuration | +| config nat add pool {pool-name} {global-ip-range} {global-port-range} | Use this command to create a NAT pool | +| config nat remove pool {pool-name} | Use this command to remove a NAT pool | +| config nat remove pools | Use this command to remove all the NAT pool configuration | +| config nat add binding {binding-name} {pool-name} {acl-name} -nat_type {snat/dnat} -twice_nat_id {value} | Create a binding between an ACL and a NAT pool | +| config nat remove binding {binding-name} | Remove a binding between an ACL and a NAT pool | +| config nat remove bindings | Use this command to remove all the NAT binding configuration | +| config nat add interface {interface-name} {-nat_zone {zone-value}} | Use this command to configure the NAT zone value on an interface | +| config nat remove interface {interface-name} | Use this command to remove the NAT configuration on the interface | +| config nat remove interfaces | Use this command to remove the NAT configuration on all the L3 interfaces | +| config nat set timeout {secs} | Use this command to configure the Basic NAT entry aging timeout in seconds. 
| +| config nat reset timeout | Use this command to reset the Basic NAT entry aging timeout to default value. | +| config nat feature {enable/disable} | Use this command to enable or disable the NAT feature. | +| config nat set udp-timeout {secs} | Use this command to configure the UDP NAT entry aging timeout in seconds. | +| config nat reset udp-timeout | Use this command to reset the UDP NAT entry aging timeout to default value. | +| config nat set tcp-timeout {secs} | Use this command to configure the TCP NAT entry aging timeout in seconds. | +| config nat reset tcp-timeout | Use this command to reset the TCP NAT entry aging timeout to default value. | + +### 3.8.3 Show CLI commands +| Command | Description | +|:----------------------|:-----------------------------------------------------------| +| show nat translations | Use this command to show the NAT translations table | +| show nat statistics | Use this command to display the NAT translation statistics | +| show nat config static| Use this command to display the Static NAT/NAPT configuration | +| show nat config pool | Use this command to display the NAT pools configuration | +| show nat config bindings | Use this command to display the NAT bindings configuration | +| show nat config globalvalues | Use this command to display the global NAT configuration | +| show nat config zones | Use this command to display the L3 interface zone values | +| show nat translations count | Use this command to display the NAT entries count | +| show nat config | Use this command to display all the NAT configuration | + +Example: +``` +Router#show nat translations + +Static NAT Entries ................. 4 +Static NAPT Entries ................. 2 +Dynamic NAT Entries ................. 0 +Dynamic NAPT Entries ................. 4 +Static Twice NAT Entries ................. 0 +Static Twice NAPT Entries ................. 4 +Dynamic Twice NAT Entries ................ 0 +Dynamic Twice NAPT Entries ................ 0 +Total SNAT/SNAPT Entries ................ 9 +Total DNAT/DNAPT Entries ................ 9 +Total Entries ................ 14 + + +Protocol Source Destination Translated Source Translated Destination +-------- --------- -------------- ----------------- ---------------------- +all 10.0.0.1 --- 65.55.42.2 --- +all --- 65.55.42.2 --- 10.0.0.1 +all 10.0.0.2 --- 65.55.42.3 --- +all --- 65.55.42.3 --- 10.0.0.2 +tcp 20.0.0.1:4500 --- 65.55.42.1:2000 --- +tcp --- 65.55.42.1:2000 --- 20.0.0.1:4500 +udp 20.0.0.1:4000 --- 65.55.42.1:1030 --- +udp --- 65.55.42.1:1030 --- 20.0.0.1:4000 +tcp 20.0.0.1:6000 --- 65.55.42.1:1024 --- +tcp --- 65.55.42.1:1024 --- 20.0.0.1:6000 +tcp 20.0.0.1:5000 65.55.42.1:2000 65.55.42.1:1025 20.0.0.1:4500 +tcp 20.0.0.1:4500 65.55.42.1:1025 65.55.42.1:2000 20.0.0.1:5000 +tcp 20.0.0.1:5500 65.55.42.1:2000 65.55.42.1:1026 20.0.0.1:4500 +tcp 20.0.0.1:4500 65.55.42.1:1026 65.55.42.1:2000 20.0.0.1:5500 + +Router#show nat translations count + +Static NAT Entries ................. 4 +Static NAPT Entries ................. 2 +Dynamic NAT Entries ................. 0 +Dynamic NAPT Entries ................. 4 +Static Twice NAT Entries ................. 0 +Static Twice NAPT Entries ................. 4 +Dynamic Twice NAT Entries ................ 0 +Dynamic Twice NAPT Entries ................ 0 +Total SNAT/SNAPT Entries ................ 9 +Total DNAT/DNAPT Entries ................ 9 +Total Entries ................ 
14 + +Router#show nat statistics + +Protocol Source Destination Packets Bytes +-------- --------- -------------- ------------- ------------- +all 10.0.0.1 --- 802 1009280 +all 10.0.0.2 --- 23 5590 +tcp 20.0.0.1:4500 --- 110 12460 +udp 20.0.0.1:4000 --- 1156 789028 +tcp 20.0.0.1:6000 --- 30 34800 +tcp 20.0.0.1:5000 65.55.42.1:2000 128 110204 +tcp 20.0.0.1:5500 65.55.42.1:2000 8 3806 + + +Router#show nat config static + +Nat Type IP Protocol Global IP Global L4 Port Local IP Local L4 Port Twice-Nat Id +-------- ----------- ------------ -------------- ------------- ------------- ------------ +dnat all 65.55.45.5 --- 10.0.0.1 --- --- +dnat all 65.55.45.6 --- 10.0.0.2 --- --- +dnat tcp 65.55.45.7 2000 20.0.0.1 4500 1 +snat tcp 20.0.0.2 4000 65.55.45.8 1030 1 + +Router#show nat config pool + +Pool Name Global IP Range Global L4 Port Range +------------ ------------------------- -------------------- +Pool1 65.55.45.5 1024-65535 +Pool2 65.55.45.6-65.55.45.8 --- +Pool3 65.55.45.10-65.55.45.15 500-1000 + +Router#show nat config bindings + +Binding Name Pool Name Access-List Nat Type Twice-Nat Id +------------ ------------ ------------ -------- ------------ +Bind1 Pool1 --- snat --- +Bind2 Pool2 1 snat 1 +Bind3 Pool3 2 snat -- + + +Router#show nat config globalvalues + + Admin Mode : enabled + Global Timeout : 600 secs + TCP Timeout : 86400 secs + UDP Timeout : 300 secs + +``` +### 3.8.4 Clear commands + + +| Command | Description | +|:-----------------------------|:--------------------------------------------------------------------------| +| sonic-clear nat translations | Use this command to clear the NAT entries from the system. | +| sonic-clear nat statistics | Use this command to clear the statistics of NAT operations in the system. | + +### 3.8.5 Debug commands +Debug commands will be available once the debug framework is approved. + +Debug command 'show debug natorch' dumps the below information of NatOrch in the file /var/log/natorch_debug.log: +- Internal cache of NAT, NAPT, Twice NAT, Twice NAPT entries +- Nexthop resolution cache for the DNAT entries. 
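+
+In addition to the CLI and debug dumps above, the per-entry NAT counters described in section 3.2.5 can be read directly from the COUNTERS_DB for troubleshooting. Below is a minimal Python sketch, assuming the local redis instance on port 6379, the default COUNTERS_DB index of 2 and a hypothetical NAPT entry key.
+
+```
+# Hedged troubleshooting sketch: read the per-entry NAT counters (section 3.2.5)
+# directly from the COUNTERS_DB. The entry key below is hypothetical.
+import redis
+
+counters_db = redis.Redis(host="127.0.0.1", port=6379, db=2, decode_responses=True)
+
+# Key format follows section 3.2.5: COUNTERS_NAPT:<proto>:<ip>:<l4_port>
+key = "COUNTERS_NAPT:TCP:65.55.42.1:1024"
+entry = counters_db.hgetall(key)
+
+for field in ("DNAT_TRANSLATIONS_PKTS", "DNAT_TRANSLATIONS_BYTES",
+              "SNAT_TRANSLATIONS_PKTS", "SNAT_TRANSLATIONS_BYTES"):
+    print(f"{field}: {entry.get(field, '0')}")
+```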
+
+### 3.8.6 REST API Support
+N/A
+
+### 3.8.7 Example configuration
+
+#### ConfigDB objects
+
+```
+{
+    "STATIC_NAPT": {
+        "65.55.42.2:TCP:1024": {
+            "local_ip": "20.0.0.1",
+            "local_port": 6000,
+            "nat_type": "dnat"
+        }
+    },
+    "STATIC_NAT": {
+        "65.55.42.3": {
+            "local_ip": "20.0.0.3",
+            "nat_type": "dnat"
+        }
+    },
+
+    "ACL_TABLE": {
+        "10": {
+            "stage": "INGRESS",
+            "type": "L3",
+            "policy_desc": "nat-acl",
+            "ports": "Vlan2000"
+        }
+    },
+
+    "ACL_RULE": {
+        "10|1": {
+            "PRIORITY": "20",
+            "SRC_IP": "20.0.1.0/24",
+            "PACKET_ACTION": "do_not_nat"
+        },
+        "10|2": {
+            "PRIORITY": "10",
+            "SRC_IP": "20.0.0.0/16",
+            "PACKET_ACTION": "forward"
+        }
+    },
+
+    "NAT_POOL": {
+        "pool1": {
+            "nat_ip": "65.55.42.1",
+            "nat_port": "1024-65535"
+        }
+    },
+
+    "NAT_BINDINGS": {
+        "nat1": {
+            "access_list": "10",
+            "nat_pool": "pool1"
+        }
+    },
+
+    "NAT_GLOBAL": {
+        "Values": {
+            "admin_mode": "enable",
+            "nat_timeout": 600,
+            "nat_tcp_timeout": 1200,
+            "nat_udp_timeout": 300
+        }
+    }
+}
+```
+#### APPDB Objects
+
+For single NAT entries:
+```
+{
+    "NAT_TABLE:65.55.42.3": {
+        "translated_ip": "20.0.0.3",
+        "nat_type": "dnat",
+        "entry_type": "static"
+    },
+    "NAT_TABLE:20.0.0.3": {
+        "translated_ip": "65.55.42.3",
+        "nat_type": "snat",
+        "entry_type": "static"
+    },
+    "NAPT_TABLE:TCP:20.0.0.4:6003": {
+        "translated_ip": "65.55.42.1",
+        "translated_l4_port": "1026",
+        "nat_type": "snat",
+        "entry_type": "dynamic"
+    },
+    "NAPT_TABLE:TCP:65.55.42.1:1026": {
+        "translated_ip": "20.0.0.4",
+        "translated_l4_port": "6003",
+        "nat_type": "dnat",
+        "entry_type": "dynamic"
+    }
+}
+```
+For twice NAT entries:
+```
+{
+    "NAPT_TWICE_TABLE:TCP:20.0.0.6:6004:65.55.42.1:1030": {
+        "translated_src_ip": "65.55.42.1",
+        "translated_src_l4_port": "1031",
+        "translated_dst_ip": "20.0.0.7",
+        "translated_dst_l4_port": "6005",
+        "entry_type": "dynamic"
+    },
+    "NAPT_TWICE_TABLE:TCP:20.0.0.8:6004:65.55.42.1:1032": {
+        "translated_src_ip": "65.55.42.1",
+        "translated_src_l4_port": "1034",
+        "translated_dst_ip": "20.0.0.9",
+        "translated_dst_l4_port": "6006",
+        "entry_type": "static"
+    }
+}
+```
+
+# 4 Flow Diagrams
+
+## 4.1 Static NAPT configuration flow
+
+![Static NAPT configuration](images/static_napt_config_flow.png)
+
+## 4.2 Dynamic NAPT configuration flow
+
+![Dynamic NAPT entry creation](images/dynamic_napt_entry_creation_flow.png)
+
+## 4.3 Dynamic NAPT entry aging flow
+
+![Dynamic NAPT entry aging](images/nat_entry_aging_flow.png)
+
+# 5 Serviceability and Debug
+The logging utility swssloglevel is used to set the log level of the NAT daemons like NatMgrd and NatSyncd.
+Logging enables dumping the traces for different events, such as:
+- When NatMgrd receives configuration events from CONFIG_DB.
+- When NatMgrd is programming the iptables rules in the Linux Kernel.
+- When NatMgrd is pushing the Static NAT/NAPT entries into the APP_DB.
+- When NatSyncd is receiving notifications from the Kernel via conntrack netlink.
+- When NatSyncd is pushing the dynamic NAPT entries into the APP_DB.
+- When NatOrch is receiving notifications from the APP_DB.
+- When NatOrch is pushing the NAT/NAPT entries into the ASIC_DB.
+
+# 6 Warm Boot Support
+The traffic corresponding to the NAT translation sessions should not be disturbed during the warm reboot process.
+When a planned warm restart is initiated:
+- The NAT entries in the conntrack table in the kernel are saved into a nat_entries.dump file.
+- All the dynamic NAT entries in the APP_DB are saved and restored in the APP_DB as part of warm reboot's Redis DB restore process. +- A python script 'restore_nat_entries.py' is started by the supervisord in the 'nat' docker startup after warm reboot. This script restores all the NAT entries from the nat_entries.dump file into the Linux Kernel's conntrack table and sets the 'restored' flag in the NAT_RESTORE_TABLE of STATE_DB. +- Once the NAT_RESTORE_TABLE 'restored' flag is set, the Natsyncd repopulates from the netlink dump of the conntrack table into the internal cache map that has the entries read from APP_DB . +- Natsyncd starts the reconciliation (deleting stale NAT entries and adding new NAT entries) into the APP_DB. + +Only if the L3 Route/Neighbor/Nexthop entries are restored properly during the warm restart, does the NAT warm restart work properly without traffic loss. + +Warm boot is supported for the NAT feature at the docker level and at the system level. + +# 7 Scalability + +###### Table 3: Scaling limits +|Name | Scaling value | +|--------------------------|------------------| +| Number of NAT entries | 1024 | +| Number of NAT bindings | 16 | + +# 8 Unit Test + +The Unit test case one-liners are as below: + +| S.No | Test case summary | +|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| 1 | Verify that the STATIC_NAPT NAT_POOL NAT_BINDINGS configuration from CONFIG_DB are received by natmgrd. | +| 2 | Verify that the static DNAT entry creations and deletions done via incremental CLI are received by natmgrd. | +| 3 | Verify that the static DNAT entry from config is pushed by natmgrd to APP_DB. | +| 4 | Verify that the iptable rule for the static DNAT entry is pushed by natmgrd to the kernel. | +| 5 | Verify that the iptable rules in the kernel are correctly populated for the 'ACL to NAT Pool' binding entries by the natmgrd. | +| 6 | Verify that the Orchagent is receiving static NAT entry creation from APP_DB. | +| 7 | Verify that the Orchagent is pushing the NAT entries into ASIC_DB by checking the contents in the ASIC_DB. | +| 8 | Verify that Orchagent is removing the inactive dynamic entries from ASIC_DB. | +| 9 | Verify that the conntrack entries for the inactive dynamic entries are removed from the kernel. | +| 10 | Verify that the NAT entries are programmed in the hardware. | +| 11 | Verify that the translation is happening in the hardware for the entries programmed by sending the traffic. | +| 12 | Verify in the hardware that the per entry NAT translation statistics packets and bytes are incrementing properly. | +| 13 | Verify that the NAT misses are reported by the hardware by sending new source outbound traffic. | +| 14 | Verify that a dynamic SNAT entry creation is notified via netlink to natsyncd when a new source outbound traffic is notified as NAT miss by hardware. | +| 15 | Verify that IP protocol type is matched in translation. For eg only the tcp traffic is port translated in the inbound direction by sending both udp and tcp flows when only static tcp nat is configured. | +| 16 | Verify that the active NAT entries are not timed out by the Orchagent. | +| 17 | Verify that the static NAT entries are not timed out though they are inactive. | +| 18 | Verify the inactivity timeout with different configured timeouts. 
| +| 19 | Verify that the NAT zone configuration is propagated to hardware. | +| 20 | Verify the outbound NAT translations to be working with traffic on VLAN or Ethernet or Port Channel L3 interfaces. | +| 21 | Verify that the dynamic NAPT translations are applied only on the traffic permitted by the ACL in the binding and not applied on the traffic that are 'do_no_nat'. | +| 22 | Verify that the static NAPT entries are successfuly created in CONFIG_DB. | +| 23 | Verify that the NAT pool config and the ACL association of the pool are configured and added to CONFIG_DB. | +| 24 | Verify that the static twice NAT/NAPT entries are successfully created in the APP_DB. | +| 25 | Verify that the traffic flows are double NAT'ted for the twice NAT/NAPT entries. | +| 26 | Verify the full cone NAT behavior. All outbound connections from the same internal IP+port are translated to the same IP+port. All inbound connections to the same IP+port are translated to the same internal IP_port. | +| 27 | Verify that new flows sent after the pool is exhausted are not NAT'ted and are dropped. | +| 28 | Verify that the NAT translations and functionality works only if the NAT feature is enabled globally. | +| 29 | Verify the NAT entries are displayed in 'show nat translations' command output. | +| 30 | Verify that the dynamic NAT entries are cleared in APP_DB, conntrack tables in kernel and in ASIC_DB after clear command using 'sonic-clear nat translations'. | +| 31 | Verify that the statistics are displaying the NAT translations packets and bytes for all entries. | +| 32 | Verify that the statistics are cleared on issuing the 'sonic-clear nat statistics' command. | +| 33 | Stop NAT docker and verify that any new dynamic NAT/NAPT entries are no longer added to hardware. | +| 34 | Start NAT docker and verify that the static NAT entries from CONFIG_DB are added in the kernel, APP_DB, ASIC_DB and any new dynamic entries are added to hardware. | +| 35 | Verify that the traffic flows that are NAT translated are not affected during warm restart. Zero packet loss and successful reconcilation. | +| 36 | Verify that the dynamic NAT translations are restored to APP_DB and the kernel after warm reboot. | +| 37 | Send up to 1024 outbound traffic flows that trigger creation of 1024 dynamic NAT entries. | +| 38 | Send more than 1024 outbound traffic flows and check that the NAT entry creations beyond 1024 entries are not created. | +| 39 | Verify that timed-out entries are creating space for new NAT entries and again limited to 1024 maximum entries. | +| 40 | Verify scaling beyond table limits and ensure the 'Table full' condition is reported by Nat OrchAgent. | +| 41 | NatMgrd handles errors while programming iptables rules in the kernel by logging the error log messages. | +| 42 | Delete static NAT/NAPT entry and verify if it is deleted from the APP_DB, kernel and the hardware. | +| 43 | Modify the ACL rules and verify that the iptable rules are updated in the kernel. | +| 44 | Error log when errors are seen in receiving netlink messages from conntrack by NatSyncd (for the dynamic NAPT entry notifications from the kernel). | +| 45 | Error messages are logged if NatOrch gets errors when writing to ASIC_DB. | +| 46 | Verify that the received NAT source miss and destination miss packets are 'trapped' to CPU on the right COS queue (Queue 3). The queue assignment should be lower than protocol queues and higher than broadcast packets. 
| +| 47 | Verify that the NAT source miss and destination miss packets are 'rate limited' to CPU (600pps). | +| 48 | Verify that dynamic NAT entries are learned even during a BCAST/MCAST storm (at line rate). | +| 49 | Verify the tracing of natmgrd at different log levels. | +| 50 | Verify the tracing of natorch at different log levels. | +| 51 | Execute the debug dump commands for dumping the internal operational data/state of NatOrch. | +| 52 | Verify NAT happens on ICMP traffic like ping, traceroute traffic. | +| 53 | Verify NAT happens on TCP traffic like ssh, telnet and UDP traffic like TFTP. | + +# 9 To be done in future release + +Features planned for future releases: + +- Hairpinning traffic with NAT +- VRF aware NAT +- Per Zone counters +- Removing entries when ACL action is modified from forward to do-not-nat +- Dynamic Destination NAT/NAPT based on the Pool and ACL bindings +- Dynamic NAPT for protocol types other than TCP/UDP/ICMP +- NAT64 to translate traffic between IPv6 and IPv4 hosts +- Subnet based NAT +- NAT on fragmented IP packets arriving at the NAT router +- Error handling of the failed NAT entries in the hardware +- ALG support diff --git a/doc/Barefoot-SAI-Workshop-03-21-2018-v4.pdf b/doc/ocp/201803-SAI-SONIC/Barefoot-SAI-Workshop-03-21-2018-v4.pdf similarity index 100% rename from doc/Barefoot-SAI-Workshop-03-21-2018-v4.pdf rename to doc/ocp/201803-SAI-SONIC/Barefoot-SAI-Workshop-03-21-2018-v4.pdf diff --git a/doc/SONIC Network Telemetry-final.pdf b/doc/ocp/201803-SAI-SONIC/SONIC Network Telemetry-final.pdf similarity index 100% rename from doc/SONIC Network Telemetry-final.pdf rename to doc/ocp/201803-SAI-SONIC/SONIC Network Telemetry-final.pdf diff --git a/doc/SONiC OCP2018 WAN.pdf b/doc/ocp/201803-SAI-SONIC/SONiC OCP2018 WAN.pdf similarity index 100% rename from doc/SONiC OCP2018 WAN.pdf rename to doc/ocp/201803-SAI-SONIC/SONiC OCP2018 WAN.pdf diff --git a/doc/SONiC SAI workshop.pdf b/doc/ocp/201803-SAI-SONIC/SONiC SAI workshop.pdf similarity index 100% rename from doc/SONiC SAI workshop.pdf rename to doc/ocp/201803-SAI-SONIC/SONiC SAI workshop.pdf diff --git a/doc/Sonic_workshop_config_model.pdf b/doc/ocp/201803-SAI-SONIC/Sonic_workshop_config_model.pdf similarity index 100% rename from doc/Sonic_workshop_config_model.pdf rename to doc/ocp/201803-SAI-SONIC/Sonic_workshop_config_model.pdf diff --git a/doc/OCP_Workshop_SONiC_201808.pdf b/doc/ocp/201808-SONIC/OCP_Workshop_SONiC_201808.pdf similarity index 100% rename from doc/OCP_Workshop_SONiC_201808.pdf rename to doc/ocp/201808-SONIC/OCP_Workshop_SONiC_201808.pdf diff --git a/doc/SONiC_OCP_2018_Rodny.pdf b/doc/ocp/201808-SONIC/SONiC_OCP_2018_Rodny.pdf similarity index 100% rename from doc/SONiC_OCP_2018_Rodny.pdf rename to doc/ocp/201808-SONIC/SONiC_OCP_2018_Rodny.pdf diff --git a/doc/SONiC_Platform_Management_Services_OCP_2018-08-30.pdf b/doc/ocp/201808-SONIC/SONiC_Platform_Management_Services_OCP_2018-08-30.pdf similarity index 100% rename from doc/SONiC_Platform_Management_Services_OCP_2018-08-30.pdf rename to doc/ocp/201808-SONIC/SONiC_Platform_Management_Services_OCP_2018-08-30.pdf diff --git a/doc/network_telemetry_in_SONiC(2018-08).pdf b/doc/ocp/201808-SONIC/network_telemetry_in_SONiC(2018-08).pdf similarity index 100% rename from doc/network_telemetry_in_SONiC(2018-08).pdf rename to doc/ocp/201808-SONIC/network_telemetry_in_SONiC(2018-08).pdf diff --git a/doc/SONiC Deployment - Alibaba.pdf b/doc/ocp/201810-SONIC/SONiC Deployment - Alibaba.pdf similarity index 100% rename from doc/SONiC Deployment - 
Alibaba.pdf rename to doc/ocp/201810-SONIC/SONiC Deployment - Alibaba.pdf diff --git a/doc/SONiC Intro and Roadmap.pdf b/doc/ocp/201810-SONIC/SONiC Intro and Roadmap.pdf similarity index 100% rename from doc/SONiC Intro and Roadmap.pdf rename to doc/ocp/201810-SONIC/SONiC Intro and Roadmap.pdf diff --git a/doc/SONiC Powered by Programmable Dataplane - Barefoot.pdf b/doc/ocp/201810-SONIC/SONiC Powered by Programmable Dataplane - Barefoot.pdf similarity index 100% rename from doc/SONiC Powered by Programmable Dataplane - Barefoot.pdf rename to doc/ocp/201810-SONIC/SONiC Powered by Programmable Dataplane - Barefoot.pdf diff --git "a/doc/SONiC\302\240Architect adn Platform\302\240Management Service - MSFT.pdf" "b/doc/ocp/201810-SONIC/SONiC\302\240Architect adn Platform\302\240Management Service - MSFT.pdf" similarity index 100% rename from "doc/SONiC\302\240Architect adn Platform\302\240Management Service - MSFT.pdf" rename to "doc/ocp/201810-SONIC/SONiC\302\240Architect adn Platform\302\240Management Service - MSFT.pdf" diff --git a/doc/Sonic CLI Framework and Practice - Alibaba.pdf b/doc/ocp/201810-SONIC/Sonic CLI Framework and Practice - Alibaba.pdf similarity index 100% rename from doc/Sonic CLI Framework and Practice - Alibaba.pdf rename to doc/ocp/201810-SONIC/Sonic CLI Framework and Practice - Alibaba.pdf diff --git a/doc/Taking SONiC Beyond BGP with Mellanox Spectrum - MLNX.pdf b/doc/ocp/201810-SONIC/Taking SONiC Beyond BGP with Mellanox Spectrum - MLNX.pdf similarity index 100% rename from doc/Taking SONiC Beyond BGP with Mellanox Spectrum - MLNX.pdf rename to doc/ocp/201810-SONIC/Taking SONiC Beyond BGP with Mellanox Spectrum - MLNX.pdf diff --git a/doc/2019-OCP-hackathon-Kubers.pdf b/doc/ocp/201903-SONIC/hackathon/2019-OCP-hackathon-Kubers.pdf similarity index 100% rename from doc/2019-OCP-hackathon-Kubers.pdf rename to doc/ocp/201903-SONIC/hackathon/2019-OCP-hackathon-Kubers.pdf diff --git a/doc/Aviz_AIMS_SONiC_Hackathon_v3.0.pdf b/doc/ocp/201903-SONIC/hackathon/Aviz_AIMS_SONiC_Hackathon_v3.0.pdf similarity index 100% rename from doc/Aviz_AIMS_SONiC_Hackathon_v3.0.pdf rename to doc/ocp/201903-SONIC/hackathon/Aviz_AIMS_SONiC_Hackathon_v3.0.pdf diff --git a/doc/Cape-tain Johan Hackathon.pdf b/doc/ocp/201903-SONIC/hackathon/Cape-tain Johan Hackathon.pdf similarity index 100% rename from doc/Cape-tain Johan Hackathon.pdf rename to doc/ocp/201903-SONIC/hackathon/Cape-tain Johan Hackathon.pdf diff --git a/doc/ConfigValidator Hackathon.pdf b/doc/ocp/201903-SONIC/hackathon/ConfigValidator Hackathon.pdf similarity index 100% rename from doc/ConfigValidator Hackathon.pdf rename to doc/ocp/201903-SONIC/hackathon/ConfigValidator Hackathon.pdf diff --git a/doc/Hackathon Environment.pdf b/doc/ocp/201903-SONIC/hackathon/Hackathon Environment.pdf similarity index 100% rename from doc/Hackathon Environment.pdf rename to doc/ocp/201903-SONIC/hackathon/Hackathon Environment.pdf diff --git a/doc/INV_robustness monitor Hackathon.pdf b/doc/ocp/201903-SONIC/hackathon/INV_robustness monitor Hackathon.pdf similarity index 100% rename from doc/INV_robustness monitor Hackathon.pdf rename to doc/ocp/201903-SONIC/hackathon/INV_robustness monitor Hackathon.pdf diff --git a/doc/OCP-2019-Hackathon-SONiC-WeekendWarriors.pdf b/doc/ocp/201903-SONIC/hackathon/OCP-2019-Hackathon-SONiC-WeekendWarriors.pdf similarity index 100% rename from doc/OCP-2019-Hackathon-SONiC-WeekendWarriors.pdf rename to doc/ocp/201903-SONIC/hackathon/OCP-2019-Hackathon-SONiC-WeekendWarriors.pdf diff --git a/doc/SONiC_webnms Hackathon.pdf 
b/doc/ocp/201903-SONIC/hackathon/SONiC_webnms Hackathon.pdf similarity index 100% rename from doc/SONiC_webnms Hackathon.pdf rename to doc/ocp/201903-SONIC/hackathon/SONiC_webnms Hackathon.pdf diff --git a/doc/Virtual Switch X Hackathon.pdf b/doc/ocp/201903-SONIC/hackathon/Virtual Switch X Hackathon.pdf similarity index 100% rename from doc/Virtual Switch X Hackathon.pdf rename to doc/ocp/201903-SONIC/hackathon/Virtual Switch X Hackathon.pdf diff --git a/doc/Cloud-grade Routing as a Micro-service for Open Networking Platforms - Juniper.pdf b/doc/ocp/201903-SONIC/workshop/Cloud-grade Routing as a Micro-service for Open Networking Platforms - Juniper.pdf similarity index 100% rename from doc/Cloud-grade Routing as a Micro-service for Open Networking Platforms - Juniper.pdf rename to doc/ocp/201903-SONIC/workshop/Cloud-grade Routing as a Micro-service for Open Networking Platforms - Juniper.pdf diff --git a/doc/Developer's Overview of SONiC - LNKD.pdf b/doc/ocp/201903-SONIC/workshop/Developer's Overview of SONiC - LNKD.pdf similarity index 100% rename from doc/Developer's Overview of SONiC - LNKD.pdf rename to doc/ocp/201903-SONIC/workshop/Developer's Overview of SONiC - LNKD.pdf diff --git a/doc/Opportunities and Obstacles in Open Source Networking - Dell.pdf b/doc/ocp/201903-SONIC/workshop/Opportunities and Obstacles in Open Source Networking - Dell.pdf similarity index 100% rename from doc/Opportunities and Obstacles in Open Source Networking - Dell.pdf rename to doc/ocp/201903-SONIC/workshop/Opportunities and Obstacles in Open Source Networking - Dell.pdf diff --git a/doc/SONiC Dataplane Emulation Alibaba.pdf b/doc/ocp/201903-SONIC/workshop/SONiC Dataplane Emulation Alibaba.pdf similarity index 100% rename from doc/SONiC Dataplane Emulation Alibaba.pdf rename to doc/ocp/201903-SONIC/workshop/SONiC Dataplane Emulation Alibaba.pdf diff --git a/doc/SONiC Extension Infrastructure - MLNX.pdf b/doc/ocp/201903-SONIC/workshop/SONiC Extension Infrastructure - MLNX.pdf similarity index 100% rename from doc/SONiC Extension Infrastructure - MLNX.pdf rename to doc/ocp/201903-SONIC/workshop/SONiC Extension Infrastructure - MLNX.pdf diff --git a/doc/SONiC in Azure Mission Critical Application - MSFT.pdf b/doc/ocp/201903-SONIC/workshop/SONiC in Azure Mission Critical Application - MSFT.pdf similarity index 100% rename from doc/SONiC in Azure Mission Critical Application - MSFT.pdf rename to doc/ocp/201903-SONIC/workshop/SONiC in Azure Mission Critical Application - MSFT.pdf diff --git a/doc/SONiC unit test and function test enhancement - Edgecore.pdf b/doc/ocp/201903-SONIC/workshop/SONiC unit test and function test enhancement - Edgecore.pdf similarity index 100% rename from doc/SONiC unit test and function test enhancement - Edgecore.pdf rename to doc/ocp/201903-SONIC/workshop/SONiC unit test and function test enhancement - Edgecore.pdf diff --git a/doc/Self-Healing Network Linkedin Keynote.pdf b/doc/ocp/201903-SONIC/workshop/Self-Healing Network Linkedin Keynote.pdf similarity index 100% rename from doc/Self-Healing Network Linkedin Keynote.pdf rename to doc/ocp/201903-SONIC/workshop/Self-Healing Network Linkedin Keynote.pdf diff --git a/doc/Using NBI Image for SONiC System - Cisco.pdf b/doc/ocp/201903-SONIC/workshop/Using NBI Image for SONiC System - Cisco.pdf similarity index 100% rename from doc/Using NBI Image for SONiC System - Cisco.pdf rename to doc/ocp/201903-SONIC/workshop/Using NBI Image for SONiC System - Cisco.pdf diff --git a/doc/platform/brcm_pdk_pddf.md b/doc/platform/brcm_pdk_pddf.md 
new file mode 100644 index 0000000000..f3151d3245 --- /dev/null +++ b/doc/platform/brcm_pdk_pddf.md @@ -0,0 +1,1170 @@ +## Feature Name +Platform Driver Development Framework (PDDF) + +## High Level Design Document +**Rev 0.1** + +## Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + * [Requirements Overview](#requirements-overview) + * [Functional Requirements](#functional-requirements) + * [Scalability Requirements](#scalability-requirements) + * [Warmboot Requirements](#warmboot-requirements) + * [Configuration and Management Requirements](#configuration-and-management-requirements) + * [Functional Description](#functional-description) + * [Design](#design) + * [Overview](#overview) + * [Generic PDDF HW Device Drivers](#generic-pddf-hw-device-drivers) + * [PDDF Platform API plugins](#pddf-platform-api-plugins) + * [Generic Driver Design](#generic-driver-design) + * [PDDF Device Driver Modules](#pddf-device-driver-modules) + * [PDDF Device Modules](#pddf-device-modules) + * [Driver Extension Framework](#driver-extension-framework) + * [Generic Plugin Design](#generic-plugin-design) + * [PDDF I2C Component Design](#pddf-i2c-component-design) + * [List of Supported Components](#list-of-supported-components) + * [I2C Topology Descriptor](#i2c-topology-descriptor) + * [PSU Component](#psu-component) + * [FAN Component](#fan-component) + * [LED Component](#led-component) + * [Sensors](#sensors) + * [System EEPROM Component](#system-eeprom-component) + * [System Status Registers](#system-status-registers) + * [Optics Component](#optics-component) + * [lm-sensors](#lm-sensors-tools) + * [SAI](#sai) + * [CLI](#cli) + * [Serviceability and DEBUG](#serviceability-and-debug) + * [Warm Boot Support](#warm-boot-support) + * [Unit Test](#unit-test) + + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 05/27/2019 | Systems Infra Team | Initial version | +| 0.2 | 06/06/2019 | Systems Infra Team | Incorporated feedback | + +# About this Manual +Platform Driver Development Framework (PDDF) is part of SONiC Platform Development Kit (PDK) which optimizes the platform development. PDK consists of + - PDDF (Platform Driver Development Framework): For optimized data-driven platform driver and SONiC plugin development + - PDE (Platform Development Environment): For optimized build and test of platform and SAI code + +PDE details are covered in another document. This document describes Platform Driver Development Framework (PDDF) which can be used as an alternative to the existing manually-written SONiC platform driver framework. It enables platform vendors to rapidly develop the device specific custom drivers and SONiC user space python plugins, using a data-driven architecture, to manage platform devices like Fan, PSUs, LEDs, Optics, System EEPROM, etc., and validate a platform on SONiC. + +# Scope +This document describes the high level design details of PDDF and its components. The PDDF consists of generic device drivers and user space platform API plugins which use the per platform specific data in the JSON descriptor files. This document describes the interaction between all the components and the tools used to support these drivers and plugins. 
+ + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| ODM | Original Design Manufacturer | +| OEM | Original Equipment Manufacturer | +| PDDF | Platform Driver Development Framework | +| PDE | Platform Development Environment | +| PDK | Platform Development Kit | +| SAI | Switch Abstraction Interface | +| PSU | Power Supply Unit | +| I2C | Inter-integrated Circuit communication protocol | +| SysFS | Virtual File System provided by the Linux Kernel | + + +## 1 Requirements Overview +SONiC OS is portable across different network devices with supported ASIC via Switch Abstraction Interface (SAI). These devices primarily differ in the way various device specific hardware components are accessed, and thus require custom device drivers and python plugins. Each platform vendor implements these custom device drivers and plugins. The feature requirement is to support a SONiC platform driver development framework to enable rapid development of custom device drivers and plugins. + +### 1.1 Functional Requirements +Define Platform driver development framework to enable platform vendors to develop custom device drivers and plugins rapidly to accelerate development and validation of platforms in SONiC environment. The following requirements need to be satisfied by the framework. + - PDDF to provide a data driven framework to access platform HW devices. + - PDDF shall support I2C based HW designs with I2C controllers on the Host CPU. + - Provide reusable generic device drivers for the following components + - FAN + - PSU (Power supply units) + - System EEPROM + - Optic Transceivers (SFP, QSFP) + - CPLD + - System Status Registers + - System LED + - Generic drivers would expose device attributes via SysFS interface + - PDDF shall support reusing custom device driver or standard linux driver to be used along with the generic drivers. This would allow platform HW devices to be managed by PDDF generic drivers or custom/standard Linux drivers. 
+ - Custom drivers should expose device attributes via SysFS interface + - Support platform vendors to extend generic drivers with custom implementations to support initialization, exit, pre and post access for get / set of attributes + - Provide generic SONiC python plugins to access various attributes for the following devices + - FAN + - PSU (Power supply units) + - System EEPROM + - Optic Transceivers (SFP, QSFP) + - System Status Registers + - System LED + - Support data driven framework using the descriptor files to represent the following platform specific information + - I2C HW topology representing the various devices and their interconnections + - Per platform data to enable get / set of various attributes in each device + - PDDF generic drivers shall not require a reboot after installation + - PDDF generic drivers shall support initialization and de-initialization + - Platform drivers developed using the PDDF framework shall support the current SONiC platform CLIs + - PDDF developer guide shall be provided + + ### 1.2 Configuration and Management Requirements + - There are no configuration commands + - The generic PDDF plugins use the base classes from src/sonic-platform-common for the following + components: + - PSU (sonic_psu) + - Optic Transceivers (sonic_sfp) + - EEPROM (sonic_eeprom) + - LED (sonic_led) + - Current SONiC platform CLIs shall be supported + +### 1.3 Scalability Requirements +NA +### 1.4 Warmboot Requirements +NA +## 2 Functional Description + +SONiC platform bring up typically involves the following steps: + + - Support Switching ASIC + - Vendor platform specific drivers and plugins to manage platform devices + +Generally, the SAI support for a given switching silicon is pre-validated, and the platform vendor mostly focuses on the platform devices during platform bring up. The platform components involve the following: + + - port_config.ini (Port / Lane mappings) + - config.bcm + - Platform Device drivers (FAN/PSU/Optics/Sensors/CPLD,etc.) + - SONiC platform python plugins + +Most of the platform bring up effort goes in developing the platform device drivers, SONiC plugins and validating them. Typically each platform vendor writes their own drivers and plugins which is very tailor made to that platform. This involves writing code, building, installing it on the target platform devices and testing. Many of the details of the platform are hard coded into these drivers, from the HW spec. They go through this cycle repetitively till everything works fine, and is validated before upstreaming the code. + +PDDF aims to make this platform driver and plugin development process much simpler by providing a data driven development framework. This is enabled by: + + - JSON descriptor files for platform data + - Generic data-driven drivers for various devices + - Generic SONiC plugins + - Vendor specific extensions for customization and extensibility + +This makes the development and testing much simpler. Any change in the platform data can be made on the target in the JSON files and validated instantly. This helps improve the productivity of the platform developers significantly. 
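+
+To make the data-driven idea concrete before the design details: a generic plugin never hard-codes platform knowledge; it resolves everything (here, a SysFS path) from a descriptor at run time. The following is only a minimal illustrative sketch with a hypothetical descriptor layout, not the actual PDDF schema described in the next sections.
+```
+import json
+
+# Hypothetical descriptor snippet; real PDDF descriptor files are larger and per platform.
+DESCRIPTOR = """
+{
+  "PSU1": {
+    "attrs": {
+      "psu_present": "/sys/bus/i2c/devices/2-0060/psu_present"
+    }
+  }
+}
+"""
+
+def read_attr(device, attr):
+    """Resolve the SysFS path of a device attribute from the descriptor and read it."""
+    data = json.loads(DESCRIPTOR)
+    path = data[device]["attrs"][attr]
+    with open(path) as f:   # the generic kernel driver exposes the value as text
+        return f.read().strip()
+
+print(read_attr("PSU1", "psu_present"))
+```
+Porting to a new platform then reduces to editing the JSON values, with no plugin or driver code changes.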
+ +## 3 Design + +### 3.1 Overview + +![PDDF Architecture](../../images/platform/pddf_hld1.png) + + SONiC PDDF (Platform driver development framework) supports the following HW devices on a given platform: + + - Fan + - PSU + - System EEPROM + - CPLD + - Optic Transceivers + - System LED control via CPLD + - System Status Registers in CPLD + - Temp Sensors + + High level architecture of the PDDF consists of the following: + + - PDDF JSON Descriptor files + - Generic PDDF Python plugins for various devices implementing the Platform APIs + - PDDF Tools + - Generic PDDF HW device drivers in kernel space + - Custom vendor driver extensions modules + +#### 3.1.1 JSON Descriptor files + + The descriptor files are used to represent the following information for a given platform: + - I2C Topology descriptor + - Representation of the I2C bus + - I2C client devices + - Inter connection of I2C devices + - I2C Device Access attributes + - Each device exposes a set of data attributes to read/write + - Eg., PSU(psu_present), SFP/QSFP(sfp_present, lpmode) - CPLD registers/offsets/ masks, etc., + - For each device, platform specific attributes to help access the data attributes + - Reference to a standard Linux driver, if available and used, for I2C device + - pca954x, lm75, etc., + - Value Map for each device in a platform + - Each device or a platform can represent values to be interpreted in a different way. + - For eg., on some platforms “1” can represent “Port Side Intake”, whereas on another platform it could be “0”. This map provides how to interpret the values. + +#### 3.1.2 Generic PDDF HW Device Drivers + + +PDDF generic drivers are device drivers for the following devices: FAN/PSU/EEPROM/Optics Transceivers/ System LED/ CPLDs. These drivers in kernel space, rely on the per-platform data in JSON descriptor files to expose data via SysFS interface. They provide a generic interface to get/set of these attributes. There are two types of data, a driver works on: + + - Device data – Attributes exposed by the device itself + - Platform access data – Information on how/ where to access the device attributes + +These generic PDDF drivers provide capabilities to: + + - Extend using vendor implementations + - Mix and match generic and standard drivers + - Support any existing driver for a given component + +#### 3.1.3 PDDF Platform API plugins +PDDF provides generic user space python plugins which implement the platform APIs defined under: + - src/sonic-platform-common/sonic_sfp (Optic transceivers) + - src/sonic-platform-common/sonic_psu (PSU Util) + - src/sonic-platform-common/sonic_led (Port LED plugin) + +These plugins use the per platform JSON descriptor files to use the appropriate SysFS attributes to get and set. + +#### 3.1.4 Source code organization and Packaging +PDDF source code is mainly organized into platform dependent data files(JSON descriptors), generic PDDF driver modules, generic plugins, generic utils, and start up scripts. + + - /service/sonic-buildimage/platform/pddf + - modules + - init + + - /service/sonic-buildimage/src/sonic-platform-common/pddf + - plugins + - utils + + - JSON descriptor files should be placed in the "pddf/json" directory under the respective "/sonic-buildimage/platform/" directory path. For example: + - sonic-buildimage/platform/broadcom/sonic-platform-modules-accton/as7712-32x/pddf/json/\) + +From SONiC build, all the PDDF components shall be built and packaged into a common pddf Debian package. 
Every platform builds and packages per platform specific drivers, utilities, scripts, etc., into a platform Debian package. + +#### 3.1.5 Deployment details +For the Runtime environment, PDDF shall provide a init script which shall be integrated into the per platform init script. This will load the PDDF modules and plugins and will use the per platform JSON descriptor files for initializing the platform service. + +### 3.2 Generic Driver Design +Vendors write platform specific component drivers and deploy them as kernel loadable modules. In PDDF, drivers are generic, with platform specific data populated in JSON descriptor files. The JSON descriptor files are provided by the PDDF developer. Usually two different kernel modules are associated with each component. One is *Device Driver Module* and other is *Device Module*. + +For a generic device driver, there are 2 types of data. + - Device-Data Attributes + - Access-Data + +**Device-Data Attributes:** + These are the attributes exposed to the user by the driver. These attributes provide device-related information. These attributes can be read-only or read-write. These attributes are dynamically associated with the device driver using the input from data JSON file. Examples of Device-Data attributes include, + *psu_present* and *psu_power_good* for PSU, and + *fan1_front_rpm* and *fan1_rear_rpm* for FAN device drivers. + +**Access-Data:** + This is platform specific data used to retrieve values from hardware. This includes per-Device Data attribute device addresses, register offsets, masks, expected values and length. This access-data varies for various components. The per-platform data is read from JSON file and passed to the kernel space as driver platform_data using the access-data attributes. + + +#### 3.2.1 PDDF Device Driver Modules +PDDF device driver modules are generic. Access-Data is attached to the I2C device data structure during I2C device instantiation. This access-data also specifies which Device-Data attributes are supported, along with the platform dependent data for each attribute. The supported Device-Data attributes are dynamically created as SysFS attributes. This design is helpful in linking different Device-Data attributes to different I2C client devices, if applicable. The additional *driver_data* for the client, which consist of values of the attributes, last updated time, mutex lock, name etc, is also allocated dynamically and maintained *per-attribute* wise. + +![Figure1: PSU device driver](../../images/platform/pddf_device_driver_psu.png "PSU Device Driver") + + +#### 3.2.2 PDDF Device Modules +PDDF device module is an intermediate module to manage the actual device driver module. It helps populate the per-platform access data, and manages the access data attributes via SysFS interface. It also helps in I2C device instantiation using the I2C topology data and access-data. It defines a SysFs attribute *dev_ops* to trigger instantiation or detachment of the devices. This module has a dependency on the driver-module. + + +#### 3.2.3 Driver Extension Framework +There is a provision to have a *pre* and *post* APIs for important driver/module functions such as probe, init, exit. These pre and post functionalities can be vendor specific, if required, and need to be defined by each vendor in a separate vendor-specific API module. A generic implementation of *show* or *store* APIs are provided for each Device-Data attribute. However, if needed vendor can provide their own implementation for these APIs. 
Such definitions should also go into the vendor-specific API module. + + + +#### 3.2.4 JSON Descriptor Files +There are multiple JSON files which must be provided by a PDDF developer. The list of information provided by the JSON files is below, + + - Platform Inventory Info: + - Details like number of fans, PSUs, ports etc + - Device Parsing Info + - I2C Topology Info + - Device Access info + - Value Maps Info for various device-data Attributes, etc. + + +### 3.3 Generic Plugin Design + +![Figure2: PSU Generic Plugin](../../images/platform/pddf_generic_plugin_psu.png "PSU Generic Plugin") + + +Generic plugins are extended from respective base classes but do not have any platform specific data. All the platform specific data mentioned below, is retrieved from JSON files. + * Platform inventory + * SysFS paths of various device attributes + * Platform dependent interpretations of some of the attribute values + +Important thing to note in this type of design is that the PDDF has standardized the attribute names, and it provides the ability to map it to driver supported attribute names. Since PDDF provides the drivers for most of the devices, it maintains a list of device attributes. If there is a need to use a non-PDDF custom/standard driver, user must provide the list of attributes supported (which might be used by the generic plugin) by that driver. If such driver uses different name for an attribute, then it is incumbent that the user also define the driver attribute name. +Example below shows the usage of a driver 'ym2651' for a PSU1-PMBUS device. Generic plugin has an attribute name *psu_fan_dir*. However, if the same information is denoted in the driver by *psu_fan_direction*, then user indicates this by the field *drv_attr_name*. +``` +"PSU1-PMBUS": { + "dev_info": { + "device_type": "PSU-PMBUS", + "device_name": "PSU1-PMBUS", + "device_parent": "MUX3", + "virt_parent": "PSU1" + }, + "i2c": { + "topo_info": { + "parent_bus": "0x30", + "dev_addr": "0x58", + "dev_type": "ym2851" + }, + "attr_list": [ + { "attr_name": "psu_fan1_fault" }, + { "attr_name": "psu_v_out" }, + { "attr_name": "psu_i_out" }, + { "attr_name": "psu_p_out" }, + { "attr_name": "psu_temp1_input" }, + { "attr_name": "psu_fan1_speed_rpm" }, + { + "attr_name":"psu_fan_dir", + "drv_attr_name": "psu_fan_direction" + }, + { "attr_name": "psu_mfr_id" } + ] + } +} +``` +List of supported attribute names are mentioned under each device's plugin util. Path for each attribute's SysFS is retrieved and is cached so that each time plugin util is called, it doesn't calculate the path again. + + +### 3.4 PDDF I2C Component Design + +#### 3.4.1 List of Supported HW Components +PDDF supports I2C based HW design consisting of the following components: + + - Fan Controller (CPLD or dedicated Controller EM2305) + - PSUs (YM2651, Ym2851, etc.,) + - Temp Sensors (LM75, LM90, TMP411, etc.,) + - Optics (SFP/QSFPs, EEPROM, etc.,) + - System EEPROM (at24, etc.,) + - CPLDs + - MUX (PCA954x,..) + - System LEDs managed by CPLD etc., + +#### 3.4.2 I2C Topology Descriptor +Generally a platform consist of fans, PSUs, temperature sensors, CPLDs, optics (SFP, QSFPs etc), eeproms and multiplexing devices. I2C topology refers to the parent-child and other connectivity details of the I2C devices for a platform. The path to reach any device can be discerned using the I2C topology. 
+ +Example, + +![Figure3: PSU Topology Data](../../images/platform/pddf_topo_psu.png "PSU Topology Data") + + +I2C topology data consist of information such as *parent_bus*, *dev_addr* and *dev_type*. Users would describe the I2C topology data using a JSON Topology descriptor file . +*dev_info* object is used to represent the logical device. +*i2c* and *topo_info* are used for creating the I2C client. + +``` +"FAN-CPLD": + { + "dev_info": { + "device_type":"FAN", + "device_name":"FAN-CPLD", + "device_parent":"MUX2" + }, + "i2c": + { + "topo_info": { + "parent_bus":"0x20", + "dev_addr":"0x66", + "dev_type":"fan_ctrl" + }, + ... + } +} +``` +Here is a brief explanation of the fields in topology JSON + +> **device_type**: This mentions the generic device type. It can be either of these, PSU, FAN, CPLD, MUX, EEPROM, SFP, etc. This is a mandatory field. + +> **device_name**: This is the name of the device in the I2C topology. There can be a number or a substring appended to uniquely identify the device. e.g. FAN-CPLD, PSU1, PSU2, PORT1, MUX2 etc. This is an optional field. + +> **device_parent**: This gives the name of the parent device in the topology. It is also a mandatory field. + +> **i2c** object is put to differentiate with other mode of access such as PCI or BMC etc. **topo_info** gives the info to generate the I2C client. All the fields inside topo_info are mandatory. + +> **parent_bus**: This denotes the bus number to which device is connected. + +> **dev_addr**: This denotes the I2C address in the range of <0x0-0xff>. + +> **dev_type**: This denotes the name/type of device. This should match with the dev_id of the device inside the supporting driver. + + +If there is a MUX in path, its connected devices are mentioned under an array *channel*. Here is an example, + +``` +"MUX2": { + "dev_info": { + "device_type":"MUX", + "device_name":"MUX2", + "device_parent":"MUX1" + }, + "i2c": { + "topo_info": { + "parent_bus":"0x10", + "dev_addr":"0x76", + "dev_type":"pca9548" + }, + "dev_attr": {"virt_bus":"0x20"}, + "channel": [ + { "chn":"0", "dev":"FAN-CPLD" }, + { "chn":"2", "dev":"CPLD1" } + ] + } +} +``` +If the object is a MUX, then +> **virt_bus**: This is an information used internally to denote the base address for the channels of the mux. So if the virt_bus is 0x20 for a pca9548 then channel-buses are addressed as (0x20+0), (0x20+1), (0x20+2) .... , (0x20+7). + +> **channel**: This array gives info about the child devices for a mux. It mentions **chn** denoting the channel number, and **dev** denoting the device_name connected to this channel. + +If the object is PSU, then +> **psu_idx**: This is used internally to denote the PSU number. It is also a mandatory field. + +> **interface**: Here the user needs to define the PSU interface, eeprom and pmbus, for which I2C clients would be created. If user needs to use only pmbus client to get all the information, then only that should be mentioned. + +``` +"PSU1": +{ + "dev_info": { + "device_type":"PSU", + "device_name":"PSU1", + "device_parent":"MUX3" + }, + "dev_attr": { "psu_idx":"1"}, + "i2c": { + "interface": [ + { "itf":"pmbus", "dev":"PSU1-PMBUS" }, + { "itf":"eeprom", "dev":"PSU1-EEPROM" } + ] + }, +} +``` + +PDDF tools will use descriptor file data to instantiate I2C client devices and populate per-platform data. + +#### 3.4.3 PSU Component +PDDF has a PSU module and a PSU driver module. 
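+
+Both modules, like the other component modules described below, operate on I2C clients instantiated from the topology data in the previous section. As a hedged illustration only (not the actual PDDF tool code, and with a hypothetical descriptor file name), such a client could be created through the kernel's standard new_device interface:
+```
+import json
+
+def create_i2c_client(topo_info):
+    """Instantiate an I2C client from a topo_info block via the new_device SysFS node."""
+    parent_bus = int(topo_info["parent_bus"], 16)
+    dev_type = topo_info["dev_type"]      # must match a dev_id known to the driver
+    dev_addr = topo_info["dev_addr"]      # e.g. "0x66"
+    with open("/sys/bus/i2c/devices/i2c-%d/new_device" % parent_bus, "w") as f:
+        f.write("%s %s\n" % (dev_type, dev_addr))
+
+with open("pddf-device.json") as f:       # hypothetical per-platform descriptor file
+    topology = json.load(f)
+create_i2c_client(topology["FAN-CPLD"]["i2c"]["topo_info"])
+```
+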
+##### 3.4.3.1 PSU Driver Modules +PDDF PSU module is used to + * Create the access data attributes to transfer the access data from user space to kernel. + * Populates the access data into PSU client's *platform_data* + * Create the PSU I2C client + +Usually every PSU device has two interfaces and hence two I2C clients. + 1. EEPROM interface + 2. SMBUS interface + +It is possible that all the required PSU info can be read using SMBUS interface itself. In such cases, only one SMBUS device needs to be created, and all the SysFS attributes shall be created under this device. + +PDDF PSU driver is used for both the interfaces and SysFS attributes are divided among the two. PSU driver module has the following functionalities, + * Create the SysFS data attributes + * Get/Set attribute's value from/to HW + +Currently supported PSU SysFS Attributes are: +``` +psu_present +psu_model_name +psu_power_good +psu_mfr_id +psu_serial_num +psu_fan_dir +psu_v_out +psu_i_out +psu_p_out +psu_fan1_speed_rpm +``` +##### 3.4.3.2 PSU JSON Design +PSU JSON is structured to include the access-data for all the supported SysFS attributes. +*attr_list* is an array object which stores the array of access-data for multiple attributes. If some of the field in the attribute object is not applicable to some particular attribute, it can be left and not filled. + +Description of the fields inside *attr_list* +> **attr_name**: This field denotes the name of SysFS attribute associated with this device I2C client. It is a mandatory field. + +> **attr_devaddr**: This denotes the I2C address of device from where this SysFS attribute value is to be read. e.g if *psu_present* is the SysFS attribute, and it needs to be read from a CPLD, the I2C address of that CPLD is to be mentioned here. + +> **attr_devtype**: Source device type of the value of SysFS attribute. + +> **attr_offset**: Register offset of the SysFS attribute. + +> **attr_mask**: Mask to be applied to read value. + +> **attr_cmpval**: Expected reg value after applying the mask. This is used to provide a Boolean value to the attribute. e.g `attr_val = ((reg_val & attr_mask) == attr_cmpval)` . + +> **attr_len**: Length of the SysFS attribute in bytes. + + + +``` +"PSU1-EEPROM": { + "i2c": { + "attr_list": [ + { + "attr_name":"psu_present", + "attr_devaddr":"0x60", + "attr_devtype":"cpld", + "attr_offset":"0x2", + "attr_mask":"0x2", + "attr_cmpval":"0x0", + "attr_len":"1" + }, + { + "attr_name":"psu_model_name", + "attr_devaddr":"0x50", + "attr_devtype":"eeprom", + "attr_offset":"0x20", + "attr_mask":"0x0", + "attr_len":"9" + }, + ... + ] + } +}, +``` + + +##### 3.4.3.3 PSU Plugin Design +PsuBase is the base PSU plugin class, which defines various APIs to get/set information from the PSU devices. PDDF PSU generic plugin shall extend from PsuBase and implement the platform specific APIs, using the platform specific information in the JSON descriptor files + +Example, +``` +def get_num_psus(self): + """ + Retrieves the number of PSUs supported on the device + :return: An integer, the number of PSUs supported on the device + """ + return 0 +``` + +#### 3.4.4 FAN Component +Fan has a PDDF device module and a PDDF device driver module. + +##### 3.4.4.1 FAN Driver Modules +PDDF fan module is used to + * Create the access data attributes to transfer the access data from user space to kernel. + * Populates the access data into PSU client's *platform_data* + * Create the fan I2C client + +There could be one or multiple client for fan controller. 
If any other controller is used, such as EMC2305 or EMC2302 etc, then there might be multiple fan controller clients . + +PDDF fan driver is used for all the fan clients and SysFS attributes are divided. Fan driver module has the following functionalities, + * Create the SysFS attributes + * Get/Set SysFS attribute's value from/to Fan controller devices + +Supported Fan SysFS attributes are: + +``` +fan_present +fan_direction +fan_front_rpm +fan_rear_rpm +fan_pwm +fan_duty_cycle +fan_fault +where idx represents the Fan index [1..8] +``` +##### 3.4.4.2 FAN JSON Design +FAN JSON is structured to include the access-data for all the supported SysFS attributes. +*attr_list* is an array object which stores the array of access-data for multiple attributes. If some of the field in the attribute object is not applicable to some particular attribute, it can be left out. + +Description of the objects inside *attr_list* which are very specific to Fan components are: + +> **attr_mult**: Multiplication factor to the value to get the FAN rpm. + +> **attr_is_divisor**: If the register value is a divisor to the multiplication factor to get the FAN rpm. + + + + +``` +"FAN-CPLD": { + "i2c": { + "dev_attr": { "num_fan":"6"}, + "attr_list": [ + { + "attr_name":"fan1_present", + "attr_devtype":"FAN-CPLD", + "attr_offset":"0x0F", + "attr_mask":"0x1", + "attr_cmpval":"0x0", + "attr_len":"1" + }, + ... + { + "attr_name":"fan1_direction", + "attr_devtype":"FAN-CPLD", + "attr_offset":"0x10", + "attr_mask":"0x1", + "attr_cmpval":"0x1", + "attr_len":"1" + }, + ... + { + "attr_name":"fan1_front_rpm", + "attr_devtype":"FAN-CPLD", + "attr_offset":"0x12", + "attr_mask":"0xFF", + "attr_len":"1", + "attr_mult":"100", + "attr_is_divisor": 0 + }, + ] + } +} +``` + + +##### 3.4.4.3 FAN Plugin Design +FanBase is the base FAN plugin class, which defines various APIs to get/set information from the Fan devices. PDDF Fan generic plugin shall extend from FanBase and implement the platform specific APIs, using the platform specific information in the JSON descriptor files. FanBase is part of the new platform API framework in SONiC. + +Example, +``` + def get_direction(self): + """ + Retrieves the direction of fan + Returns: + A string, either FAN_DIRECTION_INTAKE or FAN_DIRECTION_EXHAUST + depending on fan direction + """ + + def get_speed(self): + """ + Retrieves the speed of fan in rpms + Returns: + An integer, denoting the rpm (revolutions per minute) speed + """ + +``` + +#### 3.4.5 LED Component +Network switches have a variety of LED lights, system LEDs, Fan Tray LEDs, and port LEDs, used to act as indicators of switch status and network port status. The system LEDs are used to indicate the status of power and the system. The fan tray LEDs indicate each fan status. The port LEDs are used to indicate the state of the links such as link up, Tx/RX activity and speed. The Port LEDs are in general managed by the LED controller provided by switch vendors. The scope of this LED section is for system LEDs and fan tray LEDs. + +##### 3.4.5.1 LED Driver Design +LEDs are controlled via CPLDs. LEDs status can be read and set via I2C interfaces. A platform-independent driver is designed to access CPLDs via I2c interfaces. CPLD/register address data is stored in platform-specific JSON file. User can run plugins to trigger drivers to read/write LED statuses via SysFS. This generic LED driver is implemented to control System LED and Fan Tray LED. 
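+
+As a hedged illustration of the register handling this implies, the helper below shows how the "bits" and "value" fields used in the JSON samples of the next subsection could be folded into a CPLD register value before it is written back; it is not the actual driver code.
+```
+def apply_led_value(reg_val, bits, value):
+    """Return reg_val with the bit-field described by bits (e.g. "6:5") set to value."""
+    hi, lo = (int(b) for b in bits.split(":"))
+    mask = ((1 << (hi - lo + 1)) - 1) << lo
+    return (reg_val & ~mask) | ((int(value, 16) << lo) & mask)
+
+# Setting PSU1_LED to "on" (Green, value 0x1 in bits 6:5) in a register that reads 0x7F:
+print(hex(apply_led_value(0x7F, "6:5", "0x1")))   # -> 0x3f
+```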
+ +##### 3.4.5.2 JSON Design + This section provides examples of configuring platform, System LED and Fantray LED. They are consisted of key/value pairs. Each pair has a unique name. The table describes the naming convention for each unique key. + + +| **Key** | **Description** | +|--------------------------|-------------------------------------| +| PLATFORM | Numbers of power supply LED, fan tray LED | +| SYS_LED | System LED indicates System | +| PSU_LED |Power Supply Status LED X is an integer starting with 1 Example: PSU1_LED, PSU2_LED | +| LOC_LED | Flashing by remote management command. Assists the technician in finding the right device for service in the rack | +| FAN_LED | Fan Status LED for all fans | +| DIAG_LED | System self-diagnostic test status LED | +| FANTRAY_LED | Status LED for individual fan. X is an integer starting with 1 Example: FANTRAY1_LED, FANTRAY2_LED | + +Samples: + + "PLATFORM" : { "num_psu_led":"1", "num_fantray_led" : "4"} + "PSU1_LED" : { "dev_info": { "device_type":"LED", "device_name":"PSU_LED"}, + "dev_attr": { "index":"0"}, + "i2c": { + [ + {"attr_name":"on", "bits" : "6:5", "color" : "Green", "value" : "0x1", "swpld_addr" : "0x60", "swpld_addr_offset" : "0x66"}, + {"attr_name":"faulty", "bits" : "6:5", "color" : "Amber", "value" : "0x2", "swpld_addr" : "0x60", "swpld_addr_offset" : "0x66"}, + {"attr_name":"off", "bits" : "6:5", "color" : "Off", "value" : "0x3", "swpld_addr" : "0x60", "swpld_addr_offset" : "0x66"} + ] + } + } + + +##### 3.4.5.3 LED Plugin Design +A generic user space Python plugin is designed to access LEDs via SysFS interface. The plugin reads SysFS path information from platform-specific JSON File. The PddfLedUtil class provides set/get APIs to set LED color and retrieve LED color. + + + class PddfLedUtil: + # Possible status LED colors + GREEN = “on” + RED= “faulty” + OFF=”off” + def set_status_led(self, device_name, index, color, color_state): + Args: + device_name: A string representing device: FAN, LOC, DIAG, SYS, PSU and FANTRAY + index: An integer, 1-based index to query status + color: A string representing the color with which to set the status LED: GREEN, RED, OFF + color_state: A string representing the color state: SOLD, BLINK + Returns: + Boolean: True is status LED state is set successfully, False if not. + + def get_status_led(self, device_name, index): + Args: + device_name: A string representing device: FAN, LOC, DIAG, SYS and PSU + index: An integer, 1-based index of the PSU of which to query status + Returns: + Color and color state information + SysFS Path Example: + /sys/kernel/pal/led/cur_state/ + color + color_state + Examples: + #./ledutil.py –set + #./ledutil.py –set PSU 1 GREEN SOLID + #./ledutil.py –get + #./ledutil.py –get PSU 1 + PSU1_LED : Green + Color State: Solid + + + + +#### 3.4.6 Sensors + +##### 3.4.6.1 Driver Design +The Linux driver supports LM75/LM90 compatible temperature sensors. It is used to support communication through the I2C bus and interfaces with the hardware monitoring sub-system. A SysFS interface is added to let the user provides the temperature sensors information to the kernel to instantiate I2C devices. + +##### 3.4.6.2 JSON Design +Platform specific temperature sensor configuration file is designed to instantiate I2c devices and provides access information for plugin. These data are grouped into three sections: PLATFORM, I2C Topology and TEMP Data. PLATFORM section provides the number of temperature sensors. 
I2C Topology section and TEMP Data are used for instantiating I2C devices and accessing temperature sensors vis SysFS attributes. +They are consisted of key/value sections. Each section has a unique name. The table describes the naming convention for each unique key. + +| **Key** | **Description** | +|--------------------------|-------------------------------------| +| PLATFORM | Numbers of temperature sensors | +| TEMP | Temperature sensor. x is an integer starting with 1 | +| MUX | This section is part of I2C topology configuration | + + Samples: + + "PLATFORM" : { "num_temp_sensors":"3"} + "TEMP1" : { "dev_info": { "device_type":"TEMP_SENSOR", "device_name":"TEMP1", "display_name" : "CPU Temp Sensor" }, + "dev_attr": { "index":"0"}, + "i2c": { + "topo_info": { "parent_bus":"0x21", "dev_addr":"0x48", "dev_type":"lm75"}, + "attr_list": + [ + { "attr_name": "temp1_max"}, + { "attr_name": "temp1_max_hyst"}, + { "attr_name": "temp1_input"} + ] + } + } +##### 3.4.6.3 Plugin Design +A generic user space Python plugin is designed to access temperature sensors via SysFS interface. The plugin gets SysFS path information from platform-specific JSON File, pal-device.json. The PddfThermailUtil class provides two APIs to retrieve the number of temp sensors and to show temperature sensor readings. + + Class PddfThermalUtil: + def get_num_thermals(self): + Retrieves the number of temp sensors supported on the device + Return: An integer, the number of temperature sensors on the device + def show_temp_values(self): + Prints out readings from all temp sensors on the device + Return: Boolean, True if reading successfully, False if not + + Example: + Reading Display: + #./thermalutil.py + lm75-i2c-33-48 + CPU Temp Sensor +25.5 C (high = +80.0 C, hyst = +75.0 C) + lm75-i2c-33-49 + CPU Temp Sensor +23.0 C (high = +80.0 C, hyst = +75.0 C) + lm75-i2c-33-4a + CPU Temp Sensor +28.0 C (high = +80.0 C, hyst = +75.0 C) + + SysFS Path Example: + /sys/bus/i2c/devices/i2c-0/i2c-16/i2c-33/33-0048/hwmon/hwmon0/temp1_max + /sys/bus/i2c/devices/i2c-0/i2c-16/i2c-33/33-0048/hwmon/hwmon0/temp1_max_hyst + /sys/bus/i2c/devices/i2c-0/i2c-16/i2c-33/33-0048/hwmon/hwmon0/temp1_input + + +#### 3.4.7 System EEPROM Component + +##### 3.4.7.1 Driver Design +For SYS EEPROM component, PDDF leverages the existing Linux standard driver **at24**. This driver supports multiple variations of the EEPROM viz. 24c00, 24c01, 24c08, 24c16, 24c512 and many more. This driver provides one SysFS device-data attribute named **eeprom**. Since standard driver is being used for EEPROM, PDDF uses user-space command to instantiate the EEPROM device. +Example, +> echo 24c02 0x57 > /sys/bus/i2c/devices/i2c-0/new_device + +##### 3.4.7.2 JSON Design +For SYS EEPROM, the client creation information is present in I2C Topology JSON file. Since the driver is standard and attribute-name is fixed, *eeprom*, there is no component specific JSON file representing the access-data. Example of the SYS EEPROM entry in the topology JSON is mentioned below. +``` +"EEPROM1": { + "dev_info": { + "device_type": "EEPROM", + "device_name": "EEPROM1", + "device_parent": "SMBUS0" + }, + "i2c": { + "topo_info": { + "parent_bus": "0x0", + "dev_addr": "0x57", + "dev_type": "at24" + }, + "dev_attr": {"access_mode": "BLOCK"} + } +}, +``` + +##### 3.4.7.3 Plugin Design +A generic user space plugin is written for EEPROM. Internally it leverages eeprom_base and eeprom_tlvinfo base classes. The SysFS path for driver supported attribute is retrieved from the user provided JSON file. 
An example of the API definition form eeprom_base is shown below, +``` +def check_status(self): + if self.u != '': + F = open(self.u, "r") + d = F.readline().rstrip() + F.close() + return d + else: + return 'ok' +def set_cache_name(self, name): + # before accessing the eeprom we acquire an exclusive lock on the eeprom file. + # this will prevent a race condition where multiple instances of this app + # could try to update the cache at the same time + self.cache_name = name + self.lock_file = open(self.p, 'r') + fcntl.flock(self.lock_file, fcntl.LOCK_EX) +``` +Generic plugin may provide further initialization steps and definitions for new APIs. + + +#### 3.4.8 System Status Registers + +##### 3.4.8.1 Driver Design +System Status information is present in CPLD registers. These information can be retrieved from CPLD using I2C interface to the CPLDs. Access-data to retrieve *system status* information is provided by the user in a JSON file. A generic driver is written in PDDF to store all the access-data in kernel space, and use it to read information from CPLDs. User space plugin would trigger the retrieval of *system status* data using generic module. The system status data attributes are created under `/sys/kernel/pal/devices/sysstatus/data/`. + +##### 3.4.8.2 JSON Design +An example of object from system status JSON file is shown below, +``` +"SYSSTATUS": { + "attr_list": [ + { "attr_name":"board_info","attr_devaddr":"0x60", "attr_offset":"0x0","attr_mask":"0x1f","attr_len":1}, + { "attr_name":"cpld1_version","attr_devaddr":"0x60","attr_offset":"0x1","attr_mask":"0xff","attr_len":1}, + { "attr_name":"power_module_status","attr_devaddr":"0x60","attr_offset":"0x2","attr_mask":"0x1f","attr_len":1}, + { "attr_name":"system_reset5","attr_devaddr":"0x60","attr_offset":"0x50","attr_mask":"0xff","attr_len":1}, + { "attr_name":"system_reset6","attr_devaddr":"0x60", "attr_offset":"0x51","attr_mask":"0xff","attr_len":1}, + { "attr_name":"system_reset7","attr_devaddr":"0x60","attr_offset":"0x52","attr_mask":"0xff","attr_len":1}, + { "attr_name":"system_reset8","attr_devaddr":"0x60","attr_offset":"0x53","attr_mask":"0xff","attr_len":1}, + { "attr_name":"misc1","attr_devaddr":"0x60","attr_offset":"0x68","attr_mask":"0xff","attr_len":1}, + { "attr_name":"cpld2_version","attr_devaddr":"0x62","attr_offset":"0x1","attr_mask":"0xff","attr_len":1}, + { "attr_name":"interrupt_status","attr_devaddr":"0x62","attr_offset":"0x2","attr_mask":"0xf","attr_len":1}, + { "attr_name":"system_reset","attr_devaddr":"0x62","attr_offset":"0x3","attr_mask":"0x1","attr_len":1}, + { "attr_name":"misc2","attr_devaddr":"0x62","attr_offset":"0x68","attr_mask":"0x2","attr_len":1} + ] +}, +``` +##### 3.4.8.3 Plugin Design +A generic user-space plugin is added to read the system status information. The SysFS path of the info-data attribute is retrieved from the user provided JSON file. Example of the API definition is given below, +``` +def get_system_reset_info(self, str): + Args: + str: Name of the system status data attribute. + Returns: + Integer: Value of the system status data attribute. + +``` +#### 3.4.9 Optics Component +##### 3.4.9.1 Driver design + +Transceiver devices (SFP, QSFP etc.) expose mainly two kinds of access/control methods. + * EEPROM read/write access on linear address space of 256\*128 byte paged memory. + * Device control attributes exposed as control pins via external CPLD/FPGA device. + +In existing implementations, the drivers that access these attributes are very platform dependent. 
They depend on exactly which CPLD is managing which pin on each SFP device. And that’s different for every switch, even for similar switches from the same vendor. In SONiC, this is handled by having the switch vendor provide that driver, for each platform. + +For eeprom access, OOM based 'optoe' is leveraged which exposes the eeprom as a binary attribute under the device tree. Optoe is independent of switches, because it only depends on I2C, and the EEPROM architecture, both of which are standardized. Linux provides a common I2C interface, which hides the switch dependent addressing to get to the device. + +Each of of these SysFS attributes are distributed across multiple devices based on their implementation. The PDDF generic drivers provide a common interface to initialize and manage all the attributes. PDDF model for optics requires that every given port on switch should be associated with optoe as well as PAL optic driver to expose and support SysFS attributes related to optical transceivers. + +The commonly used Optic SysFS attribute list include: +``` +EEPROM bin +Module presence +Low power mode +Module reset +Rx LOS +Tx Disable +Tx Fault etc.. +``` +**Generic Optic PDDF drivers:** + +PDDF has the following three different drivers for optics. +* Optoe driver +* Optic_pddf_module +* Optic_pddf_driver_module + +#### Optoe driver: +* Responsible for creation of SFP I2C client under the given parent bus. + 1) Devices with one I2C address (eg QSFP) use I2C address 0x50 (A0h) + 2) Devices with two I2C addresses (eg SFP) use I2C address 0x50(A0h) and 0x51(A2h) +* Expose eeprom bin attribute under the same path. + +#### Optic_pddf_module: + * Create the access data attributes to transfer the access data from user space to kernel. + * Populates the access data into SFP client's *platform_data*. (Information collected from the JSON parsing) + * Registers a virtual SFP I2C client with address 0x60 (User defined). +#### Optic_pddf_driver_module: +Driver module has the following functionalities, + * Allocate the memory dynamically to *driver_data* and supported SysFS attributes + * Create the SysFS attributes and link it to Optic_Pal client's kernel object + * Retrieve the SysFS attribute's value from HW and update the *driver_data* + + ##### 3.4.9.2 JSON Design + +Optic JSON is structured to include the access-data for all the supported SysFS attributes. +*attr_list* is an array object which stores the array of access-datas for multiple attributes. Some of these values can be left empty if they are not applicable. 
+ +``` +"PORT1": + { + "dev_info": { "device_type":"SFP/QSFP", "device_name":"PORT1", "device_parent":"MUX4"}, + "dev_attr": { "dev_idx":"1"}, + "i2c": + { + "interface": + [ + { "itf":"eeprom", "dev":"PORT1-EEPROM" }, + { "itf":"control", "dev":"PORT1-CTRL" } + ] + }, + }, + "PORT1-EEPROM": + { + "dev_info": { "device_type":"", "device_name":"PORT1-EEPROM", "device_parent":"MUX4", "virt_parent":"PORT1"}, + "i2c": + { + "i2c_info": { "parent_bus":"0x40", "dev_addr":"0x50", "dev_type":"optoe1"}, + "attr_list": + [ + { "attr_name":"eeprom"} + ] + } + }, + "PORT1-CTRL": + { + "dev_info": { "device_type":"", "device_name":"PORT1-CTRL", "device_parent":"MUX4", "virt_parent":"PORT1"}, + "i2c": + { + "i2c_info": { "parent_bus":"0x40", "dev_addr":"0x53", "dev_type":"pal_xcvr"}, + "attr_list": [ + { + "attr_name":"xcvr_present", + "attr_devaddr":"0x60", + "attr_devtype":"cpld", + "attr_offset":"0x2", + "attr_mask":"0x2", + "attr_cmpval":"0x0", + "attr_len":"1" + }, + { + "attr_name":"xcvr_lp_mode", + "attr_devaddr":"0x60", + "attr_devtype":"cpld", + "attr_offset":"0x30", + "attr_mask":"0x0", + "attr_len":"9" + } + ... + } + } + +``` + +##### 3.4.9.3 Plugin design + +SfpUtilBase is the base Optic plugin class, which defines various APIs to get/set information from the optic transceivers. PDDF generic plugin shall extend from SfpUtilBase and implement the platform specific APIs, using the platform specific information in the JSON descriptor files. + +Example, +``` +def get_presence(self, portNum): + """ + + """ + return 0 +``` + +#### 3.4.10 lm-sensors Tools +lm-sensors package (Linux monitoring sensors) provides tools and drivers for monitoring temperatures, voltages, and fan speeds via command line. It can monitoring hardware such the LM75 and LM78. These tools are described below. These tools would continue to work with PDDF framework too. + +##### 3.4.10.1 sensors.conf +/etc/sensors.conf is a user customized configuration file for libsensors. It describes how libsensors, and so all programs using it, should translate the raw readings from the kernel modules to real-world values. A user can configure each chip, feature and sub-feature that makes sense for his/her system. + + Example: + sensors.conf: + bus "i2c-3" "i2c-1-mux (chan_id 1)" + chip "lm75-i2c-3-49" + label temp1 "Temp Sensor" + set temp1_max 60 + set temp1_max_hyst 56 + + admin@sonic:~$ sensors + lm75-i2c-3-49 + Adapter: i2c-1-mux (chan_id 1) + Temp Sensor: +27.0 C (high = +60.0 C, hyst = +56.0 C) + +This would continue to be supported using PDDF driver framework as well. + +##### 3.4.10.2 fancontrol: + +fancontrol is a shell script for use with lm_sensors. It reads its configuration from a file, /etc/fancontrol, then calculates fan speeds from temperatures and sets the corresponding PWM outputs to the computed values. 
+ + # fancontrol /etc/fancontrol + + Example of configuration file + INTERVAL=10 + FCTEMPS=/sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/sys_temp + FCFANS=/sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan1_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan2_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan3_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan4_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan5_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan6_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan11_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan12_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan13_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan14_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan15_input /sys/bus/i2c/devices/2-0066/pwm1=/sys/bus/i2c/devices/2-0066/fan16_input + MINTEMP=/sys/bus/i2c/devices/2-0066/pwm1=135 + MAXTEMP=/sys/bus/i2c/devices/2-0066/pwm1=160 + MINSTART=/sys/bus/i2c/devices/2-0066/pwm1=100 + MINSTOP=/sys/bus/i2c/devices/2-0066/pwm1=32 + MINPWM=/sys/bus/i2c/devices/2-0066/pwm1=32 + MAXPWM=/sys/bus/i2c/devices/2-0066/pwm1=69 + +The SysFS paths should be given as per the PDDF I2C topology description and the attributes. + +## 4 SAI +Not applicable + +## 5 CLI + +SONiC provides various platform related CLIs to manage various platform devices. Some of the existing CLIs are: + + - psuutil + - sfputil + - show interface status + - show interface transceiver eeprom + - show interface transceiver presence + - decode-syseeprom + - show reboot-cause + - show platform summary + - show environment + +In addition, the following CLI utils will also be added. + +### 5.1 PDDF_PSUUTIL +``` +root@sonic:/home/admin# pddf_psuutil +Usage: pddf_psuutil [OPTIONS] COMMAND [ARGS]... + + pddf_psuutil - Command line utility for providing PSU status + +Options: + --help Show this message and exit. + +Commands: + numpsus Display number of supported PSUs on device + status Display PSU status + version Display version info + info Display PSU manufacture and running info +root@sonic:/home/admin# +``` +Example output of the above commands, +``` +root@sonic:/home/admin# pddf_psuutil numpsus +Total number of PSUs: 2 +root@sonic:/home/admin# +root@sonic:/home/admin# pddf_psuutil info +PSU1: Power Not OK + + +PSU2: Power OK +Manufacture Id: 3Y POWER +Model: YM-2851F +Serial Number: SA070U461826011272 +Fan Direction: FAN_DIRECTION_INTAKE +root@sonic:/home/admin# +``` +### 5.2 PDDF_FANUTIL +``` +root@sonic:/home/admin# pddf_fanutil +Usage: pddf_fanutil [OPTIONS] COMMAND [ARGS]... + + pddf_fanutil- Command line utility for providing FAN info + +Options: + --help Show this message and exit. 
+ +Commands: + numfans Display number of supported FANs on device + direction Display FAN status + getspeed Display FAN speeds + setspeed Set FAN speed +root@sonic:/home/admin# +``` +Example output of the above commands, +``` +root@sonic:/home/admin# pddf_fanutil direction +FAN-1 direction is FAN_DIRECTION_INTAKE +FAN-2 direction is FAN_DIRECTION_INTAKE +FAN-3 direction is FAN_DIRECTION_INTAKE +FAN-4 direction is FAN_DIRECTION_INTAKE +FAN-5 direction is FAN_DIRECTION_INTAKE +FAN-6 direction is FAN_DIRECTION_INTAKE +root@sonic:/home/admin# +root@sonic:/home/admin# pddf_fanutil getspeed + +FAN_INDEX FRONT_RPM REAR_RPM +FAN-1 12200 10200 +FAN-2 12400 10400 +FAN-3 12200 10300 +FAN-4 12300 10400 +FAN-5 12400 10500 +FAN-6 12600 10500 +root@sonic:/home/admin# +root@sonic:/home/admin# pddf_fanutil setspeed 100 +New Fan Speed: 100% + +FAN_INDEX FRONT_RPM REAR_RPM +FAN-1 21100 17200 +FAN-2 21100 18000 +FAN-3 20700 17800 +FAN-4 20800 18200 +FAN-5 20300 18100 +FAN-6 20600 18100 +root@sonic:/home/admin# +``` +### 5.3 PDDF_LEDUTIL +``` +#./pddf_ledutil.py –set +#./pddf_ledutil.py –set PSU 1 GREEN SOLID +#./pddf_ledutil.py –get +#./pddf_ledutil.py –get PSU 1 + PSU1_LED : Green + Color State: Solid +``` +### 5.4 PDDF_Thermalutil + Example: + Reading Display: + #./thermalutil.py + lm75-i2c-33-48 + CPU Temp Sensor +25.5 C (high = +80.0 C, hyst = +75.0 C) + lm75-i2c-33-49 + CPU Temp Sensor +23.0 C (high = +80.0 C, hyst = +75.0 C) + lm75-i2c-33-4a + CPU Temp Sensor +28.0 C (high = +80.0 C, hyst = +75.0 C) + + +## 6 Serviceability and DEBUG + +### Debug Utils + - pal-parse --create + - Create I2C topology script to verify + - pal-parse --load + - Create led population information to verify platform data population + - lsmod | grep -i pddf + - Check if all pddf modules are loaded correctly + - systemctl | grep -i pddf + - Check if pddf platform service is running + - pddf_fanutil debug dump_sysfs + - Dump all Fan related SysFS path and attributes + - pddf_psuutil debug dump_sysfs + - Dump all PSU related SysFS path and attributes + - pddf_ledutil debug dump_sysfs + - Dump all LED related SysFS path and attributes + - pddf_sfputil debug dump_sysfs + - Dump all Optics related SysFS path and attributes + - pddf_eepromutil debug dump_sysfs + - Dump all EEPROM related SysFS path and attributes + - sysutil debug dump_sysfs + - Dump all System Status register related SysFS path and attributes +### Debug logs +All the logs can be found under /var/log/pddf. + + +## 7 Warm Boot Support +Platform service restart should be supported without having to reboot the device + +## 8 Scalability +NA +## 9 Unit Test +Generic unit tests are listed below. These should be extended to all the components where they are applicable. +1. JSON descriptor file - Schema validation and error detection +2. Test descriptor file Parsing +3. Check if all the modules are loaded successfully. +4. Check if the device-data SysFS attributes are created successfully. +5. Provide wrong data in descriptor files and check if the errors are handled properly. +6. Test to check for kernel memory leak. +7. Remove the driver and ensure that all the device artifacts, memory etc are cleaned up. +8. Stop and start of the platform service. 
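+
+As one concrete illustration of unit test 1 above (descriptor schema validation), the sketch below checks the mandatory topo_info fields described in section 3.4.2. The file layout and key names follow the examples in this document; a real test suite would validate much more of the schema.
+```
+import json
+import sys
+
+REQUIRED_TOPO_KEYS = {"parent_bus", "dev_addr", "dev_type"}
+
+def validate_topology(path):
+    """Return a list of error strings for devices whose topo_info block is incomplete."""
+    errors = []
+    with open(path) as f:
+        devices = json.load(f)
+    for name, node in devices.items():
+        topo = node.get("i2c", {}).get("topo_info")
+        if topo is not None:
+            missing = REQUIRED_TOPO_KEYS - set(topo)
+            if missing:
+                errors.append("%s: missing %s" % (name, sorted(missing)))
+    return errors
+
+if __name__ == "__main__":
+    for err in validate_topology(sys.argv[1]):
+        print(err)
+```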
diff --git a/doc/platform/pde.md b/doc/platform/pde.md new file mode 100644 index 0000000000..a6a5735bbc --- /dev/null +++ b/doc/platform/pde.md @@ -0,0 +1,179 @@ +# Feature Name +SONiC Platform Development Environment (PDE) +# High Level Design Document +#### Rev 1.2 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|----------------------------------------------------------------------------------| +| 1.2 | 06/24/2019 | Bill Schwartz | Updated document and images to reflect new build infrastructure changes | +| 1.1 | 06/17/2019 | Bill Schwartz | Rename Platform Development Kit (PDK) to Platform Development Environment (PDE) | +| 0.1 | 05/17/2019 | Bill Schwartz | Initial version | + +# About this Manual +This document provides general information about the SONiC Platform Development Environment (PDE). The SONiC PDE is part of the SONiC Platform Development Kit (PDK) which optimizes platform development. The SONiC PDK consists of: + +-PDDF (Platform Driver Development Framework): For optimized data-driven platform driver and SONiC plugin development. The PDDF details are covered in a separate document. + +-PDE (Platform Development Environment): For optimized build and testing of platform and SAI code. + +The SONiC PDE is generated from the SONiC "sonic-buildimage" repository and is intended to provide ODM and external customers with a means to quickly add, compile and test their platform drivers and static device data required for a fully functional SONiC distribution. + +# Scope +This document describes the high level design details on how the SONiC PDE is constructed as well as details on the PDE test suite. The PDE is available to ODMs and others looking to add new platform support, and it optimizes the development and qualification process. It offers a pre-canned, minimal code package to which the ODM can add their necessary platform driver files and static configuration files (required by SONiC to properly initialize SAI and the switching silicon). Furthermore, the PDE will provide a test suite where platform developers can quickly test their drivers and configuration files to resolve issues more easily without relying on the full SONiC application and infrastructure to be in place. + + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|---------------------------------------| +| ODM | Original Design Manufacturer | +| PDE | Platform Development Environment | +| PDK | Platform Development Kit | +| PDDF | Platform Driver Development Framework | + +# 1 Requirements Overview + +## 1.1 Base SONiC PDE Requirements + +The requirements for the SONiC PDE are: +1. The SONiC PDE repository is generated from the "sonic-buildimage" repository. +2. The SONiC PDE source code tree is generated by a "make initpde" command and generates a codebase that can be provided to an ODM for development. +3. The SONiC PDE is packaged with all existing SONiC supported platforms. +4. The SONiC PDE build generates a fully functional ONIE installable image supporting any of the existing included ODM platforms as well as new platforms added. +5. 
The PDE is designed to be used on a typical software developer system or virtual machine and does not require a more powerful build server.
+   The minimum requirement for a SONiC PDE build system is an Ubuntu 16.04 LTS (VM or dedicated system) with 8GB of RAM and at least 4 CPU cores.
+6. A new PDE SONiC container is created during the SONiC PDE build process.
+7. All new unit tests (scripts, binaries, etc.) which are generated by or for the PDE reside in the PDE container.
+8. All platform drivers and device files will reside in their original SONiC locations as part of the PDE build process.
+9. The PDE uses the base SONiC build generated Linux kernel, Debian bootstrap image, SAI, root filesystem, and other pre-compiled binaries necessary for booting the PDE ONIE image.
+10. All PDE makefiles and associated supporting files reside in the base SONiC repository as part of the "sonic-buildimage" repository and will not interfere with or execute as part of the normal SONiC build process.
+11. The PDE supports the existing SONiC porting guide for adding new platform support. Please refer to [link](https://github.com/Azure/SONiC/wiki/Porting-Guide) for more information.
+
+New platform changes added in the PDE are seamlessly integrated into the SONiC base repository.
+
+## 1.2 SONiC PDE Test Suite Requirements
+
+1. Test and verify that the ODM-provided configuration files (config.bcm, port_config.ini, sai.profile) initialize the ports through SAI accordingly.
+2. Test and verify that all ports (including expandable or flex ports) will link up and pass L2 traffic.
+3. Test and verify all front panel port LEDs work according to hardware specification / customer requirements.
+4. Test and verify all system LEDs (power, attention, locate, other) work according to hardware specification / customer requirements.
+5. Test and verify that platform specific pre-emphasis settings are programmed properly in internal / external PHYs based on the media type, cable length, etc., according to the hardware specification.
+6. Test and verify that the power mode for QSFP transceivers can be configured properly for high- or low-power operation.
+7. Test and verify that Forward Error Correction (FEC) is properly enabled based on the inserted transceiver / DAC.
+8. Test and verify that the platform drivers load without error both in switch OS reboot conditions and full AC power cycles.
+9. Test and verify that system EEPROM contents can be read without error (see the sketch after this list).
+10. Test and verify that transceiver EEPROM contents for all ports can be read without error.
+11. Test and verify that warmboot, when supported by the underlying switching silicon, works without error on the platform.
+12. Test and verify SONiC first boot behavior versus the secondary boot process. Specifically, ensure that all platform drivers, configuration files, and other packages are installed without error.
+13. Test and verify that ODM and customer supported optical modules as well as DACs can operate properly within SONiC.
+14. Test and verify that platform drivers can be loaded and unloaded without requiring a reboot.
+15. All PDE tests will run at the platform layer plugin / sysfs layer. These tests are targeted to ensure that the ODM-added plugin scripts and drivers interact properly and will work when interfacing with the higher level SONiC application.
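As an illustration of how such a check might look in the PDE test harness, the sketch below covers requirement 9 (system EEPROM readability). It assumes a pytest-style Python test running on the DUT; the test name and structure are illustrative, and only the standard SONiC `decode-syseeprom` utility is relied upon.

```python
import subprocess


def test_system_eeprom_readable():
    """PDE test sketch for requirement 9: system EEPROM contents can be read without error.

    Assumes a pytest-style harness executing directly on the DUT; only the
    standard SONiC 'decode-syseeprom' utility is used here.
    """
    result = subprocess.run(
        ["decode-syseeprom"], capture_output=True, text=True, timeout=60
    )
    assert result.returncode == 0, "decode-syseeprom exited with an error"
    assert result.stdout.strip(), "decode-syseeprom returned no EEPROM contents"
```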
+ + +### 1.1.2 Configuration and Management Requirements +None + +### 1.1.3 Scalability Requirements +None + +### 1.1.4 Warm Boot Requirements +Warmboot is an important feature in SONiC and a requirement for many datacenter customers. All platforms and their platform drivers are tested as part of the PDE test suite to ensure they do not break the warmboot feature. The PDE test suite checks to ensure that traffic is not lost during a warm boot. + +## 1.2 Design Overview +### 1.2.1 Basic Approach +The PDE source code base is generated from a successfully compiled SONiC build and will contain all the pre-built binaries, scripts, and support infrastructure needed to create a lightweight development infrastructure for platform development. The PDE will be provided to ODMs and new customers looking to add platform support, where they can add their necessary platform driver files and static configuration files required by SONiC to properly initialize SAI and the switching silicon. Furthermore, the PDE will provide a test suite where platform developers can quickly test their drivers and configuration files to resolve issues more easily without relying on the full SONiC application and infrastructure to be in place. + + +### 1.2.2 Container +The PDE build creates a new "docker-pde" container which contains all necessary scripts, binaries, and static configuration data needed to support the PDE and the PDE test suite. Platform drivers and device configuration files remain in their existing locations within the SONiC build and runtime filesystem. + +# 2 Functionality +## 2.1 Target Deployment Use Cases + +The PDE does not target any type of feature deployment within SONiC. The primary use case is to enable an ODM or customer to quickly add new platform support and run a test suite to ensure that it is compatible with the full SONiC application. + +As seen in the diagram below, the PDE consists of a subset of the full SONiC build. As the SONiC control plane is not needed for platform validation, it is replaced with a PDE test harness which focuses on testing and validating the platform as well as basic functionality of the switching silicon. The PDE is intended to validate and qualify the hardware platform such that it is seamlessly integrated into SONiC where the full function application can be used on the platform. + +![PDE](../../images/platform/sonicpdeoverview.png) + + +## 2.2 Functional Description + +The SONiC build process has build times that exceed an hour for a full build. Furthermore, the build system itself has to have at least 50GB of disk storage and multiple CPU cores available for maximum performance. From an ODM perspective, adding platform driver support, static configuration necessary for SONiC and SAI, and platform plugins is not a complicated process. The PDE utilizes the pre-compiled binaries (kernel, necessary file system, etc) that were generated in a full SONiC build, and then only compiles and packages the new platform drivers that an ODM adds. This speeds up the build time tremendously and allow for quick turnaround for platform development. + +Furthermore, platform developer / enablers do not need to be concerned with a majority of the SONiC application and features. The PDE creates and packages a set of automated and manual tests that ensure the platform drivers are stable and work as required for full SONiC operation, and the switch configuration is correct and allows full traffic flow across the supported port configurations. 
+ +### 2.2.3 SAI Overview +The PDE incorporates the necessary SAI / SDK version for the ODM to bring up the ASIC on the target platform. It is assumed that the base SAI support for the switching silicon has already been tested and verified. + +# 3 Design +## 3.1 Overview + +The below diagram shows the build process for generating the initial PDE repository. + +![PDE](../../images/platform/initialcreationflow.png) + +The below diagram depicts the normal sonic build running with its associated docker containers. The PDE container is not included / active in a regular SONiC build. + +![PDE](../../images/platform/sonicbuildcontainers.png) + +The below diagram shows the PDE build containing the PDE container. In a PDE build, this container is active and responsible for interacting with the switching silicon and platform drivers/hardware. There is no dependency on the remaining SONiC containers in the PDE test environment. + +![PDE](../../images/platform/pdebuildcontainers.png) + +## 3.2 DB Changes +### 3.2.1 CONFIG DB +N/A +### 3.2.2 APP DB +N/A +### 3.2.3 STATE DB +N/A +### 3.2.4 ASIC DB +N/A +### 3.2.5 COUNTER DB +N/A + +## 3.3 Switch State Service Design +### 3.3.1 Orchestration Agent +N/A +### 3.3.2 Other Process +N/A + +## 3.4 SyncD +N/A + +## 3.5 SAI +No new SAI APIs are expected to be required to support the PDE. + +## 3.6 CLI + +The PDE has no dependency on the SONiC CLI. + +# 4 Flow Diagrams +N/A + +# 5 Error Handling +N/A + +# 6 Serviceability and Debug +N/A + +# 7 Warm Boot Support +N/A + +# 8 Scalability +N/A + +# 9 Unit Test +The unit test cases provided by the PDE will cover the detailed requirements list from section 1.2 above. diff --git a/doc/pmon/sonic_platform_test_plan.md b/doc/pmon/sonic_platform_test_plan.md index bb433d5420..5d2891dd6c 100644 --- a/doc/pmon/sonic_platform_test_plan.md +++ b/doc/pmon/sonic_platform_test_plan.md @@ -6,17 +6,19 @@ - [1.4 Check xcvrd information in DB](#14-check-xcvrd-information-in-db) - [1.5 Sequential syncd/swss restart](#15-sequential-syncdswss-restart) - [1.6 Reload configuration](#16-reload-configuration) - - [1.7 COLD/WARM/FAST reboot](#17-coldwarmfast-reboot) + - [1.7 COLD/WARM/FAST/POWER OFF/WATCHDOG reboot](#17-coldwarmfastpower-offwatchdog-reboot) - [1.8 Check thermal sensors output using new OPTIC cables](#18-check-thermal-sensors-output-using-new-optic-cables) - [1.9 Manually plug in and pull out PSU modules](#19-manually-plug-in-and-pull-out-psu-modules) - [1.10 Manually plug in and pull out PSU power cord](#110-manually-plug-in-and-pull-out-psu-power-cord) - [1.11 Manually plug in and pull out FAN modules](#111-manually-plug-in-and-pull-out-fan-modules) - [1.12 Manually plug in and pull out optical cables](#112-manually-plug-in-and-pull-out-optical-cables) + - [1.13 Check platform daemon status](#113-check-platform-daemon-status) - [Mellanox Specific Test Cases](#mellanox-specific-test-cases) - [2.1 Ensure that the hw-management service is running properly](#21-ensure-that-the-hw-management-service-is-running-properly) - [2.2 Check SFP using ethtool](#22-check-sfp-using-ethtool) - [2.3 Check SYSFS](#23-check-sysfs) - [2.4 Verify that `/var/run/hw-management` is mapped to docker pmon](#24-verify-that-varrunhw-management-is-mapped-to-docker-pmon) + - [2.5 Check SFP presence](#25-check-sfp-presence) - [Automation Design](#automation-design) - [Folder Structure and Script Files](#folder-structure-and-script-files) - [Scripts to be implemented in phase1](#scripts-to-be-implemented-in-phase1) @@ -24,6 +26,7 @@ - [Helper 
scripts](#helper-scripts) - [Vendor specific steps](#vendor-specific-steps) +2.5 Check SFP presence # Introduction This test plan is to check the functionalities of platform related software components. These software components are for managing platform hardware, including FANs, thermal sensors, SFP, transceivers, pmon, etc. @@ -354,14 +357,26 @@ New automation required ### Automation Partly covered by existing automation. New automation required. -## 1.7 COLD/WARM/FAST reboot +## 1.7 COLD/WARM/FAST/POWER OFF/WATCHDOG reboot ### Steps -* Perform cold/warm/fast reboot +* Perform cold/warm/fast/power off/watchdog reboot + * cold/warm/fast reboot + * Make use of commands to reboot the switch + * watchdog reboot + * Make use of new platform api to reboot the switch + * power off reboot + * Make use of PDUs to power on/off DUT. + * Power on/off the DUT for (number of PSUs + 1) * 2 times + * Power on each PSU solely + * Power on all the PSUs simultaneously + * Delay 5 and 15 seconds between powering off and on in each test * After reboot, check: * status of services: syncd, swss * `sudo systemctl status syncd` * `sudo systemctl status swss` + * reboot cause: + * `show reboot-cause` * status of hw-management - **Mellanox specific** * `sudo systemctl status hw-management` * status of interfaces and port channels @@ -375,6 +390,7 @@ Partly covered by existing automation. New automation required. ### Pass/Fail Criteria * After reboot, status of services, interfaces and transceivers should be normal: * Services syncd and swss should be active(running) + * Reboot cause should be correct * Service hw-management should be active(exited) - **Mellanox specific** * All interface and port-channel status should comply with current topology. * All transcevers of ports specified in lab connection graph (`ansible/files/lab_connection_graph.xml`) should present. @@ -564,6 +580,21 @@ Expected results of checking varous status: ### Automation Manual intervention required, not automatable +## 1.13 Check platform daemon status + +This test case will check the all daemon running status inside pmon(ledd no included) if they are supposed to to be running on this platform. +* Using command `docker exec pmon supervisorctl status | grep {daemon}` to get the status of the daemon + +Expected results of checking daemon status: +* the status of the daemon should be `RUNNING` + +### Steps +* Get the running daemon list from the configuration file `/usr/share/sonic/device/{platform}/{hwsku}/pmon_daemon_control.json` +* Check all the daemons running status in the daemon list + +### Pass/Fail Criteria +* All the daemon status in the list shall be `RUNNING` + # Mellanox Specific Test Cases ## 2.1 Ensure that the hw-management service is running properly @@ -675,6 +706,8 @@ New automation required ### Pass/Fail Criteria * Verify that symbolic links are created under `/var/run/hw-management`. Ensure that there are no invalid symbolic link +* Check current FAN speed against max and min fan speed, also check the the fan speed tolerance, insure it's in the range +* Check thermal valules(CPU, SFP, PSU,...) against the max and min value to make sure they are in the range. 
### Automation New automation required @@ -691,6 +724,25 @@ New automation required ### Automation New automation required +## 2.5 Check SFP presence + +### Steps +* Get all the connected interfaces +* Check the presence of the SFP on each interface and corss check the SFP status from the sysfs + +### Pass/Fail Criteria +* All th SFP shall be presence and SFP status shall be OK. + +### Steps +* Go to docker pmon: `docker exec -it pmon /bin/bash` +* Go to `/var/run` of docker container, verify that host directory `/var/run/hw-management` is mapped to docker `pmon` + +### Pass/Fail Criteria +* Host directory `/var/run/hw-management` should be mapped to docker pmon + +### Automation +New automation required + # Automation Design This section outlines the design of scripts automating the SONiC platform test cases. The new pytest-ansible framework will be used. Sample code can be found [here](https://github.com/Azure/sonic-mgmt/tree/master/tests). diff --git a/doc/RIF_counters.md b/doc/rif-counters/RIF_counters.md similarity index 100% rename from doc/RIF_counters.md rename to doc/rif-counters/RIF_counters.md diff --git a/doc/sflow/sflow_hld.md b/doc/sflow/sflow_hld.md new file mode 100644 index 0000000000..9679b40935 --- /dev/null +++ b/doc/sflow/sflow_hld.md @@ -0,0 +1,633 @@ +# sFlow High Level Design +### Rev 1.1 +## Table of Contents + +## 1. Revision +Rev | Rev Date | Author | Change Description +---------|--------------|-----------|------------------- +|v0.1 |05/01/2019 |Padmanabhan Narayanan | Initial version +|v0.2 |05/20/2019 |Padmanabhan Narayanan | Updated based on internal review comments +|v0.3 |06/11/2019 |Padmanabhan Narayanan | Update CLIs, remove sflowcfgd +|v0.4 |06/17/2019 |Padmanabhan Narayanan | Add per-interface configurations, counter mode support and
unit test cases. Remove genetlink CLI
+|v0.5 |07/15/2019 |Padmanabhan Narayanan | Update CLI and DB schema based on comments from InMon: Remove max-datagram-size from collector config; Add CLI for counter polling interval; Remove default header-size; Add "all" interfaces option; Separate CLI to set agent-id
+|v1.0 |09/13/2019 |Sudharsan | Updating sequence diagram for various CLIs +|v1.1 |10/23/2019 |Padmanabhan Narayanan | Update SAI section to use SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME instead of ID. Note on genetlink creation. Change admin_state values to up/down instead of enable/disable to be consistent with management framework's sonic-common.yang. + +## 2. Scope +This document describes the high level design of sFlow in SONiC + +## 3. Definitions/Abbreviations + +Definitions/Abbreviation|Description +------------------------|----------- +SAI| Switch Abstraction Interface +NOS| Network Operating System +OID| OBject Identifier + +## 4. Overview + +sFlow (defined in https://sflow.org/sflow_version_5.txt) is a standard-based sampling technology the meets the key requirements of network traffic monitoring on switches and routers. sFlow uses two types of sampling: + +* Statistical packet-based sampling of switched or routed packet flows to provide visibility into network usage and active routes +* Time-based sampling of interface counters. + +The sFlow monitoring system consists of: + + * sFlow Agents that reside in network equipment which gather network traffic and port counters and combines the flow samples and interface counters into sFlow datagrams and forwards them to the sFlow collector at regular intervals over a UDP socket. The datagrams consist of information on, but not limited to, packet header, ingress and egress interfaces, sampling parameters, and interface counters. A single sFlow datagram may contain samples from many flows. + * sFlow collectors which receive and analyze the sFlow data. + + sFlow is an industry standard, low cost and scalable technique that enables a single analyzer to provide a network wide view. + +## 5. Requirements + +sFlow will be implemented in multiple phases: + +### **Phase I:** + +1. sFlow should be supported on physical interfaces. +2. sFlow should support 2 sFlow collectors. +3. sFlow collector IP can be either IPv4 or IPv6. +4. sFlow collector can be reachable via + 1. Front end port + 2. Management port +6. Default sFlow sample size should be set to 128 bytes. +7. Support sFlow related + 1. CLI show/config commands + 2. syslogs +8. sFlow counter support needed and config to change polling interval. + +### **Phase II:** +1. Speed based sample rate setting (config sflow sample-rate speed...) +2. sFlow should be supported on portchannel interfaces. +2. Enhance CLI with session support (i.e create sessions add interfaces to specific sessions) +3. SNMP support for sFlow. + +### **Phase III:** +1. sFlow extended switch support. +2. sFlow extended router support. + +### Not planned to be supported: +1. Egress sampling support. +2. sFlow backoff mechanism (Drop the packets beyond configured CPU Queue rate limit). +3. sFlow over vlan interfaces. + +## 6. Module Design + +### 6.1 **Overall design** +The following figure depicts the sFlow container in relation to the overall SONiC architecture: + +![alt text](../../images/sflow/sflow_architecture.png "SONiC sFlow Architecture") + +The CLI is enhanced to provide configuring and display of sFlow parameters including sflow collectors, agent IP, sampling rate for interfaces. The CLI configurations currently only interact with the CONFIG_DB. + +The newly introduced sflow container consists of: +* An instantiation of the InMon's hsflowd daemon (https://github.com/sflow/host-sflow described in https://sflow.net/documentation.php). The hsflowd is launched as a systemctl service. 
The host-sflow is customised to interact with SONiC subsystems by introducing a host-sflow/src/Linux/mod_sonic.c (described later) +* sflowmgrd : updates APP DB sFlow tables based on config updates + +The swss container is enhanced to add the following component: +* sfloworch : which subscribes to the APP DB and acts as southbound interface to SAI for programming the SAI_SAMPLEPACKET sessions. +* copporch : Copporch gets the genetlink family name and multicast group from copp.json file, programs the SAI genetlink attributes and associates it with trap group present for sflow in copp.json + +The syncd container is enhanced to support the SAI SAMPLEPACKET APIs. + +The ASIC drivers need to be enhanced to: +* Associate the SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKET to a specific genetlink channel and multicast group. +* Punt trapped samples to this genetlink group + +The sflow container and changes to the existing components to support sflow are described in the following sections. + +### 6.2 **Configuration and control flow** +The following figure shows the configuration and control flows for sFlow: + +![alt text](../../images/sflow/sflow_config_and_control.png "SONiC sFlow Configuration and Control") + +1. The user configures the sflow collector, agent, sampling related parameters (interfaces to be sampled and rate) and these configurations are added to the CONFIG DB. +2. The copporch (based on swssconfig/sample/00-copp.config.json) calls a SAI API that enables the ASIC driver to map the SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKET trap to the specific genetlink channel and multicast group. The SAI driver creates the genetlink family and multicast group which will eventually be used to punt sFlow samples to hsflowd. If the SAI implementation uses the psample kernel driver (https://github.com/torvalds/linux/blob/master/net/psample/psample.c), the genetlink family "psample" and multicast group "packets" that the psample driver creates is to be used. +3. The sflowmgrd daemon watches the CONFIG DB's SFLOW_COLLECTOR table and updates the /etc/hsflowd.conf which is the configuration file for hsflowd. Based on the nature of changes, the sflowmgrd may restart the hsflowd service. The hsflowd service uses the collector, UDP port and agent IP information to open sockets to reach the sFlow collectors. +4. When hsflowd starts, the sonic module (mod_sonic) registered callback for packetBus/HSPEVENT_CONFIG_CHANGED opens a netlink socket for packet reception and registers an sflow sample handler over the netlink socket (HsflowdRx()). +5. Sampling rate changes are updated in the SFLOW table. The sflowmgrd updates sampling rate changes into SFLOW_TABLE in the App DB. The sfloworch subagent in the orchagent container processes the change to propagate as corresponding SAI SAMPLEPACKET APIs. + +Below figures explain the flow for different commands from CLI to SAI + +![alt text](../../images/sflow/sflow_enable.png "SONiC sFlow Enable command") + +![alt text](../../images/sflow/sflow_disable.png "SONiC sFlow Disable command") + +![alt text](../../images/sflow/sflow_intf_disable_all.png "SONiC Interface disable all command") + +![alt text](../../images/sflow/sflow_intf_disable.png "SONiC Interface enable/disable command") + +![alt text](../../images/sflow/sflow_intf_rate.png "SONiC Interface rate set command") + + +### 6.3 **sFlow sample path** +The following figure shows the sFlow sample packet path flow: + +![alt text](../../images/sflow/sflow_sample_packet_flow.png "SONiC sFlow sample packet flow") + +1. 
The ASIC (DMAs) an sflow sample and interrupts the ASIC driver +2. The ASIC driver ascertains that this is sample buffer that has been received as a result of sflow sampling being enabled for this interface. +3. The ASIC driver checks that SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKETs are associated with a specific genetlink channel name and group. the ASIC driver encapsulates the sample in a genetlink buffer and adds the following netlink attributes to the sample : IIFINDEX, OIFINDEX, ORIGSIZE, SAMPLE, SAMPLE RATE. The genetlink buffer is sent via genlmsg_multicast(). +4. The hsflowd daemon's HsflowdRx() is waiting on the specific genetlink family name's multicast group id and receives the encapsulated sample. The HsflowdRx parses and extracts the encapsulated sflow attributes and injects the sample to the hsflowd packet thread using takeSample(). +5. The hsflowd packet thread accumulates sufficient samples and then constructs an sFlow UDP datagram and forwards to the configured sFlow collectors. + +### 6.4 **sFlow counters** + +The sFlow counter polling interval is set to 20 seconds. The pollBus/HSPEVENT_UPDATE_NIO callback caches the interface SAI OIDs during the first call by querying COUNTER_DB:COUNTERS_PORT_NAME_MAP. It periodically retrieves the COUNTER_DB interface counters and fills the necessary counters in hsflowd's SFLHost_nio_counters. + +### 6.5 **CLI** + +#### sFlow utility interface +* sflow [options] {config | show} ... + + An sflow utility command is provided to operate with sflow configuration + Also, the **config** and **show** commands would be extended to include the sflow option. + +#### Config commands + +* **sflow collector add** *{collector-name {ipv4-address | ipv6-address}} [**port** {number}]* + + Where: + * name is the unique name of the sFlow collector + * ipv4-address : IP address of the collector in dotted decimal format for IPv4 + * ipv6-address : x: x: x: x::x format for IPv6 address of the collector (where :: notation specifies successive hexadecimal fields of zeros) + * port (OPTIONAL): specifies the UDP port of the collector (the range is from 0 to 65535. The default is 6343.) + + Note: + * A maximum of 2 collectors is allowed. + + * **sflow collector del** *{collector-name}* + + Delete the sflow collector with the given name + +* **sflow agent-id ** *{interface-name}* + + Where: + * agent-id: specify the interface name whose ipv4 or ipv6 address will be used as the agent-id in sFlow datagrams. + + Note: + * This setting is global (applicable to both collectors) and optional. Only a single agent-id is allowed. If agent-id is not specified (with this CLI), an appropriate IP that belongs to the switch is used as the agent-id based on some simple heuristics. + +* **sflow ** + + Globally, sFlow is disabled by default. When sFlow is enabled globally, the sflow deamon is started and sampling will start on all interfaces which have sFlow enabled at the interface level (see “config sflow interface…”). +When sflow is disabled globally, sampling is stopped on all relevant interfaces and sflow daemon is stopped. + +* **sflow interface ** *<{interface-name}|**all**>* + + Enable/disable sflow at an interface level. By default, sflow is enabled on all interfaces at the interface level. Use this command to explicitly disable sFlow for a specific interface. An interface is sampled if sflow is enabled globally as well as at the interface level. + + The “all” keyword is used as a convenience to enable/disable sflow at the interface level for all the interfaces. 
+ +* **sflow interface sample-rate** *{interface-name} {value}* + + Configure the sample-rate for a specific interface. + + The default sample rate for any interface is (ifSpeed / 1e6) where ifSpeed is in bits/sec. So, the default sample rate based on interface speed is: + + * 1-in-1000 for a 1G link + * 1-in-10,000 for a 10G link + * 1-in-40,000 for a 40G link + * 1-in-50,000 for a 50G link + * 1-in-100,000 for a 100G link + + This default is chosen to allow the detection of a new flow of 10% link bandwidth in under 1 second. It is recommended not to change the defaults. This CLI is to be used only in case of exceptions (e.g., to set the sample-rate to the nearest power-of-2 if there are hardware restrictions in using the defaults) + + * value is the average number of packets skipped before the sample is taken. As per SAI samplepacket definition : "The sampling rate specifies random sampling probability as the ratio of packets observed to samples generated. For example a sampling rate of 256 specifies that, on average, 1 sample will be generated for every 256 packets observed." + * Valid range 256:8388608. + +* **sflow polling-interval** *{value}* + + The counter polling interval for all interfaces. + + * Valid range 0:300 seconds + * Set polling-interval to 0 to disable + +* **sflow sample-rate speed <100M|1G|10G|25G|40G|50G|100G>** *{value}* + + Set the sampling-rate for interfaces based on speed: + e.g. + ``` + config sflow sample-rate speed 100M 250 + config sflow sample-rate speed 1G 500 + config sflow sample-rate speed 40G 5000 + ``` + * This will override the default speed based setting (which is ifSpeed / 1e6 where ifSpeed is in bits/sec.) + * If port speed changes, this setting will be used to determine the updated sample-rate for the interface. + * The config sflow interface sample-rate {interface-name} {value} setting can still be used to override the speed based setting for specific interfaces. + + +#### Show commands + +* **show sflow** + * Displays the current configuration, global defaults as well as user configured values including collectors. +* **show sflow interface** + * Displays the current running configuration of sflow interfaces. + +#### Example SONiC CLI configuration #### + +# sflow collector add collector1 10.100.12.13 + +# sflow collector add collector2 10.144.1.2 port 6344 + +# sflow agent-id add loopback0 + +# sflow enable + +# sflow interface disable Ethernet0 + +# sflow interface sample-rate Ethernet16 32768 + +The configDB objects for the above CLI is given below: + +``` +{ + "SFLOW_COLLECTOR": { + "collector1": { + "collector_ip": "10.100.12.13", + "collector_port": "6343" + }, + "collector2": { + "collector_ip": "10.144.1.2", + "collector_port": "6344" + } + }, + + "SFLOW": { + "global": { + "admin_state": "up" + "polling_interval": "20" + "agent_id": "loopback0", + } + } + + "SFLOW_SESSION": { + "Ethernet0": { + "admin_state": "down" + "sample_rate": "40000" + }, + "Ethernet16": { + "admin_state": "up" + "sample_rate": "32768" + } + } + +``` + +If user issues a "config sflow interface disable all", the SFLOW_SESSION will have the following: +``` + "SFLOW_SESSION": { + "all":{ + "admin_state":"down" + }, + ... + } +``` + +# show sflow + +Displays the current configuration, global defaults as well as user configured values including collectors. 
+ +``` +sFlow services are enabled +Counter polling interval: 20 +2 collectors configured: +Collector IP addr: 10.100.12.13, UDP port: 6343 +Collector IP addr: 10.144.1.2, UDP port: 6344 +Agent ID: loopback0 (10.0.0.10) + +``` + +# show sflow interface + +Displays the current running configuration of sflow interfaces. + +``` +Interface Admin Status Sampling rate +--------- ------------ ------------- +Ethernet0 Disabled 40000 +Ethernet1 Enabled 40000 +Ethernet2 Enabled 40000 +Ethernet3 Enabled 40000 +Ethernet4 Enabled 40000 +Ethernet5 Enabled 40000 +Ethernet6 Enabled 40000 +Ethernet7 Enabled 40000 +Ethernet8 Enabled 40000 +Ethernet9 Enabled 40000 +Ethernet10 Enabled 40000 +Ethernet11 Enabled 40000 +Ethernet12 Enabled 40000 +Ethernet13 Enabled 40000 +Ethernet14 Enabled 40000 +Ethernet15 Enabled 40000 +Ethernet16 Enabled 32768 +Ethernet17 Enabled 40000 +Ethernet18 Enabled 40000 +Ethernet19 Enabled 40000 +Ethernet20 Enabled 40000 +Ethernet21 Enabled 40000 +Ethernet22 Enabled 40000 +Ethernet23 Enabled 40000 +Ethernet24 Enabled 40000 +Ethernet25 Enabled 40000 +Ethernet26 Enabled 40000 +Ethernet27 Enabled 40000 +Ethernet28 Enabled 40000 +Ethernet29 Enabled 40000 +Ethernet30 Enabled 40000 +Ethernet31 Enabled 40000 + +``` + +### 6.6 **DB and Schema changes** + +#### ConfigDB Table & Schema + +A new SFLOW_COLLECTOR ConfigDB table entry would be added. +``` +SFLOW_COLLECTOR|{{collector_name}} + "collector_ip": {{ip_address}} + "collector_port": {{ uint32 }} (OPTIONAL) + +; Defines schema for sFlow collector configuration attributes +key = SFLOW_COLLECTOR:collector_name ; sFlow collector configuration +; field = value +COLLECTOR_IP = IPv4address / IPv6address ; Ipv4 or IpV6 collector address +COLLECTOR_PORT = 1*5DIGIT ; destination L4 port : a number between 0 and 65535 + +;value annotations +collector_name = 1*16VCHAR +``` + +A new SFLOW table will be added which holds global configurations +``` +; Defines schema for SFLOW table which holds global configurations +key = SFLOW +ADMIN_STATE = "up" / "down" +POLLING_INTERVAL = 1*3DIGIT ; counter polling interval +AGENT_ID = ifname ; Interface name +``` + +A new SFLOW_SESSION table would be added. +``` +key SFLOW_SESSION:interface_name +ADMIN_STATE = "up" / "down" +SAMPLE_RATE = 1*7DIGIT ; average number of packets skipped before the sample is taken +``` + +#### AppDB & Schema + +A new SFLOW_SESSION_TABLE is added to the AppDB: + +``` +; Defines schema for SFLOW_SESSION_TABLE which holds global configurations +key = SFLOW_SESSION_TABLE:interface_name +ADMIN_STATE = "up" / "down" +SAMPLE_RATE = 1*7DIGIT ; average number of packets skipped before the sample is taken +``` + +A new SFLOW_SAMPLE_RATE_TABLE table which maps interface speed to the sample rate for that speed is added to the AppDB +``` +; Defines schema for SFLOW_SAMPLE_RATE which maps interface speed to sampling rate +key = SFLOW_SAMPLE_RATE_TABLE:speed +SAMPLE_RATE = 1*7DIGIT ; average number of packets skipped before the sample is taken +``` +Where speed = 1*6DIGIT ; port line speed in Mbps + +### 6.7 **sflow container** + +hsflowd (https://github.com/sflow/host-sflow) is the most popular open source implementation of the sFlow agent and provides support for DNS-SD (http://www.dns-sd.org/) and can be dockerised. hsflowd supports sFlow version 5 (https://sflow.org/sflow_version_5.txt which supersedes RFC 3176). hsflowd will run as a systemd service within the sflow docker. + +CLI configurations will be saved to the ConfigDB. 
Once the genetlink channel has been initialised and the sFlow traps mapped to the genetlink group, the hsflowd service is started. The service startup script will derive the /etc/hsflowd.conf from the ConfigDB. Config changes will necessitate restart of hsflowd. hsflowd provides the necessary statistics for the "show" commands. CLI "config" commands will retrieve the entries in the config DB. + +#### sflowmgrd + +The sflowmgrd consumes sflow config DB changes and updates the SFLOW APP DB tables. + +The sflowmgrd daemon listens to SFLOW_COLLECTOR to construct the hsflowd.conf and start the hsflowd service. +The mapping between the SONiC sflow CLI parameters and the host-sflow is given below: + +SONIC CLI parameter| hsflowd.conf equivalent +-------------------|------------------------ +collector ip-address | collector.ip +collector port| collector.UDPPort +agent ip-address | agentIP +max-datagram-size | datagramBytes +sample-rate | sampling + +The master list of supported host-sflow tokens are found in host-sflow/src/Linux/hsflowtokens.h + +sflowmgrd also listens to SFLOW to propogate the sampling rate changes to App DB SFLOW_TABLE. + +#### hsflowd service + +hsflowd provides an module adaptation layer for interfacing with the NOS. In the host-sflow repo, a src/Linux/mod_sonic.c adaption layer will be provided for hsflowd APIs to SONiC that deal with hsflowd initialization, configuration changes, packet sample consumption etc. More specifically, SONiC will register and provide callbacks for the following HSP events: + +hsflowd bus/events|SONiC callback actions +------------------|---------------------- + pollBus/HSPEVENT_INTF_READ | select all switchports for sampling by default + pollBus/HSPEVENT_INTF_SPEED | set sampling rate + pollBus/HSPEVENT_UPDATE_NIO | poll interface state from STATE_DB:PORT_TABLE and update counter stats in SFLHost_nio_counters from COUNTER DB + pollBus/HSPEVENT_CONFIG_CHANGED)| Change sampling rate (/ port speed changed) + packetBus/HSPEVENT_CONFIG_CHANGED | open netlink socket and register HsflowdRx() + +Refer to host-sflow/src/Linux/hsflowd.h for a list of events. + +### 6.8 **SWSS and syncd changes** + +### sFlowOrch + +An sFlowOrch is introduced in the Orchagent to handle configuration requests. The sFlowOrch essentially facilitates the creation/deletion of samplepacket sessions as well as get/set of session specific attributes. sFlowOrch sets the genetlink host interface that is to be used by the SAI driver to deliver the samples. + +Also, it monitors the SFLOW_SESSIONS_TABLE and PORT state to determine sampling rate / speed changes to derive and set the sampling rate for all the interfaces. It uses the SAI samplepacket APIs to set each ports's sampling rate. + +### Rate limiting + +Considering that sFlow backoff mechanism is not being implemented, users should consider rate limiting sFlow samples using the currently existing COPP mechanism (the COPP config (e.g. src/sonic-swss/swssconfig/sample/00-copp.config.json) can include appropriate settings for the samplepacket trap and initialised using swssconfig). + +### 6.9 **SAI changes** + +Creating sFlow sessions and setting attributes (e.g. sampling rate) is described in SAI proposal : https://github.com/opencomputeproject/SAI/tree/master/doc/Samplepacket + +As per the sFlow specification, each packet sample should have certain minimal meta data for processing by the sFlow analyser. 
The psample infrastructure (http://man7.org/linux/man-pages/man8/tc-sample.8.html) already describes the desired metadata fields (which the SAI driver needs to add to each sample): + +``` +SAMPLED PACKETS METADATA FIELDS + The metadata are delivered to userspace applications using the + psample generic netlink channel, where each sample includes the + following netlink attributes: + + PSAMPLE_ATTR_IIFINDEX + The input interface index of the packet, if there is one. + + PSAMPLE_ATTR_OIFINDEX + The output interface index of the packet. This field is not + relevant on ingress sampling + + PSAMPLE_ATTR_ORIGSIZE + The size of the original packet (before truncation) + + PSAMPLE_ATTR_SAMPLE_GROUP + The psample group the packet was sent to + + PSAMPLE_ATTR_GROUP_SEQ + A sequence number of the sampled packet. This number is + incremented with each sampled packet of the current psample + group + + PSAMPLE_ATTR_SAMPLE_RATE + The rate the packet was sampled with +``` + +The SAI driver may provide the interface OIDs corresponding to the IIFINDEX AND OIFINDEX. The hsflowd mod_sonic HsflowdRx() may have to map these correspondingly to the netdev ifindex. Note that the default PSAMPLE_ATTR_SAMPLE_GROUP that hsflowd expects is 1. + +Rather than define a new framework for describing the metadata for sFlow use, SAI would re-use the framework that the psample driver (https://github.com/torvalds/linux/blob/master/net/psample/psample.c) currently uses. The psample kernel driver is based on the Generic Netlink subsystem that is described in https://wiki.linuxfoundation.org/networking/generic_netlink_howto. SAI ASIC drivers supporting sFlow may choose to use the psample.ko driver as-is or may choose to implement the generic netlink interface (that complies with the above listed metadata) using a private generic netlink family. + +#### SAI Host Interface based on Generic Netlink + +During SWSS init (as part of copporch), based on the swssconfig/sample/00-copp.config.json settings, sai_create_hostif_fn() is used to let the SAI driver create a special genetlink interface (type SAI_HOST_INTERFACE_TYPE_GENETLINK) and associate it with generic netlink family (SAI_HOST_INTERFACE_ATTR_NAME) and multicast group name (SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME). Later, sai_create_hostif_table_entry_fn() is used to map SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKET to the genetlink sai_host_if. + +syncd/SAI implementations can use one of the following methods to create the genetlink interface: + +1. As part of syncd/SAI driver init, a driver based on the standard psample driver (genetlink family ="psample" and multicast group "packets") may be installed which would create the genetlink. In this case the sai_create_hostif_fn() determines that the genetlink interface is already created and merely associates the sai_host_if to the genetlink. + +2. The genetlink interface may alternatively be created during the call to sai_create_hostif_fn(). + + +#### Changes in SAI to support the GENETLINK host interface + +The changes in SAI to support the GENETLINK host interface is highlighted below: + +``` + /** Generic netlink */ + SAI_HOSTIF_TYPE_GENETLINK + + /** + * @brief Name [char[SAI_HOSTIF_NAME_SIZE]] + * + * The maximum number of characters for the name is SAI_HOSTIF_NAME_SIZE - 1 since + * it needs the terminating null byte ('\0') at the end. 
+ * + * In case of GENETLINK, name refers to the genl family name + * + * @type char + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @condition SAI_HOSTIF_ATTR_TYPE == SAI_HOSTIF_TYPE_NETDEV or SAI_HOSTIF_ATTR_TYPE == SAI_HOSTIF_TYPE_GENETLINK + */ + SAI_HOSTIF_ATTR_NAME, + + /** + * @brief Name [char[SAI_HOSTIF_GENETLINK_MCGRP_NAME_SIZE]] + * + * The maximum number of characters for the name is SAI_HOSTIF_GENETLINK_MCGRP_NAME_SIZE - 1 + * Set the Generic netlink multicast group name on which the packets/buffers + * are received on this host interface + * + * @type char + * @flags MANDATORY_ON_CREATE | CREATE_ONLY + * @condition SAI_HOSTIF_ATTR_TYPE == SAI_HOSTIF_TYPE_GENETLINK + */ + SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME, + + /** Receive packets via Linux generic netlink interface */ + SAI_HOSTIF_TABLE_ENTRY_CHANNEL_TYPE_GENETLINK +``` +#### Creating a GENETLINK Host Interface + +Below is an example code snip that shows how a GENETLINK based host inerface is created. It is assumed that the application has already installed the psample.ko and created multicast group 100. + +``` +// Create a Host Interface based on generic netlink +sai_object_id_t host_if_id; +sai_attribute_t sai_host_if_attr[3]; +  +sai_host_if_attr[0].id=SAI_HOST_INTERFACE_ATTR_TYPE; +sai_host_if_attr[0].value=SAI_HOST_INTERFACE_TYPE_GENETLINK; +  +sai_host_if_attr[1].id= SAI_HOST_INTERFACE_ATTR_NAME; +sai_host_if_attr[1].value="psample"; +  +sai_host_if_attr[2].id= SAI_HOSTIF_ATTR_GENETLINK_MCGRP_NAME; +sai_host_if_attr[2].value="packets"; + +sai_create_host_interface_fn(&host_if_id, 9, sai_host_if_attr); +``` + +### Mapping a sFlow (SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKET) trap to a GENETLINK host interface multicast group id + +Below is the code snip that outlines how an sFlow trap is mapped to the GENETLINK host interface created in the previous section. + +``` +// Configure the host table to receive traps on the generic netlink socket + +sai_object_id_t host_table_entry; +sai_attribute_t sai_host_table_attr[9]; +  +sai_host_table_attr[0].id=SAI_HOSTIF_TABLE_ENTRY_ATTR_TYPE; +sai_host_table_attr[0].value= SAI_HOST_INTERFACE_TABLE_ENTRY_TYPE_TRAP_ID; +  +sai_host_table_attr[1].id= SAI_HOSTIF_TABLE_ENTRY_ATTR_TRAP_ID; +sai_host_table_attr[1].value=sflow_trap_id; // Object referencing SAMPLEPACKET trap + +sai_host_table_attr[2].id= SAI_HOSTIF_TABLE_ENTRY_ATTR_CHANNEL; +sai_host_table_attr[2].value= SAI_HOSTIF_TABLE_ENTRY_CHANNEL_TYPE_GENETLINK; + +sai_host_table_attr[3].id= SAI_HOSTIF_TABLE_ENTRY_ATTR_HOST_IF; +sai_host_table_attr[3].value=host_if_id; // host interface of type file descriptor for GENETLINK + +sai_create_hostif_table_entry_fn(&host_table_entry, 4, sai_host_table_attr); +``` + +It is assumed that the trap group and the trap itself have been defined using sai_create_hostif_trap_group_fn() and sai_create_hostif_trap_fn(). + +## 7 **Warmboot support** + +sFlow packet/counter sampling should not be affected after a warm reboot. In case of a planned warm reboot, packet sampling will be stopped. + +## 8 **sFlow in Virtual Switch** + +On the SONiC VS, SAI calls would map to the tc_sample commands on the switchport interfaces (http://man7.org/linux/man-pages/man8/tc-sample.8.html). + +## 9 **Build** + +* The host-sflow package will be built with the mod_sonic callback implementations using the FEATURES="SONIC" option + +## 10 **Restrictions** +* /etc/hsflowd.conf should not be modified manually. While it should be possible to change /etc/hsflowd.conf manually and restart the sflow container, it is not recommended. 
+* Management VRF support: TBD +* configuration updates will necessitate hsflowd service restart + +## 11 **Unit Test cases** +Unit test case one-liners are given below: + +| S.No | Test case synopsis | +|------|-----------------------------------------------------------------------------------------------------------------------------------------| +| 1 | Verify that the SFLOW_COLLECTOR configuration additions from CONFIG_DB are processed by hsflowd. | +| 2 | Verify that the SFLOW_COLLECTOR configuration deletions from CONFIG_DB are processed by hsflowd. | +| 3 | Verify that sFlowOrch creates "psample" multicast group "packets" if there is not psample driver inserted. | +| 4 | Verify that sFlowOrch maps SAI_HOSTIF_TRAP_TYPE_SAMPLEPACKET trap to the "psample" family and multicast group "packets". | +| 5 | Verify that it is possible to enable and disable sflow using the SFLOW table's admin_state field in CONFIG_DB | +| 6 | Verify that interfaces can be enabled/disabled using additions/deletions in SFLOW_SESSION table in CONFIG_DB | +| 7 | Verify that it is possible to change the counter polling interval using the SFLOW table in CONFIG_DB | +| 8 | Verify that it is possible to change the agent-id using the SFLOW table in CONFIG_DB +| 9 | Verify that it is possible to change the sampling rate per interface using SFLOW_SESSION interface sample_rate field in CONFIG_DB | +| 10 | Verify that changes to SFLOW_SESSION CONFIG_DB entry is pushed to the corresponding table in APP_DB and to ASIC_DB by sFlowOrch | +| 11 | Verify that collector and per-interface changes get reflected using the "show sflow" and "show sflow interface" commands | +| 12 | Verify that packet samples are coming into hsflowd agent as per the global and per-interface configuration | +| 13 | Verify that the hsflowd generated UDP datagrams are generated as expected and contain all the PSAMPLE_ATTR* attributes in the meta data | +| 14 | Verify that samples are received when either 1 or 2 collectors are configured. | +| 15 | Verify the sample collection for both IPv4 and IPv6 collectors. | +| 16 | Verify that sample collection works on all ports or on a subset of ports (using lowest possible sampling rate). | +| 17 | Verify that counter samples are updated every 20 seconds | +| 18 | Verify that packet & counter samples stop for a disabled interface. | +| 19 | Verify that sampling changes based on interface speed and per-interface sampling rate change. | +| 20 | Verify that if sFlow is not enabled in the build, the sflow docker is not started | +| 21 | Verify that sFlow docker can be stopped and restarted and check that packet and counter sampling restarts. | +| 22 | Verify that with config saved in the config_db.json, restarting the unit should result in sFlow coming up with saved configuration. | +| 23 | Verify sFlow functionality with valid startup configuration and after a normal reboot, fast-boot and warm-boot. | +| 24 | Verify that the sFlow hsflowd logs are emitted to the syslog file for various severities. | +| 25 | Verify that the swss restart works without issues when sflow is enabled and continues to sample as configured. 
| +## 12 **Action items** +* Determine if it is possible to change configuration without restarting hsflowd +* Check host-sflow licensing options diff --git a/doc/sonic-build-system/build_system_improvements.md b/doc/sonic-build-system/build_system_improvements.md new file mode 100644 index 0000000000..99342ab3cf --- /dev/null +++ b/doc/sonic-build-system/build_system_improvements.md @@ -0,0 +1,184 @@ +# Build system improvements + +This document describes few options to improve SONiC build time. +To split the work we will consider that SONiC has two stages: + + 1. debian/python packages compilation <- relatively fast + 2. docker images build <- slower espessially when several users are building in parallel + +So we will first focus on second stage as it is the most time consuming stage + +## Improving Dockerfile instructions + +Each build instruction in Dockerfile involves creating a new layer, which is time consuming for docker daemon. + +As long as we are using ```--no-cache --squash``` to build docker images there is no real use of building in layers. + +e.g. SNMP docker image: + +```Dockerfile +{\% if docker_snmp_sv2_debs.strip() -\%} +# Copy locally-built Debian package dependencies +{\%- for deb in docker_snmp_sv2_debs.split(' ') \%} +COPY debs/{{ deb }} /debs/ +{\%- endfor \%} + +``` +Renders to: +```Dockerfile +# Copy locally-built Debian package dependencies +COPY debs/libnl-3-200_3.2.27-2_amd64.deb /debs/ +COPY debs/libsnmp-base_5.7.3+dfsg-1.5_all.deb /debs/ +COPY debs/libsnmp30_5.7.3+dfsg-1.5_amd64.deb /debs/ +COPY debs/libpython3.6-minimal_3.6.0-1_amd64.deb /debs/ +COPY debs/libmpdec2_2.4.2-1_amd64.deb /debs/ +COPY debs/libpython3.6-stdlib_3.6.0-1_amd64.deb /debs/ +COPY debs/python3.6-minimal_3.6.0-1_amd64.deb /debs/ +COPY debs/libpython3.6_3.6.0-1_amd64.deb /debs/ +COPY debs/snmp_5.7.3+dfsg-1.5_amd64.deb /debs/ +COPY debs/snmpd_5.7.3+dfsg-1.5_amd64.deb /debs/ +COPY debs/python3.6_3.6.0-1_amd64.deb /debs/ +COPY debs/libpython3.6-dev_3.6.0-1_amd64.deb /debs/ +``` + +Same goes for instructions to install built packages: + +```Dockerfile +RUN dpkg_apt() { [ -f $1 ] && { dpkg -i $1 || apt-get -y install -f; } || return 1; }; dpkg_apt /debs/libnl-3-200_3.2.27-2_amd64.deb +RUN dpkg_apt() { [ -f $1 ] && { dpkg -i $1 || apt-get -y install -f; } || return 1; }; dpkg_apt /debs/libsnmp-base_5.7.3+dfsg-1.5_all.deb +RUN dpkg_apt() { [ -f $1 ] && { dpkg -i $1 || apt-get -y install -f; } || return 1; }; dpkg_apt /debs/libsnmp30_5.7.3+dfsg-1.5_amd64.deb +... +``` + +### Suggestion to improve: + +```Dockerfile +{\% if docker_snmp_sv2_debs.strip() -\%} +# Copy locally-built Debian package dependencies +COPY +{\%- for deb in docker_snmp_sv2_debs.split(' ') \%} +debs/{{ deb }} \ +{\%- endfor \%} +/debs/ +``` + +This will generate single COPY instruction: +```Dockerfile +# Copy locally-built Debian package dependencies +COPY debs/libnl-3-200_3.2.27-2_amd64.deb \ + debs/libsnmp-base_5.7.3+dfsg-1.5_all.deb \ + debs/libsnmp30_5.7.3+dfsg-1.5_amd64.deb \ + debs/libpython3.6-minimal_3.6.0-1_amd64.deb \ + debs/libmpdec2_2.4.2-1_amd64.deb \ + debs/libpython3.6-stdlib_3.6.0-1_amd64.deb \ + debs/python3.6-minimal_3.6.0-1_amd64.deb \ + debs/libpython3.6_3.6.0-1_amd64.deb \ + debs/snmp_5.7.3+dfsg-1.5_amd64.deb \ + debs/snmpd_5.7.3+dfsg-1.5_amd64.deb \ + debs/python3.6_3.6.0-1_amd64.deb \ + debs/libpython3.6-dev_3.6.0-1_amd64.deb \ + /debs/ +``` + +Reduced number of steps from 52 to 20 for SNMP docker. + +### How much faster? 
+
+```bash
+stepanb@51bc3c787be0:/sonic$ time BLDENV=stretch make -f slave.mk target/docker-snmp-sv2.gz
+```
+
+|Without optimization|With optimizations|
+|--------------------|------------------|
+|27m48.289s          |10m50.024s        |
+
+This gives a 2.7x build time improvement.
+
+**NOTE**: build time is roughly linear in the number of steps: 27/10 ~ 52/20
+
+### How to force developers to use single-step instructions in new Dockerfile.j2 files?
+Provide a set of macros defined in the dockers/dockerfile-macros.j2 file:
+
+```jinja
+copy_files
+install_debian_packages
+install_python_wheels
+```
+
+## Upgrade docker in slave to 18.09 and use Docker BuildKit (optionally)
+
+1. Upgrade docker in sonic-slave-stretch to 18.09 - already available in the Debian stretch repositories
+2. Add the environment variable ```DOCKER_BUILDKIT=1``` to the ```docker build``` command to use BuildKit instead of the legacy docker build engine
+
+|Without optimization in #1 |With optimizations in #1|
+|---------------------------|------------------------|
+|11m2.483s                  |4m20.083s               |
+
+This gives a 2.5x build time improvement,
+and up to a 6.5x build time improvement when combined with the Dockerfile optimizations above.
+
+**NOTE**: (bug) squash generates an image squashed together with the base image, increasing the SONiC image size (600 MB -> 1.5 GB)
+
+Introduce an option SONIC_USE_DOCKER_BUILDKIT and warn the user about the image size:
+```
+$ make SONIC_USE_DOCKER_BUILDKIT=y target/sonic-mellanox.bin
+warning: using docker buildkit will produce an increased image size (more details: https://github.com/moby/moby/issues/38903)
+...
+```
+
+However, this will eventually be fixed, so we can then use SONIC_USE_DOCKER_BUILDKIT=y by default.
+
+### Avoid COPY of debs/py-debs/python-wheels altogether (for the future)
+https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypebind-the-default-mount-type
+
+```Dockerfile
+RUN --mount=type=bind,target=/debs/,source=debs/ dpkg_apt() deb1 deb2 deb3...
+```
+
+|With optimizations in #1|
+|------------------------|
+|3m56.957s               |
+
+**NOTE**: requires enabling ```# syntax = docker/dockerfile:experimental``` in the Dockerfile
+
+
+## Enable swss, swss-common, sairedis parallel build
+
+From ``` man dh build ```:
+```
+If your package can be built in parallel, please either use compat 10 or pass --parallel to dh. Then dpkg-buildpackage -j will work.
+```
+
+- swss (can be built in parallel, ~7m -> ~2m)
+- swss-common (can be built in parallel)
+- sairedis (~20m -> ~7m)
+
+## Separate sairedis RPC from non-RPC build
+
+Some work has been done on this, but there is no PR yet (https://github.com/Azure/sonic-sairedis/issues/333)
+
+sairedis is a dependency for a lot of targets (sairedis compilation usually takes a long time, blocking other targets from starting)
+
+The idea of the improvement is:
+
+- There is no need to build libthrift and saithrift when 'ENABLE_SYNCD_RPC != y'
+- The debian/rules in sairedis is written in a way that builds sairedis from scratch twice - a non-RPC and an RPC version.
+
+This improvement is achievable by specifying in rules/sairedis.mk:
+
+```SAIREDIS_DPKG_TARGET = binary-syncd```
+
+and conditionally injecting libthrift depending on ENABLE_SYNCD_RPC.
+
+The overall improvement is ~10m.
+
+The sairedis target build time is now ~3m.
+
+## Total improvement
+
+It is hard to measure the total improvement, because the build system has changed since this was last tested (new packages were added and we finally moved to stretch for all dockers).
+
+A few months ago, on our build server with 12 CPUs, a SONiC build took around ~6h.
+Right now on the same server it is around 2.5h.
Enabling ```SONIC_USE_DOCKER_BUILDKIT=y``` I was able to build the image in 1.5h. +The test included the Linux kernel built from scratch rather than downloaded as a pre-built package. + diff --git a/doc/ssdhealth_design.md b/doc/ssdhealth_design.md new file mode 100644 index 0000000000..886ba826db --- /dev/null +++ b/doc/ssdhealth_design.md @@ -0,0 +1,155 @@ +## Motivation +Add to SONiC the ability to check the storage device health state. Basic functionality will be implemented as a CLI command. Optionally, a pmon daemon could be added for continuous disk state monitoring. + +## CLI + +### Syntax + show platform ssdhealth [verbose/vendor] + +### Output example +#### Brief + admin@sonic-switch: ~$ show platform ssdhealth + Device Model : InnoDisk Corp. - mSATA 3ME + Health: 72.9% + Temperature: N/A + admin@sonic-switch: ~$ + +#### Verbose + admin@sonic-switch: ~$ show platform ssdhealth verbose + Device Model : InnoDisk Corp. - mSATA 3ME + FW Version : S140714 + Serial Number : 20160429AA1134000035 + Health : 72.9% + Capacity : 29.818199 GB + Temperature : N/A + Power On Hours : 1576 hours + Power Cycle count: 130 + Something else??? + +#### Vendor + admin@sonic-switch: ~$ show platform ssdhealth vendor + + ******************************************************************************************** + * Innodisk iSMART V3.9.41 2018/05/25 * + ******************************************************************************************** + Model Name: InnoDisk Corp. - mSATA 3ME + FW Version: S140714 + Serial Number: 20160429AA1134000035 + Health: 72.900% + Capacity: 29.818199 GB + P/E Cycle: 3000 + Lifespan : 1576 (Years : 4 Months : 3 Days : 26) + Write Protect: Disable + InnoRobust: Enable + -------------------------------------------------------------------------------------------- + ID SMART Attributes Value Raw Value + -------------------------------------------------------------------------------------------- + [09] Power On Hours [18304] [090200646480470000000000] + [0C] Power Cycle Count [ 130] [0C0200646482000000000000] + [AA] Total Bad Block Count [ 15] [AA0300646400000F00000000] + [AD] Erase Count Max. [ 883] [AD020064642D037303000000] + [AD] Erase Count Avg. [ 813] [AD020064642D037303000000] + [C2] Temperature [ 0] [000000000000000000000000] + [EB] Later Bad Block [ 0] [EB0200640000000000000000] + [EB] Read Block [ 0] [EB0200640000000000000000] + [EB] Write Block [ 0] [EB0200640000000000000000] + [EB] Erase Block [ 0] [EB0200640000000000000000] + [EC] Unstable Power Count [ 0] [EC0200646400000000000000] + admin@sonic-switch: ~$ + +## Implementation +### Generic part +#### 'show' utility update +A new item under the `platform` menu in `show/main.py`. +It will execute "ssdhealth -d /dev/sdX" [options] + +#### ssdhealth utility +A new utility in `sonic-utilities/scripts/`. +It will import the device plugin `ssdutil.py` and print the output returned by the different API functions. + +**Syntax:** + + root@mts-sonic-dut:/home/admin# ssdhealth -h + usage: ssdhealth -d DEVICE [-h] [-v] [-e] + + Show disk device health status + + optional arguments: + -h, --help show this help message and exit + -d, --device disk device to get information for + -v, --verbose show verbose output (more parameters) + -e, --vendor show vendor specific disk information + + Examples: + ssdhealth -d /dev/sda + ssdhealth -d /dev/sda -v + ssdhealth -d /dev/sda -e + + +#### Plugins design +##### Class SsdBase +Location: `sonic-buildimage/src/sonic-platform-common/sonic_platform_base/sonic_ssd/ssd_base.py` +Generic implementation of the API. 
It will use vendor-specific utilities for known disks, or the `smartctl` utility for others. Since not all disk models are in smartctl's database, some information can be unavailable or incomplete. + + class SsdBase: + ... + +##### Class SsdUtil +Inherits from SsdBase. Can be implemented by vendors to provide detailed info about the installed disk. +Location: `sonic-buildimage/device/{{vendor}}/platform/plugins/ssdutil.py` + + class SsdUtil(SsdBase): + ... + +#### API +* **get\_disk\_health(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:float - Floating point number in the range 0-100 representing disk health as a percentage. -1 if not available +* **get\_temperature(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:string - Integer (floating point?) disk temperature in centigrade. Zero if not available +* **get\_model(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:string - Human readable string holding the disk model. Empty if not available +* **get\_firmware(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:string - Human readable string holding the disk firmware version. Empty if not available +* **get\_serial(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:string - Human readable string holding the disk serial number. Empty if not available +* **get\_vendor_output(diskdev)** + * Accepts: + * diskdev:string - disk device name (e.g. /dev/sda) + * Returns: + * res:string - Human readable string. Output of the vendor application. Empty if not available + +## Utilities and packages +#### smartctl +Part of the smartmontools package (1.9M) +PR: [https://github.com/Azure/sonic-buildimage/pull/2703](https://github.com/Azure/sonic-buildimage/pull/2703) + +#### iSmart +Utility for InnoDisk Corp. SSDs (<120K) +https://www.innodisk.com/en/iService/utility/iSMART +Needs to be added as a binary. + +#### SmartCmd +Utility for StorFly and Virtium SSDs (2.2M) + +## (Optional) Daemon for monitoring +A daemon in Pmon (ssdmond) which will periodically query the disk health (get_disk_health()) and raise an alarm when the value drops below some critical threshold. + +## Open questions +1. Daemon and monitoring? +2. SNMP needed? 
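To make the plugin contract above more concrete, here is a minimal illustrative sketch of a generic smartctl-backed implementation together with a vendor-plugin fallback such as the `ssdhealth` script could use. This is a sketch under stated assumptions, not the actual SONiC code: the method names follow the API above, but binding the device in the constructor, the simple field parsing, and the `load_ssd_util` helper are hypothetical.

```python
# Hypothetical sketch of the generic SSD plugin and its use by an ssdhealth-style
# script. Method names mirror the API above; the smartctl parsing is deliberately
# simplified and is not the real SONiC implementation.
import subprocess


class SsdBase(object):
    """Generic implementation backed by smartctl (smartmontools)."""

    def __init__(self, diskdev):
        self.diskdev = diskdev
        self._info = self._read_smartctl()

    def _read_smartctl(self):
        try:
            out = subprocess.check_output(["smartctl", "-a", self.diskdev])
            return out.decode("ascii", errors="replace")
        except (OSError, subprocess.CalledProcessError):
            return ""

    def _parse_field(self, prefix):
        # Return the value of the first "Prefix: value" line, or "" if absent.
        for line in self._info.splitlines():
            if line.startswith(prefix):
                return line.split(":", 1)[1].strip()
        return ""

    def get_model(self):
        return self._parse_field("Device Model")

    def get_firmware(self):
        return self._parse_field("Firmware Version")

    def get_serial(self):
        return self._parse_field("Serial Number")

    def get_disk_health(self):
        # Not every disk exposes a health attribute through smartctl,
        # hence the -1 "not available" convention from the API above.
        return -1.0

    def get_temperature(self):
        # Zero means "not available", as defined in the API above.
        return 0

    def get_vendor_output(self):
        # The generic class has no vendor tool; platform plugins override this.
        return ""


def load_ssd_util(diskdev):
    """Prefer the platform-specific plugin, fall back to the generic class."""
    try:
        from ssdutil import SsdUtil      # vendor plugin, if present on the path
        return SsdUtil(diskdev)
    except ImportError:
        return SsdBase(diskdev)


if __name__ == "__main__":
    ssd = load_ssd_util("/dev/sda")
    print("Device Model :", ssd.get_model() or "N/A")
    print("Health       :", ssd.get_disk_health())
```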
+ diff --git a/doc/stp/SONiC_PVST_HLD.md b/doc/stp/SONiC_PVST_HLD.md new file mode 100644 index 0000000000..aaef99aaf1 --- /dev/null +++ b/doc/stp/SONiC_PVST_HLD.md @@ -0,0 +1,746 @@ +# Feature Name +PVST +# High Level Design Document +#### Rev 1.0 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definitionabbreviation) + * [Requirements Overview](#requirement-overview) + * [Functional Requirements](#functional-requirements) + * [Configuration and Management Requirements](#configuration-and-management-requirements) + * [Scalability Requirements](#scalability-requirements) + * [Warm Boot Requirements](#warm-boot-requirements) + * [Functionality](#-functionality) + * [Functional Description](#functional-description) + * [Design](#design) + * [Overview](#overview) + * [DB Changes](#db-changes) + * [CONFIG DB](#config-db) + * [APP DB](#app-db) + * [STATE DB](#state-db) + * [Switch State Service Design](#switch-state-service-design) + * [Orchestration Agent](#orchestration-agent) + * [STP Container](#stp-container) + * [SAI](#sai) + * [CLI](#cli) + * [Configuration Commands](#configuration-commands) + * [Show Commands](#show-commands) + * [Debug Commands](#debug-commands) + * [Clear Commands](#clear-commands) + * [Flow Diagrams](#flow-diagrams) + * [Serviceability and Debug](#serviceability-and-debug) + * [Warm Boot Support](#warm-boot-support) + * [Scalability](#scalability) + * [Unit Test](#unit-test) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:-------------------------:|-----------------------------------| +| 0.1 | 05/02/2019 | Sandeep, Praveen | Initial version | +| 0.2 | 05/02/2019 | Sandeep, Praveen | Incorporated Review comments | +| 0.3 | 06/25/2019 | Sandeep, Praveen | Incorporated Review comments | +| 1.0 | 10/15/2019 | Sandeep, Praveen | Minor changes post implementation | + + +# About this Manual +This document provides general information about the PVST (Per VLAN spanning tree) feature implementation in SONiC. +# Scope +This document describes the high level design of PVST feature. + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| BPDU | Bridge protocol data unit | +| PVST | Per VLAN Spanning tree | +| STP | Spanning tree protocol | + + +# 1 Requirement Overview +## 1.1 Functional Requirements + + + 1. Support PVST+ per VLAN spanning tree + 2. Support Portfast functionality + 3. Support Uplink fast functionality + 4. Support BPDU guard functionality + 5. Support Root guard functionality + 6. Bridge-id should include the VLAN id of the pvst instance along with bridge MAC address to derive a unique value for each instance. + 7. Port channel path cost will be same as the member port cost, it will not be cumulative of member ports cost + 8. DA MAC in case of PVST should be cisco pvst mac - 01:00:0C:CC:CC:CD + 9. Support protocol operation on static breakout ports + 10. Support protocol operation on Port-channel interfaces + + +## 1.2 Configuration and Management Requirements +This feature will support CLI and REST based configurations. + 1. Support CLI configurations as mentioned in section 3.6.2 + 2. Support show commands as mentioned in section 3.6.3 + 3. Support debug commands as mentioned in section 3.6.4 + 4. 
Support Openconfig yang model - with extensions for supporting PVST + 5. Support REST APIs for config and operational data + +## 1.3 Scalability Requirements +16k port-vlan instances with max 255 STP instances. +The scaling limit might differ depending on the platform and the CPU used, which needs to be determined based on testing. + +## 1.4 Warm Boot Requirements + +Warm boot is not supported in this release. User is expected to do cold reboot when PVST is running so that topology will reconverge and traffic will be redirected via alternate paths. + +If PVST is enabled and user tries to perform warm reboot an error will be displayed indicating PVST doesnt support warm reboot. + + +# 2 Functionality + +## 2.1 Functional Description +The Spanning Tree Protocol (STP) prevents Layer 2 loops in a network and provides redundant links. If a primary link fails, the backup link is activated and network traffic is not affected. STP also ensures that the least cost path is taken when multiple paths exist between the devices. + +When the spanning tree algorithm is run, the network switches transform the real network topology into a spanning tree topology. In an STP topology any LAN in the network can be reached from any other LAN through a unique path. The network switches recalculate a new spanning tree topology whenever there is a change to the network topology. + +For each switch in the topology, a port with lowest path cost to the root bridge is elected as root port. + +For each LAN, the switches that attach to the LAN select a designated switch that is the closest to the root switch. The designated switch forwards all traffic to and from the LAN. The port on the designated switch that connects to the LAN is called the designated port. The switches decide which of their ports is part of the spanning tree. A port is included in the spanning tree if it is a root port or a designated port. + +PVST+ allows for running multiple instances of spanning tree on per VLAN basis. + +One of the advantage with PVST is it allows for load-balancing of the traffic. When a single instance of spanning tree is run and a link is put into blocking state for avoiding the loop, it will result in inefficient bandwidth usage. With per VLAN spanning tree multiple instances can be run such that for some of the instances traffic is blocked over the link and for other instances traffic is forwared allowing for load balancing of traffic. + +PVST+ support allows the device to interoperate with IEEE STP and also tunnel the PVST+ BPDUs transparently across IEEE STP region to potentially connect other PVST+ switches across the IEEE STP region. For interop with IEEE STP, PVST+ will send untagged IEEE BPDUs (MAC - 01:80:C2:00:00:00) with information corresponding to VLAN 1. The STP port must be a member of VLAN 1 for interoperating with IEEE STP. + +# 3 Design +## 3.1 Overview + +![STP](images/STP_Architecture.png "Figure 1: High level architecture") +__Figure 1: High level architecture__ + +High level overview: + +STP container will host STPMgr and STPd process. + +STPMgr will handle all the STP configurations and VLAN configurations via config DB and state DB respectively. 
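For illustration, the sketch below shows how a manager process can receive those CONFIG DB updates through the Python `swsssdk` connector, using the STP table names defined in section 3.2.1. It is only a sketch under the assumption that the subscribe/listen interface is used; the actual STPMgr runs inside the STP container and may use the C++ swss-common listeners instead, and the handler bodies here are placeholders.

```python
# Illustrative sketch only: subscribing to STP configuration changes in
# CONFIG DB with swsssdk. The real STPMgr may use the C++ swss-common API.
from swsssdk import ConfigDBConnector


def handle_stp_global(table, key, data):
    # 'data' is empty/None when the entry is deleted.
    print("STP global config changed:", key, data)


def handle_stp_vlan(table, key, data):
    # key is e.g. "Vlan100"; the real daemon would forward this change
    # to STPd over the STPMgr<->STPd IPC channel.
    print("Per-VLAN STP config changed:", key, data)


def main():
    config_db = ConfigDBConnector()
    config_db.connect()
    config_db.subscribe("STP", handle_stp_global)
    config_db.subscribe("STP_VLAN", handle_stp_vlan)
    config_db.subscribe("STP_INTF", lambda t, k, d: print("Port config:", k, d))
    config_db.listen()   # blocks, dispatching table updates to the handlers


if __name__ == "__main__":
    main()
```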
+ +STPd process will handle all the protocol functionality and has following interactions + + * Linux kernel for packet tx/rx and netlink events + * STPMgr interaction for configuration handling + * STPSync is part of STPd handling all the STP operational data updates to APP DB + +Alternate design consideration: +Linux kernel has support for spanning-tree but is not being considered for following reasons + * Supports only STP, no support for RSTP and MSTP + * Currently SONiC does not create a netdevice for each vlan, port combination as it relies on vlan aware bridge configuration. For supporting per VLAN spanning tree a netdevice needs to be created for each vlan, port combination, this will result in higher memory requirements due to additional netdevices and also a major change from SONiC perspective. + +## 3.2 DB Changes +This section describes the changes made to different DBs for supporting the PVST protocol. +### 3.2.1 CONFIG DB +Following config DB schemas are defined for supporting this feature. +### STP_GLOBAL_TABLE + ;Stores STP Global configuration + ;Status: work in progress + key = STP|GLOBAL ; Global STP table key + mode = "pvst" ; spanning-tree mode pvst + rootguard_timeout = 3*DIGIT ; root-guard timeout value (5 to 600 sec, DEF:30 sec) + forward_delay = 2*DIGIT ; forward delay in secs (4 to 30 sec, DEF:15 sec) + hello_time = 2*DIGIT ; hello time in secs (1 to 10 sec, DEF:2sec) + max_age = 2*DIGIT ; maximum age time in secs (6 to 40 sec, DEF:20sec) + priority = 5*DIGIT ; bridge priority (0 to 61440, DEF:32768) + +### STP_VLAN_TABLE + ;Stores STP configuration per VLAN + ;Status: work in progress + key = STP_VLAN|"Vlan"vlanid ; VLAN with prefix "STP_VLAN" + forward_delay = 2*DIGIT ; forward delay in secs (4 to 30 sec, DEF:15 sec) + hello_time = 2*DIGIT ; hello time in secs (1 to 10 sec, DEF:2sec) + max_age = 2*DIGIT ; maximum age time in secs (6 to 40 sec, DEF:20sec) + priority = 5*DIGIT ; bridge priority (0 to 61440, DEF:32768) + enabled = "true"/"false" ; spanning-tree is enabled or not + +### STP_VLAN_INTF_TABLE + ;Stores STP interface details per VLAN + ;Status: work in progress + key = STP_VLAN_INTF|"Vlan"vlanid|ifname ; VLAN+Intf with prefix "STP_VLAN_INTF" ifname can be physical or port-channel name + path_cost = 9*DIGIT ; port path cost (1 to 200000000) + priority = 3*DIGIT ; port priority (0 to 240, DEF:128) + +### STP_INTF_TABLE + ;Stores STP interface details + ;Status: work in progress + key = STP_INTF|ifname ; ifname with prefix STP_INTF, ifname can be physical or port-channel name + enabled = BIT ; is the STP on port enabled (1) or disabled (0) + root_guard = BIT ; is the root guard on port enabled (1) or disabled (0) + bpdu_guard = BIT ; is the bpdu guard on port enabled (1) or disabled (0) + bpdu_guard_do_disable = BIT ; port to be disabled when it receives a BPDU; enabled (1) or disabled (0) + path_cost = 9*DIGIT ; port path cost (2 to 200000000) + priority = 3*DIGIT ; port priority (0 to 240, DEF:128) + portfast = BIT ; Portfast is enabled or not + uplink_fast = BIT ; Uplink fast is enabled or not + +### 3.2.2 APP DB + +### STP_VLAN_TABLE + ;Stores the STP per VLAN operational details + ;Status: work in progress + key = STP_VLAN:"Vlan"vlanid + bridge_id = 16HEXDIG ; bridge id + max_age = 2*DIGIT ; maximum age time in secs (6 to 40 sec, DEF:20sec) + hello_time = 2*DIGIT ; hello time in secs (1 to 10 sec, DEF:2sec) + forward_delay = 2*DIGIT ; forward delay in secs (4 to 30 sec, DEF:15 sec) + hold_time = 1*DIGIT ; hold time in secs (1 sec) + 
last_topology_change = 1*10DIGIT ; time in secs since last topology change occured + topology_change_count = 1*10DIGIT ; Number of times topology change occured + root_bridge_id = 16HEXDIG ; root bridge id + root_path_cost = 1*9DIGIT ; port path cost + desig_bridge_id = 16HEXDIG ; designated bridge id + root_port = ifName ; Root port name + root_max_age = 1*2DIGIT ; Max age as per root bridge + root_hello_time = 1*2DIGIT ; hello time as per root bridge + root_forward_delay = 1*2DIGIT ; forward delay as per root bridge + stp_instance = 1*4DIGIT ; STP instance for this VLAN + +### STP_VLAN_INTF_TABLE + ;Stores STP VLAN interface details + ;Status: work in progress + key = STP_VLAN_INTF:"Vlan"vlanid:ifname ; VLAN+Intf with prefix "STP_VLAN_INTF" + port_num = 1*3DIGIT ; port number of bridge port + path_cost = 1*9DIGIT ; port path cost (1 to 200000000) + priority = 3*DIGIT ; port priority (0 to 240, DEF:128) + port_state = "state" ; STP state - disabled, block, listen, learn, forward + desig_cost = 1*9DIGIT ; designated cost + desig_root = 16HEXDIG ; designated root + desig_bridge = 16HEXDIG ; designated bridge + desig_port = 1*3DIGIT ; designated port + fwd_transitions = 1*5DIGIT ; number of forward transitions + bpdu_sent = 1*10DIGIT ; bpdu transmitted + bpdu_received = 1*10DIGIT ; bpdu received + tcn_sent = 1*10DIGIT ; tcn transmitted + tcn_received = 1*10DIGIT ; tcn received + root_guard_timer = 1*3DIGIT ; root guard current timer value + +### STP_INTF_TABLE + ;Stores STP interface details + ;Status: work in progress + key = STP_INTF:ifname ; ifname with prefix STP_INTF, ifname can be physical or port-channel name + bpdu_guard_shutdown = "yes" / "no" ; port disabled due to bpdu-guard + port_fast = "yes" / "no" ; port fast is enabled or not + + +### STP_PORT_STATE_TABLE + ;Stores STP port state per instance + ;Status: work in progress + key = STP_PORT_STATE:ifname:instance ; ifname and the STP instance + state = 1DIGIT ; 0-disabled, 1-block, 2-listen, 3-learn, 4-forward + + +### STP_VLAN_INSTANCE_TABLE + ;Defines VLANs and the STP instance mapping + ;Status: work in progress + key = STP_VLAN_INSTANCE_TABLE:"Vlan"vlanid ; DIGIT 1-4095 with prefix "Vlan" + stp_instance = 1*4DIGIT ; STP instance associated with this VLAN + + +### STP_FASTAGEING_FLUSH_TABLE + ;Defines vlans for which fastageing is enabled + ;Status: work in progress + key = STP_FASTAGEING_FLUSH_TABLE:"Vlan"vlanid ; vlan id for which flush needs to be done + state = "true" ; true perform flush + +### 3.2.3 STATE DB + +### STP_TABLE + ;Defines the global STP state table + ;Status: work in progress + key = STP_TABLE:GLOBAL ; key + max_stp_inst = 1*3DIGIT ; Max STP instances supported by HW + + +## 3.3 Switch State Service Design +### 3.3.1 Orchestration Agent + +All STP operations for programming the HW will be handled as part of OrchAgent. OrchAgent will listen to below updates from APP DB for updating the ASIC DB via SAI REDIS APIs. + + * Port state udpate - STP_PORT_STATE_TABLE + + * VLAN to instance mapping - STP_VLAN_INSTANCE_TABLE + +Orchagent will also listen to updates related to Fast ageing on APP DB. When fast ageing is set for the VLAN due to topology change, Orchagent will perform FDB/MAC flush for the VLAN. + + * FDB/MAC flush - STP_FASTAGEING_FLUSH_TABLE + + + +## 3.4 STP Container + +STP container will have STPMgr and STPd processes. + +STPMgr process will register with the config DB for receiving all the STP configurations and with state DB for VLAN configurations. 
STPMgr will notify this configuration information to STPd for protocol operation. STPMgr will also handle the STP instance allocation. + +STPd process will handle following interactions. STPd will use libevent for processing the incoming events and timers. +1) Packet processing + + * Socket of PF_PACKET type will be created for packet tx/rx + + * Filters will be attached to the socket to receive the STP BPDUs based on DA MAC + + * PACKET_AUXDATA socket option will be set to get the VLAN id of the packet received from the cmsg headers + +2) Configuration - All STP CLI and VLAN configurations will be received from STPMgr via unix domain socket +3) Timers handling - Timer events are generated every 100ms for handling STP protocol timers +4) Port events - Netlink socket interface for processing create/delete of port, lag interface, lag member add/delete, link state changes and port speed +5) Operational updates - STPsync is part of STPd and handles all the updates to APP DB. All DB interactions are detailed in the Section 4. +6) STP Port state sync to linux kernel - Currently the vlan aware bridge doesnt support programming the STP state into linux bridge. As a workaround, when STP state is not forwarding, corresponding VLAN membership on that port will be removed to filter the packets on blocking port. + +### Interface DB: + +In SONiC, ethernet interface is represented in the format Ethernet where id represents the physical port number and port-channel is represented by PortChannel where id is a 4 digit numerical value. + +STPd implementation makes use of its own internal port id for its protocol operation. These port ids are used for bit representation in port masks and also for indexing the array which holds the pointers to STP port level information. So to continue using these mechanisms it is required to convert the SONiC representation of interface to local STP port ids. So when STPd interacts with other components in the system local port id will be converted to SONiC interface name, similarly all messages received from other components with SONiC interface name will be converted to local STP port id before processing. For this purpose an interface DB (AVL tree) will be maintained to map the SONiC interface names to local STP port ids. + + + +### Local STP Port id allocation: + +For allocating local port ids STPd needs to know the max number of physical ports and Port-channel interfaces supported on the device. Currently there is no mechanism to obtain these values in a direct way (like reading from DBs) so below logic will be used to arrive at max ports - + +Max STP Ports = Max physical ports + Max Port-Channel interfaces + +where, + +Max physical ports will be determined from the higher interface id populated say for example if higher port is Ethernet252 (or 254) - max ports will be 256 to ensure breakout ports are accomodated. + +Max Port-Channel interfaces will be same as max physical ports to accomodate the worst case scenario of each Port-channel having only one port. + + +Post bootup, Portsyncd reads the port_config.ini file which contains the port specific configuration information, and updates the APP DB with this information. Orchagent listens to these APP DB updates and takes care of creating all the interfaces in linux kernel and the hardware. Kernel sends netlink updates for the interfaces being created which Portsyncd listens to, once Portsyncd confirms all the interfaces have been configured it will generate PortInitDone notification. 
STPd will receive this message via STPMgr and then gets all the interface information via netlink for allocating the local STP port ids. + +Note: The port id for Port-channel interface will be allocated only when the first member port is added, this ensures Port-channel interfaces with no member ports dont burn the port ids. + +Example of port id allocation - +``` +SONiC interface STP Port id +Ethernet0 0 +Ethernet4 4 +Ethernet8 8 +Ethernet252 252 +PortChannel10 256 +PortChannel5 257 +PortChannel20 258 +``` + +Note: As Port-channel port ids are dynamically allocated it might result in same port channel getting different value post reboot. To ensure a predictable convergence where port-channel port id is used as tie breaker, user needs to configure appropriate port priority value. + +### 3.4.1 BPDU trap mechanism +SONiC uses copp configuration file 00-copp.config.json for configuring the trap group, ids, cpu priority queue and a policer. This functionality has been extended for supporting STP and PVST BPDU trap to CPU. + +## 3.5 SAI +STP SAI interface APIs are already defined and is available at below location - + +https://github.com/opencomputeproject/SAI/blob/master/inc/saistp.h + +Control packet traps required for STP (SAI_HOSTIF_TRAP_TYPE_STP) and PVST (SAI_HOSTIF_TRAP_TYPE_PVRST) are defined in below SAI spec - + +https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h + +## 3.6 CLI +### 3.6.1 Data Models +Openconfig STP yang model will be extended to support PVST. + +### 3.6.2 Configuration Commands + +### 3.6.2.1 Global level + +### 3.6.2.1.1 Enabling or Disabling of PVST feature - Global spanning-tree mode +This command allows enabling the spanning tree mode for the device. + +**config spanning_tree {enable|disable} {pvst}** + +Note: +1) When global pvst mode is enabled, by default spanning tree will be enabled on the first 255 VLANs, for rest of the VLAN spanning tree will be disabled. +2) When multiple spanning-tree modes are supported, only one mode can be enabled at any given point of time. + +### 3.6.2.1.2 Per VLAN spanning-tree +This command allows enabling or disabling spanning-tree on a VLAN. + +**config spanning_tree vlan {enable|disable} ** + + ### 3.6.2.1.3 Root-guard timeout + +This command allows configuring a root guard timeout value. Once superior BPDUs stop coming on the port, device will wait for a period until root guard timeout before moving the port to forwarding state(default = 30 secs), range 5-600. + +**config spanning_tree root_guard_timeout ** + + ### 3.6.2.1.4 Forward-delay + +This command allows configuring the forward delay time in seconds (default = 15), range 4-30. + +**config spanning_tree forward_delay ** + + ### 3.6.2.1.5 Hello interval +This command allow configuring the hello interval in secs for transmission of bpdus (default = 2), range 1-10. + +**config spanning_tree hello ** + + ### 3.6.2.1.6 Max-age +This command allows configuring the maximum time to listen for root bridge in seconds (default = 20), range 6-40. + +**config spanning_tree max_age ** + + ### 3.6.2.1.7 Bridge priority +This command allows configuring the bridge priority in increments of 4096 (default = 32768), range 0-61440. + +**config spanning_tree priority ** + + ### 3.6.2.2 VLAN level +Below commands are similar to the global level commands but allow configuring on per VLAN basis. 
+ +**config spanning_tree vlan forward_delay ** + +**config spanning_tree vlan hello ** + +**config spanning_tree vlan max_age ** + +**config spanning_tree vlan priority ** + + ### 3.6.2.3 VLAN, interface level +Below configurations allow STP parameters to be configured on per VLAN, interface basis. + + ### 3.6.2.3.1 Port Cost +This command allows to configure the port level cost value for a VLAN, range 1 - 200000000 + +**config spanning_tree vlan interface cost ** + + ### 3.6.2.3.2 Port priority +This command allows to configure the port level priority value for a VLAN, range 0 - 240 (default 128) + +**config spanning_tree vlan interface priority ** + + ### 3.6.2.4 Interface level + + ### 3.6.2.4.1 STP enable/disable on interface + +This command allows enabling or disabling of STP on an interface, by default STP will be enabled on the interface if global STP mode is configured. + +**config spanning_tree interface {enable|disable} ** + + ### 3.6.2.4.2 Root Guard: +The Root Guard feature provides a way to enforce the root bridge placement in the network and allows STP to interoperate with user network bridges while still +maintaining the bridged network topology that the administrator requires. When BPDUs are received on a root guard enabled port, the STP state will be moved to "Root inconsistent" state to indicate this condition. Once the port stops receiving superior BPDUs, Root Guard will automatically set the port back to a FORWARDING state after the timeout period has expired. + +This command allow enabling or disabling of root guard on an interface. + +**config spanning_tree interface root_guard {enable|disable} ** + +Following syslogs will be generated when entering and exiting root guard condition respectively - + +STP: Root Guard interface Ethernet4, VLAN 100 inconsistent (Received superior BPDU) + +STP: Root Guard interface Ethernet4, VLAN 100 consistent (Timeout) + + ### 3.6.2.4.3 BPDU Guard +BPDU Guard feature disables the connected device ability to initiate or participate in STP on edge ports. When STP BPDUs are received on the port where BPDU guard is enabled the port will be shutdown. User can re-enable the port administratively after ensuring the BPDUs have stopped coming on the port. + +Below command allows enabling or disabling of bpdu guard on an interface. + +**config spanning_tree interface bpdu_guard {enable|disable} ** + + +By default, BPDU guard will only generate a syslog indicating the condition, for taking an action like disabling the port below command can be used with shutdown option + +**config spanning_tree interface bpdu_guard {enable|disable} [--shutdown | -s]** + +Following syslog will be generated when BPDU guard condition is entered - + +STP: Tagged BPDU(100) received, interface Ethernet4 disabled due to BPDU guard trigger + +STPd will update the config DB for shutting down the interface, user can enable the interface back once it has stopped receiving the BPDUs. + + ### 3.6.2.4.4 Port fast +Portfast command is enabled by default on all ports, portfast allows edge ports to move to forwarding state quickly when the connected device is not participating in spanning-tree. + +Below command allows enabling or disabling of portfast on an interface. + +**config spanning_tree interface portfast {enable|disable} ** + + ### 3.6.2.4.4 Uplink fast + Uplink fast feature enhances STP performance for switches with redundant uplinks. 
Using the default value for the standard STP forward delay, convergence following a transition from an active link to a redundant link can take 30 seconds (15 seconds for listening and an additional 15 seconds for learning). + +When uplink fast is configured on the redundant uplinks, it reduces the convergence time to just one second by moving to forwarding state (bypassing listening and learning modes) in just once second when the active link goes down. + +**config spanning_tree interface uplink_fast {enable|disable} ** + + ### 3.6.2.4.5 Port level priority +This command allows to configure the port level priority value, range 0 - 240 (default 128) + +**config spanning_tree interface priority ** + + ### 3.6.2.4.6 Port level path cost +This command allows to configure the port level cost value, range 1 - 200000000 + +**configure spanning_tree interface cost ** + + + +### 3.6.3 Show Commands +- show spanning_tree +- show spanning_tree vlan +- show spanning_tree vlan interface + +``` +Spanning-tree Mode: PVST +VLAN 100 - STP instance 3 +-------------------------------------------------------------------- +STP Bridge Parameters: + +Bridge Bridge Bridge Bridge Hold LastTopology Topology +Identifier MaxAge Hello FwdDly Time Change Change +hex sec sec sec sec sec cnt +8000002438eefbc3 20 2 15 1 0 0 + +RootBridge RootPath DesignatedBridge Root Max Hel Fwd +Identifier Cost Identifier Port Age lo Dly +hex hex sec sec sec +8000002438eefbc3 0 8000002438eefbc3 Root 20 2 15 + +STP Port Parameters: + +Port Prio Path Port Uplink State Designated Designated Designated +Num rity Cost Fast Fast Cost Root Bridge +Ethernet13 128 4 Y N FORWARDING 0 8000002438eefbc3 8000002438eefbc3 +``` + +- show spanning_tree bpdu_guard +This command displays the interfaces which are BPDU guard enabled and also the state if the interface is disabled due to BPDU guard. +``` +show spanning_tree bpdu_guard +PortNum Shutdown Port shut + Configured due to BPDU guard +------------------------------------------------- +Ethernet1 Yes Yes +Ethernet2 Yes No +Port-Channel2 No NA +``` + +-show spanning_tree root_guard +This command displays the interfaces where root guard is active and the pending time for root guard timer expiry +``` +Root guard timeout: 120 secs + +Port VLAN Current State +------------------------------------------------- +Ethernet1 1 Inconsistent state (102 seconds left on timer) +Ethernet8 100 Consistent state +``` + +- show spanning_tree statistics +- show spanning_tree statistics vlan +This command displays the spanning-tree bpdu statistics. Statistics will be synced to APP DB every 10 seconds. +``` +VLAN 100 - STP instance 3 +-------------------------------------------------------------------- +PortNum BPDU Tx BPDU Rx TCN Tx TCN Rx +Ethernet13 10 4 3 4 +PortChannel15 20 6 4 1 +``` + + +### 3.6.4 Debug Commands +Following debug commands will be supported for enabling additional logging which can be viewed in /var/log/stpd.log, orchangent related logs can be viewed in /var/log/syslog. +- debug spanning_tree - This command must be enabled for logs to be written to log file, after enabling any of the below commands. 
+- debug spanning_tree bpdu [tx|rx] +- debug spanning_tree event +- debug spanning_tree interface +- debug spanning_tree verbose +- debug spanning_tree vlan + +To disable the debugging controls enabled '-d' or '--disable' option can be used, an example of disabling bpdu debugging is shown below - +- debug spanning_tree bpdu -d + +Follow commands can be used to reset and display the debugging controls enabled respectively +- debug spanning_tree reset +- debug spanning_tree show + +Following debug commands will be supported for displaying internal data structures +- debug spanning_tree dump global +- debug spanning_tree dump vlan +- debug spanning_tree dump interface + +### 3.6.5 Clear Commands +Following clear commands will be supported +- sonic-clear spanning_tree statistics +- sonic-clear spanning_tree statistics vlan +- sonic-clear spanning_tree statistics vlan-interface +- sonic-clear spanning_tree statistics interface + +### 3.6.6 REST API Support +REST APIs is not supported in this release + +# 4 Flow Diagrams + +## 4.1 Global spanning-tree mode + +![STP](images/Global_PVST_enable.png "Figure 2: Global PVST mode enable") + +__Figure 2: Global PVST mode enable__ + +![STP](images/Global_PVST_disable.png "Figure 3: Global PVST mode disable") + +__Figure 3: Global PVST mode disable__ + +## 4.2 Per VLAN spanning-tree +![STP](images/STP_VLAN_enable.png "Figure 4: STP VLAN enable") + +__Figure 4: STP VLAN enable__ + +![STP](images/STP_disable_VLAN.png "Figure 5: STP VLAN disable") + +__Figure 5: STP VLAN disable__ + +## 4.3 STP enable/disable on interface +![STP](images/STP_enable_interface.png "Figure 6: STP enable on an interface") + +__Figure 6: STP enable on an interface__ + +![STP](images/STP_disable_on_interface.png "Figure 7: STP disable on an interface") + +__Figure 7: STP disable on an interface__ + +## 4.4 STP port state update +![STP](images/STP_Port_state.png "Figure 8: STP port state update") + +__Figure 8: STP port state update__ + +## 4.5 STP Topology change +![STP](images/Topology_change_update.png "Figure 9: STP Topology change") + +__Figure 9: STP Topology change__ + +## 4.6 STP Port id allocation +![STP](images/STP_Port_id_allocation.png "Figure 10: STP Port id allocation") + +__Figure 10: STP Port id allocation__ + +# 5 Serviceability and Debug +Debug command and statistics commands as mentioned in Section 3.6.3 and 3.6.4 will be supported. Debug command output will be captured as part of techsupport. + +# 6 Warm Boot Support +Warm boot is not supported + +# 7 Scalability +16k port-vlan instances with max 255 STP instances. +The scaling limit might differ depending on the platform and the CPU used, which needs to be determined based on testing. 
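As a concrete illustration of the local port-id scheme described in section 3.4, the sketch below models how physical ports keep their Ethernet index while Port-channel ids are allocated from a pool above the physical range only when the first member port is added. The constants, class name, and use of a dict in place of the AVL-tree interface DB are assumptions for illustration; the real STPd is a C daemon.

```python
# Illustrative model of the STPd local port-id allocation from section 3.4.
# Real STPd is implemented in C; names and constants here are assumptions.

MAX_PHY_PORTS = 256            # e.g. highest port Ethernet252 rounded up
MAX_PO_PORTS = MAX_PHY_PORTS   # worst case: one member per Port-channel


class StpPortIdAllocator(object):
    def __init__(self):
        self.if_to_port_id = {}              # "interface DB" (name -> id)
        self.next_po_id = MAX_PHY_PORTS      # Port-channel ids start here

    def port_id(self, ifname):
        """Return the local STP port id, allocating one for Port-channels."""
        if ifname in self.if_to_port_id:
            return self.if_to_port_id[ifname]
        if ifname.startswith("Ethernet"):
            port_id = int(ifname[len("Ethernet"):])
        elif ifname.startswith("PortChannel"):
            # Allocated only when the first member is added, so empty
            # Port-channels do not consume ids.
            if self.next_po_id >= MAX_PHY_PORTS + MAX_PO_PORTS:
                raise ValueError("out of STP port ids")
            port_id = self.next_po_id
            self.next_po_id += 1
        else:
            raise ValueError("unsupported interface: " + ifname)
        self.if_to_port_id[ifname] = port_id
        return port_id


if __name__ == "__main__":
    alloc = StpPortIdAllocator()
    for name in ("Ethernet0", "Ethernet4", "Ethernet252",
                 "PortChannel10", "PortChannel5", "PortChannel20"):
        print(name, "->", alloc.port_id(name))
    # Matches the example mapping in section 3.4: 0, 4, 252, 256, 257, 258
```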
+ +# 8 Unit Test + +CLI: +1) Verify CLI to enable spanning-tree globally +2) Verify CLI to enable spanning-tree per VLAN +3) Verify CLI to enable spanning-tree on interface +4) Verify CLI to set Bridge priority +5) Verify CLI to set Bridge forward-delay +6) Verify CLI to set Bridge hello-time +7) Verify CLI to set Bridge max-age +8) Verify CLI to set Bridge Port path cost +9) Verify CLI to set Bridge Port priority +10) Verify CLI to set Bridge root guard +11) Verify CLI to set Bridge root guard timeout +12) Verify CLI to set Bridge bpdu guard +13) Verify CLI to set Bridge bpdu guard with do-disable action +14) Verify CLI to set portfast +15) Verify CLI to set uplink fast +16) Verify CLI to clear uplink fast +17) Verify CLI to clear portfast +18) Verify CLI to clear Bridge bpdu guard with do-disable action +19) Verify CLI to clear Bridge bpdu guard +20) Verify CLI to clear Bridge root guard timeout +21) Verify CLI to clear Bridge root guard +22) Verify CLI to clear Bridge Port priority +23) Verify CLI to clear Bridge Port path cost +24) Verify CLI to clear Bridge max-age +25) Verify CLI to clear Bridge hello-time +26) Verify CLI to clear Bridge forward-delay +27) Verify CLI to clear Bridge priority +28) Verify CLI to disabling spanning-tree on interface +29) Verify CLI to disabling spanning-tree per VLAN +30) Verify CLI to disabling spanning-tree globally +31) Verify CLI to display spanning-tree running config +32) Verify CLI to display spanning-tree state information +33) Verify CLI to display spanning-tree statistics +34) Verify CLI to display bpdu-guard port information +35) Verify CLI for clear spanning tree statistics + +Functionality +1) Verify Config DB is populated with configured STP values +2) Verify PVST instances running on multiple VLANs +3) Verify VLAN to STP instance mapping is populated correctly in APP DB, ASIC DB, Hardware +4) Verify spanning-tree port state updates for the VLANs are populated correctly in APP DB, ASIC DB, Hardware +5) Verify during topology change fast ageing updates in APP DB and also MAC flush is performed +6) Verify BPDU format of the packets are correct +7) Verify traffic with stp convergence and loops are prevented where there are redundant paths in topology +8) Verify load balancing functionality with multiple spanning tree instances +9) Verify adding port to VLAN after spanning-tree is enabled on the VLAN and verify port state updates +10) Verify deleting port from VLAN running PVST and verify re-convergence is fine +11) Verify BUM traffic forwarding with spanning-tree running +12) Verify forward-delay by changing intervals +13) Verify hello-time timers +14) Verify max-age by changing intervals +15) Verify altering bridge priority will alter Root Bridge selection +16) Verify altering port priority will alter Designated port selection +17) Verify altering port cost results in path with lowest root path cost is seleced as root port +18) Verify port states on same physical interface for multiple STP instances configured +19) Verify Topology change functionality and spanning-tree reconvergence by disabling/enabling links +20) Verify spanning-tree behavior after adding few more VLANs +21) Verify spanning-tree behavior after removing some of the VLANs +22) Verify rebooting one of the nodes in the topology and verify re-convergence and traffic takes alternate path +23) Verify port cost values chosen are correct as per the interface speed +24) Verify Root guard functionality +25) Verify BPDU guard functionality, verify BPDU guard with do-disable 
functionality +26) Verify Portfast functionality +27) Verify Uplink fast functionality +28) Verify PVST and STP traps are created after switch reboot +29) Verify PVST behavior when spanning-tree disabled on VLAN, verify APP DB, ASIC DB, hardware are populated correctly +30) Verify global spanning-tree disable, verify APP DB, ASIC DB, hardware are populated correctly +31) Verify spanning tree config save and reload, verify topology converges is same as before reboot +32) Verify PVST convergence over untagged ports +33) Verify PVST interop with IEEE STP +35) Verify bridge id encoding includes the VLAN id of the respective PVST instance +36) Verify PVST operational data is sync to APP DB +37) Verify PVST over static breakout ports + +Scaling +1) Verify running 255 PVST instances +2) Verify 16k vlan,port scaling + +Logging and debugging +1) Verify debug messages are logged in /var/log/syslog +2) Verify changing log level +3) Verify Pkt Tx/Rx Debug +4) Verify STP debug commands +5) Verify debug commands output is captured in techsupport + +REST +1) Verify all REST commands + +PVST over LAG +1) Verify PVST behavior over LAG +2) Verify adding port to LAG will not flap the protocol +3) Verify deleting a port from LAG will not flap the protocol +4) Verify BPDU is sent only from one of the LAG member port +5) Verify adding LAG to VLAN running PVST +6) Verify deleting LAG from VLAN running PVST + +SAI +1) Verify creating STP instance and ports in SAI +2) Verify adding VLAN to STP instance in SAI +3) Verify updating PortState in SAI +4) Verify deleting VLAN from STP instance in SAI +5) Verify deleting STP instance and ports in SAI + +L3 +1) Verify L3 traffic with PVST in topology diff --git a/doc/stp/images/Global_PVST_disable.png b/doc/stp/images/Global_PVST_disable.png new file mode 100644 index 0000000000..c0a0aee3b1 Binary files /dev/null and b/doc/stp/images/Global_PVST_disable.png differ diff --git a/doc/stp/images/Global_PVST_disable.xml b/doc/stp/images/Global_PVST_disable.xml new file mode 100644 index 0000000000..a65260b9bc --- /dev/null +++ b/doc/stp/images/Global_PVST_disable.xml @@ -0,0 +1,2 @@ + 
+7Vxbc9o4FP41zOw+JOMLNvDIpclmphemZNPuo7AFeCosaosQ+utXsiVfJAMyjVOXkofEPtbNR5/O+c6RnI49Xr/cR2Cz+oB9iDqW4b907EnHskzT6dM/TLJPJT3bSQXLKPB5oVwwC35ALjS4dBv4MC4VJBgjEmzKQg+HIfRISQaiCO/KxRYYlXvdgCVUBDMPIFX6JfDJKpX2HSOX/wOD5Ur0bBr8yRqIwlwQr4CPdwWR/a5jjyOMSXq1fhlDxJQn9JLWuzvwNBtYBEOiU2Fge4b7/eX5yw9IHu4e7PDD1+CGt/IM0Ja/8Oxx+uH+Mx8y2Qs90NFv2OV2jd4HC4iCkN6NNjAK1pDAiD5BXDzNZaPdKiBwtgEeq7qjCKGyFVkjemfSSzppBNAqUXaPENjEwTzp1aCSCHrbKA6e4WcYp9hgUrwlrKdxNudJUbwNfejzpjJtG0m768Dj1wjMIRoB79syqTDGCLPuQ5y8UEwi/A0KYceyjeQneyIwwLpYBAgVSt4lP0xO3+oOrAPEIP8EIx+EgIs5vk2L31d1BFCwDKnMo/OaKFGdaDFrMCLwpSDiE38PMZ2AaE+L8KddsZz4Khzw210O6Qy4qwKcnT4XAr6MllnTOdLoBQdbDeBZCvAUyBWAssFBSJIhOKOOM5GQhyOywkscAlTEXo4H49LxcHRp64PEKWHEtKpAomLE6jaFEbvKOE2upqnNUKyNOsdpm2nqKrD7FHmr4fKvmGz+vqLvotDXt9uGPufqGBt2jN26IJEdo6vpGI2mMOIqGBl/+nj3cH81Tm0GY23jZLaOtfeuxqlh4+TWBYlsnAxN4+Q2hZH+FSO/kk4PyniwbE08WE3hYaDg4R7hOUjShEEMmNewjDX22Z+O5SI6oNE8oldLdjV9mj2q0mOIYroKPICGXLtzTAhe0wcw9IcsI8hkCHvfEhF9w68cL8nNf1rg0Zx46C/hjA8SojnevcsFo0RAH4jxHpt/1tDR2Y8gAoQ63FKtqqnkVads2RU8jSuhxpTQEONt5EFeq5hklBqyZPjJDREQLSFRGkqQlb3P+WAT3RXQFmISLPZXDDWNIduQpt5yzsNQ1znRUNMYUnOTTWPoz0JKT+Is9kDP2lCtgX2hGOcOh/uRUkuinxwvaYvVtfu9E6NsGoZq+lOBG4PLRp81ZNtkYC5aMDqVMyzMea+swYrI160gE25jka+pJudE6CuvzclIY72KANnb09DU14l+5yn5fD/PBNmy/ZQGuFyuTsvxaT5N8U5ovbGoz9TISR1AYr1Xro9PKfjRJLvdXmOqUlMzx3zEOdGPrpKdA0o+iqu3DRVNjXxCwVtyd+iDeJVljArKzInYrVOkYtmzyUuRpU32/K6mQlOzr4GBk260oHenQu9C9pPetrxEFOjrsrJ+uZ0bedevaW+okVZoD1S4gUt1+/tAhS5IifOcixZpW0V2TU2DRc056DqsV6NOPbNl1EkkeApKGU6nVDBpJS0aHJiI9tCiDCuvRYsOvfLvT4uEZn49LaqRUf1ltMjSiPxa4+uyU5mnaJF4iZb4usugRZYaD7cXKsLAnaRFLYPKpdAi6/w4/tVo0WDQNlqkRuzD2cO4rbzIOiusf1tepBHV1+FFB1/5AnhR85vlukruHVBym3iRRljXGmcnFHqaF/Vb5ewugxeJlf17QIXr6jQvahdULoUX2Wogb97S+383PiDsWMqGmtYEO8mtvKlksbJD36e/n94PP8Zs0WP6y4cLtXAQ0lZCr6IZmzUzgQgmXZIV+z17nFZtYrlgzWhGOI83hSbji9+fPrp6mt61vjGlAw52t6eF9rrb1jfZGT6pI719a+vEKJteTFXJixSXDBD0yQJwI+d+37KPH0cCMpkg28KFAFEEposQJH0fOg9WvSRESfomaedZ8VMLhZIgUnYCETvLDPJDzioZA1uC+XlnVkEc70NwQSrWG8FspDGNHIJw+chuJjfdY8cyapwJdQclDFg2dxxF6mZXOAa7Kepmq3mKGbUzVLIUhwG1bF12YvDCLd0bncTpSuf+bGNwq3dqq65R60onSfKe9KyafK61YqRNGzY1UTAbPjDXP3244rFsgIzbni0cQXYw/VU4pylzzq6Yl1cGrGn2jcqO9PBqWk519VNwzVsSBfFiEcNmIH1+pubVEoJm9p1xa75ysdWszGwfer6im+vnVS36VOI4wmumtd4Wbxp5resnMz+Hg742Dtr60aet7t788Z82HF3zjScEpK0sWwQ59T94ON5O06elDqS/qHdhtKOYiuKZMBlvPDGWpr0WONqByKdB7YFU2We4xs+wkC1bRBSAhz/+MvJo/o/k1D8JU1dOW/UGjRBmVzrwJvrR48vZ/wU7NMqzFwG9zf/JVlo8/1dl9rv/AQ== \ No newline at end of file diff --git a/doc/stp/images/Global_PVST_enable.png b/doc/stp/images/Global_PVST_enable.png new file mode 100644 index 0000000000..1e46076404 Binary files /dev/null and b/doc/stp/images/Global_PVST_enable.png differ diff --git a/doc/stp/images/Global_PVST_enable.xml b/doc/stp/images/Global_PVST_enable.xml new file mode 100644 index 0000000000..ededac154d --- /dev/null +++ b/doc/stp/images/Global_PVST_enable.xml @@ -0,0 +1,2 @@ + 
+7ZrbctowEIafhst0fMAGLgskaTtNSwvTw6VsC1sT2XJkEaBP35UtnyE4yThNW7jB+nX06pN2l2RgzsLdNUdxcMM8TAeG5u0G5nxgGLo5GcOXVPaZMrLNTPA58VSjUliSX1iJmlI3xMNJraFgjAoS10WXRRF2RU1DnLNtvdma0fqsMfJxS1i6iLbV78QTQaaOLa3U32HiB/nMuqZqQpQ3VkISII9tK5J5OTBnnDGRPYW7GabSeLldsn5XR2qLhXEciS4d1mNz+MV5Z9zdxbezDztuL1azCyMb5R7RjXrh5Wpxc/1VLVnsczvA6mP5uAnpR7LGlERQmsaYkxALzKGGKnlRatNtQARexsiVXbdACGiBCCmUdHiETRMIuvCiTCmKE+Kks2qgcOxueELu8VecZGxIlW2EnGlW7HnalG0iD3tqqMLaWjpuSFz1TJGD6RS5t37aYcYok9NHLH2hRHB2i3NxYJha+ilqcgbkFGtCaaXlVfqROrzVFQoJlch/w9xDEVKy4ls3VPnQRIgSPwLNhX1Njai2CHOBd0f3Xi+IgqOIGWwA30MT1WGYHyd1CiequC2RLsANKjhbYyUidYz8YuiSNHhQsD0CPLMFXgu5CigxI5FIl2BNB9a8QR7jImA+ixCtslfyoP2HPBw/7cchsWqM6MYhSNqMGMO+GBkeupzm56vpL0Px4avJsl7b1TRqYTf7/Onq/fUZvH8KPP3V+cTx2Se+vE8cPconah19ot0XI5MzI3/SWU3qPMB03Xgw+uIhn60CxDVlDpIdkxhFEYn8C8ExlikpZOcDuRKbwrqmDocnXz4tvi1XbfUhsKQVCWTJb5WRHSYEC6ECR95bmXZLjTL3NpXgRX8obNLCz04Mddx/7EHCrhaJqcO2l6UwTQWoyNdbvJbs9VgIOKZIgN+t9Tu0o6rrQp6+isOxG/DoDSgStuEuVr2qmXxjIKNJYXMggbiPRWugFLDifZ7BnN5iLmKCrPdnhvpmyNQaW29YT2NoaJ0YqG+GOvwAIHc0Pnprq1/8kJM31zpvQn6ORvUI0G5f5vaBy9zWervM2xlvnno0D9Z82uGw5QmKu4fUwOuSfTiZ8//oFEJxsj5nCYbSnx5wZTvf3pQTVu8t6tatlyDxUabI+WwEnx2DjeGoN1PZp031zOjzGVxZJ7l62VBdb/+Q0LJWxdUpj+WhJCgy9ooxSw/4xqr6wKJuvqu6x/lelTq4qgePReYHTkLxoN2tQ2m00p7pEOtHpIV+V3c4ro9z0fxNs29v2CH1f8WoKF+eGfvVogIHsr7L5lNpMetxQ9M19QyL2Q6dlgAHKH6e+S1Xi0PhgY1C6e8jJ5FfKh28iO8T0V/k/n/F58NGjmdqkzfdInSwG9pXminneXymRvBazlTilo155OKcnFzpkzmGYvl37ax5+d8B5uVv \ No newline at end of file diff --git a/doc/stp/images/STP_Architecture.png b/doc/stp/images/STP_Architecture.png new file mode 100644 index 0000000000..a3b0919eca Binary files /dev/null and b/doc/stp/images/STP_Architecture.png differ diff --git a/doc/stp/images/STP_Architecture.xml b/doc/stp/images/STP_Architecture.xml new file mode 100644 index 0000000000..a28ee60da6 --- /dev/null +++ b/doc/stp/images/STP_Architecture.xml @@ -0,0 +1,2 @@ + 
+7Vxbc5s6EP41njnnIRlAXOxHX1ufpKknTm+PBBRMg8EjcGOfX3+EETZIqsFYxoSeznRiFhDw7ber3dVCBwyXmw/IXC0+BTb0Oopkbzpg1FEUWda6+E8s2SYSA2iJwEGuTQ46CObuv5AIJSJduzYMcwdGQeBF7iovtALfh1aUk5kIBW/5w14CL3/VlelARjC3TI+VfnPtaJFIu4pxkH+ErrNIryzrvWTP0kwPJk8SLkw7eMuIwLgDhigIouTXcjOEXgxeikty3uQ3e/c3hqAflTmhD19CZ6CNHk2nezPZoLmz+HSTDvPL9NbkicndRtsUAhSsfRvGo8gdMHhbuBGcr0wr3vuGlY5li2jpkd3sXaVXgCiCm4yI3OUHGCxhhLb4kD1pCGKEMjrZfDvgD1JQFxnsgUSEJtG5sx/6AAv+QZA5ASVQDBL07X5MN7zlBz4sCQq0c+xjIck8s8Z55FSGoGdG7q88Z3kwkCvMAhffyR5xJQ+42qWADIM1siA5KUsxahwZG8bxkSITOTBiRtppZf/Y1RWlMoq6g8iPXRKlLkzFKK+jMELBKxwGXoAOOnxxPY8SmZ7r+HjTwrqEWD6Iie1ih9EnO5aubceX4VrKwZYkMcZCqU6TWGNROcRRLmUqGqOBLyFE4Q6FP0UJqnplJcisFo45LMszw9C1dvibKGLFGQ1hkND2ewzdLdBAKviR3TvaEGSTrW12awaRi58x1lki3LhRMpqikc0fmV2HoeKNbZHGEj9FpmjsEr88/Oz9M5j47qs9/7npT28IDIkXOgZgt1H+We6JctBdIzfQnpY1+ecU1wwx508z+7yQI+sfOgro9YbDyURQKALywGONMpYtGxzF6xcz7R4PwU8OaiyGeyMgGHZZ5yjzwrmLQZheLA/hfOtbzcVQ1fI8BMYty0SeB0rTLfEolgiK684ctMZlDgobkQ4D/8V1sGw0YADDOeIq/mltPRcjh4pRe04gvn/eC0zr1dkB/3kd4VEgn52TyUAzBIU/Wq/QwnnUvJyBVw1/snFOGphko5Jb7fy45ChNGhJv7OMLotB9Rn5qvAGMgoEuHG+kZZoME/qzWZttjxeh1Gt8bIhX3fhus0mBfCHjS6fIwqygYVaqy/lgHlQ0Ut2gYoQLGyVg468ZNhwY39LTBg/5uGEY05pSAZ1RqNcu2AC2AvwAY+f1GjP9F37SsKPoXoz+M04ydCfaA9FG/TC1nKvrhw1m0giyrTrQevlsR762CtgoggG/7lSnS8371091ALe209ktzUUmDobYEkURY3exUvzvXdBWplwHr9Ah1cpbtlYUx71nKOTduBCD0oXB6qJbpypUhVHFZ2Qt+s5fYbT6+zxvQted+kNjIKju1CtT/6y1eKeethZbsIaRXb8gycaxFQySlqSJyIVrAmTWKUxLUm41JS2hViR7FdMSug1Aq3mtQmVLd/35dNjm4gHgBLu1Fg9UNtidf5vPO7GL0c1lDKH/HMZ/hu2fw2RJLZzE6g0oVF2A861Q29k7bPk0h13d+aY+tdj5NqwmdGXni1VvbjOHreIDwt/fb4/2QCDXxIZ/JCOK9ewNzOfopSvQY42dm9DRGhZn7Jx8butb9p+ZQPD0UbPzZbM5Eo/YCDsMFJ7HYSrMGPTGUl8TxG3aK2kqi2WtWUTqbZlaRbCCCPvfwDd3Y/gvQXspDuilI1CyueVyzYNsLXpfQkrWzNutEVXJFz6v7nM0Ngf6et9/aL8iqMhb5pRXeZZBL28LU4TOOiwG/WatqjYsMKY7g+ippfRiqazz44KaqhI6a5HViXBCHSvb96tSaZgE9o3Ap/T+VudW2gT57vp4KRJWb+Pt5t2TYZQi4anZGd13nGaFF83OdLYGROacG/x/6oeR6ccvE0hLc7Vy/RavgxoUX3iLfHKtK6E6WwBKArRZgKKdFswIqyZOSiYm3nDgTkGTyW7daeKtw0WL1QWo5nrAUVfaNluPujgpfKKh9tawZY1FvdYatsGGavMFnoxby3ud7ha9cruGUeKd1tJLhFSoo8tiQ5331nNoSHld02Xh0mE0FQnRvYuCAhj6dmXSMnnRAMZQLse/IvadQTTjf6KJI5pC+hQuSzSB7znkexqUksmgnM39iIcs7KIQmwuWX6lTGkVwYbmgRFGcmVhFJYNSl+a4VHBn9AmkeHFZo2BTlD8i+5AlupyvsuX8euMwAW+C/BaYhlixLucr9pXfzKYH0rRyVizMbNjVxcpzScW6opadSIq+JlB9wkiT4uLiYa9RVBM1YdQUEQHqfuVuDbXDbonq+Emr4if1owvx4nzl1LFSDr72P4aP4zv7u//4dqOOwyn6fMPLpZO3gcKV6afvA8WNIaN0B75Odh/v7SHOCP0pvkx/Nj1vlNHdCeefQwyqojWWR9rYEFTRoixUlWpsuuZ+NoUlwd3D+IluyRQKqMj+E6DTc2uN/SfHvkOTAfTe9dcbLHqA0VuAXndAmtZrYzGVr9nTw8W06osBp3zzKJ/qXu19gbLfNjpmzM2Maoyq6/J0uMEs8IuLn7moVl2Wbwv7msUqlcqFtS6oRiumdZEeSBytAuh/VD9MBz/v7i39LvLC7VDnhl+nZmVN48/RTo33VqcW5b8onhnCPpSCNw/fmE0OP3ypF4z/Aw== \ No newline at end of file diff --git a/doc/stp/images/STP_Port_id_allocation.png b/doc/stp/images/STP_Port_id_allocation.png new file mode 100644 index 0000000000..0ab1f5e46c Binary files /dev/null and b/doc/stp/images/STP_Port_id_allocation.png differ diff --git a/doc/stp/images/STP_Port_id_allocation.xml b/doc/stp/images/STP_Port_id_allocation.xml new file mode 100644 index 0000000000..61a22f875d --- /dev/null +++ b/doc/stp/images/STP_Port_id_allocation.xml @@ -0,0 +1,2 @@ + 
+7Zpbc9o4FMc/DY/p+A555JJkO0unTGDa3aeObAvQIFteWQTop1/Jlo2MHDBJnZAWHhL7r5t9zs86R7I79jDaPlCQLL+QEOKOZYTbjj3qWJZpuj3+Tyi7XOnabi4sKAplpb0wRT+hFA2prlEI00pFRghmKKmKAYljGLCKBiglm2q1OcHVUROwgJowDQDW1e8oZMvivgxjX/AXRIulHLrnyoIIFJWlkC5BSDaKZN917CElhOVH0XYIsTBeYZe83f0zpeWFURizJg2248dvP4yfn/+e+f8Nbrps5gXzGzvv5QngtbzhCaFsuouDUF412xWm4DeQiMN1hMdoDjGK+dkggRRFkEHKS7CUJ3ttsFkiBqcJCETTDYeEa0sWYX5m8kPuNwZ4E1qeYwySFPnZqAZXKAzWNEVP8BGmOR5CJWsmRhqWbs+qknUcwlB2VRrcyPqNUCCPMfAhHoBgtcgaDAkmYviYZDeUMkpWsBA7lm1kv7KkwEAMMUcYKzXvs5/Q+V3dgwhhQf03SEMQAylLxE1LntcNBDBaxFwLuGszI+q+lu5/gpTBrSJJ3z9Awh1Ad7yKLLUsL28iH8SexHKjUF1AvVSAdouKQD5Ji7LrPWz8QPJ2BnuOxt50Nvny8HgF77cCzzUuDTxXA09DTgElIShm2SW4g447OiCPz5ZLsiAxwCp7ex6M352Ho492Y0gct8LIbR0jOiL8SltCpHdF5D1jVdes4FDmUCd4MFubMorR1ETp6+Psx6w/GPNe+h3Lw/wiBj7lRwtxJPKozzFiI+EprfQYTcJOiGegfWlZnzBGIl4A47AvUlqhYRKsMonf3T+Slezk30bgNHQ6DHkyLC8SYp9s7vbCIBN4QXG9x3wvOjrqeQoxYDzWVlrVuVE2nYhHTiGmd0BMMaEUXaRkTQMoW6lZ8kFHjnmiIwboAjKto4yq8n5eAZqekceEofnuylDbDDnWges942UMue6JjtpmSM+s32Y2ehde9vh+sioAf3Ju3UIYbVW+Rzv1TInKmfhuALreATe925cB2DXs4x21DGCRLR1LnwQiSfNcodzhAX7Rg1HvBNnK7lWXHa6eQng1KYRntJVCWPrM3p9MuDDSn79ixRvs+FozbLKc9fN0cuyXQvkYfs1XrFLXTX7chaeTthMWbS0ns/Rprill593y2exZ1dWMaTdLX51ua6Zqf8nb1MjNl4anFn9Wa9byTltLiX4yvIUgXZZbQIoxlfjkqvHJPBGbzjRoPqU3YOBkFFPs7tbty0jttcGu8oho6DcNdb1qPzfO26ZaVvcjoVJcW2bbj4PKjdmtetl+KS3FRHywFflGsDj6JkLTgPXL0iLPqzeBOtk6da5sLYzb+l7bdDYJNctc3wJc0Jbecb7PjO1vi9tt66nQH7S1e/yJ/rDb/46pQdLHmASAQa6KA9FFwt3P/8XryOfmER8XiL8g654t4b4GiEPRDCzEuNyWcz4dpR3LA5GYjmI/TTLjntyW4dZk1fmrSkmx0aLgICXNlYfbOhEKw2yHpW7CrALdPDo96++ueRDQnRqP23Uub83jF7Bl0XsmPXnHV/T6nsUYxestl1aQxuJbn2uU/mCzc0n6JUdpp8HezjVKv44DuzEH9VHa9JqFafPw9cevo+S81XeAQZoKx6pzy7Pz+YUsfnvdalCwzQNrNn4pYHrHO2p79auv8x4ga5IwzUnHGoqWCIh0C4pJffVHpUtWlYHazKBu7/0F6RI/3X+Tmvt+/2Wvffc/ \ No newline at end of file diff --git a/doc/stp/images/STP_Port_state.png b/doc/stp/images/STP_Port_state.png new file mode 100644 index 0000000000..497504ce4b Binary files /dev/null and b/doc/stp/images/STP_Port_state.png differ diff --git a/doc/stp/images/STP_Port_state.xml b/doc/stp/images/STP_Port_state.xml new file mode 100644 index 0000000000..16ff9a41de --- /dev/null +++ b/doc/stp/images/STP_Port_state.xml @@ -0,0 +1,2 @@ + 
+7Ztbc6M2FMc/jWfah+wAMuA8+pLsprOddeu22z7KIGNNBGJAjuN++kogrgKDM2Zjp/ZMxuigG+f8hKS/4hGY+6+fIxhuf6UuIiNDc19HYDEyDF03J/xLWA6pxQZmavAi7MpMhWGF/0XSqEnrDrsormRklBKGw6rRoUGAHFaxwSii+2q2DSXVVkPoIcWwciBRrd+xy7apdWLYhf0Lwt42a1m37tM7PswyyyeJt9Cl+5IJPIzAPKKUpVf+6xwR4bzML2m5x5a7ecciFLA+BX55AYtA+x77vwVfxp6zcHZ2eCdreYFkJx949cdyITvMDpkXeN9DcbnzyVe8QQQHPDULUYR9xFDE7xBpXha22X6LGVqF0BFF95wPbtsyn/CUzi95yBjkRaI8TQgMY7xOWtW4JULOLorxC/odxSkZwkp3TLQ0zyOeZKW7wEWurCr3tZbU62NHXhO4RmQGnWcvKTCnhIrmA5o8UMwi+owy48gAWvLJ72QEiCY2mJBSzsfkI+z8qR6hj4kA/i8UuTCA0izp1g2ZbmoIEuwF3ObwqCZOVMOcxQxFDL2WTDLsnxHlAYgOPEt215YIyjF4L4fgvgBa12SWbQlmcyKNUA4iL6+64IxfSNROwM5QsPsWOdup91PMwp9v9H0o+sbmpdEHFPoU5EqghBQHLOmCORuZixp5NGJb6tEAkjJ7BQ/aR+fh6Pg+AZIKI7qlNUCiMpLnOzsj4xsj7/jOMPQqDwZoemk08DDYO8NUePgzdCFD3HbH//iqSTwzj/TI4A1YhHdoto74lccSl1jQFxNQsI7DJDhJ0XndjgN+J+AzVkMN7fQJv2K+ZJ3KSKwpY9TnN1DgTsUaWNgIdZ47ceqJAnL5Gll2BZE13T8Uhlli4DeyXh0jQlR0lIcIEcj4FFwp1RRcWXQpBmKJI6PGkTmuVhHTXeQgWapAhHsNHkrZ5Phubwdoje0UxKU1NpcGZkcvGYw8xJReJiDnzno725bCdkAZ3hxuFJ6HQmDV4jsxB6EQ1Nb5WTv9KBzfd/RyYArt2xv2Ctk2tSo1wBh/GoZuU9daWurHtwU6ezow4ZP+hLfC3IE9DkQDBIovH/lrviAzNAbFpvYG/CBLCmDcDwR8fVFRtPS2ZUVDTwcG/r51YcGj4OGY7xf4BkjolnU2HYJR4pwbtOV9kfbJBgBUopq9Fc+8RgH2j1mjZO30XKOMm0t3AV1UlGWkm02MBoE+C8gx6UBAF7bSoOyT8+MNuM5q0EbHOLFqc6Wlbp+thu2zpQ21fdbVg4bp6onXpS3UIZ3pvc6B4MDtI+auUy3l6zo35CP7W6rXSrvq8+Mx7FYsOlw6mCChqxp6X8xOe+ST4atNkDrop+WN7cFcNbzg29fJRouTj3LV5K38WPH83uohfZYmVDljujDe5gcgJWdyf0SHv3lC42sPmfynfG8h3KDlqYNMnejQ9L3fg4HOmbbkd7PB7+Z5ptvqEFHQb5lslWom1XruxrV6Bl7f6aoqesGoZOfwiW+vB5U73a4vzd5IS20/UZ+ahoZFlRlXh8BxW6f/23HvNR7d2DXK3v+4V1eVxXNP//+js7zjY/t6D3x1VZ27nYKcUzizaiEHZr9p7FSFwar9v0nWTj+Fwa5PtvVeDj1LqprZavrEDdPl043EH6WG2bWFNRjoxM6+1xrb6cfqxGoufUlqWLZNfU81DAC94qd3V8OMBjVsubxUMSwP4eWKYRlU5xLDWh/5+sUw42LEsDxoFyyGGdckhmV96xTDjMtSOD6GGGZckxiW/yilSwy7MFQuXwzjyeI3PWn24pdR4OE/ \ No newline at end of file diff --git a/doc/stp/images/STP_VLAN_enable.png b/doc/stp/images/STP_VLAN_enable.png new file mode 100644 index 0000000000..dfa2a2cdd2 Binary files /dev/null and b/doc/stp/images/STP_VLAN_enable.png differ diff --git a/doc/stp/images/STP_VLAN_enable.xml b/doc/stp/images/STP_VLAN_enable.xml new file mode 100644 index 0000000000..006003ddc2 --- /dev/null +++ b/doc/stp/images/STP_VLAN_enable.xml @@ -0,0 +1,2 @@ + 
+7Zzdc5s4EMD/Gs/cPbTDh7Hxo2Mnucy0TabupZdHGWSbqUAcyIl9f/1JIPElDLJrEpraDy2sxAp2f5JWi8jAnPm72wiEm8/YhWhgaO5uYM4HhqHrlk3/Y5J9KhmbVipYR57LK+WChfcf5EKNS7eeC+NSRYIxIl5YFjo4CKBDSjIQRfilXG2FUbnVEKyhJFg4AMnS755LNuK5NC0v+At66w1v2rZ4wRI4P9YR3ga8vYFhrpJfWuwDoYvXjzfAxS8FkXk9MGcRxiQ98ncziJhthdnS624OlGb3HcGAqFzg7h5v/v4WXv/71fp+a1ubp6VNPnAtzwBtuT0W3x4+337lt0z2wkz07kN2uPXRJ28FkRfQs6sQRp4PCYxoCeLih1x29bLxCFyEwGGXvlCAqGxDfETPdHpIfUoAvSTKzhECYewtk1Y1Komgs41i7xl+hXGKDpPiLWEtzTIkkqrMFdDlqjJra4le33P4MQJLiK4y380wwqz5ACcPFJMI/4BCSF2qJb+sRCDCmlh5CBVq3iQ/JqdPdQN8D7Ee8QgjFwSAizn+usHP6xoCyFsHVOZQvyZGlB0tvAYjAncFEXf8LcTUAdGeVuGlQ0Ez76Q2P30pEC+qbAqwW6Ii4L1snanOSaMHHLYjwDMk8CTkCqCE2AtIcgvW1cCaV8jDEdngNQ4AKrKX86C9dx4au7Y6JFaJEd2og0RmxNC6YsSsG5zml6GpzygeTZ017NvQNJSwu4+czXT9R0zCPy/0vSv6bKNv9FmXibHjiXF4LCTViXGkNjFm9c7OyEhiZHb/5ebu9jI49RnGowcnvXdR+/gyOHU8OI2OhaQ6OGmKUbvRFSP2hZG3DKftMg+G4ipO72zMmEg8TF2XCuhajv77iEAwMKhm2vwI0Zu5Wkb0aM2OvCAmIKCzUAM/zDKeA9CU23KJCcE+LYCBO2XpQSZD2PnRCoSiM6G7hgt+KxAt8ct1LrhKBLRA3FWTT2O8jRzYYDjebwiI1rBpyOAdjt1YIyERRIDQSbl0F3Xu5pc+sK5ZIEvX6skSKtIH4lfl0FAvgH2hGu/xB9sZjurbyRlMNeZEZs94OqTi2dooHQGfRUPBMg4T3x5GtlpyLogpRtH+Hz7EJSdPSuNd7/B+JWxHVgUnUw1bSdHYsJoVpR1VUnQ2ROVc/XUAWNAtKMUBQ/XT9MvvxlrXDOlWJcianMiQYbco6pohOe0eYOKt9hdizkuMMakMFtroNGKGwxZFXRMjJ+EFMb/7DNc1Q9mbdOH6aj5JlSF71KKoa4bkjPosgoDkM9fhsCmNwtJJTSO4se65AHwTzHLqPxol7j8O6VjCBfNdsVvM98Wzwgo5ER7ktnVBYb4S4Pa4wqU9OQ1wXauk8SVNXRMuZ+2niNJUz7i2ojxRP2zY8XOytKiifAtJjr0P/SVkV4Q4YisnDQSsU8T7wBGdgjbh1nWI1i5C4I6U07vlxIiAvpAB4SIpe1HtYr7nugntdfnkcg7nIKvqiQ+zGufXZekndZmwrhIfupyml8zPRphQ/emzbVBgKTRojVYxKotpSzbKqMYmo87e6etyClm8u6iyOlfhV7zhcPbIo0ApvL5YpuR9WmaCbKS/T99QcLnslmY3t6PaYvXOUnC6Qk72AInHPfLxfFbWQ2ZNp62x1HDcmankdGVTWHFK+lrVyPYBIzdy9bq5fqG4yVqFAIvPHC6IN9krv4IxCyFQFvE8FcsOhD9HGrQ1/hEMtMY/BbtbNXYXsp9NvJa6iIS+8iqgrOfD8HUXAYacvuoxKnyAa01i9gwVOr2WvWyeSotZjhuqU1PXsMh5KtUJ62yh01jrWehkyKmY6QNbWcz7GBZlLuxvWGTIeQlVyo575F8/LDK635amamT17VtvFxYprPx6M9cJg7aGRYKBnsx17yQsUthS1R9UxL21bl3oFyrvJiw6fR1/trBoMulbWFSzwWhxN+ttXHTSsv5V4yIB+dniokOP/OvHRabCqvZ14qLMaT2Oi0yFZV1vJjth0PO9LrvERUegIi92+4uKGOBa46KeofJe4iJTXsgv9oHjHpz+L9/D/Ip723Vdr8fs7T6IMS+f63X9RYypnvDp6/d6ppwPumyiU5wJf3brrlYJiEzFzxZkTdVdwJKmrmc5OVW0mN5RwfTh7p1thjsdF+4s7ePYFCFk9mXcWWImXa/spsw2Hp35OxhdH2u1DdV+CCNfbgzrL2/DNdckKuLVKoZHIk1P8z9tlFbP/36Uef0/ \ No newline at end of file diff --git a/doc/stp/images/STP_disable_VLAN.png b/doc/stp/images/STP_disable_VLAN.png new file mode 100644 index 0000000000..6e62d7c842 Binary files /dev/null and b/doc/stp/images/STP_disable_VLAN.png differ diff --git a/doc/stp/images/STP_disable_VLAN.xml b/doc/stp/images/STP_disable_VLAN.xml new file mode 100644 index 0000000000..775c3e9244 --- /dev/null +++ b/doc/stp/images/STP_disable_VLAN.xml @@ -0,0 +1,2 @@ + 
+7Zxbk6I4FMc/jVW7D90FBLw8eunu7a25dI1Tc3mMEJWaSBzEsd1PvwETLglCdKSNjv3QDQdIIPkl/5PDoVtguHh9CuFy/p54CLcsw3ttgVHLskzT6dI/sWW7s3SAszPMQt9jJ2WGsf8fYkaDWde+h1aFEyNCcOQvi0aXBAFyo4INhiHZFE+bElysdQlnSDKMXYhl61ffi+Y7a9cxMvs/yJ/Nec2mwY4sID+ZGVZz6JFNzgQeWmAYEhLtthavQ4TjxuPtsrvucc/R9MZCFEQqFyw2Lz/Rv5+/GY+Db0GHfJ393G7uWCm/IF6zBx5/fnn/9IndcrTl7UDvfhlvrhf4nT9F2A/o3mCJQn+BIhTSI5iZXzLbYDP3IzReQje+dEMJobZ5tMB0z6SbtNMiSC8J032M4XLlT5JaDWoJkbsOV/4v9AmtdmzEVrKO4pqGaZ8np5J14CGPFZW2tpGUu/Bdto3hBOEBdH/MkguGBJO4+oAkD7SKQvIDcWPLAkbykx7hDMRVTH2Mc2c+Jj+xnT7VI1z4OEb+Cwo9GEBmZnybFtsvqwhifxZQm0v7NWlEuaN5r6EwQq85E+v4J0RoB4Rbego7avPhxEZhj+1uMqRTcOc5nJ0uM0I2jGZp0RlpdIPBdgB4lgSehFwOlCXxgyi5BWfQckYCeSSM5mRGAojz7GU8GNfOQ+XQVofEKTBiWmWQyIxYRlOMgLLJaXSbmnRG8WDqHEe3qcmWsPsYuvP+7K9VtPz7Rt9V0dcFutHn3ISxYWG0D4VEFMa2mjCm552ckbbEyPDjh8fnp9vkpDOMB09OpnZee+c2OTU8ObUPhUScnAxFr91qipHujZFzutO9Ig8WUBSrxuaMnsTDKIkR0rUc/f0Fw6Bl0ZJp9W1Mb2YwCenWLN7yg1UEA6pCFfzELeO7EPdZW05IFJEFPYACrx/H/2IbJu6PWiAUOxN5MzRmt4LwhGweMsMgMdAD/K6q+nRF1qGLKhqOjZsIhjNUNWWwARffWCUhIcIwoqJcuIuy7maXvsRDM0eWZZSTxYvYPRC7KoOG9gLc5k5jI35vPXanvJ6MwV2JGZHpMx4PKR8SdZS24SL2hoLJapn07X5kxSOngphiFG6/sSku2fmuNN9ph/cbYdtuCzjZathKBaVvLvYVtBuoUkEnQ1SO1Y/8FYy9bo4pCWJW3/U//GmwNQ2R2Rb6Xox5qkJkifIsFtQ0RHLcPSCRP93eiDktMcAQOtpsH0eM7dQU1DQxchSeE/OnS1zTDHWEtZ3VOXLW6XZqCmqaITmkPkJ3ENP+hlEmX5nzZExpv1LXdx5v/0qcLxm2Wvzokikqhq6Kiz6OTW51x0zSykzEd+F7XsJLWaysuD7di88Ba3xD8GGckkWdVbbKb2pRZyqEqeMxulR/+jSHA054CUZlq4haapbGx+RGsUX6T9cqcmC2ak48JvYht2d1/9QzdrZAkakQTcypAxuaHlzN03hxrjEz0bh38rKRHhu95hVltN07PCsbtHY1zhmoFYdcuzsl7c5tv7tqLwwRCX1lBSmWc/fWax+FoOI5UFFd62oGxZ24ngFi9E+VCyAok/nWYMjRxTfXITEwcH4d4lO2BjrU29PKGumQJQdWtJhcKhu0Vod4c2ky5VyHDnEydENF+ZWCXlCcToc69nl1yJKjJG+uQ6ZhaSdE8sL/TEKUdpDOQqSwqNZGiHiD1guRrdWccyVCpBBp0FqI9ILidEJkGuDMSiRHVcbbwPUkPG65gZec52OaQjrG+ZMDrVvmV9PZgengvtzcZUuO2NzeJyqK4e9mMRhCpAoopnDJJUl6KZbUsMrx6vIq13+mhv7Lc3MgXRgurLOM+w4AoDgNnMZtMk3hxTIQp40T5QSaaXqqUFFpUqB8ueWUX16Ha1YSP5FMpyvUDNIKYcjGX6kKuZclatEuUYt2c5+AygE3/qmLOMhHKikB3Ol1t9Td9FQ82snO5Xg3SQ3p+P+4c1qZXVHCgbqrV9PqjTly4Phg1mGPfDifgjejmNzeXIALaBPgAhcQ4AKXFOACqgEuoFcs4zoCXEDTAFflBFcb9tIMlTvJfzuWFiHoJUpT07AoZBI1ngVg6uY6yeGY/kucZjnS0i1Sj2yczS06PtfksEe+fLeIi4UGbtEFJKDYl5SAwhu01i3iDGiiddfhFtnyelhjVNgEV+sWaYbKtbhFthZJKYZmfpEtL9n74+ehro6RfdS6/k0dI/v4j0EOe+QrcIy0+ULEvoAvROxL+kLEVv1CxNbrY4ArcYw0/UKkcoKrd4z0QuVqHCN5Jd/3vBb/3wlGROgvD01b+W9SpfdNCKPSj1ev6u1zbiRYheyJe7vn1IyGZC8nTdU5g6dLJP1NyrvCVGb1esdBLuUCSiUdzTndzf5z+O707P+vg4f/AQ== \ No newline at end of file diff --git a/doc/stp/images/STP_disable_on_interface.png b/doc/stp/images/STP_disable_on_interface.png new file mode 100644 index 0000000000..5b0464da00 Binary files /dev/null and b/doc/stp/images/STP_disable_on_interface.png differ diff --git a/doc/stp/images/STP_disable_on_interface.xml b/doc/stp/images/STP_disable_on_interface.xml new file mode 100644 index 0000000000..5e3396ffa9 --- /dev/null +++ b/doc/stp/images/STP_disable_on_interface.xml @@ -0,0 +1,2 @@ + 
+7Zxbc6M2FIB/jWfah+xwMb48+hKnmdltMutus9uXjgwyZiIjBvA67q+vBBI3gRGJcYjXeUjgICRx9EnnIkhPn21f7nzgbb5gC6KeplgvPX3e0zRVNUbkD5UcYslQN2KB7TsWK5QKls5/kAkVJt05FgxyBUOMUeh4eaGJXReaYU4GfB/v88XWGOVb9YANBcHSBEiUPjlWuImlI0NJ5X9Ax97wllWFXdkCXpgJgg2w8D4j0m97+szHOIyPti8ziKjyuF7i+xYVV5OO+dANZW6wD6uv6rebp38mwfw7WPyY/ju4u2G1/ARoxx54+dfjl7uvrMvhgeuB9N6jh7st+uysIXJccjb1oO9sYQh9cgUx8WMqm+43TgiXHjDprXtCCJFtwi0iZyo5JIMWAnKLn5wjBLzAWUWtKkTiQ3PnB85P+BUGMRtUinchbWmWjHlUFO9cC1qsqkTbSlTv1jHZMQIriKbAfLajG2YYYdq8i6MHCkIfP0Mu7Gm6Ev0kVzgDtIm1g1Cm5CL6oXLyVAuwdRBF/m/oW8AFTMz4VjV2XtYQQI7tEplJxjVSojjQfNSgH8KXjIgN/B3EZAD8AynCrvb5dGKzcMxO9ynSCbibDM7GiAkBm0Z2UnVKGjlgsDUATxPAE5DLgOJhxw2jLhjTnjEvkIf9cINt7AKUZS/lQbl0Ho5ObXlIjBwjqlYGiciIprTFiF62OM2vS1OXUWxMnWF0bWnqC9g9+OZmYv8WhN7vV/ouir6R3jX6jKthbNkw9ptCUjSMAznDmJQ7OSMDgZHZw5+L+7vr4tRlGBsvTmrnvPbhdXFqeXEaNIWkuDgpkl671hYjoysj7+lOj/M8aLqksWptzRiXRXE0I+gEgJoMbUZroBpYU4ujDRDp0nTlkyM7TJRSTg/Vi2MCNGGaXOEwxFtyAbrWhGb/qAxh87kWB8mhhJYNl6wrEK3w/jYVTCMBucB7dWxEA7zzTXhEbWzWhMC34bEFg0032rGjfPgQgZCY5Fwvygab3fpIJ2aGK00p54pXET8QuytFhowCOGSKsfle2U5/WN5OSmBcY8pj8oyvR5RPCGlGT0MkYcI/fGerVXTyQ2rp6hyrw/MwqBWCNSHzVMGgWFFhkVTHhYriWSdUdDLexLS7i0NnfWhv9fsorNWud9p5WNOVgh1V5Vg7GSJigvybZ4EQUh1t6O94hfKIi1RGzQBsadjmrgIvcmnYnZj8WjyR2pQ1GWXyNLQndXfHLTkuqcU1YfCrmOi2ERsUEeuPWzGpg0IYydspNanC3cmGbVUv214rxV2AttfKXwvDYSGC1IqRwIkwHBYS/bwdOQxHw5peto2huCvw9vU4L7yuyO89FUajPGS6Ov5ktDIZRmOloiW56aAqam1X254REjsVlBqvcjiFrEbyGg9Y8RqUXulAc0e+EC2WZOYHJcmOQWtb1mp1ar44RedTiWnLE/jmATmuJZOdX8XJsc+rRJDM3oc4Ac/k4rAcH+b6FFSN1lvLMKkSaekKEps9cnM+C3GmZDIuSYKcXlXtZ2dllTysUPJRrs6bylbF3KWgrYzRZFbRAsEm2dHKKDMNwclKnQnCk2tzqoYkPp8f2FlDhdZG0KpsyjCjd6NE71z21sxibooI6MvmdAqm+6Z/Xv+QM/gxUGG6qk3sdQwVwnjB63ktLYVM4pmDCU3M/8karJO5TkO1Y64T10FGKZNHGubMu+gWNdiZey+3SBMzJ7KUNXvkj+8WaWJ0L6jqPG5RMmgddos0icivM7aOK7R+Y6HfKVt3IW6RGA93GBXpvfluoXIxbtHr4/iTmabxuGtukRixT5b3s876Ra8K68/rF0lE9Y38oqpH/vh+Ee9AB/yicYWSO+QX6RJhXXeMHVNorV/EGeiIsbsMv0gXg90Oo8IWuFq/qGOoXIpfpJd8CDm5J4LJ4/116zdvQpVPQ53rK3kl/iQ0qUWa+pKvHzbdG1aTd7ELDUluDWtG+e11uKY18YJ4vQ5gO0hLJFzadvVVtfBO8/t/X6OLmZXlwTUtQTfXD7s69JHGccIbOqzn5U0iPXP9WOdtHMi/P9HVz011MS9zfQXzlO+dqUohrNGNdl4FVpWCJ8sbknQtEsaq+vlqT5icpv8KKS6e/kMp/fZ/ \ No newline at end of file diff --git a/doc/stp/images/STP_enable_interface.png b/doc/stp/images/STP_enable_interface.png new file mode 100644 index 0000000000..fce47c61f3 Binary files /dev/null and b/doc/stp/images/STP_enable_interface.png differ diff --git a/doc/stp/images/STP_enable_interface.xml b/doc/stp/images/STP_enable_interface.xml new file mode 100644 index 0000000000..9760d69e28 --- /dev/null +++ b/doc/stp/images/STP_enable_interface.xml @@ -0,0 +1,2 @@ + 
+7ZxZc6M4EIB/jat2H2aKw/h49JFkUzWzyY73fJRBBioyYgGP4/n1I4HEJWyExzjEQx4SaHTR+iR1t0QG+mL7+hAA3/mMLYgGmmK9DvTlQNNU1ZiQP1RySCRj3UgEduBaLFEmWLnfIBMqTLpzLRgWEkYYo8j1i0ITex40o4IMBAHeF5NtMCrW6gMbCoKVCZAo/ce1Ioe/l6JkD36Dru2wqicGe7AFPDEThA6w8D4n0u8G+iLAOEqutq8LiKjyuF6SfPdHnqYNC6AXyWQ4fFNeH/9Q79bO1B59Wf7vbl9mH1gpXwHasRde/fn8+eELa3J04Hogrffp5W6LPrkbiFyP3M19GLhbGMGAPEFM/JzJ5nvHjeDKBybNuieEEJkTbRG5U8kl6bQIkCxBeo8Q8EN3HdeqEEkAzV0Qul/hFxgmbFAp3kW0pkXa53FSvPMsaLGiUm0rcblb12TXCKwhmgPzxY4zLDDCtHoPxy8URgF+gVw40HQl/kmfcAZoFRsXoVzK+/iHyslb3YOtiyjyf8PAAh5gYsa3qrH7qooAcm2PyEzSr7ESxY7mvQaDCL7mRKzjHyAmHRAcSBL2dMiHExuFU3a7zyHNiXZyNBsTJgRsGNlp0Rlp5ILB1gA8TQBPQC4Hio9dL4qbYMwHxrJEHg4iB9vYAyjPXsaDcus8nBza8pAYBUZUrQoSkRFNaYsRvWpyWvZTU5dRbEydYXRtahoK2D0FpjOzfwkj/9eevpuib6J3jT6jXxhbXhiHTSEpL4wjuYUxTXdxRkYCI4un3+8fH/rJqcswNp6c1M5Z7eN+cmp5cho1haQ8OSmSVrvWFiOTnpG3NKenRR40XXKxam3OmFZ5cUQAPUBXDG1BC6AK2NAFRxsh0qL5OiBXdpTqpBoeqhbXBGjGFLnGUYS35AH0rBkN/lEZwuZLLQ2SPQktG65YUyBa4/1dJpjHAvKAt+pUh4Z4F5jwhNbYoIlAYMNT8wUbbbRhJ/EIIAIRWZELrajqa5b1mY7LHFaaUo0VLyJ5IZYrI4b0AjjkkrHhfrSe4bi6ngzApMQMx/QdzyeUjwdZRC8DJEEiOPzL5qr45j+piatzqI6vg6BWctWEuNMRBMWCSlOkOi0VlAw6oaCL4SYG3T0cuZtDe5Pfe2GtdrrTrsOarpRWUVWOtYshIobH//ItEEGqI4f+TiYonxhIVdSMwJY6bd469GODJs5ZFG5IP5P3oW2py5/U5XqkHM+E4c+yRrcN2agM2XDaypo6KrmRvJ7KNVXInW7YHmtl27OluAvQ9mz5c2E4LnmQWtkTuBCG41Kgn9cjh+FkXNPKtjEUdwX6GfnWhsJkUoRMV6cfjVYGw2SqHKlJbjioilrb1LZHhBiFFqij1PhHu1OIaqTHeMCal6AMKjuam/Ild7EiMj+qCHaMWtuyVsUIKQ/Nl4foci4xbHkA3zwg17NkovPrJDj2aZ0K0tH7lATgmVzsltPdXB+CqtF6axEmVSLkeITEZq/cnM+SpykZjEujIJdXlRiNO7VUnBOdlVXy5IiST3J13VA2L/iUtnKLJlsVLRA66Y5WTpmZE05m6pwbnj5bUjWkHvrywO4aKrTWh+YM1K6mOb0bFXrnsh8NLRaGiIC+bFSntHR/GF7XPtTEoE6HUWETXG1or2OokOW1ZPWcS0splnhlZ4Kz8Zam01jtmOmkiY7+7Jm6OcsumkVpF3bXLNJEl1WWsmav/P7NIq39U1eySpY/nfR2ZpGE59eZtY4rtH5rwejUWncjZpHEiaHuoMLbVrs53y1UbsYsOt+Pv5hZNJ12zSwSPfbZ6nHRWbvoLLf+qnYRh/xidtGxV37/dpEu4dVexy5KO63DdpEu4dZ1ZrHjCq21i3TZzZfeLmqAiujsdhcVPsHV2kUdQ+VW7CJddORXs0cimD0/9lu/xSVU+TjW+eBKj8RfhCa1TNNQ8gBi071hNT2LXapIcmtYM6qz1+GalcQT4s0mhO0gLRFwadvUV9XSoea3/75GFyMrq4NnWoJu+g+7OvSRxmnCGxqs1+Wt/6CrdQ5++IuuN//cVBfjMv0RzEueO1OVklujG+0cBVaVkiXLK5I0LVLGjrXzbEuY3Gb/CilJnv1DKf3uOw== \ No newline at end of file diff --git a/doc/stp/images/Topology_change_update.png b/doc/stp/images/Topology_change_update.png new file mode 100644 index 0000000000..9212c50f54 Binary files /dev/null and b/doc/stp/images/Topology_change_update.png differ diff --git a/doc/stp/images/Topology_change_update.xml b/doc/stp/images/Topology_change_update.xml new file mode 100644 index 0000000000..f0235c9391 --- /dev/null +++ b/doc/stp/images/Topology_change_update.xml @@ -0,0 +1,2 @@ + 
+7Vxbc5s4FP41ntl9iAfE/dGXppuZXjLrTLftmwyyzUZGHsBNvL9+JRBXgRGJcZ2MM9MGHYQQ53w6lw+RkTbbPn8M4W7zmXgIj4DiPY+0+QgAVTVs+otJDqnE0oxUsA59j3cqBAv/P8SFCpfufQ9FlY4xITj2d1WhS4IAuXFFBsOQPFW7rQiu3nUH10gQLFyIRek/vhdvUqkNrEL+F/LXm+zOqumkZ7Yw68yfJNpAjzyVRNqHkTYLCYnTo+3zDGGmvEwv6XW3LWfziYUoiGUuCL3o68NkNbvT/8WPP73tT7BZ3vBRfkG85w+8eLif8wnHh0wLdO47drjf4k/+CmE/oK3pDoX+FsUopGcwF98XsunTxo/RYgdddukTxQeVbeItpi2VHlKTxZBeEuZtjOEu8pfJXRUqCZG7DyP/F/obRSkymJTsY3anWW7xpCvZBx7y+FC5rpVk3K3v8mMMlwhPofu4Ti6YEUzY7QOSPFAUh+QRZcIR0JTkJz+TIYDdYuVjXOp5m/wwOX2qW7j1MQP8NxR6MIBczNGtAt5uuhHE/jqgMpdaNVGiaObMZiiM0XNJxM3+ERFqgPBAu+SrkEOQr0GbN58KQKsKl21KYDayjpAvonU+dIEzesCh1gN2QIDd19DdTNZ/RPHuzyv63hX6NOfS0KcJ6BMgVwLKjvhBnEzBmI6MeQ15JIw3ZE0CiMvYK/CgvHc8HF3f0iDRjQpGVNAEEhEjGhgKI3o3RlDgTViGwfSDYRQxs5Y9DFVRePhebvxgdh/TDIi3588cCGnrUG6V4JQI+2keeZW0RtR7SbFG0+LjshBhGFMXWM2rGpTN73DPFkthVqNqVqCY1REisg9dxC8q5y4d42hOdZwYhmsUC+NQ88BDqRtfyq3Tzf1Qbb5t0xL683kVwEtnUMAwt8DLkWlcvdfvzKUESP9uT2UKeHggO6qNNZu1u4EB9QTAhFuW/wTLaJfowcR0XtNlSI/W7GgFI4ZrVgyBGRsNw0DsdQxoTIU+rZsmXOlLEsdkm3iz3E0uMXEfO5EjaXXm4hZ8KggvydOHQjBNBPRENqtjxu/0la90gqrZ4b1avGBv72X18EbC1cB5mY89lVuzBBgHJPZXhysKT4NCTanZ11IGQaFWKzaz+8ihUNc6ZjkwCm0BhStvyWpTKryh/27nU/Y/3keb1FF++zT5IgCSxpK4mgtWo2EGsVLY4yIhZNUBvfU9L8FWU3lbDdwniHa6WTOm2RDtmsLdYNHOkYl2iug0EmdCFRn7JLiGuKFDnF6v3QcKcdl9XhbihFkO7FyyhfKqovLZj7+XjtOS0rZM3i5KStY4lBrdBaUsWM5TO9YDlqHKYUocqEZACQMNbXaRZP/CUxuqo7UfUZtQpy24Ihf7KFkNnT5KjDchoyxhwWWKZaB8pGlxZbTSWiU/8uREvgDqASh/f8OnPCq/ImkKTDfKWFFVu2LX7GXGK3EHtOqo9VBGVqsIDQMUIABFMDWzwU4+svdWLKh52IaAbzaseVMZKuCrIls7ub+ngrm4DrIXBe4B+zQRkngLsEwzpk/LXJDj/GtK9HO5LMJBiyGOetEmjQ7GfqsS1GYLyvo9cn/s1UhgXY5ayZPU06tqeK5NVsl6i5KP4qqZiAJjx7ItoOmKDgxD04ZSnkhMCcorZTs8M/ZgtMlfpDWR6co4p89/lM+1cOk99ZumAhKQuJAkqZbQ1leCbIpkV8e50ZyxqtuOZtKEG6jAMvXquENnTCIZdMHQyeaW6PrtQIdquWp1TZIRasjUK+Ocu6oSORvZeHayrMkCl5Y1iTzJZHE3u9i0yW6xxOWkTZmF34ZPsiV9UvYQV590Up8ExJJ/jjBN+ajs84QtQ2ZWH9HiXlmFZNv9gq60et8VZZnzWeMqo2UZp2C0ji6OoUnRG1AHs81Wuwyc+/KiN8BQx/WV43CJJDmqtFx+tjUjwX6cqdiSj0avq1yH2xQAJPa4XVy06iy+LixaafWk74Wxqr4/1zlzsBIJoUUMWayqRxqp6HPWbLL3G02jVupKZurDpZUSFNPQ1YtZf2UvuadpOOINSHBH54kFuX1eTbwBxRgbmurkxMpQxBs4KXvSU0/dPty8KB/+Pgk0IMGJ/I7wz83TXZZeFkj0moPUtRfCxFRbEsVz4UKkhbrq0lq499nbYupKeS2qwMBLYJXkC++vPG2F8NDFo66bFaRokp6pb+Wom069cDT77Kkxlbbrz4XpbC2W09dD4HoC7q6fkb3ljfe2XvXBF/AZmUjyHfN11y8x+n9Hlq/tt/shmSbSWteN7aeMlFbtixzdGObzCqu2Fzq7j1yctO2OWQ4dJUVCcDG5o4LJ/d0ViVUfoowtTavuAlRPU0jYXTuQT4RVp/ZhYr990k7LLusurBYDvWavJG0WfxQj7V78aRHtw/8= \ No newline at end of file diff --git a/doc/subport/sonic-sub-port-intf-hld.md b/doc/subport/sonic-sub-port-intf-hld.md new file mode 100644 index 0000000000..5f33638552 --- /dev/null +++ b/doc/subport/sonic-sub-port-intf-hld.md @@ -0,0 +1,596 @@ +# SONiC sub port interface high level design + +# Table of Contents + + + * [Revision history](#revision-history) + * [Scope](#scope) + * [Acronyms](#acronyms) + * [1 Requirements](#1-requirements) + * [2 Schema design](#2-schema-design) + * [2.1 Configuration](#21-configuration) + * [2.1.1 config_db.json](#211-config-db-json) + * [2.1.2 CONFIG_DB](#212-config-db) + * [2.1.3 CONFIG_DB schemas](#213-config-db-schemas) + * [2.2 APPL_DB](#22-appl-db) + * [2.3 STATE_DB](#23-state-db) + * [2.4 SAI](#24-sai) + * [2.4.1 Create a sub port interface](#241-create-a-sub-port-interface) + * [2.4.2 Runtime change on sub port interface attributes](#242-runtime-change-on-sub-port-interface-attributes) + * [2.4.2.1 SAI-supported attributes](#2421-sai-supported-attributes) + * [2.4.2.2 Set sub port interface admin status](#2422-set-sub-port-interface-admin-status) + * [2.4.3 Remove a sub 
port interface](#243-remove-a-sub-port-interface) + * [2.5 Linux integration](#25-linux-integration) + * [2.5.1 Host sub port interfaces](#251-host-sub-port-interfaces) + * [2.5.2 Route, neighbor subsystems](#252-route-neighbor-subsystems) + * [3 Event flow diagrams](#3-event-flow-diagrams) + * [3.1 Sub port interface creation](#31-sub-port-interface-creation) + * [3.2 Sub port interface runtime admin status change](#32-sub-port-interface-runtime-admin-status-change) + * [3.3 Sub port interface removal](#33-sub-port-interface-removal) + * [4 CLI](#4-cli) + * [4.1 Config commands](#41-config-commands) + * [4.1.1 Config a sub port interface](#411-config-a-sub-port-interface) + * [4.1.2 Config IP address on a sub port interface](#412-config-ip-address-on-a-sub-port-interface) + * [4.1.3 Change admin status on a sub port interface](#413-change-admin-status-on-a-sub-port-interface) + * [4.2 Show commands](#42-show-commands) + * [5 Warm reboot support](#5-warm-reboot-support) + * [6 Unit test](#6-unit-test) + * [6.1 Sub port interface creation](#61-sub-port-interface-creation) + * [6.1.1 Create a sub port interface](#611-create-a-sub-port-interface) + * [6.1.2 Add an IP address to a sub port interface](#612-add-an-ip-address-to-a-sub-port-interface) + * [6.2 Sub port interface admin status change](#62-sub-port-interface-admin-status-change) + * [6.3 Sub port interface removal](#63-sub-port-interface-removal) + * [6.3.1 Remove an IP address from a sub port interface](#631-remove-an-ip-address-from-a-sub-port-interface) + * [6.3.2 Remove all IP addresses from a sub port interface](#632-remove-all-ip-addresses-from-a-sub-port-interface) + * [6.3.3 Remove a sub port interface](#633-remove-a-sub-port-interface) + * [7 Scalability](#7-scalability) + * [8 Port channel renaming](#8-port-channel-renaming) + * [9 Appendix](#9-appendix) + * [9.1 Difference between a sub port interface and a vlan interface](#91-difference-between-a-sub-port-interface-and-a-vlan-interface) + * [10 Open questions](#10-open-questions) + * [11 Acknowledgment](#11-acknowledgment) + * [12 References](#12-references) + + + +# Revision history +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 07/01/2019 | Wenda Ni | Initial version | + +# Scope +A sub port interface is a logical interface that can be created on a physical port or a port channel. +A sub port interface serves as an interface to either a .1D bridge or a VRF, but not both. +This design focuses on the use case of creating a sub port interface on a physical port or a port channel and using it as a router interface to a VRF as shown in Fig. 1. + +![](sub_intf_rif.png "Fig. 1: Sub port router interface") +__Fig. 1: Sub port router interface__ + +Multiple L3 sub port interfaces, each characterized by a VLAN id in the 802.1q tag, can be created on a physical port or a port channel. +Sub port interfaces attaching to the same physical port or port channel can interface to different VRFs, though they share the same VLAN id space and must have different VLAN ids. +Sub port interfaces attaching to different physical ports or port channels can use the same VLAN id, even when they interface to the same VRF. +However, there is no L2 bridging between these sub port interfaces; each sub port interface is considered to stay in a separate bridge domain. + +This design does NOT cover the case of creating and using sub port as a bridge port to a .1D bridge shown in Fig. 2. 
+ +![](sub_intf_bridge_port.png "Fig. 2: Sub port bridge port (out of scope of this design") +__Fig. 2: Sub port bridge port__ + +# Acronyms +| Acronym | Description | +|--------------------------|--------------------------------------------| +| VRF | Virtual routing and forwarding | +| RIF | Router interface | + +# 1 Requirements + +Manage the life cycle of a sub port interface created on a physical port or a port channel and used as a router interface to a VRF: +* Creation with the specified dot1q vlan id encapsulation +* Runtime admin status change +* Removal + +A sub port interface shall support the following features: +* L3 forwarding (both unicast and multicast) +* BGP +* ARP and NDP +* VRF +* RIF counters +* QoS setting inherited from parent physical port or port channel +* mtu inherited from parent physical port or port channel +* Per sub port interface admin status config + +# 2 Schema design + +We introduce a new table "VLAN_SUB_INTERFACE" in the CONFIG_DB to host the attributes of a sub port interface. +For APPL_DB and STATE_DB, we do not introduce new tables for sub port interfaces, but reuse existing tables to host sub port interface keys. + +## 2.1 Configuration +### 2.1.1 config_db.json +``` +"VLAN_SUB_INTERFACE": { + "{{ port_name }}.{{ vlan_id }}": { + "admin_status" : "{{ adminstatus }}" + }, + "{{ port_name }}.{{ vlan_id }}|{{ ip_prefix }}": {} +}, +``` +A key in the VLAN_SUB_INTERFACE table is the name of a sub port, which consists of two sections delimited by a "." (symbol dot). +The section before the dot is the name of the parent physical port or port channel. The section after the dot is the dot1q encapsulation vlan id. + +mtu of a sub port interface is inherited from its parent physical port or port channel, and is not configurable in the current design. + +admin_status of a sub port interface can be either up or down. +In the case field "admin_status" is absent in the config_db.json file, a sub port interface is set admin status up by default at its creation. + +Example configuration: +``` +"VLAN_SUB_INTERFACE": { + "Ethernet64.10": { + "admin_status" : "up" + }, + "Ethernet64.10|192.168.0.1/21": {}, + "Ethernet64.10|fc00::/7": {} +}, +``` + +### 2.1.2 CONFIG_DB +``` +VLAN_SUB_INTERFACE|{{ port_name }}.{{ vlan_id }} + "admin_status" : "{{ adminstatus }}" + +VLAN_SUB_INTERFACE|{{ port_name }}.{{ vlan_id }}|{{ ip_prefix }} + "NULL" : "NULL" +``` + +### 2.1.3 CONFIG_DB schemas +``` +; Defines for sub port interface configuration attributes +key = VLAN_SUB_INTERFACE|subif_name ; subif_name is the name of the sub port interface + +; subif_name annotations +subif_name = port_name "." vlan_id ; port_name is the name of parent physical port or port channel + ; vlanid is DIGIT 1-4094 + +; field = value +admin_status = up / down ; admin status of the sub port interface +``` + +``` +; Defines for sub port interface configuration attributes +key = VLAN_SUB_INTERFACE|subif_name|IPprefix ; an instance of this key will be repeated for each IP prefix + +IPprefix = IPv4prefix / IPv6prefix ; an instance of this key/value pair will be repeated for each IP prefix + +IPv4prefix = dec-octet "." dec-octet "." dec-octet "." 
dec-octet "/" %d1-32 +dec-octet = DIGIT ; 0-9 + / %x31-39 DIGIT ; 10-99 + / "1" 2DIGIT ; 100-199 + / "2" %x30-34 DIGIT ; 200-249 + / "25" %x30-35 ; 250-255 + +IPv6prefix = 6( h16 ":" ) ls32 + / "::" 5( h16 ":" ) ls32 + / [ h16 ] "::" 4( h16 ":" ) ls32 + / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 + / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 + / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 + / [ *4( h16 ":" ) h16 ] "::" ls32 + / [ *5( h16 ":" ) h16 ] "::" h16 + / [ *6( h16 ":" ) h16 ] "::" +h16 = 1*4HEXDIG +ls32 = ( h16 ":" h16 ) / IPv4address +``` + +Example: +``` +VLAN_SUB_INTERFACE|Ethernet64.10 + "admin_status" : "up" + +VLAN_SUB_INTERFACE|Ethernet64.10|192.168.0.1/21 + "NULL" : "NULL" + +VLAN_SUB_INTERFACE|Ethernet64.10|fc00::/7 + "NULL" : "NULL" +``` + +## 2.2 APPL_DB +``` +INTF_TABLE:{{ port_name }}.{{ vlan_id }} + "admin_status" : "{{ adminstatus }}" + +; field = value +admin_status = up / down ; admin status of the sub port interface + + +INTF_TABLE:{{ port_name }}.{{ vlan_id }}:{{ ip_prefix }} + "scope" : "{{ visibility_scope }}" + "family": "{{ address_family }}" + +; field = value +scope = global / local ; local is an interface visible on this localhost only +family = IPv4 / IPv6 ; address family +``` + +Example: +``` +INTF_TABLE:Ethernet64.10 + "admin_status" : "up" + +INTF_TABLE:Ethernet64.10:192.168.0.1/24 + "scope" : "global" + "family": "IPv4" + +INTF_TABLE:Ethernet64.10:fc00::/7 + "scope" : "global" + "family": "IPv6" +``` + +## 2.3 STATE_DB + +Following the current schema, sub port interface state of a physical port is set to the PORT_TABLE, while sub port interface state of a port channel is set to the LAG_TABLE. +``` +PORT_TABLE|{{ port_name }}.{{ vlan_id }} + "state" : "ok" +``` +``` +LAG_TABLE|{{ port_name }}.{{ vlan_id }} + "state" : "ok" +``` +``` +INTERFACE_TABLE|{{ port_name }}.{{ vlan_id }}|{{ ip_prefix }} + "state" : "ok" +``` + +Example: +``` +PORT_TABLE|Ethernet64.10 + "state" : "ok" +``` +``` +INTERFACE_TABLE|Ethernet64.10|192.168.0.1/21 + "state" : "ok" +``` +``` +INTERFACE_TABLE|Ethernet64.10|fc00::/7 + "state" : "ok" +``` + +## 2.4 SAI +SAI attributes related to a sub port interface are listed in the Table below. 
+ +| SAI attributes | attribute value/type | +|--------------------------------------------------|----------------------------------------------| +| SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID | VRF oid | +| SAI_ROUTER_INTERFACE_ATTR_TYPE | SAI_ROUTER_INTERFACE_TYPE_SUB_PORT | +| SAI_ROUTER_INTERFACE_ATTR_PORT_ID | parent physical port or port channel oid | +| SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID | VLAN id (sai_uint16_t) | +| SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS | MAC address | +| SAI_ROUTER_INTERFACE_ATTR_MTU | mtu size | + +### 2.4.1 Create a sub port interface +``` +sai_attribute_t sub_intf_attrs[6]; + +sub_intf_attrs[0].id = SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID; +sub_intf_attrs[0].value.oid = vrf_oid; + +sub_intf_attrs[1].id = SAI_ROUTER_INTERFACE_ATTR_TYPE; +sub_intf_attrs[1].value.s32 = SAI_ROUTER_INTERFACE_TYPE_SUB_PORT; + +sub_intf_attrs[2].id = SAI_ROUTER_INTERFACE_ATTR_PORT_ID; +sub_intf_attrs[2].value.oid = parent_port_oid; /* oid of the parent physical port or port channel */ + +sub_intf_attrs[3].id = SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID; +sub_intf_attrs[3].value.u16 = 10; + +sai_mac_t mac = {0x00, 0xe0, 0xec, 0xc2, 0xad, 0xf1}; +sub_intf_attrs[4].id = SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS; +memcpy(sub_intf_attrs[4].value.mac, mac, sizeof(sai_mac_t)); + +sub_intf_attrs[5].id = SAI_ROUTER_INTERFACE_ATTR_MTU; +sub_intf_attrs[5].value.u32 = 9100; + +uint32_t sub_intf_attrs_count = 6; +sai_status_t status = create_router_interface(&rif_id, switch_oid, sub_intf_attrs_count, sub_intf_attrs); +``` + +### 2.4.2 Runtime change on sub port interface attributes +#### 2.4.2.1 SAI-supported attributes +The following attributes are supported in SAI spec to be changed at run time. + +| SAI attributes | attribute value/type | +|-------------------------------------------------------|----------------------------------------------| +| SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE | true / false | +| SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE | true / false | +| SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS | MAC address | +| SAI_ROUTER_INTERFACE_ATTR_MTU | mtu size | +| SAI_ROUTER_INTERFACE_ATTR_INGRESS_ACL | ACL table or ACL table group oid | +| SAI_ROUTER_INTERFACE_ATTR_EGRESS_ACL | ACL table or ACL table group oid | +| SAI_ROUTER_INTERFACE_ATTR_V4_MCAST_ENABLE | true / false | +| SAI_ROUTER_INTERFACE_ATTR_V6_MCAST_ENABLE | true / false | +| SAI_ROUTER_INTERFACE_ATTR_NEIGHBOR_MISS_PACKET_ACTION | sai_packet_action_t | +| SAI_ROUTER_INTERFACE_ATTR_LOOPBACK_PACKET_ACTION | sai_packet_action_t | + + +#### 2.4.2.2 Set sub port interface admin status +``` +sai_attribute_t sub_intf_attr; + +sub_intf_attr.id = SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE; +sub_intf_attr.value.booldata = false; +sai_status_t status = set_router_interface_attribute(rif_id, &attr); + +sub_intf_attr.id = SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE; +sub_intf_attr.value.booldata = false; +sai_status_t status = set_router_interface_attribute(rif_id, &attr); +``` + +### 2.4.3 Remove a sub port interface +``` +sai_status_t status = remove_router_interface(rif_id); +``` + +## 2.5 Linux integration +### 2.5.1 Host sub port interfaces + +Inside SONiC, we use iproute2 package to manage host sub port interfaces. +Specifically, we use `ip link add link name type vlan id ` to create a host sub port interface. +This command implies the dependancy that a parent host interface must be created before the creation of a host sub port interface. 
+ +Example: +``` +ip link add link Ethernet64 name Ethernet64.10 type vlan id 10 +ip link set Ethernet64.10 mtu 9100 +ip link set Ethernet64.10 up +``` +``` +ip link del Ethernet64.10 +``` + +We use `ip address` and `ip -6 address` to add and remove ip adresses on a host sub port interface. + +Example: +``` +ip address add 192.168.0.1/24 dev Ethernet64.10 +``` + +Please note that the use of iproute2 package is internal to SONiC, specifically IntfMgrd. +Users should always use SONiC CLIs defined in Section 4 to manage sub port interfaces in order to have an event flowing properly across all components and DBs shown in Section 3. +The direct use of iproute2 commands, e.g., `ip link` and `ip address`, by users are not recommended. + +### 2.5.2 Route, neighbor subsystems + +Once the host sub port interfaces are properly set, route and neighbor subsystems should function properly on sub port interfaces. +fpmsyncd should receive route add/del updates on sub port interfaces from zebra over TCP socket port 2620. These updates are received in the format of netlink messages. +neighsyncd should receive neigh add/del netlink messages on sub port interfaces from its subscription to the neighbor event notification group RTNLGRP_NEIGH. + +Internally, a sub port interface is represented as a Port object to be perceived seamlessly by NeighOrch and RouteOrch to create neighor and route entries, respectively, on it. + + +# 3 Event flow diagrams +## 3.1 Sub port interface creation +![](sub_intf_creation_flow.png) + +## 3.2 Sub port interface runtime admin status change +![](sub_intf_set_admin_status_flow.png) + +## 3.3 Sub port interface removal +![](sub_intf_removal_flow.png) + +# 4 CLIs +## 4.1 Config commands +### 4.1.1 Config a sub port interface +`subinterface` command category is introduced to the `config` command. + +``` +Usage: config [OPTIONS] COMMAND [ARGS]... + + SONiC command line - 'config' command + +Options: + --help Show this message and exit. + +Commands: + ... + subinterface Sub-port-interface-related configuration tasks +``` + +`add` and `del` commands are supported on a sub port interface. +``` +Usage: config subinterface [OPTIONS] COMMAND [ARGS]... + + Sub-port-interface-related configuration tasks + +Options: + --help Show this message and exit. + +Commands: + add Add a sub port interface + del Remove a sub port interface +``` +``` +Usage: config subinterface add +``` +``` +Usage: config subinterface del +``` + +### 4.1.2 Config IP address on a sub port interface +Once a sub port interface is added, existing `config interface ip` is used to `add` or `del` ip address on it. +``` +Usage: config interface ip [OPTIONS] COMMAND [ARGS]... + + Add or remove IP address + +Options: + --help Show this message and exit. + +Commands: + add Add an IP address towards the interface + del Remove an IP address from the interface +``` +``` +Usage: config interface ip add +``` +``` +Usage: config interface ip del +``` + +### 4.1.3 Change admin status on a sub port interface +Current `config interface startup` and `shutdown` commands are extended to sub port interfaces to set admin status up and down, respectively, on a sub port interface. +``` +Usage: config interface startup [OPTIONS] + + Start up interface + +Options: + --help Show this message and exit. +``` +``` +Usage: config interface shutdown [OPTIONS] + + Shut down interface + +Options: + --help Show this message and exit. 
+``` +``` +Usage: config interface startup +``` +``` +Usage: config interface shutdown +``` + +## 4.2 Show commands +``` +Usage: show subinterfaces [OPTIONS] COMMAND [ARGS]... + + Show details of the sub port interfaces + +Options: + -?, -h, --help Show this message and exit. + +Commands: + status Show sub port interface status information +``` +Example: +``` +Sub port interface Speed MTU Vlan Admin Type +------------------ ------- ----- ------ ------- ------------------- + Ethernet64.10 100G 9100 10 up dot1q-encapsulation +``` +No operational status is defined on RIF (sub port interface being a type of RIF) in SAI spec. + +# 5 Warm reboot support +There is no special runtime state that needs to be kept for sub port interfaces. +This said, current warm reboot infrastructure shall support sub port interfaces naturally without the need for additional extension. +This is confirmed by preliminary trials on a Mellanox device. + +# 6 Unit test +## 6.1 Sub port interface creation +Test shall cover the parent interface being a physical port or a port channel. + +### 6.1.1 Create a sub port interface +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that sub port interface configuration is pushed to CONIFG_DB VLAN_SUB_INTERFACE table | +| Verify that sub port interface configuration is synced to APPL_DB INTF_TABLE by Intfmgrd | +| Verify that sub port interface state ok is pushed to STATE_DB by Intfmgrd | +| Verify that a sub port router interface entry is created in ASIC_DB | + +### 6.1.2 Add an IP address to a sub port interface +Test shall cover the IP address being an IPv4 address or an IPv6 address. + +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that ip address configuration is pushed to CONIFG_DB VLAN_SUB_INTERFACE table | +| Verify that ip address configuration is synced to APPL_DB INTF_TABLE by Intfmgrd | +| Verify that ip address state ok is pushed to STATE_DB INTERFACE_TABLE by Intfmgrd | +| Verify that a subnet route entry is created in ASIC_DB | +| Verify that an ip2me route entry is created in ASIC_DB | + +## 6.2 Sub port interface admin status change +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that sub port interface admin status change is pushed to CONIFG_DB VLAN_SUB_INTERFACE table | +| Verify that sub port interface admin status change is synced to APPL_DB INTF_TABLE by Intfmgrd | +| Verify that sub port router interface entry in ASIC_DB has the updated admin status | + +## 6.3 Sub port interface removal +### 6.3.1 Remove an IP address from a sub port interface +Test shall cover the IP address being an IPv4 address or an IPv6 address. 
+ +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that ip address configuration is removed from CONIFG_DB VLAN_SUB_INTERFACE table | +| Verify that ip address configuration is removed from APPL_DB INTF_TABLE by Intfmgrd | +| Verify that ip address state ok is removed from STATE_DB INTERFACE_TABLE by Intfmgrd | +| Verify that subnet route entry is removed from ASIC_DB | +| Verify that ip2me route entry is removed from ASIC_DB | + +### 6.3.2 Remove all IP addresses from a sub port interface +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that sub port router interface entry is removed from ASIC_DB | + +### 6.3.3 Remove a sub port interface +Test shall cover the parent interface being a physical port or a port channel. + +| Test case description | +|--------------------------------------------------------------------------------------------------------| +| Verify that sub port interface configuration is removed from CONIFG_DB VLAN_SUB_INTERFACE table | +| Verify that sub port interface configuration is removed from APPL_DB INTF_TABLE by Intfmgrd | +| Verify that sub port interface state ok is removed from STATE_DB by Intfmgrd | + +# 7 Scalability +Scalability is ASIC-dependent. +We enforce a minimum scalability requirement on the number of sub port interfaces that shall be supported on a SONiC switch. + +| Name | Scaling value | +|-------------------------------------------------------------------|---------------------------| +| Number of sub port interfaces per physical port or port channel | 250 | +| Number of sub port interfaces per switch | 750 | + +# 8 Port channel renaming +Linux has the limitation of 15 characters on an interface name. +For sub port interface use cases on port channels, we need to redesign the current naming convention for port channels (PortChannelXXXX, 15 characters) to take shorter names (such as, PoXXXX, 6 characters). +Even when the parent port is a physical port, sub port interface use cases, such as Ethernet128.1024, still exceed the 15-character limit on an interface name. + +# 9 Appendix +## 9.1 Difference between a sub port interface and a vlan interface +Sub port interface is a router interface (RIF type sub port Vlan#) between a VRF and a physical port or a port channel. +Vlan interface is a router interface (RIF type vlan Vlan#) facing a .1Q bridge. It is an interface between a bridge port type router (connecting to a .1Q bridge) and a VRF, as shown in Fig. 3. + +![](vlan_intf_rif.png "Fig. 3: Vlan interface") +__Fig. 3: Vlan interface__ + +# 10 Open questions: +1. Miss policy to be defined in SAI specification + + When a 802.1q tagged packet is received on a physical port or a port channel, it will go to the sub port interface that matches the VLAN id inside the packet. + If no sub port interfaces match the VLAN id in the packet tag, what is the default policy on handling the packet? + + As shown in Fig. 1, there is possiblity that a physical port or a port channel may not have a RIF type port created. + In this case, if an untagged packet is received on the physical port or port channel, what is the policy on handling the untagged packet? 
+ +# 11 Acknowledgment +Wenda would like to thank his colleagues with Microsoft SONiC team, Shuotian, Prince, Pavel, and Qi in particular, Itai with Mellanox for all discussions that shape the design proposal, and community members for comments and feedbacks that improve the design. + +# 12 References +[1] SAI_Proposal_Bridge_port_v0.9.docx https://github.com/opencomputeproject/SAI/blob/master/doc/bridge/SAI_Proposal_Bridge_port_v0.9.docx + +[2] Remove the need to create an object id for vlan in creating a sub port router interface https://github.com/opencomputeproject/SAI/pull/998 + +[3] Sub port interface schema https://github.com/Azure/sonic-swss-common/pull/284 + +[4] Sub port interface implementation https://github.com/Azure/sonic-swss/pull/969 + +[5] Use dot1p in packet 802.1q tag to map a packet to traffic class (TC) inside a switch pipeline https://github.com/Azure/sonic-swss/pull/871; https://github.com/Azure/sonic-buildimage/pull/3412; https://github.com/Azure/sonic-buildimage/pull/3422 + +[6] Generate a CONFIG_DB with sub port interface config from minigraph https://github.com/Azure/sonic-buildimage/pull/3413 + +[7] CLI to support sub port interface admin status change https://github.com/Azure/sonic-utilities/pull/638 + +[8] CLI to show subinterfaces status https://github.com/Azure/sonic-utilities/pull/642 + +[8] CLI to add/del ip address on a sub port interface https://github.com/Azure/sonic-utilities/pull/651 diff --git a/doc/subport/sub_intf_bridge_port.png b/doc/subport/sub_intf_bridge_port.png new file mode 100644 index 0000000000..c2b1381f45 Binary files /dev/null and b/doc/subport/sub_intf_bridge_port.png differ diff --git a/doc/subport/sub_intf_creation_flow.png b/doc/subport/sub_intf_creation_flow.png new file mode 100644 index 0000000000..9de3fc9c50 Binary files /dev/null and b/doc/subport/sub_intf_creation_flow.png differ diff --git a/doc/subport/sub_intf_removal_flow.png b/doc/subport/sub_intf_removal_flow.png new file mode 100644 index 0000000000..606a877539 Binary files /dev/null and b/doc/subport/sub_intf_removal_flow.png differ diff --git a/doc/subport/sub_intf_rif.png b/doc/subport/sub_intf_rif.png new file mode 100644 index 0000000000..b127f939b6 Binary files /dev/null and b/doc/subport/sub_intf_rif.png differ diff --git a/doc/subport/sub_intf_set_admin_status_flow.png b/doc/subport/sub_intf_set_admin_status_flow.png new file mode 100644 index 0000000000..bd835251af Binary files /dev/null and b/doc/subport/sub_intf_set_admin_status_flow.png differ diff --git a/doc/subport/sub_intf_set_mtu_flow.png b/doc/subport/sub_intf_set_mtu_flow.png new file mode 100644 index 0000000000..80c3e1a095 Binary files /dev/null and b/doc/subport/sub_intf_set_mtu_flow.png differ diff --git a/doc/subport/vlan_intf_rif.png b/doc/subport/vlan_intf_rif.png new file mode 100644 index 0000000000..8b1cb91633 Binary files /dev/null and b/doc/subport/vlan_intf_rif.png differ diff --git a/doc/system-telemetry/process-docker-stats.md b/doc/system-telemetry/process-docker-stats.md new file mode 100644 index 0000000000..968d8d3d81 --- /dev/null +++ b/doc/system-telemetry/process-docker-stats.md @@ -0,0 +1,108 @@ +# Process and docker stats availability via telemetry agent + +## Revision + +| Rev | Date | Author | Change Description | +|:---:|:--------:|:-----------:|--------------------| +| 0.1 | 09/12/19 | Pradnya Mohite | Initial version | + +## Scope +-Enable sonic streaming telemetry agent to send Process and docker stats data + +### Enable sonic streaming telemetry agent to send Process 
and docker stats data + +##### Part 1 +For 1st part, Daemon code will be added under sonic-buildimage/files/image_config. A Daemon will start when OS starts. At every 2 min interval it will do following: +Delete all entries for Process and Docker stats from state db +Update Process and Docker stats data to state-DB. +Update last update time for process and Docker stats. + +Details of CLI and state-DB given below. + +##### Part 2 +Verify that from state-DB data is available via telemetry agent + +##### CLI output and corresponding structure in state-DB for process and docker stats + +###### Process stats + +``` +$ ps -eo uid,pid,ppid,%mem,%cpu,stime,tty,time,cmd + + UID PID PPID %MEM %CPU STIME TT TIME CMD + 0 1 0 0.1 0.0 Oct15 ? 00:00:09 /sbin/init + 0 2 0 0.0 0.0 Oct15 ? 00:00:00 [kthreadd] + 0 3 2 0.0 0.0 Oct15 ? 00:00:01 [ksoftirqd/0] + 0 5 2 0.0 0.0 Oct15 ? 00:00:00 [kworker/0:0H] + +``` +above output will be stored inside state-DB as follows for largest 1024 CPU consumption processes: + +``` +PROCESS_STATS|4276 +"UID" +"0" +"PID" +"1" +"PPID" +"0" +"CPU%" +"0.0" +"MEM%" +"0.1" +"TTY" +"?" +"STIME" +"Oct15" +"TIME" +"00:00:09" +"CMD" +"/sbin/init" + +``` +Along with data new entry for timestamp will be updated in state_db: + +``` +PROCESS_STATS|LastUpdateTime +``` + +###### Docker stats + +``` +$ docker stats --no-stream -a + +CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS +209c6e6116c6 syncd 10.82% 286MiB / 3.844GiB 7.26% 0B / 0B 0B / 639kB 32 +8a97fafdbd60 dhcp_relay 0.02% 12.15MiB / 3.844GiB 0.31% 0B / 0B 0B / 65.5kB 5 + +``` +above output will be stored inside state-DB as follows: + +``` +DOCKER_STATS|209c6e6116c6   +"NAME" +"syncd" +"CPU%" +"10.82" +"MEM_BYTES" +"299892736" +"MEM_LIMIT_BYTES" +"4127463571" +"MEM%" +"7.26" +"NET_IN_BYTES" +"0" +"NET_OUT_BYTES" +"0" +"BLOCK_IN_BYTES" +"0" +"BLOCK_OUT_BYTES" +"639000" +"PIDS" +"32" +``` +Along with data new entry for timestamp will be updated in state_db: + +``` +DOCKER_STATS|LastUpdateTime +``` \ No newline at end of file diff --git a/doc/threshold/SONiC Threshold feature spec.md b/doc/threshold/SONiC Threshold feature spec.md new file mode 100644 index 0000000000..f9bc43455e --- /dev/null +++ b/doc/threshold/SONiC Threshold feature spec.md @@ -0,0 +1,585 @@ +# Feature Name +Threshold feature. 
+ +# High Level Design Document +#### Rev 0.1 + +# Table of Contents + * [List of Tables](#list-of-tables) + * [Revision](#revision) + * [About This Manual](#about-this-manual) + * [Scope](#scope) + * [Definition/Abbreviation](#definition-abbreviation) + * [1 Feature Overview](#1-feature-overview) + * [1.1 Requirements](#1_1-requirements) + * [1.1.1 Functional Requirements](#111-functional-requirements) + * [1.1.2 Configuration and Management Requirements](#112-configuration-and-management-requirements) + * [1.1.3 Scalability Requirements](#113-scalability-requirements) + * [1.2 Design Overview](#12-design-overview) + * [1.2.1 Basic Approach](#121-basic-approach) + * [1.2.2 Container](#122-container) + * [1.2.3 SAI Overview](#123-sai-overview) + * [2 Functionality](#2-functionality) + * [2.1 Target Deployment Use Cases](#21-target-deployment-use-cases) + * [2.2 Functional Description](#22-functional-description) + * [3 Design](#3-design) + * [3.1 Overview](#31-overview) + * [3.1.1 ThresholdMgr](#311-thresholdmgr) + * [3.2 DB Changes](#32_db-changes) + * [3.2.1 CONFIG DB](#321-config-db) + * [3.2.2 APP DB](#322-app-db) + * [3.2.3 STATE DB](#323-state-db) + * [3.2.4 ASIC DB](#324-asic-db) + * [3.2.5 COUNTERS DB](#325-counters-db) + * [3.3 Switch State Service Design](#33-switch-state-service-design) + * [3.3.1 Orchestration Agent](#331-orchestration-agent) + * [3.3.2 Other Process](#332-other-process) + * [3.4 Syncd](#34-syncd) + * [3.5 SAI](#35-sai) + * [3.6 CLI](#36-cli) + * [3.6.1 Data Models](#361-data-models) + * [3.6.2 Configuration Commands](#362-configuration-commands) + * [3.6.3 Show Commands](#363-show-commands) + * [3.6.4 Clear Commands](#364-clear-commands) + * [3.6.5 Debug Commands](#365-debug-commands) + * [3.6.6 REST API Support](#366-rest-api-support) + * [4 Flow Diagrams](#4-flow-diagrams) + * [4.1 Config Call Flow](#41-config-call-flow) + * [4.2 Breach Event Call Flow](#42-breach-event-call-flow) + * [5 Error Handling](#5-error-handling) + * [6 Serviceability And Debug](#6-serviceability-and-debug) + * [7 Warm Boot Support](#7-warm-boot-support) + * [8 Scalability](#8-scalability) + * [9 Unit Test](#9-unit-test) + +# List of Tables +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision +| Rev | Date | Author | Change Description | +|:---:|:-----------:|:------------------:|-----------------------------------| +| 0.1 | 06/12/2019 | Shirisha Dasari | Initial version | + +# About this Manual +This document provides general information about the threshold feature implementation in SONiC. + +# Scope +This document describes the high level design of threshold feature. + +# Definition/Abbreviation +### Table 1: Abbreviations +| **Term** | **Meaning** | +|--------------------------|-------------------------------------| +| SAI | Switch abstraction interface | +| TAM | Telemetry and monitoring | +| Watermark | Peak counter value reached | +| ThresholdMgr | Threshold manager | + +# 1 Feature Overview +The threshold feature allows configuration of a threshold on supported buffers in ingress and egress. A threshold breach notification (entry update in COUNTERS\_DB) is generated on breach of the configured buffer threshold in ASIC. + +## 1.1 Requirements +### 1.1.1 Functional Requirements + +1.0 Threshold feature allows the user to configure a threshold on a buffer (i.e buffer usage) and record all threshold breach events for that buffer. + +1.1 On breach of the configured threshold in ASIC, a breach event entry is added to COUNTERS\_DB. 
User can list out the breach event data through a CLI command. In addition, the breach entries in COUNTERS\_DB are also available to SONiC telemetry feature as well. + +1.2 The threshold feature can be used to track and be notified of congestion events in the network real-time. + +1.3 Monitoring of buffer usage can be achieved via the watermark feature available in SONiC. + +1.4 Along with the watermark feature, the threshold feature provides data that can be used to design applications to monitor MMU buffer usage. + +2.0.0 The threshold for a buffer must be configured in terms of percentage of buffer allocation. + +2.0.1 The configured threshold can be cleared at any point by the user. + +2.0.2 The supported counters for threshold feature are listed below. + +2.0.2.1 SAI\_INGRESS\_PRIORITY\_GROUP\_STAT\_SHARED\_WATERMARK\_BYTES -- Current ingress priority-group shared buffer usage in bytes. + +2.0.2.2 SAI\_INGRESS\_PRIORITY\_GROUP\_STAT\_XOFF\_ROOM\_WATERMARK\_BYTES -- Current ingress priority-group headroom buffer usage in bytes. + +2.0.2.3 SAI\_QUEUE\_STAT\_SHARED\_WATERMARK\_BYTES -- Current egress queue buffer usage in bytes. + +2.0.3 The counters listed in 2.0.2.1, 2.0.2.2 and 2.0.2.3 are to be supported for all ports on SONiC. + +2.0.4 The threshold can be configured on one or more of the 8 priority-groups of a port. + +2.0.5 The threshold can be configured on one or more of the unicast/multicast queues of a port. + +2.0.6 On breach of the configured threshold, the breach information is written into THRESHOLD\_BREACH\_TABLE of COUNTERS\_DB. + +2.0.6.1 The breach entry must indicate the counter on which the threshold breached. + +2.0.6.2 The breach entry must indicate the port associated with the counter on which the breach occurred. + +2.0.6.3 The breach entry must indicate the counter index of the port (ref 2.0.4, 2.0.5) on which the breach occurred i.e the priority-group index for 2.0.2.1, 2.0.2.2 and queue index and type for 2.0.2.3. + +2.0.6.4 The breach entry must contain the buffer usage in terms of percentage at the time of breach. + +2.0.6.5 The breach entry must contain the absolute buffer usage in bytes at the time of breach. + +2.0.6.6 The breach entry must contain the time stamp of the breach event. The ASIC provides the time stamp of breach event via the breach event protobuf. + +2.0.7 The threshold feature must be a part of the TAM container along with other TAM features. + +3.0 The threshold feature must be supported on physical ports. LAG member ports will continue to be tracked at the physical port level. + +3.1 The threshold feature does not support dynamic port breakout. + +3.2 In case of static port breakout, the user is expected to reconfigure the broken out ports. + +3.3 CPU port counters for queues are not supported for threshold configuration. + +4.0 The threshold can be configured in percentage mode only. + +4.1 The breach entry in COUNTERS\_DB must show the usage at breach in bytes and percentage. + +5.0 UI commands available to check currently configured thresholds. + +5.1 UI commands available to show threshold breach events. + + +### 1.1.2 Configuration and Management Requirements + +The threshold feature will use the python click framework for CLI. + +### 1.1.3 Scalability Requirements +There is no limit enforced on the threshold configuration and breach event entries. + +## 1.2 Design Overview +### 1.2.1 Basic Approach +The threshold feature is newly developed. + +### 1.2.2 Container +A new container, "tam" is created to host the ThresholdMgr. 
+ +### 1.2.3 SAI Overview +The SAI TAM spec specifies the TAM APIs to be used to configure the threshold functionality. Please refer section 3.5 for more details. + +# 2 Functionality +## 2.1 Target Deployment Use Cases + +The threshold feature can be used as a monitoring tool to detect congestion in the network. The breach information is available in COUNTERS\_DB thereby allowing applications to be built upon data from the threshold and watermark features to manage and monitor networks. + +## 2.2 Functional Description + +In a live network, tracking and monitoring buffer usage on a switch can be extremely useful in detecting and isolating potential and actual congestion events in the network. The availability of telemetry information w.r.t buffer usage on network nodes can help to develop network-level monitoring applications. + +The watermark feature in SONiC makes the buffer usage counters readily available in COUNTERS\_DB for multiple users. The watermarks can be used to track peak buffer occupancy across users and intervals. In addition to this, it may also be useful for the user to know when the buffer usage crosses a certain threshold. The threshold feature implements this specific functionality. + +The threshold feature allows configuration of the threshold value as a percentage of the buffer limits(MMU configuration via buffer profile or otherwise). The SAI APIs take threshold as a percentage and convert this percentage to an absolute value based upon the appropriate buffer limits configured. For example, if the configured buffer limit for a given priority-group of a port (shared) buffer is 5000 bytes, a threshold value of 10% configured by the user translates to an absolute value of 500 bytes. SAI configures the threshold for the buffer as 500 bytes and a threshold breach event gets generated if the buffer usage crosses 500 bytes. Note that the original buffer limit and the absolute threshold value configured is not exposed to the user, the user can simply set a threshold in terms of % and be notified when the threshold is crossed. + +The supported buffers for threshold are: + +1. Ingress port priority group + 1. Shared - Specifies the shared buffer usage on a priority group. + 2. Headroom - Specifies the additional buffer limit that can be utilized when the shared limits are exhausted in the case where flow control is enabled. It accounts for the round trip time of data to be received once flow control is asserted. +2. Egress port queues + 1. Unicast + 2. Multicast + + +# 3 Design +## 3.1 Overview + +![Threshold architecture](images/Threshold_arch.jpg) + +The above diagram illustrates the architecture of the threshold feature within SONiC. The call flow sequence is listed below: + +1. CLI takes threshold configuration and populates it into THRESHOLD\_TABLE of config DB. +2. The ThresholdOrch component of Orchagent in SWSS docker picks up the config DB updates on THRESHOLD\_TABLE and converts the config into SAIREDIS calls. ASIC\_DB is updated by SAIREDIS and SYNCD gets notified of incoming updates. +3. SYNCD invokes the SAI APIs for configuring the threshold. +4. SAI configures the ASIC with threshold configuration. +5. ASIC notifies SAI of a threshold breach notification. +6. SAI collates the breach information into a protobuf message and sends it out to the configured collector (in this case, ThresholdMgr) on a socket. +7. ThresholdMgr parses the threshold breach protobuf message and populates the information into COUNTERS\_DB. +8. 
The data available in COUNTERS\_DB is used by CLI for showing breach information. + +Some salient points are listed below: + +1. ThresholdMgr currently only handles the breach event protobuf processing. It can be extended to support handling of dynamic port breakout configuration etc. in future. +2. ThresholdOrch subscribes to notifications on THRESHOLD\_TABLE of config DB. Since the configuration is in-line with SAI configuration, the orch agent directly picks this config up from config DB. +3. TAM docker currently only hosts ThresholdMgr. It is intended that new TAM applications would be hosted in this container going forward. + +## 3.1.1 ThresholdMgr + +ThresholdMgr running in the TAM container handles the protobuf parsing of the threshold breach report data. SAI is configured by ThresholdOrch to send the threshold breach event notification to a local collector i.e ThresholdMgr. + +The protobuf data received on the socket is decoded and parsed. The data is formatted and written into the THRESHOLD\_BREACH\_TABLE of COUNTERS\_DB. + +Once the dynamic port breakout feature is available, ThresholdMgr needs to be enhanced to subscribe to changes on PORT\_TABLE in APP\_DB and STATE\_DB to handle any breakout configuration. + +## 3.2 DB Changes +### 3.2.1 CONFIG DB + +#### THRESHOLD\_TABLE + + ; New table + ; Defines threshold configuration. + + key = buffer|type|alias|index ; buffer can be "priority-group" or "queue". + ; type (buffer type) can be + ; "shared" + ; "headroom" + ; "unicast" + ; "multicast" + ; alias is unique across all DBs + ; index (buffer index per port) + threshold = 1*3DIGIT ; Threshold in % (1-100) + + Example: + + Queue threshold configuration: + 127.0.0.1:6379[4]> keys *THRESHOLD* + 1) "THRESHOLD_TABLE|queue|unicast|Ethernet32|4" + + 127.0.0.1:6379[4]> HGETALL "THRESHOLD_TABLE|queue|unicast|Ethernet32|4" + 1) "threshold" + 2) "80" + + Priority-group configuration: + 127.0.0.1:6379[4]> keys *THRESHOLD* + 1) "THRESHOLD_TABLE|priority-group|shared|Ethernet40|6" + + 127.0.0.1:6379[4]> hgetall THRESHOLD_TABLE|priority-group|shared|Ethernet40|6 + 1) "threshold" + 2) "20" + + +### 3.2.2 APP DB +N/A +### 3.2.3 STATE DB +N/A +### 3.2.4 ASIC DB + +The ASIC DB is updated by SAI REDIS upon invocation of SAI REDIS APIs by ThresholdOrch. 
+Following is an example of the ASIC\_DB configuration on applying a configuration of 80% threshold on queue 4 of port "Ethernet32" + + 127.0.0.1:6379[1]> keys *TAM* + 1) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_REPORT:oid:0x490000000005b1" + 2) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_TRANSPORT:oid:0x4c0000000005b3" + 3) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_COLLECTOR:oid:0x4e0000000005b4" + 4) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT:oid:0x5000000000063c" + 5) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT_THRESHOLD:oid:0x4a00000000063b" + 6) "ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT_ACTION:oid:0x4f0000000005b2" + 7) "ASIC_STATE:SAI_OBJECT_TYPE_TAM:oid:0x3c00000000063d" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_REPORT:oid:0x490000000005b1 + 1) "SAI_TAM_REPORT_ATTR_TYPE" + 2) "SAI_TAM_REPORT_TYPE_PROTO" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_TRANSPORT:oid:0x4c0000000005b3 + 1) "SAI_TAM_TRANSPORT_ATTR_TRANSPORT_TYPE" + 2) "SAI_TAM_TRANSPORT_TYPE_UDP" + 3) "SAI_TAM_TRANSPORT_ATTR_SRC_PORT" + 4) "7070" + 5) "SAI_TAM_TRANSPORT_ATTR_DST_PORT" + 6) "7071" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_COLLECTOR:oid:0x4e0000000005b4 + 1) "SAI_TAM_COLLECTOR_ATTR_SRC_IP" + 2) "10.10.10.10" + 3) "SAI_TAM_COLLECTOR_ATTR_DST_IP" + 4) "127.0.0.1" + 5) "SAI_TAM_COLLECTOR_ATTR_TRANSPORT" + 6) "oid:0x4c0000000005b3" + 7) "SAI_TAM_COLLECTOR_ATTR_DSCP_VALUE" + 8) "0" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT:oid:0x5000000000063c + 1) "SAI_TAM_EVENT_ATTR_TYPE" + 2) "SAI_TAM_EVENT_TYPE_QUEUE_THRESHOLD" + 3) "SAI_TAM_EVENT_ATTR_ACTION_LIST" + 4) "1:oid:0x4f0000000005b2" + 5) "SAI_TAM_EVENT_ATTR_COLLECTOR_LIST" + 6) "1:oid:0x4e0000000005b4" + 7) "SAI_TAM_EVENT_ATTR_THRESHOLD" + 8) "oid:0x4a00000000063b" + + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT_THRESHOLD:oid:0x4a00000000063b + 1) "SAI_TAM_EVENT_THRESHOLD_ATTR_ABS_VALUE" + 2) "80" + 3) "SAI_TAM_EVENT_THRESHOLD_ATTR_UNIT" + 4) "SAI_TAM_EVENT_THRESHOLD_UNIT_PERCENTAGE" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM_EVENT_ACTION:oid:0x4f0000000005b2 + 1) "SAI_TAM_EVENT_ACTION_ATTR_REPORT_TYPE" + 2) "oid:0x490000000005b1" + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_TAM:oid:0x3c00000000063d + 1) "SAI_TAM_ATTR_EVENT_OBJECTS_LIST" + 2) "1:oid:0x5000000000063c" + 3) "SAI_TAM_ATTR_TAM_BIND_POINT_TYPE_LIST" + 4) "1:SAI_TAM_BIND_POINT_TYPE_QUEUE" + + The following config binds the TAM threshold configuration to the appropriate queue: + + 127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001f8 + 1) "NULL" + 2) "NULL" + 3) "SAI_QUEUE_ATTR_TYPE" + 4) "SAI_QUEUE_TYPE_UNICAST" + 5) "SAI_QUEUE_ATTR_INDEX" + 6) "4" + 7) "SAI_QUEUE_ATTR_TAM_OBJECT" + 8) "oid:0x3c00000000063d" + +### 3.2.5 COUNTERS DB +#### THRESHOLD\_BREACH\_TABLE + + ; New table + ; Defines threshold breach information. + + key = breach-report:index ; Breach report index. + + buffer = 1*255VCHAR ; Buffer - "priority-group" or "queue". + type = 1*255VCHAR ; Buffer type - + ; "shared" + ; "headroom" + ; "unicast" + ; "multicast" + port = 1*64VCHAR ; Port on which breach occurred. Unique across all DBs. + index = 1*2DIGIT ; Priority group/queue index. + breach_value = 1*3DIGIT ; Counter value in % at breach. + counter = 1*64DIGIT ; SAI counter value in bytes at breach. 
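+    ; Note: in the breach entries shown in the examples below, this byte value appears
+    ; under a field named after the SAI stat itself (for example,
+    ; SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES) rather than under a literal "counter" field.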
+ time-stamp = %y-%m-%d - %H:%M:%S ; time-stamp when the threshold breach occurred + + Example: + + Queue breach entry: + + 127.0.0.1:6379[2]> KEYS THRESHOLD* + 1) "THRESHOLD_BREACH_TABLE:breach-report:1" + 127.0.0.1:6379[2]> HGETALL THRESHOLD_BREACH_TABLE:breach-report:1 + 1) "buffer" + 2) "queue" + 3) "type" + 4) "unicast" + 5) "port" + 6) "Ethernet32" + 7) "index" + 8) "4" + 9) "breach_value" + 10) "82" + 11) "SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES" + 12) "8100" + 13) "time-stamp" + 14) "2019-06-14 - 11:29:33" + + Priority-group breach entry: + + 127.0.0.1:6379[2]> KEYS THRESHOLD* + 1) "THRESHOLD_BREACH_TABLE:breach-report:2" + 127.0.0.1:6379[2]> HGETALL THRESHOLD_BREACH_TABLE:breach-report:2 + 1) "buffer" + 2) "priority-group" + 3) "type" + 4) "shared" + 5) "port" + 6) "Ethernet32" + 7) "index" + 8) "7" + 9) "breach_value" + 10) "71" + 11) "SAI_INGRESS_PRIORITY_GROUP_STAT_SHARED_WATERMARK_BYTES" + 12) "8100" + 13) "time-stamp" + 14) "2019-06-14 - 11:29:33" + + +## 3.3 Switch State Service Design +### 3.3.1 Orchestration Agent + +A new orchestration agent class, ThresholdOrch is added to convert the incoming threshold config to ASIC configuration. ThresholdOrch subscribes to the THRESHOLD\_TABLE of CONFIG\_DB and converts the configuration to SAI TAM API call sequence described in section 3.5. + +ThresholdOrch maintains data pertaining to all the currently configured thresholds and the associated TAM object bindings. TAM object bindings are re-used wherever possible. + +### 3.3.2 Other Process +N/A + +## 3.4 SyncD +N/A + +## 3.5 SAI + +The SAI TAM API spec defines all TAM APIs supported in SAI. In addition, buffer, port, queue and switch APIs are used to bind the TAM objects to buffer statistics to be monitored. + +TAM objects used for threshold configuration are listed below: + +1) TAM threshold object - sai_create_tam_event_threshold_fn() +2) TAM report object - sai_create_tam_report_fn() +3) TAM event action object - sai_create_tam_event_action_fn() +4) TAM event object - sai_create_tam_event_fn() +5) TAM object - sai_create_tam_fn() + +The conversion of threshold from percentage to bytes is a SAI-dependent function. For example, in the case of Broadcom SAI, the percentage configuration uses the following principle: + +1) For static buffer limits - configured % is the % of allocated static limit of the buffer. +2) For dynamic buffer limits - configured % is % of total shared pool limit for the buffer i.e service-pool shared limit. + +## 3.6 CLI +### 3.6.1 Data Models +NA +### 3.6.2 Configuration Commands + +1) config priority-group threshold {port_alias} {PGindex} {shared \| headroom} {value}. + +This command is used to configure a threshold for a specific priority-group shared/headroom buffer of a port. The threshold value is provided in %. Valid values are 1-100. + + +2) config queue threshold {port_alias} {queueindex} {unicast \| multicast} {value} + +This command is used to configure a threshold for a specific unicast/multicast queue of a port. The threshold value is provided in %. Valid values are 1-100. + +### 3.6.3 Show Commands +1) show priority-group threshold {shared \| headroom} + +This show command shows the threshold configuration for the shared/headroom priority-group buffers of all ports. + +2) show queue threshold {unicast \| multicast} + +This show command shows the threshold configuration for the unicast/multicast queue buffers of all ports. + +3) show threshold breaches + +This command shows the threshold breaches recorded by the system. 
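+
+As an illustration only (the exact column layout is not mandated by this document), a breach listing built from the THRESHOLD\_BREACH\_TABLE entries described in section 3.2.5 could look as follows; the event-id shown is the breach report index referenced by the clear command:
+
+    Event-id  Buffer          Type     Port        Index  Breach Value(%)  Value(bytes)  Time-stamp
+    --------  --------------  -------  ----------  -----  ---------------  ------------  ---------------------
+    1         queue           unicast  Ethernet32  4      82               8100          2019-06-14 - 11:29:33
+    2         priority-group  shared   Ethernet32  7      71               8100          2019-06-14 - 11:29:33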
+ +### 3.6.4 Clear commands + +1) sonic-clear priority-group threshold {port_alias} {PGindex} {shared \| headroom} + +This command can be used to clear a previously configured threshold on shared/headroom priority-group buffer of a port. + +2) sonic-clear priority-group threshold + +This command can be used to clear all currently configured priority-group thresholds. + +3) sonic-clear queue threshold {port_alias} {queueindex} {unicast \| multicast} + +This command can be used to clear a previously configured threshold on unicast/multicast queue buffer of a port. + +4) sonic-clear queue threshold + +This command can be used to clear all currently configured priority-group thresholds. + +5) sonic-clear threshold breach {eventid} + +This command can be used to clear all/particular threshold breaches recorded by the system. The event-id when provided specifies the breach event index (index is indicated in the output of show threshold breaches). + +### 3.6.5 Debug Commands +Debug commands will be available once the debug framework is available. The debug commands are needed to dump: + +- Threshold entries and TAM object bindings maintained by ThresholdOrch. +- Incoming protobuf data at ThresholdMgr. + +### 3.6.6 REST API Support +N/A + +# 4 Flow Diagrams +## 4.1 Config call flow + +![Threshold config call flow](images/Threshold_call_flow.jpg) + +## 4.2 Breach event call flow + +![Threshold breach event call flow](images/Threshold_breach_call_flow.jpg) + +# 5 Error Handling + +## CLI +- CLI configuration sanity is enforced by the CLI handler and any invalid configuration is rejected. An error is displayed to the user notifying the reason for rejection of the configuration. + +## ThresholdMgr +- Any errors encountered during parsing of protobuf message are logged. + +## ThresholdOrch +- Any error occurring in the orchestration agent is logged appropriately via SWSS logging. +- Errors or failures of SAI APIs are logged by ThresholdOrch. +- On failure of a SAI TAM API in the config sequence of section 3.5, the previously configured steps are rolled back i.e previously created intermediate TAM objects for threshold, event etc are destroyed. + +# 6 Serviceability and Debug +Debug commands specified in section 3.6.5 will be supported once the debug framework is available. + +# 7 Warm Boot Support +Warm boot is supported for the threshold feature. + +# 8 Scalability + +There is no limit enforced on the threshold configuration and breach event entries. + +# 9 Unit Test +## CLI + +1. Verify CLI command to configure shared/headroom priority-group threshold on a port. +2. Verify CLI command to configure unicast/multicast queue threshold on a port. +3. Verify if priority-group shared and priority-group headroom thresholds can be configured on a port. +4. Verify if queue unicast and queue multicast thresholds can be configured on a port. +5. Verify if priority-group and queue threshold configuration can be done on a port. +6. Verify if the same threshold value can be configured on multiple port's priority-group/queue. +7. Verify if configured threshold for priority-group can be cleared using the clear command. +8. Verify if configured threshold for queue can be cleared using the clear command. +9. Verify CLI show command to display all the currently configured thresholds across priority-groups on each port. +10. Verify CLI show command to display all the currently configured thresholds across queues on each port. +11. Verify CLI show command to display threshold breaches. +12. 
+12. Verify CLI clear command to clear a priority-group threshold.
+13. Verify CLI clear command to clear a queue threshold.
+14. Verify CLI clear command to clear threshold breaches.
+
+## ThresholdOrch
+1. Verify if threshold configuration from config DB is received by ThresholdOrch.
+2. Verify that threshold configuration semantics are validated by ThresholdOrch before the data is processed.
+3. Verify if ThresholdOrch checks for an existing configuration with the incoming threshold config parameters and handles it appropriately.
+4. Verify if ThresholdOrch is able to add new threshold configuration via SAI TAM APIs successfully.
+5. Verify if ThresholdOrch is able to delete existing threshold configuration via SAI TAM APIs successfully.
+6. Verify if ThresholdOrch is able to use existing TAM threshold objects for threshold config that uses a previously configured threshold value.
+7. Verify if ThresholdOrch is able to re-use existing TAM objects for threshold config that warrants the re-use of existing TAM objects.
+8. Verify if a change of threshold value on a previous threshold configuration is handled appropriately.
+9. Verify if all the internal tables and data are updated appropriately on addition of a new threshold configuration.
+10. Verify if all the internal tables and data are updated appropriately on deletion of an existing threshold configuration.
+11. Verify if ThresholdOrch logs any errors arising out of SAI API failures.
+12. Verify if ThresholdOrch rolls back config in a clean way if there is a SAI API failure.
+
+## ThresholdMgr
+1. Verify if ThresholdMgr is able to receive protobuf messages from SAI on configuration of a threshold in SAI.
+2. Verify if ThresholdMgr is able to decode and process the incoming protobuf message as a threshold breach event notification.
+3. Verify if ThresholdMgr logs errors encountered while receiving and processing the threshold breach event notification.
+4. Verify if ThresholdMgr can update the COUNTERS\_DB with the threshold breach data.
+5. Verify if ThresholdMgr is able to generate multiple threshold breach entries in COUNTERS\_DB, each with a unique breach report index.
+
+## Functional Tests
+1. Verify if a threshold on a priority-group shared buffer of a port can be configured successfully.
+2. Verify if a threshold breach event gets generated by SAI and an entry is populated in the THRESHOLD\_BREACH table for the priority-group shared buffer on the port once traffic is started.
+3. Verify if a threshold on a priority-group headroom buffer of a port can be configured successfully.
+4. Verify if a threshold breach event gets generated by SAI and an entry is populated in the THRESHOLD\_BREACH table for the priority-group headroom buffer once traffic is started.
+5. Verify if a threshold on a queue unicast buffer of a port can be configured successfully.
+6. Verify if a threshold breach event gets generated by SAI and an entry is populated in the THRESHOLD\_BREACH table for the queue unicast buffer on the port once traffic is started.
+7. Verify if a threshold on a queue multicast buffer of a port can be configured successfully.
+8. Verify if a threshold breach event gets generated by SAI and an entry is populated in the THRESHOLD\_BREACH table for the queue multicast buffer on the port once traffic is started.
+9. Verify that there is no crash encountered at any of the layers with an invalid threshold configuration.
+10. Verify that an invalid configuration is rejected gracefully at the appropriate layers.
+
+## Negative test cases
+1. Verify if CLI throws an error if the threshold value is <1% or >100%.
+2. Verify if CLI returns an error if CLI is unable to write the threshold config to config DB.
+3. Verify if CLI returns an error if CLI is unable to read the threshold config from config DB.
+4. Verify if CLI returns an error if CLI is unable to read threshold breach information from counter DB.
+5. Verify if CLI throws an error on providing an invalid index in the arguments.
+6. Verify if CLI throws an error on invalid keywords, i.e. keywords other than unicast, multicast, shared, headroom and event-id.
+7. Verify if ThresholdOrch logs an error on receipt of an incorrect config DB THRESHOLD\_TABLE entry.
+8. Verify if ThresholdOrch logs an error if it is unable to read from the config THRESHOLD\_TABLE.
+9. Verify if ThresholdOrch logs all errors encountered during processing of the incoming threshold config request.
+10. Verify if ThresholdOrch handles an invalid threshold value in the incoming table entry data.
+11. Verify if ThresholdOrch handles invalid index values for priority-groups in the table key.
+12. Verify if ThresholdOrch handles invalid buffer and buffer_type in incoming table keys.
+13. Verify if ThresholdMgr logs an error message if it is unable to write into the COUNTERS\_DB.
+14. Verify if ThresholdMgr logs an error message if there is an error encountered while creating the socket.
+15. Verify if ThresholdMgr logs an error and drops a protobuf received with an invalid port oid.
+16. Verify if ThresholdMgr logs an error and drops a protobuf received with an invalid pg oid.
+17. Verify if ThresholdMgr logs an error and drops a protobuf received with an invalid queue oid.
+18. Verify if ThresholdMgr logs an error and drops a protobuf received with an invalid buffer stats report.
+19. Verify if ThresholdMgr logs an error if the incoming protobuf cannot be decoded into the breach proto format.
diff --git a/doc/threshold/images/Threshold_arch.jpg b/doc/threshold/images/Threshold_arch.jpg new file mode 100644 index 0000000000..5f89634b04 Binary files /dev/null and b/doc/threshold/images/Threshold_arch.jpg differ diff --git a/doc/threshold/images/Threshold_breach_call_flow.jpg b/doc/threshold/images/Threshold_breach_call_flow.jpg new file mode 100644 index 0000000000..88ee8c3f64 Binary files /dev/null and b/doc/threshold/images/Threshold_breach_call_flow.jpg differ diff --git a/doc/threshold/images/Threshold_call_flow.jpg b/doc/threshold/images/Threshold_call_flow.jpg new file mode 100644 index 0000000000..4235c0ede1 Binary files /dev/null and b/doc/threshold/images/Threshold_call_flow.jpg differ diff --git a/doc/sonic-vrf-hld-v1.0.md b/doc/vrf/sonic-vrf-hld.md similarity index 61% rename from doc/sonic-vrf-hld-v1.0.md rename to doc/vrf/sonic-vrf-hld.md index daa7a7db1f..fbfe077dbc 100644 --- a/doc/sonic-vrf-hld-v1.0.md +++ b/doc/vrf/sonic-vrf-hld.md @@ -3,20 +3,30 @@ Table of Contents -- [SONiC VRF support design spec draft](#sonic-vrf-support-design-spec-draft) - - [Document History](#document-history) - - [Abbreviations](#abbreviations) - - [VRF feature Requirement](#vrf-feature-requirement) - - [Dependencies](#dependencies) - - [SONiC system diagram for VRF](#sonic-system-diagram-for-vrf) - - [The schema changes](#the-schema-changes) - - [Add VRF related configuration in config_db.json](#add-vrf-related-configuration-in-configdbjson) - - [Change redirect syntax in acl_rule_table of configdb](#change-redirect-syntax-in-aclruletable-of-configdb) - - [Add a VRF_TABLE in APP_DB](#add-a-vrftable-in-appdb) - - [Add 2-segment key entry support in APP-intf-table](#add-2-segment-key-entry-support-in-app-intf-table) - - [Add VRF key to app-route-table key list](#add-vrf-key-to-app-route-table-key-list) - - [Event flow diagram](#event-flow-diagram) - - [Agent changes](#agent-changes) +- [SONiC VRF support design spec draft](#SONiC-VRF-support-design-spec-draft) + - [Document History](#Document-History) + - [Abbreviations](#Abbreviations) + - [VRF feature Requirement](#VRF-feature-Requirement) + - [Functionality](#Functionality) + - [Target Deployment Use Cases](#Target-Deployment-Use-Cases) + - [Functional Description](#Functional-Description) + - [VRF route leak support](#VRF-route-leak-support) + - [Dependencies](#Dependencies) + - [SONiC system diagram for VRF](#SONiC-system-diagram-for-VRF) + - [The schema changes](#The-schema-changes) + - [Add VRF related configuration in config_db.json](#Add-VRF-related-configuration-in-configdbjson) + - [Add vrf-table in config_db](#Add-vrf-table-in-configdb) + - [Add vrf-binding information in config_db.json file](#Add-vrf-binding-information-in-configdbjson-file) + - [Add vrf information in the BGP configuration of config_db.json file](#Add-vrf-information-in-the-BGP-configuration-of-configdbjson-file) + - [Add static route information in config_db.json file](#Add-static-route-information-in-configdbjson-file) + - [Change redirect syntax in acl_rule_table of config_db.json](#Change-redirect-syntax-in-aclruletable-of-configdbjson) + - [Add a VRF_TABLE in APP_DB](#Add-a-VRFTABLE-in-APPDB) + - [Add 2-segment key entry support in APP-intf-table](#Add-2-segment-key-entry-support-in-APP-intf-table) + - [Add VRF key to app-route-table key list](#Add-VRF-key-to-app-route-table-key-list) + - [Event flow diagram](#Event-flow-diagram) + - [Modules changes](#Modules-changes) + - [Frr template changes](#Frr-template-changes) + - [loopback interface 
consideration](#loopback-interface-consideration) - [vrfmgrd changes](#vrfmgrd-changes) - [intfsmgrd changes](#intfsmgrd-changes) - [nbrmgrd changes](#nbrmgrd-changes) @@ -27,17 +37,21 @@ Table of Contents - [neighorch changes](#neighorch-changes) - [aclorch changes](#aclorch-changes) - [warm-reboot consideration](#warm-reboot-consideration) - - [TODO](#todo) - - [CLI](#cli) - - [user scenarios](#user-scenarios) - - [Configure ip address without vrf feature](#configure-ip-address-without-vrf-feature) - - [Add VRF and bind/unbind interfaces to this VRF](#add-vrf-and-bindunbind-interfaces-to-this-vrf) - - [Delete vrf](#delete-vrf) - - [Impact to other service after import VRF feature](#impact-to-other-service-after-import-vrf-feature) - - [Progress](#progress) - - [Test plan](#test-plan) - - [Appendix - An alternative proposal](#appendix---an-alternative-proposal) - - [vrf as key](#vrf-as-key) + - [TODO](#TODO) + - [CLI](#CLI) + - [Other Linux utilities](#Other-Linux-utilities) + - [User scenarios](#User-scenarios) + - [Configure ip address without vrf feature](#Configure-ip-address-without-vrf-feature) + - [Add VRF and bind/unbind interfaces to this VRF](#Add-VRF-and-bindunbind-interfaces-to-this-VRF) + - [Delete vrf](#Delete-vrf) + - [Create a Loopback interface and bind it to a VRF](#Create-a-Loopback-interface-and-bind-it-to-a-VRF) + - [Add IP address on Loopback interface](#Add-IP-address-on-Loopback-interface) + - [Delete Loopback interface](#Delete-Loopback-interface) + - [Static leak route configuration](#Static-leak-route-configuration) + - [Impact to other service after import VRF feature](#Impact-to-other-service-after-import-VRF-feature) + - [Test plan](#Test-plan) + - [Appendix - An alternative proposal](#Appendix---An-alternative-proposal) + - [vrf as the part of key](#vrf-as-the-part-of-key) - [intfsmgrd changes](#intfsmgrd-changes-1) - [intfsorch changes](#intfsorch-changes-1) @@ -54,6 +68,8 @@ Table of Contents | v.05 | 04/17/2019 | Xin Liu, Prince Sunny (MSFT) | Update the status | | v.06 | 05/09/2019 | Shine Chen, Jeffrey Zeng, Tyler Li | Add Some description and format adjustment | | v1.0 | 05/26/2019 | Shine Chen, Jeffrey Zeng, Tyler Li, Ryan Guo | After review, move proposal-2 in v0.6 to Appendix +| v1.1 | 06/04/2019 | Preetham Singh, Nikhil Kelapure, Utpal Kant Pintoo | Update on VRF Leak feature support | +| v1.2 | 07/21/2019 | Preetham Singh, Shine Chen | Update on loopback device/frr template for vrf and static route support | ## Abbreviations @@ -75,33 +91,82 @@ Table of Contents 4. Enable BGP VRF aware in SONiC 5. Fallback lookup. -The fallback feature which defined by RFC4364 is very useful for specified VRF user to access internet through global/main route. Some enterprise users still use this to access internet on vpn environment. + The fallback feature which defined by RFC4364 is very useful for specified VRF user to access internet through global/main route. Some enterprise users still use this to access internet on vpn environment. 6. VRF route leaking between VRFs. - 7. VRF Scalability: Currently VRF number can be supported up to 1000 after fixing a bug in FRR. +8. loopback devices with vrf. + - Add/delete Loopback interfaces per VRF + - Support to add IPv4 and IPv6 host address on these loopback interfaces + - Support to use these loopback interfaces as source for various routing protocol control packet transmission. 
For instance in case of BGP multihop sessions source IP of the BGP packet will be outgoing interface IP which can change based on the interface status bringing down BGP session though there is alternate path to reach BGP neighbor. In case loopback interface is used as source BGP session would not flap. + - These loopback interface IP address can be utilized for router-id generation on the VRF which could be utilized by routing protocols. + - Support to use these interfaces as tunnel end points if required in future. + - Support to use these interfaces as source for IP Unnumbered interface if required in future. -In this release, supporting requirement 5) and 6) are not supported. See next section for details. +In this release, requirement 5) is not supported. See next section for details. Note: linux kernel use VRF master device to support VRF and it supports admin up/down on VRF master device. But we don't plan to support VRF level up/down state on SONIC. +## Functionality + +### Target Deployment Use Cases + +Virtual Routing and Forwarding is used to achieve network virtualization and traffic separation over on a shared network infrastructure in Enterprise and DC delpoyments. + +![Deployment use case](https://github.com/preetham-singh/SONiC-1/blob/master/images/vrf_hld/Multi-VRF_Deployment.png "Figure 1: Multi VRF Deployment use case") +__Figure 1: Multi VRF Deployment use case__ + +In above customer deployment: +Customer A and Customer B in Site 1 or Site 2 are customer facing devices referred as Customer Edge routers. +Router-1 and Router-2 are routers which provide interconnectivity between customers across sites referred as Provider Edge Routers. + +Above figure depicts typical customer deployment where multiple Customer facing devices are connected to Provider edge routers. +With such deployment Provider Edge routers associate each input interface with one VRF instance. + +In Figure 1, All cutomer facing devices belonging to Customer-A irrespective of the site, will belong to VRF Green and those in Customer-B will belong to VRF Red. +With this deployment, isolation of traffic is achieved across customers maintaining connectivity within same customer sites. + +Note that multiple input interfaces can be associated to a VRF instance. This input interface can be Physical interface, Port-channel or L3 Vlan interface. + +### Functional Description + +Multi-VRF is the ability to maintain multiple "Virtual Routing and Forwarding"(VRF) tables on the same Provider Edge router. +Multi-VRF aware routing protocol such as BGP is used to exchange routing information among peer Provider Edge routers. +The Multi-VRF capable Provider Edge router maps an input customer interface to a unique VRF instance. Provider Edge router maintains unique VRF table for each VRF instance on that Provider Edge router. + +Multi-VRF routers communicate with one another by exchanging route information in the VRF table with the neighboring Provider Edge router. +This exchange of information among the Provider Edge routers is done using routing protocol like BGP. +Customers connect to Provider Edge routers in the network using Customer Edge routers as shown in Figure 1. + +Due to this overlapping address spaces can be maintained among the different VRF instances. + +FRR's BGP implementation is capable of running multiple autonomous systems (AS) at once. Each configured AS is associated with a VRF instance. The SONiC VRF implementation will enable this capability of FRR BGP in SONiC. 
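+
+The isolation described above can be pictured as one independent forwarding table per VRF, which is why the same prefix may legitimately appear in several VRFs with different next hops. The toy sketch below is purely conceptual (not SONiC code; all names and addresses are invented for illustration):
+
+```python
+# Conceptual model only: one route table per VRF, overlapping prefixes allowed.
+vrf_tables = {
+    "Vrf-green": {"10.1.0.0/24": "via 192.0.2.1"},    # Customer A sites
+    "Vrf-red":   {"10.1.0.0/24": "via 198.51.100.1"}  # Customer B, same prefix
+}
+
+def lookup(vrf, prefix):
+    """Resolve a prefix strictly within the table of the given VRF."""
+    return vrf_tables.get(vrf, {}).get(prefix)
+
+print(lookup("Vrf-green", "10.1.0.0/24"))  # via 192.0.2.1
+print(lookup("Vrf-red", "10.1.0.0/24"))    # via 198.51.100.1
+```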
+ +#### VRF route leak support + +Generally VRF route leak is a case where route and nexthops are in different VRF. VRF route leak for directly connected destinations in another VRF is also supported. + +VRF route leak can be achieved via both Static or Dynamic approach. + +In Static VRF route leak, FRR CLI can be used where user can specify nexthop IP along with nexthop VRF, where the nexthop IP is reachable through a nexthop VRF. + +In Dynamic VRF route leak, Route maps can be used to import routes from other VRFs. +Prefix lists within route maps are used to match route prefixes in source VRF and various action can be applied on these matching prefixes. +If route map action is permit, these routes will be installed into destination VRF. + +Leaked routes will be automatically deleted when corresponding routes are deleted in source VRF. + ## Dependencies VRF feature needs the following software package/upgrade 1. Linux kernel 4.9 -Linux Kernel 4.9 support generic IP VRF with L3 master net device. Every L3 -master net device has its own FIB. The name of the master device is the -VRF's name. Real network interface can join the VRF by becoming the slave of -the master net device. + Linux Kernel 4.9 support generic IP VRF with L3 master net device. Every L3 master net device has its own FIB. The name of the master device is the VRF's name. Real network interface can join the VRF by becoming the slave of the master net device. -Application can get creation or deletion event of VRF master device via RTNETLINK, -as well as information about slave net device joining a VRF. + Application can get creation or deletion event of VRF master device via RTNETLINK, as well as information about slave net device joining a VRF. -Linux kernel supports VRF forwarding using PBR scheme. It will fall to main -routing table to check do IP lookup. VRF also can have its own default network -instruction in case VRF lookup fails. + Linux kernel supports VRF forwarding using PBR scheme. All route lookups will be performed on main routing table associated with the VRF. VRF also can have its own default network instruction in case route lookup fails. 2. FRRouting is needed to support BGP VRF aware routing. @@ -130,23 +195,11 @@ ip [-6] rule add pref 32765 table local && ip [-6] rule del pref 0 SAI right now does not seem to have VRF concept, it does have VR. -We propose to implement VR as "virtual router" and VRF as "virtual router -forwarding" - -VR is defined as a logical routing system. VRF is defined as forwarding -domain within a VR. - -As this stage, we assume one VR per system. Only implement VRFs within this VR. +Hence in this implementation release we use VR object as VRF object. -Accordingly, we need to add vrf_id to sai_Route_entry and add vrf attribute -to sai_routeInterface object. - -An alternative method is using VR as VRF, this requires to add two attribution -to VR object to support Requirement 5) (fallback lookup). SAI community has -decided to take VR as VRF. So in this implementation release we use VR object as VRF object. Here are the new flags we propose to add in the SAI interface: -```jason +```json /* * @brief if it is global vrf * @@ -178,9 +231,9 @@ The following is high level diagram of modules with VRF support. Note "fallback" keyword is not supported in this release. -Add vrf-table in config_db.json file. 
+#### Add vrf-table in config_db -```jason +```json "VRF": { "Vrf-blue": { "fallback":"true" //enable global fib lookup while vrf fib lookup missed @@ -195,9 +248,9 @@ Add vrf-table in config_db.json file. ``` -Add vrf-binding information in config_db.json file. +#### Add vrf-binding information in config_db.json file -```jason +```json "INTERFACE":{ "Ethernet0":{ @@ -238,43 +291,93 @@ Add vrf-binding information in config_db.json file. With this approach, there is no redundant vrf info configured with an interface where multiple IP addresses are configured. -Logically IP address configuration must be processed after interface binding to vrf is processed. In intfmgrd/intfOrch process intf-bind-vrf event must be handled before IP address event. So interface-name entry in config_db.json is necessary even user doesn't use VRF feature. e.g. `"Ethernet2":{}` in the above example configuration. For version upgrade compatibility we need to add a script, this script will convert old config_db.json to new config_db.json at bootup, then the new config_db.json would contain the interface-name entry for interfaces associated in the global VRF table. +Logically IP address configuration must be processed after interface binding to vrf is processed. In intfmgrd/intfOrch process intf-bind-vrf event must be handled before IP address event. So interface-name entry in config_db.json is necessary even though user doesn't use VRF feature. e.g. `"Ethernet2":{}` in the above example configuration. For version upgrade compatibility we need to add a script, this script will convert old config_db.json to new config_db.json at bootup automatically, then the new config_db.json would contain the interface-name entry for interfaces associated in the global VRF table. -### Change redirect syntax in acl_rule_table of configdb +#### Add vrf information in the BGP configuration of config_db.json file -The existing acl_rule_table definition is the following. +```json -```jason +"BGP_NEIGHBOR": { + "Vrf-blue|10.0.0.49": { // This neighbour belongs to Vrf-blue + "name": "ARISTA09T0", + "rrclient": "0", + "local_addr": "10.0.0.48", + "asn": "64009", + "nhopself": "0" + } +} - "table1|rule1": { - "L4_SRC_PORT": "99", - "PACKET_ACTION": "REDIRECT:20.1.1.93,30.1.1.93" - }, - "table1|rule2": { - "L4_SRC_PORT": "100", - "PACKET_ACTION": "REDIRECT:20.1.1.93" - }, +"BGP_PEER_RANGE": { + "BGPSLBPassive": { // This BGP_PEER_Group belong to Vrf-blue + "name": "BGPSLBPassive", + "vrf_name": "Vrf-blue", + "src_address":"10.1.1.2", + "ip_range": [ + "192.168.8.0/27" + ] + } +} ``` -To support vrf the nexthop key should change to `{IP,interface}` pair from single `{IP}`. So new acl_rule_table should like the following. +#### Add static route information in config_db.json file -```jason +```json - "table1|rule1": { - "L4_SRC_PORT": "99", - "PACKET_ACTION": "REDIRECT:20.1.1.93|Ethernet10,30.1.1.93|Ethernet11" +"STATIC_ROUTE": { + "11.11.11.0/24": { + "distance": "10", + "nexthop": "1.1.1.1" }, - "table1|rule2": { - "L4_SRC_PORT": "100", - "PACKET_ACTION": "REDIRECT:20.1.1.93|Ethernet11" + + "Vrf-blue|22.11.11.0/24": { + "distance": "10", + "nexthop-vrf": "Vrf-red", + "nexthop": "2.1.1.1" }, + "Vrf-red|11.11.11.0/24": { + "nexthop": "1.1.1.1" + } +} + +``` + +### Change redirect syntax in acl_rule_table of config_db.json + +The existing acl_rule_table definition is the following. 
+ +```json + +"table1|rule1": { + "L4_SRC_PORT": "99", + "PACKET_ACTION": "REDIRECT:20.1.1.93,30.1.1.93" +}, +"table1|rule2": { + "L4_SRC_PORT": "100", + "PACKET_ACTION": "REDIRECT:20.1.1.93" +}, + +``` + +To support vrf the nexthop key should change to `{IP,interface}` pair from single `{IP}`. For backward compatibilty nexthop key `{IP}` is also supported, it only works on global vrf. So new acl_rule_table should like the following. + +```json + +"table1|rule1": { + "L4_SRC_PORT": "99", + "PACKET_ACTION": "REDIRECT:20.1.1.93|Ethernet10,30.1.1.93" +}, +"table1|rule2": { + "L4_SRC_PORT": "100", + "PACKET_ACTION": "REDIRECT:20.1.1.93|Ethernet11, 30.1.1.93" +}, + ``` ### Add a VRF_TABLE in APP_DB -```jason +```json ;defines virtual routing forward table ; ;Status: stable @@ -299,12 +402,11 @@ app-intf-table is defined as the following: key = INTF_TABLE:ifname mtu = 1\*4DIGIT ; MTU for the interface -VRF_NAME = 1\*64VCHAR ; +VRF_NAME = 1\*15VCHAR ; ``` app-intf-prefix-table is defined as the following corresponding to config_db definition. - ```json ;defines logical network interfaces with IP-prefix, an attachment to a PORT and list of 0 or more ip prefixes; @@ -319,7 +421,7 @@ family = "IPv4" / "IPv6" ; address family ### Add VRF key to app-route-table key list -```jason +```json ;Stores a list of routes ;Status: Mandatory @@ -440,7 +542,71 @@ sequenceDiagram ``` -## Agent changes +## Modules changes + +### Frr template changes + +- FRR template must enhance to contain VRF related configuration and static route configuration. +- On startup `sonic-cfggen` will use `frr.conf.j2` to generate `frr.conf` file. + +The generated frr.conf with vrf and static route feature is like the following: + +```json + +## static route configuration +! +ip route 11.11.11.0/24 1.1.1.1 10 +! +vrf Vrf-red +## static leak route + ip route 11.11.11.0/24 Vrf-blue 1.1.1.1 +! +## bgp configuration with VRF +router bgp 64015 vrf Vrf-red + bgp router-id 4.4.4.4 + network 4.4.4.4/32 + neighbor 10.0.0.49 remote-as 64009 + neighbor 10.0.0.49 description ARISTA09T0 + address-family ipv4 + neighbor 10.0.0.49 activate + neighbor 10.0.0.49 soft-reconfiguration inbound + maximum-paths 64 + exit-address-family + neighbor PeerGroup peer-group + neighbor PeerGroup passive + neighbor PeerGroup remote-as 65432 + neighbor PeerGroup ebgp-multihop 255 + neighbor PeerGroup update-source 2.2.2.2 + bgp listen range 192.168.0.0/27 peer-group PeerGroup + address-family ipv4 + neighbor PeerGroup activate + neighbor PeerGroup soft-reconfiguration inbound + neighbor PeerGroup route-map FROM_BGP_SPEAKER_V4 in + neighbor PeerGroup route-map TO_BGP_SPEAKER_V4 out + maximum-paths 64 + exit-address-family +! +router bgp 64015 + bgp router-id 3.3.3.3 +..... +! +! +``` + +### loopback interface consideration + +- Sonic must support multiple loopback interfaces and each loopback interface can belong to different vrf. Because linux kernel can only support one lo interface. We will use dummy interface instead of lo interface in linux kernel. + +- The following is the example to configure new loopback interface by using dummy interface + + ```bash + ip link add loopback1 type dummy + ip link set loopback1 up + ip link set dev loopback1 master Vrf_blue + ip address add 10.0.2.2/32 dev loopback1 + ``` + +- In the existing implementation interface-config.service takes care of loopback configuration. IntfMgrd will take interface-config.service instead to handle it. 
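+
+The sequence above can be scripted as shown below. This is only an illustrative sketch of the iproute2 calls intfmgrd needs to drive (the actual daemon is part of swss and reacts to config_db events); the helper function and its arguments are hypothetical.
+
+```python
+# Sketch only: wrap the documented iproute2 sequence for a VRF-bound loopback.
+import subprocess
+
+def create_vrf_loopback(ifname, vrf, ip_prefix):
+    """Create a dummy netdev, bring it up, enslave it to the VRF, add the IP."""
+    cmds = [
+        ["ip", "link", "add", ifname, "type", "dummy"],
+        ["ip", "link", "set", ifname, "up"],
+        ["ip", "link", "set", "dev", ifname, "master", vrf],
+        ["ip", "address", "add", ip_prefix, "dev", ifname],
+    ]
+    for cmd in cmds:
+        subprocess.check_call(cmd)
+
+# Example matching the snippet above:
+# create_vrf_loopback("loopback1", "Vrf_blue", "10.0.2.2/32")
+```
+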
### vrfmgrd changes @@ -451,20 +617,21 @@ update kernel using iproute2 CLIs and write VRF information to app-VRF-table. ### intfsmgrd changes -Ip address event and vrf binding event need to be handled seperately. These two events has sequency dependency. +IP address event and VRF binding event need to be handled seperately. These two events has sequence dependency. - Listening to interface binding to specific VRF configuration in config_db. - - bind to vrf event: - - bind kernel device to master vrf - - add interface entry with vrf attribute to app-intf-table. - - set intf-bind-vrf flag on statedb - - unbind from vrf event: + - bind to VRF event: + - bind kernel device to master VRF + - add interface entry with VRF attribute to app-intf-table. + - set intf-bind-vrf flag on STATE_DB + - unbind from VRF event: - wait until all ip addresses associated with the interface is removed. Ip address infomation can be retrieved from kernel. - - bind kernel device to global vrf - - del interface entry with vrf attribute from app-intf-table - - unset vrf-binding flag on statedb + - bind kernel device to global VRF + - del interface entry with VRF attribute from app-intf-table + - unset vrf-binding flag on STATE_DB - Listening to interface ip address configuration in config_db. - - add ip address event: wait until intf-bind-vrf flag is set, set ip address on kernel device and add {interface_name:ip address} entry to app-intf-prefix-table + - add ip address event: + - wait until intf-bind-vrf flag is set, set ip address on kernel device and add {interface_name:ip address} entry to app-intf-prefix-table - del ip address event: - unset ip address on kernel device - del {interface_name:ip address} entry from app-intf-prefix-table. @@ -545,22 +712,25 @@ $ config vrf add $ config vrf del //bind an interface to a VRF -$ config interface vrf bind +$ config interface vrf bind //unbind an interface from a VRF -$ config interface vrf unbind +$ config interface vrf unbind + +// create loopback device +$ config loopback add Loopback<0-999> + +// delete loopback device +$ config loopback del Loopback<0-999> // show attributes for a given vrf $ show vrf [] -// show the list of router interfaces -$ show router-interface [vrf ] - //add IP address to an interface. The command already exists in SONiC, but will be enhanced -$ config interface ip add +$ config interface ip add //remove an IP address from an interface. The command already exists in SONiC, but will be enhanced. -$ config interface ip del +$ config interface ip del //add a prefix to a VRF $ config route add [vrf ] prefix nexthop <[vrf ] | dev > @@ -571,6 +741,26 @@ $ config route del [vrf ] prefix nexthop <[vrf ] +//show ip interface command updated to show VRF name to which interface is bound to +$ show ip interface + +//show ipv6 interface command updated to show VRF name to which interface is bound to +$ show ipv6 interface +``` + +## Other Linux utilities + +Standard linux ping, ping6 and traceroute commands can be used on VRF by explicitly specifying kernel VRF device. +Since Kernel device name is same as user configured VRF name, VRF name itself can be used as interface in below commands. + +```bash +Ping/ping6: +ping [-I ] destination +ping6 [-I ] destination + +traceroute: +traceroute [-i ] destination +traceroute6 [-i ] destination ``` ## User scenarios @@ -584,9 +774,7 @@ If a user does not care about VRF configuration, it can simply use this command Lets use Ethernet0 as an example in this document. 
```bash - -$ config interface Ethernet0 ip add 1.1.1.1/24 - +$ config interface ip add Ethernet0 1.1.1.1/24 ``` This command is enhanced to do the following: @@ -599,9 +787,7 @@ This command is enhanced to do the following: To remove IP address from an interface: ```bash - -$ config interface Ethernet0 ip remove 1.1.1.1/24 - +$ config interface ip remove Ethernet0 1.1.1.1/24 ``` This command is enhanced to do the following: @@ -617,10 +803,8 @@ This command is enhanced to do the following: In this case, user wants to configure a VRF "Vrf-blue", with interfaces attached to this VRF. Following are the steps: ```bash - $ config vrf add Vrf-blue -$ config interface Ethernet0 vrf bind Vrf-blue - +$ config interface bind Ethernet0 Vrf-blue ``` The Bind command will do the following: @@ -630,9 +814,7 @@ The Bind command will do the following: - Bind the interface to Vrf-blue (it will eventually create Ethernet0 router interface) ```bash - -$ config interface Ethernet0 ip add 1.1.1.1/24 - +$ config interface ip add Ethernet0 1.1.1.1/24 ``` This command will do the following: @@ -642,9 +824,7 @@ This command will do the following: To unbind an interface from VRF: ```bash - -$ config interface Ethernet0 vrf unbind - +$ config interface unbind Ethernet0 Vrf-blue ``` This command will do the following: @@ -659,25 +839,21 @@ User wants to delete a VRF (Vrf-blue), here are the steps: This set of commands will perform the work: ```bash - $ show vrf Vrf-blue This will to get interface list belonging to Vrf-blue from app_db -$ config interface Ethernet0 ip remove 1.1.1.1/24 +$ config interface ip remove Ethernet0 1.1.1.1/24 This will remove all IP addresses from the interfaces belonging to the VRF. $ config interface Ethernet0 vrf unbind This will unbind all interfaces from this VRF $ config vrf del Vrf-blue This command will delete the VRF. - ``` To simplify the user experience, we can combine the above commands to create one single command, similar to the iprotue2 command:`# ip link del Vrf-blue` This is the current proposal: ```bash - $ config vrf del Vrf-blue - ``` This command will do the following: @@ -687,6 +863,74 @@ This command will do the following: - unbind interfaces(s) from Vrf-blue - del Vrf-blue +### Create a Loopback interface and bind it to a VRF + +In case, user wants to configure Loopback say Loopback10 in Vrf-blue following are the steps: + +```bash +$ config loopback add Loopback10 +$ config interface bind Loopback10 Vrf-blue +``` + +This command does following operations: + +- Create loopback interface entry in LOOPBACK_INTERFACE in config_db +- vrf binding to Vrf-blue +- intfmgrd will create netdevice Loopback10 of type dummy and brings up the netdev. + + This will result in below sequence of netdev events in kernel from intfmgrd + + ```bash + ip link add Loopback10 type dummy + ip link set dev Loopback10 up + ip link set Loopback10 master Vrf-blue + ``` + +- intfsorch will store this interface-vrf binding in local cache of interface information + +### Add IP address on Loopback interface + +```bash +$ config interface ip add Loopback10 10.1.1.1/32 +``` + +This command does following operations: + +In intfmgr: + +- When IP address add is received on these loopback interfaces, ip address will be applied on corresponding kernel loopback netdevice. Also add {interface_name:ip address} to app-intf-prefix-table. 
+ +In intforch + +- When app-intf-prefix-table sees IP address add/delete for Loopback interface, vrf name is fetched from local interface cache to obtain VRF-ID and add IP2ME route with this VRF ID. + +### Delete Loopback interface + +```bash +$ config loopback del Loopback10 +``` + +When user deletes loopback interface, first all IP configured on the interface will be removed from app-intf-prefix-table +Later interface itself will be deleted from INTERFACE table in config_db +In intfmgrd, this will flush all ip on netdev and deletes loopback netdev in kernel +Infrorchd will delete IP2ME routes from corresponding VRF and deletes local interface cache which holds vrf binding information. + +### Static leak route configuration + +User can configure static leak routes using below CLIs: + +```bash +$ config route add vrf Vrf-red prefix 10.1.1.1/32 nexthop vrf Vrf-green 100.1.1.2 +This installs route 10.1.1.1/32 in Vrf-red and with nexthop pointing to 100.1.1.2 in Vrf-green, provided below conditions are met: + - Vrf-green is configured in kernel + - 100.1.1.2 is reachable in Vrf-green + +$ config route del vrf Vrf-red prefix 10.1.1.1/32 nexthop vrf Vrf-green 100.1.1.2 +This deletes route 10.1.1.1/32 in Vrf-red. +``` + +Since FRR is used to manage Static route the converted frr command is issue to FRR suite by vtysh. + ## Impact to other service after import VRF feature For apps that don't care VRF they don't need to modify after sonic import VRF. @@ -719,10 +963,6 @@ be modified or restarted for VRF binding event. For layer 3 apps such as snmpd or ntpd they are using vrf-global socket too. So they are vrf-transparent too. -## Progress - -In the diagram, fpmsyncd, vrfmgrd, intfsmgrd, intfsorch are checked into the master branch. There may need changes to support VRF. Other components are working in progress, will be committed as planned. - ## Test plan A separate test plan document will be uploaded and reviewed by the community @@ -731,20 +971,20 @@ A separate test plan document will be uploaded and reviewed by the community The VRF binding and IP address configuration dependency can be solved in a different way. There are other areas to be considered as well to make the VRF feature support solid. A different proposal is also considered, discussed but rejected by the community. It's listed here for future reference. -Major areas to be addressed in the above chosen proposal are: +Major areas to be addressed in the above chosen proposal are: -1) VRF bind and IP config message sequence dependency -2) Need INTERFACE|Ethernet0:{}, even if user does not use VRF config. Not compatible with JSON file. -3) Not compatible with the existing JSON file, need a script to convert -4) Warm reboot implementation become complicated -5) Unit test cases (swss and Ansible) are not compatible, need test case modification -6) Add more wait states in each daemon, may have performance impact +1) VRF bind and IP config message sequence dependency +2) Need INTERFACE|Ethernet0:{}, even if user does not use VRF config. Not compatible with JSON file. 
+3) Not compatible with the existing JSON file, need a script to convert +4) Warm reboot implementation become complicated +5) Unit test cases (swss and Ansible) are not compatible, need test case modification +6) Add more wait states in each daemon, may have performance impact -### vrf as key +### vrf as the part of key Using this syntax in the config_db.json can also solve the sequence dependency: -```jason +```json "INTERFACE":{ "Ethernet2|Vrf-blue|12.12.12.1/24": {} "Ethernet2|Vrf-blue|13.13.13.1/24": {} @@ -752,10 +992,9 @@ Using this syntax in the config_db.json can also solve the sequence dependency: }, .... - ``` -Here "Vrf-blue" is part of the IP address configuration of the interface. +Here "Vrf-blue" is part of the IP address configuration of the interface. Since it is very complicated to carry IP addresses when an interface moves from one VRF to another VRF, the current implementation is when interface moves from one VRF to another VRF, the IP address will be deleted. Because of this, we can treat VRF as part of the key of interface entry, but not an attribute. This solution has following advantage: diff --git a/doc/vrf/vrf-ansible-test-plan.md b/doc/vrf/vrf-ansible-test-plan.md new file mode 100644 index 0000000000..f3c68dc1df --- /dev/null +++ b/doc/vrf/vrf-ansible-test-plan.md @@ -0,0 +1,560 @@ +# VRF feature ansible test plan + + + +- [overview](#overview) + - [Scope](#scope) + - [Testbed](#testbed) +- [Setup configuration](#setup-configuration) + - [vrf config in t0 topo](#vrf-config-in-t0-topo) + - [Scripts for generating configuration on SONIC](#scripts-for-generating-configuration-on-sonic) + - [Pytest scripts to setup and run test](#pytest-scripts-to-setup-and-run-test) + - [Setup of DUT switch](#setup-of-dut-switch) + - [vrf configuration](#vrf-configuration) + - [bgp vrf configuration](#bgp-vrf-configuration) + - [acl redirect vrf configuration](#acl-redirect-vrf-configuration) + - [teardown operation after each test case](#teardown-operation-after-each-test-case) +- [PTF Test](#ptf-test) + - [Input files for PTF test](#input-files-for-ptf-test) + - [Traffic validation in PTF](#traffic-validation-in-ptf) +- [Test cases](#test-cases) + - [Test case #1 - vrf creat and bind](#test-case-1---vrf-creat-and-bind) + - [Test case #2 - neighbor learning in vrf](#test-case-2---neighbor-learning-in-vrf) + - [Test case #3 - route learning in vrf](#test-case-3---route-learning-in-vrf) + - [Test case #4 - isolation among different vrfs](#test-case-4---isolation-among-different-vrfs) + - [Test case #5 - acl redirect in vrf](#test-case-5---acl-redirect-in-vrf) + - [Test case #6 - loopback interface](#test-case-6---loopback-interface) + - [Test case #7 - Vrf WarmReboot](#test-case-7---vrf-warmreboot) + - [Test case #8 - Vrf capacity](#test-case-8---vrf-capacity) + - [Test case #9 - unbind intf from vrf](#test-case-9---unbind-intf-from-vrf) + - [Test case #10 - remove vrf when intfs is bound to vrf](#test-case-10---remove-vrf-when-intfs-is-bound-to-vrf) +- [TODO](#todo) + + +## overview + +The purpose is to test vrf functionality on the SONIC switch DUT, closely mimic the production environment. The test will use testbed's basic configuration as default and load different vrf configurations according to the test cases. + +### Scope + +The test is running on real SONIC switch with testbed's basic configuration. The purpose of the test is not to test specific C/C++ class or APIs, those tests are coverd by vs test cases. The purpose is to do VRF functional test on a SONIC system. 
They include vrf creation/deletion, neighbor/route learning in vrf, binding/unbinding vrf to L3 intf, isolation among vrfs, acl redirection/everflow in vrf and vrf attributes function. + +### Testbed + +The test will run on the `t0` testbed: + +## Setup configuration + +### vrf config in t0 topo + +![vrf-t0_topo](https://github.com/shine4chen/SONiC/blob/vrf-test-case/images/vrf_hld/vrf_t0_topo.png) + +### Scripts for generating configuration on SONIC + +There are some j2 template files for the vrf test configuration. They are used to generate json files and apply them on the DUT. + +### Pytest scripts to setup and run test + +Newly added `test_vrf.py` will be put in 'sonic-mgmt/tests/' directory. + +### Setup of DUT switch + +Setup of SONIC DUT will be done by SONiC CLI. During the setup process pytest will use SONiC CLI to config vrf, bind intf to vrf and add ip address to intf. + +#### vrf configuration + +```jason +{ + "VRF": { + "Vrf1": { + }, + "Vrf2": { + } + }, + "PORTCHANNEL_INTERFACE": { + "PortChannel0001": {"vrf_name": "Vrf1"}, + "PortChannel0002": {"vrf_name": "Vrf1"}, + "PortChannel0003": {"vrf_name": "Vrf2"}, + "PortChannel0004": {"vrf_name": "Vrf2"}, + "PortChannel0001|10.0.0.56/31": {}, + "PortChannel0001|FC00::71/126": {}, + "PortChannel0002|10.0.0.58/31": {}, + "PortChannel0002|FC00::75/126": {}, + "PortChannel0003|10.0.0.60/31": {}, + "PortChannel0003|FC00::79/126": {}, + "PortChannel0004|10.0.0.62/31": {}, + "PortChannel0004|FC00::7D/126": {} + }, + "VLAN_INTERFACE": { + "Vlan1000": {"vrf_name": "Vrf1}, + "Vlan1000|192.168.0.1/21": {}, + "Vlan2000": {"vrf_name": "Vrf2}, + "Vlan2000|192.168.0.1/21": {} + } +} +``` + +#### bgp vrf configuration + +We modify /usr/share/sonic/templates/frr.conf.j2 to generate the frr.conf include vrf configuration and apply the frr.conf to zebra process. The new config_db.json file in T0 topo is the following. 
+ +```jason +{ + "BGP_PEER_RANGE": { + "BGPSLBPassive": { + "name": "BGPSLBPassive", + "ip_range": [ + "10.255.0.0/25" + ], + "vrf_name": "Vrf1" + }, + "BGPVac": { + "name": "BGPVac", + "ip_range": [ + "192.168.0.0/21" + ], + "vrf_name": "Vrf1" + } + }, + "BGP_NEIGHBOR": { + "Vrf1|10.0.0.57": { + "rrclient": "0", + "name": "ARISTA01T1", + "local_addr": "10.0.0.56", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf1|10.0.0.59": { + "rrclient": "0", + "name": "ARISTA02T1", + "local_addr": "10.0.0.58", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf2|10.0.0.61": { + "rrclient": "0", + "name": "ARISTA03T1", + "local_addr": "10.0.0.60", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf2|10.0.0.63": { + "rrclient": "0", + "name": "ARISTA04T1", + "local_addr": "10.0.0.62", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf2|fc00::7a": { + "rrclient": "0", + "name": "ARISTA03T1", + "local_addr": "fc00::79", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf2|fc00::7e": { + "rrclient": "0", + "name": "ARISTA04T1", + "local_addr": "fc00::7d", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf1|fc00::72": { + "rrclient": "0", + "name": "ARISTA01T1", + "local_addr": "fc00::71", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + }, + "Vrf1|fc00::76": { + "rrclient": "0", + "name": "ARISTA02T1", + "local_addr": "fc00::75", + "nhopself": "0", + "admin_status": "up", + "holdtime": "10", + "asn": "64600", + "keepalive": "3" + } + }, +} +``` + +The frr configuration is the following. + +```jason +router bgp 65100 vrf Vrf2 + bgp router-id 10.1.0.32 + bgp log-neighbor-changes + no bgp default ipv4-unicast + bgp graceful-restart + bgp bestpath as-path multipath-relax + neighbor BGPSLBPassive peer-group + neighbor BGPSLBPassive remote-as 65432 + neighbor BGPSLBPassive passive + neighbor BGPSLBPassive ebgp-multihop 255 + neighbor BGPVac peer-group + neighbor BGPVac remote-as 65432 + neighbor BGPVac passive + neighbor BGPVac ebgp-multihop 255 + neighbor 10.0.0.61 remote-as 64600 + neighbor 10.0.0.61 description ARISTA03T1 + neighbor 10.0.0.61 timers 3 10 + neighbor 10.0.0.63 remote-as 64600 + neighbor 10.0.0.63 description ARISTA04T1 + neighbor 10.0.0.63 timers 3 10 + neighbor fc00::7a remote-as 64600 + neighbor fc00::7a description ARISTA03T1 + neighbor fc00::7a timers 3 10 + neighbor fc00::7e remote-as 64600 + neighbor fc00::7e description ARISTA04T1 + neighbor fc00::7e timers 3 10 + bgp listen range 10.255.0.0/25 peer-group BGPSLBPassive + bgp listen range 192.168.0.0/21 peer-group BGPVac + ! 
+ address-family ipv4 unicast + network 10.1.0.32/32 + network 192.168.0.0/21 + neighbor 10.0.0.61 activate + neighbor 10.0.0.61 soft-reconfiguration inbound + neighbor 10.0.0.61 allowas-in 1 + neighbor 10.0.0.63 activate + neighbor 10.0.0.63 soft-reconfiguration inbound + neighbor 10.0.0.63 allowas-in 1 + neighbor BGPSLBPassive activate + neighbor BGPSLBPassive soft-reconfiguration inbound + neighbor BGPSLBPassive route-map FROM_BGP_SPEAKER_V4 in + neighbor BGPSLBPassive route-map TO_BGP_SPEAKER_V4 out + neighbor BGPVac activate + neighbor BGPVac soft-reconfiguration inbound + neighbor BGPVac route-map FROM_BGP_SPEAKER_V4 in + neighbor BGPVac route-map TO_BGP_SPEAKER_V4 out + maximum-paths 64 + exit-address-family + ! + address-family ipv6 unicast + network fc00:1::32/128 + network fc00:168::/117 + neighbor fc00::7a activate + neighbor fc00::7a soft-reconfiguration inbound + neighbor fc00::7a allowas-in 1 + neighbor fc00::7a route-map set-next-hop-global-v6 in + neighbor fc00::7e activate + neighbor fc00::7e soft-reconfiguration inbound + neighbor fc00::7e allowas-in 1 + neighbor fc00::7e route-map set-next-hop-global-v6 in + neighbor BGPSLBPassive activate + neighbor BGPSLBPassive soft-reconfiguration inbound + neighbor BGPVac activate + neighbor BGPVac soft-reconfiguration inbound + maximum-paths 64 + exit-address-family +! +router bgp 65100 vrf Vrf1 + bgp router-id 10.1.0.32 + bgp log-neighbor-changes + no bgp default ipv4-unicast + bgp graceful-restart + bgp bestpath as-path multipath-relax + neighbor BGPSLBPassive peer-group + neighbor BGPSLBPassive remote-as 65432 + neighbor BGPSLBPassive passive + neighbor BGPSLBPassive ebgp-multihop 255 + neighbor BGPVac peer-group + neighbor BGPVac remote-as 65432 + neighbor BGPVac passive + neighbor BGPVac ebgp-multihop 255 + neighbor 10.0.0.57 remote-as 64600 + neighbor 10.0.0.57 description ARISTA01T1 + neighbor 10.0.0.57 timers 3 10 + neighbor 10.0.0.59 remote-as 64600 + neighbor 10.0.0.59 description ARISTA02T1 + neighbor 10.0.0.59 timers 3 10 + neighbor fc00::72 remote-as 64600 + neighbor fc00::72 description ARISTA01T1 + neighbor fc00::72 timers 3 10 + neighbor fc00::76 remote-as 64600 + neighbor fc00::76 description ARISTA02T1 + neighbor fc00::76 timers 3 10 + bgp listen range 10.255.0.0/25 peer-group BGPSLBPassive + bgp listen range 192.168.0.0/21 peer-group BGPVac + ! + address-family ipv4 unicast + network 10.1.0.32/32 + network 192.168.0.0/21 + neighbor 10.0.0.57 activate + neighbor 10.0.0.57 soft-reconfiguration inbound + neighbor 10.0.0.57 allowas-in 1 + neighbor 10.0.0.59 activate + neighbor 10.0.0.59 soft-reconfiguration inbound + neighbor 10.0.0.59 allowas-in 1 + neighbor BGPSLBPassive activate + neighbor BGPSLBPassive soft-reconfiguration inbound + neighbor BGPSLBPassive route-map FROM_BGP_SPEAKER_V4 in + neighbor BGPSLBPassive route-map TO_BGP_SPEAKER_V4 out + neighbor BGPVac activate + neighbor BGPVac soft-reconfiguration inbound + neighbor BGPVac route-map FROM_BGP_SPEAKER_V4 in + neighbor BGPVac route-map TO_BGP_SPEAKER_V4 out + maximum-paths 64 + exit-address-family + ! 
+ address-family ipv6 unicast + network fc00:1::32/128 + network fc00:168::/117 + neighbor fc00::72 activate + neighbor fc00::72 soft-reconfiguration inbound + neighbor fc00::72 allowas-in 1 + neighbor fc00::72 route-map set-next-hop-global-v6 in + neighbor fc00::76 activate + neighbor fc00::76 soft-reconfiguration inbound + neighbor fc00::76 allowas-in 1 + neighbor fc00::76 route-map set-next-hop-global-v6 in + neighbor BGPSLBPassive activate + neighbor BGPSLBPassive soft-reconfiguration inbound + neighbor BGPVac activate + neighbor BGPVac soft-reconfiguration inbound + maximum-paths 64 + exit-address-family +! +route-map set-next-hop-global-v6 permit 10 + set ipv6 next-hop prefer-global +! +route-map ISOLATE permit 10 + set as-path prepend 65100 +! +route-map TO_BGP_SPEAKER_V4 deny 10 +! +route-map FROM_BGP_SPEAKER_V4 permit 10 +! +route-map RM_SET_SRC6 permit 10 + set src fc00:1::32 +! +route-map RM_SET_SRC permit 10 + set src 10.1.0.32 +! +ip protocol bgp route-map RM_SET_SRC +! +ipv6 protocol bgp route-map RM_SET_SRC6 +... +``` + +#### acl redirect vrf configuration + +Acl redirect action supports vrf, so we need specify the outgoing interface of the nexthop explicitly, the acl redirect configuration template is as following: + +```jason +{ + "ACL_TABLE": { + "VRF_ACL_REDIRECT_V4": { + "policy_desc": "Redirect traffic to nexthop in different vrfs", + "type": "L3", + "ports": ["{{ src_port }}"] + }, + + "VRF_ACL_REDIRECT_V6": { + "policy_desc": "Redirect traffic to nexthop in different vrfs", + "type": "L3V6", + "ports": ["{{ src_port }}"] + } + }, + "ACL_RULE": { + "VRF_ACL_REDIRECT_V4|rule1": { + "priority": "55", + "SRC_IP": "10.0.0.1", + "packet_action": "redirect:{\% for intf, ip in redirect_dst_ipv6s \%}{{ ip ~ "@" ~ intf }}{{ "," if not loop.last else "" }}{\% endfor \%}" + }, + "VRF_ACL_REDIRECT_V6|rule1": { + "priority": "55", + "SRC_IPV6": "2000::1", + "packet_action": "redirect:{\% for intf, ip in redirect_dst_ipv6s \%}{{ ip ~ "@" ~ intf }}{{ "," if not loop.last else "" }}{\% endfor \%}" + } + } +} +NOTE: In the above rules, a backslash is used before % symbol related to FOR loops to avoid the github build failure. While using the actual code, remove the backslash. +``` + +#### teardown operation after each test case + +- Restore original topo-t0 configuration + +## PTF Test + +### Input files for PTF test + +PTF test will generate traffic between ports and make sure the traffic forwarding is expected according to the vrf configuration. Depending on the testbed topology and the existing configuration (e.g. ECMP, LAGS, etc) packets may forward to different interfaces. Therefore port connection information will be generated from the minigraph and supplied to the PTF script. + +### Traffic validation in PTF + +Depending on the test cases PTF will verify the packet is arrived or dropped. For vrf "src_mac" option test, PTF will analyze ip packet dst_mac after L3 forwarding through vrf and do L3 forwarding only when ip packet's dst_mac is matched with configed vrf "src_mac". 
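+
+A minimal sketch of this kind of validation, assuming the standard `ptf.testutils` helpers; the port numbers, MAC and IP addresses below are placeholders rather than values taken from the topology files:
+
+```python
+# Sketch: send a packet into a VRF member port and check where it comes out.
+import ptf.testutils as testutils
+from ptf.base_tests import BaseTest
+
+class VrfFwdExample(BaseTest):
+    def runTest(self):
+        pkt = testutils.simple_tcp_packet(eth_dst="00:11:22:33:44:55",  # DUT router MAC
+                                          ip_src="10.0.0.1",
+                                          ip_dst="192.168.0.100")
+        # Inject on a PTF port attached to an interface bound to Vrf1.
+        testutils.send_packet(self, 0, pkt)
+        # Expect the routed copy only on ports belonging to the same VRF ...
+        exp_pkt = testutils.simple_tcp_packet(eth_src="00:11:22:33:44:55",
+                                              eth_dst="00:aa:bb:cc:dd:ee",  # next-hop MAC placeholder
+                                              ip_src="10.0.0.1",
+                                              ip_dst="192.168.0.100",
+                                              ip_ttl=63)
+        testutils.verify_packet_any_port(self, exp_pkt, ports=[1, 2])
+        # ... and never on ports bound to the other VRF.
+        testutils.verify_no_packet_any(self, exp_pkt, ports=[3, 4])
+```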
+ +## Test cases + +### Test case #1 - vrf creat and bind + +#### Test objective + +verify vrf creat and bind intf to vrf + +#### Test steps + +- config load vrf configuration +- verify vrf and ip configuration in kernel +- verify vrf and ip configuration in app_db + +### Test case #2 - neighbor learning in vrf + +#### Test objective + +verify arp/neighbor learning in vrf + +#### Test steps + +- from DUT ping vms successfully +- verify neighbor entries by traffic + +### Test case #3 - route learning in vrf + +#### Test objective + +Verify v4/v6 route learning in vrf + +#### Test steps + +- bgp load new frr.conf +- DUT exchange routes information with peer VMs + - each vm propagates 6.4K ipv4 route and 6.4k ipv6 route to DUT. +- verify route entries by traffic(choose some routes) + +### Test case #4 - isolation among different vrfs + +#### Test objective + +The neighbor and route entries should be isolated among different vrfs. + +#### Test steps + +- load vrf configuration and frr.conf +- both vms and DUT exchange routes via bgp + - route prefix overlaps in different vrfs. +- verify traffic matched neighbor entry isolation in different vrf +- verify traffic matched route entry isolation in different vrf + +### Test case #5 - acl redirect in vrf + +#### Test objective + +ACL redirection can redirect packets to the nexthop with specified interface bound to the vrf. + +#### Test steps + +- load acl_redirect configuration file +- PTF send pkts +- verify PTF ports can not receive pkt from origin L3 forward destination ports +- verify PTF ports can receive pkt from configured nexthop group member +- verify load balance between nexthop group members +- Restore configuration + +### Test case #6 - loopback interface + +#### Test objective + +User can configurate multiple loopback interfaces. Each interface can belong to different vrf. + +#### Test steps + +- load loopback configuration file +- On ptf in different vrf ping DUT loopback interface ip address +- Verify if ping operation is successful +- use loopback interface as bgp update-source in vrf +- verify bgp session state is Established +- Restore configuration + +### Test case #7 - Vrf WarmReboot + +#### Test objective + +During system/swss warm-reboot, traffic should not be dropped. + +#### Test steps + +- execute ptf background traffic test +- do system warm-reboot +- after system warm-reboot, stop ptf background traffic test +- verify traffic should not be dropped +- execute ptf background traffic test +- do swss warm-reboot +- after swss warm-reboot, stop ptf background traffic test +- verify traffic should not be dropped + +### Test case #8 - Vrf capacity + +#### Test objective + +Current sonic can support up to 1000 Vrf. + +#### Test steps + +- create 1000 vlans and Vrfs using CLI +- configure Ethernet0 to 1000 vlans +- bind 1000 vlan interfaces to 1000 vrfs +- configure ip addresses on 1000 vlan interfaces +- Verify if any error log occur +- Verify 1000 vlan interfaces connection with ptf by traffic + +### Test case #9 - unbind intf from vrf + +#### Test objective + +When the interface is unbound from the vrf all neighbors and routes associated with the interface should be removed. The other interfaces still bound to vrf should not be effected. 
+ +#### Test steps + +- unbind some intfs from vrf +- verify neighbor and route entries removed by traffic +- verify neighbor and route entries related to other intf should not be effected by traffic +- Restore configuration + +### Test case #10 - remove vrf when intfs is bound to vrf + +#### Test objective + +Use CLI to remove vrf when intfs is bound to the vrf, all ip addresses of the intfs belonging to the vrf should be deleted and all neighbor and route entries related to this vrf should be removed. The entries in other vrf should not be effected. + +#### Test steps + +- load vrf configuration and frr.conf +- verify route and neigh status is okay +- remove specified vrf using CLI +- verify ip addresses of the interfaces belonging to the vrf are removed by traffic +- verify neighbor and route entries removed by traffic +- verify neighbor and route entries in other vrf existed by traffic +- Restore configuration + +## TODO + +- vrf table attributes 'fallback lookup' test +- vrf route leaking between VRFS +- everflow support in vrf +- application test in VRF such as ssh diff --git a/doc/vrf/vrf-vs-test-plan.md b/doc/vrf/vrf-vs-test-plan.md new file mode 100644 index 0000000000..7f72135d72 --- /dev/null +++ b/doc/vrf/vrf-vs-test-plan.md @@ -0,0 +1,63 @@ +# VRF feature vs test plan + +## overview + +Vrf vs test plan are used to verify the function of the swss by checking the content of the APP_DB/ASIC_DB/kernel. FRR and SAI function will be covered by ansible pytest test cases. + +## test cases + +No|Test case summary +---------|---------- +1|Verify that the vrf entry from config is pushed correctly by vrfmgrd to APP_DB and linux kernel. +2|Verify that the Orchagent is pushing the vrf entry into ASIC_DB by checking the contents in the ASIC_DB. +3|Verify that the random combination of vrf attributes can successfully configured by checking the contents in the APP_DB and ASIC_DB. +4|Verify that the vrf attribute can be updated successfully after vrf is created by checking the contents in the ASIC_DB +5|Verify that the vrf entries can be successfully removed from the CONFIG_DB, APP_DB and ASIC_DB. +6|Verify that the maximum number of vrf entries be created can reach to 1K. +7|Verify that the interface entry from config is pushed correctly by intfmgrd to APP_DB and linux kernel. +8|Verify that the Orchagent is receiving interface creation and deletion from APP_DB. +9|Verify that the Orchagent is pushing the interface entry into ASIC_DB by checking the contents in the ASIC_DB. +10|Verify that the port interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB. +11|Verify that the port interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB. +12|Verify that the different port interface bound to different vrf can configure the same IPv4 Address by checking the APP_DB and ASIC_DB. +13|Verify that the IPv4 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB +14|Verify that the port interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB. +15|Verify that the port interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB. +16|Verify that the IPv6 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB +17|Verify that the lag interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB. 
18|Verify that the lag interface binds an IPv4 address with a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
19|Verify that the IPv4 address is removed successfully from the lag interface by checking the contents of the APP_DB and ASIC_DB.
20|Verify that the lag interface binds an IPv6 address without a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
21|Verify that the lag interface binds an IPv6 address with a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
22|Verify that the IPv6 address is removed successfully from the lag interface by checking the contents of the APP_DB and ASIC_DB.
23|Verify that the vlan interface binds an IPv4 address without a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
24|Verify that the vlan interface binds an IPv4 address with a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
25|Verify that the IPv4 address is removed successfully from the vlan interface by checking the contents of the APP_DB and ASIC_DB.
26|Verify that the vlan interface binds an IPv6 address without a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
27|Verify that the vlan interface binds an IPv6 address with a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
28|Verify that the IPv6 address is removed successfully from the vlan interface by checking the contents of the APP_DB and ASIC_DB.
29|Verify that the loopback interface binds an IPv4 address without a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
30|Verify that the loopback interface binds an IPv4 address with a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
31|Verify that the IPv4 address is removed successfully from the loopback interface by checking the contents of the APP_DB and ASIC_DB.
32|Verify that the loopback interface binds an IPv6 address without a vrf correctly by checking the contents of the APP_DB and ASIC_DB.
33|Verify that the IPv6 address is removed successfully from the loopback interface by checking the contents of the APP_DB and ASIC_DB.
34|Verify that neighsyncd pushes neighbor entries to the APP_DB correctly by checking the contents of the APP_DB.
35|Verify that the Orchagent pushes the neighbor entry into the ASIC_DB by checking the contents of the ASIC_DB.
36|Verify that an IPv4 neighbor is created and deleted successfully by checking the contents of the APP_DB and ASIC_DB.
37|Verify that an IPv6 neighbor is created and deleted successfully by checking the contents of the APP_DB and ASIC_DB.
38|Verify that different interfaces bound to different vrfs can add the same IPv4 neighbor address by checking the APP_DB and ASIC_DB.
39|Verify that fpmsyncd pushes route entries to the APP_DB correctly by checking the contents of the APP_DB.
40|Verify that the Orchagent pushes the route entry into the ASIC_DB by checking the contents of the ASIC_DB.
41|Verify that an IPv4 route entry is added successfully by checking the contents of the ASIC_DB.
42|Verify that an IPv4 route entry is deleted successfully by checking the contents of the APP_DB and ASIC_DB.
43|Verify that an IPv6 route entry is added successfully by checking the contents of the ASIC_DB.
44|Verify that an IPv6 route entry is deleted successfully by checking the contents of the APP_DB and ASIC_DB.
45|Verify that an IPv4 route entry with a vrf is added successfully by checking the contents of the ASIC_DB.
46|Verify that an IPv4 route entry with a vrf is deleted successfully by checking the contents of the APP_DB and ASIC_DB.
47|Verify that an IPv6 route entry with a vrf is added successfully by checking the contents of the ASIC_DB.
48|Verify that an IPv6 route entry with a vrf is deleted successfully by checking the contents of the APP_DB and ASIC_DB.
49|Verify that a route entry can point to a nexthop in a different vrf.
50|Verify that when the acl packet action is redirect-to-nexthop, the acl entry is added correctly by checking the contents of the ASIC_DB.
51|Verify that the kernel vrf config stays the same during a vrfmgrd warm-reboot.
52|Verify that the VIRTUAL_ROUTER/ROUTE_ENTRY/NEIGH_ENTRY objects in the ASIC_DB stay the same during a vrfmgrd warm-reboot by monitoring object changes in the ASIC_DB.
53|Verify that vrfmgrd works well after a warm-reboot by checking that new config is pushed correctly to the APP_DB and that ping works via vrf port interfaces.

diff --git a/doc/ztp/SONiC-config-setup.md b/doc/ztp/SONiC-config-setup.md
new file mode 100644
index 0000000000..c848a3b157
--- /dev/null
+++ b/doc/ztp/SONiC-config-setup.md
@@ -0,0 +1,291 @@
# SONiC Configuration Setup Service

# High Level Design Document
#### Rev 0.2

# Table of Contents
  * [List of Tables](#list-of-tables)
  * [Revision](#revision)
  * [About This Manual](#about-this-manual)
  * [Scope](#scope)
  * [Definitions](#definitions)
  * [1. Feature Overview](#1-feature-overview)
    * [1.1 Requirements](#11-requirements)
    * [1.2 Design Overview](#12-design-overview)
  * [2. Functionality](#2-functionality)
    * [2.1 Target Deployment Use Cases](#21-target-deployment-use-cases)
    * [2.2 Functional Description](#22-functional-description)
    * [2.3 CLI](#23-cli)
  * [3. Unit Tests](#3-unit-tests)

# List of Tables
[Table 1: Definitions](#table-1-definitions)

# Revision
| Rev | Date       | Author             | Change Description                      |
|:---:|:----------:|:------------------:|-----------------------------------------|
| 0.1 | 07/16/2019 | Rajendra Dendukuri | Initial version                         |
| 0.2 | 07/22/2019 | Rajendra Dendukuri | Update Test plan, fixed review comments |

# About this Manual
This document provides details about how switch configuration is handled on a SONiC device.

# Scope
This document describes the functional behavior of the proposed config-setup service. It also explains the dependencies between various other SWSS services and the config-setup service.

# Definitions
### Table 1: Definitions
| **Term**       | **Definition**               |
| -------------- | ---------------------------- |
| Config DB      | SONiC Configuration Database |
| config-setup   | Configuration Setup          |
| startup-config | /etc/sonic/config_db.json    |
| ZTP            | Zero Touch Provisioning      |

# 1 Feature Overview

SONiC switch configuration is stored in a Redis database instance called Config DB. The contents of Config DB reflect most of the configuration of a SONiC switch and can be saved to the file */etc/sonic/config_db.json* using the *config save* CLI command. During switch bootup, Config DB is populated with the intended configuration present in the file */etc/sonic/config_db.json*. Throughout this document the term startup-configuration is used to refer to */etc/sonic/config_db.json*.

When a new SONiC firmware version is installed, the newly installed image does not include a *startup-configuration*; a startup-config has to be created on first boot. Also, when the user upgrades from firmware version A to version B, the startup-config needs to be migrated to the new version B.
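For reference, the save/load round trip between Config DB and the startup-config file described above uses the standard SONiC commands; a brief illustration is shown below (note that *config reload* is disruptive, as it clears the running configuration before re-applying the file):

```bash
# Save the running Config DB contents to the startup-config file
config save -y                 # writes /etc/sonic/config_db.json

# Inspect the saved startup-config
sonic-cfggen -j /etc/sonic/config_db.json --print-data | head

# Load the startup-config back into Config DB
config reload -y
```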
SONiC is a collection of various switch applications whose configuration is not always stored in Config DB and hence is not always present in the startup-config file. First-boot configuration creation and configuration migration for these applications also need to be handled so that all applications on a SONiC switch provide their intended functionality.

A new service, ***config-setup***, is proposed to take care of all of the activities described above. Some of the functions provided by the config-setup service are already handled by the *updategraph* service. As part of the proposed changes, the functionality dealing with configuration management is moved from *updategraph* to the *config-setup* service. In the future, the *updategraph* service can be removed altogether and config-setup can become the single place where SONiC configuration files are managed.

## 1.1 Requirements

### 1.1.1 Functional Requirements

1. The config-setup service must provide the ability to generate a configuration when none is present.

2. The config-setup service must be extensible enough that additions can be made without requiring changes to the core service script.

3. It must be backward compatible with the current SONiC configuration generation scheme, which makes use of the t1/l2/empty config presets.

4. It must support configuration of components beyond the *startup-config* (*/etc/sonic/config_db.json*) file, for example FRR configuration.

5. It should support intermediate reboots during the config initialization process. For example, some changes to switch SDK files may require a reboot to be applied to hardware.

6. It must provide infrastructure to support config migration when the user installs a new SONiC firmware version and reboots to switch to the newly installed version.

7. The config-setup service must also take into consideration other methods of configuring a switch, e.g. ZTP and updategraph.

### 1.1.2 Configuration and Management Requirements

The *config-setup* service provides a command line tool, */usr/bin/config-setup*, which provides the following functionality:

1. Create the factory default configuration on first boot
2. Create the factory default configuration on demand
3. Take a backup of configuration files when a new SONiC firmware version is installed
4. Restore the backed-up configuration files and apply them when the user switches from SONiC firmware version A to version B

### 1.1.3 Warm Boot Requirements

When the switch reboots in warm boot mode, the *config-setup* service must ensure that the config migration steps do not affect warm boot functionality.

## 1.2 Design Overview

### 1.2.1 Basic Approach

The *config-setup* service is heavily inspired by the functionality provided by the existing *updategraph* service. The config_migration and config_initialization sections of updategraph have been migrated to the new *config-setup* service. The core logic is kept intact, and some additional enhancements have been made to handle the case where the ZTP feature is available and is administratively enabled or disabled.

In addition to this borrowed functionality, a few provisions for user-defined extensions (hooks) have been added to perform implementation-specific customizations.

### 1.2.2 Systemd service dependencies

The *config-setup.service* runs after *rc-local.service* and *database.service*, and requires *database.service*. The *updategraph.service* requires *config-setup.service* and runs only after *config-setup.service*. In the future, when the *updategraph.service* is removed from SONiC, all systemd services that currently depend on *updategraph.service* will be modified to depend on *config-setup.service* instead.
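Once the proposed unit is installed, this ordering can be confirmed with systemd. The commands below are illustrative; *config-setup.service* is the unit proposed in this document and does not exist on current images.

```bash
# Show the ordering/requirement relationships of the proposed units
systemctl show -p After -p Requires config-setup.service
systemctl show -p After -p Requires updategraph.service

# Alternatively, view the dependency tree rooted at config-setup.service
systemctl list-dependencies config-setup.service
```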
# 2 Functionality

## 2.1 Target Deployment Use Cases

The *config-setup* service is used to handle the following cases:

1. The switch boots a SONiC image that was installed on an empty storage device

2. A new SONiC firmware image is installed and the user switches to the newly installed image

3. The SONiC switch boots without a startup-config file present

## 2.2 Functional Description

### 2.2.1 Config Setup Hooks

To extend the functionality of the *config-setup* script, users are not expected to modify the script directly. Instead, users place executable *hook* scripts in the corresponding hooks directory to extend the config-setup script's capability. All executable scripts in the hooks directory are invoked inline using the Bourne shell '.' command. The exit status of the config-setup action is passed to each hook in the *exit_status* shell variable, and is always zero if the script succeeded at the task for which it was invoked. A hook script can modify the value of exit_status to change the exit status of the config-setup action.

Within a hook script, a system reboot operation can be performed if required. If such an event occurs, the config-setup service resumes executing hooks from the point where the reboot was issued; any hooks that were previously executed are not executed again. The condition for initiating a reboot and the reboot operation itself are not implemented in the config-setup service; this logic needs to be implemented as part of the hook script.

The supported hooks for including user extensions are described in the following sections.

### 2.2.2 Config Initialization

When a SONiC switch boots, the config-setup service starts once the database service is up. It detects whether the switch is booting for the first time by checking for the existence of the */tmp/pending_config_initialization* file.

At this point, if ZTP is enabled, the config-setup service loads the ZTP configuration into Config DB and the rest of the boot process continues. The ZTP service then performs provisioning data discovery and downloads the configuration for the switch. If updategraph is enabled, the *config-setup* service exits and lets updategraph handle the rest of the config initialization steps.

If neither ZTP nor updategraph is enabled, the config-setup service is responsible for creating a factory default startup-configuration for the switch and loading it into Config DB.

#### Factory Default Config Hooks

Hooks Directory: */etc/config-setup/factory-default-hooks.d*

If defined by the user, these hooks are executed during the factory default config creation step of the *config-setup* service. The user can choose to create the startup-config file, FRR configuration files and any other files using pre-defined logic. Various application packages can install hooks with recipes to create their own factory default configuration, and the config-setup service will execute these recipes.

Below is an example hook to generate a config_db.json using a pre-defined factory default template file.
*/etc/config-setup/factory-default-hooks.d/10-02-render-config-db-json*

```bash
# Render the config_db.json.j2 template if it is defined
CONFIG_DB_TEMPLATE=/usr/share/sonic/templates/config_db.json.j2

PLATFORM=$(sonic-cfggen -H -v DEVICE_METADATA.localhost.platform)
PRESET=($(head -n 1 /usr/share/sonic/device/${PLATFORM}/default_sku))
HW_KEY=${PRESET[0]}

if [ -f ${CONFIG_DB_TEMPLATE} ] && [ "${HW_KEY}" != "" ]; then
    sonic-cfggen -H -k ${HW_KEY} -p -t ${CONFIG_DB_TEMPLATE} > /etc/sonic/config_db.json
    exit_status=$?
fi
```

#### Preset Configuration

If a startup-config was not created by the factory default hooks, the config-setup service uses the config preset defined in the /usr/share/sonic/device/*$platform*/*default_sku* file. This is the behavior currently provided by the updategraph service.

### 2.2.3 Config Migration

The user can use the *sonic_installer* utility to install a new version of SONiC firmware. As part of this procedure, sonic_installer takes a backup of all files in the directory */etc/sonic* and copies them to */host/old_config*. Later, when the switch boots into the newly installed image, the file */tmp/pending_config_migration* is created by the rc.local service and the config-setup service takes note of it. The backed-up files in /host/old_config are then restored to the /etc/sonic directory and the restored startup-config file is loaded into Config DB.

#### Config migration Hooks - Pre (Take Backup)

Hooks Directory: */etc/config-setup/config-migration-pre-hooks.d*

Config migration pre-hooks give various applications the ability to extend the config migration step and define their own backup scripts, which are invoked when */etc/sonic* is backed up. Based on the specific requirements of an application, the corresponding config-migration-pre-hook implements the appropriate recipe. These hooks can be invoked by using the "config-setup backup" command.

Below is an example hook to take a copy of the known ssh hosts file.

*/etc/config-setup/config-migration-pre-hooks.d/backup-known-ssh-known-hosts*

```bash
# Take a backup of the known_hosts file for all users
mkdir -p /host/ssh_config
for user in $(ls -1 /home); do
    if [ -e /home/${user}/.ssh/known_hosts ]; then
        cp /home/${user}/.ssh/known_hosts /host/ssh_config/${user}_known_hosts
    fi
done
```

#### Config migration Hooks - Post (Restore Backup)

Hooks Directory: */etc/config-setup/config-migration-post-hooks.d*

These hooks are executed by the config-setup service when the switch boots into a newly installed image and a snapshot of backed-up configuration files is found in the /host/old_config directory. Applications can install corresponding hooks in the *config-migration-post-hooks.d* directory which restore the files that were backed up by the *config-migration-pre-hooks*.

Below is an example hook to restore the known ssh hosts file from the backed-up files.

*/etc/config-setup/config-migration-post-hooks.d/restore-ssh-known-hosts*

```bash
# Restore the known_hosts file for all users
if [ -d /host/ssh_config ]; then
    for user in $(ls -1 /home); do
        if [ -e /host/ssh_config/${user}_known_hosts ]; then
            cp -f /host/ssh_config/${user}_known_hosts /home/${user}/.ssh/known_hosts
        fi
    done
fi
# Remove the backup copies
rm -rf /host/ssh_config/*_known_hosts
```

### 2.2.4 Config Detection

On every switch bootup, the config-setup service starts and detects whether a startup-config file is present. If no startup-config exists, the config initialization action described in section 2.2.2 is performed. A simplified sketch of this decision is shown below.
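The following is a minimal illustration of the detection logic, not the actual service script; the file names are the ones referenced elsewhere in this document.

```bash
#!/bin/bash
# Simplified decision flow for "config-setup boot" (illustrative only)
if [ -e /tmp/pending_config_migration ]; then
    echo "new image first boot: restore /host/old_config and load the startup-config"
elif [ -e /tmp/pending_config_initialization ] || [ ! -e /etc/sonic/config_db.json ]; then
    echo "no startup-config: run factory-default hooks, ZTP or updategraph"
else
    echo "startup-config present: nothing to do"
fi
```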
## 2.3 CLI
The following commands are supported by the config-setup tool. These are Linux shell commands.

***config-setup factory***

This command is executed to create the factory default configuration. Please refer to section 2.2.2 for more details. Note that this command simply creates a configuration and does not load it into Config DB. It is up to the calling entity to execute either 'config reload' or a switch reboot for the configuration to take effect.

***config-setup backup***

This command is executed to take a backup copy of the SONiC configuration files that need to be migrated when a new firmware image is installed and booted into. Please refer to section 2.2.3 for more details.

***config-setup boot***

This command is executed as part of system bootup by the config-setup service. Users must not execute this command from the Linux shell even though it is possible to do so. The actions performed by this command are described in section 2.2.

# 3 Unit Tests

1. Install a SONiC firmware image on an empty disk using ONIE or a similar bare metal provisioning tool. Verify that a factory default configuration is created.
2. Delete the startup-config and reboot the switch. Verify that a factory default configuration is created.
3. Install a new SONiC firmware image using the "sonic_installer install" command and reboot the switch to boot into the newly installed image. Verify that the startup-config file used by the old SONiC firmware is restored and used by the new image.
4. When ZTP is enabled, verify that the ZTP configuration is loaded when the SONiC switch boots without a startup-config. A factory default config is not created in this case.
5. Verify that the updategraph service takes over the config creation role if it is enabled and the SONiC switch boots without a startup-config.
6. Verify that user-specified factory-default hooks are executed when the switch boots without a startup-config.
7. Verify that user-specified config-migration-pre hooks are executed when a new SONiC firmware image is installed.
8. Verify that user-specified config-migration-post hooks are executed when the switch boots into the newly installed SONiC image.
9. Verify that the exit status of user-defined hooks is correctly read and reported by the config-setup service in syslog. If a hook script is syntactically incorrect, it is reported with a failed exit status.
10. When the switch boots with a saved startup-config, verify that the config-setup service does not perform any additional actions.
11. Verify that the user can execute the "config-setup factory" command manually without requiring a switch reboot.
12. Verify that the user can execute the "config-setup backup" command multiple times so that any new configuration file changes are picked up again.
13. Verify that when a config-setup hook script issues a switch reboot, the hook scripts that were previously executed are not executed again on the subsequent switch boot.
14. Verify that config-migration hook scripts are executed even when the switch boots in warm boot mode. This allows the scripts to perform any actions required for applications which are not covered by the warm boot functionality.
15. Verify that config-setup hook scripts are not executed when the switch boots in warm boot mode.
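As an illustration of unit tests 2 and 11, a minimal manual check on a lab switch might look like the sketch below. It assumes the *config-setup* utility proposed in this document is installed, and it intentionally removes the startup-config, so it must not be run on a production device.

```bash
#!/bin/bash
set -e

sudo rm -f /etc/sonic/config_db.json      # simulate a missing startup-config (lab switch only!)
sudo config-setup factory                 # should create a factory default configuration

test -s /etc/sonic/config_db.json         # the file exists and is non-empty
# parsing the file back with sonic-cfggen is a cheap structural sanity check
sonic-cfggen -j /etc/sonic/config_db.json --print-data > /dev/null
echo "factory default configuration created and parses cleanly"
```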
diff --git a/doc/ztp/images/sonic_config_flow.png b/doc/ztp/images/sonic_config_flow.png new file mode 100644 index 0000000000..baa5749d01 Binary files /dev/null and b/doc/ztp/images/sonic_config_flow.png differ diff --git a/doc/ztp/images/sonic_config_flow.xml b/doc/ztp/images/sonic_config_flow.xml new file mode 100644 index 0000000000..5c799615df --- /dev/null +++ b/doc/ztp/images/sonic_config_flow.xml @@ -0,0 +1 @@ +7V1be6I4GP41vbQPkBDwctrpdA8zbbfdU+cOBZVdFAdxa/vrNyhBciiEGCjVzs2UiBH43rzfMR9n4HK+uU685exb7AfRmWX4mzPw+cyyTMNF+L9s5Hk34iBnNzBNQj8/aT/wEL4E5Jv56Dr0gxV1YhrHURou6cFxvFgE45Qa85IkfqJPm8QR/atLbxpwAw9jL+JH/wr9dLYbdW1jP/5TEE5n5JdNI/9k7pGT84HVzPPjp9IQuDoDl0kcp7u/5pvLIMoeXvFcbl5++zFAi19enqL5j3BsvNz+OdhN9qXJV4pbSIJFqjz1aLbcjB9vL+6NX6/8Z/Prr/HiO5n6Py9a58/r4SlMxzM8dpHd2O7G02fyNPGUWHD44OJpFqbBw9IbZ588YezgsVk6j/CRif+chFF0GUdxsv0e8L3AnYzx+CpN4n+D0ido7AajSfEJkZGVzREv0of8t2WfAbmhIEmDTQkB+TO5DuJ5kCbP+BTyqZl/Jcc3kfbTHizDfGhWwgnBj5fDc1pMvBcB/iOXQgOJWJxEeCEs/E/ZysBH48hbrcIx/ezxE0qe/8YHxrlNDh+zQ3LweUMdPZOjTZiWvoaPHsmM+O/9l7ID8h1eas0EtYrXyTioR2jqJdMgrTjPBbsTA5/iA17uJcHaFYJNgshLw/9oFhFJO/+FuzjE97uHlQ3xUywDCzGI2d15/rXyuq2dyTSZqXYPh5tqi7/iztUhaSIBJlGU5kuUAif6sY7JB4PVVht8wieYxnKz/xD/Nc3+x5w/CafrBD/oeJHJbhOuUnxPX8j0+HJ3v7A7n1sJySyej9arxnQ0ccfBWEhHI9eGthjY3dCR4TLStiDiGQnwwHXbYiRzyIn/EWt0dVaiOGlPUTWsZJY4ac9Qsqx0gPDqKQpJchThlp5wlEVMnRxpwLLVKMpyuInOmalapiiitbVT1Pff7/BnVwtvFAV+D2iJxvHh+reJoUTzEhgaRMgluLoCuLbGTBZvvioxkwLDqLJZt8xElkUtMwGzV8xkDmmsQTRkCUWVm/Ay6pqbeJP+JlYGaRPjXNURaN2ol8alM+wXLm0al7bJgUkal6bBTtUpKoFIY7KYxE+dUNMqmM6x8K/2QxdtWXw8/ArwN2BozZCFso4ogr2CrEvDzGGVsSxemXkK27EGrhgg3nPptGV2wur1yx3SP+OaRuVVMaebwyGzVna/r3fl8FbHZRJ4aYDHdubi5dav5ZZTEq8XfuDnq6NhAM0OXB+KTEPXGgGE3s5jxRqVVrCGLWkXthZDA7zG3QnmLonHwYq3EI9TMsA1mVgCJMHOkmQcyEsGtiYZUSjpXQUSNKsVIKtWCKZ7olYAqjWrZTULBG9toQNHgMqPAGcrpIRscM4GEnhS6jTACVwNpNRClFKd6nSTlMWT1F+T20d7FS1HP/4erb4/vHz7aq0HSJKjciwY54YJACf8FmkLknT24bRlMw4cBMzVvEJa+4nIifFksgraITY+dM8j+zAnrxrEVFpRJbxmdgV8IZ4tR1I79yxOYTMq1WFNbekoBZMisFnLUJPbV1ic5IItVH1dkL0uh1k/LTh+kA+ZiMLNWBGm9BpJAmwoZBmEHKX508Bn2xdn9mc84kXhdJEtMYzbAGvqi0yhhmMv+pR/MA99f7seI28URBfe+N/p1mkp6/ztv3o9Xs0UrCIvKnPy6z8rF7+ISH1gnDuYHCnxmIehun2ihBU+/cQbp/H2Dv1g4q2jdPdMTsfDNw1aaRYxpBKZFWVTZTZjSx60WWw2vxK/xp5fCKawuE9CPpY9pORj2a5sBAa1JCASBK0yPFYzb7k1Otbzq6KsbBkkIb6EjAM/58Vmd/uhOpGNCl68XadRuAjEopx4vuOJvSOIbAO2mUWFBlvN4wKBtIBAWqCt5YRE/q+iA3QU+SkC31q7z5T1eDqq6GB8Ei7EqpqdAh1npxDvuBSFqbeYI7b07kU8So+uPtUxmPpUIbmbIr6wrZb4wrE46XzUXYgCivV+o90r/kD1oVhZBkFc3YVceEQXg5An+2GBsITCCsaUtD4q0HQYm2jNCZ0jiGhqGDpmDTmIbRejkogERQrZnCWotBGJJaiur0LoV0AKMeUDkLVllWlFsg5BG61IRFRPklZMMGQcG5ukDd7KrXH5KIGyjXIUbg0Bby17kHBYT9iDq5UDXH25+m4a2+66WN3lzefTxiXZu/XeauuKeYpQjqq7zUxkEUR0hUjAIZLUsBnzcHpS0VTTYKumiv12VLzb5WHVWtmUKwqn7ipIRvvakYYVK66oYuXh9ibE15iXMO4rV75EmIteK1ophkevFrLwSTFa8Is4s2kolORD8qkxEQL3GG1ujTeotGPSkUAQoRHtInY0AEaYuObXc3MNo01bqOgqzRpGFJARPrh+KRiHSb9DVQWDHDqKaKPWFIzwsZLgTxWBZZtilJik4/Q6D83KJagloQ4QOAxQWlPoVStHew0lli4GZzp4SZeDRTwgmfeP+kluLXdQP1lV9XeA5IWmyLZkH3nzTFKL0Wq5/QjP5aVYoHjN4Pm2UzJn5KySFBbNH5eXVw8P/AeniyE4pLUBGPIGruMIEKQhBCxEkMYC3C4cYnXjpGr91Nom/QrpQp6I2HIS6VJaUn1GJjKYiVq2TkQxXd2qK14MJl4YrZPghHmHRL4KOXdX+S8UvMlHiAVbs9/c4tQpnMbmJ3JpZcH5IPVUMTBtSM1BjOP2rVJT5Hgw8j3JfA3bF8G2uitCE9fai2IWai5i8z0LSqUopY0TcZLO4mlWkkTvnRDbF3X5aM32hSDrI/bg+tVCDfFZHzXzgp2IYzBNmxiQSddJ23kaTdemBPGy+ai0fa3OhbZOHUEUv606F7GkJCqSDu24oWUXYve0Jd1xo3IJ9IS3LHa7hGrnR5fhLa53hybecqqbbrTEW69HjXSrez19TXsSLBBUplduXezJomATEAhaisqcmchprzJd/FxFsSpF3NYakHT2cn/O1zhe5hD+J0jT57zttrdO41pLt/Mt5dI7aw8j90NTbUxDC4evMWobWe12oRhMvCjKzLbTjT/xuRPkipoVdRqDkmrEpuzQNilp7om7aglU3HvYPME6HEjZ7rOYiTpWcYIGZ6eNSFPgibyHNqoW5Ipd
TeXKWX4y2GblrBiZVqvIbBL860eDEtnQntkvYAKGKm2oCErIUCUcdgxIiaD1dmvkMRU2kUWopbLJdN3DINV+DglIhFg7Yp33knKQjVL0jJegRTuBylEKtl0uMpiJ2uYlieja0fESWaY6eMnFxERJsE8tjCqzdx+JINbrZ+LxkDyYEqdYAk4xNTTHEUtKYnE2jfwMRYEfeoU33rOhoNM6bM2rWWPJFuH1LEluIwbcZItc87g62xmTmahljQU19CA6KkBC2U2RoF+ZHtt1aCBB9R4iTEEFlOw+qQ2TEnXJJ6lS2f6iFukI9Fa1FVCm0W03VWH9aNlMFnOZPaKldRP/MbTv72eLlbm24PXmOn9pbF/IAxI/iQBrqKjOAFPTYJEwQ1cb3lrN6vQbkJVc+t4617AGluUqR9DZ3jV4qm4xKVOApimv07eNN4e9uNPqF0uyryEDDjOF8os7Ybcbb2yR2j7ubcHFItQRpQLOsPfRc1tDezcB69RHwgteshQSzg3Tev+s50tyOV4yJiO7siykn7lEb+R4Dx1zWLOsOG7MXEykvq330bEXTN6gJWt3kvckSTfabXi+7jfeiZfw65Vp6u1nzNdft+yHq+r3LR+TPiD0qEMf0M7TwOqRbhA6nxqzVG9cTNuVHVvlw+t7h9yhXjxLaqTxoH6zsup59LQhxc83g7v72+v7j6YUFH0B9g2zJCPQQVMKIYr0tSw61ryFLBlVBoR6YplCh60Nt1xH+d1wlsRkLZOgRMvAnbXljbAupIwtrUhurWRp7+A58g6eOtgPezNi39AOmAiSqh9mM29/Bmx/AE1+GHvBdX4Yd4M1fhV3H/T5Tf0qfJjEmfGwPx1r5tm32A+yM/4H \ No newline at end of file diff --git a/doc/ztp/ztp.md b/doc/ztp/ztp.md index 7e37c9c1c0..53c41b8ffd 100644 --- a/doc/ztp/ztp.md +++ b/doc/ztp/ztp.md @@ -1,6 +1,6 @@ # Zero Touch Provisioning (ZTP) -### Rev 0.2 +### Rev 0.9 ## Table of Contents - [1. Revision](#1-revision) @@ -11,11 +11,15 @@ - [3.3 Dynamic Content](#33-dynamic-content) - [3.4 DHCP Options](#34-dhcp-options) - [3.5 ZTP Service](#35-ztp-service) - - [3.6 Provisioning over in-band network](#36-provisioning-over-in-band-network) - - [3.7 Component Interactions](#37-components-interactions) + - [3.6 Start and exit conditions](#36-start-and-exit-conditions) + - [3.7 Provisioning over in-band network](#37-provisioning-over-in-band-network) + - [3.8 Component Interactions](#38-components-interactions) - [4. Chronology of Events](#4-chronology-of-events) + - [4.1 SONiC ZTP Flow Diagram](#41-sonic-ztp-flow-diagram) - [5. Security Considerations](#5-security-considerations) - [6. Configuring ZTP](#6-configuring-ztp) + - [6.1 Show Commands](#61-show-commands) + - [6.2 Configuration Commands](#62-configuration-commands) - [7. Code Structure](#7-code-structure) - [8. Logging](#8-logging) - [9. Debug](#9-debug) @@ -28,25 +32,35 @@ |:---:|:------------:|:------------------:|-----------------------------------| | v0.1 | 03/06/2019 | Rajendra Dendukuri | Initial version | | v0.2 | 04/17/2019 | Rajendra Dendukuri | Added: suspend-exit-code, in-band provisioning, interaction with updategraph, Test plan | +| v0.9 | 09/17/2019 | Rajendra Dendukuri | Update the design document as per the ZTP code | ## 2. Requirements -- When a newly deployed SONiC switch boots for the first time, it should allow automatic setup of the switch without any user intervention. This framework is called as Zero Touch Provisioning or in short ZTP. -- DHCP offer sent to a SONiC switch will kickstart ZTP. -- ZTP should allow users to perform one or more configuration tasks. Data and logic used for these configuration task can be defined by the user. It should also allow ordering of these configuration tasks as defined by the user. -- ZTP should allow users to suspend a configuration task and move on to the next one. ZTP resumes the incomplete task later after finishing rest of the tasks in the order of execution. -- Switch reboots during ZTP should be supported. ZTP should resume from where it had left prior to reboot. -- Configuration tasks should be completely user defined. Few predefined tasks shall be provided as part of default switch image. However, user should be able to override the logic of predefined tasks with user supplied logic (script). 
-- Include switch information while requesting for files from a remote provisioning server. This allows remote server to provide switch specific files at runtime. -- ZTP output and result should be logged through syslog. -- ZTP is expected to run to completion only after all the configuration tasks are completed. Result is either SUCCESS/FAILED. At this point ZTP exits and does not run again. It requires user intervention to re-enable ZTP. -- Manual interruption of ZTP service should be allowed. It should result in ZTP to be disabled and user intervention is needed to re-enable it. -- User should be able to view completion status of each configuration task and ZTP completion status as a whole. -- ZTP feature should be a build time selection option. By default it is not included in the image. -- Provide optional security features to allow encryption and authentication while exchanging sensitive information between the switch and remote provisioning server. -- Example template to demonstrate download, validate and install of a SONiC image file can be provided. -- ZTP should be able to provisioning the switch over in-band network in addition to out-of-band management network. The first interface to provide provisioning data will be used and any provisioning data provided by other interfaces is ignored. -- Both IPv4 and IPv6 DHCP discovery and ZTP provisioning should be supported. +1. When a newly deployed SONiC switch boots for the first time, it must allow automatic setup of the switch without any user intervention. This framework is called as Zero Touch Provisioning or in short ZTP. +2. DHCP offer sent to a SONiC switch will kickstart ZTP. Unless ZTP is disabled by user, it must wait forever till provisioning of the switch is completed. +3. ZTP must allow users to perform one or more configuration tasks. Data and logic used for these configuration task can be defined by the user. It must also allow ordering of these configuration tasks as defined by the user. This information is represented as in JSON format. DHCP option (67) in the DHCP offer contains the url to the JSON file. This allows ZTP to download and process the data to execute the described configuration tasks. +4. ZTP should allow users to suspend a configuration task and move on to the next one. ZTP resumes the incomplete task later after finishing rest of the tasks in the order of execution. The *suspend-exit-code* object is used to for this feature. +5. Switch reboots during ZTP must be supported. ZTP should resume from where it had left prior to reboot. +6. Configuration tasks must be completely user defined. Few predefined tasks shall be provided as part of default switch image. However, user must be able to override the logic of predefined tasks with user supplied logic (script). +7. Pre-defined configuration tasks to download and apply Config DB JSON, SNMP community string, download, validate and install/remove of a SONiC firmware image must be included as part of ZTP. +8. ZTP should allow user to download and execute a single provisioning script as a secondary alternative to defining a workflow using a JSON file. A different DHCP option (239) should be used to specify the url string for the provisioning script. +9. ZTP must be backward compatible with the legacy SONiC provisioning solution using updategraph service. It must provide a provision to download and apply minigraph.xml and acl.json. +10. ZTP service must not block other switch applications from continuing to boot. 
As part of configuration tasks appropriate action (e.g restart service) is taken to apply the configuration required by switch applications. +11. ZTP must include switch information while requesting for files from a remote provisioning server. This allows remote server to provide switch specific files at runtime. The information must include switch Product-Name, switch Serial-Number, SONiC software version running on the switch and Base-MAC-Address. HTTP headers are used to send this information. +12. While specifying url strings to files that can be downloaded from a remote server, ZTP must allow options to construct url string at runtime using switch information. This allows a switch to request for a file that is appropriate for it from the remote provisioning server. The information used to construct the url string includes switch Product-Name, switch Serial-Number, SONiC software version running on the switch and Base-MAC-Address. The *dynamic-url* object is used to satisfy this requirement. +13. ZTP must support HTTP, HTTPS, TFTP, FTP and scp protocols over IPv4 and IPv6 transport to request files from a remote provisioning server. This includes JSON file and provisioning scripts. +14. ZTP output and result must be logged through syslog and a local file. User must be able to configure ZTP in debug mode to see more verbose output of ZTP process. +15. ZTP is expected to run to completion only after all the configuration tasks are completed. Result is either SUCCESS/FAILED. At this point ZTP exits and does not run again. It requires user intervention to re-enable ZTP. +16. ZTP must provide an option to user to ignore result of configuration tasks to determine the result (SUCCESS/FAILED) of ZTP. The *ignore-result* object is used to specify this option. +17. ZTP should provide an option to user to reboot the switch when a configuration task succeeds or fails. The *reboot-on-success* and *reboot-on-failure* objects are used to specify this option. +18. When an error is encountered in a configuration task, ZTP must provide an option to the user to stop execution of more configuration tasks. User intervention is required to inspect and re-start ZTP. The *halt-on-failure* object is used to specify this option. +19. Manual interruption of ZTP service must be allowed. It must result in ZTP to be disabled and user intervention is needed to re-enable it. The commands ztp enable/ztp disable/ztp run are used to perform this operation. +20. ZTP status command issue by user must include completion status of each configuration task and ZTP completion status as a whole. A date/timestamp is also recorded when each configuration tasks status has changed. +21. All files created during a ZTP session must be stored in a persistent location for user to inspect. When a new ZTP session is started, this data is deleted. */var/lib/ztp* is the location where this data is stored. +22. ZTP feature must be a build time selection option. By default it is not included in the image. +23. Provide optional security features to allow encryption and authentication while exchanging sensitive information between the switch and remote provisioning server. +24. ZTP must be able to provisioning the switch over in-band network in addition to out-of-band management network. The first interface to provide provisioning data will be used and any provisioning data provided by other interfaces is ignored. +25. Both IPv4 and IPv6 DHCP discovery and ZTP provisioning should be supported. ## 3. 
Functional Description @@ -56,7 +70,7 @@ Zero Touch Provisioning (ZTP) service can be used by users to configure a fleet SONiC consists of many pre-installed software modules that are part of default image. Some of these modules are network protocol applications (e.g FRR) and some provide support services (e.g syslog, DNS). Data and logic to configure various SONiC modules is encoded in a user defined input file in JSON format. This data file is referred to as ZTP JSON. -When SONiC switch boots for first time, ZTP service checks if there is an existing ZTP JSON file. If no such file exists, DHCP Option 67 value is used to obtain the URL of ZTP JSON file. ZTP service then downloads the ZTP JSON file and processes the file. If DHCP Option 67 value is not available, ZTP service waits till it is provided by the DHCP server. +When SONiC switch boots for first time, ZTP service checks if there is an existing ZTP JSON file. If no such file exists, DHCP Option 67 value is used to obtain the URL of ZTP JSON file. ZTP service then downloads the ZTP JSON file and processes the file. If DHCP Option 67 value is not available, ZTP service waits forever till it is provided by the DHCP server. Other switch services include SWSS continue to boot. If a ZTP JSON file is already present on the switch, ZTP service uses it to perform next steps. @@ -86,17 +100,12 @@ Below is an example configuration section used for configuring SNMP community s ```json "snmp": { "ignore-result": false, - "community-ro": [ - "public", - "local" - ], - "community-rw": [ - "private" - ] + "community-ro": "public", + "snmp-location": "ny-t32-r02" } ``` -Each section has a unique name, *snmp* in above example. It provides sufficient data to configure a single or a set of modules on the switch. In the *snmp* example, a list of read only and read write SNMP community strings are provided as values of the *community-rw* and *community-ro* objects. ZTP service invokes the logic which takes these values and adds them to "/etc/sonic/snmp.yml" file and restarts SNMP daemon. +Each section has a unique name, *snmp* in above example. It provides sufficient data to configure a single or a set of modules on the switch. In the *snmp* example, the read write SNMP community string and SNMP location are provided as values of the *community-ro* and *snmp-location* objects. ZTP service invokes the logic which takes these values and adds them to "/etc/sonic/snmp.yml" file and restarts SNMP daemon. Also each configuration section of ZTP JSON includes some common objects that are used to influence its execution. They also help track progress of individual section and the entire ZTP activity. @@ -111,11 +120,19 @@ Also each configuration section of ZTP JSON includes some common objects that ar Default value assumed to be BOOT if the object is not present. ZTP service adds this object to the ZTP JSON file if not found. +- **description**: Optional free from textual string used to describe a configuration section defined in the ZTP JSON file. + +- **exit-code**: Indicates the program exit code obtained after processing the configuration section. 
+ - **ignore-result** : + - false - ZTP service marks status as FAILED if an error is encountered while processing this individual section - - true - ZTP service marks status as SUCCESS even if an error is encountered while processing this individual section +- true - ZTP service marks status as SUCCESS even if an error is encountered while processing this individual section + - Default value is assumed to be *false* if the object is not present. +Default value is assumed to be *false* if the object is not present. + +- **start-timestamp** : Specifies the time and date when ZTP service began processing the configuration section. This object is also available for the top level ztp section and it indicates the time and date when ZTP service started. This object is used to calculate the processing time. ZTP service adds this object to the ZTP JSON file - **timestamp** : Specifies the time and date when the *status* variable of a section is modified. @@ -144,11 +161,50 @@ Also each configuration section of ZTP JSON includes some common objects that ar Default value is assumed to be *false* if the object is not present. -- **ztp-json-source** : This object defines the source from which the ZTP JSON file was downloaded from. This object is applicable only for the overarching ztp object and not individual configuration sections. Default value is assumed to be *DHCP* if the object is not present. +- **restart-ztp-on-failure**: + + - true - ZTP procedure is restarted if the result of ZTP is FAILED after processing all of the configuration sections defined in the ZTP JSON file. +- false - ZTP service exits after processing all of the configuration sections defined in the ZTP JSON file. + + Default value is assumed to be false if the object is not present. + + This object is applicable only for the top level *ztp* object in the ZTP JSON file. + +- **restart-ztp-no-config**: + + - true - ZTP procedure is restarted if the configuration file */etc/sonic/config_db.json* is not present after the completion of processing the configuration sections defined in the ZTP JSON file. + - false - ZTP service exits after processing all of the configuration sections defined in the ZTP JSON file even if the configuration file */etc/sonic/config_db.json* is not present. + + Default value is assumed to be true if the object is not present. - - DHCP - ZTP service downloaded the ZTP JSON file using the URL specified in the DHCP option 67 received by the switch when it obtained an IP address. + This object is applicable only for the top level *ztp* object in the ZTP JSON file. - - local_fs - This value should be used if the ZTP JSON file has been included in the SONiC image as part of build. When this value is set, ZTP service ignores the URL provided in DHCP Option 67 and processes only the file on disk. This option can be used in scenarios where some DHCP server is not present or cannot be possible and some initial configuration steps need to be performed on the switch on boot. +- **config-fallback**: + + - true - Factory default configuration is generated if the configuration file */etc/sonic/config_db.json* is not present after the completion of processing the configuration sections defined in the ZTP JSON file. +- false - ZTP service takes action based on the value of the object *restart-ztp-no-config*. + + Default value is assumed to be false if the object is not present. + + This object is applicable only for the top level *ztp* object in the ZTP JSON file. 
+ +- **ztp-json-version**: This object defines the version of the ZTP JSON file. This object can be used in future to perform ZTP JSON data migration between different old and newer versions of sonic-ztp package. If not set by the user, the ZTP service assigns a value to it specifying the version number that is compliant with the current version of the ZTP service. This object is applicable only for the top level *ztp* object in the ZTP JSON file. + +- **ztp-json-source** : This object defines the source from which the ZTP JSON file was downloaded from. It also indicates the interface from which ZTP JSON file URL was learnt. This object is applicable only for the top level *ztp* object in the ZTP JSON file. The following are the possible values this object can have: + + - dhcp-opt67 - ZTP service downloaded the ZTP JSON file using the URL specified in the DHCP option 67 received by the switch when it obtained an IP address. + + - dhcp6-opt59 - ZTP service downloaded the ZTP JSON file using the URL specified in the DHCPv6 option 59 received by the switch when it obtained an IPv6 address. + + - dhcp-opt239 - ZTP service downloaded the provisioning script using the URL specified in the DHCP option 239 received by the switch when it obtained an IP address. + + - dhcp6-opt239 - ZTP service downloaded the provisioning script using the URL specified in the DHCPv6 option 239 received by the switch when it obtained an IPv6 address. + + - dhcp-opt225-graph-url - ZTP service downloaded the minigraph.xml file using the URL specified in the DHCP option 225 received by the switch when it obtained an IPv6 address. + + - local-fs - This value indicates that the ZTP JSON file has been included in the SONiC image as part of the build. This option can be used in scenarios where a DHCP server is not present and initial configuration steps need to be performed on the switch on boot. + + Configuration sections in ZTP JSON are processed by ZTP service in lexical order of section names. In order to force execution order, names in xxx-name format (e.g 001-firmware) can be used. For predefined plugins leading sequence number is stripped off to find appropriate plugin. So 001-firmware configuration section will be processed internally using firmware plugin. More on plugins in the [*ZTP plugins*](#32-ztp-plugins) section. @@ -157,13 +213,13 @@ ZTP service exits and marks the status as FAILED if any errors are encountered w ## 3.2 ZTP Plugins -Each section of ZTP JSON data is processed by corresponding handler which can understand the objects/values of that section using a predefined logic. This handler is referred to as a plugin. Plugins are executable files, mostly scripts, which take objects/values described in corresponding configuration section as input. For e.g the "snmp" section is processed by the snmp plugin provided by SONiC-ZTP package. For plugins provided by SONiC-ZTP package, it is mandatory that the name of the configuration section matches the plugin file name. Predefined plugins can be found in the directory "/var/lib/ztp/plugins". +Each section of ZTP JSON data is processed by corresponding handler which can understand the objects/values of that section using a predefined logic. This handler is referred to as a plugin. Plugins are executable files, mostly scripts, which take objects/values described in corresponding configuration section as input. For e.g the "snmp" section is processed by the snmp plugin provided by SONiC-ZTP package. 
For plugins provided by SONiC-ZTP package, it is mandatory that the name of the configuration section matches the plugin file name. Predefined plugins can be found in the directory "/usr/lib/ztp/plugins". ### 3.2.1 User Defined Plugins SONiC ZTP allows users to specify custom configuration sections and provide corresponding plugin executable. ZTP service downloads the plugin and uses it to process objects/values specified in the configuration section. This allows users to extend ZTP functionality in ways that suit their environment and deployment needs. For better compatibility with input data, users are encouraged to use executables which can process JSON formatted data. -Below is an example section of ZTP JSON data which is used to configure SNMP communities on a switch. The *plugin* object defines the usage of user defined plugin. In this example, user provided *my-snmp.py* file is downloaded using the url indicated by the *plugin.url.source* field. The plugin is copied locally as the file "/var/run/ztp/plugins/my-snmp" on SONiC switch and executed by ZTP service. +Below is an example section of ZTP JSON data which is used to configure SNMP communities on a switch. The *plugin* object defines the usage of user defined plugin. In this example, user provided *my-snmp.py* file is downloaded using the url indicated by the *plugin.url.source* field. The plugin is copied locally as the file "/var/run/ztp/plugins/my-snmp" on SONiC switch and executed by ZTP service. If *plugin.url.destination* is not provided, the downloaded plugin is saved as */var/lib/ztp/sections/'section-name'/plugin*. ```json "snmp": { @@ -183,7 +239,7 @@ Below is an example section of ZTP JSON data which is used to configure SNMP com ] } ``` -User defined plugins downloaded by ZTP service are deleted after the configuration section processing is complete. +User defined plugins are downloaded only once during a ZTP service. If the destination file already exists locally, the plugin is not downloaded again. It is recommended to not use *plugin.url.destination* and allow ZTP to download a plugin file to temporary storage. The temporary storage is cleared when a new ZTP session starts and is also guaranteed not to conflict with plugins used in other configuration sections. ### 3.2.2 Plugin Exit Code @@ -229,6 +285,7 @@ Following is the list of objects supported by url object. Also provided is brief | curl-arguments | Arguments to curl command used to download the url | Refer to [curl manual](https://curl.haxx.se/docs/manual.html "curl manual"). | Null string | | encrypted | Indicates the file being downloaded in encrypted format. | Refer to [Encryption](#511-encryption-and-authentication). | No encryption | | include-http-headers | To enable/disable sending of switch information as part of [HTTP Headers](#331-http-headers). | true
false | true | +| timeout | Maximum number of seconds allowed for curl to establish a connection | Valid non-zero integer | 30s | In case there are no additional fields to be defined in *url* object and only *source* is being defined, *url* can be specified in short hand notation. @@ -264,10 +321,13 @@ In below example, SONiC ZTP package provided *config-db-json* plugin is being us Following is the list of objects supported by plugin object. Also provided is brief description of their usage, values that can be assigned and the default value assumed when the object is not used. | Object |Usage |Supported Values| Default Value| -| ------------ | ------------ | ------------ | ------------ | +| :----------- | ------------ | ------------ | ------------ | | url | Define the URL string from where plugin has to be downloaded in the form of url object |Refer to [url object](#url-object) |Name of enclosing object | | dynamic-url | Define the URL string from where plugin has to be downloaded in the form of url object |Refer to [dynamic-url object](#332-dynamic-url-object) |Name of enclosing object | | name | Use a predefined plugin available as part of SONiC ZTP package | Predefined plugins | Name of enclosing object| +| shell | Use this to specify if the plugin has to be executed through the shell. For more information refer to the shell option of python [subprocess](https://docs.python.org/3/library/subprocess.html) library. | true
false | false | +| ignore-section-data | Use this to specify if section data read from the configuration section of the plugin should not be passed as the first argument to plugin command | true
false | false | +| args | Defines argument string that can be passed as an argument to the plugin command. The *ignore-section-data* object needs to be set to true if the plugin command is expecting only the argument string as part of the command. | Valid command arguments | Null string | *plugin.dyrnamic-url* takes precedence over *plugin.url* over *plugin.name* if multiple definitions are defined. @@ -275,13 +335,13 @@ A short hand notation is possible using 'plugin': 'name' which is equivalent of ```json "plugin": { - "name": "config-db-json" + "name": "configdb-json" } ``` can be written in short hand notation as ```json - "plugin": "config-db-json" + "plugin": "configdb-json" ``` #### configdb-json @@ -297,6 +357,23 @@ The *configdb-json* plugin is used to download ConfigDB JSON file and apply the } ``` + + +Following is the list of objects supported by the configdb-json object. Also provided is brief description of their usage, values that can be assigned and the default value assumed when the object is not used. + +| Object | Usage | Supported Values | Default Value | +| :----------- | ------------------------------------------------------------ | ------------------------------------------------------ | ------------- | +| url | Define the URL string from where the config_db.json file has to be downloaded in the form of url object | Refer to [url object](#url-object) | N/A | +| dynamic-url | Define the URL string from where the config_db.json file has to be downloaded in the form of dynamic-url object | Refer to [dynamic-url object](#332-dynamic-url-object) | N/A | +| clear-config | Use this to specify if the existing configuration has to be cleared before loading the download config_db.json file content to the Config DB. When set to true, *config reload* command is executed. When set to false, *config load* command is executed. | true
false | true | +| save-config | Use this to perform a *config save* command after loading the downloaded config_db.json file. | true
false | false | + +It is mandatory that either one of the *url* or *dynamic-url* objects are defined in configuration section. + +When configdb-json plugin is executed, the DHCP address assigned during ZTP discovery is released. So it is important that interface IP address assignment is performed as part of the provided *config_db.json* file. + + + #### firmware The *firmware* plugin is used for image management on a switch. It can be used to install, remove and boot selection of images. sonic_installer utility is used by this plugin to perform the supported functions. @@ -341,31 +418,74 @@ Example to install a new image only if it satisfies the pre-install verify check } ``` + + Following is the list of objects supported by the *firmware* plugin. Also provided is brief description of their usage, values that can be assigned and the default value assumed when the object is not used. + + | Object |Usage |Supported Values| Default Value| | ------------ | ------------ | ------------ | ------------ | -| install | Used to install an image using URL | [url](#url-object)
[dynamic-url](#332-dynamic-url)
pre-check
set_default
set_next_boot |N/A| -| remove | Used to uninstall an existing image | version
pre-check |N/A| -| upgrade_docker | Used install a docker image on the SONiC switch | [url](#url-object)
[dynamic-url](#332-dynamic-url) | N/A | +| **install** | Used to install an image using URL | || +| url
dynamic-url | Specifies the URL string from where the firmware image file has to be downloaded in the form of url or dynamic-url object | url](#url-object)
[dynamic-url](#332-dynamic-url) || +| version | Specifies the version of the SONiC image being installed. This object is optional. | SONiC version string |N/A| +| set-default | Specifies that the firmware image being installed is selected as the default image to boot the switch from. | true
false |true| +| set-next-boot | Specifies that the firmware image being installed is selected as the image to boot the switch from only for one time on next reboot. | true
false |false| +| skip-reboot | Specifies if a switch reboot operation is performed immediately after installing a new switch firmware image. | true
false |false| +| pre-check | Specifies the URL of a user provided script to be executed before installing the downloaded switch firmware image. The firmware installation is performed only if pre-check script result is success. | [url](#url-object)
[dynamic-url](#332-dynamic-url) | | +| **remove** | Used to uninstall an existing image on the disk | || +| version | Specifies the version of the SONiC image to be removed. This object is mandatory. | SONiC version string |N/A| +| pre-check | Specifies the URL of a user provided script to be executed before removing the specified switch firmware image version for the disk. The firmware removal is performed only if pre-check script result is success. | [url](#url-object)
[dynamic-url](#332-dynamic-url) |N/A| +| **upgrade-docker** | Used install a docker image on the SONiC switch | | | +| url
dynamic-url | Specifies the URL string from where the docker image file has to be downloaded in the form of url or dynamic-url object | [url](#url-object)
[dynamic-url](#332-dynamic-url) | | +| container-name | Name of the docker image being upgrade | Supported docker container names (e.g swss) | | +| cleanup-image | Clean up old docker image while installing new docker image | true
false | false | +| enforce-check | Enforce pending task check for docker upgrade | true
false | false | +| tag | Specify a tag to the newly installed docker image | Valid string | Null string | +| pre-check | Specifies the URL of a user provided script to be executed before installing the specified docker container. The docker image installation is performed only if pre-check script result is success. | [url](#url-object)
[dynamic-url](#332-dynamic-url) | N/A | The *pre-check* object is used to specify a user provided script to be executed. If the result of the script is successful, the action (install/remove) is performed. Its value is a *url object*. *firmware.remove* is first processed followed by *firmware.install* if both are defined. +#### connectivity-check + +The *connectivity-check* plugin is used to ping a remote host and verify if the switch is able to reach the remote host. It is possible to ping multiple hosts and the plugin result is marked as failed even if ping to one of the specified host fails. + + + +```json + "connectivity-check" : { + "ping-hosts" : [ "192.168.1.1", "172.10.1.1" ] + } +``` + + + +Following is the list of objects supported by the connectivity-check plugin. Also provided is brief description of their usage, values that can be assigned and the default value assumed when the object is not used. + + + +| Object | Usage | Supported Values | Default Value | +| :------------- | ------------------------------------------------------------ | ---------------------------- | ------------- | +| ping-hosts | List of IPv4 hosts to ping | N/A | N/A | +| ping6-hosts | List of IPv6 hosts to ping | N/A | N/A | +| retry-interval | Specify a timeout, in seconds, before retrying ping to a host. | Valid non-zero integer value | 5 seconds | +| retry-count | Stop ping to a ping host and move on to the next host specified in the list. | Valid non-zero integer value | 12 | +| ping-count | Stop after sending *count* ECHO_REQUEST packets. With *deadline* option, ping waits for *count* ECHO_REPLY packets, until the timeout expires. | Valid non-zero integer value | 3 | +| deadline | Specify a timeout, in seconds, before ping exits regardless of how many packets have been sent or received. In this case ping does not stop after *count* packet are sent, it waits either for *deadline* expire or until *count* probes are answered or for some error notification from network. | Valid non-zero integer value | N/A | +| timeout | Time to wait for a response, in seconds. The option affects only timeout in absense of any responses, otherwise ping waits for two RTTs. | Valid non-zero integer value | N/A | + + + #### snmp The *snmp* plugin is used to configure SNMP community string on SONiC switch. This plugin is provided as an alternative for soon to be deprecated privately used DHCP options 224 used by SONiC. ```json "snmp": { - "community-ro": [ - "public", - "local" - ], - "community-rw": [ - "private" - ] + "community-ro": "public", + "snmp-location": "dnv-r10-t01" } ``` @@ -373,9 +493,9 @@ Following is the list of objects supported by the *snmp* plugin. Also provided | Object |Usage |Supported Values| Default Value| | ------------ | ------------ | ------------ | ------------ | -| community-ro | Comma separated list of SNMP read only community strings | Syntactically valid SNMP community string |Null string| -| community-rw | Comma separated list of SNMP read write community strings | Syntactically valid SNMP community string | Null string| -| restart_agent | | true
false | true | +| community-ro | SNMP read-only community string | Syntactically valid SNMP community string | Null string | +| snmp-location | SNMP location string | Syntactically valid SNMP location string | Null string | +| restart-agent | Restart the snmp service after setting the specified SNMP parameters | true
false | true | #### graphservice @@ -385,10 +505,10 @@ Example usage. ```json "graphservice": { - "minigraph_url": { + "minigraph-url": { "url": "http://192.168.1.1:8080/minigraph.xml" }, - "acl_url": { + "acl-url": { "dynamic-url": { "source": { "prefix": "http://192.168.1.1:8080/acl_", @@ -435,8 +555,8 @@ The *prefix*, *identifier* and *suffix* are concatenated to form the url which i ##### identifier subobject This subobject is used to specify the logic that is executed on the switch to determine the variable portion of the url. Some pre-defined generally used logic are provided. There is also a possibility to provide user-defined logic. -"identifier:": "hostname" -"identifier:": "hostname-fqdn" +"identifier": "hostname" +"identifier": "hostname-fqdn" Hostname of the switch is used to the identifier. Switches are assigned unique hostnames by the DHCP server. It can be used while naming files corresponding to a particular switch. @@ -452,7 +572,7 @@ In below example all the switch configuration files are stored at the remote ser "destination": "/etc/sonic/config_db.json" } ``` -"identifier:": "serial-number" +"identifier": "serial-number" ```json "dynamic-url": { @@ -464,7 +584,7 @@ In below example all the switch configuration files are stored at the remote ser "destination": "/etc/sonic/config_db.json" } ``` -"identifier:": "product-name" +"identifier": "product-name" In below example switch model string is used to identify the image that needs to be downloaded. @@ -478,7 +598,7 @@ In below example switch model string is used to identify the image that needs to } ``` -"identifier:": "url" +"identifier": "url" It is not possible to pre-determine the file naming convention using at the server. So a provision for running user-defined logic can be supplied as a url object. In below example user provides a script *config_filename_eval.sh* which is downloaded and executed. The output string returned by the user provided script is used as the switch's configuration file name. ```json @@ -505,18 +625,34 @@ Following is the list of objects supported by the *dynamic-url* object. Also pro | curl-arguments | Arguments to curl command used to download the url | Refer to [curl manual](https://curl.haxx.se/docs/manual.html "curl manual"). | Null string | | encrypted | Indicates the file being downloaded in encrypted format. | Refer to [Encryption](#511-encryption-and-authentication). | No encryption | | include-http-headers | To enable/disable sending of switch information as part of [HTTP Headers](#331-http-headers). | true
false | true | +| timeout | Maximum number of seconds allowed for curl to establish a connection | Valid non-zero integer | 30s | ### 3.4 DHCP Options -The following are the private DHCP options used by SONiC switch to receive input data for ZTP service and graphservice. +The following are the DHCP options used by the SONiC switch to receive input provisioning data for ZTP service and graphservice. | DHCP Option | Name | Description | |:-----------:|:-------------------|:-------------------------------------------------| -| 224 | snmp_community | snmpcommunity DHCP hook updates /etc/sonic/snmp.yml file with provided value | -| 225 | minigraph_url | graphserviceurl DHCP hook updates the file /tmp/dhcp_graph_url with the provided url. updategraph service processes uses it for further processing. | -| 226 | acl_url | graphserviceurl DHCP hook updates the file /tmp/dhcp_acl_url with the provided url. updategraph service processes uses it for further processing. | -| 67 | ztp_json_url | URL for ZTP input data: All user configurable data that can be input to ZTP process. This information can be used to access more advanced configuration information.| +| 61 | dhcp-client-identifier | Used to uniquely identify the switch initiating DHCP request. SONiC switches set this value to "SONiC##*product-name*##*serial-no*". | +| 66 | tftp-server | TFTP-Server address | +| 67 | ztp_json_url | URL to download the ZTP JSON file. It can also specify the ZTP JSON file path on tftp server. | +| 77 | user-class | Used to optionally identify the type or category of user or applications it represents. SONiC switches set this value to "SONiC-ZTP". | +| 224 | snmp_community | snmpcommunity DHCP hook updates /etc/sonic/snmp.yml file with provided value | +| 225 | minigraph_url | graphserviceurl DHCP hook updates the file /tmp/dhcp_graph_url with the provided url. updategraph service processes uses it for further processing. | +| 226 | acl_url | graphserviceurl DHCP hook updates the file /tmp/dhcp_acl_url with the provided url. updategraph service processes uses it for further processing. | | 239 | ztp_provisioning_script_url | URL for a script which needs to be downloaded and executed by ZTP service on the switch. | + + +The following are the DHCPv6 options used by the SONiC switch to receive input provisioning data for ZTP service. + +| DHCPv6 Option | Name | Description | +|:-----------:|:-------------------|:-------------------------------------------------| +| 15 | user-class | Used to optionally identify the type or category of user or applications it represents. SONiC switches set this value to "SONiC-ZTP". | +| 59 | boot-file-url | URL to download the ZTP JSON file. | +| 239 | ztp_provisioning_script_url | URL for a script which needs to be downloaded and executed by ZTP service on the switch. | + + + The use of following DHCP options will be deprecated in future releases of SONiC as its values can be included in the ZTP JSON file whose URL can be obtained via DHCP option 67. | Deprecated
DHCP Options | Name | @@ -529,6 +665,8 @@ It is recommended to use either ztp_provisioning_script_url or ztp_json_url but DHCP hook script */etc/dhcp/dhclient-exit-hooks.d/ztp* is used to process DHCP option 67 and 239. This script is provided as part of SONiC-ZTP package. + + ## 3.5 ZTP Service @@ -539,57 +677,111 @@ DHCP hook script */etc/dhcp/dhclient-exit-hooks.d/ztp* is used to process DHCP -The ZTP service is defined as a systemd service running on native SONiC O/S. It does not run inside a container. ZTP service starts after *networking.service*, *rc-local.service* and *database.service*. If ZTP is not administratively enabled, the service exits and does not run again until next boot or if user intervenes. Only updategraph.service wants ztp.service. No other services are not blocked for ztp.service to start or exit. +The ZTP service is defined as a systemd service running on native SONiC O/S. It does not run inside a container. ZTP service starts after the *interfaces-config.service*, *rc-local.service* and *database.service*. If ZTP is not administratively enabled, the service exits and does not run again until next boot or if user intervenes. No services are not blocked for ztp.service to start or exit. + + + +The switch tries to obtain management connectivity after it has boot up and all the connected ports are linked up. DHCP discovery is performed on all the connected in-band interfaces and the out-of-band management interface. Also, both DHCPv4 and DHCPv6 address discovery is performed. Which ever interface receives the first DHCP offer, is used as primary management interface to obtain user provided provisioning data. + + + +When a management interface obtains IP address via DHCP, a URL pointing to ZTP JSON file is provided as a value of the DHCP option 67. When the ZTP service starts, it first checks if there already exists a ZTP JSON file locally and if found loads it. If *ztp.status* field of local file is either SUCCESS, FAILED or DISABLED, ZTP service exits. If *ztp.status* field of local ZTP JSON file is IN-PROGRESS or BOOT, local file is used for further processing. If no local ZTP JSON file is found, the ZTP service downloads the ZTP JSON file using the URL provided by the DHCP Option 67 and starts processing it. -When management interface obtains IP address via DHCP, URL pointing to ZTP JSON file is provided as value of DHCP option 67. When ZTP service starts, it first checks if there already exists a ZTP JSON file locally and if found loads it. If *ztp.status* field of local file is either SUCCESS, FAILED or DISABLED, ZTP service exits. If *ztp.status* field of local ZTP JSON file is IN-PROGRESS, local file is used for further processing. If no local ZTP JSON file is found or if the *ztp.status* field of local ZTP JSON file is BOOT, ZTP service downloads the ZTP JSON file using the URL provided by DHCP Option 67 are starts processing it. -If user defines DHCP option 239, ZTP service downloads the provisioning script indicated in the URL and executes it. The exit code returned by the provisioning script is used to indicate whether ZTP has succeeded or failed. Exit code 0 indicates successful execution and any other value is treated as failure. ZTP service exits and does not run again unless user enables it again manually. -It is to be noted that DHCP option 67 takes precedence over DHCP option 239. +If user defines the DHCP option 239 in the DHCP offer, the ZTP service downloads the provisioning script indicated in the URL and executes it. 
The exit code returned by the provisioning script is used to indicate whether ZTP has succeeded or failed. Exit code 0 indicates successful execution and any other value is treated as failure. The ZTP service exits and does not run again unless user enables it again manually. When DHCP Option 239 based provisioning script is used, ZTP service does not restart automatically even if the startup configuration file */etc/sonic/config_db.json* is not created by the user as part of the provisioning script. -ZTP service parses the ZTP JSON file and processes individual configuration sections in lexical order of their names. If *status* or *timestamp* fields are missing they are added to it. A local copy of ZTP JSON file is maintained as the file */var/lib/ztp/data/ztp.json*. Individual configuration sections are identified and split into individual files as */var/lib/ztp/data/sections/section-name*. The ztp.json file continues to hold all sections as defined by the user. -This ztp.json file is constantly updated with any changes made during the processing of loaded ZTP data. To begin with *ztp.status* is set to IN-PROGRESS and individual sections are processed in order of their names. The *status* object of the configuration section being processed is set to 1 In-Progress and corresponding plugin is executed. -Each section whose *status* value is BOOT or IN-PROGRESS is processed in order. Corresponding plugin is called with */var/lib/ztp/data/sections/section-name* as argument to it. Exit code of plugin is used to determine the configuration sections *status* as explained in the [*Plugin Exit Code*](#322-plugin-exit-code) section of this document. +It is to be noted that DHCP option 67 takes precedence over DHCP option 239. Also in the case of IPv6 based network, DHCPv6 option 59 is used to provide the ZTP JSON file URL. + + + +The ZTP service parses the ZTP JSON file and processes individual configuration sections in lexical order of their names. Any missing objects are added after assigning default values to them. A local copy of ZTP JSON file is maintained as the file */host/ztp/ztp_data.json*. Individual configuration sections are identified and split into individual files as */var/lib/ztp/sections/section-name/input.json*. The *ztp_data.json* file continues to hold all sections as defined by the user and is used by the ZTP service. + + + +This *ztp_data.json* file is constantly updated with any changes made during the processing of loaded ZTP data. To begin with *ztp.status* is set to IN-PROGRESS and individual sections are processed in the order of their names. The *status* object of the configuration section being processed is set to IN-PROGRESS and corresponding plugin is executed. Each section whose *status* value is BOOT or IN-PROGRESS is processed in order. Corresponding plugin is called with */var/lib/ztp/sections/section-name/input.json* as argument to it. Exit code of plugin is used to determine the configuration sections *status* as explained in the [*Plugin Exit Code*](#322-plugin-exit-code) section of this document. + + When all the sections have been processed, *ztp.status* field is updated taking into consideration the result of all individual sections. Sections with disabled *status* and *ignore-status: true* are not considered. *ztp.status* is marked as SUCCESS only if *status* field of all rest of the sections is SUCCESS. ZTP service exits and does not run again unless user enables it again. 
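To make this bookkeeping concrete, below is a minimal illustrative snapshot of what */host/ztp/ztp_data.json* could look like while a two-section ZTP JSON is being processed. The section names, URLs and timestamps are placeholders, and the exact layout of the bookkeeping fields is an assumption; only the *status* and *timestamp* semantics are taken from the description above.

```json
{
    "ztp": {
        "01-configdb-json": {
            "dynamic-url": {
                "source": {
                    "prefix": "http://192.168.1.1/",
                    "identifier": "hostname",
                    "suffix": "_config_db.json"
                }
            },
            "status": "SUCCESS",
            "timestamp": "2019-09-11 19:11:55"
        },
        "02-connectivity-check": {
            "ping-hosts": [ "10.1.1.1" ],
            "status": "IN-PROGRESS",
            "timestamp": "2019-09-11 19:12:05"
        },
        "status": "IN-PROGRESS",
        "timestamp": "2019-09-11 19:12:05"
    }
}
```

In this sketch the first section has already completed successfully, the second is currently executing, and the top-level *ztp.status* remains IN-PROGRESS until every remaining section reaches SUCCESS or FAILED.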
-If user does not provide both DHCP option 67 or DHCP option 239, ZTP service continues to run and wait for one of these values to be provided to it. +If user does not provide both DHCP option 67 or DHCP option 239, ZTP service continues to run and wait forever until one of these values is provided to it. + + + +Following is the order in which DHCP options are processed: + +1. The ZTP JSON file specified in pre-defined location as part of the image Local file on disk */host/ztp/ztp_local_data.json*. +2. ZTP JSON URL specified via DHCP Option-67 +3. ZTP JSON URL constructed using DHCP Option-66 TFTP server name, DHCP Option-67 file path on TFTP server +4. ZTP JSON URL specified via DHCPv6 Option-59 +5. A provisioning script URL specified via DHCP Option-239 +6. A provisioning script URL specified via DHCPv6 Option-239 +7. Minigraph URL and ACL URL specified via DHCP Option 225, 226 + + + +### 3.6 Start and exit conditions + +On a switch bootup, the ZTP service starts and checks for the presence of the startup configuration file */etc/sonic/config_db.json*. Only if the startup configuration file is not found, the ZTP service starts and performs DHCP discovery to receive information about the location of the ZTP JSON file. If the ZTP admin mode is disabled, the ZTP service exits and the switch proceeds to boot with factory default configuration. -### 3.6 Provisioning over in-band network -If there is no */etc/sonic/config_db.json* and */tmp/pending_config_initialization* is set, ZTP service creates a configuration using ztp preset. The *ztp* preset defines a configuration with PORT table and DEVICE_METADATA table. In addition to creating the default configuration, ztp also creates interface files with name *ztp-Ethernetxxx* for all the ports in PORT table. These are added in */var/run/ztp/dhcp/interfaces.d* and *networking.service* is restarted. This starts DHCP discovery on all in-band interfaces. A dhcp-exit-hook is installed which is used to set the offered IP address in Config DB using the *"config interface interface-name ip add"* command. At this point, the switch receives DHCP option 67 ZTP JSON and is ready to communicate with remote devices. ZTP JSON file is downloaded and ZTP service start performing configuration tasks described in the ZTP JSON. + +If a ZTP JSON file is already present on the switch, the ZTP service continues to process it and does not download a new ZTP JSON. This allows the ZTP service to perform configuration steps which may involve multiple switch reboots. + + + +The SONiC ZTP service exits only after a ZTP JSON file is downloaded and processed by it. After processing the ZTP JSON file, if the startup configuration file */etc/sonic/config_db.json* is not found, the ZTP service restarts DHCP discovery again to obtain a new ZTP JSON file and start processing it. + + + +At any given point of time, user can choose to stop a SONiC ZTP service by executing the *config ztp disable* command. The disable action creates a factory default configuration and saves it as the switch startup configuration. The switch continues to operate with the loaded factory default configuration. + + + +There are multiple configuration options available in the ZTP JSON file for the user to influence the exit criteria of the ZTP service. They are defined in the [*ZTP JSON*](#31-ztp-json) section and can also be viewed in the [*SONiC ZTP Flow Diagram*](#41-sonic-ztp-flow-diagram). 
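The administratively enabled/disabled state referenced above, together with a few other service-level options described later in this document, is stored in the ZTP configuration file */host/ztp/ztp_cfg.json*. Below is a small hypothetical example of that file: *feat-inband*, *log-level-stdout* and *log-level-file* are key names used elsewhere in this document, while *admin-mode* is an assumed key name shown here only to illustrate the enable/disable state toggled by the *config ztp enable* and *config ztp disable* commands.

```json
{
    "admin-mode": true,
    "feat-inband": true,
    "log-level-stdout": "INFO",
    "log-level-file": "DEBUG"
}
```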
+ + + +### 3.7 Provisioning over in-band network + +If there is no startup configuration file */etc/sonic/config_db.json* , the ZTP service creates a configuration using the ztp config template */usr/lib/ztp/templates/ztp-config.j2*. The *ztp-config.j2* defines a configuration with the PORT table and the DEVICE_METADATA table populated. In addition to this, the */etc/network/interfaces* file creates defines interfaces and dhcp address configuration for all the ports in the PORT table and the out of band interface, eth0. The *intefaces-config.service* is used to dynamically generate the */etc/network/interfaces* file, corresponding */etc/dhcp/dhclient.conf* with appropriate ZTP related DHCP options and then restarts *networking.service* to kick start DHCP on all the interfaces. This starts DHCP discovery on all in-band interfaces and the out of band interface. A dhcp-enter-hook */etc/dhcp/dhclient-enter-hooks.d/inband-ztp-ip(6)* is installed which is used to set the offered IP address in Config DB using the *"config interface interface-name ip add"* command. At this point, the switch receives DHCP option 67 ZTP JSON and is ready to communicate with remote hosts. The ZTP JSON file is downloaded and ZTP service start performing configuration tasks described in the ZTP JSON. Since DHCP discovery is performed on all in-band interfaces, there can be a condition where multiple interfaces can get an IP address. Only the first port receiving the DHCP offer along with DHCP Option 67 or/and DHCP Option 239 will be processed. So, care should be taken by the administrator to ensure that only one and also the same DHCP server responds to the DHCP discovery initiated by ZTP. -If ZTP is in completed state or in administratively disabled state, it will not create ZTP preset configuration. Instead, switch will continue to boot with empty configuration. To re-run ZTP user will have to login via console and issue 'ztp enable' and/or 'ztp run' command. There are some other scenarios that are possible when a new SONiC image is installed. They are discussed in the section [*Component Interactions*](#37-component-interactions). +If ZTP is in completed state or in administratively disabled state, it will not create ZTP configuration. Instead, switch will create a factory default configuration and proceed to boot with it. To re-run ZTP user will have to login via console and issue 'ztp enable' and 'ztp run' command. + + SONiC ZTP supports reboot and resume while a ZTP session is in progress. To handle these scenarios appropriately, it is recommended that the plugins used for configuration tasks, use sufficient checks to determine that switch is communicating to external devices before executing provisioning steps. Since SONiC ZTP is a framework and does not have knowledge on making a decision on reachability, connectivity checks or sufficient retries need to be included in the plugins scripts as part of the defined workflow. This may not be applicable in the case of first run of ZTP since the ZTP JSON URL is obtained only after establishing connectivity. ZTP service can safely assume connectivity and proceed without any issues. To summarize, when there is a reboot step involved, the configuration section plugins should take care that there may be instances where there can be connectivity loss. Same is the case when a config_db.json file is downloaded and applied to the switch. -A configuration option is provided in the ZTP configuration file *ztp_cfg.json* to enable or disable in-band provisioning feature. 
Provisioning over in-band network is enabled by default when ZTP package is included. +A configuration option is provided in the ZTP configuration file *ztp_cfg.json* to enable or disable in-band provisioning feature. Use *"feat-inband" : false* in */host/ztp/ztp_cfg.json* to disable ZTP in-band provisioning. Provisioning over in-band network is enabled by default when the ZTP package is included. + -### 3.7 Component Interactions + +### 3.8 Component Interactions ##### updategraph -ZTP and updategraph can co-exist in the same SONiC image. However, updategraph depends on ZTP to provide the values of *graph_url* and *acl_url.* ZTP service processes the DHCP response and updates *src* and *acl_src* values in */etc/sonic/updategraph.conf* file and restarts updategraph service. In the updategraph.service definition file, updategraph.service wants ztp.service. When ZTP feature is available in the build, updategraph does not creates default *''/etc/sonic/config_db.json'* from preset config templates. +ZTP and updategraph can co-exist in the same SONiC image. However, for updategraph to operate ZTP has to be disbaled. If ZTP is enabled, updategraph gracefully exits and depends on ZTP to process the values of *graph_url* and *acl_url.* The ZTP service processes the DHCP response and downloads the *minigraph.xml* and the *acl.json* files and places them in the */etc/sonic* directory. The *config load_minigraph* command is then executed to process the downloaded *minigraph.xml* and *acl.json* files. + -If */tmp/config_migration* is set, ZTP will not create switch default configuration but wait for config_migration to be completed. **Image Upgrade** -When a new SONiC image is installed, contents of */etc/sonic* directory are migrated to the newly installed directory. ZTP JSON and ZTP configuration files are also migrated as part of this configuration migration step. If the image upgrade happened as part of a ZTP session in progress, after booting the new image, ZTP resumes from the point where it left of prior to image switchover. ZTP service waits for configuration migration to complete before taking any action. If after configuration migration, if */etc/sonic/config_db.json* file is not found, ZTP service creates a ZTP preset configuration that enables all in-band interfaces and performs DHCP discovery on them. This establishes connectivity to external devices for provisioning to be completed. +When a new SONiC image is installed, contents of */etc/sonic* directory are migrated to the newly installed directory. ZTP JSON and ZTP configuration files are also available to the new image as they are stored in */host/ztp* directory. If the image upgrade happened as part of a ZTP session in progress, after booting the new image, ZTP resumes from the point where it left of prior to image switchover. ZTP service waits for configuration migration to complete before taking any action. If after configuration migration, if */etc/sonic/config_db.json* file is not found, ZTP service creates a ZTP configuration that enables all in-band interfaces and performs DHCP discovery on them to obtain connectivity and resume processing the ZTP JSON file. This establishes connectivity to external hosts for provisioning to be completed. -There can also be a scenario where on a switch a ZTP is in completed (SUCCESS/FAILED) state. A new SONiC image is installed and user reboots the switch to boot into new image. 
Even this scenario, contents of */etc/sonic* and ZTP JSON, ZTP configuration files are migrated to the newly installed image. A new session of ZTP is started using the ZTP JSON file that was migrated. If */etc/sonic/config_db.json* file does not exist, ZTP service creates a ZTP preset configuration that enables all in-band interfaces and performs DHCP discovery on them. This establishes connectivity to external devices for provisioning to be completed. It is to be noted that no new ZTP JSON file is downloaded as it is assumed that in a typical scenarios, a successful ZTP would have generated a *config_db.json* file which is migrated to the new image. By not performing DHCP discovery again, we are trying to minimize the affect on the network where in connectivity is provided by the migrated switch configuration and ZTP is re-executed to perform all tasks that are needed for the new switch image. +There can also be a scenario where on a switch a ZTP is in completed (SUCCESS/FAILED) state. A new SONiC image is installed using *sonic_installer* upgrade tool and the user reboots the switch to boot into new image. In this scenario, contents of */host/ztp* and thus ZTP JSON, ZTP configuration file are accessible to the newly installed image. Since ZTP is in completed state, it will not run again. Only if */etc/sonic/config_db.json* file does not exist, ZTP service creates a ZTP configuration and starts the ZTP procedure afresh. It is up to the user to install configuration migration hooks to migrate changes to new image. More information is available in the [SONiC Configuration Setup Service](https://github.com/rajendra-dendukuri/SONiC/blob/config_setup/doc/ztp/SONiC-config-setup.md) design document. - ### 3.8 ZTP State Transition Diagram + ### 3.9 ZTP State Transition Diagram Below figure explains the events and its effect on ZTP state. @@ -601,17 +793,29 @@ User is expected to use supported installation methods by SONiC to install image Below is sequence of events that happen in a simple workflow. -1. SONiC switch boots and ZTP service starts -2. DHCP server provides IP connectivity to management interface along with DHCP Option 67 which provides URL to ZTP JSON file -3. ZTP service downloads ZTP JSON file and processes individual configuration sections in the lexical sorted order of their names, one at a time. -4. Plugin scripts for each configuration section are executed. If it is user-defined, the plugin is downloaded and then executed. +1. The SONiC switch boots and theZTP service starts if there is no startup configuration file */etc/sonic/config_db.json* +2. The DHCP server provides IP connectivity to the management interface along with the DHCP Option 67 which provides URL to the ZTP JSON file +3. The ZTP service downloads the ZTP JSON file and processes all of the individual configuration sections in the lexical sorted order of their names, one after the other. +4. The plugin scripts for each configuration section are executed. If it is a user-defined, the plugin is downloaded and then executed. - If plugin exits with success (= 0), the configuration section is marked SUCCESS and not executed again. - If plugin exits with failure (> 0), the configuration section is marked FAILED and not executed again. - If plugin exits with *suspend-exit-code* , the configuration section is marked SUSPEND and executed again in next cycle. 5. ZTP service cycles through all configuration sections multiple times until each and every configuration section is either in SUCCESS or FAILED state. 
This is to address sections that have returned with (-1) return code. 6. ZTP result is evaluation based on the result of each configuration section and ZTP service exits and does not run again. -All possible scenarios are not described here as they have been explained in the [ZTP Service](#34-ztp-service) section. +All possible scenarios are not described here and they have been explained further in the [ZTP Service](#35-ztp-service) section. + + + +### 4.1 SONiC ZTP Flow Diagram + + + +Below flow diagram pictorially explains in detail the sequence of events performed by a SONiC switch up on bootup. + +![SONiC ZTP Flow Diagram](images/sonic_config_flow.png) + + ## 5. Security Considerations @@ -678,7 +882,7 @@ The RSA public key and AES key need to be included in SONiC switch image at buil The security concerns can be solved by using a secure protocol like HTTPS for downloading the contents. Users use https URLs and ZTP service verifies the server certificate against known trusted CA certificates on the switch. Thus a trusted and secure communication is established. ##### Installing Certificates -In case of HTTPS URL's, the certificate issuing authority's certificate which issued the server's SSL certificate needs to be installed on the switch as part of default image at build time. */etc/ssl/certs* is the directory to install these certificates. This can be done using organizational extensions or by placing them in the *usr/lib/ztp/certs* directory inside SONiC ZTP package source code directory. +In case of HTTPS URL's, the certificate issuing authority's certificate which issued the server's SSL certificate needs to be installed on the switch as part of default image at build time. */etc/ssl/certs* is the directory to install these certificates. This can be done using organizational extensions or by placing them in the */usr/lib/ztp/certs* directory inside SONiC ZTP package source code directory. ### 5.2 File Permissions @@ -686,45 +890,247 @@ Only *root* user is allowed to read and modify the files created by ZTP service. ## 6. Configuring ZTP -Following are the supported ZTP configuration commands. All configurable parameters are found in *ztp_cfg.json*. ZTP service reads the configuration file during initialization. +This section explains all the Zero Touch Provisioning commands that are supported in SONiC. + +### 6.1 Show commands -### ztp status +**show ztp status** -*ztp status* command displays current state of ZTP service and the date/time since it was in the current state. It also displays current status of each configuration section of user provided ZTP JSON and date/time it was last processed. +This command displays the current ZTP configuration of the switch. It also displays detailed information about current state of a ZTP session. It displays information related to all configuration sections as defined in the switch provisioning information discovered in a particular ZTP session. -### ztp enable -*ztp enable* command is used to administratively enable ZTP. When ZTP feature is included as a build option, ZTP service is configured to be enabled by default. This command is used to re-enable ZTP after it has been disabled by user. It is to be noted that this command will only modify the ZTP configuration file and does not perform any other actions. +- Usage: + show ztp status -### ztp run + show ztp status --verbose -*ztp run* command is used to restart ZTP of a SONiC switch and initiate intended configuration tasks. Also ZTP service is started if it is not already running. 
This command is useful to restart ZTP after it has failed or is disabled by user. +- Example: + +``` +root@B1-SP1-7712:/home/admin# show ztp status +ZTP Admin Mode : True +ZTP Service : Inactive +ZTP Status : SUCCESS +ZTP Source : dhcp-opt67 (eth0) +Runtime : 05m 31s +Timestamp : 2019-09-11 19:12:24 UTC + +ZTP Service is not running + +01-configdb-json: SUCCESS +02-connectivity-check: SUCCESS +``` + +Use the verbose option to display more detailed information. + +``` +root@B1-SP1-7712:/home/admin# show ztp status --verbose +Command: ztp status --verbose +======================================== +ZTP +======================================== +ZTP Admin Mode : True +ZTP Service : Inactive +ZTP Status : SUCCESS +ZTP Source : dhcp-opt67 (eth0) +Runtime : 05m 31s +Timestamp : 2019-09-11 19:12:16 UTC +ZTP JSON Version : 1.0 + +ZTP Service is not running + +---------------------------------------- +01-configdb-json +---------------------------------------- +Status : SUCCESS +Runtime : 02m 48s +Timestamp : 2019-09-11 19:11:55 UTC +Exit Code : 0 +Ignore Result : False + +---------------------------------------- +02-connectivity-check +---------------------------------------- +Status : SUCCESS +Runtime : 04s +Timestamp : 2019-09-11 19:12:16 UTC +Exit Code : 0 +Ignore Result : False +``` + +- Description + - **ZTP Admin Mode** - Displays if the ZTP feature is administratively enabled or disabled. Possible values are True or False. This value is configurable using "config ztp enabled" and "config ztp disable" commands. + - **ZTP Service** - Displays the ZTP service status. The following are possible values this field can display: + - *Active Discovery*: ZTP service is operational and is performing DHCP discovery to learn switch provisioning information + - *Processing*: ZTP service has discovered switch provisioning information and is processing it + - **ZTP Status** - Displays the current state and result of ZTP session. The following are possible values this field can display: + - *IN-PROGRESS*: ZTP session is currently in progress. ZTP service is processing switch provisioning information. + - *SUCCESS*: ZTP service has successfully processed the switch provisioning information. + - *FAILED*: ZTP service has failed to process the switch provisioning information. + - *Not Started*: ZTP service has not started processing the discovered switch provisioning information. + - **ZTP Source** - Displays the DHCP option and then interface name from which switch provisioning information has been discovered. + - **Runtime** - Displays the time taken for ZTP process to complete from start to finish. For individual configuration sections it indicates the time taken to process the associated configuration section. + - **Timestamp** - Displays the date/time stamp when the status field has last changed. + - **ZTP JSON Version** - Version of ZTP JSON file used for describing switch provisioning information. + - **Status** - Displays the current state and result of a configuration section. The following are possible values this field can display: + - *IN-PROGRESS*: Corresponding configuration section is currently being processed. + - *SUCCESS*: Corresponding configuration section was processed successfully. + - *FAILED*: Corresponding configuration section failed to execute successfully. + - *Not Started*: ZTP service has not started processing the corresponding configuration section. + - *DISABLED*: Corresponding configuration section has been marked as disabled and will not be processed. 
+ - **Exit Code** - Displays the program exit code of the configuration section executed. Non-zero exit code indicates that the configuration section has failed to execute successfully. + - **Ignore Result** - If this value is True, the result of the corresponding configuration section is ignored and not used to evaluate the overall ZTP result. + - **Activity String** - In addition to above information an activity string is displayed indicating the current action being performed by the ZTP service and how much time it has been performing the mentioned activity. Below is an example. + - (04m 12s) Discovering provisioning data + +### 6.2 Configuration commands + +This sub-section explains the list of the configuration options available for ZTP. Following are the supported ZTP configuration commands. All configurable parameters are found in the *ztp_cfg.json*. The ZTP service reads the configuration file during initialization. + + + +**config ztp enable** + +The *config ztp enable* command is used to administratively enable ZTP. When ZTP feature is included as a build option, ZTP service is configured to be enabled by default. This command is used to re-enable ZTP after it has been disabled by user. It is to be noted that this command will only modify the ZTP configuration file and does not perform any other actions. + + + +- Usage: + + config ztp enable + + ztp enable + + + +- Example: + +``` +root@sonic:/home/admin# config ztp enable +Running command: ztp enable +``` + + + +**config ztp disable** + +The *config ztp disable* command is used to stop and disable the ZTP service. If the ZTP service is in progress, it is aborted and ZTP status is set to disable in configuration file. SIGTERM is sent to ZTP service and its sub processes currently under execution. *systemd* defined default time of 90 seconds is provided for them to handle the SIGTERM and take appropriate action and gracefully exit. It is the responsibility of plugins to handle SIGTERM to perform any necessary cleanup or save actions. If the process is still running after 90s, the process is killed. After disable ztp mode, if there is no startup configuration file, a factory default switch configuration is created and loaded. + + + +The ZTP service does not run if it is disabled even after reboot or if startup configuration file is not present. User will have to use *ztp enable* for it to enable it administratively again. + +- Usage: + config ztp disable + + config ztp disable -y + + ztp disable + + ztp disable -y + +- Example: + +``` +root@sonic:/home/admin# config ztp disable +Active ZTP session will be stopped and disabled, continue? [y/N]: y +Running command: ztp disable -y +``` + + + +**config ztp run** + +Use this command to manually restart a new ZTP session. This command deletes the existing */etc/sonic/config_db.json* file and stats ZTP service. It also erases the previous ZTP session data. The ZTP configuration is loaded on to the switch and ZTP discovery is performed. This command is useful to restart ZTP after it has failed or has been disabled by user. + +- Usage: + config ztp run + + config ztp run -y + + ztp run + + ztp run -y + +- Example: + +``` +root@sonic:/home/admin# config ztp run +ZTP will be restarted. You may lose switch data and connectivity, continue? [y/N]: y +Running command: ztp run -y +``` -### ztp disable -*ztp disable* command is used to stop and disable ZTP service. If ZTP service is in progress, it is aborted and ZTP status is set to disable in configuration file. 
SIGTERM is sent to ZTP service and its sub processes currently under execution. *systemd* defined default time of 90 seconds is provided for them to handle the SIGTERM and take appropriate action and gracefully exit. It is the responsibility of plugins to handle SIGTERM to perform any necessary cleanup or save actions. If the process is still running after 90s the process is killed. -ZTP service does not run if it is disabled even after reboot. User will have to use *ztp enable* for it to enable it administratively again. ## 7. Code Structure -Code related to ZTP framework shall be included in azure/sonic-ztp github repository. The package will be named as sonic_ztp_*version*_all.deb. +Code related to ZTP framework is included in the [azure/sonic-ztp](azure/sonic-ztp) github repository. The compiled package is named as sonic-ztp_version_all.deb. + + + +Top level code structure. + + + +|-- LICENSE +|-- Makefile +|-- README.md >> Very brief ZTP user guide +|-- debian >> sonic-ztp debian package definitions +|-- doc >> doxygen configuration file +|-- src >> Source code of ZTP feature +`-- tests >> pytest based unit testcases + + + +src/usr/lib/ztp/ +|-- dhcp >> DHCP hooks used by ztp +|-- plugins >> Location for pre-defined plugins +|-- sonic-ztp >> sonic-ztp service launcher file +|-- templates >> Jinja2 templates used to create ZTP configuration profile at run time +|-- ztp-engine.py >> Core ZTP service execution loop +`-- ztp-profile.sh >> Helper script to create/remove ZTP configuration profile + + + +src/usr/lib/python3/dist-packages/ztp/ +|-- DecodeSysEeprom.py >> Helper class to read Syseeprom contents +|-- Downloader.py >> Helper class used to download a file from remote host +|-- JsonReader.py >> Helper class used to read contents of a JSON file as a python dictionary and | | then write back changes to the JSON file +|-- Logger.py >> Logger API to generate syslogs and console logs at each step of the ZTP service | execution +|-- ZTPCfg.py >> Helper class to access contents of the ZTP configuration file */host/ztp/ztp_cfg.json* +|-- ZTPLib.py >> Helper API's used across ZTP libs, ZTP engine and ztp command utility +|-- ZTPObjects.py >> Class which defiles URL, DynamicURL, Identifier and other ojects used in ZTP + +| JSON +|-- ZTPSections.py >> Class to process ZTP JSON and the Configuration Sections which are part of it +`-- defaults.py >> Factory default values of all variables and constants. These values can be over + +​ ridden by defining them in the ZTP configuration file */host/ztp/ztp_cfg.json* + + + +src/usr/bin/ztp >> ZTP command line utility to interface with the ZTP service + -More information will be added to this section after implementation of SONiC-ZTP is complete. ## 8. Logging -All output generated by ZTP service are logged to local syslog and stdout of the service. The *stdout* and *stderr* of executed plugins are are also redirected to syslog and stdout of ZTP service. In addition to this, logs are are also sent to */var/log/ztp.log* file. User can modify */usr/lib/ztp/ztp_cfg.json* to increase or decrease logging priority level. User can also disable logging to file */var/log/ztp.log* or change to which file logs can be written to. Default logging level is assumed to be of level INFO. All messages with severity of INFO or greater are logged. +All output generated by ZTP service are logged to local syslog and stdout of the service. The *stdout* and *stderr* of executed plugins are are also redirected to syslog and stdout of ZTP service. 
In addition to this, logs are are also sent to */var/log/ztp.log* file. User can modify */host/ztp/ztp_cfg.json* to increase or decrease logging priority level. User can also disable logging to file */var/log/ztp.log* or change to which file logs can be written to. Default logging level is assumed to be of level INFO. All messages with severity of INFO or greater are logged. + + ## 9. Debug -ZTP service can be started in optional debug mode providing more verbose information on steps being performed by the service. Set logging level in */usr/lib/ztp/ztp_cfg.json* to 'DEBUG'. +ZTP service can be started in optional debug mode providing more verbose information on steps being performed by the service. Set logging level in */host/ztp/ztp_cfg.json* to 'DEBUG'. + + ## 10. Examples ### Example #1 -Use this ZTP JSON file which performs following steps on first boot -1. Install firmware -2. Push configuration -3. Run post-provisioning scripts and reboot on success -4. Run connectivity check scripts +1. Use below example ZTP JSON file, the following steps are performed on first boot: + 1. Install SONiC firmware image on to the switch and reboot to start the newly installed image + 2. Download config_db.json associated with the switch. The file name of the config_db.json associated with the switch is stored as *$hostname_config_db.json* at the webroot of the HTTP server with address 192.168.1.1. + 3. Download and run post-provisioning script post_install.sh. The switch is rebooted if post_install.sh script exits with successful exit code 0. + 4. Post boot, connectivity check is performed by pinging hosts 10.1.1.1 and yahoo.com to verify connectivity ```json @@ -732,34 +1138,26 @@ Use this ZTP JSON file which performs following steps on first boot "ztp": { "01-firmware": { "install": { - "url": "http://192.168.1.1:8080/broadcom-sonic-v1.0.bin", - "pre-check": { - "url": "http://192.168.1.1:8080/firmware_check.sh" - }, - "set-default": true - }, - "reboot-on-success": true + "url": "http://192.168.1.1/broadcom-sonic-v1.0.bin" + } }, "02-configdb-json": { "dynamic-url": { "source": { - "prefix": "http://192.168.1.1:8080/", + "prefix": "http://192.168.1.1/", "identifier": "hostname", "suffix": "_config_db.json" - }, - "destination": "/etc/sonic/config_db.json" + } } }, "03-provisioning-script": { "plugin": { - "url":"http://192.168.1.1:8080/post_install.sh" + "url":"http://192.168.1.1/post_install.sh" }, "reboot-on-success": true }, - "04-connectivity-tests": { - "plugin": { - "url": "http://192.168.1.1:8080/ping_test.sh" - } + "04-connectivity-check": { + "ping-hosts": [ "10.1.1.1", "yahoo.com" ] } } } @@ -767,7 +1165,8 @@ Use this ZTP JSON file which performs following steps on first boot ## 11. Future -More predefined plugins can be added as deemed appropriate by wider audience. +- More predefined plugins can be added as deemed appropriate by wider audience. +- Automatic port break out of in-band interfaces to detect an active link while performing ZTP discovery. @@ -777,21 +1176,99 @@ More predefined plugins can be added as deemed appropriate by wider audience. This test plan describes tests to exercise various capabilities of Zero Touch Provisioning (ZTP). The scope of these tests is limited to testing ZTP and not SONiC protocol applications functionality. Certain central features like loading a switch configuration and applying them on the switch, installing new switch image are also part of this test plan. 
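Many of the test cases below need small, self-contained ZTP JSON inputs hosted on the test web server. A representative example is sketched here, combining a user-defined plugin with a connectivity check in the style of Example #1 above; the host address and script name are placeholders chosen for the test setup described next.

```json
{
    "ztp": {
        "01-provisioning-script": {
            "plugin": {
                "url": "http://192.168.1.1/test_plugin.sh"
            }
        },
        "02-connectivity-check": {
            "ping-hosts": [ "192.168.1.1" ]
        }
    }
}
```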
-### 12.2 Test setup + + +### 12.2 Unit Test Suite + +To execute ZTP unit tests which are part of the ZTP source tree, use following instructions + +Install pytest on a SONiC switch. + +``` +apt-get install -y python3-pytest +``` + + + +Run unit test suite on the SONiC switch + +``` +cd /usr/lib/ztp/tests +pytest-3 -v +``` + + + +Run unit test suite with code coverage + +``` +# Install code coverage tool +apt-get install python3-coverage + +# Install depdenent libraries for coverage +apt-get install -y libjs-jquery libjs-jquery-isonscreen \ + libjs-jquery-tablesorter libjs-jquery-throttle-debounce \ + libjs-jquery-hotkeys + + +# Run code coverage using pytest module +cd /usr/lib/ztp/tests +python3-coverage run --append -m pytest -v + +# Generate code coverage report +python3-coverage report +python3-coverage html +``` + + + +Code coverage of the ZTP service code by starting the ZTP service with coverage enabled. Modify /etc/default/ztp to set COVERAGE="yes". + + + +Below are instructions to generate a combined coverage report of unit tests and the ZTP service + +``` +cd /home/admin + +# Coverage information for the ZTP service +cp /.coverage .coverage.service + +# Coverage information from ZTP unit tests +cp /usr/lib/ztp/tests/.coverage .coverage.pytest + +# Combine both covreage information and generate a report +python3-coverage combine + +python3-coverage report --omit="*_pytest*,*test_*,*pkg_resources*,*dist-packages/py/*,*dist-packages/pytest.py,*__init__.py,*testlib.py,testlib.py" + +python3-coverage html --omit="*_pytest*,*test_*,*pkg_resources*,*dist-packages/py/*,*dist-packages/pytest.py,*__init__.py,*testlib.py,testlib.py" +``` + + + +### 12.3 Test setup - Out-of-band Provisioning - DHCP server to assign IP address to management interface (eth0) - Web server to host scripts and other relevant files needed for provisioning + - In-band Provisioning - DHCP server to assign IP address to front-panel ports. Ensure that the DHCP server is one hop away and there is a DHCP relay agent to ensure DHCP requests/response are forwarded to next hop. - DHCP server <-----> [DHCP Relay agent] <-------------> [DUT] - Web server to host scripts and other relevant files needed for provisioning reachable over front panel interfaces. + - Test ZTP JSON files as per test case + - Plugin scripts for processing ZTP JSON + - Switch configuration files (config_db.json, minigraph.xml, acl.json) + - Few different versions of switch firmware images -## 12.3 Test Cases Summary + + +## 12.4 Test Cases Summary 1. Verify that ZTP can be initiated using ZTP JSON URL provided as part of DHCP Option 67 using out-of-band management network. @@ -837,7 +1314,7 @@ This test plan describes tests to exercise various capabilities of Zero Touch Pr 22. Verify that DHCP Option 67 is given precedence over DHCP Option 239 when both of them are provided by the DHCP response. DHCP Option 239 is ignored -23. Verify that ZTP service exits with a failure when ZTP configuration file*/usr/lib/ztp/ztp_cfg.json* is not present or contains incorrect data +23. Verify that ZTP service exits with a failure when ZTP configuration file*/host/ztp/ztp_cfg.json* is not present or contains incorrect data 24. 
Verify behavior when a ZTP session is in progress and as part of ZTP, install a new SONiC switch image and reboot to newly installed image @@ -855,7 +1332,7 @@ This test plan describes tests to exercise various capabilities of Zero Touch Pr -### 12.4 Test Cases +### 12.5 Test Cases ### Test Case #1 @@ -1097,6 +1574,18 @@ This test plan describes tests to exercise various capabilities of Zero Touch Pr - Using shorthand notation of url object uses its value as source string and does not look for sub objects +- Use FTP url string to download plugin stored on an FTP server + + - All configuration sections are executed in order and ZTP is marked as complete + +- Use TFTP url string to download plugin stored on an TFTP server + + - All configuration sections are executed in order and ZTP is marked as complete + +- Use SFTP/scp url string to download plugin stored on an SFTP/scp server + + - All configuration sections are executed in order and ZTP is marked as complete + ### Test Case #10 @@ -1527,11 +2016,11 @@ This test plan describes tests to exercise various capabilities of Zero Touch Pr ### Test Case #23 -**Objective:** Verify that ZTP service exits with a failure when ZTP configuration file */usr/lib/ztp/ztp_cfg.json* is not present or contains incorrect data +**Objective:** Verify that ZTP service exits with a failure when ZTP configuration file */host/ztp/ztp_cfg.json* is not present or contains incorrect data **Test Steps:** -- Modfy contents with invalid values, incorrect JSON format or delete the file /usr/lib/ztp/ztp_cfg.json +- Modfy contents with invalid values, incorrect JSON format or delete the file /host/ztp/ztp_cfg.json - Start ztp service **Expected Results:** @@ -1697,20 +2186,21 @@ This test plan describes tests to exercise various capabilities of Zero Touch Pr **Additional Tests:** -- Use various levels of severity to log-level-stdout field in /usr/lib/ztp/ztp_cfg.json file +- Use various levels of severity to log-level-stdout field in /host/ztp/ztp_cfg.json file - Messages with severity level equal to or higher than set level show up in /var/log/syslog -- Use various levels of severity to log-level-file field in /usr/lib/ztp/ztp_cfg.json file +- Use various levels of severity to log-level-file field in /host/ztp/ztp_cfg.json file - Messages with severity level equal to or higher than set level show up in /var/log/ztp.log - Use different logging priority level values for log-level-stdout and log-level-file - Observe that log contents are different based on their set values - User invalid logging level string - Observe that logging is performed assuming default level as INFO and ignoring the invalid value provided by user -- Remove "log-level-file" field from /usr/lib/ztp/ztp_cfg.json +- Remove "log-level-file" field from /host/ztp/ztp_cfg.json - Observe that no logs are generated to file ''/var/log/ztp.log' -- Remove "log-level-stdout" or "log-level-file" fields from /usr/lib/ztp/ztp_cfg.json +- Remove "log-level-stdout" or "log-level-file" fields from /host/ztp/ztp_cfg.json - Observe that logging is performed assuming default level as INFO - Issue 'systemctl status -l' command - Contents of output are same as the ZTP logs sent to syslog - Create a configuration task which generates a huge volume of output continuously and does not exit for a very long time, may be days - Ensure that all the volumes of data is logged to /var/log/ztp.log and /var/log/syslog without any interruption to ztp process - Run these tests continuously and use logrotate to truncate the log files so 
that switch does not run out of disk space + diff --git a/images/debug_framework_block_diagram.png b/images/debug_framework_block_diagram.png new file mode 100644 index 0000000000..c4357fde09 Binary files /dev/null and b/images/debug_framework_block_diagram.png differ diff --git a/images/debug_framework_flow_diagram.png b/images/debug_framework_flow_diagram.png new file mode 100644 index 0000000000..59ece8def8 Binary files /dev/null and b/images/debug_framework_flow_diagram.png differ diff --git a/images/dut_monitor_hld/Dut_monitor_ssh.jpg b/images/dut_monitor_hld/Dut_monitor_ssh.jpg new file mode 100755 index 0000000000..4e05acd67d Binary files /dev/null and b/images/dut_monitor_hld/Dut_monitor_ssh.jpg differ diff --git a/images/dut_monitor_hld/Load_flaw.jpg b/images/dut_monitor_hld/Load_flaw.jpg new file mode 100755 index 0000000000..0902650dac Binary files /dev/null and b/images/dut_monitor_hld/Load_flaw.jpg differ diff --git a/images/platform/initialcreationflow.png b/images/platform/initialcreationflow.png new file mode 100644 index 0000000000..b333618e3c Binary files /dev/null and b/images/platform/initialcreationflow.png differ diff --git a/images/platform/pddf_device_driver_psu.png b/images/platform/pddf_device_driver_psu.png new file mode 100644 index 0000000000..a113dc7d54 Binary files /dev/null and b/images/platform/pddf_device_driver_psu.png differ diff --git a/images/platform/pddf_generic_plugin_psu.png b/images/platform/pddf_generic_plugin_psu.png new file mode 100644 index 0000000000..8e437b330c Binary files /dev/null and b/images/platform/pddf_generic_plugin_psu.png differ diff --git a/images/platform/pddf_hld1.png b/images/platform/pddf_hld1.png new file mode 100644 index 0000000000..3368a822da Binary files /dev/null and b/images/platform/pddf_hld1.png differ diff --git a/images/platform/pddf_topo_psu.png b/images/platform/pddf_topo_psu.png new file mode 100644 index 0000000000..4a4322f211 Binary files /dev/null and b/images/platform/pddf_topo_psu.png differ diff --git a/images/platform/pdebuildcontainers.png b/images/platform/pdebuildcontainers.png new file mode 100644 index 0000000000..2506678df8 Binary files /dev/null and b/images/platform/pdebuildcontainers.png differ diff --git a/images/platform/sonicbuildcontainers.png b/images/platform/sonicbuildcontainers.png new file mode 100644 index 0000000000..3ad6dee00e Binary files /dev/null and b/images/platform/sonicbuildcontainers.png differ diff --git a/images/platform/sonicpdeoverview.png b/images/platform/sonicpdeoverview.png new file mode 100644 index 0000000000..f5d3c9aa31 Binary files /dev/null and b/images/platform/sonicpdeoverview.png differ diff --git a/images/sflow/sflow_architecture.png b/images/sflow/sflow_architecture.png new file mode 100644 index 0000000000..31c629fa10 Binary files /dev/null and b/images/sflow/sflow_architecture.png differ diff --git a/images/sflow/sflow_config_and_control.png b/images/sflow/sflow_config_and_control.png new file mode 100644 index 0000000000..16e9f5c438 Binary files /dev/null and b/images/sflow/sflow_config_and_control.png differ diff --git a/images/sflow/sflow_disable.png b/images/sflow/sflow_disable.png new file mode 100644 index 0000000000..d1163d96d9 Binary files /dev/null and b/images/sflow/sflow_disable.png differ diff --git a/images/sflow/sflow_enable.png b/images/sflow/sflow_enable.png new file mode 100644 index 0000000000..c8a4b237c1 Binary files /dev/null and b/images/sflow/sflow_enable.png differ diff --git a/images/sflow/sflow_intf_del.png 
b/images/sflow/sflow_intf_del.png new file mode 100644 index 0000000000..8457c878f1 Binary files /dev/null and b/images/sflow/sflow_intf_del.png differ diff --git a/images/sflow/sflow_intf_disable.png b/images/sflow/sflow_intf_disable.png new file mode 100644 index 0000000000..d1fb8aaef3 Binary files /dev/null and b/images/sflow/sflow_intf_disable.png differ diff --git a/images/sflow/sflow_intf_disable_all.png b/images/sflow/sflow_intf_disable_all.png new file mode 100644 index 0000000000..dcca1f343d Binary files /dev/null and b/images/sflow/sflow_intf_disable_all.png differ diff --git a/images/sflow/sflow_intf_rate.png b/images/sflow/sflow_intf_rate.png new file mode 100644 index 0000000000..ce51163830 Binary files /dev/null and b/images/sflow/sflow_intf_rate.png differ diff --git a/images/sflow/sflow_sample_packet_flow.png b/images/sflow/sflow_sample_packet_flow.png new file mode 100644 index 0000000000..191d95f297 Binary files /dev/null and b/images/sflow/sflow_sample_packet_flow.png differ diff --git a/images/teamsyncd_design.jpg b/images/teamsyncd_design.jpg new file mode 100644 index 0000000000..1a6f8213f3 Binary files /dev/null and b/images/teamsyncd_design.jpg differ diff --git a/images/thermal-control.svg b/images/thermal-control.svg new file mode 100644 index 0000000000..bbd7d4104d --- /dev/null +++ b/images/thermal-control.svg @@ -0,0 +1,2 @@ + +
[thermal-control.svg: flow diagram with nodes "Vendor specific Init", "Check fan presence and take action", "Check PSU presence and take action", "Run vendor specific thermal control algorithm", "stop loop?" (Yes/No), "vendor specific clean up", "End"]
\ No newline at end of file diff --git a/images/vrf_hld/Multi-VRF_Deployment.png b/images/vrf_hld/Multi-VRF_Deployment.png new file mode 100755 index 0000000000..0c0a4a34ce Binary files /dev/null and b/images/vrf_hld/Multi-VRF_Deployment.png differ diff --git a/images/vrf_hld/vrf_t0_topo.png b/images/vrf_hld/vrf_t0_topo.png new file mode 100644 index 0000000000..d620c5b3f3 Binary files /dev/null and b/images/vrf_hld/vrf_t0_topo.png differ diff --git a/index.html b/index.html index d4d88ccaae..f31aa34318 100644 --- a/index.html +++ b/index.html @@ -73,6 +73,7 @@
  • Features
  • Architecture
  • Roadmap
  • +
  • Supported Devices And Platforms
  • Source Code And License
  • @@ -84,20 +85,29 @@
  • Configuration
  • Command Reference
  • FAQ
  • +
  • Newsletters
  • - -
  • Join Community
  • + @@ -215,29 +225,25 @@

    Rapidly Growing Ecosystem

    -

    Upcoming OCP Events

    +

    Upcoming OCP Events

    - img +
    img
    -



    -

        OCP China Day - Beijing - 24th June 2019    - Join -

    +


    +

               OCP Global Summit - San Jose - 4-5 March 2020   
    +              Join




        

    -

        OCP Workshop - Japan - 27th June 2019    - Join -

    -




    +




    -

    SONiC Video From OCP Summit 2019

    - +

    SONiC Video From OCP Summit 2019

    +
    @@ -254,8 +260,8 @@

    SONiC Video Fro
    -
    - + diff --git a/menu.html b/menu.html new file mode 100644 index 0000000000..dd365ee6ce --- /dev/null +++ b/menu.html @@ -0,0 +1,66 @@ + + \ No newline at end of file diff --git a/newsletters.html b/newsletters.html new file mode 100644 index 0000000000..8f9acd887a --- /dev/null +++ b/newsletters.html @@ -0,0 +1,144 @@ + + + + + + + SONiC | Home + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + + + +
    +
    +
    +
    +
    +

    NEWSLETTERS

    +
    +
    +
    +
    +
    + + + +
    +
    + +
    +
    + + + + +
    + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/pdf/newsletters/SONiC_newsletter_2019_08.pdf b/pdf/newsletters/SONiC_newsletter_2019_08.pdf new file mode 100644 index 0000000000..f20de1da9c Binary files /dev/null and b/pdf/newsletters/SONiC_newsletter_2019_08.pdf differ diff --git a/pdf/newsletters/SONiC_newsletter_2019_10.html b/pdf/newsletters/SONiC_newsletter_2019_10.html new file mode 100644 index 0000000000..17f0361c18 --- /dev/null +++ b/pdf/newsletters/SONiC_newsletter_2019_10.html @@ -0,0 +1,519 @@ + + + + + + + + + + +

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    + + + + + + + + + + + + + + + +
    + logo + + October 2019 Newsletter +
    + + RECENT EVENTS + + + + SONiC COMMUNITY NEWS + +
    +
    +

    + + + +

    +

    + SONiC Together +

    +

    + + + +

    +

    + SONiC - Reliability, Manageability and Extensibility +

    +

    + + +     + + + +

    +

    + SONiC Ansible Test Automation +

    +

    + + +     + + + +

    +

    + Enabling PIM card hot swapping in SONiC for Minipack +

    +

    + + +     + + + +

    +
    +

    + + + +

    +

    + SONiC at Comcast Datacenters +

    +

    + + +     + + + +

    +
    +
    +

    + SONiC is expanding outside Cloud - Comcast is deploying SONiC in their datacenter and building an AI based network monitoring on top of SONiC with Augtera Networks !!!   + + + +   + + + +
    Mellanox Announces Support Solutions for SONiC Open Source Network Operating System.   + + + +
    SONiC test sub-group has been formed with the mission of building a sophisticated test suite. +

    +
    + + SONiC RELEASE UPDATES + +
    +

    + The SONiC community is working hard on the 201910 release, which will reach code complete in November. + +
    The community has been focusing on multiple major changes in recent months and is happy to share the status: +
    1. HLD reviews for all the features planned for 201910 are completed. +
    2. Code Pull Request Progress - Approximately 50% merged, 20% under review and 30% in progress. +
    3. Code freeze is planned by end of November 2019. +
    4. Joint test is planned by multiple contributors in the month of November. +
    +
    + The SAI community released SAI 1.5 in September 2019, which includes new features such as NAT, TAM 2.0, sFlow, SAI counters and debug counters. SAI 1.5 is included in the SONiC 201910 release.    + + + + +
    The SAI community is happy to announce that it is working on gearbox and MACsec API standardization. + +
    Next release : Planning started to build the backlog. +

    +
    + + UPCOMING EVENTS + + + + SONiC DESIGN DISCUSSIONS + +
    +
    +

    + + + +

    +

    + 4-5 March, 2020 +

    +

    + San Jose Convention Center +

    +

    +
    + + + +

    +

    + 30 Sep - 1 Oct, 2020 +

    +

    + Prague, Czech Republic +

    +
    +
    +
    +
    +

    + 03 SEP +

    +
    +

    + + + +    Changes in BGP error handling    +

    +
    +

    + 03 SEP +

    +
    +

    + + + +   Error handling framework     +

    +
    +

    + 10 SEP +

    +
    +

    + + + +        Configurable Drop Counters in SONiC   +

    +
    +

    + 17 SEP +

    +
    + + + +   High Level Design for fwutil   +
    +

    + 24 SEP +

    +
    +

    + + + +   Dynamic port breakout     +

    +
    +

    + 15 OCT +

    +
    +

    + + + +   Tech support data export & core file manager  +

    +

    + 22 OCT +

    +
    +

    + + + +   VRRP HLD       +

    +
    +

    + 29 OCT +

    +
    +

    + + + +   RADIUS Management User Authentication HLD  +

    +
    + + NEW PLATFORMS + +
    +

    + Accton +

    +
    +

    + AS5812-54T +

    +
    +

    + Arista +

    +
    +

    + 7280CR3-C40 +

    +
    +

    + Celestica +

    +
    +

    + DX010-C32, DX010-D48C8, midstone-200i +

    +
    +

    + DellEMC +

    +
    +

    + S5248f +

    +
    +

    + Delta +

    +
    +

    + ET-C032if

    +
    +

    + Inventec +

    +
    +

    + D7332 +

    +
    +

    + Mellanox +

    +
    +

    + SN3800 +

    +
    + + + diff --git a/pdf/newsletters/SONiC_newsletter_2019_12.html b/pdf/newsletters/SONiC_newsletter_2019_12.html new file mode 100644 index 0000000000..f2ed83f75f --- /dev/null +++ b/pdf/newsletters/SONiC_newsletter_2019_12.html @@ -0,0 +1,252 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
       
    + Bi-Monthly Newsletter - December 2019 +
    + UPCOMING EVENTS + + SONiC & SAI 2019 HIGHLIGHTS +
    +
    +


    +

    + + 2020 OCP Global Summit + +

    + +

    + 4-5 March, 2020 +

    +

    + San Jose Convention Center +

    +
    +

    +


    +

    + Pre-Summit Workshop & Hackathon +

    + +

    + 2-3 March, 2020 +

    +

    + Awaiting more details +

    +
    +

    +
    +

    2019 has been an amazing year for SONiC! + Thanks to the community for the excellent achievement of 2 releases this year, with 27 new features and 2480 closed pull requests from 200+ active contributors! +

    +

    It has been a great achievement in standardizing the & as part of SAI ! +

    +

    Successfully hosted & participated in 1 hackathon, 4 workshops & 2 OCP summits and exhibited SONiC capabilities to the world ! +

    +

    Created 7 workgroups to enable focused, feature-specific community discussions and contributions! +

    +

    Looking forward to another successful year in 2020! +

    +

       +

    +
    + SONiC COMMUNITY NEWS +
    +

    + Cisco announced support for SONiC and SAI on Silicon One and the 8000 Series, a new high-density 400GE platform for hyperscale providers. For details +

    +

    Mellanox describes its full support for SONiC as "ASIC to protocol", which includes service calls, troubleshooting and bug fixes for customers. For details.

    +

       +

    +
    + SONiC RELEASE UPDATES +
    +

    + The SONiC 201911 release branch has been created; branch stabilization is in progress. +

    +

    + Features Merged - Build time improvements, Configurable drop counters, Egress mirroring and ACL action support check via SAI, HW resource monitor, L3 perf enhancement, Log analyzer to pytest, Management Framework, Management VRF, Platform test, sFlow, SSD diagnostic tooling and Sub-port support have been merged.
    + Pending Features - MLAG, VRF and ZTP. +

    +

    For current status of all features

    +
    + +
    NEW PLATFORMS
    +
    +
    + SONiC DESIGN DISCUSSIONS +
    + +
      +
    • Arista - 7280CR3K-32P4
    • +
    • DellEMC - z9332f-32x400G
    • +
    • Mellanox - SN3800-D112C8
    • +
    +
    +
    05 NOV 2019 +

    + + +    Support for DPKG local caching    +

    +
    12 NOV 2019 +

    + + +   DPKG Caching Framework - BRCM     +

    +
    19 NOV 2019 +

    + + +   Thermal control design   +

    +
    26 NOV 2019 +

    + + +   Release tracking status discussion  +

    +
    03 DEC 2019 +

    + + +   Release tracking status      +

    +
    10 DEC 2019 +

    + + +   Release tracking status discussion  +

    +
    17 DEC 2019 +

    + + +   PCI-e diag design specification       +

    +
     
    + + \ No newline at end of file diff --git a/previous_presentations.html b/previous_presentations.html new file mode 100644 index 0000000000..1a4b32beaa --- /dev/null +++ b/previous_presentations.html @@ -0,0 +1,148 @@ + + + + + + + SONiC | Home + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    + + + + + +
    +
    +
    +
    +
    +

    PREVIOUS PRESENTATIONS

    +
    +
    +
    +
    +
    + + + +
    + +
    + + + + +
    + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/sonic-acl.yang b/sonic-acl.yang new file mode 100644 index 0000000000..4081d1d8ad --- /dev/null +++ b/sonic-acl.yang @@ -0,0 +1,252 @@ +module sonic-acl { + namespace "http://github.com/Azure/sonic-acl"; + prefix sacl; + yang-version 1.1; + + import ietf-yang-types { + prefix yang; + } + + import ietf-inet-types { + prefix inet; + } + + import sonic-common { + prefix scommon; + } + + import sonic-port { + prefix prt; + } + + import sonic-portchannel { + prefix spc; + } + + import sonic-mirror-session { + prefix sms; + } + + import sonic-pf-limits { + prefix spf; + } + + organization + "BRCM"; + + contact + "BRCM"; + + description + "SONIC ACL"; + + revision 2019-05-15 { + description + "Initial revision."; + } + + container sonic-acl { + scommon:db-name "CONFIG_DB"; + + list ACL_TABLE { + key "aclname"; + scommon:key-delim "|"; + scommon:key-pattern "ACL_TABLE|{aclname}"; + + /* must "count(/prt:sonic-port/prt:PORT) > 0"; */ + + leaf aclname { + type string; + } + + leaf policy_desc { + type string { + length 1..255 { + error-app-tag policy-desc-invalid-length; + } + } + } + + leaf stage { + type enumeration { + enum INGRESS; + enum EGRESS; + } + } + + leaf type { + type enumeration { + enum MIRROR; + enum L2; + enum L3; + enum L3V6; + } + } + + leaf-list ports { + /*type union { */ + type leafref { + path "/prt:sonic-port/prt:PORT/prt:ifname"; + } + /* type leafref { + path "/spc:sonic-portchannel/spc:PORTCHANNEL/spc:name"; + } + }*/ + } + } + + list ACL_RULE { + key "aclname rulename"; + scommon:key-delim "|"; + scommon:key-pattern "ACL_RULE|{aclname}|{rulename}"; + scommon:pf-check "ACL_CheckAclLimits"; + + /* Limit for number of dynamic ACL rules */ + /*must "count(../ACL_RULE) > /spf:sonic-pf-limits/acl/MAX_ACL_RULES" { + error-message "Number of ACL rules reached max platform limit."; + } + must "PRIORITY > /spf:sonic-pf-limits/acl/MAX_PRIORITY" { + error-message "Invalid ACL rule priority."; + } + must "count(../ACL_TABLE) > 0 and count(/prt:sonic-port/prt:PORT) > 0"; //Temporary work-around + */ + + leaf aclname { + type leafref { + path "../../ACL_TABLE/aclname"; + } + must "(/scommon:operation/scommon:operation != 'DELETE') or " + + "count(../../ACL_TABLE[aclname=current()]/ports) = 0" { + error-message "Ports are already bound to this rule."; + } + } + + leaf rulename { + type string; + } + + leaf PRIORITY { + type uint16 { + range "1..65535"; + } + } + + leaf RULE_DESCRIPTION { + type string; + } + + leaf PACKET_ACTION { + type enumeration { + enum FORWARD; + enum DROP; + enum REDIRECT; + } + } + + leaf MIRROR_ACTION { + type leafref { + path "/sms:sonic-mirror-session/sms:MIRROR_SESSION/sms:name"; + } + } + + leaf IP_TYPE { + type enumeration { + enum any; + enum ip; + enum ipv4; + enum ipv4any; + enum non_ipv4; + enum ipv6any; + enum non_ipv6; + } + } + + leaf IP_PROTOCOL { + type uint8 { + range "1|2|6|17|46|47|51|103|115"; + } + } + + leaf ETHER_TYPE { + type string{ + pattern "(0x88CC)|(0x8100)|(0x8915)|(0x0806)|(0x0800)|(0x86DD)|(0x8847)"; + } + } + + choice ip_src_dst { + case ipv4_src_dst { + leaf SRC_IP { + mandatory true; + type inet:ipv4-prefix; + } + leaf DST_IP { + mandatory true; + type inet:ipv4-prefix; + } + } + case ipv6_src_dst { + leaf SRC_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + leaf DST_IPV6 { + mandatory true; + type inet:ipv6-prefix; + } + } + } + + choice src_port { + case l4_src_port { + leaf L4_SRC_PORT { + type uint16; + } + } + case 
l4_src_port_range { + leaf L4_SRC_PORT_RANGE { + type string { + pattern "[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + choice dst_port { + case l4_dst_port { + leaf L4_DST_PORT { + type uint16; + } + } + case l4_dst_port_range { + leaf L4_DST_PORT_RANGE { + type string { + pattern "[0-9]{1,5}(-)[0-9]{1,5}"; + } + } + } + } + + leaf TCP_FLAGS { + type string { + pattern "0[xX][0-9a-fA-F]{2}[/]0[xX][0-9a-fA-F]{2}"; + } + } + + leaf DSCP { + type uint8; + } + } + + container state { + config false; + + leaf MATCHED_PACKETS { + type yang:counter64; + } + + leaf MATCHED_OCTETS { + type yang:counter64; + } + } + } +} diff --git a/sourcecode.md b/sourcecode.md index eb58449883..72efae900c 100644 --- a/sourcecode.md +++ b/sourcecode.md @@ -1,44 +1,171 @@ -# SONiC Source Repositories - -## Imaging and Building tools -- https://github.com/Azure/sonic-buildimage - - Source to build an installable SONiC image - -## SAI, Switch State Service -- https://github.com/Azure/sonic-swss - - Switch State Service - Core component of SONiC which processes network switch data -- https://github.com/Azure/sonic-swss-common - - Switch State Service common library - Common library for Switch State Service -- https://github.com/opencomputeproject/SAI - - Switch Abstraction Interface standard headers -- https://github.com/Azure/sonic-sairedis - - C++ library for interfacing to SAI objects in Redis -- https://github.com/Azure/sonic-dbsyncd - - Python Redis common functions for LLDP -- https://github.com/Azure/sonic-py-swsssdk - - Python switch state service library -- https://github.com/Azure/sonic-quagga - - Fork of Quagga Software Routing Suite for use with SONiC - -## Monitoring and management tools -- https://github.com/Azure/sonic-mgmt - - Management and automation code used for build, test and deployment automation -- https://github.com/Azure/sonic-utilities - - Various command line utilities used in SONiC -- https://github.com/Azure/sonic-snmpagent - - A net-snmpd agentx subagent - -## Switch hardware drivers -- https://github.com/Azure/sonic-linux-kernel - - Kernel patches for various device drivers -- https://github.com/Azure/sonic-platform-common - - API for implementing platform-specific functionality in SONiC -- https://github.com/Azure/sonic-platform-daemons - - Daemons for controlling platform-specific functionality in SONiC -- https://github.com/celestica-Inc/sonic-platform-modules-cel -- https://github.com/edge-core/sonic-platform-modules-accton -- https://github.com/Azure/sonic-platform-modules-s6000 -- https://github.com/Azure/sonic-platform-modules-dell -- https://github.com/aristanetworks/sonic -- https://github.com/Ingrasys-sonic/sonic-platform-modules-ingrasys - +# SONiC Source Repositories + + +## Imaging and Building tools + +### sonic-buildimage +- https://github.com/Azure/sonic-buildimage + - Main repo that contains SONiC code,links to all sub-repos, build related files, platform/device specific files, etc., + This repo has the following directories. + - device - It contains files specific to each vendor device. In general, it contains the python scripts for accessing EEPROM, SFP, PSU, LED,etc., specific to the device hardware. + - dockers - This folder contains sub-folders for all dockers running in the SONiC. Each of those sub-folders contains files that explains about the processes that need to run inside that docker. List of dockers and the processes running inside each docker is given at the end of this document. 
+ - files - Contain multiple sub-folders required for building and running SONiC services. + (a) Aboot, (b) apt - few default files related to for "apt-get" ,"apt-*" applications (c) build_templates - Contains the jinja2 template files to generate (as part of "build process") the systemd services files required for starting the various dockers. It also contains the file sonic_debian_extension.j2 is used in "build process"; it copies the required files and installs the required packages in the "/fsroot" which is built ass part of the SONiC image. + (d) dhcp - Contains the config file for dhcp client & exit hook scripts, (e) docker - Contains the "docker" file (related to docker-engine) that is extracted from docker-ce 17.03.0\~ce-0\~debian-stretch to enable 'service docker start' in the build chroot env. + (f) image_config - Contains sub-folders like apt (non debian packages related info), bash (for bashrc), caclmgrd (control plane ACL manager daemon), cron.d (logrotate), ebtables (filter), environment (vtysh,banner), fstrim, hostcfgd (tacacs+ config handler), hostname (service to handle hostname config), interfaces (service to handle interface related config changes), logrotate, ntp (ntp service with conf file j2 file), platform (rc.local file), rsyslog (service for syslog & j2 file),snmp (snmp.yml file), sudoers (sudo users & permissions), systemd (journald.conf file), updategraph (script for getting minigraph and installing it), warmboot-finalizer (script used during warmreboot). + (g) initramfs-tools - Contains files related to ramfs, (h) scripts - contains scripts for arp_update (gratuitous ARP/ND), swss, syncd, etc., (i) sshd - SSH service and keygen script + + - installer - contains scripts that are used by onie-mk-demo script that is called as part of build_image.sh + - rules - contains the "config" file where the build options can be modified, contains *.mk makefiles that contains the required marcros for building the image. + - platform - contains sub-folders for all platforms like "barefoot", "broadcom", "cavium", "centec", "marvell", "mellanox", "nephos", "p4", "vs" (virtual switch). + Each of those platform folder contains code specific to the hardware device from each vendors. It includes the required kernel drivers, platform sensors script for fetching data from hardware devices, etc., + - sonic-slave, sonic-slave-stretch - Contains the main Dockerfile that lists the various debian packages that are required for various features. + - src - contains sub-folders for features like bash, gobgp, hiredis, initramfs-tools, iproute2, isc-dscp, ixgbe, libnl3, libteam, + libyang, lldpd, lm-sensors, mpdecimal, python-click, python3, radvd - Router advertisement for IPv6, redis, smartmontools, + snmpd, socat, sonic-config-engine, sonic-daemon-base, sonic-device-data, sonic-frr (routing software with patches), supervisor, + swig, tacacs, telemetry and thrift. + + +## SAI, Switch State Service + +### sonic-swss +- https://github.com/Azure/sonic-swss + - Switch State Service - Core component of SONiC which processes network switch data - The SWitch State Service (SWSS) is a collection of software that provides a database interface for communication with and state representation of network applications and network switch hardware. 
+ + - This repository contains the source code for the swss container, teamd container & bgp container shown in the [architecture diagram](https://github.com/Azure/SONiC/blob/master/images/sonic_user_guide_images/section4_images/section4_pic1_high_level.png "High Level Component Interactions") + - When swss container is started, start.sh starts the processes like rsyslogd, orchagent, restore_neighbors, portsyncd, neighsyncd, swssconfig, vrfmgrd, vlanmgrd, intfmgrd, portmgrd, buffermgrd, enable_counters, nbrmgrd, vxlanmgrd & arp_update. + + SWWS repository contains the source code for the following. + - cfgmgr - This directory contains the code to build the following processes that run inside swss container. More details about each deamon is available in the [architecture document](https://github.com/Azure/SONiC/wiki/Architecture). + - nbrmgrd - manager for neighbor management - Listens to neighbor-related changes in NEIGH_TABLE in ConfigDB for static ARP/ND configuration and also to trigger proactive ARP (for potential VxLan Server IP address by not specifying MAC) and then uses netlink to program the neighbors in linux kernel. nbrmgrd does not write anything in APP_DB. + - portmgrd - manager for Port management - Listens to port-related changes in ConfigDB and sets the MTU and/or AdminState in kernel using "ip" commands and also pushes the same to APP_DB. + - buffermgrd - manager for buffer management - Reads buffer profile config file and programs it in ConfigDB and then listens (at runtime) for cable length change and speed change in ConfigDB, and sets the same into buffer profile table ConfigDB. + - teammgrd - team/portchannel management - Listens to portchannel related config changes in ConfigDB and runs the teamd process for each port channel. Note that teammgrd will be executed inside teamd container (not inside swss container). + - intfmgrd - manager for interfaces - Listens for IP address changes and VRF name changes for the interfaces in ConfigDB and programs the same in linux using "/sbin/ip" command and writes into APP_DB. + - vlanmgrd - manager for VLAN - Listens for VLAN related changes in ConfigDB and programs the same in linux using "bridge" & "ip" commands and and writes into APP_DB + - vrfmgrd - manager for VRF - Listens for VRF changes in ConfigDB and programs the same in linux and writes into APP_DB. + + - fpmsyncd - this folder contains the code to build the "fpmsynd" process that runs in bgp container. This process runs a TCP server and listens for messages from Zebra for route changes (in the form of netlink messages) and it writes the routes to APP_DB. It also waits for clients to connect to it and then provides the route updates to those clients. + - neighsyncd - this folder contains the code to build the "neighsyncd" process. Listens for ARP/ND specific netlink messages from kernel for dynamically learnt ARP/ND and programs the same into APP_DB. + - portsyncd - this folder contains the code to build the "portsyncd" process. It first reads port list from configDB/ConfigFile and adds them to APP_DB. Once if the port init process is completed, this process receives netlink messages from linux and it programs the same in STATE_DB (state OK means port creation is successful in linux). + - swssconfig - this folder creates two executables, viz, swssconfig and swssplayer. + - "swssconfig" runs during boot time only. It restores FDB and ARP table during fast reboot. 
It takes the port config, copp config, IP in IP (tunnel) config and switch (switch table) config from the ConfigDB and loads them into APP_DB. + - "swssplayer" - this records all the programming that happens via SWSS, which can be played back to simulate the sequence of events for debugging or simulating an issue. + - teamsyncd - allows the interaction between “teamd” and south-bound subsystems. It listens for messages from the teamd software and writes the output into APP_DB. + - orchagent - The most critical component in the Swss subsystem. Orchagent contains the logic to extract all the relevant state injected by the *syncd daemons, process and massage this information accordingly, and finally push it towards its south-bound interface. This south-bound interface is yet again another database within the redis engine (ASIC_DB), so Orchagent operates both as a consumer (for example, for state coming from APPL_DB) and as a producer (for state being pushed into ASIC_DB). + + +### sonic-swss-common +- https://github.com/Azure/sonic-swss-common + - Switch State Service common library - Common library for Switch State Service + +### Opencomputeproject/SAI +- https://github.com/opencomputeproject/SAI (Switch Abstraction Interface standard headers) + - This repo refers to/uses the SAI sub-repo from the OCP github that includes the required SAI header files. + +### sonic-sairedis +- https://github.com/Azure/sonic-sairedis + - This repo contains the C++ library code for interfacing to SAI objects in Redis. + - The SAI Redis provides a SAI redis service that is built on top of the redis database. + - It contains two major components: + - a SAI library that puts SAI objects into the ASIC_DB and + - a syncd process that takes the SAI objects and puts them into the ASIC. + - It also contains the sub-folders "saiplayer" (which records all the actions from orchagent that result in SAI API calls to the ASIC) and "saidump" (a tool to dump the ASIC contents). + - Note that the SAI library for the specific platform is not part of this repo. The SAI library is built using the sonic-buildimage/platform//*sai.mk (slave.mk includes the platform//rules.mk that in turn includes the *sai.mk that installs the required SAI debians). + + +### sonic-dbsyncd +- https://github.com/Azure/sonic-dbsyncd + - Python Redis common functions for LLDP + - This repo contains the code for the SONiC Switch State Service sync daemon for LLDP data. Scripts upload LLDP information to the Redis DB. + + +### sonic-py-swsssdk +- https://github.com/Azure/sonic-py-swsssdk + - This repo contains the Python utility library for SWSS DB access; a minimal usage sketch is shown below. 
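A minimal usage sketch (illustrative only, not part of the repo: it assumes the code runs on a SONiC switch where CONFIG_DB and STATE_DB are reachable with the default connection settings, and the 'TEMPERATURE_INFO|asic' key is only an example key):

    # Sketch: read/write CONFIG_DB and read STATE_DB via sonic-py-swsssdk
    from swsssdk import ConfigDBConnector, SonicV2Connector

    # CONFIG_DB access: configuration tables such as PORT, VLAN, ACL_TABLE, ...
    config_db = ConfigDBConnector()
    config_db.connect()
    ports = config_db.get_table('PORT')                # e.g. {'Ethernet0': {...}, ...}
    config_db.mod_entry('PORT', 'Ethernet0', {'admin_status': 'up'})

    # STATE_DB access: operational data written by the platform daemons
    state_db = SonicV2Connector()
    state_db.connect(state_db.STATE_DB)
    temp_info = state_db.get_all(state_db.STATE_DB, 'TEMPERATURE_INFO|asic')  # example key
    print(sorted(ports.keys()), temp_info)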
+ - configdb.py - This provides utilities like ConfigDBConnector, db_connect, connect, subscribe, listen, set_entry, mod_entry, get_entry, get_keys, get_table, delete_table, mod_config, get_config, etc., + - dbconnector.py - It contains utilities like SonicV1Connector, SonicV2Connector, etc., + - exceptions.py - It contains utilities like SwssQueryError, UnavailableDataError, MissingClientError, etc., + - interface.py - It contains utilities like DBRegistry, DBInterface, connect, close, get_redis-client, publish, expire, exists, keys, get, get_all, set, delete, etc., + - port_util.py - It contains utilities like get_index, get_interface_oid_map, get_vlan_id_from_bvid, get_bridge_port_map, etc., + - util.py - It contains utilities like process_options, setup_logging, etc., + + +### sonic-quagga +- https://github.com/Azure/sonic-quagga/tree/debian/0.99.24.1 + - This repo contains code for the Quagga routing software which is a free software that manages various IPv4 and IPv6 routing protocols. Currently Quagga supports BGP4, BGP4+, OSPFv2, OSPFv3, RIPv1, RIPv2, and RIPng as well as very early support for IS-IS. + + +## Monitoring and management tools + +### sonic-mgmt +- https://github.com/Azure/sonic-mgmt + - Management and automation code used for build, test and deployment automation + +### sonic-utilities +- https://github.com/Azure/sonic-utilities + - This repository contains the code for Command Line Interfaces for SONiC. + - Folders like "config", "show", "clear" contain the CLI commands + - Folders like "scripts", "sfputil", "psuutil" & "acl_loader" contain the scripts that are used by the CLI commands. These scripts are not supposed to be directly called by user. All these scripts are wrapped under the "config" and "show" commands. + - "connect" folder and "consutil" folder is used for scripts to connec to other SONiC devices and manage them from this device. + - crm folder contains the scripts for CRM configuration and show commands. These commands are not wrapped under "config" and "show" commands. i.e. users can use the "crm" commands directly. + - pfc folder contains script for configuring and showing the PFC parameters for the interface + - pfcwd folder contains the PFC watch dog related configuration and show commands. + - utilities-command folder contains the scripts that are internally used by other scripts. + + +### sonic-snmpagent +- https://github.com/Azure/sonic-snmpagent + - This repo contains the net-snmpd AgentX SNMP subagent implementation for supporting the MIBs like MIB-II, Physical Table MIB, Interfaces MIB, Sensor Table MIB, ipCidrRouteDest table in IP Forwarding Table MIB, dot1qTpFdbPort in Q-BRIDGE-MIB & LLDP MIB. + - The python scripts present in this repo are used as part of the "snmp" docker that runs in SONiC. + + +## Switch hardware drivers + +### sonic-linux-kernel +- https://github.com/Azure/sonic-linux-kernel +- This repo contains the Kernel patches for various device drivers. +- This downloads the appropriate debian kernel code, applies the patches and builds the custom kernel for SONiC. + + +### sonic-platform-common +- https://github.com/Azure/sonic-platform-common + - This repo contains code which is to be shared among all platforms for interfacing with platform-specific peripheral hardware. 
+ - It contains the APIs for implementing platform-specific functionality in SONiC + - It provides the base class for peripherals like EEPROM, LED, PSU, SFP, chassis, device, fan, module, platform, watchdog, etc., that are used for existing platform code as well as for the new platform API. + - Platform specific code present in sonic-buildimage repo (device folder) uses the classes defined in this sonic-platform-common repository. + - New platform2.0 APIs are defined in the base classes inside "sonic_platform_base" folder. + +### sonic-platform-daemons +- https://github.com/Azure/sonic-platform-daemons + - This repo contains the daemons for controlling platform-specific functionality in SONiC + - This repo contains python scripts for platform daemons that listens for events from Optics, LED & PSU and writes them in the STATE_DB + - xcvrd - This listens for SFP events and writes the status to STATE_DB. + - ledd - This listens for LED events and writes the status to STATE_DB. + - psud - This listens for PSU events and writes the status to STATE_DB. + + +### Other Switch Hardware Drivers (Deprecated) +- https://github.com/celestica-Inc/sonic-platform-modules-cel +- https://github.com/edge-core/sonic-platform-modules-accton +- https://github.com/Azure/sonic-platform-modules-s6000 +- https://github.com/Azure/sonic-platform-modules-dell +- https://github.com/aristanetworks/sonic +- https://github.com/Ingrasys-sonic/sonic-platform-modules-ingrasys + + +## Dockers Information + +Following are the dockers that are running in SONiC. + +1) telemetry - Runs processes like telemetry & dialout_client_cli +2) syncd - Runs processes like syncd & dsserve which is used to sync the application data into the ASIC. +3) dhcp_relay - Runs the DHCP relay agent process. +4) teamd - Runs the teammgrd and teamsyncd processes. +5) radv (router-advertise) - Runs the IPv6 router advertisement process +6) snmp - Runs the SNMP agent daemon +7) swss (orchagent) - Runs the orchagent, portsyncd, neighsyncd, vrfmgrd, vlanmgrd, intfmgrd, portmgrd, buffermgrd, nbrmgrd & vxlanmgrd. +8) pmon (platform-monitor) - Runs the platform daemons xvrd (listens for SFP events) & psud (listens for power supply related events). +9) lldp - Runs the lldp process and lldpmgrd +10) bgp (fpm-frr) - Runs bgpcfgd, zebra, staticd, bgpd & fpmsyncd +11) database - Runs the REDIS server. diff --git a/thermal-control-design.md b/thermal-control-design.md new file mode 100644 index 0000000000..ed0534017b --- /dev/null +++ b/thermal-control-design.md @@ -0,0 +1,272 @@ +# SONiC Thermal Control Design # + +### Rev 0.1 ### + +### Revision ### + + | Rev | Date | Author | Change Description | + |:---:|:-----------:|:------------------:|-----------------------------------| + | 0.1 | | Liu Kebo | Initial version | + | 0.2 | | Liu Kebo | Revised after community review | + + + +## 1. Overview + +The purpose of Thermal Control is to keep the switch at a proper temperature by using cooling devices, e.g., fan. +Thermal control daemon need to monitor the temperature of devices (CPU, ASIC, optical modules, etc) and the running status of fan. It store temperature values fetched from sensors and thermal device running status to the DB, to make these data available to CLI and SNMP or other apps which interested. + +Thermal control also enforce some environment related polices to help the thermal control algorithm to adjust the switch temperature. + +## 2. 
Thermal device monitoring + +The thermal monitoring function will retrieve the switch device temperatures via the platform APIs, which provide not only the temperature values but also the threshold values. The thermal object status can be deduced by comparing the current temperature against the thresholds; if it is above the high threshold or under the low threshold, an alarm shall be raised. + +Besides device temperatures, the daemon shall also monitor the fan running status. + +Thermal device monitoring will loop at a fixed period; 60s is a reasonable value since temperatures usually do not change much over a short period. + +### 2.1 Temperature monitoring + +In the new platform API, the ThermalBase() class provides the get_temperature(), get_high_threshold(), get_low_threshold(), get_critical_high_threshold() and get_critical_low_threshold() functions, from which the values for a thermal object can be fetched. The warning status can also be deduced from them. + +For the purpose of feeding CLI/SNMP or telemetry functions, these values and the warning status can be stored in the State DB. The schema can be like this: + + ; Defines information for a thermal object + key = TEMPERATURE_INFO|object_name ; name of the thermal object (CPU, ASIC, optical modules...) + ; field = value + temperature = FLOAT ; current temperature value + timestamp = STRING ; timestamp for the temperature fetched + high_threshold = FLOAT ; temperature high threshold + critical_high_threshold = FLOAT ; temperature critical high threshold + low_threshold = FLOAT ; temperature low threshold + critical_low_threshold = FLOAT ; temperature critical low threshold + warning_status = BOOLEAN ; temperature warning status + +The temperature monitor list shall include, but is not limited to: CPU core, CPU pack, ASIC, PSU, optical modules, etc. + +The TEMPERATURE_INFO table key object_name convention is "device_name + index", or device_name if there is no index, e.g. "cpu_core_0", "asic", "psu_2". Appendix 1 lists all the thermal sensors supported on Mellanox platforms. + +### 2.2 Fan device monitoring + +In most cases the fan is the device that cools down the switch when the temperature rises, so making sure the fans run at a proper speed is key for thermal control. + +Fan target speed and speed tolerance are defined; by examining them we can know whether a fan has reached the desired speed. + +Same as the temperature info, a [table for fan](https://github.com/Azure/SONiC/blob/master/doc/pmon/pmon-enhancement-design.md#153-fan-table) is also defined as below: + + ; Defines information for a fan + key = FAN_INFO|fan_name ; information for the fan + ; field = value + presence = BOOLEAN ; presence of the fan + model = STRING ; model name of the fan + serial = STRING ; serial number of the fan + status = BOOLEAN ; status of the fan + change_event = STRING ; change event of the fan + direction = STRING ; direction of the fan + speed = INT ; fan speed + speed_tolerance = INT ; fan speed tolerance + speed_target = INT ; fan target speed + led_status = STRING ; fan led status + timestamp = STRING ; timestamp for the fan info fetched + +### 2.3 Syslog for thermal control + +If a warning is raised or cleared, a log shall be generated: + + High temperature warning: PSU 1 current temperature 85C, high threshold 80C! + High temperature warning cleared, PSU 1 temperature restored to 75C, high threshold 80C + +If a fan is broken or becomes absent, a log shall be generated: + + Fan removed warning: Fan 1 was removed from the system, potential overheat hazard! + Fan removed warning cleared: Fan 1 was inserted.
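Putting sections 2.1-2.3 together, one iteration of the monitoring loop could look roughly like the sketch below. This is an illustrative sketch only, not the actual daemon: it assumes the vendor platform is reachable through sonic_platform.platform.Platform().get_chassis(), that the chassis returns ThermalBase objects from get_all_thermals(), and that swsssdk is available for State DB access.

    # Illustrative sketch of the thermal monitoring loop (not the actual daemon code)
    import syslog
    import time

    from swsssdk import SonicV2Connector
    from sonic_platform.platform import Platform   # assumed vendor platform package

    INTERVAL = 60  # seconds, as suggested above

    def monitor_thermals(chassis, state_db):
        for thermal in chassis.get_all_thermals():
            name = thermal.get_name()
            temp = thermal.get_temperature()
            high = thermal.get_high_threshold()
            low = thermal.get_low_threshold()
            warning = (high is not None and temp > high) or (low is not None and temp < low)

            # Only two TEMPERATURE_INFO fields are shown here; the thresholds and
            # timestamp would be written the same way.
            key = 'TEMPERATURE_INFO|{}'.format(name)
            state_db.set(state_db.STATE_DB, key, 'temperature', str(temp))
            state_db.set(state_db.STATE_DB, key, 'warning_status', str(warning))

            if warning:
                syslog.syslog(syslog.LOG_WARNING,
                              'Temperature warning: {} current temperature {}C'.format(name, temp))

    def main():
        chassis = Platform().get_chassis()
        state_db = SonicV2Connector()
        state_db.connect(state_db.STATE_DB)
        while True:
            monitor_thermals(chassis, state_db)   # fan monitoring (FAN_INFO) would be analogous
            time.sleep(INTERVAL)

    if __name__ == '__main__':
        main()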
+ +## 3. Thermal control management + +Adjusting the cooling devices according to the current temperature can be very vendor specific, and some vendors already have their own implementation; the Appendix below describes a Mellanox implementation. However, handling the cooling devices according to some predefined policies can be generic, and that is part of what thermal control management will do. + +This cooling device control function can be disabled if the vendor has their own implementation in the kernel or somewhere else. + +### 3.1 Thermal control management flow + +A routine function will check whether any policy is hit and the fan speed needs to be adjusted, and will also run the vendor specific thermal control algorithm. + +Below policies are examples that can be applied: + +- Set PWM to full speed if one of the PSU units is not present + +- Set PWM to full speed if one of the FAN drawers is not present or one of the tachometers is broken + +- Set the fan speed to a constant value (60% of full speed) if the thermal control functions are disabled. + +The FAN status LED and PSU status LED shall also be set accordingly when a policy is met. + +The policy check functions will go through the device status and adjust the fan speed if necessary; these checks will be performed by calling the new platform API. + +A thermal control daemon class will be defined with the above functions; vendors will be allowed to provide their own implementation. + +![](https://github.com/keboliu/SONiC/blob/master/images/thermal-control.svg) + +### 3.2 Policy management + +Policies are defined in a JSON file for each hwsku. For example, one SKU may want to apply the below policies: + +- Thermal control algorithm control: whether it is enabled for this hwsku, and the fan speed value to set when it is not running; + +- FAN absence action: whether to suspend the algorithm, and the fan speed value to set; + +- PSU absence action: whether to suspend the algorithm, and the fan speed value to set. + +- All fans failed/absence action: power down the switch. + +Below is an example for the policy configuration: + + { + "thermal_control_algorithm": { + "run_at_boot_up": true, + "fan_speed_when_suspend": "60%" + }, + "fan_absence": { + "action": { + "thermal_control_algorithm": "disable", + "fan_speed": "100%", + "led_color": "red" + } + }, + "psu_absence": { + "action": { + "thermal_control_algorithm": "disable", + "fan_speed": "100%", + "led_color": "red" + } + }, + "all_fan_failed": { + "action": { + "shutdown_switch": true + } + } + } + +In this configuration, the thermal control algorithm will run on this device; in the fan absence situation, the fan speed needs to be set to 100%, the thermal control algorithm will be suspended and the fan status LED shall be set to red; in the PSU absence situation, the thermal control algorithm will be suspended, the fan speed will be set to 100% and the PSU status LED shall be set to red. + +During daemon start, this configuration JSON file will be loaded and parsed, and the daemon will run the thermal control algorithm and set the fan speed when a predefined policy is met. + +## 4. CLI show command for temperature and fan design + +### 4.1 New CLI show command for temperature + +Adding a new sub command to "show platform": + + admin@sonic# show platform ? + Usage: show platform [OPTIONS] COMMAND [ARGS]...
+ + Show platform-specific hardware info + + Options: + -?, -h, --help Show this message and exit. + + Commands: + mlnx Mellanox platform specific configuration... + psustatus Show PSU status information + summary Show hardware platform information + syseeprom Show system EEPROM information + temperature Show device temperature information + +out put of the new CLI + + admin@sonic# show platform temperature + NAME Temperature Timestamp High Threshold Low Threshold Critical High Threshold Critical Low Threshold Warning Status + ---- ----------- ------------------ --------------- -------------- ------------------------ ------------------------ ---------------- + CPU 85 20191112 09:38:16 110 -10 120 -20 false + ASIC 75 20191112 09:38:16 100 0 110 -10 false + +An option '--major' provided by this CLI to only print out major device temp, if don't want show all of sensor temperatures. +Major devices are CPU pack, cpu cores, ASIC and optical modules. + +### 4.2 New show CLI for fan status + +We don't have a CLI for fan status getting yet, new CLI for fan status could be like below, it's adding a new sub command to the "show platform": + + admin@sonic# show platform ? + Usage: show platform [OPTIONS] COMMAND [ARGS]... + + Show platform-specific hardware info + + Options: + -?, -h, --help Show this message and exit. + + Commands: + fanstatus Show fan status information + mlnx Mellanox platform specific configuration... + psustatus Show PSU status information + summary Show hardware platform information + syseeprom Show system EEPROM information +The output of the command is like below: + + admin@sonic# show platform fanstatus + FAN Speed Direction Timestamp + ----- --------- --------- ----------------- + FAN 1 12919 RPM Intake 20191112 09:38:16 + FAN 2 13043 RPM Exhaust 20191112 09:38:16 + + +## 5. Potential ehhancement for Platform API +1. Why can't we propose different change events for different cpu/fan/optics? +2. Verbose on API definition on threshold levels about Average/Max/Snapshot. +3. Is there any API exposed for fanTray contain more than one fan? + +## Appendix + +## 1.Mellanox platform thermal sensors list + +On Mellanox platform we have below thermal sensors that will be monitored by the thermal control daemons, not all of the Mellanox platform include all of them, some platform maybe only have a subset of these thermal sensors. + + cpu_core_x : "CPU Core x Temp", + cpu_pack : "CPU Pack Temp", + modules_x : "xSFP module x Temp", + psu_x : "PSU-x Temp", + gearbox_x : "Gearbox x Temp" + asic : "Ambient ASIC Temp", + port : "Ambient Port Side Temp", + fan : "Ambient Fan Side Temp", + comex : "Ambient COMEX Temp", + board : "Ambient Board Temp" + + +## 2.Mellanox thermal control implementation + +### 2.1 Mellanox thermal Control framework + +Mellanox thermal monitoring measure temperature from the ports and ASIC core. It operates in kernel space and binds PWM(Pulse-Width Modulation) control with Linux thermal zone for each measurement device (ports & core). The thermal algorithm uses step_wise policy which set FANs according to the thermal trends (high temperature = faster fan; lower temperature = slower fan). + +More detail information can refer to Kernel documents https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt +and Mellanox HW-management package documents: https://github.com/Mellanox/hw-mgmt/tree/master/Documentation + +### 2.2 Components + +- The cooling device is an actual functional unit for cooling down the thermal zone: Fan. 
+ +- Thermal instance describes how cooling devices work at certain trip point in the thermal zone. + +- Governor handles the thermal instance not thermal devices. Step_wise governor sets cooling state based on thermal trend (STABLE, RAISING, DROPPING, RASING_FULL, DROPPING_FULL). It allows only one step change for increasing or decreasing at decision time. Framework to register thermal zone and cooling devices: + +- Thermal zone devices and cooling devices will work after proper binding. Performs a routing function of generic cooling devices to generic thermal zones with the help of very simple thermal management logic. + +### 2.3 Algorithm + +Use step_wise policy for each thermal zone. Set the fan speed according to different trip points. + +### 2.4 Trip points + +a series of trip point is defined to trigger fan speed manipulate. + + |state |Temperature value(Celsius) |PWM speed |Action | + |:------:|:-------------------------:|:-------------------------:|:-----------------------------------------| + |Cold | t < 75 C | 20% | Do nothing | + |Normal | 75 <= t < 85 | 20% - 40% | keep minimal speed| + |High | 85 <= t < 105 | 40% - 100% | adjust the fan speed according to the trends| + |Hot | 105 <= t < 110 | 100% | produce warning message | + |Critical| t >= 110 | 100% | shutdown | + diff --git a/workgroups.html b/workgroups.html index ec337d1cd8..0c0eabf284 100644 --- a/workgroups.html +++ b/workgroups.html @@ -1,10 +1,11 @@ - + + - - SONiC | Home + + SONiC | Workgroups @@ -12,22 +13,20 @@ - + - + - + - + - + - - @@ -36,193 +35,226 @@ - - + + + - + - + - - - -
    - -
    - - -