Platform API for PCIe AER stats collection (#702)
Update the PCIed hld to add the platform API definition for PCIe AER stats collection
ArunSaravananBalachandran committed Nov 10, 2020
1 parent 403993f commit 0669c62
Showing 1 changed file with 34 additions and 2 deletions.
36 changes: 34 additions & 2 deletions doc/pcie-mon/pcie-monitoring-services-hld.md
@@ -1,6 +1,6 @@
# SONiC PCIe Monitoring services HLD #

### Rev 0.3 ###
### Rev 0.4 ###

### Revision
| Rev | Date | Author | Change Description |
@@ -10,6 +10,7 @@
| | | | Add pcied to PMON for runtime monitoring |
| 0.3 | | Arun Saravanan Balachandran | Add AER stats update support in pcied |
| | | | Add command to display AER stats |
| 0.4 | | Arun Saravanan Balachandran | Add platform API to collect AER stats |

## About This Manual ##

@@ -127,7 +128,38 @@ For AER supported PCIe device, the AER stats belonging to severities `correctabl

### 2.2 PCIe AER stats collection in pcied ###

For PCIe devices that pass PcieUtil `get_pcie_check`, the AER stats if available will be retrieved and updated in the STATE_DB periodically every minute by pcied.
A common platform API `get_pcie_aer_stats` is defined in class `PcieBase` for retrieving AER stats of a PCIe device:

```
    @abc.abstractmethod
    def get_pcie_aer_stats(self, domain, bus, dev, fn):
        """
        Returns a nested dictionary containing the AER stats belonging to a
        PCIe device

        Args:
            domain, bus, dev, fn: Domain, bus, device, function of the PCIe
            device respectively

        Returns:
            A nested dictionary where key is severity 'correctable', 'fatal' or
            'non_fatal', value is a dictionary of key, value pairs in the format:
                {'AER Error type': Error count}
            Ex. {'correctable': {'BadDLLP': 0, 'BadTLP': 0},
                 'fatal': {'RxOF': 0, 'MalfTLP': 0},
                 'non_fatal': {'RxOF': 0, 'MalfTLP': 0}}

            For PCIe devices that do not support AER, the value for each severity
            key is an empty dictionary.
        """
        return {}
```

A default implementation of `get_pcie_aer_stats` is provided in the `PcieUtil` class at `sonic_platform_base/sonic_pcie/pcie_common.py`.
It returns the AER stats for a given PCIe device, read from the AER sysfs entries under `/sys/bus/pci/devices/<Domain>:<Bus>:<Dev>.<Fn>`.
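
To illustrate the sysfs lookup described above, here is a minimal sketch of how such a reader could be written. It is not the `PcieUtil` code itself. It assumes the per-device counter files are the Linux kernel's `aer_dev_correctable`, `aer_dev_fatal` and `aer_dev_nonfatal`, each holding one `<error type> <count>` pair per line. The function name `read_aer_stats` and the `sysfs_root` parameter are hypothetical, added so the sketch can be exercised against a test directory.

```python
import os

# Assumption: per-device AER counter files exposed by the Linux kernel,
# mapped to the severity keys used by get_pcie_aer_stats.
AER_SYSFS_FILES = {
    "correctable": "aer_dev_correctable",
    "fatal": "aer_dev_fatal",
    "non_fatal": "aer_dev_nonfatal",
}


def read_aer_stats(domain, bus, dev, fn, sysfs_root="/sys/bus/pci/devices"):
    """Hypothetical helper: return the nested AER stats dict for one device."""
    # Build <Domain>:<Bus>:<Dev>.<Fn>, e.g. 0000:01:00.0
    dev_path = os.path.join(
        sysfs_root, "%04x:%02x:%02x.%d" % (domain, bus, dev, fn)
    )
    stats = {severity: {} for severity in AER_SYSFS_FILES}
    for severity, filename in AER_SYSFS_FILES.items():
        try:
            with open(os.path.join(dev_path, filename)) as f:
                lines = f.read().splitlines()
        except OSError:
            # Missing or unreadable file: leave this severity empty,
            # matching the "AER not supported" case in the API docstring.
            continue
        for line in lines:
            # Each line is assumed to look like "BadTLP 0".
            parts = line.rsplit(None, 1)
            if len(parts) == 2 and parts[1].isdigit():
                stats[severity][parts[0]] = int(parts[1])
    return stats
```

A device with no AER sysfs files yields an empty dictionary for every severity, which is the behaviour the API docstring specifies for devices that do not support AER.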

For PCIe devices that pass the PcieUtil `get_pcie_check`, pcied retrieves the AER stats using `get_pcie_aer_stats` and updates them in the STATE_DB once every minute.
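
The collection cycle described above could look roughly like the following sketch. It is illustrative only and not the pcied implementation. It assumes that `get_pcie_check` returns a list of dictionaries with hex-string `bus`, `dev` and `fn` fields and a `result` field equal to `"Passed"` for healthy devices. A plain dictionary stands in for the STATE_DB, and the key format, the function names and the fixed `domain=0` are hypothetical.

```python
import time


def update_aer_stats(pcie_util, state, domain=0):
    """Run one collection pass; `state` is a plain dict standing in for STATE_DB."""
    for item in pcie_util.get_pcie_check():
        # Assumption: only devices whose check result is "Passed" are polled.
        if item.get("result") != "Passed":
            continue
        # Assumption: bus/dev/fn are reported as hex strings.
        bus, dev, fn = (int(item[k], 16) for k in ("bus", "dev", "fn"))
        key = "%04x:%02x:%02x.%d" % (domain, bus, dev, fn)  # hypothetical key
        state[key] = pcie_util.get_pcie_aer_stats(domain, bus, dev, fn)


def run(pcie_util, state, interval_s=60, iterations=None, sleep=time.sleep):
    """Repeat the collection pass every `interval_s` seconds."""
    done = 0
    while iterations is None or done < iterations:
        update_aer_stats(pcie_util, state)
        done += 1
        sleep(interval_s)
```

The `iterations` and `sleep` parameters exist only so the loop can be driven from a test without waiting a full minute per pass.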

### 2.3 STATE_DB keys and value ###
