Platform API for PCIe AER stats collection #702

Merged
merged 2 commits into from
Nov 10, 2020
36 changes: 34 additions & 2 deletions doc/pcie-mon/pcie-monitoring-services-hld.md
@@ -1,6 +1,6 @@
# SONiC PCIe Monitoring services HLD #

### Rev 0.3 ###
### Rev 0.4 ###

### Revision
| Rev | Date | Author | Change Description |
@@ -10,6 +10,7 @@
| | | | Add pcied to PMON for runtime monitoring |
| 0.3 | | Arun Saravanan Balachandran | Add AER stats update support in pcied |
| | | | Add command to display AER stats |
| 0.4 | | Arun Saravanan Balachandran | Add platform API to collect AER stats |

## About This Manual ##

@@ -127,7 +128,38 @@ For AER supported PCIe device, the AER stats belonging to severities `correctabl

### 2.2 PCIe AER stats collection in pcied ###

For PCIe devices that pass PcieUtil `get_pcie_check`, the AER stats if available will be retrieved and updated in the STATE_DB periodically every minute by pcied.
A common platform API `get_pcie_aer_stats` is defined in class `PcieBase` for retrieving AER stats of a PCIe device:

```
@abc.abstractmethod
def get_pcie_aer_stats(self, domain, bus, dev, fn):
    """
    Returns a nested dictionary containing the AER stats belonging to a
    PCIe device

    Args:
        domain, bus, dev, fn: Domain, bus, device, function of the PCIe
        device respectively

    Returns:
        A nested dictionary where key is severity 'correctable', 'fatal' or
        'non_fatal', value is a dictionary of key, value pairs in the format:
            {'AER Error type': Error count}

        Ex. {'correctable': {'BadDLLP': 0, 'BadTLP': 0},
             'fatal': {'RxOF': 0, 'MalfTLP': 0},
             'non_fatal': {'RxOF': 0, 'MalfTLP': 0}}

        For PCIe devices that do not support AER, the value for each severity
        key is an empty dictionary.
    """
    return {}
```

A default implementation of `get_pcie_aer_stats` is provided by the `PcieUtil` class in sonic_platform_base/sonic_pcie/pcie_common.py.
It returns the AER stats for a given PCIe device, read from the AER sysfs entries under `/sys/bus/pci/devices/<Domain>:<Bus>:<Dev>.<Fn>`.

For PCIe devices that pass PcieUtil `get_pcie_check`, pcied retrieves the AER stats using `get_pcie_aer_stats` and updates them in the STATE_DB once every minute.
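
The sysfs lookup behind `get_pcie_aer_stats` could look roughly like the sketch below. It is an illustration only, not the `PcieUtil` code from this PR. The function name `read_aer_stats` and the `sysfs_root` parameter are hypothetical. It also assumes that the arguments are integers and that the device directory exposes the Linux kernel's per-device AER files `aer_dev_correctable`, `aer_dev_fatal` and `aer_dev_nonfatal`, each holding one `<error type> <count>` pair per line.

```python
import os

# Assumption: per-device AER statistics files documented in the Linux kernel
# sysfs ABI, mapped to the severity keys used by get_pcie_aer_stats.
AER_FILES = {
    "correctable": "aer_dev_correctable",
    "fatal": "aer_dev_fatal",
    "non_fatal": "aer_dev_nonfatal",
}


def read_aer_stats(domain, bus, dev, fn, sysfs_root="/sys/bus/pci/devices"):
    """Hypothetical helper: return {severity: {error_type: count}} for a device.

    domain, bus, dev and fn are assumed to be integers. sysfs_root is a
    parameter only so the sketch can be pointed at a test directory.
    """
    # Device directory name: <Domain>:<Bus>:<Dev>.<Fn>, e.g. 0000:01:00.0
    dev_dir = os.path.join(
        sysfs_root, "%04x:%02x:%02x.%d" % (domain, bus, dev, fn)
    )

    # Every severity key is present; it stays empty when AER is unsupported.
    stats = {severity: {} for severity in AER_FILES}

    for severity, filename in AER_FILES.items():
        path = os.path.join(dev_dir, filename)
        if not os.path.isfile(path):
            continue  # no AER file for this severity on this device
        with open(path) as fh:
            for line in fh:
                fields = line.split()
                # Skip anything that is not "<name> <non-negative integer>".
                if len(fields) != 2 or not fields[1].isdigit():
                    continue
                # Note: the per-file TOTAL_ERR_* line is kept like any other.
                stats[severity][fields[0]] = int(fields[1])
    return stats
```

A device without AER support yields an empty dictionary for each severity, matching the return contract in the docstring above. Whether the `TOTAL_ERR_*` summary lines should be stored or dropped is a design choice this sketch leaves open.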

### 2.3 STATE_DB keys and value ###
