Transceiver monitoring HLD #202

Merged
merged 15 commits into from
Jul 6, 2018
151 changes: 151 additions & 0 deletions doc/transceiver-monitor-hld.md
@@ -0,0 +1,151 @@
# Transceiver and Sensor Monitoring HLD #

### Rev 0.1 ###

### Revision
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Liu Kebo | Initial version |

## About This Manual ##

This document is intended to provide general information about the Transceiver and Sensor Monitoring implementation.
The requirement is described in [Sensor and Transceiver Info Monitoring Requirement.](https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD)


## 1. Xcvrd design ##

A new daemon, Xcvrd, in the platform monitor container is designed to fetch the transceiver and DOM sensor information from the EEPROM and then update the state DB with this information.

The properties of the transceiver itself (type, serial number, hardware version, etc.) do not change after it is plugged in, so the transceiver information update is best triggered by the transceiver plug in/out event.

The transceiver DOM sensor information (temperature, power, voltage, etc.) can change frequently, so it needs to be updated periodically. For now the period is temporarily set to 60s (see open question 1); it needs to be adjusted according to later tests on all vendors' platforms.

If the transceiver or sensor status changes, Xcvrd will write the new status to the state DB. New tables will be added to STATE_DB to store this information.

### 1.1 State DB Schema ###

A new transceiver info table and a new transceiver DOM sensor table will be added to the state DB to store the transceiver and DOM sensor information.

#### 1.1.1 Transceiver info Table ####

    ; Defines Transceiver information for a port
    key             = TRANSCEIVER_INFO|ifname  ; information for the transceiver on the port
    ; field         = value
    type            = 1*255VCHAR               ; type of sfp
    hardwarerev     = 1*255VCHAR               ; hardware version of sfp
    serialnum       = 1*255VCHAR               ; serial number of the sfp
    manufacturename = 1*255VCHAR               ; sfp vendor name
    modelname       = 1*255VCHAR               ; sfp model name
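
As a rough illustration of the schema above, the following sketch builds the key and field map for one port. The helper name `make_transceiver_info_entry` and the `QSFP+` type string are hypothetical; the field names come from the schema and the other sample values from the examples in section 2.1. It only checks the `1*255` length rule, not the character set.

```python
# Field names come from section 1.1.1; everything else here is illustrative.
TRANSCEIVER_INFO_TABLE = "TRANSCEIVER_INFO"
INFO_FIELDS = ("type", "hardwarerev", "serialnum", "manufacturename", "modelname")


def make_transceiver_info_entry(ifname, info):
    """Return (key, fields) for one port, enforcing the 1*255 length rule."""
    fields = {}
    for name in INFO_FIELDS:
        value = str(info[name])
        if not 1 <= len(value) <= 255:
            raise ValueError("%s must be 1 to 255 characters" % name)
        fields[name] = value
    return "%s|%s" % (TRANSCEIVER_INFO_TABLE, ifname), fields


key, fields = make_transceiver_info_entry("Ethernet29", {
    "type": "QSFP+",  # hypothetical value
    "hardwarerev": "A1",
    "serialnum": "WW5062F",
    "manufacturename": "FINISAR CORP",
    "modelname": "FCBN410QD3C02",
})
print(key)  # TRANSCEIVER_INFO|Ethernet29
```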

#### 1.1.2 Transceiver DOM sensor Table ####

    ; Defines Transceiver DOM sensor information for a port
    key         = TRANSCEIVER_DOM_SENSOR|ifname  ; DOM sensor information for the transceiver on the port
    ; field     = value
    temperature = FLOAT                          ; temperature value in Celsius
    voltage     = FLOAT                          ; voltage value in volts
    rx1power    = FLOAT                          ; rx1 power in dBm
    rx2power    = FLOAT                          ; rx2 power in dBm
    rx3power    = FLOAT                          ; rx3 power in dBm
    rx4power    = FLOAT                          ; rx4 power in dBm
    tx1bias     = FLOAT                          ; tx1 bias in mA
    tx2bias     = FLOAT                          ; tx2 bias in mA
    tx3bias     = FLOAT                          ; tx3 bias in mA
    tx4bias     = FLOAT                          ; tx4 bias in mA
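
A minimal sketch of how a DOM sensor entry could be assembled, assuming the reader of the EEPROM already has the receive power in milliwatts and converts it to dBm with the standard formula `10 * log10(P / 1 mW)`. The helper names `mw_to_dbm` and `make_dom_entry` are hypothetical; only the field names are taken from the schema above.

```python
import math

DOM_TABLE = "TRANSCEIVER_DOM_SENSOR"


def mw_to_dbm(milliwatts):
    """Convert optical power from mW to dBm: 10 * log10(P / 1 mW)."""
    if milliwatts <= 0:
        return float("-inf")  # no light detected
    return 10.0 * math.log10(milliwatts)


def make_dom_entry(ifname, temperature, voltage, rx_power_mw, tx_bias_ma):
    """Build (key, fields) with the field names from section 1.1.2."""
    fields = {"temperature": float(temperature), "voltage": float(voltage)}
    for lane, power in enumerate(rx_power_mw, start=1):
        fields["rx%dpower" % lane] = round(mw_to_dbm(power), 4)
    for lane, bias in enumerate(tx_bias_ma, start=1):
        fields["tx%dbias" % lane] = float(bias)
    return "%s|%s" % (DOM_TABLE, ifname), fields
```

For example, `make_dom_entry("Ethernet29", 30.5, 3.3, [1.0, 0.5, 1.0, 1.0], [4.014] * 4)` yields the key `TRANSCEIVER_DOM_SENSOR|Ethernet29` with `rx1power` equal to `0.0` dBm, since 1 mW is the dBm reference level.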

Is it possible to add TX power? Or TX bias is enough?

Collaborator Author

@yongcanwang00 this is defined in the requirement doc, I would like to ask the requirement author @hui-ma to comment on this.

Contributor

TX bias is from the output of "show interfaces transceiver eeprom" command, It is also the output of some other vendors' snmp output. I think they can be translated into TX power. I need to look for its equation. Do you prefer TX power in the output directly?

ChannelMonitorValues:
          RX1Power: -1.1936dBm
          RX2Power: -1.1793dBm
          RX3Power: -0.9388dBm
          RX4Power: -1.0729dBm
          TX1Bias: 4.0140mA
          TX2Bias: 4.0140mA
          TX3Bias: 4.0140mA
          TX4Bias: 4.0140mA
    ModuleMonitorValues :
            Temperature : 1.1111C
            Vcc : 0.0000Volts

Yes, TX power is useful. It's helpful to have them both.


### 1.2 Access eeprom from platform container ###

The transceiver EEPROM can be accessed by reading files (e.g. `/sys/bus/i2c/devices/2-0048/hwmon/hwmon4/qsfp9_eeprom`). Different vendors may place these files under different folders; these folders need to be mounted into the platform container so that Xcvrd can access them.


To simplify the implementation and reduce the time spent reading the EEPROM, the `SfpUtilBase` class needs the following enhancements:

1. `SfpUtilBase` should internally add the ability to read the EEPROM and pick up only the bytes of interest, given an offset and a number of bytes.

2. `SfpUtilBase` will provide the APIs `get_eeprom_sfp_info_dict(self, port_num)` and `get_eeprom_dom_info_dict(self, port_num)`, which return `eeprom_if_dict` and `eeprom_dom_dict` respectively. The values of interest in these two dicts are defined in sections 1.1.1 and 1.1.2. These two APIs pick up the values from the EEPROM by providing the corresponding offset and number of bytes.
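
The offset-based read described in item 1 might look roughly like the sketch below. The function names `read_eeprom_bytes` and `decode_ascii_field` are hypothetical, and the byte layout of the temporary file is made up for the demonstration; it does not reflect the real EEPROM map.

```python
import os
import tempfile


def read_eeprom_bytes(eeprom_path, offset, num_bytes):
    """Read num_bytes starting at offset; return a list of ints, or None on failure."""
    try:
        with open(eeprom_path, "rb") as eeprom:
            eeprom.seek(offset)
            raw = eeprom.read(num_bytes)
    except (IOError, OSError):
        return None
    if len(raw) != num_bytes:
        return None  # short read: file smaller than expected
    return list(bytearray(raw))


def decode_ascii_field(raw_bytes):
    """Decode a space-padded ASCII field such as a vendor name."""
    return bytearray(raw_bytes).decode("ascii", "replace").strip()


# Demonstration against a fake EEPROM file with an invented layout.
with tempfile.NamedTemporaryFile(delete=False) as fake:
    fake.write(b"\x00" * 16 + b"ACME OPTICS     " + b"\x00" * 16)
print(decode_ascii_field(read_eeprom_bytes(fake.name, 16, 16)))  # ACME OPTICS
os.remove(fake.name)
```

Reading only the requested window avoids parsing the whole EEPROM image on every poll, which is the time saving this section is after.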


### 1.3 Transceiver plug in/out event ###

Xcvrd needs to be triggered by the transceiver plug in/out event to refresh the transceiver info table.

How this event is obtained varies across platforms; there is no common implementation available.

Here we define a common platform API in class `SfpUtilBase` to wait for this event:

    @abc.abstractmethod
    def get_transceiver_change_event(self):
        """
        :returns: Boolean, True if call successful, False if not;
        dict for physical port number and the SFP status, like {'0': 'PLUGIN', '31': 'PLUGOUT'}
        """
        return

Each vendor needs to implement this function in its `SfpUtil` plugin.
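
As one possible shape for such a plugin, the sketch below polls a presence reader and reports differences. This is only an assumption about how a vendor without interrupt support might do it; the class name `PollingSfpUtil`, the `read_presence` callable, and the polling interval are all hypothetical.

```python
import time


class PollingSfpUtil(object):
    """Hypothetical vendor plugin that detects changes by polling presence."""

    def __init__(self, ports, read_presence, poll_interval=1.0):
        self.ports = list(ports)
        self.read_presence = read_presence  # callable: port number -> bool
        self.poll_interval = poll_interval
        self.last_state = {port: read_presence(port) for port in self.ports}

    def get_transceiver_change_event(self):
        """Block until at least one port changes, then report all changes."""
        try:
            while True:
                port_dict = {}
                for port in self.ports:
                    present = self.read_presence(port)
                    if present != self.last_state[port]:
                        self.last_state[port] = present
                        port_dict[str(port)] = "PLUGIN" if present else "PLUGOUT"
                if port_dict:
                    return True, port_dict
                time.sleep(self.poll_interval)
        except (IOError, OSError):
            # Reading the presence source failed: report an unsuccessful call.
            return False, {}
```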

Xcvrd will call this API to wait for the SFP plug in/out event. The following example code shows how the API will be called:

    while True:
        status, port_dict = platform_sfputil.get_transceiver_change_event()
        if status:
            # port_dict maps physical port number to 'PLUGIN' or 'PLUGOUT'
            for key, value in port_dict.items():
                print("SFP on port %s was %s" % (key, value))



### 1.4 Xcvrd daemon flow ###

Xcvrd will spawn a thread to wait for the SFP plug in/out event; when an event is received, it will update the DB entries accordingly.

A timer will be started to periodically refresh the DOM sensor information.

The detailed flow is shown in the chart below:

![](https://github.com/keboliu/SONiC/blob/xcvrd-hld/images/transceiver_monitoring_hld/xcvrd_flow.svg)
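
The flow described in this section can be sketched as follows. This is an illustration under assumptions, not the daemon's actual code: `run_xcvrd`, the `update_info` and `update_dom` callbacks, and the `stop_event` shutdown mechanism are hypothetical names, and the 60s default mirrors the temporary polling period from section 1.

```python
import threading


def run_xcvrd(sfputil, update_info, update_dom, stop_event, dom_period=60.0):
    """Run the event thread and the periodic DOM refresh until stop_event is set."""

    def event_loop():
        # Blocks in the platform API and refreshes the info table on each event.
        while not stop_event.is_set():
            status, port_dict = sfputil.get_transceiver_change_event()
            if status:
                for port, state in port_dict.items():
                    update_info(port, state)

    worker = threading.Thread(target=event_loop)
    worker.daemon = True  # the blocking wait must not keep the process alive
    worker.start()

    while not stop_event.is_set():
        update_dom()
        stop_event.wait(dom_period)  # sleeps, but wakes early on shutdown
```

Using `Event.wait` as the timer keeps the DOM refresh on the main thread and lets a shutdown request interrupt the sleep instead of waiting out the full period.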

## 2. SNMP Agent Change ##

### 2.1 MIB tables extension ###

The MIB table entPhysicalTable from [Entity MIB(RFC2737)](https://tools.ietf.org/html/rfc2737) needs to be extended to support new OIDs.

| OID | SNMP counter | Where to get the info in SONiC | Example |
| --- | --- | --- | --- |
| 1.3.6.1.2.1.47.1.1.1 | entPhysicalTable | | |
| 1.3.6.1.2.1.47.1.1.1.1 | entPhysicalEntry | | |
| 1.3.6.1.2.1.47.1.1.1.1.2.ifindex | entPhysicalDescr | Show interfaces alias | Xcvr for Ethernet29 |
Contributor

The ifindex should be replaced with index, which is Ifindex * 1000, as changed in requirement

| 1.3.6.1.2.1.47.1.1.1.1.7.ifindex | entPhysicalName | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.8.ifindex | entPhysicalHardwareRev | Vendor Rev in CLI or sfputil | A1 |
| 1.3.6.1.2.1.47.1.1.1.1.9.ifindex | entPhysicalFirmwareRev | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.10.ifindex | entPhysicalSoftwareRev | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.11.ifindex | entPhysicalSerialNum | Vendor SN in CLI or sfputil | WW5062F |
| 1.3.6.1.2.1.47.1.1.1.1.12.ifindex | entPhysicalMfgName | Vendor Name in CLI or sfputil | FINISAR CORP |
| 1.3.6.1.2.1.47.1.1.1.1.13.ifindex | entPhysicalModelName | Vendor PN in CLI or sfputil | FCBN410QD3C02 |
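
To make the mapping in the table concrete, here is a sketch that turns one `TRANSCEIVER_INFO` entry (section 1.1.1) into OID/value pairs. The function name `transceiver_oids` is hypothetical, and the row index is passed in as a plain integer because how it is derived from the ifindex is still under discussion in this review; the `29000` used in the example is an invented value.

```python
ENT_PHYSICAL_ENTRY = "1.3.6.1.2.1.47.1.1.1.1"

# entPhysicalEntry column number -> TRANSCEIVER_INFO field (sections 1.1.1 and 2.1)
COLUMN_TO_FIELD = {
    8: "hardwarerev",       # entPhysicalHardwareRev
    11: "serialnum",        # entPhysicalSerialNum
    12: "manufacturename",  # entPhysicalMfgName
    13: "modelname",        # entPhysicalModelName
}


def transceiver_oids(physical_index, ifname, info):
    """Return {oid: value} for one transceiver row of entPhysicalTable."""
    oids = {"%s.2.%d" % (ENT_PHYSICAL_ENTRY, physical_index): "Xcvr for %s" % ifname}
    for column, field in COLUMN_TO_FIELD.items():
        oids["%s.%d.%d" % (ENT_PHYSICAL_ENTRY, column, physical_index)] = info[field]
    return oids


sample = transceiver_oids(29000, "Ethernet29", {  # 29000 is a made-up index
    "hardwarerev": "A1",
    "serialnum": "WW5062F",
    "manufacturename": "FINISAR CORP",
    "modelname": "FCBN410QD3C02",
})
print(sample["1.3.6.1.2.1.47.1.1.1.1.11.29000"])  # WW5062F
```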


Another table, entPhySensorTable, which is defined in [Entity Sensor MIB(RFC3433)](https://tools.ietf.org/html/rfc3433), needs to be newly added.

| OID | SNMP counter | Where to get the info in SONiC | Example |
| --- | --- | --- | --- |
| 1.3.6.1.2.1.99.1.1 | entPhySensorTable | | |
| 1.3.6.1.2.1.99.1.1.1 | entPhySensorEntry | | |
| 1.3.6.1.2.1.99.1.1.1.1.index | entPhySensorType | In CLI, e.g. RX1Power: -0.97dBm | 6 |
| 1.3.6.1.2.1.99.1.1.1.2.index | entPhySensorScale | Same as above | 8 |
| 1.3.6.1.2.1.99.1.1.1.3.index | entPhySensorPrecision | Same as above | 4 |
| 1.3.6.1.2.1.99.1.1.1.4.index | entPhySensorValue | Same as above | 7998 |
| 1.3.6.1.2.1.47.1.1.1.1.2.index | entPhysicalDescr | Show interfaces alias | DOM RX Power Sensor for Ethernet29/1 |
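
The example column above can be reproduced with a small worked conversion. In RFC 3433, sensor type 6 is `watts` and scale 8 is `milli`, so with precision 4 the value is the power in milliwatts multiplied by 10^4. The sketch below assumes that reading of the example row; the function name `dbm_to_sensor_fields` is hypothetical.

```python
SENSOR_TYPE_WATTS = 6   # EntitySensorDataType: watts
SENSOR_SCALE_MILLI = 8  # EntitySensorDataScale: milli (10^-3)


def dbm_to_sensor_fields(dbm, precision=4):
    """Encode a dBm reading as the four entPhySensorTable columns."""
    milliwatts = 10.0 ** (dbm / 10.0)  # inverse of dBm = 10 * log10(mW)
    return {
        "entPhySensorType": SENSOR_TYPE_WATTS,
        "entPhySensorScale": SENSOR_SCALE_MILLI,
        "entPhySensorPrecision": precision,
        "entPhySensorValue": int(round(milliwatts * 10 ** precision)),
    }


print(dbm_to_sensor_fields(-0.97)["entPhySensorValue"])  # 7998
```

An RX power of -0.97 dBm is about 0.7998 mW, which matches the 7998 shown in the table.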


More detailed information about the new table and new OIDs is described in [Sensor and Transceiver Info Monitoring Requirement](https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD#transceiver-requirements-entity-mib).

### 2.2 New connection to STATE_DB ###

To get the transceiver and DOM sensor status, the SNMP agent needs to connect to STATE\_DB and fetch information from the transceiver tables (`TRANSCEIVER_INFO` and `TRANSCEIVER_DOM_SENSOR`), which will be updated by Xcvrd when there is a status change.
Contributor

Could you please add more details about how SNMP agent should connect to DB? Could it subscribe to the change of Transceiver tables?

Collaborator Author

will add.



## 3. Open Questions ##

1. The DOM sensor polling period needs to be finalized after collecting enough data on various platforms and after later tests based on the new EEPROM reading API.

