Transceiver monitoring HLD #202

Merged
merged 15 commits into from
Jul 6, 2018
140 changes: 140 additions & 0 deletions doc/transceiver-monitor-hld.md
@@ -0,0 +1,140 @@
# Transceiver and Sensor Monitoring HLD #

### Rev 0.1 ###

### Revision
| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Liu Kebo | Initial version |

## About This Manual ##

This document is intended to provide general information about the Transceiver and Sensor Monitoring implementation.
The requirement is described in [Sensor and Transceiver Info Monitoring Requirement.](https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD)


## 1. Xcvrd design ##

A new daemon, Xcvrd, in the platform monitor container fetches the transceiver and DOM sensor information from the EEPROM and updates the state DB with it.

For the transceiver itself, the type, serial number, hardware version, etc. do not change after plug-in, so the transceiver information only needs to be updated when a transceiver plug in/out event occurs.

The transceiver DOM sensor information (temperature, power, voltage, etc.) can change frequently and needs to be updated periodically. For now the period is provisionally set to 120s (see open question 1); it needs to be adjusted according to later tests on all vendors' platforms.

When the transceiver or sensor status changes, Xcvrd writes the new status to the state DB. To store this information, new tables will be added to STATE_DB.

### 1.1 State DB Schema ###

A new transceiver info table and a new transceiver DOM sensor table will be added to the state DB to store the transceiver and DOM sensor information.

#### 1.1.1 Transceiver info Table ####

    ; Defines Transceiver information for a port
    key             = TRANSCEIVER_INFO|ifname   ; transceiver information for the port
    ; field         = value
    type            = 1*255VCHAR                ; type of sfp
    hardwarerev     = 1*255VCHAR                ; hardware version of sfp
    serialnum       = 1*255VCHAR                ; serial number of the sfp
    manufacturename = 1*255VCHAR                ; sfp vendor name
    modelname       = 1*255VCHAR                ; sfp model name
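
As an illustration of the schema above, the following minimal sketch builds one TRANSCEIVER_INFO entry as a key plus a field map. The helper name `build_transceiver_info_entry` and the sample `type` value are assumptions made for illustration; the real Xcvrd would write the result through its STATE_DB client rather than return a tuple.

```python
# Hypothetical helper, not part of Xcvrd: builds a TRANSCEIVER_INFO entry
# that follows the schema in section 1.1.1.
INFO_FIELDS = ("type", "hardwarerev", "serialnum", "manufacturename", "modelname")


def build_transceiver_info_entry(ifname, info):
    """Return (key, fields) for one port, enforcing the 1*255VCHAR limit."""
    fields = {}
    for name in INFO_FIELDS:
        value = str(info[name])
        if not 1 <= len(value) <= 255:
            raise ValueError("field %s must be 1 to 255 characters" % name)
        fields[name] = value
    return "TRANSCEIVER_INFO|%s" % ifname, fields


# Sample values; the "type" string is an assumed example.
key, fields = build_transceiver_info_entry("Ethernet29", {
    "type": "QSFP+",
    "hardwarerev": "A1",
    "serialnum": "WW5062F",
    "manufacturename": "FINISAR CORP",
    "modelname": "FCBN410QD3C02",
})
```

Validating the field lengths before writing keeps malformed EEPROM content out of the database, at the cost of dropping the whole entry when one field is empty.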

#### 1.1.2 Transceiver DOM sensor Table ####

    ; Defines Transceiver DOM sensor information for a port
    key         = TRANSCEIVER_DOM_SENSOR|ifname ; DOM sensor information for the port
    temperature = FLOAT                         ; temperature value in Celsius
    voltage     = FLOAT                         ; voltage value
    rx1power    = FLOAT                         ; rx1 power in dBm
    rx2power    = FLOAT                         ; rx2 power in dBm
    rx3power    = FLOAT                         ; rx3 power in dBm
    rx4power    = FLOAT                         ; rx4 power in dBm
    tx1bias     = FLOAT                         ; tx1 bias in mA
    tx2bias     = FLOAT                         ; tx2 bias in mA
    tx3bias     = FLOAT                         ; tx3 bias in mA
    tx4bias     = FLOAT                         ; tx4 bias in mA

Is it possible to add TX power? Or TX bias is enough?

Collaborator Author

@yongcanwang00 this is defined in the requirement doc, I would like to ask the requirement author @hui-ma to comment on this.

Contributor

TX bias is from the output of the "show interfaces transceiver eeprom" command. It is also in the SNMP output of some other vendors. I think it can be translated into TX power; I need to look for the equation. Do you prefer TX power in the output directly?

    ChannelMonitorValues:
            RX1Power: -1.1936dBm
            RX2Power: -1.1793dBm
            RX3Power: -0.9388dBm
            RX4Power: -1.0729dBm
            TX1Bias: 4.0140mA
            TX2Bias: 4.0140mA
            TX3Bias: 4.0140mA
            TX4Bias: 4.0140mA
    ModuleMonitorValues :
            Temperature : 1.1111C
            Vcc : 0.0000Volts

Yes, TX power is useful. It's helpful to have them both.


### 1.2 Access eeprom from platform container ###

Transceiver EEPROM information can be accessed by reading sysfs files (e.g. /sys/bus/i2c/devices/2-0048/hwmon/hwmon4/qsfp9_eeprom). Different vendors may place these files under different folders; these folders need to be mounted into the platform container so that Xcvrd can access them.

A potential enhancement to EEPROM reading is to read only the needed parameters instead of all of them. This can be achieved by adding a new API to `SfpUtilBase` that reads the desired bytes starting from a given offset and parses them into a readable format.

To simplify the implementation and reduce the time spent reading, the `SfpUtilBase` class needs to be enhanced to provide functions that return `eeprom_if_dict` and `eeprom_dom_dict` separately. The values of interest are defined in sections 1.1.1 and 1.1.2; they can be picked out of the EEPROM by calling the new API above with the proper offset and number of bytes.
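
The offset-based read described above could look roughly like the sketch below. The function names `read_eeprom_bytes` and `parse_ascii_field` are hypothetical and are not existing `SfpUtilBase` methods, and the offset and length used in the demonstration are illustrative values, not taken from this design. The demonstration runs against a fake EEPROM image in a temporary file so that it does not depend on any platform.

```python
import os
import tempfile


def read_eeprom_bytes(eeprom_path, offset, num_bytes):
    """Read num_bytes starting at offset from an EEPROM sysfs file."""
    with open(eeprom_path, "rb") as eeprom:
        eeprom.seek(offset)
        raw = eeprom.read(num_bytes)
    if len(raw) != num_bytes:
        raise IOError("short read from %s at offset %d" % (eeprom_path, offset))
    return raw


def parse_ascii_field(raw):
    """Decode a space-padded ASCII field into a readable string."""
    return raw.decode("ascii", errors="replace").strip()


# Demonstration against a fake 256-byte EEPROM image in a temporary file.
# Offset 148 and length 16 are illustrative values only.
image = bytearray(256)
image[148:164] = b"FINISAR CORP    "
with tempfile.NamedTemporaryFile(delete=False) as fake_eeprom:
    fake_eeprom.write(image)
try:
    vendor = parse_ascii_field(read_eeprom_bytes(fake_eeprom.name, 148, 16))
finally:
    os.remove(fake_eeprom.name)
```

Keeping the raw read separate from the parsing step lets the same read helper serve both the transceiver info fields and the DOM sensor fields, which need different decoding.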


### 1.3 Transceiver plug in/out event ###

Xcvrd needs to be triggered by transceiver plug in/out events to refresh the transceiver info table.

Transceiver plug in/out status can be derived from the content of a sysfs file such as `/sys/bus/i2c/devices/2-0048/hwmon/hwmon7/qsfp10_status`: the content "good" means the SFP is present, and a change to "not_connected" means the SFP has been plugged out.

To monitor the file for changes, the Python library [inotify](https://pypi.org/project/inotify/) can be used; it raises a notification when the target file changes. Below is a sample of how to use the inotify library to monitor file changes:

Some details are missing here. Who will generate the events? Where will it run? Is it part of Xcvrd or a separate daemon? If it is not part of Xcvrd, how will the event be passed?

Contributor

I agree with Andriy. This is the most tricky part. Need more details.
How should we get the list of paths to watch for different vendors? What are the "good" and "not_connected" values for different vendors? How do we retrieve them in the platform monitor container?

Contributor

Sorry for the delay. I think I sent the comment back. I forget to hit the review button.

Collaborator Author

I think each vendor has to implement their own API to fetch the list of paths to watch, and also an API to judge the status of the transceiver from the file content. I'll define some APIs here. Maybe the current CLI that shows SFP presence is a good example.

Contributor

Thanks! Sounds good to me.


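    import logging

    import inotify.adapters

    _LOGGER = logging.getLogger(__name__)

    # Status file from the text above; the real path is vendor specific.
    # str paths as in current inotify releases (older releases took bytes).
    STATUS_PATH = '/sys/bus/i2c/devices/2-0048/hwmon/hwmon7/qsfp10_status'

    i = inotify.adapters.Inotify()
    i.add_watch(STATUS_PATH)

    try:
        # event_gen() yields None when no event is pending.
        for event in i.event_gen():
            if event is not None:
                (header, type_names, watch_path, filename) = event
                _LOGGER.info("WD=(%d) MASK=(%d) COOKIE=(%d) LEN=(%d) MASK->NAMES=%s "
                             "WATCH-PATH=[%s] FILENAME=[%s]",
                             header.wd, header.mask, header.cookie, header.len, type_names,
                             watch_path, filename)
    finally:
        # Remove the same watch that was added above.
        i.remove_watch(STATUS_PATH)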

I understand this is just an example but looks like remove_watch does not match the add_watch
also /bsp/ is not available in pmon (it will run in pmon, right?) so it is better to use /sys/bus...

Collaborator Author

yes, we should mount /sys/bus to pmon, will revise the example.
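
The per-vendor API discussed in this thread could take a shape like the following sketch. All class and method names here (`XcvrPresenceApi`, `get_status_paths`, `is_present`) are hypothetical and only illustrate the idea; the example vendor class reuses the sysfs path pattern and the "good" / "not_connected" strings quoted in section 1.3, which other vendors are expected to replace.

```python
class XcvrPresenceApi(object):
    """Hypothetical per-vendor interface used by Xcvrd for plug in/out events."""

    def get_status_paths(self):
        """Return a dict mapping port number to the status file to watch."""
        raise NotImplementedError

    def is_present(self, status_content):
        """Return True if the status file content means a transceiver is present."""
        raise NotImplementedError


class ExampleVendorPresenceApi(XcvrPresenceApi):
    """Example implementation based on the sysfs layout shown in section 1.3."""

    STATUS_PATH_FORMAT = "/sys/bus/i2c/devices/2-0048/hwmon/hwmon7/qsfp%d_status"

    def __init__(self, ports):
        self._ports = list(ports)

    def get_status_paths(self):
        return {port: self.STATUS_PATH_FORMAT % port for port in self._ports}

    def is_present(self, status_content):
        # "good" means present; anything else, e.g. "not_connected", means absent.
        return status_content.strip() == "good"


api = ExampleVendorPresenceApi([9, 10])
paths = api.get_status_paths()
```

With this split, the watcher only needs the path list and the presence predicate, so vendor differences stay out of the Xcvrd event loop.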



### 1.4 Xcvrd daemon flow ###

Xcvrd retrieves the transceiver information when triggered by plug in/out events, while the DOM sensor information is refreshed periodically. Both kinds of information can be read via sfputil.

![](https://github.com/keboliu/SONiC/blob/xcvrd-hld/images/transceiver_monitoring_hld/xcvrd_flow.svg)
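
The flow can be approximated by the skeleton below. This is a sketch under stated assumptions, not the actual daemon: the class name `XcvrdLoop` is hypothetical, the reader and writer callables are injected stand-ins for sfputil and STATE_DB access, and deleting the info entry on plug out is an assumption that the design does not spell out.

```python
import time

DOM_POLL_INTERVAL_SECS = 120  # provisional value, see open question 1


class XcvrdLoop(object):
    """Hypothetical skeleton of the event-driven and periodic update paths."""

    def __init__(self, read_info, read_dom, write_table, clock=time.time):
        self._read_info = read_info      # stand-in for sfputil transceiver info read
        self._read_dom = read_dom        # stand-in for sfputil DOM sensor read
        self._write = write_table        # stand-in for a STATE_DB write
        self._clock = clock
        self._next_dom_poll = clock()

    def handle_plug_event(self, ifname, present):
        """Refresh TRANSCEIVER_INFO only when a plug in/out event arrives."""
        if present:
            self._write("TRANSCEIVER_INFO", ifname, self._read_info(ifname))
        else:
            # Assumption: None tells the writer to delete the entry.
            self._write("TRANSCEIVER_INFO", ifname, None)

    def poll_dom_if_due(self, present_ports):
        """Refresh TRANSCEIVER_DOM_SENSOR once per polling interval."""
        now = self._clock()
        if now < self._next_dom_poll:
            return False
        for ifname in present_ports:
            self._write("TRANSCEIVER_DOM_SENSOR", ifname, self._read_dom(ifname))
        self._next_dom_poll = now + DOM_POLL_INTERVAL_SECS
        return True
```

Injecting the clock and the read and write callables keeps the timing logic testable without hardware or a running database.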

## 2. SNMP Agent Change ##

### 2.1 MIB tables extension ###

The MIB table entPhysicalTable from [Entity MIB(RFC2737)](https://tools.ietf.org/html/rfc2737) needs to be extended to support new OIDs.

| OID | SNMP counter | Where to get the info in SONiC | Example |
| --- | --- | --- | --- |
| 1.3.6.1.2.1.47.1.1.1 | entPhysicalTable | | |
| 1.3.6.1.2.1.47.1.1.1.1 | entPhysicalEntry | | |
| 1.3.6.1.2.1.47.1.1.1.1.2.ifindex | entPhysicalDescr | Show interfaces alias | Xcvr for Ethernet29 |

Contributor

The ifindex should be replaced with index, which is Ifindex * 1000, as changed in requirement

| 1.3.6.1.2.1.47.1.1.1.1.7.ifindex | entPhysicalName | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.8.ifindex | entPhysicalHardwareRev | Vendor Rev in CLI or sfputil | A1 |
| 1.3.6.1.2.1.47.1.1.1.1.9.ifindex | entPhysicalFirmwareRev | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.10.ifindex | entPhysicalSoftwareRev | Skipped | |
| 1.3.6.1.2.1.47.1.1.1.1.11.ifindex | entPhysicalSerialNum | Vendor SN in CLI or sfputil | WW5062F |
| 1.3.6.1.2.1.47.1.1.1.1.12.ifindex | entPhysicalMfgName | Vendor Name in CLI or sfputil | FINISAR CORP |
| 1.3.6.1.2.1.47.1.1.1.1.13.ifindex | entPhysicalModelName | Vendor PN in CLI or sfputil | FCBN410QD3C02 |
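
As a rough illustration of how the TRANSCEIVER_INFO fields from section 1.1.1 line up with the table above, the sketch below builds the OID strings for one port. The function name `transceiver_oids` is hypothetical, the sub-identifier follows the review comment above (index = ifindex * 1000), and the ifindex value 29 in the usage is an assumed example.

```python
ENT_PHYSICAL_ENTRY = "1.3.6.1.2.1.47.1.1.1.1"

# TRANSCEIVER_INFO field name -> entPhysicalEntry column number from the table.
FIELD_TO_COLUMN = {
    "hardwarerev": 8,
    "serialnum": 11,
    "manufacturename": 12,
    "modelname": 13,
}


def transceiver_oids(ifindex, info):
    """Return a dict mapping each populated OID to its value for one port."""
    index = ifindex * 1000  # per the review comment; an assumption in this sketch
    return {
        "%s.%d.%d" % (ENT_PHYSICAL_ENTRY, column, index): info[field]
        for field, column in FIELD_TO_COLUMN.items()
    }


oids = transceiver_oids(29, {
    "hardwarerev": "A1",
    "serialnum": "WW5062F",
    "manufacturename": "FINISAR CORP",
    "modelname": "FCBN410QD3C02",
})
```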


Another table, entPhySensorTable, which is defined in [Entity Sensor MIB(RFC3433)](https://tools.ietf.org/html/rfc3433), needs to be added.

| OID | SNMP counter | Where to get the info in SONiC | Example |
| --- | --- | --- | --- |
| 1.3.6.1.2.1.99.1.1 | entPhySensorTable | | |
| 1.3.6.1.2.1.99.1.1.1 | entPhySensorEntry | | |
| 1.3.6.1.2.1.99.1.1.1.1.index | entPhySensorType | In CLI, e.g. RX1Power: -0.97dBm | 6 |
| 1.3.6.1.2.1.99.1.1.1.2.index | entPhySensorScale | Same as above | 8 |
| 1.3.6.1.2.1.99.1.1.1.3.index | entPhySensorPrecision | Same as above | 4 |
| 1.3.6.1.2.1.99.1.1.1.4.index | entPhySensorValue | Same as above | 7998 |
| 1.3.6.1.2.1.47.1.1.1.1.2.index | entPhysicalDescr | Show interfaces alias | DOM RX Power Sensor for Ethernet29/1 |
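
The example row values can be reproduced with a short calculation. In RFC 3433, sensor type 6 is watts and scale 8 is milli, so with precision 4 the value is the power in milliwatts multiplied by 10^4. The sketch below converts the CLI reading of -0.97dBm accordingly; the function name `dbm_to_sensor_value` is hypothetical and the rounding rule is an assumption.

```python
SENSOR_TYPE_WATTS = 6    # EntitySensorDataType: watts
SENSOR_SCALE_MILLI = 8   # EntitySensorDataScale: milli
SENSOR_PRECISION = 4     # number of decimal places carried in the value


def dbm_to_sensor_value(power_dbm, precision=SENSOR_PRECISION):
    """Convert a power reading in dBm to an entPhySensorValue in milliwatts."""
    milliwatts = 10 ** (power_dbm / 10.0)
    return int(round(milliwatts * 10 ** precision))


rx1_value = dbm_to_sensor_value(-0.97)  # about 0.7998 mW, reported as 7998
```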


More detailed information about the new table and new OIDs is described in [Sensor and Transceiver Info Monitoring Requirement](https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD#transceiver-requirements-entity-mib).

### 2.2 New connection to STATE_DB ###

To get the transceiver and DOM sensor status, the SNMP agent needs to connect to STATE\_DB and fetch information from the TRANSCEIVER\_INFO and TRANSCEIVER\_DOM\_SENSOR tables, which are updated by Xcvrd when there is a status change.

Contributor

Could you please add more details about how SNMP agent should connect to DB? Could it subscribe to the change of Transceiver tables?

Collaborator Author

will add.



## 3. Open Questions ##

1. The DOM sensor polling period needs to be finalized after collecting enough data on various platforms and after later tests based on the new EEPROM reading API.

