Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[link-training] initial support #925

Merged
merged 8 commits into from
Apr 27, 2022
Merged

Conversation

ds952811
Copy link
Contributor

@ds952811 ds952811 commented Jan 13, 2022

Why I did this?

In the case of DAC, static pre-calibrated pre-emphasis is rarely available on SONIC, as most of the ODM are expecting this to be done dynamically at runtime via link-training, hence we'll need this feature to improve the link quality

Related PRs:

Repo PR title State
sonic-buildimage sonic-port.yang: add link training support
sonic-sairedis vslib: add support for read-only port capabilities
sonic-swss portsorch: initial support for link-training
sonic-swss-common selectabletimer: add mutex to start() and stop()
sonic-utilities sonic-utils: initial support for link-training

Signed-off-by: Dante Su dante.su@broadcom.com

@ds952811 ds952811 force-pushed the link-training branch 5 times, most recently from 0c4b84a to a029855 Compare January 30, 2022 02:39

#### PMON xcvrd Considerations

Given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be the criteria for determining if a module supports link training?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LT is not going to work if there is a retimer in-between. For examples:

  1. DR transceiver
  2. LR transceiver
  3. chip-to-module transceiver

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, perhaps I wasn't clear. Is there a list of specific module types you're planning to have that explicitly does (or doesn't) support LT somewhere? If so, how is it defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR modules only, the details will later be updated to the HLD

Signed-off-by: Dante Su <dante.su@broadcom.com>

Vendor-specific SAI implementation is not in the scope of this document, but there are some common requirements for SAI:

1. SAI implementation must return error code if any of the above attributes is not supported, swss and syncd must not crash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that SAI supports sai_query_attribute_capability and use it before calling link training set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, in the current implementation (i.e. PR for sonic-utilities) it resues the xcvr_capabilities in APPL_DB for this, and we could also duplicate this info to STATE_DB, and the xcvr_capabilities will later be reused in AutoNeg for the AN and speed capability, and the speed capability may changed per port breakout operation, hence it's better to create a new 'TABLE TRANSCEIVER_CAPS' for these dynamic values, if we want to make a copy in STATE_DB. Please feel free to share your thoughts, thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ds952811 I believe the SAI capabilities should determine whether xcvrd should publish the capabilities. Please ensure that xcvrd publishes the capabilities only when orchagent has updated STATE DB from SAI that link training is supported by ASIC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, and based on the conclusion of the earlier meeting, the changes against xcvrd will be dropped, and the config command should directly reference the TRANSCEIVER_INFO of STATE_DB for the ability checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.

Yep, maybe we could create a new table for the capabilities, will add this later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.

Yep, maybe we could create a new table for the capabilities, will add this later

Unfortunately, there is a conflict on this from the comment in AutoNeg PR, which is for the timing constraint in the warm-reboot scenario. Hence the capability will only be initialized when there is a AN or LT config update, that being said the capabilities list will not be initiated during the system startup.

i.e. We're not going to add this into STATE_DB for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


| Mode | Description |
|:----:|:---------------------------------------------------------------:|
| auto | Enable link-training if applicable to the transceiver attached |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the auto mode as discussed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do, thanks for the reminder

key = PORT_TABLE:port_name ; configuration of the port
; field = value
...
link_training = STRING ; operational link training config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the variable to reflect if it is admin or oper status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link_training is the admin, and all the operational status will be integrated as enum like string into link_training_status

fvp = swsscommon.FieldValuePairs([
('type', 'link_training')
])
port_np.send('port_state_refresh', port, fvp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the sequence diagram. While portsorch is the consumer of this notification who is going to be the producer for this PORT_CTRL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the further integration with CLIs (REST, KLISH...etc.) and avoid possible confusions, it's better to periodically poll the status when LT is enabled, hence this will be updated to a periodical polling


When **link_training=auto**, swss#orchagent should request syncd to enable port link training only if **LT** is specified in **xcvr_capabilities**.

### Warmboot Design Impact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add considerations for fast reboot. Currently pmon starts after syncd and with link training in place how will this be impacted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm right about it, the fast-reboot is a case that kernel will be restarted from loading a new ELF file, without BIOS intervention, and this will be a COLD-REBOOT scenario, from the switch silicon, swss and pmon perspective, hence none extra care will be necessary.
In the case of warm-reboot, the APPL_DB and STATE_DB will be restored, and the switch silicon hardware state will not be changed and seamless resumed, all we need is to make sure static pre-emphasis request will not be applied in this scenario. Of course, a system test could provide more details, and I'll update the HLD if any issues are observed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I meant in case of fast-boot, the requirement is to have downtime less than 30 seconds. In normal scenario this wouldn't be a problem as syncd can come up and initialize the ports whereas in this case the xcvrd should publish the capabilities which will be required for link to be up. So the link downtime may be greater than 30 seconds. Please verify this scenario

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, and based on the conclusion of the earlier meeting, the changes against xcvrd will be dropped, and the config command should directly reference the TRANSCEIVER_INFO of STATE_DB for the ability checks.

String value, the operational port link training failure status. "none" indicates no error is detected, otherwise any of the link training failure status defined in SAI. For example: "none", "lock", "snr" and "timeout".
- link_training_rxstatus:
String value, the operational port link training rx status. any of the link training rx status defined in SAI. For example: "trained", "not-trained".
- xcvr_capabilities:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are all the list of values that can be in xcr_capabilities apart from LT. Please detail it in the HLD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, only 'LT' will be present, and yes, I'll update the HLD for this, thanks for the comment


#### PMON xcvrd Considerations

Given that port link training shall not be enabled on certain transceivers. For example, a chip-to-module transceiver. It's better to have pmon#xcvrd provide a hint to the swss#orchagent for the advanced link training controls. Hence, **xcvr_capabilities** is introduced into APPL_DB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can capture the types of transceivers which you plan to incorporate in code that can support LT vs that cannot it would be great.
What is going to be the action when user enables LT on a transceiver that doesn't support? If we emit error log, capture it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The xcvrd will inspect the 'specification_compliance' and set 'LT' to xcvr_capability if it's a CR module (SR may also works, but some of the transceivers may inappropriately claims itself a SR module, while it's a CuSFP with RJ45.
  2. In the swss#orchagent, a error/warning will be logged for this, but the operation will still proceed
  3. In the config command (sonic-utilitie), the users will be blocked by an error indicate this is an unsupported module, and user could always use '-f' to force to run the operation


Arguments:
interface_name: name of the interface to be configured. e.g: Ethernet0
mode: link training mode, can be either "auto", "on" or "off"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the mode is auto, what is the value of SAI_PORT_ATTR_LINK_TRAINING_ENABLE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the discussion in the last meeting, the auto mode will be dropped


Vendor-specific SAI implementation is not in the scope of this document, but there are some common requirements for SAI:

1. SAI implementation must return error code if any of the above attributes is not supported, swss and syncd must not crash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Maybe we could query the link training SAI capability and put it into STATE DB, so that CLI can use it and tell user if the platform does not support it.

To minimize system overhead, instead of periodically fetching the operational link training status, the corresponding fields in the APPL_DB will only be updated in the following events

- Per-Port link status changes
In this case, only the **link_training_status** will be updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it get link training status? By which SAI attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation state could be fetch by the following SAI attributes

  • SAI_PORT_ATTR_LINK_TRAINING_FAILURE_STATUS
  • SAI_PORT_ATTR_LINK_TRAINING_RX_STATUS
    And 'link_training_status' will be updated to display only one of the best appropriate description for the current state.
    e.g. 'not_trained' will only be presented when none of failure is reported by SAI_PORT_ATTR_LINK_TRAINING_FAILURE_STATUS, and the LT hardware engine failed to get a fine-tuned pre-emphasis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HLD will later be updated to have this information, it should be done later today

Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
@dgsudharsan
Copy link
Contributor

@ds952811 What will be the behavior if AN and LT are turned on together? Are there any CLI level blocking planned for it?

@prgeor prgeor self-assigned this Mar 2, 2022
@ds952811
Copy link
Contributor Author

ds952811 commented Mar 2, 2022

@ds952811 What will be the behavior if AN and LT are turned on together? Are there any CLI level blocking planned for it?

  1. From the Intel FPGA spec., the AN and LT could be enabled/disabled separately
    https://www.intel.com/content/www/us/en/docs/programmable/683100/21-1-19-2-0/auto-negotiation-and-link-training.html
    https://www.intel.com/content/www/us/en/docs/programmable/683100/21-1-19-2-0/auto-negotiation-and-link-training-registers.html

  2. In the current Broadcom SDK-6.5.21 implementation, the LT is always enabled when AN is activated and it can't be disabled.

Conclusion:
None of CLI is necessary to block a user from such an AN/LT combination, and swss#orchagent should log an error if the 'link_training=off' + 'autoneg=on' while it's invalid to the vendor-specific SAI implementation. The HLD will later be updated.

Signed-off-by: Dante Su <dante.su@broadcom.com>
- empty upon success
```

For deployment considerations, when the link-training is enabled on a transceiver without LT capabilities, this config command should report an error and get aborted, unless '-f' option is specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have removed pmon related sections, I believe this line becomes irrelevant. Can you please remove?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLD updated

if autoneg changed:
setPortAutoNeg(port, autoneg)

if link_training changed and "LT" in xcvr_capabilities:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relevant since pmon sections are removed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLD updated

admin@sonic:~$ show interfaces link-training status
Interface LT Oper LT Admin Oper Admin Error Description
----------- ----------- ---------- ------ ------- -----------------------
Ethernet0 trained on up up unsupported transceiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe unsupported transceiver will not be relevant since we removed pmon related sections.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLD updated

Arguments:
interface_name: [mandatory] name of the interface to be configured. e.g: Ethernet0
mode: [mandatory] link training mode, can be either "on" or "off"
-f: [optional] forcing link-training configuration on an unsupported transceiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the -f option since we don't perform transceiver checking?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLD updated

Signed-off-by: Dante Su <dante.su@broadcom.com>
dgsudharsan
dgsudharsan previously approved these changes Mar 10, 2022

#### System Test cases

Will leverage sonic-mgmt to test this feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more specific?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there be new test cases added for this new feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, and the HLD is now updated, please check it out and see if it works

Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
Signed-off-by: Dante Su <dante.su@broadcom.com>
@Junchao-Mellanox
Copy link
Contributor

Will this comment be considered? #925 (comment) ?

@lguohan lguohan merged commit c64b087 into sonic-net:master Apr 27, 2022
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request May 15, 2022
This is part of HLD  sonic-net/SONiC#925

#### Why I did it
Add link-training support

#### How I did it
Update SONiC YANG for port link-training support

#### Description for the changelog
Add "link_training" to sonic-port.yang

#### Link to config_db schema for YANG module changes

https://github.com/sonic-net/SONiC/wiki/Configuration#port
qiluo-msft pushed a commit to sonic-net/sonic-utilities that referenced this pull request May 18, 2022
HLD: sonic-net/SONiC#925

#### What I did
Add CLI support for link-trainig

#### How I did it
1. portconfig: initial support for link-training config
2. config/main.py: initial support for link-training config
3. intfutil: initial support for link-training show command
4. show/interfaces/__init__.py: initial support for link-training show command

#### How to verify it
1. Manual test
2. Ran the Unit-tests to the corresponding changes

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)

```
admin@sonic:~$ sudo config interface link-training Ethernet0 on
admin@sonic:~$ sudo config interface link-training Ethernet8 on
admin@sonic:~$ sudo config interface link-training Ethernet16 off
admin@sonic:~$ sudo config interface link-training Ethernet24 on
admin@sonic:~$ sudo config interface link-training Ethernet32 off
admin@sonic:~$ show interfaces link-training status
  Interface      LT Oper    LT Admin    Oper    Admin
-----------  -----------  ----------  ------  -------
  Ethernet0      trained          on      up       up
  Ethernet8      trained          on      up       up
 Ethernet16          off         off    down       up
 Ethernet24  not trained          on    down       up
 Ethernet32          off         off    down       up
 Ethernet40          off           -    down       up
 Ethernet48          off           -    down       up
 Ethernet56          off           -    down       up
 Ethernet64          off           -    down       up
 Ethernet72          off           -    down       up
 Ethernet80          off           -    down       up
 Ethernet88          off           -    down       up
 Ethernet96          off           -    down       up
Ethernet104          off           -    down       up
Ethernet112          off           -    down       up
Ethernet120          off           -    down       up
Ethernet128          off           -    down       up
Ethernet136          off           -    down       up
Ethernet144          off           -    down       up
Ethernet152          off           -    down       up
Ethernet160          off           -    down       up
Ethernet168          off           -    down       up
Ethernet176          off           -    down       up
Ethernet184          off           -    down       up
Ethernet192          off           -    down       up
Ethernet200          off           -    down       up
Ethernet208          off           -    down       up
Ethernet216          off           -    down       up
Ethernet224          off           -    down       up
Ethernet232          off           -    down       up
Ethernet240          off           -    down       up
Ethernet248          off           -    down       up
admin@sonic:~$
```
yxieca pushed a commit to sonic-net/sonic-utilities that referenced this pull request Aug 11, 2022
HLD: sonic-net/SONiC#925

#### What I did
Add CLI support for link-trainig

#### How I did it
1. portconfig: initial support for link-training config
2. config/main.py: initial support for link-training config
3. intfutil: initial support for link-training show command
4. show/interfaces/__init__.py: initial support for link-training show command

#### How to verify it
1. Manual test
2. Ran the Unit-tests to the corresponding changes

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)

```
admin@sonic:~$ sudo config interface link-training Ethernet0 on
admin@sonic:~$ sudo config interface link-training Ethernet8 on
admin@sonic:~$ sudo config interface link-training Ethernet16 off
admin@sonic:~$ sudo config interface link-training Ethernet24 on
admin@sonic:~$ sudo config interface link-training Ethernet32 off
admin@sonic:~$ show interfaces link-training status
  Interface      LT Oper    LT Admin    Oper    Admin
-----------  -----------  ----------  ------  -------
  Ethernet0      trained          on      up       up
  Ethernet8      trained          on      up       up
 Ethernet16          off         off    down       up
 Ethernet24  not trained          on    down       up
 Ethernet32          off         off    down       up
 Ethernet40          off           -    down       up
 Ethernet48          off           -    down       up
 Ethernet56          off           -    down       up
 Ethernet64          off           -    down       up
 Ethernet72          off           -    down       up
 Ethernet80          off           -    down       up
 Ethernet88          off           -    down       up
 Ethernet96          off           -    down       up
Ethernet104          off           -    down       up
Ethernet112          off           -    down       up
Ethernet120          off           -    down       up
Ethernet128          off           -    down       up
Ethernet136          off           -    down       up
Ethernet144          off           -    down       up
Ethernet152          off           -    down       up
Ethernet160          off           -    down       up
Ethernet168          off           -    down       up
Ethernet176          off           -    down       up
Ethernet184          off           -    down       up
Ethernet192          off           -    down       up
Ethernet200          off           -    down       up
Ethernet208          off           -    down       up
Ethernet216          off           -    down       up
Ethernet224          off           -    down       up
Ethernet232          off           -    down       up
Ethernet240          off           -    down       up
Ethernet248          off           -    down       up
admin@sonic:~$
```
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
HLD: sonic-net/SONiC#925

#### What I did
Add CLI support for link-trainig

#### How I did it
1. portconfig: initial support for link-training config
2. config/main.py: initial support for link-training config
3. intfutil: initial support for link-training show command
4. show/interfaces/__init__.py: initial support for link-training show command

#### How to verify it
1. Manual test
2. Ran the Unit-tests to the corresponding changes

#### Previous command output (if the output of a command-line utility has changed)

#### New command output (if the output of a command-line utility has changed)

```
admin@sonic:~$ sudo config interface link-training Ethernet0 on
admin@sonic:~$ sudo config interface link-training Ethernet8 on
admin@sonic:~$ sudo config interface link-training Ethernet16 off
admin@sonic:~$ sudo config interface link-training Ethernet24 on
admin@sonic:~$ sudo config interface link-training Ethernet32 off
admin@sonic:~$ show interfaces link-training status
  Interface      LT Oper    LT Admin    Oper    Admin
-----------  -----------  ----------  ------  -------
  Ethernet0      trained          on      up       up
  Ethernet8      trained          on      up       up
 Ethernet16          off         off    down       up
 Ethernet24  not trained          on    down       up
 Ethernet32          off         off    down       up
 Ethernet40          off           -    down       up
 Ethernet48          off           -    down       up
 Ethernet56          off           -    down       up
 Ethernet64          off           -    down       up
 Ethernet72          off           -    down       up
 Ethernet80          off           -    down       up
 Ethernet88          off           -    down       up
 Ethernet96          off           -    down       up
Ethernet104          off           -    down       up
Ethernet112          off           -    down       up
Ethernet120          off           -    down       up
Ethernet128          off           -    down       up
Ethernet136          off           -    down       up
Ethernet144          off           -    down       up
Ethernet152          off           -    down       up
Ethernet160          off           -    down       up
Ethernet168          off           -    down       up
Ethernet176          off           -    down       up
Ethernet184          off           -    down       up
Ethernet192          off           -    down       up
Ethernet200          off           -    down       up
Ethernet208          off           -    down       up
Ethernet216          off           -    down       up
Ethernet224          off           -    down       up
Ethernet232          off           -    down       up
Ethernet240          off           -    down       up
Ethernet248          off           -    down       up
admin@sonic:~$
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants