
[Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking #1781

Merged
4 commits merged into sonic-net:master on Jun 28, 2021

Conversation

stephenxs
Collaborator

@stephenxs stephenxs commented Jun 11, 2021

What I did
Bug fixes for buffer pool calculation and headroom checking on Mellanox platforms.

  • Test the number of lanes instead of the speed when determining whether special handling is required for a port.
    For speeds other than 400G, e.g. 100G, it's possible that some 100G ports have 8 lanes and others have 4 lanes,
    which means they cannot share the same buffer profile.
    A suffix _8lane is introduced to indicate this, e.g. pg_lossless_100000_5m_8lane_profile
  • Take the private headroom into account when calculating the buffer pool size
  • Take deviation into account when checking the headroom against the per-port limit to avoid the inaccurate result in a rare case
  • Use a hashtable to record the reference count of a profile in the lua plugin
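The lane-based profile naming above can be sketched as follows. This is an illustrative Python sketch, not the actual Lua plugin code; the function name and argument shapes are assumptions, only the `_8lane` suffix and the `pg_lossless_<speed>_<cable length>` pattern come from the PR description.

```python
def lossless_profile_name(speed, cable_length, lanes):
    """Build a lossless buffer profile name, appending '_8lane' for 8-lane ports.

    8-lane ports need doubled headroom, so they cannot share a profile with
    4-lane ports running at the same speed; the suffix keeps them distinct.
    """
    suffix = "_8lane" if lanes == 8 else ""
    return f"pg_lossless_{speed}_{cable_length}{suffix}_profile"

print(lossless_profile_name(100000, "5m", 8))  # pg_lossless_100000_5m_8lane_profile
print(lossless_profile_name(100000, "5m", 4))  # pg_lossless_100000_5m_profile
```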

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

How I verified it
Ran regression tests and tested manually

Details if related

  • Test the number of lanes instead of the speed when determining whether special handling (double headroom size) is required for a port.
    Originally this was determined by testing whether the port's speed is 400G, but that is not accurate: a user can configure an 8-lane port to 100G, and special handling is still required even though the port is not running at 400G.
    The variable names are updated accordingly: xxx_400g => xxx_8lanes
  • Take deviation into account when checking the headroom against the per-port limit to avoid an inaccurate result in a rare case.
    Some deviations can make the accumulative headroom slightly larger than the quantity calculated by the buffer manager, so we take them into account when checking the accumulative headroom against the limit.
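The deviation-tolerant limit check described above can be sketched as below. This is an illustrative Python sketch, not the actual Lua plugin; the limit and tolerance values are made-up examples, only the idea of allowing a small known deviation comes from the PR.

```python
def headroom_ok(accumulative_headroom, per_port_limit, tolerance):
    """Accept the configuration if the accumulative headroom stays within the
    per-port limit, allowing for a small known deviation between the plugin's
    calculation and the buffer manager's."""
    return accumulative_headroom <= per_port_limit + tolerance

PER_PORT_LIMIT = 450 * 1024  # hypothetical per-port headroom limit (bytes)
TOLERANCE = 4 * 1024         # hypothetical deviation allowance (bytes)

print(headroom_ok(452 * 1024, PER_PORT_LIMIT, TOLERANCE))  # True: within tolerance
print(headroom_ok(460 * 1024, PER_PORT_LIMIT, TOLERANCE))  # False: exceeds limit + tolerance
```

Without the tolerance term, a headroom that is only a few bytes over the calculated quantity would be rejected even though the hardware can accommodate it; that is the rare inaccurate case the fix targets.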

@stephenxs
Collaborator Author

stephenxs commented Jun 11, 2021

This PR contains fixes for 3 bugs and 1 enhancement. In theory we would open 1 PR per change, but the fixes all depend on each other, which makes them very difficult to split.

@liat-grozovik
Collaborator

@neethajohn can you please help to review?
@stephenxs to which branches should this PR be cherry-picked, and will it be a clean cherry-pick or a new PR?

@stephenxs
Collaborator Author

> @neethajohn can you please help to review?
> @stephenxs to which branches should this PR be cherry-picked, and will it be a clean cherry-pick or a new PR?

So far it can be cherry-picked cleanly.

@stephenxs stephenxs force-pushed the 8lane-instead-of-400g branch 2 times, most recently from 446d92a to da6a40d Compare June 19, 2021 14:35
- Take the number of lanes instead of the speed into account when determining whether a port has doubled pipeline latency
  For speeds other than 400G, e.g. 100G, it's possible that some 100G ports have 8 lanes and others have 4 lanes
  In this case, we need to add "_8lane" to the profile name to indicate whether the profile is for 8-lane ports or normal ports
  This is for the Mellanox platform only
- Take advantage of the "set" feature of Lua to represent the profile reference count, which also makes the code more maintainable
- Take deviation into account when checking the headroom against the limit
- Take private headroom into account when the shared headroom pool is enabled
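The reference-count tracking mentioned above (a Lua table used as a hash/"set") can be sketched in Python with a dict standing in for the Lua table. This is an illustrative sketch, not the plugin's actual code; the function names are assumptions.

```python
from collections import defaultdict

# Maps profile name -> number of priority groups currently referencing it,
# playing the role of the Lua table used as a hashtable in the plugin.
profile_ref_count = defaultdict(int)

def add_reference(profile):
    """Record one more user of the profile."""
    profile_ref_count[profile] += 1

def remove_reference(profile):
    """Drop one user; forget the profile entirely once unreferenced."""
    profile_ref_count[profile] -= 1
    if profile_ref_count[profile] <= 0:
        del profile_ref_count[profile]

add_reference("pg_lossless_100000_5m_8lane_profile")
add_reference("pg_lossless_100000_5m_8lane_profile")
remove_reference("pg_lossless_100000_5m_8lane_profile")
print(profile_ref_count["pg_lossless_100000_5m_8lane_profile"])  # 1
```

Keying by profile name makes lookup and cleanup O(1) and avoids scanning a list of referencing ports, which is the maintainability gain the commit message points to.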

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Contributor

@neethajohn neethajohn left a comment


Please add sonic-mgmt tests for this 8-lane profile

cfgmgr/buffer_headroom_mellanox.lua (review comment, outdated, resolved)
…lane port

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

> Please add sonic-mgmt tests for this 8-lane profile

Almost every test case includes logic to check whether the buffer profile in BUFFER_PG_TABLE is correct. On platforms with 8-lane ports, when the speed isn't 400G, the 8-lane profiles will be used and tested.
The test case PR: sonic-net/sonic-mgmt#3694

neethajohn
neethajohn previously approved these changes Jun 22, 2021
Collaborator

@liat-grozovik liat-grozovik left a comment


Can you read the number of lanes and make it more generic, not Spectrum3 only?

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

> Can you read the number of lanes and make it more generic, not Spectrum3 only?

Fixed by removing the dependency on the ASIC type.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

VS test failed due to an environment issue. Needs a rerun.

@liat-grozovik
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Collaborator Author

Failed in the dynamic port breakout (DPB) tests:

test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G1] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G2] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)0] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G3] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]0] PASSED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]1] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)1] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]2] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)1] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]3] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]1] PASSED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)2] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)2] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)3] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]2] PASSED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)3] FAILED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]3] PASSED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_with_vlan FAILED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_with_acl SKIPPED [ 70%]

2021-06-24T15:39:13.6879387Z         if not status:
2021-06-24T15:39:13.6879718Z             message = failure_message or (
2021-06-24T15:39:13.6883959Z                 f"Expected field/value pairs not found: expected={expected_fields}, "
2021-06-24T15:39:13.6884797Z                 f'received={result}, key="{key}", table="{table_name}"'
2021-06-24T15:39:13.6885135Z             )
2021-06-24T15:39:13.6885455Z >           assert not polling_config.strict, message
2021-06-24T15:39:13.6890298Z E           AssertionError: Expected field/value pairs not found: expected={'brkout_mode': '2x50G'}, received={'brkout_mode': '1x100G[40G]'}, key="Ethernet0", table="BREAKOUT_CFG"
2021-06-24T15:39:13.6890812Z 
2021-06-24T15:39:13.6891111Z dvslib/dvs_database.py:203: AssertionError

@liat-grozovik
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs stephenxs changed the title [Dynamic Buffer Calc][Mellanox] Bug fixes for the lua plugins for buffer pool calculation and headroom checking [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking Jun 28, 2021
@neethajohn neethajohn merged commit 6c88e47 into sonic-net:master Jun 28, 2021
@stephenxs stephenxs deleted the 8lane-instead-of-400g branch June 28, 2021 23:50
qiluo-msft pushed a commit that referenced this pull request Jun 29, 2021
…a plugins for buffer pool calculation and headroom checking (#1781)

liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jun 29, 2021
Advance submodule head for sonic-swss

3226163 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (sonic-net/sonic-swss#1786)
6c88e47 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
e86b900 [MPLS] sonic-swss changes for MPLS (sonic-net/sonic-swss#1686)
4c8e2b5 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
3602124 [VS test stability] Skip flaky test for DPB (sonic-net/sonic-swss#1807)
c37cc1c Support for in-band-mgmt via management VRF (sonic-net/sonic-swss#1726)
1e3a532 Fix config prompt question issue (sonic-net/sonic-swss#1799)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jun 29, 2021
Advance submodule head for sonic-swss on 202012

bb383be2 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
f949dfe9 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
def0a914 Fix config prompt question issue (sonic-net/sonic-swss#1799)
21f97506 [ci]: Merge azure pipelines from master to 202012 branch (sonic-net/sonic-swss#1764)
a83a2a42 [vstest]: add dvs_route fixture
849bdf9c [Mux] Add support for mux metrics to State DB (sonic-net/sonic-swss#1757)
386de717 [qosorch] Dot1p map list initialization fix (sonic-net/sonic-swss#1746)
f99abdca [sub intf] Port object reference count update (sonic-net/sonic-swss#1712)
4a00042d [vstest/nhg]: use dvs_route fixture to make test_nhg more robust

Signed-off-by: Stephen Sun <stephens@nvidia.com>
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
…a plugins for buffer pool calculation and headroom checking (sonic-net#1781)

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
#### What I did
To support loading configuration data in yang schema, the `config load` command is enhanced with the options below:
- `-t` `--file-format` to specify the file format. The config file can be in `yang` or `config_db` format.
- `-r` to restart the services. Currently this option is supported for the yang file format only.
#### How I did it
Add the above mentioned cli options.
Add Unit tests

#### How to verify it
Verify the command on VS.
```
admin@vlab-01:~$ sudo config load -y -c yang -r /etc/sonic/yang_cfg.json
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -H -Y /etc/sonic/yang_cfg.json -j /etc/sonic/init_cfg.json --write-to-db
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Please note setting loaded from minigraph will be lost after system reboot.To preserve setting, run `config save`.
admin@vlab-01:~$ sudo config load -y -c yang  /etc/sonic/yang_cfg.json
Running command: /usr/local/bin/sonic-cfggen -H -Y /etc/sonic/yang_cfg.json -j /etc/sonic/init_cfg.json --write-to-db
Please note setting loaded from minigraph will be lost after system reboot.To preserve setting, run `config save`.
admin@vlab-01:~$ sudo config load
Load config in config_db format from the default config file(s) ? [y/N]: y
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/config_db.json --write-to-db
admin@vlab-01:~$ sudo config load -y
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/config_db.json --write-to-db
```
4 participants