
[Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking #1781

Merged
4 commits merged into sonic-net:master on Jun 28, 2021

Conversation

stephenxs
Collaborator

@stephenxs stephenxs commented Jun 11, 2021

What I did
Bug fixes for buffer pool calculation and headroom checking on Mellanox platforms.

  • Test the number of lanes instead of the speed when determining whether special handling is required for a port.
    For speeds other than 400G, e.g. 100G, it's possible that some 100G ports have 8 lanes and others have 4 lanes,
    which means they cannot share the same buffer profile.
    A suffix _8lane is introduced to indicate this, e.g. pg_lossless_100000_5m_8lane_profile
  • Take the private headroom into account when calculating the buffer pool size
  • Take deviation into account when checking the headroom against the per-port limit to avoid the inaccurate result in a rare case
  • Use a hashtable to record the reference count of a profile in the lua plugin
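The lane-based profile naming above can be sketched as follows. This is an illustrative Python sketch, not the actual Lua plugin code; the function name and argument shapes are assumptions, only the `_8lane` suffix and the `pg_lossless_<speed>_<cable length>` pattern come from the PR description.

```python
def lossless_profile_name(speed, cable_length, lanes):
    """Build a lossless buffer profile name, appending '_8lane' for 8-lane ports.

    8-lane ports need doubled headroom, so they cannot share a profile with
    4-lane ports running at the same speed; the suffix keeps them distinct.
    """
    suffix = "_8lane" if lanes == 8 else ""
    return f"pg_lossless_{speed}_{cable_length}{suffix}_profile"

print(lossless_profile_name(100000, "5m", 8))  # pg_lossless_100000_5m_8lane_profile
print(lossless_profile_name(100000, "5m", 4))  # pg_lossless_100000_5m_profile
```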

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

How I verified it
Ran regression tests and tested manually

Details if related

  • Test the number of lanes instead of the speed when determining whether special handling (double headroom size) is required for a port.
    Originally this was determined by testing whether the port's speed is 400G, but that is not accurate: a user can configure an 8-lane port to 100G, and special handling is still required even though the port is not running at 400G.
    The variable names are updated accordingly: xxx_400g => xxx_8lanes
  • Take deviation into account when checking the headroom against the per-port limit to avoid an inaccurate result in a rare case.
    Some deviations can make the accumulative headroom slightly larger than the quantity calculated by the buffer manager, so we take them into account when checking the accumulative headroom against the limit.
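The deviation-tolerant limit check described above can be sketched as below. This is an illustrative Python sketch, not the actual Lua plugin; the limit and tolerance values are made-up examples, only the idea of allowing a small known deviation comes from the PR.

```python
def headroom_ok(accumulative_headroom, per_port_limit, tolerance):
    """Accept the configuration if the accumulative headroom stays within the
    per-port limit, allowing for a small known deviation between the plugin's
    calculation and the buffer manager's."""
    return accumulative_headroom <= per_port_limit + tolerance

PER_PORT_LIMIT = 450 * 1024  # hypothetical per-port headroom limit (bytes)
TOLERANCE = 4 * 1024         # hypothetical deviation allowance (bytes)

print(headroom_ok(452 * 1024, PER_PORT_LIMIT, TOLERANCE))  # True: within tolerance
print(headroom_ok(460 * 1024, PER_PORT_LIMIT, TOLERANCE))  # False: exceeds limit + tolerance
```

Without the tolerance term, a headroom that is only a few bytes over the calculated quantity would be rejected even though the hardware can accommodate it; that is the rare inaccurate case the fix targets.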

@stephenxs
Collaborator Author

stephenxs commented Jun 11, 2021

This PR contains fixes for 3 bugs and 1 enhancement. In theory we would open 1 PR per change, but the fixes all depend on each other, which makes them very difficult to split.

@liat-grozovik
Collaborator

@neethajohn can you please help to review?
@stephenxs to which branches should this PR be cherry-picked, and will it be a clean cherry-pick or a new PR?

@stephenxs
Collaborator Author

> @neethajohn can you please help to review?
> @stephenxs to which branches should this PR be cherry-picked, and will it be a clean cherry-pick or a new PR?

So far it can be cherry-picked cleanly.

@stephenxs stephenxs force-pushed the 8lane-instead-of-400g branch 2 times, most recently from 446d92a to da6a40d Compare June 19, 2021 14:35
- Take the number of lanes instead of the speed into account when determining whether a port has doubled pipeline latency
  For speeds other than 400G, e.g. 100G, it's possible that some 100G ports have 8 lanes and others have 4 lanes
  In this case, we need to add "_8lane" to the profile name to indicate whether the profile is for 8-lane ports or normal ports
  This is for the Mellanox platform only
- Take advantage of the "set" feature of Lua to represent the profile reference count, which also makes the code more maintainable
- Take deviation into account when checking the headroom against the limit
- Take private headroom into account when the shared headroom pool is enabled
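The reference-count tracking mentioned above (a Lua table used as a hash/"set") can be sketched in Python with a dict standing in for the Lua table. This is an illustrative sketch, not the plugin's actual code; the function names are assumptions.

```python
from collections import defaultdict

# Maps profile name -> number of priority groups currently referencing it,
# playing the role of the Lua table used as a hashtable in the plugin.
profile_ref_count = defaultdict(int)

def add_reference(profile):
    """Record one more user of the profile."""
    profile_ref_count[profile] += 1

def remove_reference(profile):
    """Drop one user; forget the profile entirely once unreferenced."""
    profile_ref_count[profile] -= 1
    if profile_ref_count[profile] <= 0:
        del profile_ref_count[profile]

add_reference("pg_lossless_100000_5m_8lane_profile")
add_reference("pg_lossless_100000_5m_8lane_profile")
remove_reference("pg_lossless_100000_5m_8lane_profile")
print(profile_ref_count["pg_lossless_100000_5m_8lane_profile"])  # 1
```

Keying by profile name makes lookup and cleanup O(1) and avoids scanning a list of referencing ports, which is the maintainability gain the commit message points to.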

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Contributor

@neethajohn neethajohn left a comment


Please add sonic-mgmt tests for this 8-lane profile

cfgmgr/buffer_headroom_mellanox.lua (review comment, outdated, resolved)
…lane port

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

> Please add sonic-mgmt tests for this 8-lane profile

Almost every test case includes logic to check whether the buffer profile in BUFFER_PG_TABLE is correct. On platforms with 8-lane ports, when the speed isn't 400G, the 8-lane profiles will be used and tested.
The test case PR: sonic-net/sonic-mgmt#3694

neethajohn
neethajohn previously approved these changes Jun 22, 2021
Collaborator

@liat-grozovik liat-grozovik left a comment


Can you read the number of lanes and make it more generic, not Spectrum3 only?

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

> Can you read the number of lanes and make it more generic, not Spectrum3 only?

Fixed by removing the dependency on the ASIC type.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Collaborator Author

VS test failed due to an environment issue. Needs a rerun.

@liat-grozovik
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Collaborator Author

Failed in the dynamic port breakout (DPB) tests:

test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G1] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)0] FAILED [ 66%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G2] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)0] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x50G3] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]0] PASSED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]1] FAILED [ 67%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)1] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]2] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)1] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-4x25G[10G]3] FAILED [ 68%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]1] PASSED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)2] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)2] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-2x25G(2)+1x50G(2)3] FAILED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]2] PASSED [ 69%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x50G(2)+2x25G(2)3] FAILED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_simple[Ethernet0-1x100G[40G]3] PASSED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_with_vlan FAILED [ 70%]
test_port_dpb_system.py::TestPortDPBSystem::test_port_breakout_with_acl SKIPPED [ 70%]

2021-06-24T15:39:13.6879387Z         if not status:
2021-06-24T15:39:13.6879718Z             message = failure_message or (
2021-06-24T15:39:13.6883959Z                 f"Expected field/value pairs not found: expected={expected_fields}, "
2021-06-24T15:39:13.6884797Z                 f'received={result}, key="{key}", table="{table_name}"'
2021-06-24T15:39:13.6885135Z             )
2021-06-24T15:39:13.6885455Z >           assert not polling_config.strict, message
2021-06-24T15:39:13.6890298Z E           AssertionError: Expected field/value pairs not found: expected={'brkout_mode': '2x50G'}, received={'brkout_mode': '1x100G[40G]'}, key="Ethernet0", table="BREAKOUT_CFG"
2021-06-24T15:39:13.6890812Z 
2021-06-24T15:39:13.6891111Z dvslib/dvs_database.py:203: AssertionError

@liat-grozovik
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs stephenxs changed the title [Dynamic Buffer Calc][Mellanox] Bug fixes for the lua plugins for buffer pool calculation and headroom checking [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking Jun 28, 2021
@neethajohn neethajohn merged commit 6c88e47 into sonic-net:master Jun 28, 2021
@stephenxs stephenxs deleted the 8lane-instead-of-400g branch June 28, 2021 23:50
qiluo-msft pushed a commit that referenced this pull request Jun 29, 2021
…a plugins for buffer pool calculation and headroom checking (#1781)

liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jun 29, 2021
Advance submodule head for sonic-swss

3226163 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (sonic-net/sonic-swss#1786)
6c88e47 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
e86b900 [MPLS] sonic-swss changes for MPLS (sonic-net/sonic-swss#1686)
4c8e2b5 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
3602124 [VS test stability] Skip flaky test for DPB (sonic-net/sonic-swss#1807)
c37cc1c Support for in-band-mgmt via management VRF (sonic-net/sonic-swss#1726)
1e3a532 Fix config prompt question issue (sonic-net/sonic-swss#1799)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jun 29, 2021
Advance submodule head for sonic-swss on 202012

bb383be2 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
f949dfe9 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
def0a914 Fix config prompt question issue (sonic-net/sonic-swss#1799)
21f97506 [ci]: Merge azure pipelines from master to 202012 branch (sonic-net/sonic-swss#1764)
a83a2a42 [vstest]: add dvs_route fixture
849bdf9c [Mux] Add support for mux metrics to State DB (sonic-net/sonic-swss#1757)
386de717 [qosorch] Dot1p map list initialization fix (sonic-net/sonic-swss#1746)
f99abdca [sub intf] Port object reference count update (sonic-net/sonic-swss#1712)
4a00042d [vstest/nhg]: use dvs_route fixture to make test_nhg more robust

Signed-off-by: Stephen Sun <stephens@nvidia.com>
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
…a plugins for buffer pool calculation and headroom checking (sonic-net#1781)

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
#### What I did
To support loading configuration data in yang schema, the `config load` command is enhanced with the options below:
- `-t` `--file-format` to specify the file format. The config file can be in `yang` or `config_db` format.
- `-r` to restart the services. Currently this option is supported for the yang file format only.
#### How I did it
Add the above mentioned cli options.
Add Unit tests

#### How to verify it
Verify the command on VS.
```
admin@vlab-01:~$ sudo config load -y -c yang -r /etc/sonic/yang_cfg.json
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -H -Y /etc/sonic/yang_cfg.json -j /etc/sonic/init_cfg.json --write-to-db
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Please note setting loaded from minigraph will be lost after system reboot.To preserve setting, run `config save`.
admin@vlab-01:~$ sudo config load -y -c yang  /etc/sonic/yang_cfg.json
Running command: /usr/local/bin/sonic-cfggen -H -Y /etc/sonic/yang_cfg.json -j /etc/sonic/init_cfg.json --write-to-db
Please note setting loaded from minigraph will be lost after system reboot.To preserve setting, run `config save`.
admin@vlab-01:~$ sudo config load
Load config in config_db format from the default config file(s) ? [y/N]: y
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/config_db.json --write-to-db
admin@vlab-01:~$ sudo config load -y
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/config_db.json --write-to-db
```
4 participants