
[Reclaim buffer][202106] Reclaim unused buffer for dynamic buffer model #1986

Closed
wants to merge 4 commits into from

Conversation


@stephenxs stephenxs commented Oct 26, 2021

This is to backport community PR #1910 to the 202106 branch.
It depends on #2038, which backports #1996 to the 202106 branch.

What I did

Reclaim the reserved buffer of unused ports in both the dynamic and traditional models.
This is done by:

  • Removing lossless priority groups on unused ports.
  • Applying zero buffer profiles to the buffer objects of unused ports.
  • In the dynamic buffer model, the zero profiles are loaded from a JSON file and applied to APPL_DB whenever there are admin-down ports.
    The default buffer configuration is applied to all ports; the buffer manager then applies zero profiles to the admin-down ports.
  • In the static buffer model, the zero profiles are loaded by the buffer template.
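As an illustrative sketch of the dynamic-model flow, the JSON file could carry zero pools and zero profiles keyed by their APPL_DB table entries, and the buffer manager could flatten it on load. The field names and layout below are hypothetical illustrations, not the actual schema shipped with SONiC:

```python
import json

# Hypothetical zero-profile file content; the real schema used by
# buffermgrd may differ.
ZERO_PROFILE_JSON = """
[
    {
        "BUFFER_POOL_TABLE:ingress_zero_pool": {
            "mode": "static",
            "type": "ingress",
            "size": "0"
        }
    },
    {
        "BUFFER_PROFILE_TABLE:ingress_lossy_zero_profile": {
            "pool": "ingress_zero_pool",
            "size": "0",
            "static_th": "0"
        }
    }
]
"""

def load_zero_profiles(raw: str) -> dict:
    """Flatten the JSON list into a single {table:key -> fields} mapping."""
    items = {}
    for entry in json.loads(raw):
        items.update(entry)
    return items

zero_items = load_zero_profiles(ZERO_PROFILE_JSON)
```

With this shape, a single lookup answers whether a given pool or profile is one of the zero objects to be applied to admin-down ports.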

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

How I verified it

Regression tests and vs tests.

Details if related
Static buffer model

Remove the lossless buffer priority group if the port is admin-down and the buffer profile aligns with the port's speed and cable length.
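The removal condition can be sketched as follows. This is a simplified model, not the actual buffermgrd code (which works on CONFIG_DB entries in C++); the profile naming convention shown, e.g. pg_lossless_100000_5m_profile, follows the traditional buffer manager's pattern:

```python
def make_lossless_profile_name(speed: str, cable_length: str) -> str:
    # Derive the lossless profile name from port speed and cable length,
    # e.g. pg_lossless_100000_5m_profile.
    return f"pg_lossless_{speed}_{cable_length}_profile"

def should_remove_lossless_pg(admin_status: str, pg_profile: str,
                              speed: str, cable_length: str) -> bool:
    """Reclaim the PG only when the port is admin-down AND the configured
    profile matches the one derived from its speed and cable length."""
    return (admin_status == "down"
            and pg_profile == make_lossless_profile_name(speed, cable_length))
```

The second condition guards against removing a user-customized profile that merely happens to sit on an admin-down port.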

Dynamic buffer model

Handle zero buffer pools and profiles

  1. buffermgrd: add a CLI option to load the JSON file containing the zero profiles.
  2. Load the zero profiles from the JSON file into the buffer manager's internal data structures.
  3. Apply them to APPL_DB once there is at least one admin-down port.
    • Record each zero profile's name in the pool object it references.
      This way, the zero profile lists can be constructed from the normal profile lists; there should be one profile for each pool on the ingress/egress side.
    • Then apply the zero profiles to the buffer objects of the port.
    • Unload them from APPL_DB once all ports are admin-up, since the zero pools and profiles are no longer referenced.
      Remove the buffer pool counter ID when the zero pool is removed.
  4. Since a pool can now be removed from the system, its watermark counter is removed before the pool itself.
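The apply/unload bookkeeping above can be sketched as a toy model. The class and method names here are hypothetical, and the real apply/remove operations (writing to APPL_DB, dropping the watermark counter ID) are reduced to a flag:

```python
class ZeroProfileManager:
    """Toy model of when zero pools/profiles exist in APPL_DB:
    applied while at least one port is admin-down,
    unloaded once every port is back up."""

    def __init__(self):
        self.admin_down_ports = set()
        self.zero_items_applied = False

    def port_admin_status_change(self, port: str, admin_up: bool):
        if admin_up:
            self.admin_down_ports.discard(port)
        else:
            self.admin_down_ports.add(port)
        self._reconcile()

    def _reconcile(self):
        if self.admin_down_ports and not self.zero_items_applied:
            # Stands in for pushing zero pools/profiles to APPL_DB.
            self.zero_items_applied = True
        elif not self.admin_down_ports and self.zero_items_applied:
            # Stands in for unloading them and removing the
            # zero pool's watermark counter ID.
            self.zero_items_applied = False
```

The point of the model is the hysteresis: flapping one port among several admin-down ports does not unload the shared zero objects; only the last port coming back up does.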

Handle port admin status change

  1. Currently there is logic that removes the buffer priority groups of admin-down ports. This logic is reused and extended to cover all buffer objects, including BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST, and BUFFER_PORT_EGRESS_PROFILE_LIST.
    • When a port goes admin-down:
      • The normal profiles are removed from the buffer objects of the port.
      • The zero profiles, if provided, are applied to the port.
    • When a port comes admin-up:
      • The zero profiles, if applied, are removed from the port.
      • The normal profiles are applied to the port.
  2. The ports orchagent exposes the number of queues and priority groups to STATE_DB.
    The buffer manager uses these values to apply zero profiles to all the priority groups and queues of admin-down ports.
    If it is not necessary to apply zero profiles to all priority groups or queues on a given platform, ids_to_reclaim can be customized in the JSON file.
  3. Handle all buffer tables, including BUFFER_PG, BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST, and BUFFER_PORT_EGRESS_PROFILE_LIST.
    • Originally, only the BUFFER_PG table was cached in the dynamic buffer manager.
    • Now all tables are cached so that zero profiles can be applied when a port goes admin-down and normal profiles restored when it comes back up.
    • The key of such tables can reference a single port or a list of ports, like BUFFER_PG|Ethernet0|3-4 or BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4. Originally, only the BUFFER_PG handler understood such keys; this logic is reused and extended to handle all the tables.
  4. [Mellanox] Plugin to calculate buffer pool size:
    • Originally, buffers for queues, buffer profile lists, etc. were not reclaimed for admin-down ports, so they were reserved for all ports.
    • Now they are reserved for admin-up ports only.
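The port-list keys described in item 3 can be expanded with a helper along these lines. This is a sketch only; the real dynamic buffer manager implements the equivalent in C++, and the function name here is hypothetical:

```python
def expand_buffer_key(key: str):
    """Expand a buffer table key like 'BUFFER_PG|Ethernet0,Ethernet4|3-4'
    into one (table, port, ids) tuple per port.  The ids segment is
    optional: profile-list keys such as
    'BUFFER_PORT_INGRESS_PROFILE_LIST|Ethernet0' have none."""
    parts = key.split("|")
    table, ports = parts[0], parts[1]
    ids = parts[2] if len(parts) > 2 else None
    return [(table, port, ids) for port in ports.split(",")]
```

Expanding to per-port tuples lets one handler cover both the single-port and multi-port key forms for every buffer table.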

Accelerate the progress of applying buffer tables to APPL_DB

This is an optimization on top of buffer reclamation.

  1. Don't apply buffer profiles or buffer objects to APPL_DB before the buffer pools are applied while the system is starting.
    This applies the items in order from referenced items to referencing items, which avoids the buffer orchagent retrying because a referenced item is not yet present.
    It is still possible for a referencing item to be handled before the item it references; in that case there should not be any error message.
  2. [Mellanox] Plugin to calculate buffer pool size:
    Return the buffer pool sizes currently in APPL_DB if the sizes cannot be calculated because some information is not yet available, which typically happens at system start.
    This accelerates pushing the tables to APPL_DB.
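The ordering constraint in item 1 amounts to flushing pending writes from referenced objects to referencing objects: pools first, then profiles, then per-port buffer objects. A minimal sketch, with hypothetical table names standing in for the APPL_DB tables:

```python
def order_buffer_items(items):
    """Sort pending APPL_DB writes so referenced objects precede the
    objects that reference them:
    pools -> profiles -> per-port objects (PGs, queues, profile lists).

    Each item is a (table_name, key) pair; unknown tables are treated
    as per-port objects and flushed last."""
    rank = {"BUFFER_POOL_TABLE": 0, "BUFFER_PROFILE_TABLE": 1}
    return sorted(items, key=lambda item: rank.get(item[0], 2))
```

Because `sorted` is stable, items within the same tier keep their arrival order, so this only enforces the cross-tier dependency and changes nothing else.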

@stephenxs stephenxs force-pushed the reclaim-buffer-202106 branch 3 times, most recently from a2ef5d8 to 0d21a95 Compare November 19, 2021 02:03
@stephenxs stephenxs changed the title [Reclaim buffer][202106] Reclaim unused buffer for both traditional and dynamic buffer model [Reclaim buffer][202106] Reclaim unused buffer for dynamic buffer model Nov 19, 2021
@stephenxs
Collaborator Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Collaborator Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

stephenxs and others added 2 commits December 8, 2021 20:50
- Improve admin down test cases
- Restore cable length to 0m after test in order to prevent traditional buffer manager from creating lossless profiles

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: stephens <stephens@contoso.com>
@stephenxs
Collaborator Author

Depends on #2038

@arlakshm
Contributor

/Azp run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Collaborator

/azp run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Collaborator

/azp run LGTM

@azure-pipelines

No pipelines are associated with this pull request.

@stephenxs
Collaborator Author

LGTM failed due to dependency on #2038

@stephenxs
Collaborator Author

Currently, the PR depends on #2118 to fix the vstest failure.

EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
- What I did
Two bugs related to this feature were found; this PR includes fixes for them.

1. non-default since argument is not being applied.
Happening because subprocess_exec(["date", "--date='{}'".format(since_cfg)]) is failing. Replacing this with subprocess_exec(["date", "--date={}".format(since_cfg)]) solved the problem.

2. core_cleanup is not working because of the unnecessary recent_file_creation check.

- How I did it
Remove '' from date

- How to verify it
Run manual test flow which found the issue

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@liushilongbuaa
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Collaborator

/azp run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs stephenxs closed this Mar 28, 2022
@stephenxs stephenxs deleted the reclaim-buffer-202106 branch May 26, 2022 13:58