Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the way of handling BUFFER_PG during PFC storm #1480

Merged

Conversation

stephenxs
Copy link
Collaborator

What I did
One action for the PFC storm is to set a zero buffer profile on the PG under PFC storm.
The zero buffer won't be removed until the PFC storm has gone.
If the user wants to modify the buffer profile for the PG, the bufferorch will return "task_need_retry".
General speaking it doesn't matter unless that:

  • the system can't be warm rebooted until the PFC storm has gone.
  • the "task_need_retry" will block the update coming from any entry of the BUFFER_PG table from being programmed to ASIC.
    In this sense, we need a better solution.

In the new solution, it will:

  • record the new profile user wants to apply during PFC storm as the "pending profile" for that PG and return "task_success" if the PG is under PFC storm.
  • apply the pending profile once the PG is unlocked.
  • the latest pending profile will take effect in case user tries updating the profile for more than 1 times.

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

How I verified it
It can be verified in the following steps:

  1. configure PFC watchdog. the buffer profile should be the original value
  2. start PFC storm on PG 4.
NOTICE swss#orchagent: :- startWdActionOnQueue: Receive notification, storm
NOTICE swss#orchagent: :- startWdActionOnQueue: PFC Watchdog detected PFC storm on port Ethernet64, queue index 4, queue id 0x150000000006e5 and port id 0x10000000006de.
  1. modify PG 3-4
Oct 16 10:31:50.869634 mtbc-sonic-01-2410 NOTICE swss#orchagent: :- processPriorityGroup: Set buffer PG Ethernet64:3-4 to profile
Oct 16 10:31:50.869893 mtbc-sonic-01-2410 WARNING swss#orchagent: :- processPriorityGroup: Priority group 4 on port Ethernet64 is locked, pending until unlocked
  1. stop PFC storm
Oct 16 10:33:29.099110 mtbc-sonic-01-2410 NOTICE swss#orchagent: :- startWdActionOnQueue: Receive notification, restore
Oct 16 10:33:29.099110 mtbc-sonic-01-2410 NOTICE swss#orchagent: :- startWdActionOnQueue: PFC Watchdog storm restored on port Ethernet64, queue index 4, queue id 0x150000000006e5 and port id 0x10000000006de.
Oct 16 10:33:29.117272 mtbc-sonic-01-2410 NOTICE swss#orchagent: :- ~PfcWdZeroBufferHandler: Priority group 4 on port Ethernet64 has been resotred to pending profile 0x19000000000ab1

Details if related

One action for the PFC storm is to set a zero buffer profile on the PG under PFC storm.
The zero buffer won't be removed until the PFC storm has gone.
If the user wants to modify the buffer profile for the PG, the bufferorch will return "task_need_retry".
General speaking it doesn't matter unless that:
- the system can't be warm rebooted until the PFC storm has gone.
- the "task_need_retry" will block the update of the entire BUFFER_PG table from being programmed to ASIC.
In this sense, we need a better solution.

The new solution will:
- record the new profile user wants to apply during PFC storm as the "pending profile" for that PG and return "task_success" if the PG is under PFC storm.
- apply the pending profile once the PG is unlocked.
- the latest pending profile will take effect in case user tries updating the profile for more than 1 times.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs changed the title Imporve the way of handling BUFFER_PG during PFC storm Improve the way of handling BUFFER_PG during PFC storm Nov 11, 2020
neethajohn
neethajohn previously approved these changes Dec 1, 2020
orchagent/pfcactionhandler.cpp Outdated Show resolved Hide resolved
orchagent/pfcactionhandler.cpp Outdated Show resolved Hide resolved
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@liat-grozovik liat-grozovik merged commit 053aa15 into sonic-net:master Dec 2, 2020
@stephenxs stephenxs deleted the fix-update-pg-under-pfcstorm branch December 2, 2020 10:26
daall pushed a commit to daall/sonic-swss that referenced this pull request Dec 7, 2020
* Fix the issue: failed to updating BUFFER_PG during PFC storm

One action for the PFC storm is to set a zero buffer profile on the PG under PFC storm.
The zero buffer won't be removed until the PFC storm has gone.
If the user wants to modify the buffer profile for the PG, the bufferorch will return "task_need_retry".
General speaking it doesn't matter unless that:
- the system can't be warm rebooted until the PFC storm has gone.
- the "task_need_retry" will block the update of the entire BUFFER_PG table from being programmed to ASIC.
In this sense, we need a better solution.

The new solution will:
- record the new profile user wants to apply during PFC storm as the "pending profile" for that PG and return "task_success" if the PG is under PFC storm.
- apply the pending profile once the PG is unlocked.
- the latest pending profile will take effect in case user tries updating the profile for more than 1 times.

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Address review comments: add a pair of brackets for the if-block

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Fix ut error

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Fix typo

Signed-off-by: Stephen Sun <stephens@nvidia.com>
neethajohn pushed a commit that referenced this pull request Jun 24, 2022
…storm is detected (#2304)

What I did
Avoid dropping traffic that is ingressing the port/pg that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and does not attach any zero profile to the ingress pg when pfcwd is triggered

Revert changes related to #1480 where the retry mechanism was added to BufferOrch which caches the task retries and while the PG is locked by PfcWdZeroBufferHandler.

Revert changes related to #2164 in PfcWdZeroBufferHandler & ZeroBufferProfile & BufferOrch.

Updated UT's accordingly

How I verified it
UT's.
Ran the sonic-mgmt test with these changes sonic-net/sonic-mgmt#5665 and verified if they've passed.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
yxieca pushed a commit that referenced this pull request Jun 25, 2022
…storm is detected (#2304)

What I did
Avoid dropping traffic that is ingressing the port/pg that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and does not attach any zero profile to the ingress pg when pfcwd is triggered

Revert changes related to #1480 where the retry mechanism was added to BufferOrch which caches the task retries and while the PG is locked by PfcWdZeroBufferHandler.

Revert changes related to #2164 in PfcWdZeroBufferHandler & ZeroBufferProfile & BufferOrch.

Updated UT's accordingly

How I verified it
UT's.
Ran the sonic-mgmt test with these changes sonic-net/sonic-mgmt#5665 and verified if they've passed.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
vivekrnv added a commit to vivekrnv/sonic-swss that referenced this pull request Aug 1, 2022
…storm is detected (sonic-net#2304)

What I did
Avoid dropping traffic that is ingressing the port/pg that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and does not attach any zero profile to the ingress pg when pfcwd is triggered

Revert changes related to sonic-net#1480 where the retry mechanism was added to BufferOrch which caches the task retries and while the PG is locked by PfcWdZeroBufferHandler.

Revert changes related to sonic-net#2164 in PfcWdZeroBufferHandler & ZeroBufferProfile & BufferOrch.

Updated UT's accordingly

How I verified it
UT's.
Ran the sonic-mgmt test with these changes sonic-net/sonic-mgmt#5665 and verified if they've passed.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
…storm is detected (sonic-net#2304)

What I did
Avoid dropping traffic that is ingressing the port/pg that is in storm. The code changes in this PR avoid creating the ingress zero pool and profile and does not attach any zero profile to the ingress pg when pfcwd is triggered

Revert changes related to sonic-net#1480 where the retry mechanism was added to BufferOrch which caches the task retries and while the PG is locked by PfcWdZeroBufferHandler.

Revert changes related to sonic-net#2164 in PfcWdZeroBufferHandler & ZeroBufferProfile & BufferOrch.

Updated UT's accordingly

How I verified it
UT's.
Ran the sonic-mgmt test with these changes sonic-net/sonic-mgmt#5665 and verified if they've passed.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants