[BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL #1786
… DEL and the SAI OID is NULL

There is an optimization in orch: a SET event that is still pending in m_toSync is replaced by the DEL event when the DEL event arrives. This is reasonable, but in rare cases it can leave the SAI OID NULL:
- The application creates an object and then destroys it after a very short period
- The create notification is eliminated and replaced by the destroy notification
- The SAI object hasn't been created, so its OID is NULL when it is removed

This causes a SAI error, which eventually makes orchagent exit.
We need to avoid it in 202106 and above
(In 202012, orchagent won't exit, so it's OK to ignore the error)
The solution is not to call the SAI remove interface when the SAI OID is NULL.
It can happen if a user configures something that causes the accumulative headroom to exceed the limit.
In this case, the buffer profile is created and then removed within a short time.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
It looks like the LGTM check failed because of a temporary network issue and did not recover after being retriggered.
How can we test this?
@stephenxs, please do not close and re-open the PR multiple times for retriggering the checks. It can be done via /azp run. I think once the code changes/review is completed, this can be taken care of.
Thanks Prince. Looks like we don’t have the privilege to trigger azp rerun. Anyway I’ll avoid closing/reopening the PRs.
We have a regression test (test_exceeding_headroom) for buffer profile. We don’t have a chance to verify the buffer pool because we seldom remove a buffer pool. I updated it because they shared the same logic.
@stephenxs, does this test already cover this scenario for buffer profile?
Discussed offline, it is covered.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Advance submodule head for sonic-swss

3226163 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (sonic-net/sonic-swss#1786)
6c88e47 [Dynamic Buffer Calc][Mellanox] Bug fixes and enhancements for the lua plugins for buffer pool calculation and headroom checking (sonic-net/sonic-swss#1781)
e86b900 [MPLS] sonic-swss changes for MPLS (sonic-net/sonic-swss#1686)
4c8e2b5 [Dynamic Buffer Calc] Avoid creating lossy PG for admin down ports during initialization (sonic-net/sonic-swss#1776)
3602124 [VS test stability] Skip flaky test for DPB (sonic-net/sonic-swss#1807)
c37cc1c Support for in-band-mgmt via management VRF (sonic-net/sonic-swss#1726)
1e3a532 Fix config prompt question issue (sonic-net/sonic-swss#1799)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
…ase the op is DEL and the SAI OID is NULL (sonic-net#1786)

- What I did
Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL in order to avoid orchagent from exiting.
We need it only in 202106 or above. In 202012 the orchagent won't exit in such case.
- Why I did it
Handle rare cases which cause SAI error eventually makes orchagent to exit.
- How I verified it
Manually test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
What I did
Backport SAI failure handling related commits into the 202012 branch. The following is a list of backported commits:
941875a Deactivate mirror session only when session status is true in updateLagMember (#1666)
be12482 Ignore ALREADY_EXIST error in FDB creation (#1815)
c9c1aa2 Add failure handling for SAI get operations (#1768)
47b4276 [BufferOrch] Don't call SAI API for BUFFER_POOL/PROFILE handling in case the op is DEL and the SAI OID is NULL (#1786)
db9238f Add failure notification for orchagent (#1665)
fc8e43f [synchronous mode] Add failure notification for SAI failures in synchronous mode (#1596)
Why I did it
202012 image needs to include failure handling mechanism for enough notification in the presence of SAI failures.
What I did
Don't call the SAI API for BUFFER_POOL/PROFILE handling when the op is DEL and the SAI OID is NULL, so that orchagent does not exit.
This is needed only in 202106 and above. In 202012, orchagent doesn't exit in this case.
Why I did it
Handle rare cases in which a SAI error eventually makes orchagent exit.
How I verified it
Manual test.
Details if related
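There is an optimization in orch: a SET event that is still pending in m_toSync is replaced by the DEL event when the DEL event arrives.
This is reasonable, but in rare cases it can leave the SAI OID NULL:
- The application creates an object and then destroys it after a very short period
- The create notification is eliminated and replaced by the destroy notification
- The SAI object hasn't been created, so its OID is NULL when it is removed
This causes a SAI error, which eventually makes orchagent exit.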
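The sequence above can be illustrated with a small, self-contained model. This is only a sketch under simplifying assumptions: `ToyOrch`, `enqueue`, `drain`, `oidSeenByDel` and `kNullOid` are hypothetical names, and the pending map simply overwrites the older event for a key, which is a simplification of what m_toSync does in sonic-swss.

```cpp
#include <cstdint>
#include <map>
#include <string>

enum class Op { SET, DEL };
using Oid = std::uint64_t;
constexpr Oid kNullOid = 0; // hypothetical stand-in for a NULL SAI OID

// Toy model, not the real orch classes.
struct ToyOrch
{
    std::map<std::string, Op> pending;  // stand-in for m_toSync
    std::map<std::string, Oid> created; // key -> OID of objects actually created
    Oid nextOid = 1;

    // Assumption: a newer event for the same key replaces the pending one.
    void enqueue(const std::string &key, Op op) { pending[key] = op; }

    // Handle everything pending. For each DEL, record the OID that would be
    // handed to the remove call (kNullOid if the object was never created).
    std::map<std::string, Oid> drain()
    {
        std::map<std::string, Oid> removed;
        for (const auto &entry : pending)
        {
            if (entry.second == Op::SET)
            {
                created[entry.first] = nextOid++;
                continue;
            }
            auto it = created.find(entry.first);
            if (it == created.end())
            {
                removed[entry.first] = kNullOid;
            }
            else
            {
                removed[entry.first] = it->second;
                created.erase(it);
            }
        }
        pending.clear();
        return removed;
    }
};

// Create an object and delete it, optionally letting the SET be handled first.
Oid oidSeenByDel(bool drainBeforeDel)
{
    ToyOrch orch;
    orch.enqueue("profile", Op::SET);
    if (drainBeforeDel)
    {
        orch.drain();
    }
    orch.enqueue("profile", Op::DEL);
    std::map<std::string, Oid> removed = orch.drain();
    return removed["profile"];
}
```

In this model, `oidSeenByDel(true)` yields a real OID, while `oidSeenByDel(false)` yields `kNullOid` because the SET was replaced before it was ever handled.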
We need to avoid it in 202106 and above
(In 202012, orchagent won't exit, so it's OK to ignore the error)
The solution is not to call the SAI remove interface when the SAI OID is NULL.
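A minimal sketch of that guard, not the actual BufferOrch change: `handleDel`, `TaskStatus`, `kNullObjectId` and the `removeFn` callback are hypothetical names, with the callback standing in for the SAI remove call.

```cpp
#include <cstdint>
#include <functional>

using ObjectId = std::uint64_t;
constexpr ObjectId kNullObjectId = 0; // hypothetical stand-in for a NULL SAI OID

enum class TaskStatus { Success, Failed };

// Handle a DEL operation for an object whose OID may never have been assigned.
// removeFn is an assumed callback that returns true when the removal succeeds.
TaskStatus handleDel(ObjectId oid, const std::function<bool(ObjectId)> &removeFn)
{
    if (oid == kNullObjectId)
    {
        // The object was never created, so there is nothing to remove.
        // Skipping the call avoids the error that the remove would report.
        return TaskStatus::Success;
    }
    return removeFn(oid) ? TaskStatus::Success : TaskStatus::Failed;
}
```

With this shape, a DEL for an object that was never created completes without touching the remove callback, and a DEL for a real OID behaves as before.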
This scenario can happen if a user configures something that causes the accumulative headroom to exceed the limit.
In this case, the buffer profile is created and then removed within a short time.