[sai_failure_dump]Invoking dump during SAI failure #1198

Merged
merged 3 commits into from
Feb 2, 2023

Conversation

dgsudharsan
Collaborator

HLD: sonic-net/SONiC#1212

What I did
Added logic to invoke the SAI failure dump on any SAI programming failure, before orchagent invokes abort.

Why I did it
To collect the necessary dumps from syncd while it is still in the problem state, before abort is called and all processes restart.

How I verified it
Manual verification. Also added a unit test to cover the abort scenario.
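
The ordering described under "What I did" (request the dump first, abort second) can be sketched roughly as below. This is only an illustration under stated assumptions: `Status`, `NotifySyncd`, and `FailureHandler` are hypothetical stand-ins and are not the sonic-sairedis types or the code merged in this PR. The only name taken from the conversation is `SAI_REDIS_NOTIFY_SYNCD_INVOKE_DUMP`, which `NotifySyncd::InvokeDump` models.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real sairedis types.
enum class Status { Success, Failure };

// Models the SAI_REDIS_NOTIFY_SYNCD_INVOKE_DUMP notification discussed in this PR.
enum class NotifySyncd { InvokeDump };

struct FailureHandler {
    // Asks syncd to collect its dump (assumed callback).
    std::function<void(NotifySyncd)> notifySyncd;
    // Stands in for the abort performed by orchagent (assumed callback).
    std::function<void()> abortFn;

    // Returns true on success. On failure, requests the dump first so that
    // it is collected while the system is still in the problem state, and
    // only then triggers the abort.
    bool handle(Status status) const {
        if (status == Status::Success) {
            return true;
        }
        notifySyncd(NotifySyncd::InvokeDump);
        abortFn();
        return false;
    }
};
```

Injecting both actions as callbacks is a choice made for this sketch only; it lets a unit test record the call order without actually terminating the process.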

@dgsudharsan
Collaborator Author

/azpw run Azure.sonic-sairedis

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dgsudharsan dgsudharsan requested a review from prsunny January 31, 2023 02:47
@prsunny prsunny merged commit 0434b62 into sonic-net:master Feb 2, 2023
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this pull request Feb 2, 2023
Update sonic-sairedis submodule pointer to include the following:
* 0434b62 [sai_failure_dump]Invoking dump during SAI failure ([sonic-net#1198](sonic-net/sonic-sairedis#1198))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Feb 2, 2023
Update sonic-sairedis submodule pointer to include the following:
* 0434b62 [sai_failure_dump]Invoking dump during SAI failure ([#1198](sonic-net/sonic-sairedis#1198))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
yxieca pushed a commit that referenced this pull request Feb 2, 2023
* [sai_failure_dump]Invoking dump during SAI failure
StormLiangMS pushed a commit that referenced this pull request Feb 10, 2023
* [sai_failure_dump]Invoking dump during SAI failure
@dgsudharsan dgsudharsan deleted the sai_failure branch March 9, 2023 02:08
@kcudnik
Collaborator

kcudnik commented Mar 9, 2023

Why use an extra SAI_REDIS_NOTIFY_SYNCD_INVOKE_DUMP enum instead of actually calling the sai_dump API? That would do exactly the same thing in a more elegant way.

@dgsudharsan
Collaborator Author

> Why use an extra SAI_REDIS_NOTIFY_SYNCD_INVOKE_DUMP enum instead of actually calling the sai_dump API? That would do exactly the same thing in a more elegant way.

Hi Kamil. I believe you are referring to saisdkdump, which also covers lower-layer information. Currently only Mellanox platforms collect this information, and other vendors may not have implemented it: https://github.com/sonic-net/sonic-utilities/blob/7a604c51671a85470db3d15aaa83b6b39a01531a/scripts/generate_dump#L1075
Other vendors invoke their own proprietary debug dump collection, as seen in the generate_dump script linked above, so all vendors would need to take action to standardize this.

Also note that this feature is intended to collect the dump immediately after a SAI failure, before services restart. We took this approach to accommodate all vendors today. Once all SAI vendors support the debug dump functionality, we can standardize on it.

This was also brought up during the HLD discussion, and it was decided to take it up in the SAI community meeting: sonic-net/SONiC#1212 (comment)
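
To illustrate the trade-off in the reply above, here is a minimal sketch of how a vendor-neutral "invoke dump" request could be routed to per-platform collection routines. It is an assumption-laden illustration, not the mechanism implemented in syncd: `DumpDispatcher`, `registerPlatform`, `invokeDump`, and the platform names used with it are all hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry mapping a platform name to its own dump routine.
// None of these names come from sonic-sairedis or sonic-utilities.
class DumpDispatcher {
public:
    using DumpFn = std::function<bool()>;

    // Registers (or replaces) the dump routine for a platform.
    void registerPlatform(const std::string& platform, DumpFn fn) {
        m_hooks[platform] = std::move(fn);
    }

    // Runs the routine registered for the platform. Returns false when the
    // platform has no routine, or when the routine itself reports failure.
    bool invokeDump(const std::string& platform) const {
        auto it = m_hooks.find(platform);
        if (it == m_hooks.end()) {
            return false;
        }
        return it->second();
    }

private:
    std::map<std::string, DumpFn> m_hooks;
};
```

With this shape the caller only signals that a dump is needed, and a platform without a registered routine simply reports that nothing was collected. If every vendor later supports a common SAI debug dump call, the per-platform table could collapse into a single routine.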
