T2-VOQ: Correlate and rephrase the Midplane/Module connectivity related logs if it's a genuine scenario #18540

Closed
deepak-singhal0408 opened this issue Apr 2, 2024 · 1 comment · Fixed by sonic-net/sonic-platform-daemons#480
Labels: NOKIA, Triaged (this issue has been triaged)

Comments

@deepak-singhal0408
Contributor

Description

Currently, we log Module midplane connectivity messages when the supervisor loses connectivity with linecards.
However, these messages are also logged in some genuine (expected) scenarios:
a. Supervisor reboot
pmon#chassisd: Module LINE-CARD1 midplane connectivity is up

b. Individual linecard reboot
pmon#chassisd: Module LINE-CARD1 lost midplane connectivity
sr_device_mgr: Unable to reach slot 1 (Linecard) via Midplane
sr_device_mgr: Slot 1 (Linecard) is Reachable via Midplane (after missing count: 6)
pmon#chassisd: Module LINE-CARD1 midplane connectivity is up

Can we rephrase these messages for genuine scenarios, so that we can better distinguish between expected and unexpected cases?

Steps to reproduce the issue:

  1. T2-VOQ chassis
  2. Supervisor reboot: Monitor sup syslog
  3. Linecard reboot: Monitor sup syslog
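When monitoring the supervisor syslog in steps 2 and 3, it can help to pull out only the chassisd midplane lines. The sketch below is a hypothetical helper (not part of SONiC); its regex is an assumption derived only from the sample chassisd messages quoted in this issue, and it also tolerates an optional "Unexpected: " prefix.

```python
import re

# Hypothetical pattern based only on the sample lines in this issue, e.g.
#   "pmon#chassisd: Module LINE-CARD1 lost midplane connectivity"
#   "pmon#chassisd: Module LINE-CARD1 midplane connectivity is up"
MIDPLANE_RE = re.compile(
    r"chassisd: (?:Unexpected: )?Module (?P<module>\S+) "
    r"(?P<event>lost midplane connectivity|midplane connectivity is up)"
)


def midplane_events(lines):
    """Yield (module, "down" | "up") for each chassisd midplane connectivity line."""
    for line in lines:
        match = MIDPLANE_RE.search(line)
        if match:
            event = "down" if match.group("event").startswith("lost") else "up"
            yield match.group("module"), event


if __name__ == "__main__":
    sample = [
        "pmon#chassisd: Module LINE-CARD1 lost midplane connectivity",
        "sr_device_mgr: Unable to reach slot 1 (Linecard) via Midplane",
        "pmon#chassisd: Module LINE-CARD1 midplane connectivity is up",
    ]
    for module, event in midplane_events(sample):
        print(module, event)
```

Lines from other processes (such as sr_device_mgr) are skipped, so a linecard reboot shows up as a down/up pair for the same module.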

Describe the results you received:

Syslogs suggesting that something is wrong.

Describe the results you expected:

Syslog messages indicating that these events are expected in genuine scenarios.

Output of show version:

20220532.55



@deepak-singhal0408
Contributor Author

@mlok-nokia @judyjoseph @rlhui for visibility.

@deepak-singhal0408 deepak-singhal0408 added the Triaged this issue has been triaged label Apr 2, 2024
lguohan pushed a commit that referenced this issue May 12, 2024
…oot to allow SUP to log expected/unexpected midplane/module connectivity msg (#18805)

Why I did it
For expected and unexpected linecard reboots, the Supervisor needs to log an expected or an unexpected lost-connectivity message, respectively. With the new mechanism introduced by the related PRs, a reboot of the Nokia-IXR7250E-36x600G linecard caused by missing heartbeats must be handled as an unexpected reboot on the SUP. Issue #18540

How I did it
On the Nokia-IXR7250E-36x400G platform, a missing-heartbeat reboot also calls "sudo reboot", which creates a CHASSIS_MODULE_REBOOT_INFO_TABLE entry on the SUP marking the reboot as expected. Because a missing-heartbeat reboot is actually unexpected, platform_reboot is modified to check whether the reboot was caused by missing heartbeats and, if so, to remove the CHASSIS_MODULE_REBOOT_INFO_TABLE entry on the SUP. This lets the SUP log the unexpected message.
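The decision described here can be modelled as a small rule: an entry in the reboot-info table means "expected", no entry means "unexpected". The sketch below is a hypothetical model, not the chassisd or platform_reboot code: the table is a plain dict keyed by module name standing in for CHASSIS_MODULE_REBOOT_INFO_TABLE, and the function names and `missing_heartbeat` flag are illustrative assumptions. Only the "Unexpected:" wording comes from the logs in this PR; the expected-case wording is made up.

```python
from typing import Dict

# Hypothetical in-memory stand-in for CHASSIS_MODULE_REBOOT_INFO_TABLE on the SUP:
# module name -> reboot info. The real table layout and fields are not modelled.
RebootInfoTable = Dict[str, dict]


def record_reboot(table: RebootInfoTable, module: str, missing_heartbeat: bool) -> None:
    """Model the reboot path after the fix (hypothetical).

    "sudo reboot" always records an expected-reboot entry; when the reboot was
    triggered by missing heartbeats, the entry is removed again so the SUP
    treats the reboot as unexpected.
    """
    table[module] = {"reboot": "expected"}
    if missing_heartbeat:
        table.pop(module, None)


def lost_connectivity_message(table: RebootInfoTable, module: str) -> str:
    """Return the SUP log text for a lost-midplane event on `module`."""
    if module in table:
        # Hypothetical wording; the expected-case text is not shown in this PR.
        return f"Expected: Module {module} lost midplane connectivity."
    # Wording taken from the "How to verify it" log sample.
    return f"Unexpected: Module {module} lost midplane connectivity."


if __name__ == "__main__":
    sup_table: RebootInfoTable = {}
    record_reboot(sup_table, "LINE-CARD0", missing_heartbeat=True)
    print(lost_connectivity_message(sup_table, "LINE-CARD0"))
```

Keying the decision on the presence of the entry keeps the SUP side simple: the linecard path only has to add or remove one entry, and the log wording follows from that.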

How to verify it
Simulate a missing-heartbeat reboot on the linecard, then verify the log messages on the SUP, as below:
Apr 25 19:50:19.286081 ixre-cpm-chassis7 WARNING pmon#chassisd: Module LINE-CARD0 went off-line!
Apr 25 19:50:22.549416 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 lost midplane connectivity.
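To automate this check, a hypothetical helper can confirm that, for a given module, the "went off-line!" line is later followed by the "Unexpected: ... lost midplane connectivity." line. It matches only the message text shown in the two sample lines above and ignores the syslog prefix (timestamp, host, severity); it is an illustration, not a SONiC test utility.

```python
def saw_unexpected_loss(lines, module):
    """Return True if 'went off-line!' for `module` is followed later by the
    'Unexpected: ... lost midplane connectivity.' message for the same module."""
    offline = f"chassisd: Module {module} went off-line!"
    unexpected = f"chassisd: Unexpected: Module {module} lost midplane connectivity."
    seen_offline = False
    for line in lines:
        if offline in line:
            seen_offline = True
        elif seen_offline and unexpected in line:
            return True
    return False


if __name__ == "__main__":
    log = [
        "Apr 25 19:50:19.286081 ixre-cpm-chassis7 WARNING pmon#chassisd: Module LINE-CARD0 went off-line!",
        "Apr 25 19:50:22.549416 ixre-cpm-chassis7 WARNING pmon#chassisd: Unexpected: Module LINE-CARD0 lost midplane connectivity.",
    ]
    print(saw_unexpected_loss(log, "LINE-CARD0"))
```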


Signed-off-by: mlok <marty.lok@nokia.com>
rlhui pushed a commit that referenced this issue May 31, 2024
… for Nokia-IXR7250E platform (#18862)

This PR adds the platform-specific linecard_reboot_timeout value to platform_env.conf. It works together with PRs sonic-net/sonic-platform-daemons#480 and sonic-net/sonic-utilities#3292 to address issue #18540.
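As a sketch of how a platform-specific value like linecard_reboot_timeout might be consumed from the platform configuration file, the snippet below parses simple key=value lines and falls back to a default. The key=value format, the 240-second default, and the helper name are assumptions for illustration; the actual parsing, default, and use of this value in sonic-platform-daemons#480 may differ.

```python
# Hypothetical default in seconds; the real default used by chassisd may differ.
DEFAULT_LINECARD_REBOOT_TIMEOUT = 240


def read_linecard_reboot_timeout(text: str, default: int = DEFAULT_LINECARD_REBOOT_TIMEOUT) -> int:
    """Parse `linecard_reboot_timeout=<seconds>` from key=value config text.

    Blank lines, '#' comments, and lines without '=' are skipped; a missing
    or non-integer value falls back to `default`.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "linecard_reboot_timeout":
            try:
                return int(value.strip())
            except ValueError:
                return default
    return default


if __name__ == "__main__":
    print(read_linecard_reboot_timeout("linecard_reboot_timeout=300\n"))
```

Falling back to a default keeps platforms that do not set the key working unchanged, which matches the idea of the value being an optional per-platform override.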

Signed-off-by: mlok <marty.lok@nokia.com>
mlok-nokia added a commit to mlok-nokia/sonic-buildimage that referenced this issue Jun 5, 2024
…oot to allow SUP to log expected/unexpected midplane/module connectivity msg (sonic-net#18805)
mlok-nokia added a commit to mlok-nokia/sonic-buildimage that referenced this issue Jun 5, 2024
… for Nokia-IXR7250E platform (sonic-net#18862)
arun1355492 pushed a commit to arun1355492/sonic-buildimage that referenced this issue Jul 26, 2024
… for Nokia-IXR7250E platform (sonic-net#18862)