Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox] Fix issue: watchdogutil command does not work #16091

Merged

Conversation

Junchao-Mellanox
Copy link
Collaborator

@Junchao-Mellanox Junchao-Mellanox commented Aug 10, 2023

Why I did it

watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog
Work item tracking
  • Microsoft ADO (number only): 24924064

How I did it

Use sysfs to query watchdog status

How to verify it

Manual test
Unit test

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox
Copy link
Collaborator Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Junchao-Mellanox
Copy link
Collaborator Author

Hi @liat-grozovik , could you please help merge this? Thanks!

@liat-grozovik liat-grozovik merged commit 95f317a into sonic-net:master Aug 23, 2023
11 checks passed
@liat-grozovik
Copy link
Collaborator

@Junchao-Mellanox i see there is a conflict with 202211. what is the PR for 202211? can you please refer it ?

@Junchao-Mellanox
Copy link
Collaborator Author

PR for 202211: #16242

@mssonicbld
Copy link
Collaborator

@Junchao-Mellanox PR conflicts with 202205 branch

@yxieca
Copy link
Contributor

yxieca commented Aug 23, 2023

@Junchao-Mellanox there is conflict. please help create 202205 PR. Thanks!

Junchao-Mellanox added a commit to Junchao-Mellanox/sonic-buildimage that referenced this pull request Aug 24, 2023
…6091)

- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
@Junchao-Mellanox
Copy link
Collaborator Author

PR for 202205: #16260

yxieca pushed a commit that referenced this pull request Aug 26, 2023
…6260)

- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
keboliu added a commit to keboliu/sonic-buildimage that referenced this pull request Aug 28, 2023
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 3, 2023
…6091)

- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202305: #16407

mssonicbld pushed a commit that referenced this pull request Sep 3, 2023
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…6091)

- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants