[DellEMC] Watchdog support DellEMCS6100 #3187

Closed
wants to merge 1 commit into from

paavaanan
Contributor

- What I did

  • Added watchdog support for DellEMCS6100 platform

- How I did it

  • Added Intel iTCO_wdt driver support.
  • Enabled watchdog daemon support to monitor the watchdog node.
  • Added platform API support to enable/disable the watchdog.

- How to verify it

  • Attached test.py script to test watchdog

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

watchdot-test-script.zip

sudo sed -i 's/run_watchdog=1/run_watchdog=0/' $FILESYSTEM_ROOT/etc/default/watchdog
sudo rm -rf $FILESYSTEM_ROOT/lib/systemd/system/wd_keepalive.service
sudo rm -rf $FILESYSTEM_ROOT/etc/init.d/wd_keepalive

Contributor Author

@paavaanan paavaanan Jul 19, 2019

  • The watchdog device is enabled in watchdog.conf here.
  • The watchdog itself is started only on demand, through the sonic_platform API.
  • wd_keepalive support is removed.

# Enable watchdog with nowayout
rmmod iTCO_wdt
modprobe iTCO_wdt nowayout=1

Contributor Author

  • Added nowayout support.
  • So once the watchdog is started, it cannot be stopped.
When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).  This is not always such a
good idea, since if there is a bug in the watchdog daemon and it
crashes the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed. Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt
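The close semantics quoted above can be illustrated with a short Python sketch. It assumes the generic Linux /dev/watchdog behaviour described in the linked kernel document: any write counts as a keepalive, and a `V` written just before close requests a disarm on drivers that support Magic Close. The helper name `kick_then_close` and its parameters are hypothetical and are not part of this PR.

```python
import os


def kick_then_close(dev_path="/dev/watchdog", kicks=1, magic_close=True):
    """Open a watchdog node, send keepalives, then close it.

    Hypothetical helper for illustration only; not code from this PR.
    Opening the real device starts the watchdog, so only point this at
    /dev/watchdog on a box that is allowed to reboot.
    """
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        for _ in range(kicks):
            # Any write to the device counts as a keepalive ("kick").
            os.write(fd, b"\0")
        if magic_close:
            # Magic Close: 'V' right before close asks the driver to
            # disarm. With nowayout=1 the driver is expected to ignore
            # this and keep the timer running.
            os.write(fd, b"V")
    finally:
        os.close(fd)
```

With `magic_close=False` the sketch models a daemon that exits without the magic character; with nowayout enabled, both paths should leave the timer armed, which is the behaviour this PR relies on.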

Contributor

The platform API provides a function to "disarm" the watchdog. I'm not sure how frequently (or even if) this will be called. However, with "nowayout" enabled, it appears that there is no way to disable the watchdog after it has started. Is this correct?

Contributor Author

@paavaanan paavaanan Aug 2, 2019

  • Yes. The watchdog cannot be stopped once it is enabled with the "nowayout" option.
  • nowayout is needed because, if the user-space watchdog daemon crashes and happens to close the /dev/watchdog node properly, the watchdog may never kick in.
  • The nowayout option is used to rule out this (slight) possibility.

@lguohan lguohan requested a review from jleveque July 30, 2019 22:35
self.write_config(
    self.WATCHDOG_DEFAULT_FILE,
    "run_watchdog=1",
    "run_watchdog=0")
Contributor

This will disable the watchdog at next boot. However, this function is meant to disable the watchdog at runtime, in the event there may ever be a need. Is this not possible because of the "nowayout" feature enabled above?
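The call under review swaps `run_watchdog=1` for `run_watchdog=0` in the defaults file, which is why it only takes effect at the next boot. The body of the PR's `write_config` method is not shown in this thread, so the following is only a minimal sketch of what such a helper might look like; the standalone function and its return value are assumptions.

```python
def write_config(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`.

    Hypothetical stand-in for the PR's write_config method, whose body
    is not shown in this thread. Returns True if the file was changed.
    """
    with open(path, "r") as f:
        content = f.read()
    updated = content.replace(old, new)
    if updated == content:
        return False  # pattern not present, leave the file untouched
    with open(path, "w") as f:
        f.write(updated)
    return True
```

Editing the file this way changes what the watchdog service does on its next start; it does not touch a timer that is already running.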

Contributor Author

Yes, you are right. The trade-off is that with nowayout there is no way to stop the watchdog (even with the magic value).
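For reference, a runtime disarm on Linux is normally requested through the `WDIOC_SETOPTIONS` ioctl with `WDIOS_DISABLECARD`, both defined in `linux/watchdog.h`. The sketch below is not the PR's `disarm()` implementation; the helper names are hypothetical, and the ioctl number is computed assuming the generic (x86) ioctl encoding. With nowayout enabled, the request is expected to be refused, which matches the trade-off described above.

```python
import fcntl
import struct

# Direction bit for _IOR in the generic asm-generic/ioctl.h layout
# (assumption: x86; a few architectures use a different encoding).
_IOC_READ = 2


def _ior(type_char, nr, size):
    """Build an _IOR ioctl request number for the generic layout."""
    return (_IOC_READ << 30) | (size << 16) | (ord(type_char) << 8) | nr


# linux/watchdog.h: WDIOC_SETOPTIONS is _IOR('W', 4, int).
WDIOC_SETOPTIONS = _ior("W", 4, struct.calcsize("i"))
WDIOS_DISABLECARD = 0x0001  # option flag: turn the watchdog timer off


def try_disarm(fd):
    """Ask the driver to stop the watchdog on an already-open fd.

    Hypothetical helper for illustration. Returns True if the driver
    accepted the request, False if it refused or does not support it
    (the expected outcome when nowayout is set).
    """
    try:
        fcntl.ioctl(fd, WDIOC_SETOPTIONS, struct.pack("i", WDIOS_DISABLECARD))
        return True
    except OSError:
        return False
```

A platform API could surface the `False` result to callers instead of reporting a successful disarm, so that the nowayout behaviour is visible at runtime.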

@jleveque jleveque requested a review from lguohan August 6, 2019 22:19
Contributor

jleveque commented Aug 6, 2019

@lguohan: Does this approach look good to you? How do you feel about the "nowayout" feature?

@paavaanan paavaanan closed this May 10, 2020
@paavaanan paavaanan deleted the watchdog-support-z9100 branch May 10, 2020 06:10