Modified reboot scripts to sync FSIO reads/writes to disk before OS-level reboot #3171

Merged
merged 5 commits into from
May 21, 2024

@assrinivasan assrinivasan commented Feb 20, 2024

Added code to sync FS I/O reads/writes just before reboot

What I did

soft/cold reboot: The scripts now stop the PMON container so that a SIGTERM is sent to the Storage Monitoring Daemon before the OS-level reboot. The daemon catches this signal and syncs the total and latest FSIO reads and writes from STATE_DB to a location on disk, where the daemon picks them up again after the reboot.

fast-reboot: Added STORAGE_INFO to the STATE_DB keys whose underlying information is saved.

SONiC Storage Monitoring Daemon HLD
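
The SIGTERM path described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not stormond's actual code: the function names (`build_snapshot`, `write_snapshot`, `install_sigterm_sync`) and the `read_state` callable are hypothetical, the STATE_DB contents are passed in as plain dictionaries instead of being read from Redis, and the JSON layout simply mirrors the `fsio-rw-stats.json` contents shown in the verification logs.

```python
import json
import os
import signal
import sys
import tempfile


def build_snapshot(storage_info, sync_time):
    """Combine per-disk FSIO counters and the sync timestamp into one dict.

    storage_info: {"sda": {"total_fsio_reads": "1000", ...}, ...}
    sync_time:    value of successful_sync_time, e.g. "1715714436"
    """
    snapshot = {disk: dict(fields) for disk, fields in storage_info.items()}
    snapshot["successful_sync_time"] = sync_time
    return snapshot


def write_snapshot(path, snapshot):
    """Write the snapshot to a temp file, fsync it, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(snapshot, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the data to disk before the reboot
    os.replace(tmp_path, path)  # atomic rename over the previous file


def install_sigterm_sync(path, read_state):
    """Register a SIGTERM handler that persists the latest counters, then exits.

    read_state is a hypothetical callable returning (storage_info, sync_time).
    """
    def handler(signum, frame):
        storage_info, sync_time = read_state()
        write_snapshot(path, build_snapshot(storage_info, sync_time))
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    return handler
```

Writing to a temporary file and renaming it is one way to avoid leaving a half-written JSON file if the process is killed mid-write; whether the real daemon does this is not stated in this PR.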

How I did it

soft/cold-reboot: the script first stops the pmon container before attempting to kill it.
fast-reboot: added code to save the STORAGE_INFO information from STATE_DB.

How to verify it

  1. Flash an image with this change (and the stormond daemon addition) onto the switch.
  2. Navigate to /host/pmon/stormond and cat the fsio-rw-stats.json file to see the current values.
  3. Get the values from STATE_DB to verify the latest FSIO RW values.
  4. Run cold/soft/warm-reboot.
  5. Cat the same file after regaining SSH access; it should show the read and write values from step 3, synced from the database.
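
The comparison in steps 3 and 5 can also be scripted. The following is a minimal Python sketch, not part of this PR: `find_mismatches` is a hypothetical helper, and it assumes the STATE_DB values were captured before the reboot as a dictionary keyed by the `STORAGE_INFO|...` key names used in the logs below.

```python
import json

SYNC_KEY = "STORAGE_INFO|FSSTATS_SYNC"


def find_mismatches(file_text, state_db):
    """Return (name, expected, actual) tuples where the file differs from STATE_DB.

    file_text: contents of fsio-rw-stats.json read after the reboot
    state_db:  {"STORAGE_INFO|sda": {...}, "STORAGE_INFO|FSSTATS_SYNC": {...}}
               captured before the reboot (step 3)
    """
    on_disk = json.loads(file_text)
    mismatches = []
    for key, fields in state_db.items():
        if key == SYNC_KEY:
            # The sync timestamp is stored at the top level of the JSON file.
            expected = fields.get("successful_sync_time")
            actual = on_disk.get("successful_sync_time")
            if actual != expected:
                mismatches.append(("successful_sync_time", expected, actual))
            continue
        table, _, disk = key.partition("|")
        if table != "STORAGE_INFO":
            continue
        # Per-disk counters are stored under the disk name, e.g. "sda".
        for field, expected in fields.items():
            actual = on_disk.get(disk, {}).get(field)
            if actual != expected:
                mismatches.append((f"{disk}.{field}", expected, actual))
    return mismatches
```

An empty list means every captured field was found in the file with the same value. The values need to be captured before rebooting, because the cold reboot log below shows the `STORAGE_INFO` keys are empty in STATE_DB afterwards.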

COLD REBOOT

root@str-s6100-acs-1:~# cat /host/pmon/stormond/fsio-rw-stats.json
{}
root@str-s6100-acs-1:~#
root@str-s6100-acs-1:~# date +%s
1715714436
root@str-s6100-acs-1:~# redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_reads 1000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_writes 2000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_reads 10 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_writes 20 ; redis-cli -n 6 HSET "STORAGE_INFO|FSSTATS_SYNC" successful_sync_time 1715714436
(integer) 1
(integer) 1
(integer) 1
(integer) 1
(integer) 1
root@str-s6100-acs-1:~#
root@str-s6100-acs-1:~# cat /host/pmon/stormond/fsio-rw-stats.json
{}
root@str-s6100-acs-1:~#
root@str-s6100-acs-1:~# redis-cli -n 6 HGETALL "STORAGE_INFO|sda"
1) "total_fsio_reads"
2) "1000"
3) "total_fsio_writes"
4) "2000"
5) "latest_fsio_reads"
6) "10"
7) "latest_fsio_writes"
8) "20"
root@str-s6100-acs-1:~# redis-cli -n 6 HGETALL "STORAGE_INFO|FSSTATS_SYNC"
1) "successful_sync_time"
2) "1715714436"
root@str-s6100-acs-1:~#
root@str-s6100-acs-1:~# sudo reboot
Tue May 14 07:24:26 PM UTC 2024 Syncing FS I/O reads and writes to disk
Tue May 14 07:24:27 PM UTC 2024 fsio-rw-sync returned 0
/var/log: 3.8 GiB (4100276224 bytes) trimmed on /dev/loop1
/host: 8.7 GiB (9393045504 bytes) trimmed on /dev/sda4
Tue May 14 07:24:37 PM UTC 2024 Issuing OS-level reboot ...
root@str-s6100-acs-1:~#
.
.

Last login: Tue May 14 19:17:10 2024 from 10.1.84.40
admin@str-s6100-acs-1:~$
admin@str-s6100-acs-1:~$ cat /host/pmon/stormond/fsio-rw-stats.json
{"sda": {"total_fsio_reads": "1000", "total_fsio_writes": "2000", "latest_fsio_reads": "10", "latest_fsio_writes": "20"}, "successful_sync_time": "1715714436"}admin@str-s6100-acs-1:~$
admin@str-s6100-acs-1:~$ redis-cli -n 6 HGETALL "STORAGE_INFO|sda"
(empty array)
admin@str-s6100-acs-1:~$ redis-cli -n 6 HGETALL "STORAGE_INFO|FSSTATS_SYNC"
(empty array)
admin@str-s6100-acs-1:~$

SOFT REBOOT

root@str-s6100-acs-1:~#  echo {} > /host/pmon/stormond/fsio-rw-stats.json
admin@str-s6100-acs-1:~$ cat /host/pmon/stormond/fsio-rw-stats.json
{}
admin@str-s6100-acs-1:~$ redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_reads 5000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_writes 10000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_read                          s 120 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_writes 240 ; redis-cli -n 6 HSET "STORAGE_INFO|FSSTATS_SYNC" successful_sync_time 1715714436
(integer) 1
(integer) 1
(error) ERR wrong number of arguments for 'hset' command
(integer) 1
(integer) 1
admin@str-s6100-acs-1:~$ redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_reads 5000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_writes 10000 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_reads 120 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_writes 240 ; redis-cli -n 6 HSET "STORAGE_INFO|FSSTATS_SYNC" successful_sync_time 1715714436
(integer) 0
(integer) 0
(integer) 1
(integer) 0
(integer) 0
admin@str-s6100-acs-1:~$
admin@str-s6100-acs-1:~$ redis-cli -n 6 HGETALL "STORAGE_INFO|sda"
1) "total_fsio_reads"
2) "5000"
3) "total_fsio_writes"
4) "10000"
5) "latest_fsio_writes"
6) "240"
7) "latest_fsio_reads"
8) "120"
admin@str-s6100-acs-1:~$ redis-cli -n 6 HGETALL "STORAGE_INFO|FSSTATS_SYNC"
1) "successful_sync_time"
2) "1715714436"

admin@str-s6100-acs-1:~$ sudo soft-reboot
requested COLD shutdown
Tue May 14 07:32:00 PM UTC 2024 Syncing FS I/O reads and writes to disk
Tue May 14 07:32:01 PM UTC 2024 fsio-rw-sync returned 0
Watchdog armed for 180 seconds
client_loop: send disconnect: Broken pipe
assrinivasan@assrinivasan-dev-vm-1:/data/sonic$ ash str-s6100-acs-1
.
.
Last login: Tue May 14 19:29:20 2024 from 10.1.84.40
admin@str-s6100-acs-1:~$ cat /host/pmon/stormond/fsio-rw-stats.json
{"sda": {"total_fsio_reads": "5000", "total_fsio_writes": "10000", "latest_fsio_reads": "120", "latest_fsio_writes": "240"}, "successful_sync_time": "1715714436"}admin@str-s6100-acs-1:~$
admin@str-s6100-acs-1:~$

FAST/WARM REBOOT

root@str-s6100-acs-1:~# echo {} > /host/pmon/stormond/fsio-rw-stats.json
root@str-s6100-acs-1:~# redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_reads 1234 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" total_fsio_writes 22312 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_reads 654 ; redis-cli -n 6 HSET "STORAGE_INFO|sda" latest_fsio_writes 311 ; redis-cli -n 6 HSET "STORAGE_INFO|FSSTATS_SYNC" successful_sync_time 1715715298
(integer) 1
(integer) 1
(integer) 1
(integer) 1
(integer) 1
root@str-s6100-acs-1:~#
root@str-s6100-acs-1:~# warm-reboot
Tue May 14 09:13:06 PM UTC 2024 Syncing FS I/O reads and writes to disk
Tue May 14 09:13:07 PM UTC 2024 fsio-rw-sync returned 0
ERROR: There are port channels/peer devices that failed the probe: ['PortChannel102', 'PortChannel101', 'PortChannel104', 'PortChannel103']
/usr/local/lib/python3.11/dist-packages/scapy/layers/ipsec.py:471: CryptographyDeprecationWarning: Blowfish has been deprecated
  cipher=algorithms.Blowfish,
/usr/local/lib/python3.11/dist-packages/scapy/layers/ipsec.py:485: CryptographyDeprecationWarning: CAST5 has been deprecated
  cipher=algorithms.CAST5,
Successfully copied 7.43MB to /host/warmboot
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
Watchdog armed for 180 seconds
0
client_loop: send disconnect: Broken pipe
assrinivasan@assrinivasan-dev-vm-1:/data/sonic$
assrinivasan@assrinivasan-dev-vm-1:/data/sonic$ ash str-s6100-acs-1
10.3.146.16
Warning: Permanently added '10.3.146.16' (RSA) to the list of known hosts.
Debian GNU/Linux 12 \n \l
.
.
admin@str-s6100-acs-1:~$  cat /host/pmon/stormond/fsio-rw-stats.json
{"sda": {"total_fsio_reads": "1234", "total_fsio_writes": "22312", "latest_fsio_reads": "654", "latest_fsio_writes": "311"}, "successful_sync_time": "1715715298"}admin@str-s6100-acs-1:~$
admin@str-s6100-acs-1:~$

Previous command output (if the output of a command-line utility has changed)

N/A

New command output (if the output of a command-line utility has changed)

N/A

…d ssdutil import to match corresponding change in sonic-platform-common

xumia commented Feb 22, 2024

/azp run Azure.sonic-utilities

Azure Pipelines successfully started running 1 pipeline(s).

@assrinivasan assrinivasan marked this pull request as ready for review April 18, 2024 20:56

@assrinivasan I thought we discussed saving the state in STATE_DB, since STATE_DB fields are saved during warm-reboot?

Hi Prince, I added the script here too because we don't save all the tables, just the following:

WARM_RESTART_TABLE
MIRROR_SESSION_TABLE
FG_ROUTE_TABLE
WARM_RESTART_ENABLE_TABLE
VXLAN_TUNNEL_TABLE
BUFFER_MAX_PARAM_TABLE
FAST_RESTART_ENABLE_TABLE

All the other keys are deleted in the backup_database function:

function backup_database()

So if we don't sync the STORAGE_INFO fields, we would lose them. Depending on when warm-reboot is called, the previously synced values might be very outdated. That is why I added the sync script here.
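
The effect of that table list can be illustrated with a minimal Python sketch. This is not the `backup_database` shell function itself: `split_keys` is a hypothetical helper, it assumes keys have the `TABLE|name` form seen in the `STORAGE_INFO|sda` keys above, and `WARM_RESTART_TABLE|example` is a made-up key used only for illustration.

```python
# Tables listed in this thread as preserved across warm-reboot.
PRESERVED_TABLES = (
    "WARM_RESTART_TABLE",
    "MIRROR_SESSION_TABLE",
    "FG_ROUTE_TABLE",
    "WARM_RESTART_ENABLE_TABLE",
    "VXLAN_TUNNEL_TABLE",
    "BUFFER_MAX_PARAM_TABLE",
    "FAST_RESTART_ENABLE_TABLE",
)


def split_keys(keys, preserved_tables, separator="|"):
    """Partition STATE_DB keys into (kept, dropped) by their table name."""
    kept, dropped = [], []
    for key in keys:
        table = key.split(separator, 1)[0]
        if table in preserved_tables:
            kept.append(key)
        else:
            dropped.append(key)
    return kept, dropped


keys = [
    "STORAGE_INFO|sda",
    "STORAGE_INFO|FSSTATS_SYNC",
    "WARM_RESTART_TABLE|example",  # hypothetical key, for illustration only
]

# Without STORAGE_INFO in the list, both storage keys end up in `dropped`.
kept, dropped = split_keys(keys, PRESERVED_TABLES)

# With STORAGE_INFO added to the list, all three keys are kept.
kept_after, dropped_after = split_keys(keys, PRESERVED_TABLES + ("STORAGE_INFO",))
```

This is why the counters have to be either added to the preserved set or written to disk before the database backup runs.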

scripts/reboot Outdated
@prgeor prgeor May 14, 2024

@assrinivasan for cold reboot, why not use the SIGTERM to save in the stormond deinit path itself?

stormond currently does this, so I can certainly remove the sync script invocation here. I only kept it here as it is a good fail-safe to have and it doesn't hurt anything.

@@ -81,6 +81,17 @@ function stop_sonic_services()
stop_pmon_service
}

function sync_fsio_rw()

@assrinivasan same comment as fast-reboot case.

@assrinivasan assrinivasan changed the title from "Added a script to sync FS I/O reads/writes just before OS-level reboot" to "Added a script invocation to sync FSIO reads/writes just before OS-level reboot" May 14, 2024
@assrinivasan assrinivasan changed the title from "Added a script invocation to sync FSIO reads/writes just before OS-level reboot" to "Modified reboot scripts to sync FSIO reads/writes to disk before OS-level reboot" May 20, 2024
@prgeor prgeor merged commit 10e5341 into sonic-net:master May 21, 2024
7 checks passed
ryanzhu706 added a commit to ryanzhu706/sonic-utilities that referenced this pull request May 31, 2024
…evel reboot (sonic-net#3171)

* Added a script to sync FS I/O reads/writes just before reboot; renamed ssdutil import to match corresponding change in sonic-platform-common

* Added FSIO RW sync to all reboot scripts

* Reverted changes to setup.py and ssdutil

* Standardized invocation point of the FSIO sync script in all 3 scripts

* Modified code such that FSIO sync is initiated from stormon daemon.
arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…evel reboot (sonic-net#3171)

* Added a script to sync FS I/O reads/writes just before reboot; renamed ssdutil import to match corresponding change in sonic-platform-common

* Added FSIO RW sync to all reboot scripts

* Reverted changes to setup.py and ssdutil

* Standardized invocation point of the FSIO sync script in all 3 scripts

* Modified code such that FSIO sync is initiated from stormon daemon.