Check platform reboot cause to see if any reset happened during fast/warm-reboot #7920

Closed
wants to merge 8 commits into from
19 changes: 19 additions & 0 deletions files/build_templates/docker_image_ctl.j2
@@ -95,6 +95,23 @@ function preStartAction()
fi
{%- elif docker_container_name == "snmp" %}
$SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s)
{%- elif docker_container_name == "swss" %}
if [[ "$BOOT_TYPE" == "fast" ]] && [[ -d /host/fast-reboot ]]; then
if [[ -f /host/reboot-cause/previous-reboot-cause.json ]]; then
REG_BOOT_TYPE="fast*"
CAUSE_NO_AVAIL="\"N/A\""
Contributor

Can you use spaces instead of tabs?

REBOOT_CAUSE="$(cat /host/reboot-cause/previous-reboot-cause.json | jq '.cause')"
EXTRA_CAUSE="$(cat /host/reboot-cause/previous-reboot-cause.json | jq '.comment')"

# Clear the FAST_REBOOT|system db setting before starting swss if EXTRA_CAUSE (the "comment" field) is not "N/A"
if [[ $REBOOT_CAUSE =~ $REG_BOOT_TYPE ]]; then
if [[ "${EXTRA_CAUSE}" != "${CAUSE_NO_AVAIL}" ]]; then
# Delete the FAST_REBOOT|system db setting
$SONIC_DB_CLI STATE_DB DEL "FAST_REBOOT|system" &>/dev/null
Contributor

How can you guarantee that syncd did not already take the fast-reboot path? syncd could have read this value before swss deletes it.

fi
fi
fi
fi
{%- else %}
: # nothing
{%- endif %}
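The pre-start check above can be read as a small decision function. What follows is a minimal Python sketch of that logic, not the code in this PR: `should_clear_fast_reboot_flag` is a hypothetical name and the sample JSON values in the usage note are made up for illustration. Only the `cause` and `comment` fields come from the `jq` calls in the diff. Note that bash treats `fast*` on the right-hand side of `=~` as a regular expression (`fas` followed by zero or more `t`), so the sketch mirrors that with `re.search`.

```python
import json
import re

# Mirrors REG_BOOT_TYPE="fast*" from the template, interpreted as a regex.
FAST_BOOT_PATTERN = re.compile(r"fast*")
# Mirrors CAUSE_NO_AVAIL; Python's json module strips the surrounding quotes.
CAUSE_NOT_AVAILABLE = "N/A"


def should_clear_fast_reboot_flag(reboot_cause_json: str) -> bool:
    """Return True when the recorded cause is a fast reboot with extra cause info.

    Hypothetical helper: it only models the two nested checks in the shell
    snippet and does not touch any database.
    """
    record = json.loads(reboot_cause_json)
    cause = record.get("cause")
    comment = record.get("comment")

    # A missing or non-string cause cannot match the fast-reboot pattern.
    if not isinstance(cause, str) or FAST_BOOT_PATTERN.search(cause) is None:
        return False

    # Like the shell comparison, anything other than "N/A" (including a
    # missing field) counts as extra reboot information.
    return comment != CAUSE_NOT_AVAILABLE
```

With made-up inputs, `should_clear_fast_reboot_flag('{"cause": "fast-reboot", "comment": "Watchdog"}')` is true, while the same cause with `"comment": "N/A"` is false. The sketch says nothing about ordering between containers, which is the concern raised in the review comment.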
@@ -154,6 +171,7 @@ function postStartAction()
fi
fi

# Set the db setting to indicate that the device performed a fast reboot when BOOT_TYPE is "fast"
if [[ "$BOOT_TYPE" == "fast" ]]; then
# set the key to expire in 3 minutes
$SONIC_DB_CLI STATE_DB SET "FAST_REBOOT|system" "1" "EX" "180"
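The `EX 180` argument gives the key a 180-second time to live, so any reader that looks up `FAST_REBOOT|system` inside that window still sees the flag, and any reader that arrives later does not. A rough in-memory Python illustration of that expiry behavior follows. `ExpiringFlagStore` and its injected clock are hypothetical stand-ins for illustration; they are not the database client used in the template.

```python
class ExpiringFlagStore:
    """Tiny in-memory model of a key-value store with per-key expiry.

    Hypothetical illustration only: the clock is injected so the expiry
    can be exercised without waiting for real time to pass.
    """

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # key -> (value, absolute expiry time)

    def set(self, key, value, ex):
        """Store value under key and expire it `ex` seconds from now."""
        self._entries[key] = (value, self._clock() + ex)

    def get(self, key):
        """Return the value, or None if the key is absent or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value


# Usage with a fake clock that the caller advances by hand.
now = [0.0]
store = ExpiringFlagStore(lambda: now[0])
store.set("FAST_REBOOT|system", "1", ex=180)
flag_inside_window = store.get("FAST_REBOOT|system")
now[0] = 180.0
flag_after_window = store.get("FAST_REBOOT|system")
```

In this model the first read returns the flag and the second read, taken once the 180 seconds have elapsed, returns nothing.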
@@ -208,6 +226,7 @@ start() {
# Obtain boot type from kernel arguments
BOOT_TYPE=`getBootType`


# Obtain our platform as we will mount directories with these names in each docker
PLATFORM=${PLATFORM:-`$SONIC_CFGGEN -H -v DEVICE_METADATA.localhost.platform`}

23 changes: 19 additions & 4 deletions src/sonic-host-services/scripts/determine-reboot-cause
@@ -184,15 +184,30 @@ def main():
     # Check if the previous reboot was warm/fast reboot by testing whether there is "fast|fastfast|warm" in /proc/cmdline
     proc_cmdline_reboot_cause = find_proc_cmdline_reboot_cause()

+    # Check if the previous reboot was caused by hardware
+    hardware_reboot_cause = find_hardware_reboot_cause()
+
+    # Check if the previous reboot was caused by software, get the reboot cause from REBOOT_CAUSE_FILE
+    software_reboot_cause = find_software_reboot_cause()
+
     # If /proc/cmdline does not indicate reboot cause, check if the previous reboot was caused by hardware
     if proc_cmdline_reboot_cause is None:
-        previous_reboot_cause = find_hardware_reboot_cause()
+        previous_reboot_cause = hardware_reboot_cause
         if previous_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
-            # If the reboot cause is non-hardware, get the reboot cause from REBOOT_CAUSE_FILE
-            previous_reboot_cause = find_software_reboot_cause()
+            # If the reboot cause is non-hardware, set the previous reboot cause with software_reboot_cause
+            previous_reboot_cause = software_reboot_cause
+        else:
+            # Check if any software reboot was issued before this hardware reboot happened
+            if software_reboot_cause is not REBOOT_CAUSE_UNKNOWN:
+                additional_reboot_info = software_reboot_cause
+
     else:
         # Get the reboot cause from REBOOT_CAUSE_FILE
-        previous_reboot_cause = find_software_reboot_cause()
+        previous_reboot_cause = software_reboot_cause
+        # Check if there is any hardware reboot or reset
+        if not hardware_reboot_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
+            # Add the hardware_reboot_cause into additional_reboot_info
+            additional_reboot_info = hardware_reboot_cause

     # Current time
     reboot_cause_gen_time = str(datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))
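The selection logic after this change can be summarized as a pure function of the three detected causes. The sketch below is a self-contained approximation under stated assumptions: `resolve_reboot_cause` is a hypothetical name, and the values given to `REBOOT_CAUSE_NON_HARDWARE`, `REBOOT_CAUSE_UNKNOWN`, and the `"N/A"` default are placeholders chosen for illustration, not taken from the script. The sketch compares strings with `!=`; an identity test such as `is not` only behaves the same way when both names refer to the same string object.

```python
# Placeholder values, assumed for illustration only.
REBOOT_CAUSE_NON_HARDWARE = "Non-Hardware"
REBOOT_CAUSE_UNKNOWN = "Unknown"
NO_ADDITIONAL_INFO = "N/A"


def resolve_reboot_cause(proc_cmdline_cause, hardware_cause, software_cause):
    """Return (previous_reboot_cause, additional_reboot_info).

    Hypothetical helper that mirrors the branch structure in the diff:
    the kernel command line decides which source is primary, and the
    other source is kept as additional information when it is meaningful.
    """
    additional_info = NO_ADDITIONAL_INFO

    if proc_cmdline_cause is None:
        # Not a fast/warm reboot: hardware is primary unless it reports nothing.
        if hardware_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
            previous = software_cause
        else:
            previous = hardware_cause
            # Keep a software reboot that was issued before the hardware event.
            if software_cause != REBOOT_CAUSE_UNKNOWN:
                additional_info = software_cause
    else:
        # Fast/warm reboot: software is primary, hardware resets are extra info.
        previous = software_cause
        if not hardware_cause.startswith(REBOOT_CAUSE_NON_HARDWARE):
            additional_info = hardware_cause

    return previous, additional_info
```

For example, with made-up inputs, a fast reboot during which the platform also reports a watchdog reset yields `("fast-reboot", "Watchdog")`, so the hardware event is preserved alongside the software cause.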