Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check platform reboot cause to see if any reset happened during fast/warm-reboot #7920

Closed
wants to merge 8 commits into from
14 changes: 14 additions & 0 deletions files/build_templates/docker_image_ctl.j2
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,17 @@ function getMountPoint()
echo $1 | python -c "import sys, json, os; mnts = [x for x in json.load(sys.stdin)[0]['Mounts'] if x['Destination'] == '/usr/share/sonic/hwsku']; print '' if len(mnts) == 0 else os.path.abspath(mnts[0]['Source'])" 2>/dev/null
}

function isPlatformReset()
{
output=$(python3 -c "import sonic_platform.platform; p = sonic_platform.platform.Platform(); c = p.get_chassis(); hw_rc_major, hw_rc_minor = c.get_reboot_cause(); print(hw_rc_major)" 2>/dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sujinmkang : On checking the logs double time, found out one of the process(database.sh) is mentioning it as cold reset despite fast-boot happened. I suspect this is because either determine-reboot-cause or process-reboot-cause function is not up until that time?
Will there be any impact if database is still considering the BOOT_TYPE as cold?

sonic database.sh[680]: BOOT_TYPE  we received cold
sonic lldp.sh[2314]: BOOT_TYPE  we received fast
sonic pmon.sh[2308]: BOOT_TYPE  we received fast
sonic-s6100-01 bgp.sh[2886]: BOOT_TYPE  we received fast
sonic-s6100-01 swss.sh[2887]: BOOT_TYPE  we received fast
sonic-s6100-01 teamd.sh[3655]: BOOT_TYPE  we received fast
sonic-s6100-01 snmp.sh[3658]: BOOT_TYPE  we received fast
sonic-s6100-01 radv.sh[3651]: BOOT_TYPE  we received fast
sonic-s6100-01 syncd.sh[3656]: BOOT_TYPE  we received fast
sonic-s6100-01 dhcp_relay.sh[4441]: BOOT_TYPE  we received fast

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@santhosh-kt we're not using determine-reboot-cause or process-reboot-cause for this determination but only using platform api. Probably does the database always consider the boot_type as cold even without this change? Since we're sending the same boot_type to all container starting script. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I misunderstood. Except database.sh script all other daemon scripts are called lately if I understood.

So all those scripts(except database.sh) could have easily interpreted the boot type since get_reboot_cause() is showing correct value as track_reboot_reason.sh script got executed before these scripts.

But for database.sh, this get_reboot_cause() is showing wrong value as it runs even before s6100_platform.sh itself.
My question is, will wrong BOOT_TYPE has any impact in database.sh script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@santhosh-kt can we detect the platform driver got initialized or not? Or should we check the exception from platform api call to see the platform driver initialization status? Any other suggestion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@santhosh-kt I checked the database container start code, I don't think the boot_type is be used for database container starts.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yxieca I don't think the database container is starting differently based on the boot-type. do you have any comments or concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database container start code has a special handling during warm reboot: https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/docker_image_ctl.j2#L87

echo "${output}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should echo more context, rather than just the reboot cause?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sujinmkang while testing in 201811 image, This line(66) causing changes in getBootType() function.
(ie) BOOT_TYPE = "reboot cause" value instead of "cold", "warm", "fastfast", etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this line (echo "${output}") resolved the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented about it yesterday but it said pending status.

if [[ "${output}" != "Non-Hardware" ]]; then
return 0
else
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here to explain this logic?

fi
}

function getBootType()
{
# same code snippet in files/scripts/syncd.sh
Expand All @@ -76,6 +87,9 @@ function getBootType()
*)
TYPE='cold'
esac
if isPlatformReset; then
TYPE='cold'
fi
echo "${TYPE}"
}

Expand Down