Warm reboot: restore the database docker with content saved #2216

Merged
merged 8 commits into from
Nov 2, 2018
40 changes: 36 additions & 4 deletions files/build_templates/docker_image_ctl.j2
@@ -2,19 +2,51 @@

function getMountPoint()
{
- echo $1 | python -c "import sys, json, os; mnts = [x for x in json.load(sys.stdin)[0]['Mounts'] if x['Destination'] == '/usr/share/sonic/hwsku']; print '' if len(mnts) == 0 else os.path.basename(mnts[0]['Source'])" 2>/dev/null
+ echo $1 | python -c "import sys, json, os; mnts = [x for x in json.load(sys.stdin)[0]['Mounts'] if x['Destination'] == '/usr/share/sonic/hwsku']; print '' if len(mnts) == 0 else os.path.basename(mnts[0]['Source'])" 2>/dev/null
}

function getRebootType()
Collaborator

@lguohan lguohan Nov 1, 2018

getBootType #Resolved

{
local TYPE
case "$(cat /proc/cmdline)" in
*fast-reboot*)
TYPE='fast'
;;
*warm-reboot*)
TYPE='warm'
;;
*)
TYPE="normal"
esac
echo $TYPE
}
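
The case statement above matches substrings of /proc/cmdline. A minimal sketch of the same matching, assuming a hypothetical helper `bootTypeFromCmdline` (not part of this PR) that takes the command line as an argument so it can be exercised without rebooting:

```shell
# Hypothetical helper, not from the PR: classify a kernel command line string.
# The patterns mirror the case statement in getRebootType.
bootTypeFromCmdline() {
    case "$1" in
        *fast-reboot*) echo fast ;;
        *warm-reboot*) echo warm ;;
        *)             echo normal ;;
    esac
}

# On a device, the input would be the real command line:
#   bootTypeFromCmdline "$(cat /proc/cmdline)"
```

Passing the string in keeps the matching logic separate from where the command line is read.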

function postStartAction()
{
REBOOT_TYPE=`getRebootType`
Collaborator

@lguohan lguohan Nov 1, 2018

BOOT_TYPE #Resolved

{%- if docker_container_name == "database" %}
until [[ $(/usr/bin/docker exec database redis-cli ping | grep -c PONG) -gt 0 ]]; do
sleep 1;
done
if [[ "$REBOOT_TYPE" == "warm" && -d /host/warmboot ]]; then
WARM_DIR=/host/warmboot
function redisLoadAndDelete()
{
Contributor

@yxieca yxieca Oct 31, 2018

This function needs to also take database ID as a parameter #Resolved
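
One way the request above could look, as a sketch only. The argument order, the `LOAD_CMD` hook, and the `-d <db>` option are assumptions: `-d` is taken from the redis-dump-load command line and should be checked against the installed `redis-load`, and `LOAD_CMD` exists only so the sketch can run without Redis.

```shell
# Hypothetical variant of redisLoadAndDelete: $1 = database ID, $2 = dump file.
# LOAD_CMD is an illustrative hook; it defaults to redis-load.
LOAD_CMD="${LOAD_CMD:-redis-load}"

redisLoadAndDelete() {
    local db_id="$1" filename="$2"
    # Nothing was saved for this database: not an error.
    [ -e "$filename" ] || return 0
    # "-d <db>" is an assumed option; verify it for the redis-load in use.
    "$LOAD_CMD" -d "$db_id" "$filename" && rm -f "$filename"
}

# Illustrative call; the ID value must come from the caller:
#   redisLoadAndDelete "$APPL_DB_ID" "$WARM_DIR/appl_db.json"
```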

FILENAME="$1"
test -e $FILENAME && redis-load -s /var/run/redis/redis.sock -e EMPTY $FILENAME && rm $FILENAME
Contributor

@yxieca yxieca Oct 31, 2018

A few issues from testing:

  • rm always fails in this function. You need to issue "sudo rm" to get it to work.
  • "-s /var/run/redis/redis.sock" always causes the import to fail. Removing this option works better.
  • The import fails randomly. I am still looking for a way to make it work reliably. This service is crucial, so it has to be reliable.
  • I think you shouldn't use the '&&' notation. We want to remove these files regardless of whether the import succeeded, right? I don't think we should retry warm boot if any failure was encountered. #Resolved
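
The last point can be sketched as follows: run the import, remember its exit status, and remove the file either way. The function name and the `LOAD_CMD` hook are hypothetical and not from the PR, and the sketch does not address the "sudo rm" permission issue from the first point.

```shell
# Illustrative hook so the sketch can run without Redis.
LOAD_CMD="${LOAD_CMD:-redis-load}"

# Hypothetical helper: import a dump file, then delete it regardless of outcome.
loadThenAlwaysDelete() {
    local filename="$1" rc=0
    [ -e "$filename" ] || return 0
    "$LOAD_CMD" "$filename" || rc=$?
    # Runs whether or not the import succeeded, so a later boot
    # does not pick up a stale dump.
    rm -f "$filename"
    return "$rc"
}
```

Compared with the `&&` chain, the import status is still reported to the caller, but the cleanup no longer depends on it.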

Contributor

@yxieca yxieca Oct 31, 2018

Just a thought:

Maybe we should catch these db restore failures and in case of failure, clear the database and continue with a regular boot up? #Resolved
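
A rough sketch of this fallback idea, under stated assumptions: `restoreOrColdStart` is a hypothetical helper, and the restore and flush steps are passed in as command names so the sketch does not depend on a running Redis. On a device, the flush step could be a wrapper around `docker exec database redis-cli FLUSHALL`.

```shell
# Hypothetical helper: $1 = command that restores the saved databases,
# $2 = command that clears the database after a failed restore.
# Prints the boot type the rest of the startup should assume.
restoreOrColdStart() {
    local restore_cmd="$1" flush_cmd="$2"
    if "$restore_cmd"; then
        echo warm
    else
        # Drop any partially restored content, then continue as a regular boot.
        "$flush_cmd" || return 1
        echo normal
    fi
}
```

This is the alternative to failing the service outright; which behavior is preferable is the open question in this thread.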

Collaborator Author

  • rm fixed.
  • redis-load fixed. If you see any more failure cases, let me know.
  • I cannot agree to retrying blindly. I made it exit immediately, and if an error occurs in the normal case we should fix it.

In reply to: 229778573
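
The exit-immediately behavior described here could be sketched like this. `restoreAllOrFail` and the `LOAD_CMD` hook are hypothetical names, not the code in the PR. The function stops at the first failed import and returns that status, which a caller can then pass on with `exit $?`.

```shell
# Illustrative hook so the sketch can run without Redis.
LOAD_CMD="${LOAD_CMD:-redis-load}"

# Hypothetical helper: import each existing dump file in order,
# stopping at the first failure instead of retrying.
restoreAllOrFail() {
    local f
    for f in "$@"; do
        [ -e "$f" ] || continue
        "$LOAD_CMD" "$f" || return $?
    done
    return 0
}
```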

Collaborator Author

I made it exit immediately, and if an error occurs in the normal case we should fix it.

In reply to: 229780996

Contributor

My concern is that if the database service fails in production, the device will be in a failed state while the ASIC is still forwarding. I am not sure this is better than coming up with a cold start and suffering a short I/O disruption.

}
# Load applDB from /host/warmboot/appl_db.json
redisLoadAndDelete $WARM_DIR/appl_db.json
Collaborator

@lguohan lguohan Nov 1, 2018

where is the DB argument? #Resolved

# Load configDB from /host/warmboot/config_db.json
redisLoadAndDelete $WARM_DIR/config_db.json
# Load stateDB from /host/warmboot/state_db.json
redisLoadAndDelete $WARM_DIR/state_db.json
# Load asicDB from /host/warmboot/asic_db.json
redisLoadAndDelete $WARM_DIR/asic_db.json
fi
{%- elif docker_container_name == "swss" %}
docker exec swss rm -f /ready # remove cruft
- if [[ -d /host/fast-reboot ]];
- then
+ if [[ "$REBOOT_TYPE" == "fast" && -d /host/fast-reboot ]]; then
test -e /host/fast-reboot/fdb.json && docker cp /host/fast-reboot/fdb.json swss:/
test -e /host/fast-reboot/arp.json && docker cp /host/fast-reboot/arp.json swss:/
test -e /host/fast-reboot/default_routes.json && docker cp /host/fast-reboot/default_routes.json swss:/
@@ -58,7 +90,7 @@ start() {
echo "Starting existing {{docker_container_name}} container with HWSKU $HWSKU"
docker start {{docker_container_name}}
postStartAction
- exit 0
+ exit $?
fi

# docker created with a different HWSKU, remove and recreate