As an example, you might have these steps in your runbook for a maintenance event:
- Announce in the engineering channel that maintenance is about to start.
- Set the in-app message for the maintenance.
- Stop the service:
    Step/Stop-Important-Service () {
        echo "Stopping service ..."
        Runbook/confirm-continue-task
        echo "Important service stopped!"
    }
- Take a snapshot of the DB.
- Run the maintenance script:
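    Step/Run-Maintenance-Script () {
        Runbook/confirm-continue-task
        echo "Doing real work ..."
        # Simulate about seven seconds of work, printing one dot per second.
        local i=0
        while (( i < 7 )); do
            echo -n .
            sleep 1
            (( ++i ))
        done
        echo    # end the line of progress dots
        echo "All done!"
    }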
- Start the service back up:
    Step/Start-Important-Service () {
        Runbook/confirm-continue-task
        echo "Starting the service back up..."
        echo "Service started!"
    }
- Check that things are still working.
- Notify in the engineering channel that the maintenance is now over.
Task to check the status of the service:
    Task/Check-Service-Status () {
        echo "Checking service status ..."
        echo "Service is Up."
    }
Task to page for help:
    Task/Page-2nd-level-on-call () {
        echo 'Help!!!'
    }
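
The step functions above all call `Runbook/confirm-continue-task`, whose implementation is not shown here. To try a step in isolation, a minimal stand-in might look like the sketch below. This is a hypothetical version, not the framework's own: it assumes the helper reads a yes/no answer from standard input and returns non-zero on anything other than `y`, and it assumes the step should stop when that happens (hence the added `|| return 1`, which the original step does not have).

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the framework's helper; the real
# Runbook/confirm-continue-task may behave differently.
Runbook/confirm-continue-task () {
    local answer
    # read -p shows the prompt only when input comes from a terminal.
    read -r -p "Continue? [y/N] " answer || return 1
    # Succeed only on a single "y" or "Y".
    [[ $answer == [yY] ]]
}

# Same step as above, with an explicit abort path (an assumption).
Step/Stop-Important-Service () {
    echo "Stopping service ..."
    Runbook/confirm-continue-task || { echo "Aborted."; return 1; }
    echo "Important service stopped!"
}
```

With these definitions sourced into a Bash session, running `Step/Stop-Important-Service` and answering `y` prints both messages, while any other answer prints `Aborted.` and returns a non-zero status.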