Incident Response

When you get that call at 3am and you need to jump into the prod environment to save the day, what are the things to know, and how can you go about fixing the system?

Quick Checks

supervisorctl status   # see if any services are stopped or have errors

tail /opt/oddslingers.poker/data/logs/http-worker.log     # see errors when starting django

supervisorctl restart all
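
When the log is long, it can help to pull out only the lines that look like failures. The following is a minimal sketch, not an existing tool in the repo: the function name `recent_errors`, the default of 500 lines, and the match pattern are all assumptions chosen for illustration.

```shell
# Hypothetical helper: print error-looking lines from the end of a log file.
# Usage: recent_errors [LOG_FILE] [LINES_TO_SCAN]
recent_errors() {
    local log_file="${1:-/opt/oddslingers.poker/data/logs/http-worker.log}"
    local scan_lines="${2:-500}"

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi

    # The pattern is an assumption about what Python/Django failures look like;
    # widen or narrow it to match what the logs actually contain.
    tail -n "$scan_lines" "$log_file" | grep -E 'Traceback|Error|Exception|CRITICAL'
}
```

Calling `recent_errors` with no arguments scans the http-worker log mentioned above. An exit status of 1 only means that nothing matched the pattern, not that the service is healthy.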

Response Checklist

Back up the system. ALWAYS ALWAYS ALWAYS create a backup/snapshot/db_dump before running custom SSH commands on a production server, especially when under pressure to fix things quickly (see the db backup instructions).
Where are all the files? Get the lay of the land and figure out where these key things are:
- the main code repo: /opt/oddslingers.poker
- logs: /opt/oddslingers.poker/data/logs
- config files: /opt/oddslingers.poker/env/prod.env
- database: is it on the same machine or a separate server? (check env/{ODDSLINGERS_ENV}.env and env/secrets.env)
- system resources:
  - check running processes with htop, systemctl status, and supervisorctl status
  - check remaining disk space with df -h /, and find what is using it with ncdu -x /
  - check network connectivity with iftop and mtr
Figure out exactly what you're trying to do, and explain it to another team member before proceeding, e.g.:
- fix wrong code deployed or broken deploy -> deploy new code
- fix slow server due to high resource consumption (processes, connections, CPU, disk, etc.) -> identify the bad process with htop, stop it safely, and fix the underlying issue
See if there's a tool already built to help achieve the goal you want, e.g.:
- if you need to redeploy, don't fuss with files manually; find the deploy command and run it
- if you need to restart a service, use supervisorctl; don't just killall and run the proc manually, or you'll end up conflicting with other services
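
For the first checklist item, a quick copy of the config directory is cheap insurance before any edits. The sketch below is an illustration only: `snapshot_dir` is a hypothetical helper, not a command from this repo, and it does not replace the db backup instructions, since it archives files and not the database.

```shell
# Hypothetical helper: archive a directory into a timestamped tarball.
# Usage: snapshot_dir SRC_DIR DEST_DIR   (prints the archive path on success)
snapshot_dir() {
    local src="$1"
    local dest="$2"
    local stamp archive

    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 2; }
    mkdir -p "$dest" || return 2

    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    archive="$dest/$(basename "$src")-$stamp.tar.gz"

    # -C makes paths inside the archive relative to the parent of SRC_DIR.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Listing the archive is a cheap check that it can be read back.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

For example, `snapshot_dir /opt/oddslingers.poker/env /root/incident-backups` would archive the env files; the destination path here is an assumption, so use whatever location the team has agreed on.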

See the Production article for more information.
