
Recovering from a corrupt PostgreSQL Database

garywong-bc edited this page Dec 27, 2017 · 2 revisions

If symptoms point to a corrupted database (e.g. the database pod was killed without a graceful shutdown), the postgres pod will not be able to restart. NOTE: in rare cases the pod itself starts successfully, but the DB inside it isn't able to come up.

In that case, find the identifier of the pod that is crash looping (status CrashLoopBackOff in the pod list), and oc debug into it.

oc project moe-gwells-<dev/test/prod>
oc get pods
oc debug postgresql-<identifier>
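
When the pod list is long, the crash-looping pod can be picked out with a small filter. The following is a minimal sketch, not part of the original procedure: the function name crashlooping_pods is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `oc get pods` output on stdin and prints
# the names of pods whose STATUS column is CrashLoopBackOff.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
crashlooping_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get pods | crashlooping_pods
```

The printed name is the one to pass to oc debug.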

In that debug shell,

cd /var/lib/pgsql/data
mv userdata userdata_broken

IMPORTANT: In the rare case where the DB pod is up, be sure to scale it down to zero before moving the userdata/ folder. Otherwise, the fresh DB may be corrupted by the DB pod attempting logging, checkpoints, etc. In this case, you must also exit the debug pod before restarting the DB pod, as you may otherwise run out of resources (the running debug pod and the running DB pod count as two running pods).
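
Scaling the DB pod down can also be done from the command line. This is a sketch under assumptions: the deployment config name postgresql is a guess based on the pod name (check `oc get dc` for the real one), and the DB_DC and OC variables and the scale_db function are hypothetical helpers.

```shell
# Assumed deployment config name; override DB_DC if yours differs.
DB_DC="${DB_DC:-postgresql}"
# The oc binary to call; overridable, e.g. OC=echo for a dry run.
OC="${OC:-oc}"

# scale_db N: set the DB deployment config to N replicas.
scale_db() {
  "$OC" scale "dc/${DB_DC}" --replicas="$1"
}

# Usage (requires a logged-in oc session):
#   scale_db 0    # stop the DB pod before moving userdata/
#   scale_db 1    # bring it back afterwards
```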

Re-deploy the postgres pod via the console. Note that you may need to exit the debug shell first if you hit the 'resource limit exceeded' message, which stops the re-deploy.
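
The original procedure uses the web console for the re-deploy. A command-line equivalent might look like the sketch below. It assumes the DB is managed by a deployment config named postgresql (an assumption; verify with `oc get dc`), and redeploy_db, DB_DC and OC are hypothetical names.

```shell
# Assumed deployment config name; override DB_DC if yours differs.
DB_DC="${DB_DC:-postgresql}"
# The oc binary to call; overridable, e.g. OC=echo for a dry run.
OC="${OC:-oc}"

# Trigger a new rollout of the DB, then wait until it finishes.
redeploy_db() {
  "$OC" rollout latest "dc/${DB_DC}" &&
  "$OC" rollout status "dc/${DB_DC}"
}

# Usage (requires a logged-in oc session, after exiting the debug shell):
#   redeploy_db
```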

The database will create a new userdata folder and an empty set of database files. Once the fresh database is up, the gwells pod should re-deploy and run db-replicate.sh. If it does not (i.e. the database crashed hard, and the gwells pod is unaware that it needs to re-deploy and run the post-hook which starts db-replicate.sh), then you can open the terminal of the gwells pod and run db-replicate.sh manually.

Monitor the log output of the db-replicate.sh script to ensure the data replication completes.
