Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres fails to restart after VM reboot #33

Closed
jsievers opened this issue Jan 5, 2018 · 2 comments
Closed

postgres fails to restart after VM reboot #33

jsievers opened this issue Jan 5, 2018 · 2 comments

Comments

@jsievers
Copy link

jsievers commented Jan 5, 2018

Steps to reproduce

  • use bosh to deploy a postgres-release v23 VM
  • reboot the VM on IaaS level (e.g. nova reboot <VM_CID> if on openstack)

Observed behaviour

  • postgres consistently keeps failing to start because the pid file directory /var/vcap/sys/run/postgres/ does not exist.

from postgres_ctl.err.log :

[2018-01-05 12:53:31+0000] + main start
[2018-01-05 12:53:31+0000] + local action
[2018-01-05 12:53:31+0000] + action=start
[2018-01-05 12:53:31+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh
[2018-01-05 12:53:31+0000] ++ current_version=9.6.4
[2018-01-05 12:53:31+0000] ++ pgversion_current=postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ pgversion_old=postgres-9.6.3
[2018-01-05 12:53:31+0000] ++ pgversion_older=postgres-9.6.2
[2018-01-05 12:53:31+0000] ++ JOB_DIR=/var/vcap/jobs/postgres
[2018-01-05 12:53:31+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ STORE_DIR=/var/vcap/store
[2018-01-05 12:53:31+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres
[2018-01-05 12:53:31+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous
[2018-01-05 12:53:31+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown
[2018-01-05 12:53:31+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown
[2018-01-05 12:53:31+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK
[2018-01-05 12:53:31+0000] ++ pgversion_upgrade_from=postgres-unknown
[2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']'
[2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']'
[2018-01-05 12:53:31+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres
[2018-01-05 12:53:31+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres
[2018-01-05 12:53:31+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid
[2018-01-05 12:53:31+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:53:31+0000] ++ HOST=0.0.0.0
[2018-01-05 12:53:31+0000] ++ PORT=5524
[2018-01-05 12:53:31+0000] ++ [[ -n '' ]]
[2018-01-05 12:53:31+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib
[2018-01-05 12:53:31+0000] + set +u
[2018-01-05 12:53:31+0000] + source /var/vcap/packages/postgres-common/utils.sh
[2018-01-05 12:53:31+0000] + set -u
[2018-01-05 12:53:31+0000] + case "${action}" in
[2018-01-05 12:53:31+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false
[2018-01-05 12:53:31+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:53:31+0000] + local 'name=Postgres control job'
[2018-01-05 12:53:31+0000] + local logall=false
[2018-01-05 12:53:31+0000] + '[' false == true ']'
[2018-01-05 12:53:31+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']'
[2018-01-05 12:53:31+0000] + echo 1315
[2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
[2018-01-05 12:55:11+0000] + main start
[2018-01-05 12:55:11+0000] + local action
[2018-01-05 12:55:11+0000] + action=start
[2018-01-05 12:55:11+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh
[2018-01-05 12:55:11+0000] ++ current_version=9.6.4
[2018-01-05 12:55:11+0000] ++ pgversion_current=postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ pgversion_old=postgres-9.6.3
[2018-01-05 12:55:11+0000] ++ pgversion_older=postgres-9.6.2
[2018-01-05 12:55:11+0000] ++ JOB_DIR=/var/vcap/jobs/postgres
[2018-01-05 12:55:11+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ STORE_DIR=/var/vcap/store
[2018-01-05 12:55:11+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres
[2018-01-05 12:55:11+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous
[2018-01-05 12:55:11+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown
[2018-01-05 12:55:11+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown
[2018-01-05 12:55:11+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK
[2018-01-05 12:55:11+0000] ++ pgversion_upgrade_from=postgres-unknown
[2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']'
[2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']'
[2018-01-05 12:55:11+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres
[2018-01-05 12:55:11+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres
[2018-01-05 12:55:11+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid
[2018-01-05 12:55:11+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:55:11+0000] ++ HOST=0.0.0.0
[2018-01-05 12:55:11+0000] ++ PORT=5524
[2018-01-05 12:55:11+0000] ++ [[ -n '' ]]
[2018-01-05 12:55:11+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib
[2018-01-05 12:55:11+0000] + set +u
[2018-01-05 12:55:11+0000] + source /var/vcap/packages/postgres-common/utils.sh
[2018-01-05 12:55:11+0000] + set -u
[2018-01-05 12:55:11+0000] + case "${action}" in
[2018-01-05 12:55:11+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false
[2018-01-05 12:55:11+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:55:11+0000] + local 'name=Postgres control job'
[2018-01-05 12:55:11+0000] + local logall=false
[2018-01-05 12:55:11+0000] + '[' false == true ']'
[2018-01-05 12:55:11+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']'
[2018-01-05 12:55:11+0000] + echo 1473
[2018-01-05 12:55:11+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory

the important line from log above is

[2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory

Expected behaviour

postgres is able to survive a VM reboot.

Workaround

bosh ssh into the postgres VM and manually execute

mkdir -p /var/vcap/sys/run/postgres/
chown -R vcap:vcap /var/vcap/sys/run/
monit start postgres

Possible fix

there are similar bugs already fixed in other bosh releases, see e.g. cloudfoundry/capi-release#25

I think the problem is that the control scripts rely on prestart to create $RUN_DIR:

mkdir -p "${RUN_DIR}"
chown -R vcap:vcap "${RUN_DIR}"

and in case of external VM reboot (i.e. outside of any bosh operation), prestart is not executed.
/var/vcap/data/sys/run/ is a temp file system

$ mount | grep sys/run
tmpfs on /var/vcap/data/sys/run type tmpfs (rw,size=1m)

and thus is empty after VM reboot (which is fine).

although there is an mkdir -p $RUN_DIR in

, this is too late in case of VM reboot because of this line executed earlier:

echo $$ > ${CONTROL_JOB_PIDFILE}

which assumes the directory $RUN_DIR already exists.

Initialization of $RUN_DIR should probably happen as the very first thing when start) is invoked, see the fix for a similar problem in cloud_controller_ng

In our scenario (we run CCDB on postgres-release) this bug has caused an extended Cloudfoundry downtime triggered by forced VM reboots which were required to deploy urgent security fixes on the IaaS level.

@cf-gitbot
Copy link

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/154085953

The labels on this github issue will be updated when the story is started.

@valeriap
Copy link
Contributor

@jsievers
Thanks for reporting this issue.
The fix in included in v24.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants