This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/postgresql] pod not starting #9093

Closed
alexandruast opened this issue Nov 8, 2018 · 15 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@alexandruast

alexandruast commented Nov 8, 2018

Is this a request for help?:
YES

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Version of Helm and Kubernetes:
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Which chart:
stable/postgresql
DEPLOYED postgresql-2.3.1 10.5.0 default

What happened:
CrashLoopBackOff on start

nami    INFO  postgresql successfully initialized
INFO  ==> Starting postgresql...
nami    ERROR Unable to start com.bitnami.postgresql: pg_ctl: could not start server
Examine the log output.

Also, there is no straightforward way to see other logs other than this.
Hints on how to inject a sidecar for debugging purposes would also be welcomed.

What you expected to happen:
Pod running and serving connections

How to reproduce it (as minimally and precisely as possible):
helm install stable/postgresql

Anything else we need to know:
Running on Docker for Windows Version 18.06.1-ce-win73 (19507)
Why this chart is based on bitnami image and not the official postgres image is something that I would appreciate clarifying.

@alexandruast alexandruast changed the title from "[stable/postgresql]" to "[stable/postgresql] pod not starting" Nov 8, 2018
@mattange

Having the same issue upon restart. Maintainers, please help!
Have you got persistence enabled?

@alexandruast
Author

I cannot believe that after two weeks nobody has bothered to fix this...

@carrodher
Collaborator

carrodher commented Nov 22, 2018

I am not able to reproduce the issue using Docker for OSX:

kubectl version 1.12
helm version v2.11.0
Docker version 18.09.0

I followed these steps:

$ helm install stable/postgresql
$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
dull-grasshopper-postgresql-0   1/1     Running   0          8m
$ kubectl logs dull-grasshopper-postgresql-0

Welcome to the Bitnami postgresql container
Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues

nami    INFO  Initializing postgresql
postgre INFO  ==> No injected postgresql.conf file found. Creating default postgresql.conf file...
postgre INFO  ==> No injected pg_hba.conf file found. Creating default pg_hba.conf file...
postgre INFO  ==> Deploying PostgreSQL from scratch...
postgre INFO  ==> Creating postgres user with unrestricted access...
postgre INFO  ==> Configuring PostgreSQL...
postgre INFO  ==> Configuring replication parameters...
postgre INFO  ==> Configuring permissions for config files...
postgre INFO
postgre INFO  ########################################################################
postgre INFO   Installation parameters for postgresql:
postgre INFO     Root User: postgres
postgre INFO     Password: **********
postgre INFO   (Passwords are not shown for security reasons)
postgre INFO  ########################################################################
postgre INFO
nami    INFO  postgresql successfully initialized
INFO  ==> Starting postgresql...
2018-11-22 10:10:46.670 GMT [54] LOG:  received fast shutdown request
2018-11-22 10:10:46.673 GMT [54] LOG:  aborting any active transactions
2018-11-22 10:10:46.675 GMT [54] LOG:  worker process: logical replication launcher (PID 61) exited with exit code 1
2018-11-22 10:10:46.678 GMT [56] LOG:  shutting down
2018-11-22 10:10:46.701 GMT [54] LOG:  database system is shut down
2018-11-22 10:10:48.484 GMT [104] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-11-22 10:10:48.484 GMT [104] LOG:  listening on IPv6 address "::", port 5432
2018-11-22 10:10:48.489 GMT [104] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-11-22 10:10:48.510 GMT [105] LOG:  database system was shut down at 2018-11-22 10:10:46 GMT
2018-11-22 10:10:48.516 GMT [104] LOG:  database system is ready to accept connections

It seems to be an issue related to Docker for Windows. Can you try installing the chart without persistence?

$ helm install --set persistence.enabled=false stable/postgresql

Regarding your question ("Why this chart is based on bitnami image and not the official postgres image is something that I would appreciate clarifying."): you can find all the information in the issue and PR that were opened to migrate the previous image to this one.

@alexandruast
Author

alexandruast commented Nov 22, 2018

Hi @carrodher,
The issue seems to be with persistence; starting it with persistence.enabled=false does not throw an error:

Welcome to the Bitnami postgresql container
Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues

nami    INFO  Initializing postgresql
postgre INFO  ==> No injected postgresql.conf file found. Creating default postgresql.conf file...
postgre INFO  ==> No injected pg_hba.conf file found. Creating default pg_hba.conf file...
postgre INFO  ==> Deploying PostgreSQL from scratch...
postgre INFO  ==> Creating postgres user with unrestricted access...
postgre INFO  ==> Configuring PostgreSQL...
postgre INFO  ==> Configuring replication parameters...
postgre INFO  ==> Configuring permissions for config files...
postgre INFO
postgre INFO  ########################################################################
postgre INFO   Installation parameters for postgresql:
postgre INFO     Root User: postgres
postgre INFO     Password: **********
postgre INFO   (Passwords are not shown for security reasons)
postgre INFO  ########################################################################
postgre INFO
nami    INFO  postgresql successfully initialized
INFO  ==> Starting postgresql...
2018-11-22 11:18:57.537 GMT [57] LOG:  received fast shutdown request
2018-11-22 11:18:57.545 GMT [57] LOG:  aborting any active transactions
2018-11-22 11:18:57.546 GMT [57] LOG:  worker process: logical replication launcher (PID 64) exited with exit code 1
2018-11-22 11:18:57.546 GMT [59] LOG:  shutting down
2018-11-22 11:18:57.653 GMT [57] LOG:  database system is shut down
2018-11-22 11:18:59.065 GMT [106] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-11-22 11:18:59.065 GMT [106] LOG:  listening on IPv6 address "::", port 5432
2018-11-22 11:18:59.085 GMT [106] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-11-22 11:18:59.123 GMT [108] LOG:  database system was shut down at 2018-11-22 11:18:57 GMT
2018-11-22 11:18:59.135 GMT [106] LOG:  database system is ready to accept connections

Being able to see the actual logs that show why it fails in the first place would be awesome. Logs should be visible without the overhead of sidecar injection.
Logs that tell you to check the logs are... weird:

ERROR Unable to start com.bitnami.postgresql: pg_ctl: could not start server
Examine the log output.

@carrodher
Collaborator

I am trying to reproduce the error on different platforms, but with no luck. It seems to be an issue with how Docker for Windows handles securityContext and volumes:

##
## Init containers parameters:
## volumePermissions: Change the owner of the persist volume mountpoint to RunAsUser:fsGroup
##
volumePermissions:
  image:
    registry: docker.io
    repository: bitnami/minideb
    tag: latest
    pullPolicy: Always
  securityContext:
    runAsUser: 0

## Pod Security Context
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
##
securityContext:
  enabled: true
  fsGroup: 1001
  runAsUser: 1001

replication:
  enabled: true
  user: repl_user
  password: repl_password
  slaveReplicas: 1

Searching online, I found docker/for-win#2048, which seems to be the same issue but has not received any response.

@psaia

psaia commented Nov 28, 2018

@carrodher I don't believe this is related to Windows. I'm having this problem on GCP using GCEPersistentDisk.

Using the default values, the chart installs fine the first time. However, this error happens if I:

  1. helm install
  2. helm delete --purge
  3. helm install again without completely deleting the PVC first

Without digging too deep yet, I imagine this has to do with permissions, but there isn't an obvious error to follow.

Welcome to the Bitnami postgresql container
Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues

nami    INFO  Initializing postgresql
postgre INFO  ==> No injected postgresql.conf file found. Creating default postgresql.conf file...
postgre INFO  ==> No injected pg_hba.conf file found. Creating default pg_hba.conf file...
postgre INFO  ==> Deploying PostgreSQL with persisted data...
postgre INFO  ==> Configuring PostgreSQL...
postgre INFO  ==> Configuring replication parameters...
postgre INFO  ==> Configuring permissions for config files...
postgre INFO
postgre INFO  ########################################################################
postgre INFO   Installation parameters for postgresql:
postgre INFO     Persisted data and properties have been restored.
postgre INFO     Any input specified will not take effect.
postgre INFO   This installation requires no credentials.
postgre INFO  ########################################################################
postgre INFO
nami    INFO  postgresql successfully initialized
INFO  ==> Starting postgresql...
nami    ERROR Unable to start com.bitnami.postgresql: pg_ctl: could not start server
Examine the log output.

Basically, it's very dangerous to delete your Postgres chart as you won't be able to start it again using the same persistent disk.

@bholzer

bholzer commented Dec 5, 2018

I am experiencing the same thing in EKS. If I attempt a deployment with the same name as a previously deleted release, postgres will fail to start if a persistent volume remains from the previous deployment.

@ChSch3000

Same issue here. It fails even when I delete all previous PVs and PVCs.
Any ideas?

@ahmadalli
Contributor

ahmadalli commented Jan 2, 2019

I've enabled the image's debug mode. I'm having an issue with ownership:

FATAL: data directory "/opt/bitnami/postgresql/data" has wrong ownership
HINT: The server must be started by the user that owns the data directory.
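
A minimal sketch of how this ownership mismatch could be confirmed from inside the container. It assumes GNU `stat` is available in the image and that the expected UID is 1001, the `runAsUser` value from the chart's securityContext quoted earlier in this thread; `owner_matches` is a hypothetical helper name, not part of the chart or the image.

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart or the image): succeeds when
# directory $1 is owned by numeric UID $2.
# Assumes GNU coreutils `stat`, which supports `-c '%u'` (numeric owner UID).
owner_matches() {
  dir="$1"
  expected_uid="$2"
  # Return 2 if stat itself fails (for example, the directory does not exist).
  actual_uid="$(stat -c '%u' "$dir")" || return 2
  if [ "$actual_uid" = "$expected_uid" ]; then
    echo "ok: $dir is owned by UID $actual_uid"
  else
    echo "mismatch: $dir is owned by UID $actual_uid, expected $expected_uid"
    return 1
  fi
}
```

Inside the pod this would be called as `owner_matches /opt/bitnami/postgresql/data 1001`; a reported mismatch would be consistent with the FATAL message above.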

@ahmadalli
Contributor

I changed my NFS setting to no_all_squash and that fixed the issue.

@juan131
Collaborator

juan131 commented Jan 2, 2019

Hi @bholzer

I am experiencing the same thing in EKS. If I attempt a deployment with the same name as a previously deleted release, postgres will fail to start if a persistent volume remains from the previous deployment.

From what you describe, this is likely related to previous PVCs that were not removed when the Helm release was deleted and that do not have the right permissions.

Could you please check the existing PVCs after deleting the previous release? For example, for a release named "my-release":

$ helm delete --purge my-release
release "my-release" deleted
$ kubectl get pvc
NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS
data-my-release-postgresql-0   Bound    pvc-626c5604-0e73-11e9-be88-42010a8e00c9   8Gi        RWO            standard

You can remove the previous PVCs by running:

$ kubectl delete pvc data-my-release-postgresql-0
persistentvolumeclaim "data-my-release-postgresql-0" deleted

After that you should be able to install PostgreSQL without issues.
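
The PVC name in the example above appears to follow the pattern data-<release>-postgresql-<ordinal>. Below is a small sketch that builds that name and prints the cleanup commands for review rather than running them. The pattern is inferred only from that one example output and may differ if the chart's name is overridden; `pvc_name` and `cleanup_commands` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: build the PVC name seen in the example above,
# data-<release>-postgresql-<ordinal>. The ordinal defaults to 0.
pvc_name() {
  release="$1"
  ordinal="${2:-0}"
  printf 'data-%s-postgresql-%s\n' "$release" "$ordinal"
}

# Print (do not run) the two cleanup commands for a release, so they can be
# reviewed before being executed by hand. Uses the Helm v2 syntax from this thread.
cleanup_commands() {
  release="$1"
  printf 'helm delete --purge %s\n' "$release"
  printf 'kubectl delete pvc %s\n' "$(pvc_name "$release")"
}
```

Running `cleanup_commands my-release` prints the same two commands shown above. Deleting the PVC discards the data on a dynamically provisioned volume whose reclaim policy is Delete, so it is worth checking `kubectl get pvc` first.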

@aerotog
Contributor

aerotog commented Jan 4, 2019

@alexandruast it sounds like others are having problems with pre-existing PVCs on cloud providers, but no one has addressed your original issue, where Postgres won't start with persistence enabled on Docker for Windows (D4W). I've been seeing the same failure, and while I don't have a full answer, here is some extra information that might help, plus a possible workaround. (Sorry for the long post. This is partly for my own peace of mind, to get everything I've found written down in one place.)

Also, there is no straightforward way to see other logs other than this.
Hints on how to inject a sidecar for debugging purposes would also be welcomed.

There is a related open ticket on the Bitnami repo this chart is based on: bitnami/bitnami-docker-postgresql#91. In that ticket, @juan131 mentioned that nami logs to /opt/bitnami/postgresql/logs/postgresql.log. With that information, I was able to get the logs from the failed run with:

> kubectl exec vocal-dachshund-postgresql-0 -- cat /opt/bitnami/postgresql/logs/postgresql.log
2019-01-04 20:49:01.744 GMT [86] FATAL:  data directory "/opt/bitnami/postgresql/data" has wrong ownership
2019-01-04 20:49:01.744 GMT [86] HINT:  The server must be started by the user that owns the data directory.

I don't know how to use a sidecar to read these logs, but running that command should help identify the root cause of your problem. I'm betting it's the same failure as mine as we are both using Docker for Windows.

Docker for Windows has a known limitation in its hostpath storage class. By default, D4W mounts volumes to %userprofile%\.docker\Volumes. When running helm install stable/postgresql, you should see it create a folder in that directory that matches the pod name. Unfortunately, D4W is unable to correctly assign permissions to volumes mounted in this way. You can find a better explanation at docker/for-win#1669.

That issue also mentions the same error I was seeing: "The server must be started by the user that owns the data directory." To get around this, @pgayvallet suggests setting the PV to use a mount path on the Linux VM host outside of /host_mnt (the directory on the Linux VM that maps to the Windows .docker/Volumes folder).

You can follow his instructions or create these Kubernetes objects to make a volume that should work:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pgdata
  labels:
    type: local
spec:
  storageClassName: hostpath
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pgdata
spec:
  storageClassName: hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

Once the PV and PVC exist, you should be able to install this chart with the command: helm install stable/postgresql --set persistence.existingClaim=pgdata

...

The catch is that because doing this mounts the volume to the Linux VM /tmp/ directory, it will stick around until the Linux VM is blown away. So all future deployments will use the same mounted volume which can cause problems with permissions etc. D4W intentionally locks down the Linux VM so you can't SSH into it. I've found a roundabout way to gain access to it and clear the /tmp/ directory manually but it's not pretty. I can provide the method if anyone thinks it would be useful. Otherwise, you can delete the Moby Linux VM and have D4W recreate it at startup.

@juan131
Collaborator

juan131 commented Jan 8, 2019

Hi @aerotog

Thanks so much for all the details you shared. It might be useful to create a troubleshooting guide with these and other solutions to work around known issues on D4W. What do you think, @javsalgar?

@stale

stale bot commented Feb 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Feb 7, 2019
@stale

stale bot commented Feb 21, 2019

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Feb 21, 2019