postgres data gone after minikube node reboot #15065

gattytto · 2019-11-01T18:18:38Z

Describe the bug

After rebooting the minikube node that hosts a Che environment, the postgres pod's /var/lib/pgsql/data is gone, and the postgres and keycloak pods go into BackOff.

Che version

[ ] latest
[*] nightly
[ ] other: please specify

Steps to reproduce

Anything that causes the minikube node to reboot, whether gracefully or through a hard reset.

Expected behavior

I expect the Che deployment to come back up, with the postgres and keycloak pods loading the pre-existing database, until I decide to issue chectl:delete.

Runtime

[ ] kubernetes (include output of kubectl version)
[ ] OpenShift (include output of oc version)
[*] minikube (include output of minikube version and kubectl version)
[ ] minishift (include output of minishift version and oc version)
[ ] docker-desktop + K8S (include output of docker version and kubectl version)
[ ] other: (please specify)

Screenshots

Installation method

[*] chectl
    chectl server:start -m -p minikube
[ ] che-operator
[ ] minishift-addon
[ ] I don't know

Environment

Additional context

The PersistentVolume that chectl sets up for postgres should use a path beginning with /data, so that minikube does not erase its content on a node hard reset.

When the hostPath "path:" field is left empty in a PersistentVolume definition, minikube's default StorageClass implementation uses /tmp/hostpath-provisioner/ as the folder, which is emptied on reboot according to https://minikube.sigs.k8s.io/docs/reference/persistent_volumes/
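
As a minimal sketch of the idea (not the actual chectl or che-operator manifest), a statically defined PersistentVolume whose hostPath sits under /data could look like the following. The volume name, the /data/che/postgres path, and the reclaim policy are assumptions chosen for illustration.

```yaml
# Hypothetical example: a hostPath PV kept under /data, which the minikube
# docs list as a directory that is persisted across reboots.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data-pv                  # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi                          # same size as the reported postgres-data volume
  persistentVolumeReclaimPolicy: Retain   # assumption: keep the data when the claim is released
  hostPath:
    path: /data/che/postgres              # assumed path; anything under /data would do
```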

If this gets sorted out, I could go on and run test scenarios for the workspace pods too.

$ kubectl get pv pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b -o yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    hostPathProvisionerIdentity: 47e548c5-fca5-11e9-9417-02427d267bb8
    pv.kubernetes.io/provisioned-by: k8s.io/minikube-hostpath
  creationTimestamp: "2019-11-01T15:56:33Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
  resourceVersion: "175275"
  selfLink: /api/v1/persistentvolumes/pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
  uid: 071444ea-f8f9-4943-9bd0-c7170b94f995
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: postgres-data
    namespace: che
    resourceVersion: "175266"
    uid: 90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
  hostPath:
    path: /tmp/hostpath-provisioner/pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
    type: ""
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound


gattytto · 2019-11-01T19:03:18Z

kubernetes/minikube#3582 (comment)

ibuziuk · 2019-11-04T17:51:15Z

This looks related to disaster recovery - #14240
@gattytto thanks for reporting; it looks like you did a pretty good analysis. Would you be interested in contributing a fix?

> The PersistentVolume that chectl sets up for postgres should use a path beginning with /data, so that minikube does not erase its content on a node hard reset.
>
> When the hostPath "path:" field is left empty in a PersistentVolume definition, minikube's default StorageClass implementation uses /tmp/hostpath-provisioner/ as the folder, which is emptied on reboot according to https://minikube.sigs.k8s.io/docs/reference/persistent_volumes/
>
> If this gets sorted out, I could go on and run test scenarios for the workspace pods too.

gattytto · 2019-11-05T01:10:48Z

@ibuziuk yes, partially. I'm in the testing phase, but it can be done.

gattytto · 2019-11-07T03:13:46Z

I need some help, please. I will provide reproduction steps. First of all, this is specific to the minikube + chectl deployment of Che.

So far I have made code changes in https://github.com/gattytto/che-operator and started the deployment using:
chectl server:start -m -p minikube --che-operator-image=quay.io/gattytto/che-operator:latest -t /usr/local/lib/chectl/templates

One part of the change is to the controller code, which now adds the PersistentVolume. There is also a StorageClass in https://github.com/gattytto/che-operator/blob/master/deploy/storageclass.yaml, which I had to add to the cluster with kubectl because for some reason the dashboard doesn't accept it (command-line kubectl does). The storage class is hardcoded into both the PersistentVolumeClaim (PVC) and the PersistentVolume (PV), because a PVC created without a specific storage class gets the standard one and a PV gets none. I see the argument for using a specific storage class, but for the time being I just hardcoded it.
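
The linked storageclass.yaml is not reproduced in this thread, so the following is only a sketch of how a PVC can be pinned to a dedicated storage class instead of falling back to the default one. The class name che-hostpath and the no-provisioner setting are assumptions; the claim name postgres-data, the che namespace, and the 1Gi size come from the PV dump above.

```yaml
# Assumed StorageClass without dynamic provisioning, so that matching PVs are
# created explicitly rather than by minikube's hostpath provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: che-hostpath                 # hypothetical name
provisioner: kubernetes.io/no-provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: che
spec:
  storageClassName: che-hostpath     # without this, the default "standard" class is applied
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

For the claim to bind, the PersistentVolume would need the same storageClassName in its spec.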

The chectl YAML files role.yaml and cluster-role.yaml needed the persistentvolumes resource added. I edited the ones in https://github.com/gattytto/che-operator/blob/master/deploy/role.yaml and /cluster-role.yaml respectively, and copied them to:
/usr/local/lib/chectl/templates/che-operator/
so that chectl uses them when starting the deployment.
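
The edited role files are linked rather than quoted, so this is only an illustrative rule and not their actual content. PersistentVolumes are cluster-scoped, which is why the rule is shown in a ClusterRole; the role name and the list of verbs are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: che-operator-pv              # hypothetical name
rules:
  - apiGroups: [""]                  # core API group
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]   # assumed set of verbs
```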

I manually created the /data/minikube folder and set its permissions to 777. The operator startup process then creates the subfolder "userdata", which holds the postgres db files and has the expected ownership of UID=26 and GID=26.
THIS PART IS IMPORTANT: the PersistentVolume type is DirectoryOrCreate, and when minikube uses vm-driver=none (running inside an LXC container) it runs as root, so the minikube directory inside /data would be created with root:root ownership. That is why I pre-created it and set the permissions to 777.
This will be fixable from code once the minikube team implements the "mountoptions" property for PersistentVolumes in minikube.
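
For reference, a sketch of a hostPath volume using DirectoryOrCreate follows. The PV name and size are assumptions, while the path follows the folders described above. According to the Kubernetes documentation, a directory created this way gets mode 0755 and the same owner and group as the kubelet, which is consistent with the root:root ownership described above.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-userdata-pv         # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi                     # assumed size
  hostPath:
    path: /data/minikube/userdata
    type: DirectoryOrCreate          # created with mode 0755 and the kubelet's owner/group if missing
```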

Part of the process completes, but it gets stuck before deploying the plugin registry. I don't know why, and I also don't know how to debug further why the operator stops the deployment process.
As seen in the screenshot, what I CAN be sure of is that both the keycloak and postgres pods are started and healthy. I have also accessed the keycloak-che URL and successfully logged in as admin:admin.

gattytto · 2019-11-07T03:30:29Z

And it works after a hard reset of the LXC container: at least what had been started comes back.

sleshchenko · 2019-11-07T11:53:23Z

@gattytto Could you share the che-operator logs? AFAIK che-operator does some exec in keycloak; maybe it failed.

gattytto · 2019-11-08T18:06:18Z

I have finished the code modifications to persist postgres data, and it works.

After a hard reset of the LXC container, postgres, keycloak and che come back.

As for workspaces: they don't, because their storage got deleted by minikube.

gattytto · 2019-12-01T16:29:38Z

It seems that PersistentVolumeClaim provisioning is split in two for the Kubernetes use case: che-operator provisions the postgres-data volume, while che-server follows the config values set in volumeclaimStrategy and uses Java code to create the volumes for the workspaces. Could this be moved to the che-operator Go code instead?

simha369 · 2019-12-04T17:35:24Z

I am still facing the same issue: the postgres persistent volume data is lost after minikube stop.
Is there a solution for this problem? Please share.
If this works in an earlier minikube version, please share the working version.
I am facing the issue with minikube version v1.5.2.

gattytto · 2019-12-06T22:52:05Z

@simha369 No, there's no fix yet, but I have filed a feature request, #15157. You can patch the che-operator code to persist your postgres database and general info (like ssh keys?) from your dev env, but after a hard reset you would still need to recreate (delete and create again) the workspaces from your devfile registry or using factories. So, depending on what you need to persist, there either is or is not a workaround for the moment.

AndrienkoAleksandr · 2020-01-02T14:57:37Z

@gattytto Please join the review of eclipse-che/che-operator#144

tolusha · 2020-01-23T09:25:37Z

@gattytto
Do you think we can close the issue?

gattytto · 2020-01-23T20:00:56Z

I'm very happy to say yes

gattytto added the kind/bug Outline of a bug - must adhere to the bug report template. label Nov 1, 2019

gattytto changed the title ~~postgres data gone after minikube node hard reset~~ postgres data gone after minikube node reboot Nov 1, 2019

che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Nov 1, 2019

ibuziuk removed the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Nov 4, 2019

ibuziuk added severity/P1 Has a major impact to usage or development of the system. area/install Issues related to installation, including offline/air gap and initial setup team/platform labels Nov 4, 2019

skabashnyuk removed the team/platform label Nov 4, 2019

gattytto mentioned this issue Nov 10, 2019

Extra option --PersistentVolumeName and --HostPathSource to CheCTL #15157

Closed

AndrienkoAleksandr mentioned this issue Jan 2, 2020

Add ability configure host volume path. eclipse-che/che-operator#144

Closed

gattytto closed this as completed Jan 23, 2020
