cannot initialize nextcloud when enable persistence on kubernetes #1006

Open
mamiapatrick opened this issue Feb 18, 2020 · 18 comments
Labels
data persistence (Volumes, bind mounts, etc.) · k8s/helm/etc (k8s/helm/etc matters) · needs review (Needs confirmation this is still happening or relevant) · question

Comments

@mamiapatrick

Hello, I just installed Nextcloud in my private Kubernetes cluster. If I install with no persistence, the software (pod) launches fine, but whenever I try to install it on a persistent volume it just gets stuck at initializing and the pod never starts. Because of this I cannot persist data, config, and other information. I also noticed that even if I set up an external database, I still have the SQLITE_DATABASE environment variable set.

@johnbayo

I had this issue too. The mistake I made was persisting /var/www/html, which would get stuck at initializing. Persist only the data directory and it should work; by that I mean:
volumeMounts:
- name: nextcloud-data-dir
mountPath: /var/www/html/data
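For illustration, a full Deployment along those lines could look roughly like this (a sketch only; the image tag, volume name, and PVC name are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nextcloud
  template:
    metadata:
      labels:
        app: nextcloud
    spec:
      containers:
        - name: nextcloud
          image: nextcloud:apache            # use whatever tag you actually deploy
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nextcloud-data-dir
              mountPath: /var/www/html/data  # persist only the data directory
      volumes:
        - name: nextcloud-data-dir
          persistentVolumeClaim:
            claimName: nextcloud-data-pvc    # hypothetical, pre-created PVC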

If your pod ends up restarting after the initial installation, you will get another error message, which is

Username is invalid because files already exist for this user

The way to get around this is to always change your nextcloud_admin_user before you restart the pod; the new user can be deleted later directly from the application.

Any suggestion on how to bypass this by editing the entrypoint would be nice, because I am currently trying to figure out how to do that without editing the nextcloud_admin_user every time.

@mamiapatrick
Author

hello @johnbayo

I read your response and thank you, but that one is not very "automatic" because we will need human intervention any time the pod restarts... as if Nextcloud cannot work normally on Kubernetes like other pods do.

On the other hand, if you do not persist custom_apps and settings, how do you keep those persistent across pod restarts?

@mamiapatrick
Author

@johnbayo if you persist only data, will the config be persistent if the pod restarts, given that config lives at the mountPath /var/www/html/config?

@johnbayo

@mamiapatrick no, you can't persist the config. The config gets generated only on initialization. You have to edit the entrypoint, or have another script update your config on each pod restart.
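One way to approach the same goal without patching the entrypoint is to leave the generated config.php ephemeral and layer your own settings on top via an extra config file, relying on Nextcloud's support for additional config files ending in .config.php inside the config directory. A rough sketch (the ConfigMap name, file name, and values are all hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nextcloud-extra-config
data:
  custom.config.php: |
    <?php
    $CONFIG = array (
      'trusted_domains' => array ('nextcloud.example.com'),
      'overwriteprotocol' => 'https',
    );

and in the pod spec:

volumeMounts:
  - name: extra-config
    mountPath: /var/www/html/config/custom.config.php  # mount just this one file
    subPath: custom.config.php
volumes:
  - name: extra-config
    configMap:
      name: nextcloud-extra-config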

@mamiapatrick
Author

@johnbayo but why, every time I delete the pod, do I get an error that the username already exists? The pod gets deleted when I change some configuration.

@johnbayo

@mamiapatrick you need to change the admin user before deleting your pod each time, or another option would be to edit the entrypoint to ignore this. There might be another solution, but unfortunately I am not aware of one.

@i5Js

i5Js commented May 14, 2020

At least there is some light on this issue. Indeed, /var/www/html can't be mounted or it will get stuck, but even when the installation completes the pod never comes up:

i5Js@nanopim4:~/nextcloud$ kubectl logs --follow nextcloud -n nextcloud
Initializing nextcloud 18.0.4.2 ...
Initializing finished
New nextcloud instance
Installing with MySQL database
starting nextcloud installation
Nextcloud was successfully installed
setting trusted domains…
System config value trusted_domains => 1 set to string domain_name
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.42.0.56. Set the 'ServerName' directive globally to suppress this message
[Thu May 14 11:06:45.742135 2020] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.38 (Debian) PHP/7.3.17 configured -- resuming normal operations
[Thu May 14 11:06:45.742523 2020] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'

i5Js@nanopim4:~/nextcloud$ kubectl get pod -n nextcloud
NAME READY STATUS RESTARTS AGE
nextcloud-bcf868c97-q9btj 0/1 Running 0 10m

Any tips?

@kquinsland

Glad I'm not the only one that's hitting this.

After some more research while drafting this post, I found an issue that I think is related:
nextcloud/helm#590

My issue / steps to reproduce:

I am attempting to update to the new 19.0.2 build. I have user data on a persistent volume and a database instance set up in a different pod. When I start with a 'fresh' volume for data, I have no issues setting up and installing so I know that there is no issue talking to the database.

Every time I kill the pod, a new one comes back... which is exactly what is supposed to happen. However, when I go to the Nextcloud instance, I get the message Username is invalid because files already exist for this user. I can work around this by relocating the existing folder, creating a new admin user, and copying the content from the old/relocated folder into the 'new' admin folder. This same 'workaround' is required for each user, too.

I think that this has something to do with the instanceID... but this is not something that can be adjusted via env-vars so it can't be kept constant across new pods.

@kquinsland

I think I've figured out how to get past this:

  1. create a PVC for user data and config data
  2. pipe the user data PVC into your deployment, but not the config
  3. do the setup wizard / confirm that things "work"
  4. copy the entire /var/www/html/config/* to the user PVC (make a CFG folder or similar)
  5. delete the deployment
  6. modify the deployment to now also use the config PVC
  7. re-apply the deployment
  8. check that the /var/www/html/config/ dir now has nothing in it (or, possibly just a config.php with just the instanceid)
  9. copy all of the PHP files from the CFG dir on the user PVC into the config dir, so they land on the config PVC
  10. restart the pod again. You should still be able to access/log in to Nextcloud, except now the file that has the instanceID is on a PV and no longer tied to the lifecycle of the pod.

Short version:

  • Copy the entire config dir somewhere safe after setting up NC and making sure things work as expected
  • modify the NC deployment to use a PVC for the /var/www/html/config path
  • copy the 'backed up' config to this PVC

It's one hell of a messy workaround, but it seems to be working for me so far; a rough sketch of the resulting volume layout follows.
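For reference, the end state (steps 6-9 above) boils down to a pod spec fragment roughly like this (the PVC names nextcloud-data-pvc and nextcloud-config-pvc are hypothetical):

volumeMounts:
  - name: nextcloud-data
    mountPath: /var/www/html/data     # user data PVC from step 2
  - name: nextcloud-config
    mountPath: /var/www/html/config   # config PVC added in step 6
volumes:
  - name: nextcloud-data
    persistentVolumeClaim:
      claimName: nextcloud-data-pvc
  - name: nextcloud-config
    persistentVolumeClaim:
      claimName: nextcloud-config-pvc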

@nilbacardit26

Hey @kquinsland, glad to read your message. Yesterday I managed to install NC on my own Kubernetes cluster and I encountered a bunch of errors related to what you are describing.
I've been using the official chart and a newer version of NC (19.0.1) than the one in the values.yaml.

I deployed NC with persistence (PVC) and an external Postgres database. The first run works as expected, after setting the liveness and readiness probes to 5 minutes because it takes time to set up the whole environment, but if the pod restarts I hit all the problems exposed here and in nextcloud/helm#590.
I believe we are all interested in being able to redeploy NC when necessary; that is why we use K8s. If the pods die for some reason, I want the NC instance to deal with that. The problem now seems to be that the system tries to reinstall and create all the tables in the DB again, making it impossible to automate the process.

I will post my values later during the day; right now I do not have them. Basically I have an NFS disk which is used by my PV/PVC, and I mount the whole /var/www/html/config exactly as the deployment says, except that I deleted the mount of /var/www/html; it got stuck otherwise. Among other things, I spent a lot of time yesterday making it work.

The only solution I found was deleting the whole DB and the whole dir mounted in the PVC to make it run from zero, which is not what I want of course. I am going to try to only replace the config dir.

I could not make it work with more than one replica; I guess it is the same problem though, with all of the replicas trying to reinstall NC.

@i5Js

i5Js commented Aug 28, 2020

Hi @nilbacardit26, you've described my pain word for word... I'm done; I think Nextcloud is not ready to work with Kubernetes...

@nilbacardit26

@i5Js You are right; we basically use K8s to be able to rely on a system that can recover from errors on its own, and at the moment that is not the case with the current chart and entrypoint.

@jeandevops

Same problem here. It would be great if it worked on Kubernetes. Sad.

@Alfablos

Alfablos commented Nov 2, 2020

Hey guys,
I've tried this too. I've seen that it hangs because of the rsync commands in the entrypoint. I'm using NFS (4.1) as a storage backend and it takes about 20-30 minutes to complete the copy from /usr/src/nextcloud to /var/www/html.
I've added some flags to rsync (basically v,r and --append) and I can see the big list of files being (very) slowly copied.
After it finishes, the Nextcloud installation works correctly, but it's pretty evident that I need to switch to a more performant storage backend; I'll try iSCSI.
Anyway, with such a long operation the pod will fail the readiness probe (I set initialDelaySeconds to 120 seconds) and be killed, but if you're using the --append rsync option the next container will continue where the previous one left off, until after a few pod sacrifices the probe succeeds.
If this doesn't happen you can still run
su www-data -s /bin/sh -c "php occ maintenance:install"
and Nextcloud will complete the installation.

@Alfablos

Alfablos commented Nov 3, 2020

Here's what the interesting part of entrypoint.sh looks like:

            if [ "$(id -u)" = 0 ]; then
                rsync_options="-vrlDog --chown www-data:root --progress --append"
            else
                rsync_options="-rlDv --progress --append"
            fi

And here's what I added to values.yaml after creating the "docker-entrypoint" ConfigMap, in which the original lines of code are replaced with the above:

  extraVolumes:
    - name: nextcloud-entrypoint
      configMap:
        name: nextcloud-entrypoint
        defaultMode: 0700                  #Way too generous

  extraVolumeMounts:
    - name: nextcloud-entrypoint
      mountPath: "/entrypoint.sh"
      subPath: entrypoint.sh
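For completeness, the ConfigMap referenced above could look roughly like this (a sketch only; the script body would be the image's docker-entrypoint.sh with the rsync_options lines replaced by the --append variant shown earlier):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nextcloud-entrypoint
data:
  entrypoint.sh: |
    # Paste the full docker-entrypoint.sh from the nextcloud image here,
    # replacing the rsync_options assignments with the --append variant above.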

Also, in values.yaml:

livenessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1

It takes 5 "restarts" for the rsync --append to finish copying, but I'm OK with that: it only happens once, and delaying the check any further means a longer time before Kubernetes realizes there's a problem of any kind.
Hope this helps

@Diftraku

Diftraku commented Oct 2, 2021

(quoting @Alfablos's comment above about the slow rsync over NFS and the --append workaround)

I can confirm having the same issue with NFS v4 as the backing storage for the PVC used for Nextcloud's persistence. I recently bumped the image from 22.1.1 to 22.2.0 and rsync is still chugging away as I write this reply. I had a startup probe enabled on my Helm install, but it seems to not even exist in the deployment (for better or worse).

I'm curious if cp has the same issues as rsync does with NFS or if it's more about NFS handling small files very badly in the first place. My current setup pretty much relies on sharing data over NFS as I'm not entirely sure KVM allows you to share the same block device between multiple VMs (and how to actually make use of that with k3s' local-path storage class).

iSCSI might be the way to go for situations like these, but I'd prefer to use NFS as it's infinitely simpler to set up and get going than iSCSI when using Debian.

BinaryMan32 added a commit to BinaryMan32/argocd that referenced this issue Jan 23, 2022
nextcloud/docker#1006 recommended setting a
livenessProbe and readinessProbe to deal with the slow rsync on initial
launch. However, startupProbe is the recommended way to deal with this
rather than making the livenessProbe and readinessProbe unnecessarily long,
which increases the latency to detect failure conditions.
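For context, a startupProbe of the kind described there could look roughly like this in the container spec (the path and thresholds are illustrative; failureThreshold × periodSeconds bounds how long the initial rsync may take):

startupProbe:
  httpGet:
    path: /status.php
    port: 80
  periodSeconds: 30
  failureThreshold: 60   # allow up to ~30 minutes for the first start
  timeoutSeconds: 5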
joshtrichards added the k8s/helm/etc, question, and data persistence labels Oct 23, 2023
@jessebot
Contributor

So I was looking at nextcloud/helm#590 (comment) and nextcloud/helm#590 (comment) in nextcloud/helm#590, and I think both @kquinsland and @WladyX are onto something.

I posted some ideas and suggestions in nextcloud/helm#590 (comment) but the gist seems to be that we check /var/www/html/version.php for the nextcloud version, and if that file doesn't exist, we initialize a new install.

The issue is that I'm not sure how to persist that file without just using our normal PVC setup, which users don't want to use if they're already using S3. version.php is not created by either nextcloud/helm or nextcloud/docker; I think it's created by nextcloud/server 🤔

Perhaps we can do some sort of check to see if s3 is already enabled? 🤔 Maybe checking if $OBJECTSTORE_S3_BUCKET is set in the docker-entrypoint.sh? Open to ideas and suggestions to make this more approachable in either repo.

joshtrichards added the needs review label Oct 27, 2024
@joshtrichards
Member

joshtrichards commented Oct 27, 2024

The core of the matter is that some k8s users seem to be disabling persistence of /var/www/html because "it doesn't work".

E.g.

I deleted the mount of /var/www/html; it got stuck otherwise.

It seems in most cases this is an NFS / rsync interaction. Sometimes it is merely a performance matter (some of the examples above plus others like #1582). Sometimes it's a configuration matter (e.g. #1200)

However it also seems many people have no issues, so perhaps we limit the scope to:

  • why are some people having a harder time with NFS than others?
  • are there some things we can document better to help these people?
  • are there some things we can do re: rsync to help a bit too?

P.S. Redesigning the image (and/or Nextcloud Server itself) to work w/o persistent storage for its installation folder is a bigger conversation (and a longer road probably), and already covered in #340 and #2044.
