Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use flock for the upgrade lock #1905

Merged
merged 7 commits into from
Mar 1, 2023
Merged

Use flock for the upgrade lock #1905

merged 7 commits into from
Mar 1, 2023

Conversation

remram44
Copy link
Contributor

Fixes #1903

This keeps the NEXTCLOUD_INIT_LOCK environment variable, though I don't know if it is still required. Without it, I don't need to factor do_install_or_upgrade() (because it's not called twice).

Locking around the entire upgrade process allows the container to check the version at the right time (so it can tell whether the upgrade is needed, e.g. it might have failed).

I think there is still a problem that the file copy might have completed but occ upgrade might not have finished, but I'm not sure how to detect that. The simplest way is to just always run occ upgrade, which succeeds immediately if there's nothing to do.

Signed-off-by: Remi Rampin <remi@rampin.org>
Signed-off-by: Remi Rampin <remi@rampin.org>
Signed-off-by: Remi Rampin <remi@rampin.org>
Signed-off-by: Remi Rampin <remi@rampin.org>
@remram44 remram44 changed the title Lock Use flock for the upgrade lock Jan 27, 2023
@meonkeys
Copy link
Contributor

meonkeys commented Feb 8, 2023

  1. Ooh, good call to put the lock in /var/www/html, I should have done that. I bet that's why use flock automatic lock for upgrade #1757 failed when @skjnldsv tested it.
  2. If we use flock, NEXTCLOUD_INIT_LOCK is no longer necessary.
  3. Nitpick (apologies in advance): the patch is hard to read because so many lines have changes. I tried ignoring whitespace but that didn't help. I'll try viewing the patch locally with other tools, just curious if there's any way to reduce the number of lines changed so it is more readable.
  4. Why not reopen use flock automatic lock for upgrade #1757 rather than starting from scratch? Unless this is intended to be a different approach?

@remram44
Copy link
Contributor Author

remram44 commented Feb 8, 2023

I know the patch is big, it's because I had to move some code into a function above. I broke that down into separate commits to help with review, 7a0aba9 is nothing else but the move so you can read the rest.

If we remove the NEXTCLOUD_INIT_LOCK variable the diff will be much shorter.

meonkeys added a commit to meonkeys/nextcloud-docker that referenced this pull request Feb 8, 2023
/var/lock is not likely shared across Nextcloud app server containers

Borrowed this idea from @remram44's patch in nextcloud#1905.

Signed-off-by: Adam Monsen <haircut@gmail.com>
@meonkeys
Copy link
Contributor

meonkeys commented Feb 8, 2023

What do you think of #1917 ? I updated that to only change from using a file to using flock, and incorporated your placement of the lockfile in /var/www/html instead of /var/lock. I think we should do the refactoring separately. Unless I'm missing something and it is required to use flock... it just looks functionally/logically separate at first glance.

meonkeys added a commit to meonkeys/nextcloud-docker that referenced this pull request Feb 8, 2023
/var/lock is not likely shared across Nextcloud app server containers

Borrowed this idea from @remram44's patch in nextcloud#1905.

Signed-off-by: Adam Monsen <haircut@gmail.com>
@remram44
Copy link
Contributor Author

remram44 commented Feb 8, 2023

The only reason for the factoring is to support NEXTCLOUD_INIT_LOCK, ie this part:

docker/docker-entrypoint.sh

Lines 226 to 237 in 295cdf7

if [ -n "${NEXTCLOUD_INIT_LOCK+x}" ]; then
(
if ! flock -n 9; then
# If we couldn't get it immediately, show a message, then wait for real
echo "Another process is initializing Nextcloud. Waiting..."
flock 9
fi
do_install_or_upgrade
) 9> /var/www/html/nextcloud-init-sync.lock
else
do_install_or_upgrade
fi

If there is no condition there is no need to factor into a function at all.

@meonkeys
Copy link
Contributor

meonkeys commented Feb 8, 2023

NEXTCLOUD_INIT_LOCK is not necessary if flock works as we intend (in every situation / image / etc).

And I think I finally get what you're doing now: no need to have other containers waiting longer and longer, use a blocking flock if another container/process already used flock on /var/www/html/nextcloud-init-sync.lock. Cool!

I was trying to change fewer things. I assumed there was perhaps another reason for the increasing waits for competing containers/processes so I left it as-is. Your approach seems cleaner, so why not do that?

Let me know what you think of #1917 or go ahead and make changes in this one... I don't care which patch moves forward.

@remram44
Copy link
Contributor Author

remram44 commented Feb 8, 2023

I removed the NEXTCLOUD_INIT_LOCK condition (and the now-useless function).

The difference between our patches at this point is that I grab the lock earlier, before we load the version information, and that I show a message if the lock can't be acquired immediately. I also updated the README about the removed environment variable.

Signed-off-by: Remi Rampin <remi@rampin.org>
Signed-off-by: Remi Rampin <remi@rampin.org>
Signed-off-by: Remi Rampin <remi@rampin.org>
@meonkeys
Copy link
Contributor

meonkeys commented Feb 8, 2023

Makes sense. I just needed some help understanding it, thank you. Great work! Seems like all upside to widen the scope of the locked region to include the read (version check) and write (install/upgrade) operations.

Let's test this and save the maintainers some time. See acceptance test steps at #1760 (comment)

@meonkeys
Copy link
Contributor

meonkeys commented Feb 8, 2023

LGTM. I ran the test mentioned above (with some tweaks) on 25/apache and 25/fpm-alpine. Locking with flock worked as expected for both.

:shipit:

@remram44
Copy link
Contributor Author

Is the CI failing something I have to deal with? I can't find a specific error from it.

@meonkeys
Copy link
Contributor

@skjnldsv sorry to be a pest -- would you mind reviewing this? LGTM, FWIW

@skjnldsv
Copy link
Member

@meonkeys unfortunately I am lacking available time.
Have you manually tested this ? If so can you both take care of the reviewing?
If confirmed everything is in order, I can approve and merge for you

@meonkeys
Copy link
Contributor

meonkeys commented Feb 24, 2023

@skjnldsv I've reviewed and tested the patch following the method in #1760 (comment).

@remram44 will you test this too? Or perhaps just comment on your confidence... I'm unsure about how well flock works if the filesystem is samba, nfs, or anything besides overlayfs, basically. I guess you already said that's already good to go. The other thing is the code, you moved the "critical section" (the part that is locked) a bit. LGTM, so maybe just give it another once-over and add your "ship it" / thumbs up.

@remram44
Copy link
Contributor Author

Locks should work on any filesystems that claims to be "POSIX" or similar. I have tested on CephFS and it works ✔️, NFS/GlusterFS/SeaweedFS/JuiceFS/Lustre should work too.

@meonkeys
Copy link
Contributor

meonkeys commented Mar 1, 2023

@skjnldsv sounds like we're good to go!

@skjnldsv
Copy link
Member

skjnldsv commented Mar 1, 2023

Restarted CI

@skjnldsv skjnldsv merged commit aac4d09 into nextcloud:master Mar 1, 2023
@remram44 remram44 deleted the lock branch March 12, 2023 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use flock for the upgrade lock
3 participants