Use flock for the upgrade lock #1905
Signed-off-by: Remi Rampin <remi@rampin.org>
I know the patch is big; that's because I had to move some code into a function above. I broke it down into separate commits to help with review: 7a0aba9 is nothing but the move, so you can read the rest separately. If we remove the
/var/lock is not likely to be shared across Nextcloud app server containers. Borrowed this idea from @remram44's patch in nextcloud#1905. Signed-off-by: Adam Monsen <haircut@gmail.com>
What do you think of #1917? I updated it to only change from using a file to using
The only reason for the factoring is to support lines 226 to 237 in 295cdf7.
If there is no condition, there is no need to factor it into a function at all.
And I think I finally get what you're doing now: no need to have other containers waiting longer and longer, use a blocking
I was trying to change fewer things. I assumed there was perhaps another reason for the increasing waits for competing containers/processes, so I left it as-is. Your approach seems cleaner, so why not do that? Let me know what you think of #1917, or go ahead and make changes in this one. I don't care which patch moves forward.
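A minimal sketch of the blocking approach, written in Python only for illustration: `fcntl.flock` wraps the same flock(2) system call that the flock(1) command uses. The lock file name and the helper names (`run_with_lock`, `demo`) are hypothetical and not taken from the patch. A waiter sleeps in the kernel until the holder releases the lock, so no retry loop with growing delays is needed.

```python
import fcntl
import os
import tempfile
import threading
import time


def run_with_lock(lock_path, work):
    """Run work() while holding an exclusive flock on lock_path.

    LOCK_EX without LOCK_NB blocks until the current holder releases the
    lock, so there is no polling and no increasing sleep between retries.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return work()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def demo():
    """Two workers contend for the same lock file; their work never overlaps."""
    events = []

    def work():
        events.append("start")
        time.sleep(0.05)  # simulate a slow install/upgrade
        events.append("end")

    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "init.lock")  # hypothetical file name
        workers = [
            threading.Thread(target=run_with_lock, args=(lock_path, work))
            for _ in range(2)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
    return events
```

Each call opens the lock file separately, and flock(2) treats separate open file descriptions independently, which is why the two threads exclude each other even inside one process.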
I removed the
The difference between our patches at this point is that I grab the lock earlier, before we load the version information, and that I show a message if the lock can't be acquired immediately. I also updated the README about the removed environment variable.
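The "show a message only if the lock is busy" behavior could look roughly like this in Python (`fcntl.flock`, the same flock(2) call as the flock(1) command). The message text, the `notify` callback and the file name are assumptions for illustration, not the patch's code. A non-blocking attempt (`LOCK_NB`) succeeds silently when the lock is free; only when it is busy does the caller emit a notice and fall back to a blocking wait.

```python
import errno
import fcntl
import os
import tempfile
import threading


def acquire_with_notice(lock_file, notify=print):
    """Take an exclusive flock, announcing it only if we have to wait.

    Returns True if the lock was busy at first, False if it was free.
    """
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)  # try without waiting
        return False
    except OSError as exc:
        if exc.errno not in (errno.EAGAIN, errno.EACCES):
            raise  # a real error, not just "someone else holds the lock"
    notify("Another process is initializing. Waiting...")  # hypothetical text
    fcntl.flock(lock_file, fcntl.LOCK_EX)  # now wait for real
    return True


def demo():
    """Return (first, second): whether each acquirer had to wait."""
    waiting = threading.Event()
    result = {}

    def contender(path):
        with open(path, "a") as other:
            # Signal the demo instead of printing, so it knows when to release.
            result["second"] = acquire_with_notice(other, notify=lambda msg: waiting.set())

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "init.lock")  # hypothetical file name
        with open(path, "a") as holder:
            result["first"] = acquire_with_notice(holder)
            t = threading.Thread(target=contender, args=(path,))
            t.start()
            waiting.wait(timeout=5)  # contender found the lock busy
            fcntl.flock(holder, fcntl.LOCK_UN)
            t.join()
    return result["first"], result["second"]
```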
Makes sense. I just needed some help understanding it, thank you. Great work! Widening the scope of the locked region to include both the read (version check) and the write (install/upgrade) operations seems like all upside. Let's test this and save the maintainers some time. See the acceptance test steps at #1760 (comment).
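To see why the version check belongs inside the locked region, here is a hypothetical Python sketch (the version file, `init_once` and the version string are made up for illustration). If two containers both read the version before either takes the lock, both decide an upgrade is needed. With the read, the upgrade and the write all under one flock, the second container sees the new version and skips the work.

```python
import fcntl
import os
import tempfile
import threading
import time


def init_once(lock_path, version_path, image_version, do_upgrade):
    """Check the installed version and upgrade if needed, all under one lock.

    Returns True if this caller performed the upgrade.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # released when the file closes
        try:
            with open(version_path) as f:
                installed = f.read().strip()
        except FileNotFoundError:
            installed = ""  # fresh volume: nothing installed yet
        if installed == image_version:
            return False  # another container already did the work
        do_upgrade()
        with open(version_path, "w") as f:
            f.write(image_version)
        return True


def demo():
    """Two containers start at once; only one of them upgrades."""
    upgrades = []
    results = []

    def slow_upgrade():
        time.sleep(0.05)
        upgrades.append(1)

    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "init.lock")       # hypothetical
        version_path = os.path.join(tmp, "version.txt")  # hypothetical

        def container():
            results.append(init_once(lock_path, version_path, "25", slow_upgrade))

        threads = [threading.Thread(target=container) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sorted(results), len(upgrades)
```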
LGTM. I ran the test mentioned above (with some tweaks) on 25/apache and 25/fpm-alpine. Locking with
Is the CI failure something I have to deal with? I can't find a specific error in its output.
@skjnldsv sorry to be a pest, but would you mind reviewing this? LGTM, FWIW.
@meonkeys unfortunately I don't have much time available.
@skjnldsv I've reviewed and tested the patch following the method in #1760 (comment). @remram44 will you test this too? Or perhaps just comment on your confidence... I'm unsure how well flock works if the filesystem is Samba, NFS, or basically anything besides overlayfs. I think you already said that's fine. The other thing is the code: you moved the "critical section" (the part that is locked) a bit. LGTM, so maybe just give it another once-over and add your "ship it" / thumbs up.
Locks should work on any filesystem that claims to be "POSIX" or similar. I have tested on CephFS and it works ✔️; NFS/GlusterFS/SeaweedFS/JuiceFS/Lustre should work too.
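One way to sanity-check a particular mount is a small probe like the following hypothetical Python sketch: it holds an exclusive flock and asks a second process to take the same lock without blocking. It uses a separate process rather than a second descriptor in the same process because the Linux NFS client emulates flock() with fcntl byte-range locks, which may not conflict within a single process. It only checks behavior from one host; it cannot show that locks are honored between different clients of a network filesystem, which is the case that matters for several app containers.

```python
import fcntl
import os
import subprocess
import sys

# Child process: exit 1 if the lock is busy, 2 on any other error, 0 if free.
_CHILD = """
import errno, fcntl, sys
with open(sys.argv[1], "a") as f:
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        sys.exit(1 if exc.errno in (errno.EAGAIN, errno.EACCES) else 2)
sys.exit(0)
"""


def flock_blocks_other_process(directory):
    """Return True if a flock held here stops another process on this host."""
    path = os.path.join(directory, ".flock-probe")  # hypothetical file name
    with open(path, "a") as held:
        try:
            fcntl.flock(held, fcntl.LOCK_EX)  # released when the file closes
            child = subprocess.run([sys.executable, "-c", _CHILD, path])
        finally:
            os.remove(path)  # unlinking an open file is fine on POSIX
    if child.returncode == 2:
        raise OSError("flock failed for a reason other than contention")
    return child.returncode == 1
```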
@skjnldsv sounds like we're good to go! |
Restarted CI |
Fixes #1903
This keeps the NEXTCLOUD_INIT_LOCK environment variable, though I don't know if it is still required. Without it, I wouldn't need to factor out do_install_or_upgrade() (because it's not called twice). Locking around the entire upgrade process allows the container to check the version at the right time, so it can tell whether the upgrade is needed (e.g. a previous one might have failed).
I think there is still a problem: the file copy might have completed while occ upgrade has not finished, and I'm not sure how to detect that. The simplest fix is to always run occ upgrade, which succeeds immediately if there's nothing to do.
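A hypothetical Python sketch of the "always run the upgrade" idea: a dict stands in for the persistent volume, and `occ_upgrade` is a stand-in for the real occ upgrade command (no real occ call is made). The first run is interrupted after the file copy. Deciding from the copied files alone would then skip the upgrade forever, while always running the idempotent step finishes it on the next start and is a no-op afterwards.

```python
import fcntl
import os
import tempfile


def occ_upgrade(state):
    """Hypothetical stand-in for `occ upgrade`: a no-op when the database
    already matches the installed files, so it is safe to run every time."""
    if state["db_version"] == state["files_version"]:
        return "nothing to do"
    state["db_version"] = state["files_version"]
    return "upgraded"


def initialize(lock_path, state, image_version, crash_after_copy=False):
    """Copy files if needed, then always run the upgrade step, under one lock.

    `state` is a dict standing in for what persists on the shared volume.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # released when the file closes
        if state["files_version"] != image_version:
            state["files_version"] = image_version  # stands in for the file copy
            if crash_after_copy:
                raise RuntimeError("container stopped between copy and upgrade")
        # Deciding from files_version alone would skip a half-finished upgrade.
        return occ_upgrade(state)


def demo():
    state = {"files_version": "24", "db_version": "24"}  # hypothetical versions
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "init.lock")  # hypothetical file name
        try:
            initialize(lock_path, state, "25", crash_after_copy=True)
        except RuntimeError:
            outcomes.append(dict(state))  # files copied, database not upgraded
        outcomes.append(initialize(lock_path, state, "25"))  # restart finishes it
        outcomes.append(initialize(lock_path, state, "25"))  # later starts: no-op
    return outcomes
```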