-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock in SCSSCacher #21059
Fix deadlock in SCSSCacher #21059
Conversation
Fixes nextcloud#15794 (comment) Signed-off-by: Leo Le Bouter <lle-bout@zaclys.net>
I don't think so but that code is quite hard to understand. If you see |
@kesselb Yes, indeed. Either way this part is buggy so... I'm for recoding this clearly over-engineered piece of code. |
O man that code... yeah it needs to die... I'm currently a bit puzzled as to why the unlock there fixes things... as it should not be locked at that stage or? |
Me too. Isn't failed to compile and/or save ..../css/results.scss the actual problem? I added the screenshot from your comment to the post. |
@rullzer I think it's that if it gets to this point it's likely there's a deadlock and therefore this could release the buggy lock |
I think this code is responsible for lots of increased latency on each page load because it gets called on every page (I think) |
Maybe. SCSSCacher is responsible to compile the scss and theming app customizations together. That should happen once and the result cached. It seems to be (from the screenshot) that this is actually broken (on your setup) and the compile and cache logic is executed for any request. You are probably right the code is scary but I would try to fix the other issue (disable a custom theme, disable theming app, etc.) first. |
Hey, well I don't have any theming app installed or any sort of visual customization. Does the Register app count as customization? It adds a Register button on the login page.
I have a really standard setup, no weird fuss. I use apache2 with php7.3-fpm, Ubuntu 18.04. I value long term maintenance and the least effort principle as a sysadmin. I have this instance running since a year or so without any Nextcloud related issue at all for that time, until now. This issue started happening out of nowhere, I assume it can be due to an extension updating, how could I figure out which one is it? I have some cron job to automatically update apps to their latest versions every now and then. |
I'm able to reproduce the situation locally:
This pull request "fixes" the symptoms and not the problem. Isn't there something else in your logs? |
Here's my config (it's the one I put in my admin docs, it hasnt been changed manually ever since, but Nextcloud changed the version automatically for example, I also tried raising the log level to see more info but couldnt find anything interesting):
And I got nothing else in my logs How can I query the filecache table? The integrity checks pass and the scss files do exist |
So I started reverting the temporary patch I applied to save my users a headache, disabled the theming app, did not experience the slowness, enabled it again, can't experience the slowness either. So it stopped reproducing for now, and nothing related in the logs either.. ugh - few extension updates have happened since I opened this PR, so it may be it. |
You might use occ maintenance:repair to clear the scss cache (and probably trigger the problem again). |
I did that and it did not make it happen again, however, it logged two errors, and threw HTTP 500 then I refreshed a few times more and it started working OK.
|
Issue reproduced again today somehow |
@@ -175,6 +175,10 @@ public function process(string $root, string $file, string $app): bool { | |||
$retry++; | |||
} | |||
$this->logger->debug('SCSSCacher: Giving up scss caching for '.$lockKey, ['app' => 'core']); | |||
|
|||
// Cleaning lock | |||
$this->lockingCache->remove($lockKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this process basically never locked, so why should it remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickvergessen It's being dead-locked somewhere but I don't know where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but if multiple requests are running parallel,
one would unlock for the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickvergessen Yes, I do not understand the current code fully, I think it's acknowledged it's not coherent, I have applied that patch in a live environment that fixed the symptoms but not the root cause. I will free time at some point to investigate this further if no one has done it yet..
@juliushaertl and I looked into this and #23478 should fix the root cause for the issue. It would then allow to use the newly cached files and properly return. |
Let's close this here and continue in the linked PR. |
Fixes #15794 (comment)
It seems to me that the lock needs to be released in this return path too, otherwise it would create a deadlock on every next run. Please double check for me, I'm wholly unfamiliar with the code base. Either way, it solves the slowness in the production deployment I maintain.
It would be great if this change could be backported to 18.x and previous releases.