Optimise database queries #218

Closed
thenameiwantedwastaken opened this issue May 8, 2020 · 36 comments

@thenameiwantedwastaken

Every time I try to access my recipes page I get a 504 timeout. If I refresh the page, it works okay. All other pages of my Nextcloud instance work fine, so it seems there is something about the recipe app that is simply too slow. I am running Nextcloud via Docker / NGINX. I don't think it's a problem with my specific Nextcloud setup, as all other aspects of Nextcloud appear quickly.

Navigating within the recipes is fine once the recipe page has shown; it's just the initial view that is too slow.

@mrzapp
Contributor

mrzapp commented May 8, 2020

I am using the same setup as you, Docker with NGINX, and I've never observed this behaviour. You're also the first to open an issue like this, so I assume there is something else going on.

I can only imagine a timeout would occur if the filesystem is being slow for some reason. Are your recipes stored on a network mount that might not respond at times?

Also, if you can find some relevant log messages, it would help shed some light on the situation.

@thenameiwantedwastaken
Author

Recipes are stored in the same folder structure as all the other Nextcloud files. I can browse the recipe images and JSON files without delay via the Nextcloud folder interface. However, if I actually try to open a JSON file to preview it in the Nextcloud interface, I just get a spinning circle. This happens with all JSON files, not just recipes.

@christianlupus
Collaborator

Do you maybe have a plugin installed in NC that tries to start up and open the JSON?

@mrzapp
Contributor

mrzapp commented May 12, 2020

@thenameiwantedwastaken are you perhaps using an older NC version than 18?

@dietrichmd

dietrichmd commented May 17, 2020

I too have this issue. When I load Cookbook, I get a 504 error; upon reload, it loads fine.

Looking at the logs, it looks like a PHP memory error:

[proxy_fcgi:error] [pid 17264] [client 172.69.71.97:63198] AH01071: Got error 'PHP message: PHP Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 4096 bytes) in /home/drive.braeswood.pl/public_html/3rdparty/composer/autoload_static.php on line 446\n', referer: https://drive.braeswood.tk/index.php/apps/cookbook/

Using the latest and greatest Nextcloud (updated 2 days ago) -- virtual memory is set per the Nextcloud instructions, and this instance is allocated 512M for PHP-FPM.

Included apps in nextcloud: https://i.imgur.com/fuIsXW7.png

In addition to the 504, occasionally I'll click a category and cookbook will just sit there like the link doesn't even exist. A reload will fix this behavior as well.

Thanks

@stenosis

Just wanted to mention that I can confirm the same behavior described here, receiving a 504 on the first load.

@mrzapp
Contributor

mrzapp commented Jun 7, 2020

Is this still the case for v0.7.0?

@mrzapp mrzapp self-assigned this Jun 7, 2020
@mrzapp mrzapp added the bug Something isn't working label Jun 7, 2020
@stenosis

stenosis commented Jun 7, 2020

I just updated the Cookbook app to version 0.7.1 on the latest Nextcloud version 19.0.0, and unfortunately the problem still exists for me.

For info: I use a Nextcloud Docker installation which gets automatically updated to the latest version. In addition, it uses a PostgreSQL 12 database.

There are plenty of resources available, and every other app in Nextcloud works fine for me, so this problem only occurs with the Cookbook app. And a look at the Nextcloud log file shows no entries related to this.

@mrzapp
Contributor

mrzapp commented Jun 8, 2020

@stenosis @dietrichmd @thenameiwantedwastaken do you have any interesting setups related to file storage? For instance, are your recipes stored on a remote volume?

In any case, I think the best way to debug this would be to back up your recipe folder and then add recipes back one by one until you hit the problem again. It could be content related, seeing as you're the only ones dealing with this issue.

@dietrichmd

@mrzapp nothing weird or interesting here -- everything is local. On the recipe side, I have 20 in there, so definitely not a content-size issue. I'll try to remove and re-add them over the next few days.

@dietrichmd

@mrzapp I will say, after yesterday's update, things are snappier. The initial load of the app sometimes hangs and I have to completely reload Nextcloud (web) for it to come up. And after swapping between recipes for a few minutes it seems to get sluggish, but not nearly as slow as before, and no timeout errors (for now).

@thenameiwantedwastaken
Author

Here's what I found after adding recipes one-by-one:

  1. The more recipes you have, the longer the recipes page takes to display. A lot of recipes leads to a timeout. For me, it's a little over one second per recipe.
  2. The update interval affects the above: the delay only occurs if the time since the recipes page was last viewed is longer than the update interval.

A temporary fix is to increase the update interval to a large number and manually use the Rescan Library button under the settings options after you add a recipe. If you set the update interval to 1440 minutes, the delay mentioned in point 1 above will only occur once a day.

Could it be that rescanning the library actually happens when you view the recipes page, if the update interval has passed since the last view? If that is the case, it's not a great design; the rescan would be better done in the background.

I hope this helps.

@mrzapp
Contributor

mrzapp commented Jun 29, 2020

@thenameiwantedwastaken it does indeed work like that, because cron jobs need to be set up manually by users. This is what I have found on the topic: https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/background_jobs_configuration.html?highlight=background%20jobs

Relying on this would mean no auto rescans until users perform this configuration, so I don't think it's feasible, seeing as PHP is a per-request runtime, and NextCloud doesn't enable any workarounds for that by default.

We could batch the recipes, though, that would certainly speed things up a bit. How many recipes are in your library, and how long does a rescan typically take?

@thenameiwantedwastaken
Author

I have 44 recipes, and rescanning the library takes 49 seconds. As it is, this project is really unusable. What actually happens in a rescan? Why and when is it required? Maybe we could have an option to set the rescan interval to zero, for manual rescans only. But the bigger picture is that whatever system is currently being used to store recipes is limited by the number of recipes. Maybe a proper database is required for this project going forward.

@mrzapp
Contributor

mrzapp commented Jul 1, 2020

@thenameiwantedwastaken OK, something is iffy on your end, then. I have several hundred recipes in my library and a rescan takes less than one second.

What happens is that the app reads every recipe file you've got and then saves details such as names, keywords and categories in a database for quick access. It's quite necessary, otherwise we'd have to read the entire library every time we load a view, which would take even longer.

It's tough to debug, as only a few users are experiencing this, but apparently reading files is extremely slow on some instances. How are other apps acting, like the gallery app? I assume all files are local and not on some slow external volume like S3.

@thenameiwantedwastaken
Author

My files are on a local USB drive (USB3). No other files on that drive are 'slow'. Gallery app is fine - the page loads in less than a second, and images appear as I scroll.

@stenosis

stenosis commented Jul 2, 2020

In the meantime I also played around with the number of recipes. It makes no difference whether I have a hundred or twenty recipes in the library: a rescan takes about 60 seconds, during which I only see the 504 error in the Cookbook app.

I understand that this is a strange error and hard to debug when you can't reproduce it. Every other Nextcloud app (gallery, music and so on) works snappily and flawlessly; only the Cookbook doesn't. Very strange indeed.

Would it be possible to add an optional debug log inside the library scan process? I can't find any useful information in the Nextcloud log while this error occurs.

@mrzapp
Contributor

mrzapp commented Jul 3, 2020

@stenosis I would, but I don't know what to log, since there is no actual error. I think the way to go about it would be to use some makeshift PHP profiling to figure out which exact function call is eating up CPU time. But again, we can't do that if we can't reproduce it :/

If you could do that on your instance (either in the /lib/Service/RecipeService.php or the /lib/Db/RecipeDb.php file) and show us what you find, we could definitely move on to some sort of solution.

This answer outlines a really simple way to measure PHP execution time. You could modify the script to output to the NextCloud log, or just echo it out.
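For reference, a minimal sketch of that kind of makeshift timing (the placement and the call being timed are hypothetical stand-ins; it assumes an OCP\ILogger instance is injected as $this->logger):

// Makeshift timing around a suspect call, e.g. inside RecipeService.php.
// findAllRecipes() is a hypothetical stand-in for whatever call is suspect.
$start = microtime(true);
$recipes = $this->db->findAllRecipes($this->userId);
$elapsed = microtime(true) - $start;
// Write the elapsed time to the Nextcloud log instead of echoing it.
$this->logger->warning(sprintf('findAllRecipes took %.3f s', $elapsed));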

@Mageti

Mageti commented Jul 15, 2020

I encounter the very same problem (49 recipes, Nextcloud in Docker).
And an auto-rescan in cron is perfectly doable; that's what other Nextcloud apps do.

In fact, Nextcloud already has a cron mechanism. The link given by @mrzapp describes how the Nextcloud instance admin chooses how cron is scheduled. By default, it relies on a background AJAX call made by each visitor, whatever Nextcloud page they load. As for me, as a Nextcloud admin, I chose a side Docker container dedicated to running the Nextcloud cron scripts.
But how cron jobs are executed in Nextcloud has no bearing on how you, as an app developer, could/should code it. If you, as a dev, install a new cron task inside Nextcloud, it will always be executed, no matter how the instance admin configured it. The only problem that could occur is this edge case:

  • Nextcloud cron is set to the default, executed by a background AJAX call
  • there is a very low number of visits to Nextcloud (e.g. one visit per week), meaning that the cron task would be launched on almost every request
  • the first page the visitor opens is the Cookbook app.
    In that case, the user would have to rely on "old" data on their first page load.

In every other case, the library rescan is done in the background, thanks to Nextcloud's centralised cron approach, and does not impact front-end users. It will not fail or time out in front of users, nor slow down their pages.
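For illustration, a minimal sketch of such an app-installed cron task using Nextcloud's OCP\BackgroundJob\TimedJob base class (the class name, the RecipeService method and the interval are hypothetical, and the job would still have to be registered in the app's info.xml):

<?php
namespace OCA\Cookbook\Cron;

use OCA\Cookbook\Service\RecipeService;
use OCP\AppFramework\Utility\ITimeFactory;
use OCP\BackgroundJob\TimedJob;

class RescanLibraryJob extends TimedJob {
    private $recipeService;

    public function __construct(ITimeFactory $time, RecipeService $recipeService) {
        parent::__construct($time);
        $this->recipeService = $recipeService;
        // Run at most once per hour, however the admin has set up cron.
        $this->setInterval(3600);
    }

    protected function run($argument) {
        // Hypothetical entry point for the library rescan.
        $this->recipeService->rescanLibrary();
    }
}

With something like this in place, the rescan runs on whichever cron backend (AJAX, webcron or system cron) the admin configured, and the front end only ever reads already-indexed data.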

@christianlupus
Collaborator

I think I am affected by this issue as well. Recently I updated the app to something >= 7.x.x, and after that it mostly started. I cannot tell exactly, as I first thought connection problems were causing this. Also, once the app has loaded successfully for the first time, the issue disappears, so I did not give it too much credit. Now it happens every now and then, whenever I want to open the app because I need something from my recipes.

My setup is a Docker installation with all files locally on the HDD. No remote files on S3 or the like. I could offer to provide a test account on my installation and install a dev version of the Cookbook app, if that would help.

@mrzapp
Contributor

mrzapp commented Jul 21, 2020

@Mageti it's cool that apps can install crons programmatically, but I'm still just puzzled by how it can take so long for NextCloud to parse a handful of files for some users and not others. I don't feel like we're getting to the bottom of the issue without pinpointing exactly which function is eating the CPU.

@christianlupus it's a good thought, but I can't do much without PHP profiling abilities (meaning root access to the runtime environment)

If either of you guys could manage to dig out exactly which function call is causing the issue, we'd be in a good place to fix this. I'm not really keen on doing hacky workarounds instead of fixing the root cause.

@christianlupus
Collaborator

@mrzapp I will see if I can rework those two files to do some basic profiling/tracing, but it will take some time until I get results.

@christianlupus
Collaborator

christianlupus commented Jul 27, 2020

OK, instead of tweaking the codebase, I decided to go all in: I added Xdebug to the Docker image and created complete profiling files, one for loading the app and one for a rescan of the library. You can use programs like KCachegrind, WinCacheGrind or webgrind to look into these files.

Two things I have to remark:

  1. The browser did not cause a timeout; the app loaded normally (but slowly). Unfortunately I have no way to trigger the issue explicitly. If anyone has a clear test case that makes the problem show up reliably, please post it here so we might be able to use my installation for testing purposes.
  2. By far the most time in the rescan is spent on a huge number of database accesses: about 30 database requests per recipe.

[Screenshot: profiling call graph]

@Mageti

Mageti commented Jul 27, 2020

  2. By far the most time in the rescan is spent on a huge number of database accesses: about 30 database requests per recipe.

💡
That could be it: I use an SQLite database on a spinning HD (sufficient for me, as I only have 2 users).
30 database requests × 50 recipes for me => 1,500 requests. Is that right?

@christianlupus
Collaborator

I am using a spinning HDD RAID as well. In my lower image, for the 58 recipes in my folder (look at the branching point), you can see the 58 accesses to the JSON files in the right branch, which cause 11% of the time, plus 1,641 database accesses, which cause 77% of the time.

@Mageti as you wrote you are using Docker as well, I can give you my derived Dockerfile. Then you could create a profiling run yourself if you want to give it a try.

@mrzapp
Contributor

mrzapp commented Aug 2, 2020

@christianlupus great work! The number of database accesses is pretty outrageous; there must be a way to compose a more efficient query than the one we have currently. I am unfortunately no wizard in this area, though. Are you perhaps more proficient in it than me?

This also explains the huge gap between the execution times users are experiencing, since it depends on which database they're using and how many records are in it.

@christianlupus
Collaborator

@mrzapp I might be able to look into this, but I have quite a pile of work in front of me (that is of higher prio unfortunately), so I cannot promise this will happen very soon, sorry.

@mrzapp
Contributor

mrzapp commented Aug 3, 2020

@christianlupus no worries, this isn't urgent, as far as I'm concerned

@Mageti

Mageti commented Aug 5, 2020

Even though I'm not fluent in PHP, I started taking a look at the corresponding code: https://github.com/mrzapp/nextcloud-cookbook/blob/e39a8ea7ad67fd73913c9be1da7c97bec8da5d3f/lib/Db/RecipeDb.php#L231
If I understand the code correctly, this is what happens (translated into SQL) for every recipe:

-- once per recipe:
DELETE FROM cookbook_names WHERE recipe_id=$id AND user_id=$user;
INSERT INTO cookbook_names (recipe_id, name, user_id) VALUES ($id, $name, $user);
DELETE FROM cookbook_keywords WHERE recipe_id=$id AND user_id=$user;

-- now for every keyword in the recipe:
INSERT INTO cookbook_keywords (recipe_id, name, user_id) VALUES ($id, $keyword, $user);

-- again, once per recipe:
DELETE FROM cookbook_categories WHERE recipe_id=$id AND user_id=$user;
INSERT INTO cookbook_categories (recipe_id, name, user_id) VALUES ($id, $category, $user);

So, for every recipe, we:

  • delete the recipe name
  • recreate the recipe name
  • delete the recipe category
  • recreate the recipe category

and for every keyword in every recipe, we:

  • delete the recipe keyword
  • recreate the recipe keyword

OK, now I understand why there are so many DB queries, and why it takes so much time. SQLite only worsens and highlights the problem.

I will see if I can simplify that, but I'm fluent in neither PHP nor the Cookbook app, so I need to understand why (and whether) the app needs to delete and recreate all of that every time.
Second question: if this mechanism really is necessary, could an SQL UPDATE be enough (number of queries divided by two)?
Last thing: if the delete+recreate (or update) is mandatory every time, all the work for the keywords could be done in only 1 or 2 queries.

@christianlupus If you could enlighten me on these two questions, I could try to enhance this part of the code to only have 3 or 4 SQL queries per recipe instead of the current 4 + 2 × nb_of_keywords.
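For what it's worth, a rough sketch of the batching meant in the last point, assuming direct access to an OCP\IDBConnection as $db (table and column names are taken from the SQL above; the variable names are hypothetical):

// One DELETE plus one multi-row INSERT per recipe,
// instead of one INSERT per keyword.
$db->executeUpdate(
    'DELETE FROM `*PREFIX*cookbook_keywords` WHERE `recipe_id` = ? AND `user_id` = ?',
    [$recipeId, $userId]
);

if (count($keywords) > 0) {
    // Build "(?, ?, ?), (?, ?, ?), ..." with one tuple per keyword.
    $tuples = implode(', ', array_fill(0, count($keywords), '(?, ?, ?)'));
    $params = [];
    foreach ($keywords as $keyword) {
        array_push($params, $recipeId, $keyword, $userId);
    }
    $db->executeUpdate(
        'INSERT INTO `*PREFIX*cookbook_keywords` (`recipe_id`, `name`, `user_id`) VALUES ' . $tuples,
        $params
    );
}

That would turn the 1 + nb_of_keywords keyword queries into exactly two, regardless of how many keywords a recipe has.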

@macowie

macowie commented Aug 6, 2020

I've been using the Cookbook app for over a year or so (thank you!) and I've always had this extremely slow startup; it has now grown to 30+ seconds. I currently have 57 recipes stored. I am running a dockerised setup inside a VM (the VM is on an SSD, FWIW, but Nextcloud + a second Nextcloud container running cron + Postgres + Redis run off a spinning disk array). I do have plans to migrate it all to SSDs, but Cookbook is still an order of magnitude slower than any other NC app on this setup.

I'm no PHP or Nextcloud dev expert either, but if what @Mageti found is right, it seems like it could at the very least benefit from batching up the DB actions: scan the JSON files, gather the names/keywords/categories in memory, upsert into each table, and delete the ones not in the list.

@christianlupus
Collaborator

I want to be honest with you: I do not know why exactly this code was written like that. In fact, your analysis of the PHP code is correct (about the removal and re-adding). Most probably it has simply grown into this state.

I assume the delete-and-recreate ensures that the newest data is always present in the DB. After looking at the code, I thought about reading all recipes from the DB in one go (comparatively quick), then iterating over all results to see if

  1. everything is up to date regarding this recipe,
  2. the metadata needs an update, or
  3. the recipe has to be removed from the DB.
  4. Additionally, new recipes need to be marked as well.

All this happens only in PHP variables, which makes it very fast. Then the difference between the current state and the desired state is calculated, and the corresponding SQL commands can be executed (optimised with prepared statements where possible). This should make the whole reindexing much faster.

We only have to be careful about access by multiple users at the same time (the user_id column). I do not see why an UPDATE might not work in general (for case 2 above!).
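A small sketch of the in-memory diff described above, using plain arrays keyed by recipe id ($inDb and $onDisk are hypothetical names):

// $inDb: recipe_id => name as currently stored in the DB.
// $onDisk: recipe_id => name as just parsed from the JSON files.
$toInsert = array_diff_key($onDisk, $inDb);   // case 4: new recipes
$toDelete = array_diff_key($inDb, $onDisk);   // case 3: recipes removed from disk
$toUpdate = [];
foreach (array_intersect_key($onDisk, $inDb) as $id => $name) {
    if ($inDb[$id] !== $name) {
        $toUpdate[$id] = $name;               // case 2: metadata changed
    }
}
// Case 1 (unchanged recipes) needs no SQL at all; only the three
// difference sets above have to be written back to the database.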

@mrzapp
Contributor

mrzapp commented Aug 11, 2020

@Mageti That's my bad, I am just not very proficient with relational databases, and at the time this part of the app was written, I was the only user. Trust me, there is no good reason for the queries being structured like that, so if either of you guys want to take a stab at optimising them, by all means, go nuts.

@mrzapp mrzapp changed the title from "504 Gateway Time-out" to "Optimise database queries" Aug 11, 2020
@christianlupus
Collaborator

christianlupus commented Aug 11, 2020

@Mageti did you already start digging into this issue? I considered looking into it tomorrow, but I want to avoid doing duplicate work. Would you share your current state with me, please?

@Mageti

Mageti commented Aug 11, 2020

@christianlupus go ahead, please don't mind me 😃
As I am not fluent in PHP and didn't know the DB library used here before this, I struggled to really get anywhere. On top of that, I will be away for some time (summer holidays with the family).
So, consider that I made nothing. I only thought of doing something when you wrote that you had "quite a pile of work in front of you (that is of higher prio unfortunately)". If that has changed, I'll gladly let you proceed 👍

@christianlupus
Collaborator

I created a draft PR to share my current state of work here.

Unfortunately, I was not able to test it on the same machine as above, but at least with mostly the same data set.
Independently of SSD vs. HDD, reading the JSON files takes around 2 s.

Reindexing with the old implementation (and spinning disks) took 60 × 0.5 s = 30 s for the database operations.
The first call of the new implementation (SSD) took around 5 s to index all files into the DB (the DB was empty before), plus the 2 s for reading the files. A second run, where the DB was already initialised, finished after 0.2 s (plus 2 s for reading the JSON).

As the latter is the main case for regular DB checks, this should avoid the timing issue, if it was really caused by the DB access in the first place.

@christianlupus
Collaborator

I will close this issue, as I merged the PR. This should be rolled out with the next update of the app. If the issue persists after that, please write a comment and I will reopen it.
