Reduce search engine memory usage #9495

Conversation

@AdrienClairembault (Contributor) commented Sep 2, 2021

  • Add LIMIT to search queries

Current search engine

The search engine currently only uses a LIMIT clause if no search criteria are specified.
In that case, it sends a second COUNT(*) query after the main search to know the total number of results.

Any query with defined criteria does not use a LIMIT clause.
The full results are sent to PHP and we only iterate on the first X results (the number of results to display on one page).

This helps execution time, as COUNT queries can be costly, but it is very bad for memory consumption on big databases.
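
For illustration, here is a minimal PHP sketch of that pre-change behaviour; the connection details, table and query are hypothetical, not GLPI code:

// Hypothetical illustration of the current behaviour, not actual GLPI code.
$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');

// With search criteria, no LIMIT is used: every matching row is sent to PHP...
$result = $mysqli->query("SELECT * FROM glpi_computers WHERE name LIKE '%se%'");
$rows = $result->fetch_all(MYSQLI_ASSOC); // may hold 100k+ rows in memory
$page = array_slice($rows, 0, 20); // ...even though only one page is displayed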

Changes to the search engine

Every search request is now accompanied by a second COUNT request.

The mandatory clauses to re-use from the main query are FROM, GROUP BY and HAVING.
The SELECT and FROM clauses must only contain the mandatory fields and joins, to keep the query as fast as possible.

The simplest case is a search without a HAVING clause.
This results in a very simple COUNT query:

SELECT COUNT(*) FROM {main_table} {mandatory joins for the conditions fields} WHERE {condition}

The more complicated case is when we have a HAVING clause, as it forces us to keep fields in the SELECT clause (the HAVING clause filters on selected results and thus cannot run without them in the query).
In this case, we end up with a query like this:

SELECT COUNT(*) FROM (
   SELECT {having fields} FROM {main_table} {mandatory joins for the conditions fields} WHERE {condition} GROUP BY ... HAVING ...
) AS {alias}
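
As a sketch, the COUNT query could be derived from the main query's parts roughly like this (function and parameter names are hypothetical, not the actual GLPI implementation):

// Hypothetical sketch of building the COUNT query; not GLPI's actual code.
function buildCountQuery(
    string $from,              // main table + mandatory joins
    string $where,             // the search condition
    string $havingFields = '', // fields the HAVING clause filters on
    ?string $groupBy = null,
    ?string $having = null
): string {
    if ($having === null) {
        // Simple case: count the filtered rows directly.
        return "SELECT COUNT(*) FROM $from WHERE $where";
    }
    // HAVING needs the selected fields, so wrap the query and count its rows.
    return "SELECT COUNT(*) FROM ("
        . "SELECT $havingFields FROM $from WHERE $where"
        . " GROUP BY $groupBy HAVING $having"
        . ") AS counted";
}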

Benchmark

Here is a quick benchmark on a database with ~140k computers.
We run some searches on computers and check the time spent in the Search::constructData() function as well as the application's memory peak.
I've put in bold the values that stand out as worse.

| Search criteria | 9.5/bugfixes | #9495 |
| --- | --- | --- |
| * | 1.18s (7.64 MiB) | 1.16s (7.60 MiB) |
| name contains 'se' | 0.35s (10.36 MiB) | **0.58s** (7.62 MiB) |
| alternate username contains 'a' and location is not "Caen" | 1.51s (**56.68 MiB**) | 1.43s (7.63 MiB) |
| manufacturer is 'HP' and status is "Status A" | 0.83s (**36.38 MiB**) | 0.73s (7.64 MiB) |
| comments contains 'ab' or serial contains '44' or uuid contains '7' | 1.78s (**72.88 MiB**) | 1.57s (7.68 MiB) |

On the memory consumption side, the results are excellent.
Before these changes, the memory peak would scale with the number of returned results and the number of displayed fields (more fields = more data sent back to PHP).
After these changes, the memory peak always stays roughly the same for a given page size.

On the execution time side, the results are harder to analyze.
The gains/losses all depend on the COUNT query execution time.
If it is almost instantaneous, we gain a little time or break even with the 9.5/bugfixes branch (tests 1, 3, 4 and 5).
If it is not instantaneous, performance is worse: a few extra tenths of a second at best, and in a few rare cases a possible 2x execution time (when the COUNT query takes almost as long as the main query, as in test 2).

  • Remove 'no_search' option

The search engine used this option internally to know if we had a search with no criteria and could thus use the LIMIT clause (as described in the "Current search engine" section above).
Since we now use LIMIT all the time, this option is no longer needed and all associated code has been removed.

  • Remove 'all' criteria

As discussed, the 'all' criteria is outdated and buggy, so this PR is a good opportunity to remove it.

  • Fix duplicate aliases

It seems we had a few cases of duplicate aliases in the SELECT clause of our search queries.
This is not a problem in most simple queries, as MySQL allows duplicates in the results.

It is however a problem with wrapped queries like SELECT COUNT(*) FROM ( {wrapped query} ): duplicated aliases are not allowed in the wrapped query and will cause a SQL error.

There were two kinds of duplicated aliases in the code base:
One of them was on Ticket/Problem/Change and could be fixed in the searchoption by removing an extra field.
The other was a special case of a meta criteria join on the current itemtype (for example, searching for documents on documents).
This case required a fix in the search engine: adding a "META_" prefix to meta criteria aliases in the SELECT and HAVING clauses.
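
For illustration, a hypothetical reproduction of the failure (table and alias names invented for this example):

// Hypothetical reproduction of the duplicate-alias error; names are invented.
$sql = "SELECT COUNT(*) FROM ("
    . " SELECT c.name AS ITEM_1, d.name AS ITEM_1" // duplicate alias
    . " FROM glpi_documents c"
    . " LEFT JOIN glpi_documents d ON d.id = c.id"
    . ") AS counted";
// MySQL rejects the wrapped form with: ERROR 1060: Duplicate column name 'ITEM_1'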

| Q | A |
| --- | --- |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | !22511 |

@orthagh (Contributor) commented Sep 3, 2021

Important point: I liked the previous solution, as we did it in a single query, and I was pretty sure it wouldn't significantly change the query execution time.
A double query means double execution time, and GLPI can already be slow on some pages with complex criteria.
Adding more time could be catastrophic.

@AdrienClairembault changed the title from Reduce search engine memory usage to WIP - Reduce search engine memory usage on Sep 3, 2021
@AdrienClairembault changed the title from WIP - Reduce search engine memory usage to Reduce search engine memory usage on Sep 7, 2021
@AdrienClairembault (Contributor, Author)

Should be all good for reviews now, I've updated the initial description.

@AdrienClairembault (Contributor, Author)

Will this fix be part of 9.5.6?

@cedric-anne (Member) left a comment

It seems OK, but I do not really have enough data to do real tests.

@cedric-anne modified the milestones: 9.5.6, 9.5.7 on Sep 15, 2021
@AdrienClairembault (Contributor, Author) commented Sep 16, 2021

Regarding the latest (6984fa1) commit: it is related to another memory mismanagement that was found in the internal ref.
It turns out we were also loading all search results into a PHP array (taking the results out of a memory-efficient MySQL result set and putting them all into a standard PHP array, row by row).

This isn't an issue when you are browsing a small result set (e.g. a standard 20-entry result page).
However, if you want to export all search results to CSV, you may hit the memory limit very easily if you have a lot of items.

This commit fixes it by using a PHP generator.
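
A minimal sketch of the generator approach (connection details, query and function name are hypothetical, not the actual commit):

// Hypothetical sketch of streaming rows with a generator; not the actual commit.
function iterateRows(mysqli_result $result): Generator
{
    while ($row = $result->fetch_assoc()) {
        yield $row; // only one row is held on the PHP side at a time
    }
}

$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');
$result = $mysqli->query("SELECT id, name FROM glpi_computers");

$out = fopen('php://output', 'w');
foreach (iterateRows($result) as $row) {
    fputcsv($out, $row); // e.g. stream a CSV export without building a giant array
}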

The downside is that the "parsing" process of the result set can now only be used to build $data['data']['rows'].
Before the changes, we would also build $data['data']['currentuser'] and $data['data']['items'].

-> I'm not sure what was in $data['data']['currentuser'], but it seems to be unused data, as the 'currentuser' string isn't used anywhere else in GLPI.
-> $data['data']['items'] seemed to contain the ids of every item returned by the search.
Removing it doesn't break the unit tests and I can't find any code that would depend on it, so it may be unused too.
Note that the items key is used in a lot of places in GLPI, so it's hard to check only the ones related to search results...

@cedric-anne (Member)

IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

@trasher (Contributor) commented Sep 17, 2021

> IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

👍 this should not target the 9.5 branch

@AdrienClairembault (Contributor, Author)

> IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

As @orthagh suggested in our weekly meeting, we can keep this PR open on 9.5 for a few weeks while we wait for the code to be "production tested" on the internal ref, then move and merge it to master later.

@AdrienClairembault (Contributor, Author)

The 4 latest commits reduce the memory used when exporting 140k computers to CSV from 1.4GB to ~65MB.

65MB is still significant, as there is yet another bottleneck: we load a lot of data from the database straight into PHP memory.
Getting rid of this bottleneck would require using unbuffered queries, which would imply much bigger changes to the codebase, so I don't see it happening in the near future.
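
For reference, a hedged sketch of what an unbuffered query looks like with mysqli (connection details and query are hypothetical); the constraint noted in the comment is part of why it would imply bigger changes:

// With MYSQLI_USE_RESULT the server streams rows instead of buffering the
// whole result set in client memory. Connection details are hypothetical.
$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');
$result = $mysqli->query("SELECT id, name FROM glpi_computers", MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    // Process one row at a time; until the result is fully read or freed,
    // no other query can run on this connection.
}
$result->close();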

@AdrienClairembault force-pushed the fix/search-memory-usage-2 branch from 6c07b3d to e206bd8 on October 4, 2021
@AdrienClairembault changed the base branch from 9.5/bugfixes to master on October 4, 2021
@AdrienClairembault marked this pull request as draft on October 4, 2021
@AdrienClairembault marked this pull request as ready for review on October 14, 2021
@cedric-anne marked this pull request as draft on November 23, 2021
@AdrienClairembault deleted the fix/search-memory-usage-2 branch on October 4, 2023