Reduce search engine memory usage #9495

Conversation

@AdrienClairembault (Contributor) commented Sep 2, 2021

  • Add LIMIT to search queries

Current search engine

The search engine currently only uses a LIMIT clause if no search criteria are specified.
In that case, it sends a second COUNT(*) query after the main search to know the total number of results.

Any query with defined criteria does not use a LIMIT clause.
The full results are sent to PHP and we only iterate on the first X results (the number of results to display on one page).

This helps execution time, as COUNT queries can be costly, but it is very bad for memory consumption on big databases.
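
For illustration, here is a minimal PHP sketch of that pre-change behaviour; the connection details, table and query are hypothetical, not GLPI code:

// Hypothetical illustration of the current behaviour, not actual GLPI code.
$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');

// With search criteria, no LIMIT is used: every matching row is sent to PHP...
$result = $mysqli->query("SELECT * FROM glpi_computers WHERE name LIKE '%se%'");
$rows = $result->fetch_all(MYSQLI_ASSOC); // may hold 100k+ rows in memory
$page = array_slice($rows, 0, 20); // ...even though only one page is displayed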

Changes to the search engine

Every search request is now accompanied by a second COUNT request.

The mandatory clauses to re-use from the main query are FROM, GROUP BY and HAVING.
The SELECT and FROM clauses must only contain the mandatory fields and joins, to keep the query as fast as possible.

The simplest case is a search without a HAVING clause.
This results in a very simple COUNT query:

SELECT COUNT(*) FROM {main_table} {mandatory joins for the conditions fields} WHERE {condition}

The more complicated case is when we have a HAVING clause, as it forces us to keep fields in the SELECT clause (the HAVING clause filters on selected results and thus cannot run without them in the query).
In this case, we end up with a query like this:

SELECT COUNT(*) FROM (
   SELECT {having fields} FROM {main_table} {mandatory joins for the conditions fields} WHERE {condition} GROUP BY ... HAVING ...
) AS {alias}
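
As a sketch, the COUNT query could be derived from the main query's parts roughly like this (function and parameter names are hypothetical, not the actual GLPI implementation):

// Hypothetical sketch of building the COUNT query; not GLPI's actual code.
function buildCountQuery(
    string $from,              // main table + mandatory joins
    string $where,             // the search condition
    string $havingFields = '', // fields the HAVING clause filters on
    ?string $groupBy = null,
    ?string $having = null
): string {
    if ($having === null) {
        // Simple case: count the filtered rows directly.
        return "SELECT COUNT(*) FROM $from WHERE $where";
    }
    // HAVING needs the selected fields, so wrap the query and count its rows.
    return "SELECT COUNT(*) FROM ("
        . "SELECT $havingFields FROM $from WHERE $where"
        . " GROUP BY $groupBy HAVING $having"
        . ") AS counted";
}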

Benchmark

Here is a quick benchmark on a database with ~140k computers.
We run some searches on computers and check the time spent in the Search::constructData() function as well as the application's memory peak.
I've put in bold the values that stand out as worse.

| Search criteria | 9.5/bugfixes | #9495 |
| --- | --- | --- |
| * | 1.18s (7.64 MiB) | 1.16s (7.60 MiB) |
| name contains 'se' | 0.35s (10.36 MiB) | **0.58s** (7.62 MiB) |
| alternate username contains 'a' and location is not "Caen" | 1.51s (**56.68 MiB**) | 1.43s (7.63 MiB) |
| manufacturer is 'HP' and status is "Status A" | 0.83s (**36.38 MiB**) | 0.73s (7.64 MiB) |
| comments contains 'ab' or serial contains '44' or uuid contains '7' | 1.78s (**72.88 MiB**) | 1.57s (7.68 MiB) |

On the memory consumption side, the results are excellent.
Before these changes, the memory peak would scale with the number of returned results and the number of displayed fields (more fields = more data sent back to PHP).
After these changes, the memory peak always stays roughly the same for a given page size.

On the execution time side, the results are harder to analyze.
The gains/losses all depend on the COUNT query execution time.
If it is almost instantaneous, we gain a little time or break even with the 9.5/bugfixes branch (tests 1, 3, 4 and 5).
If it is not instantaneous, performance is worse: a few extra tenths of a second at best, and in a few rare cases a possible 2x execution time (when the COUNT query takes almost as long as the main query, as in test 2).

  • Remove 'no_search' option

The search engine used this option internally to know if we had a search with no criteria and could thus use the LIMIT clause (as described in the "Current search engine" section above).
Since we now use LIMIT all the time, this option is no longer needed and all associated code has been removed.

  • Remove 'all' criteria

As discussed, the 'all' criteria is outdated and buggy, so this PR is a good opportunity to remove it.

  • Fix duplicate aliases

It seems we had a few cases of duplicate aliases in the SELECT clause of our search queries.
This is not a problem in most simple queries, as MySQL allows duplicates in the results.

It is however a problem with wrapped queries like SELECT COUNT(*) FROM ( {wrapped query} ): duplicated aliases are not allowed in the wrapped query and will cause a SQL error.

There were two kinds of duplicated aliases in the code base:
One of them was on Ticket/Problem/Change and could be fixed in the searchoption by removing an extra field.
The other was a special case of a meta criteria join on the current itemtype (for example, searching for documents on documents).
This case required a fix in the search engine: adding a "META_" prefix to meta criteria aliases in the SELECT and HAVING clauses.
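
For illustration, a hypothetical reproduction of the failure (table and alias names invented for this example):

// Hypothetical reproduction of the duplicate-alias error; names are invented.
$sql = "SELECT COUNT(*) FROM ("
    . " SELECT c.name AS ITEM_1, d.name AS ITEM_1" // duplicate alias
    . " FROM glpi_documents c"
    . " LEFT JOIN glpi_documents d ON d.id = c.id"
    . ") AS counted";
// MySQL rejects the wrapped form with: ERROR 1060: Duplicate column name 'ITEM_1'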

| Q | A |
| --- | --- |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | !22511 |

@orthagh (Contributor) commented Sep 3, 2021

Important point: I liked the previous solution, as we did it in a single query, and I was pretty sure it wouldn't significantly change the query execution time.
A double query means double execution time, and GLPI can already be slow on some pages with complex criteria.
Adding more time could be catastrophic.

@AdrienClairembault changed the title from Reduce search engine memory usage to WIP - Reduce search engine memory usage on Sep 3, 2021
@AdrienClairembault changed the title from WIP - Reduce search engine memory usage to Reduce search engine memory usage on Sep 7, 2021
@AdrienClairembault (Contributor, Author)

Should be all good for reviews now, I've updated the initial description.

@AdrienClairembault (Contributor, Author)

Will this fix be part of 9.5.6?

@cedric-anne (Member) left a comment

It seems OK, but I do not really have enough data to do real tests.

@cedric-anne modified the milestones: 9.5.6, 9.5.7 on Sep 15, 2021
@AdrienClairembault (Contributor, Author) commented Sep 16, 2021

Regarding the latest (6984fa1) commit: it is related to another memory mismanagement that was found in the internal ref.
It turns out we were also loading all search results into a PHP array (taking the results out of a memory-efficient MySQL result set and putting them all into a standard PHP array, row by row).

This isn't an issue when you are browsing a small result set (e.g. a standard 20-entry result page).
However, if you want to export all search results to CSV, you may hit the memory limit very easily if you have a lot of items.

This commit fixes it by using a PHP generator.
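
A minimal sketch of the generator approach (connection details, query and function name are hypothetical, not the actual commit):

// Hypothetical sketch of streaming rows with a generator; not the actual commit.
function iterateRows(mysqli_result $result): Generator
{
    while ($row = $result->fetch_assoc()) {
        yield $row; // only one row is held on the PHP side at a time
    }
}

$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');
$result = $mysqli->query("SELECT id, name FROM glpi_computers");

$out = fopen('php://output', 'w');
foreach (iterateRows($result) as $row) {
    fputcsv($out, $row); // e.g. stream a CSV export without building a giant array
}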

The downside is that the "parsing" process of the result set can now only be used to build $data['data']['rows'].
Before the changes, we would also build $data['data']['currentuser'] and $data['data']['items'].

-> I'm not sure what was in $data['data']['currentuser'], but it seems to be unused data, as the 'currentuser' string isn't used anywhere else in GLPI.
-> $data['data']['items'] seemed to contain the ids of every item returned by the search.
Removing it doesn't break the unit tests and I can't find any code that would depend on it, so it may be unused too.
Note that the items key is used in a lot of places in GLPI, so it's hard to check only the ones related to search results...

@cedric-anne (Member)

IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

@trasher (Contributor) commented Sep 17, 2021

> IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

👍 this should not target the 9.5 branch

@AdrienClairembault (Contributor, Author)

> IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.

As @orthagh suggested in our weekly meeting, we can keep this PR open on 9.5 for a few weeks while we wait for the code to be "production tested" on the internal ref, then move and merge it to master later.

@AdrienClairembault (Contributor, Author)

The 4 latest commits reduce the memory used when exporting 140k computers to CSV from 1.4GB to ~65MB.

65MB is still significant, as there is yet another bottleneck: we load a lot of data from the database straight into PHP memory.
Getting rid of this bottleneck would require using unbuffered queries, which would imply much bigger changes to the codebase, so I don't see it happening in the near future.
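
For reference, a hedged sketch of what an unbuffered query looks like with mysqli (connection details and query are hypothetical); the constraint noted in the comment is part of why it would imply bigger changes:

// With MYSQLI_USE_RESULT the server streams rows instead of buffering the
// whole result set in client memory. Connection details are hypothetical.
$mysqli = new mysqli('localhost', 'glpi_user', 'secret', 'glpi');
$result = $mysqli->query("SELECT id, name FROM glpi_computers", MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    // Process one row at a time; until the result is fully read or freed,
    // no other query can run on this connection.
}
$result->close();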

@AdrienClairembault force-pushed the fix/search-memory-usage-2 branch from 6c07b3d to e206bd8 on October 4, 2021
@AdrienClairembault changed the base branch from 9.5/bugfixes to master on October 4, 2021
@AdrienClairembault marked this pull request as draft on October 4, 2021
@AdrienClairembault marked this pull request as ready for review on October 14, 2021
@cedric-anne marked this pull request as draft on November 23, 2021
@AdrienClairembault deleted the fix/search-memory-usage-2 branch on October 4, 2023