Reduce search engine memory usage #9495
Conversation
Important point: I liked the previous solution as we did it in a single query, and I was pretty sure it wouldn't change the query execution time significantly.
(force-pushed from 0bba79b to ad935e2)
Should be all good for reviews now, I've updated the initial description.
Will this fix be part of 9.5.6?
It seems OK, but I do not really have enough data to do real tests.
Regarding the latest commit (6984fa1), it's related to another memory mismanagement that was found in the internal ref. This isn't an issue when you are browsing a small-sized result set (e.g. a standard 20-entry result page). This commit fixes it by using a PHP generator. The downside is that the "parsing" process of the result set can now only be used to build -> I'm not sure what was in
IMHO, we should not introduce such changes on the 9.5 branch. Indeed, we are introducing many changes in a critical part of GLPI to fix a behaviour that has been present for a while and, I guess, is only a problem in some edge cases.
👍 this should not target the 9.5 branch
Like @orthagh suggested in our weekly meeting, we can keep this PR open on 9.5 for a few weeks while we wait for the code to be "production tested" on the internal ref, then move and merge it to master later.
The 4 latest commits reduce the memory used when exporting 140k computers to CSV from 1.4GB to ~65MB. 65MB is still significant, as there is yet another bottleneck: we load a lot of data from the database straight into PHP memory.
(force-pushed from 6c07b3d to e206bd8)
Current search engine
The search engine currently only uses a `LIMIT` clause if there are no search criteria specified. In this case, it sends a second `COUNT(*)` query after the main search to know the total number of results.
Any query with defined criteria does not use the `LIMIT` clause. The full results are sent to PHP and we only iterate on the first X results (the number of results to display on one page).
This helps execution time, as `COUNT` requests can be costly, but it is very bad for memory consumption on big databases.
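To illustrate the current behaviour, a rough sketch with hypothetical table and column names (the real queries are built dynamically by the search engine):

```sql
-- No criteria: paginate in SQL, then ask for the total separately.
SELECT glpi_computers.name, glpi_computers.serial
  FROM glpi_computers
 LIMIT 0, 20;
SELECT COUNT(*) FROM glpi_computers;

-- With criteria: no LIMIT, the full result set is sent to PHP
-- and only the first page is actually displayed.
SELECT glpi_computers.name, glpi_computers.serial
  FROM glpi_computers
 WHERE glpi_computers.name LIKE '%srv%';
```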
Changes to the search engine
Every search request is now accompanied by a second `COUNT` request.
The mandatory clauses to re-use from the main query are `FROM`, `GROUP BY` and `HAVING`. The `SELECT` and `FROM` clauses must only contain the mandatory fields and joins to get the fastest request possible.
The simplest case is a search without a `HAVING` clause. This results in a very simple `COUNT` request, for instance:
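(Hypothetical sketch; the real query is generated by the search engine and the table and column names will differ.)

```sql
SELECT COUNT(*)
  FROM glpi_computers
  LEFT JOIN glpi_entities
    ON glpi_entities.id = glpi_computers.entities_id
 WHERE glpi_computers.is_deleted = 0
   AND glpi_computers.name LIKE '%srv%';
```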
The more complicated case is when we have a `HAVING` clause, as it forces us to add fields to the `SELECT` clause (the `HAVING` clause filters on selected results and thus cannot be applied without them in the query). In this case, we end up with a wrapped request like this:
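(Again a hypothetical sketch; the tables and `ITEM_*` aliases are invented for the example.)

```sql
SELECT COUNT(*) FROM (
    SELECT glpi_computers.id          AS ITEM_0,
           COUNT(glpi_items_disks.id) AS ITEM_1
      FROM glpi_computers
      LEFT JOIN glpi_items_disks
        ON glpi_items_disks.items_id = glpi_computers.id
       AND glpi_items_disks.itemtype = 'Computer'
     GROUP BY glpi_computers.id
    HAVING ITEM_1 > 2
) AS count_wrapper;
```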
Benchmark
Here is a quick benchmark on a ~140k computers database.
We are doing some searches on computers and checking the time used by the `Search::constructData()` function, as well as the memory peak of the application. I've put in bold some values that stand out as worse.
On the memory consumption side the results are excellent.
Before these changes, the memory peak scales with the number of returned results and the number of fields displayed (more fields = more data to send back to PHP).
After these changes, the memory peak always stays around the same value for a given page size.
On the execution time side the results are more difficult to analyze.
The gains / losses all depend on the `COUNT` query execution time. If it is almost instantaneous, then we gain a little time or break even with the 9.5/bugfixes branch (tests 1, 3, 4 and 5). If it is not instantaneous, then we get worse performance: a few extra tenths of a second at best, and in a few rare cases a possible x2 execution time (the `COUNT` request would be almost as long as the main query in those cases, like test 2).
The search engine used this option internally to know if we had a search with no criteria and thus would be able to use the `LIMIT` query (as described in the "Current search engine" section above). Since we now use `LIMIT` all the time, this option is no longer needed and all associated code has been removed.
As we discussed, the 'all' criteria is outdated and bugged, so this PR is a good opportunity to remove it.
It seems we had a few cases of duplicate aliases in the `SELECT` clause of our search requests. This is not a problem in most simple requests, as MySQL allows duplicates in the results. It is however a problem with wrapped queries like `SELECT COUNT(*) FROM ( {wrapped query} )`: duplicated aliases are not allowed in the wrapped query and will cause a SQL error.
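As an illustration, roughly what that failure mode looks like (hypothetical query; the `ITEM_0` aliases and tables are invented, and the error shown is the duplicate column error MySQL typically raises):

```sql
-- Accepted by MySQL: the duplicated alias only affects the result set labels.
SELECT c.name AS ITEM_0, e.name AS ITEM_0
  FROM glpi_computers c
  LEFT JOIN glpi_entities e ON e.id = c.entities_id;

-- Rejected once wrapped for the COUNT:
-- ERROR 1060 (42S21): Duplicate column name 'ITEM_0'
SELECT COUNT(*) FROM (
    SELECT c.name AS ITEM_0, e.name AS ITEM_0
      FROM glpi_computers c
      LEFT JOIN glpi_entities e ON e.id = c.entities_id
) AS count_wrapper;
```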
There were two kinds of duplicated aliases in the code base:
- One of them was on Ticket/Problem/Changes and could be fixed in the searchoption by removing an extra field.
- The other was on a special case of a meta criteria join on the current itemtype (for example searching for documents on documents).
This case required a fix in the search engine by adding a "META_" prefix to meta criteria aliases in the `SELECT` and `HAVING` clauses.
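With the prefix, the self-referencing meta criteria no longer collides with the main alias. A hypothetical sketch of the shape of the fix (not the exact generated SQL; tables and aliases are illustrative):

```sql
SELECT COUNT(*) FROM (
    SELECT glpi_documents.name      AS ITEM_0,
           glpi_documents_meta.name AS META_ITEM_0  -- prefixed meta alias, no collision
      FROM glpi_documents
      LEFT JOIN glpi_documents_items
        ON glpi_documents_items.documents_id = glpi_documents.id
      LEFT JOIN glpi_documents AS glpi_documents_meta
        ON glpi_documents_meta.id = glpi_documents_items.documents_id
) AS count_wrapper;
```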