
Speed up fetching range reports, evolution and sparkline reports #20121

Merged
merged 7 commits into 5.x-dev from speedupreports on Dec 22, 2022

Conversation

tsteur
Member

@tsteur tsteur commented Dec 18, 2022

Description:

This index change will speed up fetching for all kinds of reports, but especially for range periods, where the change will be most noticeable.

When fetching evolution graphs, the method getArchiveIds can be very slow, especially when looking for many periods.

See e.g. below, where it executed 30 queries and took 7 seconds:

[screenshot: profiling output showing 30 queries taking 7 seconds]

Or this range period request, where it took 217 seconds for 2240 calls (there might still be more tweaks possible to reduce the number of calls, which would be a separate fix).

[screenshot: profiling output showing 2240 calls taking 217 seconds]

I've been testing this on two of our instances with different queries and profiling, and we can reproduce this.

I then looked at the current numeric index, which is:

`idsite`,`date1`,`date2`,`period`,`ts_archived`

The ts_archived column is not very useful for an index though, because for most of the queries it doesn't lower the result set. By changing this part of the index to name(6), I was able to make the queries 150-170 times faster on our instances (we ran the same queries 100 times with SQL_NO_CACHE and compared different queries with different indexes). That's roughly because our archives include 166 different record names. On some smaller instances it still improved the query 40+ times.
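For illustration, a rough sketch of the resulting index change in SQL (the table name and index name here are assumptions; the numeric archive tables are created per month and the actual migration goes through the core update scripts):

```sql
-- Sketch only: assuming a monthly numeric archive table and an index named
-- index_idsite_dates_period, the change swaps the ts_archived column for a
-- 6-character prefix of the name column.
ALTER TABLE archive_numeric_2022_12
  DROP INDEX index_idsite_dates_period,
  ADD INDEX index_idsite_dates_period (idsite, date1, date2, period, name(6));
```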

On our account one such query typically takes 200ms, vs. 1.2ms with this index change.

That means the above screenshot goes roughly down from 7 seconds to say 50ms, and the 217 seconds go down to about 1.5 seconds.

The index size doesn't really change in the end either. I've looked over the queries containing ts_archived, and the ts_archived part of the index typically doesn't help much there; even more importantly, these queries also contain the name column, meaning the new index should be even more effective for them.

The getArchiveIds() query is also the TOP contributor to load on our cloud.

ts_archived was initially added 12 years ago in f593a3e. A better choice would have been to use the name column, but at that time there were likely certain MySQL limitations.

Why name(6)?
The done flags we use in those archive queries are typically done, done.VisitsSummary, done2499999 or done249999.VisitsSummary. By including the first 6 characters of name in the index, we should be able to select the best rows more precisely. Technically, name(5) would do an equally good job, as out of 1M records there are only 6 done.* records, but I figured I'd rather give it one more character.
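For illustration, a hedged sketch of the kind of archive lookup this prefix helps with (the table name, values and exact query shape are made up, not the literal query built by getArchiveIds()):

```sql
-- With (idsite, date1, date2, period, name(6)), the 'done%' filter can be
-- resolved within the index, so rows for the other ~166 record names are
-- never examined.
SELECT idarchive, idsite, date1, date2
FROM archive_numeric_2022_12
WHERE idsite IN (1)
  AND date1 = '2022-12-01'
  AND date2 = '2022-12-31'
  AND period = 3
  AND name LIKE 'done%';
```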

Other changes

  • I've also removed the ts_archived is not null parts from WHERE clauses, as ts_archived is always not null. I assume it was added for old MySQL versions, which otherwise wouldn't use the index unless all parts were covered (see the sketch below).
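As a simplified before/after sketch (a hypothetical query, not the exact one in ArchiveSelector):

```sql
-- Before: the extra condition never filtered any rows, since ts_archived is never NULL
SELECT idarchive
FROM archive_numeric_2022_12
WHERE idsite = 1 AND period = 1 AND name LIKE 'done%'
  AND ts_archived IS NOT NULL;

-- After: the redundant condition is simply dropped
SELECT idarchive
FROM archive_numeric_2022_12
WHERE idsite = 1 AND period = 1 AND name LIKE 'done%';
```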

We have tested these changes in production on our account and reports are now loading pretty much instantly.

Review

@tsteur tsteur added the Needs Review PRs that need a code review label Dec 18, 2022
@tsteur tsteur added this to the 5.0.0 milestone Dec 18, 2022
@tsteur tsteur changed the title Speed up fetching range reports Speed up fetching range reports, evolution and sparkline reports Dec 18, 2022
@tsteur tsteur removed the Needs Review PRs that need a code review label Dec 18, 2022
@tsteur tsteur added the Needs Review PRs that need a code review label Dec 19, 2022
@sgiehl
Member

sgiehl commented Dec 19, 2022

@tsteur Generally those changes look fine. But I was wondering whether adding/changing indexes isn't a database operation we wanted to avoid for Matomo 5. We postponed issues like #17466 to Matomo 6, as they would add new dimensions. Just wondering how intensive changing indexes is on bigger databases...

@tsteur
Member Author

tsteur commented Dec 19, 2022

@sgiehl Only log table migrations would be an issue. All others are fine, as they run through fairly quickly and won't block tracking. The complexity comes more from log table migrations: when you have a bit of data in a log table, users/admins/hosters need to go through a specific process of enabling queued tracking while the migrations are being performed, then replaying that traffic after the migration finishes a few minutes, hours, or days later, and ensuring all the requests are reprocessed, etc. (or alternatively going into maintenance mode and replaying the logs via the importer, which is also not the easiest thing for people to do).

core/Updates/5.0.0-b1.php (review thread, outdated, resolved)
core/DataAccess/ArchiveSelector.php (review thread, resolved)
@sgiehl sgiehl removed the Needs Review PRs that need a code review label Dec 20, 2022
@sgiehl sgiehl merged commit 000cf71 into 5.x-dev Dec 22, 2022
@sgiehl sgiehl deleted the speedupreports branch December 22, 2022 08:40