Improve page conversion attribution performance with pre-calculated field #20526
Conversation
Left a couple of comments. Otherwise I was wondering whether the current approach — simply replacing the archiving logic with one based on the new column, together with providing a command to recalculate the column — is suitable for us.
An alternative would be to keep the old way of calculating the values and decide whether to use the new or the old method after checking whether the new column has values for all dates within the requested date range.
This would remove the requirement of running a command to update old datasets.
With the current approach we would otherwise need to run the command for all cloud installs even if archives are already built. Otherwise we would have incorrect numbers after invalidating certain archives, or if e.g. a new segment or custom report is created that triggers archiving for date ranges before the update.
@tsteur Which solution would you prefer?
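The fallback check discussed above (use the pre-calculated column only if it has values for all dates in the requested range) could be sketched as follows. This is a hedged illustration only — it uses an in-memory SQLite stand-in for Matomo's MySQL `log_conversion` table, and the helper name `can_use_precalculated` is hypothetical, not part of the actual PR:

```python
import sqlite3

# Hypothetical stand-in for Matomo's log_conversion table, illustrating the
# proposed fallback check: use the pre-calculated pageviews_before column
# only when every conversion in the requested date range has a value for it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log_conversion (
        idsite INTEGER,
        server_time TEXT,
        pageviews_before INTEGER  -- NULL for rows tracked before the update
    )
""")
conn.executemany(
    "INSERT INTO log_conversion VALUES (?, ?, ?)",
    [
        (1, "2023-05-01 10:00:00", 3),
        (1, "2023-05-02 11:30:00", None),  # pre-update conversion, no value
        (1, "2023-05-03 09:15:00", 5),
    ],
)

def can_use_precalculated(conn, idsite, date_from, date_to):
    """True if no conversion in the range is missing pageviews_before."""
    missing = conn.execute(
        "SELECT COUNT(*) FROM log_conversion "
        "WHERE idsite = ? AND server_time BETWEEN ? AND ? "
        "AND pageviews_before IS NULL",
        (idsite, date_from, date_to),
    ).fetchone()[0]
    return missing == 0

print(can_use_precalculated(conn, 1, "2023-05-01", "2023-05-04"))  # False
print(can_use_precalculated(conn, 1, "2023-05-03", "2023-05-04"))  # True
```

As the reply below this comment notes, a per-request `COUNT(*)` over the conversion log could be expensive on large installs, which is part of the trade-off being weighed here.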
On the cloud it shouldn't cause any issues unless someone invalidated historical data; that should be rare, and it's OK if that data then wouldn't show up correctly. All other data would already be archived using the old logic, and the new logic would then start for newly tracked data. We may just have to populate the data for the last 24 hours to ensure the current day is correct. It would also apply to new segments/custom reports, but we're only archiving 1 month back on Cloud, so this wouldn't be too concerning and the report is potentially also not viewed too often. In the worst case we'd rather invalidate the data and re-archive it if someone asks. I'm thinking checking all the column values would be quite time-consuming on the database and could take quite a bit of time, and it would only matter for invalidated data on Cloud, so it wouldn't be needed there. On the Cloud we'd rather not have that extra query.
Force-pushed from b374ea1 to 8604d9f
Force-pushed from 8604d9f to 93bcbb9
This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers
…, Visitor class and populate via the VisitRecognizer. Adjust referrer attribution and tests to use the immutable properties.
…ulate field on insert conversion, console command to calculate history
…version insert, calculate value from other fields instead of using a query, VisitInfo to provide access to original visit values.
…gregation of new pages before metric
…default, tidy loop logic
…m20375-pages-before-field
@bx80 The system tests on PHP 7.2 are currently running into a segfault. I'm able to reproduce that segfault on my VM.
The segfault occurs within the test BackwardsCompatibility1XTest. That test inserts a database dump of Matomo 1.x and then tries to perform a database update here:
It segfaults when it tries to perform the custom migration you have added. Commenting that part out lets the tests pass again. I don't have time right now to investigate that further...
Co-authored-by: Stefan Giehl <stefan@matomo.org>
@sgiehl I was able to recreate the segfault locally with PHP 7.4.33; strangely it doesn't occur at all with PHP 8.2.5. It seems to be caused by the custom migration using the … As part of debugging this I've also reworked the custom migration to more closely match the 4.0.0 custom migration by generating an array of …
Left some more comments.
In relation to this PR we should maybe create a new FAQ explaining what to do if the page goal metrics aren't shown for older dates when e.g. a new segment is created. Running a command won't be something people would expect.
Not sure if that is something we might also need to mention somewhere in the release blog post.
Co-authored-by: Stefan Giehl <stefan@matomo.org>
…orrectly handle multi-conversion visits
Had a last look through the code and imho it looks fine now.
Added a few last minor suggestions, which I will apply myself before merging.
Description:
Fixes #20375
- New log_conversion.pageviews_before column (> 4.14.0 installs will already have this).
- New console command core:calculate-conversion-pages to populate the new field for historic data.

Review
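For reference, backfilling the new field with the console command named in the description would presumably be invoked like this (a sketch only — run from the Matomo installation root after updating; exact options, if any, may differ):

```shell
# Hypothetical invocation of the backfill command named in the PR description;
# requires a working Matomo install, so this is a usage fragment, not a script.
./console core:calculate-conversion-pages
```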