-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement/6623 ga4 widgets real data #6790
Conversation
…a4-widgets-real-data.
…metrics in the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eugene-manuilov, good work here so far, but there are a few areas that need some attention.
Additionally, I'd suggest the QAB should be a bit more in depth. We should QA for the cases where there are gaps in the Google-returned data, not just the empty response scenario.
In order to find gaps in real data, I wrote a little snippet that can be run in the console. This will flag some gaps when run while connected as oi.ie
with the GA4 property 285794360
.
let report = await googlesitekit.data.select("modules/analytics-4").getReport({
startDate: "2022-01-01",
endDate: "2023-03-13",
metrics: ["totalUSers"],
dimensions: ["date"],
orderby: [{ dimension: { dimensionName: "date" } }],
});
// Run the above again if `report` is `undefined` the first time.
const format = (date) => {
const y = date.getFullYear();
const m = date.getMonth() + 1;
const d = date.getDate();
return `${y}-${m}-${d}`;
};
let prevDate = undefined;
report.rows.forEach(({ dimensionValues: [{ value }] }) => {
const [_, y, m, d] = /(....)(..)(..)/.exec(value);
const date = new Date(0);
date.setFullYear(y, m - 1, d);
if (prevDate) {
if ( date.getTime() - prevDate !== 86400000) {
console.log("gap", format(prevDate), '-', format(date));
}
}
prevDate = date;
});
I'd suggest this should probably be a QA:Eng
task for an engineer to dig in a bit and test the various scenarios with gaps, single vs multiple date ranges, etc.
includes/Modules/Analytics_4.php
Outdated
$row = new Google_Service_AnalyticsData_Row(); | ||
$row->setDimensionValues( array( $dimension_value ) ); | ||
$row->setMetricValues( $metric_values ); | ||
|
||
$rows[] = $row; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here needs to be updated based on how many date ranges there are. This is correct for the case where there is a single date range, but when there are multiple date ranges, we need to add a row for each date range, with a 2nd dimension value which is date_range_X
, where X is the zero-based index of the date range.
You can see this in action by requesting a multi-date-range report. For illustration here's one of our fixtures which shows the same thing:
Lines 8 to 36 in 97e835a
"rows": [ | |
{ | |
"dimensionValues": [ | |
{ "value": "20201201" }, | |
{ "value": "date_range_0" } | |
], | |
"metricValues": [ { "value": "55" }, { "value": "8" } ] | |
}, | |
{ | |
"dimensionValues": [ | |
{ "value": "20201201" }, | |
{ "value": "date_range_1" } | |
], | |
"metricValues": [ { "value": "3" }, { "value": "8" } ] | |
}, | |
{ | |
"dimensionValues": [ | |
{ "value": "20201202" }, | |
{ "value": "date_range_0" } | |
], | |
"metricValues": [ { "value": "17" }, { "value": "23" } ] | |
}, | |
{ | |
"dimensionValues": [ | |
{ "value": "20201202" }, | |
{ "value": "date_range_1" } | |
], | |
"metricValues": [ { "value": "10" }, { "value": "44" } ] | |
}, |
Furthermore - the logic also needs changing to ensure the rows are inserted in the right order, when date
is provided as an order-by dimension. As GA4 reports no longer return rows ordered by default, we now need to explicitly specify for them to be ordered by date
for our charts. As things stand this code simply appends the dates to the end of $rows
which can result in an out of order sequence when there are some rows returned in the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added the date range index and updated the logic to ensure that rows are sorted correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, as mentioned on a separate comment though, the logic still needs a bit of an update. Details are on the other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update, but this still needs another tweak, with details on a separate comment.
Co-authored-by: Tom Rees-Herdman <tom.rees-herdman@10up.com>
} | ||
|
||
$property_id = self::normalize_property_id( $settings['propertyID'] ); | ||
$request->setProperty( $property_id ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI there's a question mark about whether we should retain this use of setProperty()
, as discussed here. It's out of scope for this refactor though.
* @param int|bool $date_range_index The date range index for the current date. | ||
* @return Google_Service_AnalyticsData_Row A new zero-value row instance. | ||
*/ | ||
public static function create_report_row( $metric_headers, $current_date, $date_range_index ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit hesitant about this choice to make create_report_row()
public, when it's apparent it's only public in order to be used as a helper in a test. Ideally, we'd structure our public interfaces to only represent the required interface for use in production. Of course, the usual workaround is to extract the functionality to reuse in tests into a separate class/exported function etc.
With this general principle in mind, I'd prefer to see this made private with a separate test helper utility, or this to be extracted into a separate class. Again, interested to hear your take on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, your concerns make sense to me. Reworked it to be a trait that we can use both in the response class and in the test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice one!
// Loop through all days in the date range and check if there is a metric value | ||
// for it. If the metric value is missing, we will need to add one with a zero value. | ||
$now = $start; | ||
do { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... This will have a side effect of ensuring the rows are always sorted, when empty rows are added, regardless of whether the orderby
clause is specified.
I think it's OK, we just need to be careful not to accidentally rely on this behaviour, it could give the impression that the GA4 report endpoint automatically sorts the response like the UA one does...
Just mentioning this for the sake of raising visibility, I don't think we don't need to change it.
$now += DAY_IN_SECONDS; | ||
|
||
// Copy the existing row if it is available, otherwise create a new zero-value row. | ||
$rows[ $current_date ] = isset( $existing_rows[ $current_date ] ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is still not quite right. It's currently only inserting a single row per missing date except where there's an overlap with the date ranges.
It needs to be adjusted so that it inserts a row per date, per date range. It's also not sorting holistically across all rows, it's sorting per date range.
In other words, say the requested date ranges are as follows:
'startDate' => '2023-02-01',
'endDate' => '2023-02-03',
'compareStartDate' => '2023-01-01',
'compareEndDate' => '2023-01-03',
The current result looks like this:
array( '20230201', 'date_range_0' ),
array( '20230202', 'date_range_0' ),
array( '20230203', 'date_range_0' ),
array( '20230101', 'date_range_1' ),
array( '20230102', 'date_range_1' ),
array( '20230103', 'date_range_1' ),
But, it should look like this, with rows fully sorted and dates repeated per date range:
array( '20230101', 'date_range_1' ),
array( '20230101', 'date_range_0' ),
array( '20230102', 'date_range_1' ),
array( '20230102', 'date_range_0' ),
array( '20230103', 'date_range_1' ),
array( '20230103', 'date_range_0' ),
array( '20230201', 'date_range_0' ),
array( '20230201', 'date_range_1' ),
array( '20230202', 'date_range_0' ),
array( '20230202', 'date_range_1' ),
array( '20230203', 'date_range_0' ),
array( '20230203', 'date_range_1' ),
When the ranges overlap, care needs to be taken to ensure there's still only one row per date per range:
// Args:
'startDate' => '2023-01-01',
'endDate' => '2023-01-03',
'compareStartDate' => '2023-01-02',
'compareEndDate' => '2023-01-04',
// Result:
array( '20230101', 'date_range_0' ),
array( '20230101', 'date_range_1' ),
array( '20230102', 'date_range_0' ),
array( '20230102', 'date_range_1' ),
array( '20230103', 'date_range_0' ),
array( '20230103', 'date_range_1' ),
array( '20230104', 'date_range_1' ),
array( '20230104', 'date_range_0' ),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. you are right. This should be updated now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, this still needs a bit of a tweak with details on a separate comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @eugene-manuilov, nice refactor here. I have left a few more comments - notably, the row-adding logic still needs a bit more work.
Additionally, I'd suggest updating the screenshot in the QAB which still shows numeric zeroes, this might cause an unwanted QA/Execution cycle.
…a4-widgets-real-data.
…a4-widgets-real-data.
tests/phpunit/integration/Modules/Analytics_4/Report/ResponseTest.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @eugene-manuilov, this is looking great, but there are a couple of aspects that still need to be addressed, please see the related comments.
Also, it looks like you didn't spot this comment I left - I've gone ahead and updated the screenshot myself.
Additionally, I'd suggest updating the screenshot in the QAB which still shows numeric zeroes, this might cause an unwanted QA/Execution cycle.
Thanks for updating that screenshot, @techanvil, I missed your message. |
// Add rows for the current date for each date range. | ||
self::iterate_date_ranges( | ||
$date_ranges, | ||
function( $date ) use ( &$rows, $ranges_count, $metric_headers, $multiple_ranges ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use
clause is missing $existing_rows
, which means the existing row is never found, so we always return zero data for all rows. It's a small fix, I'll apply the change myself.
function( $date ) use ( &$rows, $ranges_count, $metric_headers, $multiple_ranges ) { | |
function( $date ) use ( &$rows, $existing_rows, $ranges_count, $metric_headers, $multiple_ranges ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eugene-manuilov - although I'd applied this fix myself, hoping to prevent an extra execution/review cycle, I've realised that the test coverage is lacking a check to ensure that the resulting rows which do have data retain this data and aren't overwritten with zeroes. Therefore, please can you update the tests to provide coverage for this aspect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tests are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @eugene-manuilov, this looks good but as discussed here I've applied a small fix myself and then realized the tests should also be updated. Please can you update the tests as described?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @eugene-manuilov, thanks for making the changes to the test.
I do think the resulting test code could be simplified a bit to make it easier to understand, while including the expected values in the @dataProvider
data set, to give a clearer picture of the inputs vs expectations.
In order to speed things up I've pushed a commit myself to make this change and illustrate the point. Please review the change and let me know your thoughts. If you're happy with this, I think it's ready to merge.
@techanvil looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops - forgot to approve this before merge. LGTM, nice work @eugene-manuilov!
Summary
Addresses issue:
Relevant technical choices
PR Author Checklist
Do not alter or remove anything below. The following sections will be managed by moderators only.
Code Reviewer Checklist
Merge Reviewer Checklist