Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that GA4 widgets display correctly with real data (notably that the graphs handle missing/zero rows). #6623

Closed
techanvil opened this issue Feb 22, 2023 · 13 comments
Labels
Exp: SP P0 High priority QA: Eng Requires specialized QA by an engineer Type: Enhancement Improvement of an existing feature

Comments

@techanvil
Copy link
Collaborator

techanvil commented Feb 22, 2023

Feature Description

At the time of writing, the GA4 API is not returning rows for days with no data when querying by the date dimension, even when the keepEmptyRows flag is set to true. This is true for other dimensions too, but all our graphs use date-indexed reports.

As a result, presented with real data that has missing rows, the GA4 graphs will omit what should be a zero data point for each of those rows and draw a line that doesn't touch the bottom of the graph when it should.

We need to ensure these graphs plot correctly. Additionally, care must be taken to ensure that there are no other aspects of the widgets that are affected by this.

@aaemnnosttv is currently in communication with the Analytics team to clarify whether keepEmptyRows is working as intended and this could result in some change to the API which is currently in beta. As a result this can remain in Triage for now, with the AC to be completed once we know how the API will be working going forward.

The likely resolution will be either:

  • The GA4 API changes and starts returning us rows populated with a zero for dates with no data. This would be the preferred way forward for us. This should allow our graphs to "just work", although we might want to revisit the GA4 isGatheringData() selector which has so far worked around the new state of the API.
  • We pad the missing rows ourselves in the GET:report handler. This should be manageable if we keep the scope contained to the single-dimension date case. Trying to support combinations of dimensions could make this considerably more complicated which is a bit of a warning for future proofing this solution, but it would have the advantage of being a centralised fix.
  • We pad the missing rows on the client-side at the point we hand the data off to Google Charts. This may be a bit more future-proof, but we'd need to remember to apply it to all GA4 charts.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The analytics-4 handler for GET:report on the server should be updated so that it always returns rows for the requested date range but only when date is the only requested dimension (i.e. simply including date as a requested dimension along with one or more additional dimensions would not work)
    • The metric values for any generated rows should use zero-value(s) for their respective types

Implementation Brief

  • In includes/Modules/Analytics_4.php:
    • Create a new public static method that will parse the GET:report response, say parse_report_response(). It should do the following:
      • Accept $data and $response parameters.
      • Check if the dimensions array property in $data has only one item, and if that item's name property is date.
      • If not, it should return $response.
      • If the above check is truthy, it should check the rows array property and see if the rows are missing any dates. The logic for determining which dates are missing depends on whether or not the compareStartDate and compareEndDate request parameters are present.
        • If they are not present, then there is a single date range and the whole range from startDate to endDate should have an entry per day, with rows filled in for missing days.
        • If they are present, it means there are two date ranges, one being the date range from startDate to endDate, and the other being the range from compareStartDate to compareEndDate. Any missing days from startDate to endDate should be filled in, and any missing days from compareStartDate to compareEndDate should be filled in. If there's an overlap of the date ranges, only one row per day is required (there shouldn't be multiple rows for the same date).
      • To get all the dates in between two dates, the PHP DatePeriod class can be used, similar to the example here.
      • After the missing dates are found comparing the received rows in response and all the possible dates in the date ranges, a row array should be inserted for each missing date with dimensionValues being a nested array property with the value being the missing date (similar to other rows), and metricValues being another nested array property with the value being '0'. The number of array( 'value' => '0' ) entries in the metricValues array should match the length of metrics being requested.
      • The returned rowCount property should be updated to reflect the new number of rows.
    • In the parse_data_response() method, add a case to the switch for the GET:report datapoint that calls the new parse_report_response() method with $data and $response passed to it and returns it.

Test Coverage

  • In tests/phpunit/integration/Modules/Analytics_4Test.php:
    • Update the existing get_report test cases to reflect the above changes.
    • Add additional test cases mocking missing rows in the response, and depicting that the method fills in missing dates with empty data.

QA Brief

  • Create a new site and set up the Site Kit plugin.
  • Activate the Analytics module and create a new account for the new site.
  • Go to the dashboard page.
  • Open the browser console and run the following command in it:
    googlesitekit.data.select('modules/analytics-4').getReport({ dimensions: 'date', metrics: 'totalUsers', startDate: '2023-01-01', endDate: '2023-01-07' })
  • Open the network tab in Developer Tools and select your request. It will open request details where you need to select the preview tab. It will show you response information. Make sure that you see there are the right number of rows with metric values all zero like on this screenshot:
    image

QA:Eng

See Tom's comment here and try to do what he suggests: #6790 (review)

Changelog entry

  • Ensure Google Analytics 4 charts display correctly with zero data, by padding the data returned from the runReport endpoint to add zero-data rows where data is missing in cases where a single date dimension is requested.
@techanvil techanvil added Module: Analytics Google Analytics module related issues P0 High priority Type: Enhancement Improvement of an existing feature labels Feb 22, 2023
@mxbclang mxbclang removed the Module: Analytics Google Analytics module related issues label Feb 23, 2023
@tofumatt tofumatt self-assigned this Mar 6, 2023
@tofumatt
Copy link
Collaborator

tofumatt commented Mar 7, 2023

ACs here make sense to me, I don't think there's much more we can do here so this seems the best approach to take for now.

We'll probably want to adjust/audit how we handle isGatheringData for GA4 after this change.

There was a lengthy discussion on Slack about this between @felixarntz, @eugene-manuilov, @techanvil, @aaemnnosttv, and myself (@tofumatt), so I don't think multiple AC reviewers are needed.

ACs 👍🏻

@tofumatt tofumatt removed their assignment Mar 7, 2023
@aaemnnosttv
Copy link
Collaborator

We'll probably want to adjust/audit how we handle isGatheringData for GA4 after this change.

I don't think that will be necessary because it's already based on non-zero data existing, otherwise it uses the property's age rather than only the presence of rows as with UA.

@nfmohit nfmohit assigned nfmohit and unassigned nfmohit Mar 9, 2023
@nfmohit nfmohit removed their assignment Mar 13, 2023
@techanvil techanvil self-assigned this Mar 13, 2023
@techanvil
Copy link
Collaborator Author

Hey @nfmohit, this IB is looking good. I have a few points for your consideration:

  • We should be a bit more explicit about the date handling when compareStartDate and compareEndDate are present. When these are present a second date range is added to the request. When there are multiple date ranges, the returned set of dates only includes dates which are present in any of the date ranges. So, if the date ranges overlap the dates returned will be a continuous set of dates from the earliest to the latest. However if the ranges don't overlap, there will and should be a gap in the returned dates which we don't want to pad with zeroes. This should be made clear in the IB so we don't accidentally find ourselves padding data which shouldn't be. To help clarify the point, see the expected output for these test cases where we mimic this behaviour in the mock data generator:
  • We need to take care to reproduce the correct number of metrics, with a separate object in metricValues for each requested metric.
  • A minor point, but I am not sure we really need to fall back to parent::parse_data_response() - we could just return $response directly as that's all parent::parse_data_response() does, and there's nothing that indicates we always need to call parent::parse_data_response() .

@techanvil techanvil assigned nfmohit and unassigned techanvil Mar 13, 2023
@nfmohit nfmohit removed their assignment Mar 15, 2023
@tofumatt tofumatt self-assigned this Mar 16, 2023
@techanvil
Copy link
Collaborator Author

techanvil commented Mar 16, 2023

Hey @tofumatt @nfmohit, while reviewing #6544 I've noticed that it has a requirement to use the rowCount property of a date-dimension report to determine whether the GA4 property has 3 days of data.

First, I'll make a quick side note that we need to make sure we also update the returned rowCount property to reflect the number of added rows in this issue.

With that taken care of it occurs to me that, as it stands, we are set to always populate rows for a requested date range, regardless even of when the property was created, and this in turns would make this check a bit useless as it would always return true.

Not sure what the best answer here is - maybe we should keep track of the creation date for the current GA4 property and only populate rows for subsequent dates.

Mind you it still makes the rowCount check in #6544 a bit redundant, we could just check whether 3 days have gone by since the property was created, as it'll always return rows regardless of whether data has been received...

I guess it depends what we really want to achieve for #6544 - if we want the banner to only display when there are 3 days of real data, then it points to a requirement to make this zero-data padding optional for a request. Alternatively, maybe the logic there should be adjusted to check whether it's 3 days since the first real data was tracked for the property (this could be managed without making the padding optional as we'd just check for a non-zero value), something that might be viable particularly if we're doing the same sort of query for #6572 (comment).

Bit of a tricky one. Interested to hear your thoughts!

@tofumatt
Copy link
Collaborator

tofumatt commented Mar 16, 2023

Not sure what the best answer here is - maybe we should keep track of the creation date for the current GA4 property and only populate rows for subsequent dates.

That seems reasonable, as we need to use the createTime for #6572.

I think it'd be fine to display the banner once several days have gone by since the property has been created. That banner is shown to users who are already collecting Analytics data, so presumably they'll have visits, but if not they'll at least be able to cross-reference the zero data from both and see it's legitimate. And since they're using Site Kit they should be collecting data as we would have placed the GA4 tag for them, so we know it's working.

@tofumatt
Copy link
Collaborator

I've updated the IB with your comments @techanvil, thanks!

Regarding the empty row count vs creation date… I think creation date is the best way forward. Let me know what you think; I mentioned as much in #6544 as well.

@tofumatt tofumatt assigned techanvil and unassigned tofumatt Mar 16, 2023
@wpdarren
Copy link
Collaborator

wpdarren commented Apr 7, 2023

QA Update: ✅

Verified:

  • Following the steps in the QAB in Developer Tools the response information matches the screenshot and there are the right number of rows with metric values all zero.

Added QA:Eng label as per QAB

@wpdarren wpdarren removed their assignment Apr 7, 2023
@wpdarren wpdarren added the QA: Eng Requires specialized QA by an engineer label Apr 7, 2023
@tofumatt tofumatt self-assigned this Apr 10, 2023
@tofumatt
Copy link
Collaborator

QA ✅ from me, follow @techanvil's suggestion and used his code sample to test out some reports with gaps, the data appears padded as-expected 👍🏻

Moving to approval.

@tofumatt tofumatt removed their assignment Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Exp: SP P0 High priority QA: Eng Requires specialized QA by an engineer Type: Enhancement Improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

7 participants