Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data validity for audience tiles when connected to Analytics #8763

Closed
6 tasks done
kelvinballoo opened this issue May 29, 2024 · 8 comments
Closed
6 tasks done

Data validity for audience tiles when connected to Analytics #8763

kelvinballoo opened this issue May 29, 2024 · 8 comments
Labels
Module: Analytics Google Analytics module related issues P1 Medium priority Team S Issues for Squad 1 Type: Bug Something isn't working

Comments

@kelvinballoo
Copy link
Collaborator

kelvinballoo commented May 29, 2024

Bug Description

I don't think there is a task but we should absolutely look into data validity for those data that we surface on audience tiles for audience segmentation.

If you look at the screenshots below, you will notice that for cities, we have percentages like 119400% which is very strange.
Moreover, we'd have to list down the logic from where or how we are computing certain data from analytics.
E.g. how are we getting the visits per visitor? On analytics, I don't see that number and chances are we are computing it?

Note: This ticket stemmed from the comment here: #8138 (comment)

Steps to reproduce

  1. Set up Audience tiles
  2. Connect oi.ie in analytics
  3. Look at the tiles data

Screenshots

Screenshot 2024-05-29 at 14 33 40 Screenshot 2024-05-29 at 14 39 08

Additional Context

  • PHP Version: 8.0
  • Browser: Chrome
  • Plugin Version: 1.127.0
  • Device: Macbook Pro 16inch MacOS Sonoma

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • In order to provide a consistent view of the following metrics:
    • The "pages per visit" metric in an Audience Tile should be formatted to two decimal places, to match the format of the metric in the corresponding Key Metric tile.
    • The "cities with the most visitors" metric values in an Audience Tile should be calculated by dividing the visitor count for the city by the total visitor count (both values being per audience) in order to provide the correct percentage value.
  • When viewing the "all visitors" Audience Tile, the above metrics should match the "pages per visit" and "top cities driving traffic" Key Metrics tiles.

Implementation Brief

  • In assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTile/AudienceTileMetric.js

    • Accept a new prop metricValueFormat, this would be the object which should be passed to numFmt function as options.
  • In assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTile/index.js

    • For "Pages per visit" AudienceTileMetric component, pass metricValueFormat prop as:
      {
          style: 'decimal',
          maximumFractionDigits: 2,
      }
  • In assets/js/modules/analytics-4/components/audience-segmentation/dashboard/AudienceTile/AudienceTileCitiesMetric.js

    • Divide topCities?.metricValues[ index ]?.value by topCities.total to get the correct percent value.

Test coverage

  • Update stories for AudienceTile to display correct metric values.

QA Brief

  • Confirm the metrics in the bug description are now shown and calculated correctly.

Changelog entry

  • N/A
@techanvil
Copy link
Collaborator

Hi @kelvinballoo, thanks for raising this issue. The "cities" metric is clearly incorrect and it's entirely reasonable to question the validity of the metrics.

To your question about calculating metrics, the only metric we're actually calculating (as opposed to simply rounding) is in fact the "cities" one - and in fact the problem observed is because we're not performing the expected calculation (we're currently not dividing the values for the "cities" metric by the total number of visitors).

The metrics that are used in the reports are defined in the design doc. As you'll see, "visits per visitor" is directly retrieved via the sessionsPerUser metric, and the other metrics barring "cities" have similiarly direct mappings.

You may notice some discrepancies between data in Site Kit and reports/explorations in Analytics. The difference in the headline "visitors" figure ("users" in Analytics) is due to Site Kit using the totalUsers metric whereas the Analytics report uses activeUsers. We have a couple of open issues for this - #7214 and #7263.

If you notice other differences it's quite possible they can be explained by the factors discussed in this comment: #7214 (comment).

An additional aspect to consider on the topic of data consistency is how we present the same metrics in multiple places within Site Kit itself. It's worth noting that all of the metrics for an audience are viewable elsewhere in Site Kit, so when viewing the "all visitors" audience, we can cross reference other widgets as a sanity check for the figures.

To illustrate the point, here's a map of the metrics on the Audience Tile to the same metrics on Key Metric tiles and the All Traffic widget:

image

As you can see most of them match up. The "cities" values are totally off, and there are a couple of additional differences which are due to rounding.

The net of this is that we need to fix the "cities" values, and we should provide consistent rounding where it makes sense to - specifically, I think the "pages per visit" metric should be rounded to two decimal places on the Audience Tile to match the KM tile. The "total pageviews" figure can remain rounded as it is, in order to match the formatting of the "visitors" figure, it would look a bit odd otherwise. I have therefore added AC to this issue to address these specific points.

Hopefully that clarifies things and provides some useful background / context. It's undoubtedly a complex topic. Please let me know if there's anything that could use further clarification or discussion! :)

@techanvil techanvil added Type: Bug Something isn't working P1 Medium priority Module: Analytics Google Analytics module related issues labels Jun 4, 2024
@techanvil techanvil removed their assignment Jun 4, 2024
@ivonac4 ivonac4 added the Next Up Issues to prioritize for definition label Jun 6, 2024
@ankitrox ankitrox assigned ankitrox and unassigned ankitrox Jun 12, 2024
@techanvil techanvil self-assigned this Jun 18, 2024
@techanvil
Copy link
Collaborator

techanvil commented Jun 18, 2024

Hey @ankitrox, thanks for drafting this IB. A couple of points:

Considering that numFmt() is already called on metricValue in AudienceTileMetric, the suggested approach would result in numFmt() being called twice on the "pages per visit" value.

I'd suggest it would be slightly cleaner to take a similar approach to MetricTileNumeric, passing a metricValueFormat prop into AudienceTileMetric.

export default function MetricTileNumeric( {
metricValue,
metricValueFormat,
subText,
previousValue,
currentValue,
...props
} ) {
const formatOptions = expandNumFmtOptions( metricValueFormat );
return (
<MetricTileWrapper
className="googlesitekit-km-widget-tile--numeric"
{ ...props }
>
<div className="googlesitekit-km-widget-tile__metric-container">
<div className="googlesitekit-km-widget-tile__metric">
{ numFmt( metricValue, formatOptions ) }

Also, pagesPerVisit.metricValueFormat is not currently defined in code or the IB. As I'm sure you are aware, we should be copying the format from PagesPerVisitWidget, and if taking the above approach we can pass it directly to AudienceTileMetric via the metricValueFormat prop without needing to define pagesPerVisit.metricValueFormat.

const format = {
style: 'decimal',
maximumFractionDigits: 2,
};

@techanvil techanvil assigned ankitrox and unassigned techanvil Jun 18, 2024
@ankitrox
Copy link
Collaborator

Hi @techanvil ,

Thanks for reviewing the IB and adding your valuable feedback. I completely agree with your approach and it is the simpler one.

I have updated IB as per the suggestion.

Assigning this to you for re-review.

Thanks

@ankitrox ankitrox assigned techanvil and unassigned ankitrox Jun 18, 2024
@techanvil
Copy link
Collaborator

Thanks @ankitrox! The IB LGTM. ✅

@techanvil techanvil removed their assignment Jun 19, 2024
@ivonac4 ivonac4 removed the Next Up Issues to prioritize for definition label Jun 19, 2024
@binnieshah binnieshah added Team S Issues for Squad 1 Team M Issues for Squad 2 and removed Team M Issues for Squad 2 labels Jun 24, 2024
@benbowler benbowler self-assigned this Jul 8, 2024
@benbowler benbowler removed their assignment Jul 8, 2024
@eugene-manuilov eugene-manuilov self-assigned this Jul 8, 2024
@eugene-manuilov eugene-manuilov removed their assignment Jul 8, 2024
@mohitwp
Copy link
Collaborator

mohitwp commented Jul 11, 2024

QA Update ⚠️

  • Tested on dev environment.

@benbowler

Q-1) I still notice data discrepancy under 'All visitors' group "Cities with the most visitors" tile data when other group is also selected.

When only "All visitors' group is added.

image

When "All visitors" group selected with "Returning visitors". Is highlighted percentage for cities in screenshot correct ?

image

Q-2) City wise percentage showing for "All visitors" and time period of last 7 days is different in analytics. Are we calculating city wise traffic different ?

'All visitors' City wise traffic in Audience tile

image

Analytics data - City wise traffic numbers

image

Analytics data - All visitors

image

Total percentage of users city wise

image

@benbowler
Copy link
Collaborator

benbowler commented Jul 11, 2024

FYI @techanvil, @mohitwp above found a bug where data was inconsistent on the "All visitors" tile. I found a clear way to repeat this:

  1. Set just the "All visitors" tile in the selection panel, you will see the correct stats:
Screenshot 2024-07-11 at 12 51 14
  1. Then add back in the "Returning visitors" tile, you will see incorrect/mixed stats:
Screenshot 2024-07-11 at 13 08 20

The reason this happens is that, the steps above load the tiles in a different order to their default order, and the order in which the audiences are queries and returned in the GA reports. Previously we relied upon the order of the returned rows when choosing which data to pass to the tiles. I fixed this in this PR by filtering rows by their dimensionValue header rather than relying on their order.

As this is new work I will put it into CR.

@benbowler benbowler assigned benbowler and unassigned benbowler and mohitwp Jul 11, 2024
@techanvil
Copy link
Collaborator

techanvil commented Jul 11, 2024

Hi @mohitwp and @benbowler. Thanks for both spotting this and finding a fix; however - it's actually a known issue, #8921, and that issue has an IB and a draft PR.

Actually, I do prefer the PR for that issue - it's very similar but it also looks up the rows by the dateRange dimension which adds an extra element of safety.

As such, and with a view to ensuring this issue doesn't creep over the estimate, I'd say let's keep this issue to what's defined and address the above defect via #8921.

@mohitwp
Copy link
Collaborator

mohitwp commented Jul 12, 2024

QA Update ✅

  • Tested on main environment.
  • Verified the "pages per visit" metric in an Audience Tile formatted to two decimal places, to match the format of the metric in the corresponding Key Metric tile.
  • Verified when viewing the "all visitors" Audience Tile, the metrics match the "pages per visit" and "top cities driving traffic" Key Metrics tiles.
  • Verified audience tiles and key metrics tiles data are matching for all dates.

Note : Data is still not showing correct in Audience tiles as mentioned above. As per @techanvil this issue will be fix by #8921.
Also, I've notice analytics shows different data city wise. I will open a different ticket for discussion because as per AC Audience tile matches the key metrics data.

Last 28 days

image

image

Last 7 Days

image

image

Last 14 days

image

image

Last 90 Days

image

image

@mohitwp mohitwp removed their assignment Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Module: Analytics Google Analytics module related issues P1 Medium priority Team S Issues for Squad 1 Type: Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants