Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

[Bug] TotalUriCount metric is being misused #6577

Closed
travis79 opened this issue Nov 13, 2019 · 18 comments
Closed

[Bug] TotalUriCount metric is being misused #6577

travis79 opened this issue Nov 13, 2019 · 18 comments
Assignees
Labels
🐞 bug Crashes, Something isn't working, .. Feature:Telemetry P2 Upcoming release

Comments

@travis79
Copy link
Member

travis79 commented Nov 13, 2019

This metric, totalUriCount is defined as a StringMetricType but is clearly a counter. This is not in the spirit of how Glean is supposed to be used and has potential downfalls.

For instance: This metric is being stored in the context.settings() object and is being incremented there while the app is running, but is only reported in setStartupMetrics() when the app starts. This means that Fenix is collecting data in a foreground 'session', and then it could be days before it has another foreground 'session' and it actually reports this data. This violates the 4AM daily data window that has been set up with Data Science for the metrics ping because data that was recorded days ago is not being reported in the correct daily window.

I think that this should be a CounterMetricType with a lifetime of Ping so that it keeps a running total of the UriCount for that session. This would allow Glean to do what it's supposed to do and take the burden of managing the persistence off of Fenix.

┆Issue is synchronized with this Jira Task

@travis79 travis79 added the 🐞 bug Crashes, Something isn't working, .. label Nov 13, 2019
@Dexterp37
Copy link
Contributor

For reference, this is about #6003

@travis79
Copy link
Member Author

There also appears to be two total_uri_count metrics defined in the metrics.yaml, one in the events category and one in the metrics category, which is the one in question here.

Is this intentional and these are instrumenting different things?

@mdboom
Copy link
Contributor

mdboom commented Nov 13, 2019

My biggest concern with this (if I'm reading the code correctly -- please correct me if I'm missing a detail):

On startup:

  • Loads the count from context.settings() and sets it with Glean.
  • Resets the setting in context.settings() to 0

When URLs are visited:

  • The setting in context.settings() is incremented.

The value itself is sent in the metrics ping by Glean every day at (usually) 4am local time.

This means that the count will only ever include the number from the second-to-last run of the application before 4am, and all other counts at other times of day will simply be dropped. (And note that "run of the application" != "foreground session" on Android).

I think what you want is a counter with a ping lifetime on the metrics ping (sent once a day). Glean will take care of resetting the value every time the ping is sent and values won't be dropped.

@sblatz
Copy link
Contributor

sblatz commented Nov 13, 2019

I think what you want is a counter with a ping lifetime on the metrics ping (sent once a day). Glean will take care of resetting the value every time the ping is sent and values won't be dropped.

I think this makes sense as a solution moving forward. Thanks for the clarification on this ticket; we weren't exactly sure how best to implement this :)

@sblatz
Copy link
Contributor

sblatz commented Nov 13, 2019

Related to #4456

@Dexterp37
Copy link
Contributor

I think what you want is a counter with a ping lifetime on the metrics ping (sent once a day). Glean will take care of resetting the value every time the ping is sent and values won't be dropped.

I think this makes sense as a solution moving forward. Thanks for the clarification on this ticket; we weren't exactly sure how best to implement this :)

Please feel free to reach out or ask for review from anybody on the Glean team! We're more than happy to help with any doubt!

@sblatz
Copy link
Contributor

sblatz commented Nov 22, 2019

This fix is required to complete #4456.

@liuche
Copy link
Contributor

liuche commented Nov 22, 2019

Figure out which of these 2 bugs to keep/close (this and #4456)

@boek boek added P1 Current sprint and removed needs:group-triage labels Nov 26, 2019
@Mugurell Mugurell self-assigned this Feb 11, 2020
boek pushed a commit that referenced this issue Feb 13, 2020
This was added as a duplicate to an already existing `total_uri_count`
CounterMetricType in #4456.
boek pushed a commit that referenced this issue Feb 13, 2020
Re-apply the change for #4456.
Keep `total_uri_count` as a CounterMetricType and let Glean manage it's
persistence and reset time (resets with each metrics ping sent).
@AndiAJ
Copy link
Collaborator

AndiAJ commented Mar 27, 2020

Hi, I've just re-checked this matter on the latest Nightly Build #20870608 from 3/27 using a Google Pixel 3a (Android 10)

Performed and generated:
► One search:
https://www.theverge.com/
❌ Baseline Ping - 1df0ab29-7b79-4894-9ad9-c156ca8de399
"events.total_uri_count": 2

► Three searches:
https://yandex.com/
https://www.google.com/
http://www.bing.com/
✔️ Baseline Ping - 354c0e24-a377-48b2-994d-8fb32e1fe852
"events.total_uri_count": 3

► Four searches:
https://imgur.com/
https://www.facebook.com/
https://www.instagram.com/?hl=en
https://twitter.com/
✔️ Baseline Ping - 5b320e9a-b09d-4f2c-b676-09371ee883af
"events.total_uri_count": 4

► Seven searches:
Cat
https://www.taobao.com/
Dog
https://www.qq.com/
Huawei
https://www.baidu.com/
Samsung (Suggestion)
❌ Baseline Ping - b989adc5-4fdd-433c-81cb-c19b3e0dbca8
"events.total_uri_count": 9

► Ten searches:
https://www.ebay.com/
https://www.amazon.com/
Laptop (Suggestion)
https://www.aliexpress.com/
https://www.youtube.com/
https://vimeo.com/
Lamp
https://soundcloud.com/soundloud
https://www.cnet.com/
https://www.digitaltrends.com/
✔️ Baseline Ping - abfdbd8a-72e4-433a-9ea5-53f5469b01cb
"events.total_uri_count": 10

► Metrics
Ping - e8aebbbf-3b4d-4105-89b6-9a994925f87d
"events.total_uri_count": 28

Logcat
Glean dashboard

✔️ The "metrics.total_uri_count" got changed with "events.total_uri_count"
✔️ The grand totals recorded in the baseline pings are equal with the grand total from the metrics ping.
❌ In some cases, the baseline pings don't seem to record the correct `"events.total_uri_count" values.

@travis79 & @sblatz - Please review and share your thoughts. ☺️

@Dexterp37
Copy link
Contributor

@AndiAJ do you have clear steps to reproduce the problem for the cases in which you think there's a mismatch? This is either a problem with the expectations of the data (are we really counting only tld loads?) or a problem in what Fenix is counting. So any help on this would inform @sblatz on how to proceed.

I don't think this is a problem with the SDK.

@AndiAJ
Copy link
Collaborator

AndiAJ commented Mar 27, 2020

Thx @Dexterp37 ,I've added 3 scenarios, hope this helps. ☺️
For the +1 matter #6499 was logged, unfortunately in the 1st and 2nd scenario I've bumped in +2 additions to the baseline pings.

1st Scenario:

• Start Fenix on a new profile
• Navigate to https://www.theverge.com/
• Put Fenix in background (baseline ping is generated)
• Resume Fenix
• Search the term "Cat"
• Navigate to https://www.taobao.com/
• Search the term "Dog"
• Navigate to https://www.qq.com/
• Search the term "Huawei"
• Navigate to https://www.baidu.com/
• Search the term "Samsung" using a suggestion
• Put Fenix in background (baseline ping is generated)

► Video
20200327-114509

1️⃣ The first baseline ping that is generated after the user navigates to https://www.theverge.com
• Ping dd8e8eca-94ed-46fd-924f-ba28d1a7861b - "events.total_uri_count": 2+1 added

2️⃣ The second baseline ping
• Ping 99260fb4-2953-41cc-8304-bf2ab3a70179 - "events.total_uri_count": 9+2 added
✔️ The search actions and suggestions are properly calculated:

"metrics.search_count": {
            "google-b-m.action": 3, ✔️
            "google-b-m.suggestion": 1 ✔️
          }

Logcat
Glean dashboard

2nd Scenario

• Start Fenix on a new profile
• Search the term "Cat"
• Search the term "Dog"
• Search the term "Huawei"
• Search the term "Samsung" using a suggestion
• Put Fenix in background (baseline ping is generated)
• Resume Fenix
• Navigate to https://www.taobao.com/
• Navigate to https://www.qq.com/
• Navigate to https://www.baidu.com/
• Put Fenix in background (baseline ping is generated)

► Video
20200327-114509

1️⃣ The first baseline ping
• Ping cca02cdd-6d8d-477b-b742-5d6e4d537a56 - ""events.total_uri_count": 4 ✔️
The search actions and suggestions are properly calculated:

          "metrics.search_count": {
            "google-b-m.action": 3, ✔️ 
            "google-b-m.suggestion": 1 ✔️ 
          }

2️⃣ The second baseline ping
• Ping 5493765a-37c4-4c4f-9ba8-dd0ecb5a2f59 - ""events.total_uri_count": 5+2 added

Logcat
Glean dashboard

3nd Scenario

• Start Fenix on a new profile
• Navigate to https://yandex.com/
• Navigate to https://www.google.com/
• Navigate to http://www.bing.com/
• Navigate to https://imgur.com/
• Navigate to https://www.facebook.com/
• Navigate to https://www.instagram.com/
• Navigate to https://twitter.com/
• Navigate to https://www.ebay.com/
• Navigate to https://www.amazon.com/
• Navigate to https://www.aliexpress.com/
• Navigate to https://www.youtube.com/
• Navigate to https://vimeo.com/
• Navigate to https://soundcloud.com/soundloud
• Navigate to https://www.cnet.com/
• Navigate to https://www.digitaltrends.com/
• Navigate to https://www.theverge.com/
• Put Fenix in background (baseline ping is generated)

1️⃣ The ** baseline ping** is generated
• Ping 1f894e91-8b3f-4088-97d8-02389b8651ae - """events.total_uri_count": 17+1 added

► Video
20200327-131440

Logcat
Glean dashboard

@Dexterp37
Copy link
Contributor

Dexterp37 commented Apr 1, 2020

Thanks @AndiAJ - I think this should be fairly easy to debug for Fenix devs by checking under which circumstances values are being incremented/reported. cc @sblatz @boek

One potential idea is that Fenix is also counting the initial page?

@eliserichards eliserichards changed the title TotalUriCount metric is being misused[Bug] [Bug] TotalUriCount metric is being misused Apr 21, 2020
@eliserichards
Copy link

We converted this to a counter, but the three scenarios defined by @AndiAJ #6577 (comment) need to be fixed.

@Dexterp37
Copy link
Contributor

Hey @eliserichards ! Just one note here: to make sure that the data is reliable for decision making, please consider prioritizing this for sooner, rather than later :)

@vesta0 vesta0 added the P2 Upcoming release label Apr 28, 2020
@Mugurell
Copy link
Contributor

Sorry for not getting to this sooner.
Based on AndiAJ and Dexterp37's comments I think we have a pretty good understanding of the issue, I'll work to fix it.

@Mugurell
Copy link
Contributor

This seems to be happening because of programatic redirects.
Following AndiAJ's test scenarios from #6577 (comment)

  • https://www.qq.com/ uses
if (location.search.indexOf('?pc') !== 0 && /Android|Windows Phone|iPhone|iPod/i.test(navigator.userAgent)) {
    window.location.href = 'https://xw.qq.com?f=qqcom';
  }
  • http://baidu.com/ uses
    <meta http-equiv="refresh" content="0;url=http://www.baidu.com/">

Don't think there's anything we can do.

@Dexterp37
Copy link
Contributor

@Mugurell I think that one low hanging fruit could be to document this in the metric description within the metrics.yaml file. This would make it easy to figure out the behaviour if someone else stumbles on it in the future.

@Mugurell
Copy link
Contributor

Mugurell commented May 4, 2020

Based on the above investigation that showed total_uri_count could be incremented without explicit user action but this being more or less the expected result, documentation for which was added in #10322, I think this can be closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🐞 bug Crashes, Something isn't working, .. Feature:Telemetry P2 Upcoming release
Projects
None yet
Development

No branches or pull requests

10 participants