
change individual metric score rounding #13316

Closed
brendankenny opened this issue Nov 3, 2021 · 3 comments


brendankenny commented Nov 3, 2021

Historical background

Originally scores were going to be the primary way to communicate how a page did on individual metrics, with the actual numericValue (née rawValue) being less important and mostly included for automated tools consuming the LHR. We also decided to always round each metric score to a whole number (to two decimal places in the [0, 1] version) to keep score precision realistic for users.

Over time, the metric scores have almost completely disappeared from the HTML report, now only taking part in how they're combined into the overall perf score and in choosing a color/icon for each metric. numericValues, meanwhile, are displayed right at the top.

In that time, we've also moved from metric thresholds that weren't discussed externally with much precision (especially back when we defined our curves using a "point of diminishing returns" rather than the exact 10th/50th percentiles) to today, where many of the thresholds are shared with the field CWV thresholds and are published very publicly.

The problem

All this comes together to cause confusion for anyone analyzing Lighthouse results and expecting metric numericValues, scores, and thresholds to all be consistent. This came up in the context of some Web Almanac work, where I made the claim that it was equivalent to look at either a TBT score (≥ 0.9 for good, < 0.5 for poor) or a TBT numericValue (≤ 200ms for good, > 600ms for poor).

However, these are not equivalent, because we round the score :) In fact, a TBT as high as 204.856ms will still get a TBT score of 90, and a TBT as high as 606.48ms will still get a score of 50.
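To make the rounding effect concrete, here's a rough sketch of the shape of the scoring curve, not Lighthouse's actual getLogNormalScore implementation: the score is the complementary CDF of a log-normal distribution positioned so that p10 maps to 0.90 and median to 0.50, with an Abramowitz–Stegun erfc approximation standing in for the real math.

```js
// Sketch only: approximates the metric scoring curve (p10 -> 0.90, median -> 0.50).
// erfc() uses the Abramowitz & Stegun 7.1.26 approximation, not Lighthouse's own.
function erfc(x) {
  const t = 1 / (1 + 0.3275911 * Math.abs(x));
  const y = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
            t * (-1.453152027 + t * 1.061405429)))) * Math.exp(-x * x);
  return x >= 0 ? y : 2 - y; // erfc(-x) = 2 - erfc(x)
}

function approxLogNormalScore({p10, median}, value) {
  // erfc⁻¹(0.2) pins the score at exactly 0.9 when value === p10.
  const INVERSE_ERFC_ONE_FIFTH = 0.9061938024368232;
  const standardizedX = INVERSE_ERFC_ONE_FIFTH * Math.log(value / median) / Math.log(median / p10);
  return erfc(standardizedX) / 2;
}

// TBT control points: p10 = 200ms, median = 600ms.
const rawScore = approxLogNormalScore({p10: 200, median: 600}, 204); // ≈ 0.896
Math.round(rawScore * 100); // 90 — a 204ms TBT, already over the 200ms threshold, still displays as 90
```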

Proposal

We still round the metric scores, but we always round down (use Math.floor() here). Any TBT score ≥ 90 will then also have a numericValue ≤ 200ms.
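A minimal sketch of the proposal (hypothetical helper name, not the existing clampTo2Decimals):

```js
// Floor rather than round when clamping to two decimals, so 0.90 is only
// reachable when the raw score is actually >= 0.90.
const floorTo2Decimals = val => Math.floor(val * 100) / 100;

floorTo2Decimals(0.8959); // 0.89 — the 204ms TBT from the example above no longer shows as 90
floorTo2Decimals(0.9041); // 0.90 — still 90 for a TBT comfortably under 200ms
```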

This will change scores (so it would be a breaking change), but it's arguably correcting scores to what the threshold system always intended.

Assuming metrics are currently evenly split between scores that are being rounded up and those being rounded down (probably reasonable), about half of sites will drop one point on each metric from this change. With our six weighted metrics (using LH v8/9 weights):

  • half of sites would lose 0.5 points in the overall perf score (with final rounding, possibly enough to drop them a point downwards)
  • 10% of sites would lose 0.8 points
  • about 1.5% of sites would see a full point drop

edit: actually the overall score calc is easy to get from this. Given the above, and assuming scores are evenly spread in their fractional portion, the final rounded perf score will drop by 1 point for 50% of sites; the rest will see no change in their final score. We could run it against HTTP Archive numbers to get an empirical percentage, but given that it's at most a 1 point drop regardless, a ballpark percentage seems fine.
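As a sanity check on that 50% figure, here's a quick simulation under the same assumptions (each metric independently loses a point with probability 0.5, fractional parts uniform; weights are the LH v8/v9 perf weights as listed in the scoring docs):

```js
// Monte-Carlo check of the 50% estimate. Assumes each metric's fractional score part
// is uniform (so flooring costs it a point with probability 0.5) and that the overall
// perf score's fractional part is uniform too.
const weights = [0.10, 0.10, 0.25, 0.30, 0.10, 0.15]; // FCP, SI, LCP, TBT, TTI, CLS

const N = 100_000;
let dropped = 0;
for (let i = 0; i < N; i++) {
  // Weighted drop (in 0-100 points) from flooring each metric score instead of rounding.
  const drop = weights.reduce((sum, w) => sum + (Math.random() < 0.5 ? w : 0), 0);
  // The final perf score loses a point iff the drop pushes it across a rounding boundary.
  const fraction = Math.random();
  if (Math.round(fraction) - Math.round(fraction - drop) >= 1) dropped++;
}
console.log(`${(100 * dropped / N).toFixed(1)}% of simulated sites lose a point`); // ~50%
```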

Overall not a large change for a significant consistency gain.

@brendankenny

There will be numeric issues (e.g. with Math.floor(), getLogNormalScore({p10: 200, median: 600}, 200) gives 0.8999999314038525 and so would get floored to 0.89), but since we're rounding anyway, it would be easy to throw in a clamp against the threshold depending on whether the numericValue is on one side of the threshold or the other.
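One way that clamp could look (illustrative only; hypothetical helper name, not necessarily what ends up in Lighthouse):

```js
// If the numericValue is at or better than a control point, don't let
// floating-point error pull the score below that control point before flooring.
function clampScoreToControlPoints(score, value, {p10, median}) {
  if (value <= p10) return Math.max(score, 0.9);
  if (value <= median) return Math.max(score, 0.5);
  return score;
}

// getLogNormalScore({p10: 200, median: 600}, 200) === 0.8999999314038525
const clamped = clampScoreToControlPoints(0.8999999314038525, 200, {p10: 200, median: 600});
Math.floor(clamped * 100) / 100; // 0.9 instead of 0.89
```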


connorjclark commented Nov 15, 2021

from eng discussion:

  1. let's change clampTo2Decimals to clampTo4Decimals
  2. add some tests with numeric values around the control points, asserting that the value passes as expected (rough sketch after this list)
  3. possibly add special rounding treatment in computeLogNormalScore with numeric values around control points
  4. let's not consider this a breaking change: the consistency outweighs the score change, our intention for scores wrt control points makes this a bug, and the score change is literally a rounding error, much less than expected variance
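A rough sketch of what items 1 and 2 could look like, reusing the approxLogNormalScore sketch from the first comment in place of the real scoring function (hypothetical test code, not the actual Lighthouse test suite):

```js
const assert = require('assert');
// approxLogNormalScore: see the sketch in the first comment above.

// Item 1: widen the clamp from 2 to 4 decimals so the ~1e-7 numerical error at a
// control point is absorbed by ordinary rounding.
const clampTo4Decimals = val => Math.round(val * 10_000) / 10_000;

// Item 2: assert behaviour right around the TBT control points (200ms / 600ms).
assert.strictEqual(clampTo4Decimals(approxLogNormalScore({p10: 200, median: 600}, 200)), 0.9);
assert(clampTo4Decimals(approxLogNormalScore({p10: 200, median: 600}, 201)) < 0.9);
assert.strictEqual(clampTo4Decimals(approxLogNormalScore({p10: 200, median: 600}, 600)), 0.5);
```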

@brendankenny

Going to call this closed by #13392 and #13559. We may still want to switch to 4-digit clamping for scores, but that wouldn't have fixed this issue. There would still be very small differences between control points and expected scores, but more importantly we'd also have to make sure that any downstream tool or the report never shows whole-number metric scores, or they would fall prey to this same issue (e.g. Intl.NumberFormat formatting a score as '90' but using the average color because the full-precision value is less than 90).
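For example (illustrative numbers and formatter options only):

```js
// The display pitfall: the formatter rounds a 0.8999 score up to "90", while any
// color/pass logic working on the full-precision value still treats it as below 90.
const score = 0.8999; // hypothetical 4-decimal-clamped metric score
new Intl.NumberFormat('en-US', {maximumFractionDigits: 0}).format(score * 100); // "90"
score * 100 >= 90; // false — the number reads as "good" while the color logic says "average"
```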

clampTo4Decimals can still be easily layered on top of this, though (switching 100 to 10_000 here), and any downstream or report rounding will do the correct thing by default.
