core: ensure good and average scores start exactly at control points #13559
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
part of #13316
(this description refers to scores between 0 and 100 because it makes talking about rounding and flooring easier, but everything is actually done on scores between 0 and 1)
Our raw log-normal scoring method takes audit values and generates percentile scores based on control points. e.g. TBT's p10 control point of 200 ensures that TBT values ≤ 200ms get a 90+ score and TBT values > 200ms get a score less than 90.
The problem outlined in #13316 is that
audit.js
then rounds the score prior to adding it to the audit result. A score less than 90 can be rounded up to 90 if it's 89.5 or above, so in practice the TBT p10 control point has actually been ~204.86ms.The simplest fix is to just floor the score rather than rounding it. With the guaranteed score precision provided by #13392, we can be confident that
Math.floor(percentile)
means that a value is never given a score higher than a 50 or a 90 unless it's faster than the median or p10 control points.Here is how the before/after rounding looks between a score of 20 and 30:
statistics.getLogNormalScore
Math.round()
clamping, where half the time the rounded score is above the raw score and half the time it's belowMath.floor()
clamping, where the rounded score is always below the raw score. A rounded score of 90 can only have been from a 90+ raw score.This does mean that the new rounding matches with the old rounding half the time, but the other half of the time it's 1 point lower. This is for individual audits/metrics; as discussed in #13316, we should expect about half of sites to lose 1 point in their final perf score and the other half to be unchanged.
However! There's a big issue with this: attainability of a perfect score. Previously a score of 99.5 would get rounded up to 100, so there was enough room there for some sites to get a perfect score with metric numbers that are attainable in the real world. Flooring the scores takes those sites to 99 instead.
A couple of different approaches are possible to correct this. One lightweight way is to apply a small correction to transition from the new floor clamping at 90 (where we need it to precisely match the control point) to the old rounding at 100 (where we want the old rounding up behavior). This spreads scores between 90 and 100 to a range of 90 to 100.5, so the concluding Math.floor() takes those top end scores to a 100.
Here is what flooring vs flooring with the correction looks like:
Below a score of 90, it's identical to just flooring. As it rises above 90 and approaches 100, it gets closer and closer to what the old
Math.round()
scoring looked like, sometimes under and sometimes over the raw score).This might sound more complicated than it is. It's a linear ramp from 0 to 0.5 between scores of 90 to 100 (
(score - 0.9)/0.1 * 0.005
)Previously any score over 99.5 would get rounded up to 100, now it's any score over ~99.524 will get rounded up. As a result, over 95% of sites that previously got a 100 on a particular audit will still get a perfect score. The analysis of the impact on the overall perf score depends a lot more on the distribution of site scores than the above estimate, but since it's strictly a score boost for some sites, we can at least expect less than half of sites to lose a point on the overall perf score.