Score "Investigate" progress as part of the overall metric #49

Closed
jgraham opened this issue Jan 21, 2022 · 8 comments
Labels: meta Process and/or repo issues

@jgraham
Contributor

jgraham commented Jan 21, 2022

Mozilla are concerned that basing the score purely on the test pass rate for the accepted proposals doesn't provide any visible reward for progress on the areas that were marked as "Investigate". Although these are not at the point where we can define a set of tests for implementors to target, several of the "Investigate" areas are places where we see a lot of compatibility problems in practice, and completing the pre-implementation work required to address those problems will pay dividends in terms of improving the experience of the web for end users and authors. Therefore that work should be recognised alongside implementation work in terms of improving the Interop 2022 score.

Given this, we propose the following:

  • For each area marked as "Investigate" we define a set of outcomes that will represent measurable progress toward addressing the compat problems in that area, for example, writing spec text for a feature, or creating a viable plan to align existing implementations.
  • We dedicate some percentage of the overall score to progress on these goals. Unlike the existing areas, these will be scored manually and the score will be shared across all implementations.

For example, we might decide that 85% of the total score comes from the test pass rate, and 15% of the score comes from progress on the "Investigate" areas. If we completed two thirds of the investigate work, but failed to make progress on the final third, that would mean the most that any one implementation could score on the overall metric would be 95%.
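
As a rough sketch of how the numbers above could combine (this is illustrative only; the 85/15 weights and the names are assumptions, not an agreed formula):

```python
# Illustrative sketch of the proposed split: per-browser test pass rate plus a
# shared "Investigate" component. The 85/15 weights are the example from above,
# not a decided value.
TEST_WEIGHT = 0.85
INVESTIGATE_WEIGHT = 0.15

def overall_score(test_pass_rate: float, investigate_progress: float) -> float:
    """Both inputs are fractions in [0, 1]; the result is a percentage.

    investigate_progress is shared across all implementations, so every
    browser gets the same contribution from it.
    """
    return 100 * (TEST_WEIGHT * test_pass_rate
                  + INVESTIGATE_WEIGHT * investigate_progress)

# A browser passing every test while two thirds of the investigation work is
# complete tops out at 95%, as described above.
print(round(overall_score(1.0, 2 / 3), 1))  # 95.0
```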

Concretely, I think the areas we marked as "Investigate" that should form part of this proposal are:

I've excluded AVIF because, as I understand it, the scope of "Investigate" there wasn't about the state of the spec or tests, and accent-color because the concern there seemed to be that there weren't actually any failing tests.

In terms of scoring I don't think that all these areas necessarily need to be given equal weight, or that each "Investigate" area has to be exactly equal to an implementation area.

@foolip
Member

foolip commented Jan 24, 2022

I think this makes sense, and 15% is a reasonable portion of the score.

I do think spelling out the scoring of these areas ahead of time might be challenging, in particular for Pointer Events, where the issue was that the existing failures didn't seem representative of the problems web developers actually face. Do we know in any detail what does need to be tested?

layerX and layerY is a much smaller area than the others, so if there's a way to write a tentative test for it that would fail in all browsers, and keep it in the webcompat bucket, I think that would be preferable.

@jensimmons what do you think of this proposal, in particular as regards #41?

@chrishtr
Contributor

> I think this makes sense, and 15% is a reasonable portion of the score.

I also think it's ok, and good to pay attention to these research tasks in addition to fixing bugs.

My suggestion is to start with a 0 score in this area for all three browser implementations and then increase it at the end of each quarter if progress has been made, via a consensus decision.

I think each research group can and should define its own agenda and ways of organizing; it's not necessary for the whole group to pre-agree on these plans. The group can then review the results at the end of the quarter and give a reasonable score.

@jensimmons
Contributor

jensimmons commented Jan 27, 2022

This makes sense to me for Viewports. Perhaps we make the (few) automated tests 50%, and the manual testing that we plan and carry out the other 50%? We won't be able to repeat the manual testing over and over, however, so I'm open to debate on this.

@jensimmons
Contributor

I have a serious problem with reopening and re-litigating decisions already made by changing the process after the fact.

@gsnedders
Member

At this stage, we have a number of concerns about substantially changing the scoring approach, especially when we are so close to the proposed announcement date. This isn’t to say we don’t think there’s substantial value in documenting progress on the things marked as investigate, but I think we’d need a much more concrete proposal as to how the scoring would affect the overall scores.

It’s unclear how the Investigate items will be scored, but presuming that gets figured out, it’s still very unclear how those scores will impact the overall score for each browser. Is the proposal that the resulting score be applied to each browser’s total in an equal fashion? Or that browsers will be able to earn more points than others by participating in the standards and testing work needed?

If the proposal is to have the investigation scoring apply equally to all browsers, it’s not clear it provides much value aside from making it harder for everyone to reach 100%; it lacks the competitive pressure that the metric otherwise provides.

If the proposal is to have investigation scoring apply differently to each browser, based on some sort of measure of participation or contribution, then much more detail would be needed as to how this would be measured.

In either case, we are dubious that consensus about how each investigation item is scored can be reached in time for the announcement date, and we very strongly want any discussions about scoring to be concluded by that date.

Our preference, as far as 2022 is concerned—and we can develop further proposals over the coming year for how we want to score investigate items in future—is to list the investigation as a separate item on the dashboard, aside from the browsers and their scores. In that regard, we’re much less concerned about ongoing discussion about the scoring of the items, provided we believe we can reach consensus within the first quarter of the year.

@jgraham
Contributor Author

jgraham commented Feb 3, 2022

As a point of clarity, the proposal is to have a uniform score that applies to all implementations.

Investigation work intrinsically requires collaboration, so it makes sense to score it in a way that rewards everyone for collective progress. That does come with the tradeoff that you can't use the score in a "competitive" sense against other implementations, but you can use it to demonstrate that you've made good on a commitment to improving the web ecosystem as a whole. Although competitive pressure is certainly one way that the web makes progress, it's not the only way, and scoring an interoperability metric entirely on the basis of what specific implementations accomplish in terms of landing new feature work misses the bigger picture.

@foolip
Member

foolip commented Feb 23, 2022

This is implemented in https://staging.wpt.fyi/interop-2022 now, in all places except one: in the summary graph I forgot to scale the score to 90%. Leaving this issue open as a reminder of that.
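
For reference, a minimal sketch of that scaling, assuming a 90/10 split between the per-browser test score and the shared investigation score (the function and variable names here are illustrative, not the dashboard's actual code):

```python
# Hypothetical illustration of the summary-graph scaling: the per-browser test
# pass rate contributes at most 90 points, and the shared investigation score
# (0-10) supplies the remainder.
def summary_score(test_pass_rate: float, investigation_points: float) -> float:
    """test_pass_rate is a fraction in [0, 1]; investigation_points is 0-10."""
    return 90 * test_pass_rate + investigation_points
```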

@foolip
Member

foolip commented Aug 18, 2022

I fixed the summary graph at some point, closing this as fixed.

@foolip foolip closed this as completed Aug 18, 2022
@gsnedders gsnedders added the meta Process and/or repo issues label Sep 16, 2022