Move stability checking to `wpt run --verify` #9874

jgraham · 2018-03-06T14:12:46Z

Since we developed the stability checker, wptrunner has gained a --verify flag for checking test stability. Ideally, instead of maintaining separate codepaths here, we would keep all the Travis handling logic from the check_stability script but do the actual run using the --verify option to wpt run, so that we are confident the same checks apply in CI and in wpt run --verify.

akshitac8 · 2018-03-07T05:58:45Z

Hey @jgraham, I would like to work on this issue. Can I get a little insight on how to proceed?

kriti21 · 2018-03-13T17:59:28Z

@akshitac8 Are you still working on this one?

jgraham · 2018-03-13T18:15:20Z

Sorry, I somehow missed the comments here. I think this isn't a trivial issue and probably not suitable as a first bug because I currently don't know what's required to land this.

akshitac8 · 2018-03-13T18:45:41Z

Yes @kriti21, it's taking a little time. Sorry for the inconvenience :)

akshitac8 · 2018-03-18T18:02:06Z

Hello,
I wanted to confirm that this file has to be modified?

foolip · 2018-04-03T08:04:01Z

@jgraham, maybe #10269 is now a dupe of this? Is it --verify or --stability?

foolip · 2018-04-04T15:42:52Z

Hmm, both --verify and --stability exist. There's presumably some difference. @kereliuk, can you take a look, and assign to yourself if this makes sense as part of fixing #7660?

gsnedders · 2018-05-08T14:27:53Z

@jgraham

We have both:

  --stability           Stability check tests
  --verify              Run a stability check on the selected tests

What's the difference between these?

jgraham · 2018-05-08T14:43:11Z

--stability does exactly what Travis does right now (except that the ci command has extra integration with Travis), and --verify is the new thing that we should use instead.

gsnedders · 2018-05-08T14:53:07Z

@jgraham Do we want to support both? Should we drop support for one? We should at least document the difference somewhere, because at the moment the --help output doesn't give any indication as to which to use!

jgraham · 2018-05-09T08:33:12Z

The intent of this issue is to move to using --verify exclusively.

foolip · 2018-05-09T08:52:43Z

There's a great deal of overlap with #10269 and perhaps they'll be fixed by the same PR, but for now I'll just align the labels and assign both to @gsnedders.

gsnedders · 2018-05-11T16:44:02Z

OK, so in short, the current flow is:

tools/ci/ci_stability.sh does little more than start tools/ci/check_stability.py
tools/ci/check_stability.py sets stuff up, calls tools.wpt.stability.run, aggregates results, and posts to pulls.wpt.org
tools/wpt/stability.py is the old thing we want to replace here (and in #10921, "Make --stability just use the --verify code")

The hardest part is that we need to get the data structure that tools.wpt.stability.run returns somehow out of ./wpt run --verify (and I'm not entirely sure what that data structure is yet!), though I think that's probably just copy/pasting code?

foolip · 2018-05-14T12:18:50Z

@gsnedders, could ./wpt run --verify not simply write a wptreport? If that format doesn't currently support recording multiple runs, how could we make it do that?

@lukebjerring @Hexcles, is that what we'd want for web-platform-tests/wpt.fyi#118?

gsnedders · 2018-05-14T12:26:19Z

@foolip Without having tested, I would assume it would support any of wptrunner's output formats, given it's part of wptrunner? That said, pulls.web-platform-tests.org expects to get data POSTed in a specific format, so somewhere we need to convert the data into that form (or kill the dashboard entirely; see #10923) before making the request there.

foolip · 2018-05-14T12:38:05Z

If, as a part of the work on this issue, posting to pulls.web-platform-tests.org no longer happens (because it would require additional work), I think that'd be fine, since it's no longer commenting, and when it was, it was incredibly hard to figure out if there were regressions or not. If we want to revive it before solving the same problem with wpt.fyi, lumping that into the same work seems OK. @jugglinmike, does that sound reasonable, or too willing to break things?

jugglinmike · 2018-05-14T15:23:59Z

I have yet to contribute to the pull request dashboard, but out of coincidence (i.e. our effort to centralize secrets), I'm almost in a position to deploy changes to that application. My preference would be to diagnose and correct the existing problem before making further changes since proceeding in the reverse order (or consolidating two steps) would make bug fixing more difficult.

"when it was, it was incredibly hard to figure out if there were regressions or not"

Do you mean "regressions [in the WPT pull request being validated]" or "regressions [in the integration between WPT and the pull request dashboard]"?

foolip · 2018-05-14T15:26:53Z

I mean "regressions [in the WPT pull request being validated]".

gsnedders · 2018-05-14T15:30:48Z

I'm pretty sure the immediate state after we fix this isn't going to stop posting results, FWIW.

jugglinmike · 2018-05-14T15:32:21Z

Got it, @foolip. I was preparing to file a new issue to discuss simplifications to debugging the integration, but since you were referring to the test review process, I think that's documented by gh-7475.

Hexcles · 2018-05-14T15:33:18Z

@foolip I don't know how multiple runs will interact with wptreport. Even if they end up in the report, I don't think the various readers in the wpt.fyi project can understand multiple runs of the same test.

However, the whole code path shouldn't be too hard to fix. And using wptreport is the right way to go IMHO.

foolip · 2018-05-14T16:07:49Z

@Hexcles, I suppose the options are N wptreport JSON files, or 1 wptreport JSON file representing all runs. I take it you prefer separate files? But how would one represent a run with AABB test order? (I'm guessing that's possible to achieve with some combination of flags.)
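
To make the "N files versus 1 file" option concrete, here is a rough sketch of folding N single-run reports into one structure. It assumes each report is JSON with a top-level "results" list whose entries carry "test", "status" and "subtests" (each with "name" and "status"), which is how I understand wptreport to look; the combined layout and the `combine_reports` name are made up for this example and are not an existing format.

```python
import json

# Two tiny single-run reports in the assumed wptreport shape.
RUN_1 = (
    '{"results": [{"test": "/dom/a.html", "status": "OK", '
    '"subtests": [{"name": "first", "status": "PASS"}]}]}'
)
RUN_2 = (
    '{"results": [{"test": "/dom/a.html", "status": "OK", '
    '"subtests": [{"name": "first", "status": "FAIL"}]}]}'
)


def combine_reports(reports):
    """Fold N single-run reports into one dict keyed by test.

    Each entry keeps the list of statuses in the order the reports were
    given, so the run number is recoverable from the list index.
    """
    combined = {}
    for report in reports:
        for result in report["results"]:
            entry = combined.setdefault(
                result["test"], {"statuses": [], "subtests": {}}
            )
            entry["statuses"].append(result["status"])
            for subtest in result.get("subtests", []):
                entry["subtests"].setdefault(subtest["name"], []).append(
                    subtest["status"]
                )
    return combined


combined = combine_reports([json.loads(RUN_1), json.loads(RUN_2)])
print(json.dumps(combined, indent=2))
```

Note that this only records per-test order. It does not capture how different tests were interleaved within a run (the AABB case), which would need an explicit sequence field of some kind.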

gsnedders · 2018-05-14T17:09:52Z

So what I haven't done so far is move over to running the multi-part verify (with/without restarts, with/without chaos mode in Firefox), but it does now use the verify code.

At the moment, it's still calling into the code through Python, rather than letting it run separately and then reading the logs. If we want to avoid going through the logs, we'd need to either make check_stability (in tools/wptrunner/wptrunner/stability.py) return something useful or call get_steps ourselves.
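
As a rough picture of what "calling the steps ourselves" could look like, the sketch below builds a list of (description, callable) pairs and runs them in order, stopping at the first failure. Everything here is hypothetical: `build_steps`, `run_steps` and the stub step functions are stand-ins for illustration and do not reflect the real signatures of get_steps or check_stability in wptrunner.

```python
def repeat_loop(tests):
    """Stub: rerun the tests several times in one browser session."""
    return True


def repeat_restart(tests):
    """Stub: rerun the tests, restarting the browser between runs.

    Pretends that one particular test only misbehaves after a restart.
    """
    return "/dom/flaky.html" not in tests


def chaos_mode(tests):
    """Stub: rerun the tests with Firefox's chaos mode enabled."""
    return True


def build_steps(product):
    """Return (description, callable) pairs for one multi-part check."""
    steps = [
        ("repeat without restarts", repeat_loop),
        ("repeat with restarts", repeat_restart),
    ]
    if product == "firefox":
        # Chaos mode is Firefox-only, so only add it for that product.
        steps.append(("repeat in chaos mode", chaos_mode))
    return steps


def run_steps(steps, tests):
    """Run each step in order, stopping at the first one that fails."""
    outcomes = []
    for description, step in steps:
        passed = step(tests)
        outcomes.append((description, passed))
        if not passed:
            break
    return outcomes


stable = run_steps(build_steps("firefox"), ["/dom/a.html"])
flaky = run_steps(build_steps("firefox"), ["/dom/flaky.html"])
print(stable)
print(flaky)
```

Returning the per-step outcomes, rather than a single boolean, is what would let the CI script report which mode exposed the instability without having to parse logs.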

jgraham changed the title ~~Move stability checking to wpt run --stability~~ Move stability checking to wpt run --verify Mar 6, 2018

gsnedders added the infra and wptrunner labels Mar 6, 2018

foolip added the priority:roadmap label Mar 10, 2018

jgraham added priority:backlog and removed priority:roadmap labels Apr 23, 2018

foolip assigned gsnedders May 8, 2018

foolip added ci_stability priority:roadmap and removed priority:backlog labels May 9, 2018

foolip mentioned this issue May 9, 2018

Surface test regressions in PRs #7475

Closed

gsnedders mentioned this issue May 9, 2018

Replace check-stability script with wpt run --stability and document how to run locally #10269

Closed

gsnedders mentioned this issue May 14, 2018

stability/verify unification #10988

Merged

gsnedders added a commit to gsnedders/web-platform-tests that referenced this issue May 14, 2018

Make check_stability.py use the --verify code

a035acf

Note this doesn't yet address web-platform-tests#9874, as it currently only runs the repeat_restart mode of --verify (as it did previously).

zcorpan mentioned this issue May 15, 2018

Store results of web-platform-tests PRs and allow comparing to master web-platform-tests/wpt.fyi#118

Closed

gsnedders added a commit to gsnedders/web-platform-tests that referenced this issue May 28, 2018

Make check_stability.py use the --verify code

0f0baf4

Note this doesn't yet address web-platform-tests#9874, as it currently only runs the repeat_restart mode of --verify (as it did previously).

gsnedders added a commit to gsnedders/web-platform-tests that referenced this issue Jun 5, 2018

Make check_stability.py use the --verify code

9a1fd95

Note this doesn't yet address web-platform-tests#9874, as it currently only runs the repeat_restart mode of --verify (as it did previously).

gsnedders closed this as completed in #10988 Jun 5, 2018

gsnedders added a commit that referenced this issue Jun 5, 2018

Make check_stability.py use the --verify code

38809cb

Note this doesn't yet address #9874, as it currently only runs the repeat_restart mode of --verify (as it did previously).

dhdavvie pushed a commit to dhdavvie/wpt that referenced this issue Jun 7, 2018

Make check_stability.py use the --verify code

4da9bab

Note this doesn't yet address web-platform-tests#9874, as it currently only runs the repeat_restart mode of --verify (as it did previously).

foolip mentioned this issue Jun 13, 2018

When many tests are affected, CI stability jobs will time out #7660

Closed

foolip mentioned this issue Oct 7, 2018

Unify ./wpt check-stability and ./wpt run --verify #13406

Open
