
Google Chrome CI job occasionally does not run any tests #5406

Closed
gsnedders opened this issue Apr 6, 2017 · 18 comments

Comments

@gsnedders
Member

See:

https://travis-ci.org/w3c/web-platform-tests/builds/218464208

cc/ @jgraham @jugglinmike @bobholt

@gsnedders
Member Author

Potentially has overlap with #5408.

@jugglinmike jugglinmike changed the title Stability check running different sets of tests on Chrome/Firefox Google Chrome CI job occasionally does not run any tests Apr 6, 2017
@jugglinmike
Contributor

The problem here is actually specific to the Chrome job, so I'm updating the
title to reflect that.

In both reported cases, ChromeDriver failed to run any tests at all. The full
log (hosted on Travis CI but not included in the automated GitHub.com comment)
includes the following output:

PROCESS | 12034 | Starting ChromeDriver 2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5) on port 4454
PROCESS | 12034 | Only local connections are allowed.
Failed to connect to navigate initial page
Init failed 1
u'log' (u'debug', {'message': 'Hanging up on Selenium session'})
u'runner_teardown' ()
PROCESS | 12165 | Starting ChromeDriver 2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5) on port 4454
PROCESS | 12165 | Only local connections are allowed.
Failed to connect to navigate initial page
Init failed 2

...repeated 3 more times (for a total of 5 "Init failed" reports), followed by
a JSON-encoded version of the following error message:

Connecting to Selenium failed:
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/wptrunner/executors/executorselenium.py", line 59, in setup
    desired_capabilities=self.capabilities)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 185, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 247, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "/home/travis/virtualenv/python2.7.12/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 525, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/opt/python/2.7.12/lib/python2.7/urllib2.py", line 1201, in do_open
    r = h.getresponse(buffering=True)
  File "/opt/python/2.7.12/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/opt/python/2.7.12/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/opt/python/2.7.12/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
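The shape of the retry behavior visible in the log is easy to sketch: the harness makes up to five session-setup attempts, logging "Init failed N" after each one, and only gives up after the last. This is a minimal illustrative sketch; the function and variable names are assumptions, not code from wptrunner.

```python
# Illustrative sketch of the retry pattern seen in the log above.
# connect_with_retries() is a hypothetical name, not wptrunner's real API.

def connect_with_retries(connect, max_attempts=5):
    """Call connect() until it succeeds, or re-raise after max_attempts."""
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as exc:  # e.g. BadStatusLine from a dropped connection
            last_exc = exc
            print("Init failed %d" % attempt)
    raise last_exc
```

In the failing builds, all five attempts hit the same dropped-connection error, so the final exception (the `BadStatusLine` above) is what surfaces in the log.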

@bobholt
Contributor

bobholt commented Apr 6, 2017

The relevant issues here are #5336 and #5341. References to this in #5395 and #5343 are mis-attributed, as both of those PRs ran tests to completion with results.

The issue that distinguishes this from #5407 is that here, ChromeDriver tries to start 5 times each on various ports starting at 4444 and continuing to 4463.
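The start pattern described above can be sketched as up to five launch attempts on each port in the 4444-4463 range. The helper name and the exact iteration order are assumptions for illustration, not harness code.

```python
# Illustrative sketch of the ChromeDriver start pattern described above:
# five attempts per port, over the port range 4444-4463.

START_PORT, END_PORT = 4444, 4463
ATTEMPTS_PER_PORT = 5

def start_schedule():
    """Yield (port, attempt) pairs in the order they might be tried."""
    for port in range(START_PORT, END_PORT + 1):
        for attempt in range(1, ATTEMPTS_PER_PORT + 1):
            yield port, attempt
```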

@bobholt
Contributor

bobholt commented Apr 11, 2017

Going back through PRs, every PR since 11:15am EDT on April 7 has completed its Chrome tests, matching the Firefox tests. I am also no longer able to recreate the timeouts on my fork.

Unless this run of PR successes is due to re-triggering Chrome jobs, I propose that this be closed. This is unsatisfying to me personally, because I was not able to track down a root cause for the timeouts, but as I can no longer recreate the behavior, I don't hold out much hope of figuring it out.

@jugglinmike
Contributor

Looks like we're still having trouble:

This is pretty elusive, though: of forty builds that occurred over the last 5 days, it's only occurred once.

This may be a regression in the ChromeDriver binary. I say this because the "check stability" script was authored to fetch the most recently published version, and version 2.29 of ChromeDriver was released on April 4, which roughly corresponds to when we started experiencing this instability. Nothing in that change log seems particularly relevant, but you never know.

I would like to follow up with the ChromeDriver development group, but we have very little information to share right now. As a preliminary step, I am attempting to capture ChromeDriver debugging output at the moment of failure. This output is highly verbose, so we can't enable logging generally (doing so would cause log truncation and potentially obscure output that is relevant to test contributors). Instead, I've opened a dedicated pull request to collect the data. I plan on manually re-triggering that build until the timeout occurs.

I'll report back here when I've got some data.
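The "capture but don't flood" approach described above can be sketched as buffering only the tail of the verbose driver output in memory and emitting it only when a run fails, so passing runs don't truncate the CI log. `DebugBuffer` is a hypothetical name for illustration, not anything in the WPT tooling.

```python
# Hypothetical sketch: keep a bounded tail of verbose driver output and
# only emit it when the run fails, so ordinary logs stay readable.

from collections import deque

class DebugBuffer:
    def __init__(self, max_lines=2000):
        self._lines = deque(maxlen=max_lines)  # older lines are dropped

    def write(self, line):
        self._lines.append(line)

    def dump(self, failed):
        """Return the buffered tail on failure, and nothing otherwise."""
        return list(self._lines) if failed else []
```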

@gsnedders
Member Author

Does 2.28 work with the current dev channel of Chrome? Should we try rolling back to 2.28 and seeing if we get stability back?

@jugglinmike
Contributor

I discussed that possibility with @bobholt. Given the infrequency of the
timeout (again, one in forty builds), it would be some time before we could
determine if that was effective, even informally.
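A back-of-the-envelope calculation supports this: with a roughly 1-in-40 failure rate, "no failures since the rollback" only becomes strong evidence after a long streak of green builds. The model here (independent builds with a constant failure probability) is an assumption for illustration.

```python
# Back-of-the-envelope check: how many consecutive green builds would we
# need before "no failures" is meaningful evidence that a rollback helped?
# Assumes independent builds with a constant 1-in-40 failure probability.

import math

P_FAIL = 1.0 / 40  # observed: roughly one failure in forty builds

def builds_needed(confidence=0.95):
    """Smallest n such that, if the bug were still present, the chance of
    seeing n all-green builds in a row would be below (1 - confidence)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - P_FAIL))
```

At 95% confidence this works out to over a hundred builds, which is why an informal "roll back and watch" experiment would take so long to be conclusive.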

@jugglinmike
Contributor

Alright, we now have some relevant debugging information. gh-5544 references a
passing and a failing build, each with logs extended to include ChromeDriver's
debugging output.

I'll skip the analysis here since it's not immediately relevant to WPT.
Suffice it to say, the communication between ChromeDriver and Chrome
(specifically the DevTools component) is definitely failing here.

The good news is that this is a known problem: there are already two separate
bugs in the ChromeDriver issue tracker documenting the behavior.

The bad news is that they have not received very much traction from the
maintainers (which is a shame because that first report is incredibly
thorough). Note that this is starting to look more like an issue with Chromium
rather than ChromeDriver, so maybe a new bug report (this one filed against
Chromium) is in order.

@foolip is there anyone on either team that you can talk to about increasing
the priority?

@foolip
Member

foolip commented Apr 13, 2017

@pavelfeldman or @RByers, do you know who's responsible for triaging ChromeDriver bugs?

@RByers
Contributor

RByers commented Apr 15, 2017

@pavelfeldman or @RByers, do you know who's responsible for triaging ChromeDriver bugs?

Unfortunately our main ChromeDriver owner (Sam) has recently left Google. I'm trying to find a web platform team to help own it. @NavidZ may be able to help.

@jgraham
Contributor

jgraham commented Apr 20, 2017

Based on the bugs that @jugglinmike linked, #5626 may offer a band-aid for the problem (although I'm not yet sure).

@RByers: Assuming it does, is it more beneficial to land the band-aid fix, or to leave Chrome running but not blocking PRs? I know that @jugglinmike would like to avoid the band-aid since we don't really understand it, and it may delay a real fix. However, I don't want to leave you in a situation where you are importing tests that are unstable in a way we could have caught.

@jugglinmike
Contributor

(reposting from IRC) if we implement the workaround, we may never see a fix from Chromium. The bug will continue to trip up application developers. I think ignoring Chromium failures is the better solution because it avoids the workflow interruptions in WPT, places some pressure on the Chromium team, and (importantly) is something we actually understand.

@RByers
Contributor

RByers commented Apr 22, 2017

For the record here, we agreed to land the temporary work-around, but I'm also pushing to get a work-around landed in Chrome and/or a proper fix to glib shipped at high priority. This is a problem for anyone trying to automate Chrome, so it is important regardless of whether WPT has a work-around.

@jugglinmike
Contributor

Thank you, Rick! 🌈

@RByers
Contributor

RByers commented May 12, 2017

FYI, the work-around has now landed in Chromium and is included in the latest Chrome dev-channel build (60.0.3095.5). I suggest we try removing the work-around in WPT.

@gsnedders
Member Author

#6438 ran tests only on Firefox. Is the issue with Safari/Edge related to this?

@RByers
Contributor

RByers commented Aug 16, 2017

This issue is specific to the problem of Chrome hanging on startup, causing timeout errors during WebDriver connection. We believe that issue is now fixed, so I'm closing this (but there appear to still be other issues that can cause tests not to run).
