Multiple Test Failures from Blocked accept0 Syscalls (Debian CI Runs only) #43387

@original-brownbear

Description

The following build: https://scans.gradle.com/s/4prwec7zf6pba/ failed in a very strange way.
Seemingly all nodes of the internal test cluster keep getting stuck in accept calls on non-blocking server sockets.
The build log is full of failed connections and the following stuck-thread report:

1> [2019-06-19T16:22:22,985][WARN ][o.e.t.n.MockNioTransport ] [node_sm1] Potentially blocked execution on network thread [elasticsearch[node_sm1][transport_worker][T#2]] [2401 milliseconds]:
1> sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
1> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
1> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
1> org.elasticsearch.nio.ChannelFactory$RawChannelFactory$$Lambda$1624/1917852998.run(Unknown Source)
1> java.security.AccessController.doPrivileged(Native Method)
1> org.elasticsearch.nio.ChannelFactory$RawChannelFactory.accept(ChannelFactory.java:223)
1> org.elasticsearch.nio.ChannelFactory$RawChannelFactory.acceptNioChannel(ChannelFactory.java:180)
1> org.elasticsearch.nio.ChannelFactory.acceptNioChannel(ChannelFactory.java:55)
1> org.elasticsearch.nio.ServerChannelContext.acceptChannels(ServerChannelContext.java:47)
1> org.elasticsearch.nio.EventHandler.acceptChannel(EventHandler.java:45)
1> org.elasticsearch.transport.nio.TestEventHandler.acceptChannel(TestEventHandler.java:51)
1> org.elasticsearch.nio.NioSelector.processKey(NioSelector.java:227)
1> org.elasticsearch.nio.NioSelector.singleLoop(NioSelector.java:172)
1> org.elasticsearch.nio.NioSelector.runLoop(NioSelector.java:129)
1> org.elasticsearch.nio.NioSelectorGroup$$Lambda$1545/393896253.run(Unknown Source)
1> java.lang.Thread.run(Thread.java:748)

It is not immediately clear to me how we could get into these calls blocking. It doesn't seem to be a deadlock on some selector lock, since no thread leaks are reported on the failing tests (though it could be that interrupting the node's thread pools clears up all the stuck syscalls). So far this seems to be a one-time thing as far as I can see.
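
For context, here is a minimal plain-JDK sketch of the pattern the trace walks through (a selector loop dispatching to accept on a non-blocking ServerSocketChannel). This is illustrative only, not the MockNioTransport/NioSelector code itself. The point is that accept() on a channel configured non-blocking is expected to return a connected channel or null immediately, not park the thread inside the native accept0 call:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingAcceptSketch {
    public static void main(String[] args) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));
        // After this call, accept() should never block in accept0:
        // it returns a SocketChannel if one is pending, or null.
        server.configureBlocking(false);

        Selector selector = Selector.open();
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Simplified version of a selector run loop: block in select(),
        // then handle ready keys; only select() is supposed to park.
        while (selector.select() > 0) {
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    // Expected to complete immediately on a non-blocking channel.
                    SocketChannel accepted = server.accept();
                    if (accepted != null) {
                        accepted.close();
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```

Given that setup, a multi-second stall inside accept0 would seem to require either the channel somehow being back in blocking mode or the syscall itself stalling in the kernel, though I have no evidence for either.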
