
Display any Computer.terminatedBy in the build log #94

Merged: 3 commits, merged Jan 11, 2019

@jglick (Member) commented Jan 10, 2019

I have been seeing a lot of inexplicable build failures on ci.jenkins.io of this form:

Stack trace
[Pipeline] End of Pipeline

GitHub has been notified of this commit’s build result

java.nio.channels.ClosedChannelException
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from …
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
		at hudson.remoting.Request.call(Request.java:202)
		at hudson.remoting.Channel.call(Channel.java:954)
		at hudson.FilePath.act(FilePath.java:1072)
		at hudson.FilePath.act(FilePath.java:1061)
		at hudson.FilePath.exists(FilePath.java:1581)
		at jenkins.branch.WorkspaceLocatorImpl.load(WorkspaceLocatorImpl.java:218)
		at jenkins.branch.WorkspaceLocatorImpl.locate(WorkspaceLocatorImpl.java:159)
		at jenkins.branch.WorkspaceLocatorImpl.locate(WorkspaceLocatorImpl.java:129)
		at jenkins.branch.WorkspaceLocatorImpl.locate(WorkspaceLocatorImpl.java:125)
		at hudson.model.Slave.getWorkspaceFor(Slave.java:334)
		at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:704)
		at hudson.model.ResourceController.execute(ResourceController.java:97)
		at hudson.model.Executor.run(Executor.java:429)
Caused: hudson.remoting.RequestAbortedException
	at hudson.remoting.Request.abort(Request.java:340)
	at hudson.remoting.Channel.terminate(Channel.java:1038)
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
	at hudson.remoting.Channel.close(Channel.java:1450)
	at hudson.remoting.Channel.close(Channel.java:1403)
	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:821)
	at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:105)
	at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:737)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE

The above is from this build of jenkinsci/git-plugin#655. Inspecting the running steps confirms that it was entry into a node block (in this case on Windows) that failed, apparently in the call to getWorkspaceFor.

The stack trace shows that SlaveComputer.disconnect is at least involved, whether or not it is the root cause. But we lack a stack trace for that call (closeChannel ran in another thread), as well as the details of the OfflineCause. Termination requests are sent to the system log here (and, since jenkinsci/jenkins#1993, at a finer level elsewhere), but they never appear in the build log and are hard to associate with a particular build, even for an admin.

This PR attempts to capture all relevant information about SlaveComputer.disconnect, in case that leads to a better diagnosis.
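The core difficulty is that the thread which actually closes the channel cannot show who asked for the disconnect. One common way around this is to create an exception object (without throwing it) on the requesting thread, so the JVM fills in the caller's stack, and keep it until it can be printed somewhere useful such as a build log. As I understand it, this is roughly the idea behind the Computer.terminatedBy named in the PR title, but the sketch below is self-contained and every class and method in it (DisconnectRequest, AgentComputer, pingMonitorGaveUp, renderForBuildLog) is hypothetical, not the Jenkins or plugin API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TerminationRecorderDemo {

    // Hypothetical record of one disconnect request. It extends Exception only so
    // that the JVM captures the stack of the thread that constructs it.
    static final class DisconnectRequest extends Exception {
        DisconnectRequest(String cause) {
            super("Disconnect requested: " + cause);
        }
    }

    // Hypothetical stand-in for an agent whose channel is closed asynchronously.
    static final class AgentComputer {
        private final List<DisconnectRequest> requests = new CopyOnWriteArrayList<>();
        private final ExecutorService closer = Executors.newSingleThreadExecutor();

        void disconnect(String cause) {
            // Capture the caller's stack now; the closer thread below cannot see it.
            requests.add(new DisconnectRequest(cause));
            closer.submit(() -> {
                // A real implementation would close the remoting channel here.
            });
        }

        List<DisconnectRequest> getDisconnectRequests() {
            return requests;
        }

        void shutdown() {
            closer.shutdown();
            try {
                closer.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Hypothetical caller that decides the agent is unusable.
    static void pingMonitorGaveUp(AgentComputer computer) {
        computer.disconnect("ping timed out");
    }

    // Renders every recorded request, including its stack trace, as build-log text.
    static String renderForBuildLog(AgentComputer computer) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        for (DisconnectRequest r : computer.getDisconnectRequests()) {
            r.printStackTrace(pw);
        }
        pw.flush();
        return out.toString();
    }

    static String demo() {
        AgentComputer computer = new AgentComputer();
        pingMonitorGaveUp(computer);
        computer.shutdown();
        return renderForBuildLog(computer);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The printed trace names pingMonitorGaveUp even though the close itself ran on the executor thread, which is exactly the information missing from the failure above. A copy-on-write list is used because requests may be added from one thread while another renders them.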

@dwnusbaum (Member) left a comment

Looks good to me. I have been seeing similar errors quite often as well.

@dwnusbaum (Member) commented Jan 10, 2019

The PR was never built by ci.jenkins.io; the repository event log contains the following:

Getting remote pull request #94...

    Checking pull request #94
      ‘Jenkinsfile’ not found
    Does not meet criteria

    Checking pull request #94
      ‘Jenkinsfile’ not found
    Does not meet criteria

Similar errors are reported in JENKINS-54126; users there suspect that a recent change in github-branch-source to re-enable a long-disabled cache might have caused the issue. I will try closing and reopening the PR to see if that helps.

Edit: It looks like that worked...

@dwnusbaum dwnusbaum closed this Jan 10, 2019
@dwnusbaum dwnusbaum reopened this Jan 10, 2019
@dwnusbaum dwnusbaum merged commit 72b54ab into jenkinsci:master Jan 11, 2019
@jglick jglick deleted the terminatedBy branch February 5, 2019 17:20
@jglick (Member, Author) commented Feb 5, 2019

Aha!

win2012-66fd90 was marked offline: Connection was broken: java.util.concurrent.TimeoutException: Ping started at 1549377562006 hasn't completed by 1549377802006
	at hudson.remoting.PingThread.ping(PingThread.java:134)
	at hudson.remoting.PingThread.run(PingThread.java:90)
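The two numbers in that message look like epoch milliseconds. Decoding them (a small sketch; the class and constant names are made up for illustration) shows that the ping was allowed exactly 240 seconds, four minutes, and that it expired on the same day this comment was posted.

```java
import java.time.Duration;
import java.time.Instant;

public class PingTimeoutMath {
    // Values copied from the "Ping started at ... hasn't completed by ..." message,
    // assumed to be milliseconds since the Unix epoch.
    static final long PING_STARTED_MS = 1549377562006L;
    static final long DEADLINE_MS = 1549377802006L;

    // Time between the start of the ping and the point it was declared failed.
    static Duration waited() {
        return Duration.between(Instant.ofEpochMilli(PING_STARTED_MS), Instant.ofEpochMilli(DEADLINE_MS));
    }

    public static void main(String[] args) {
        System.out.println("started  " + Instant.ofEpochMilli(PING_STARTED_MS)); // 2019-02-05T14:39:22.006Z
        System.out.println("deadline " + Instant.ofEpochMilli(DEADLINE_MS));     // 2019-02-05T14:43:22.006Z
        System.out.println("waited   " + waited().getSeconds() + " s");          // 240 s
    }
}
```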

CC @olblak

@jglick (Member, Author) commented Apr 8, 2019

INFRA-2075 FTR
