build dependency on single machine - node-ci-test-equinix-ubuntu2004_container-armv7l-1 #2835
Comments
Yep 100% we need more of these now that it's proved to have fundamentally worked, so this is a good time to look at scaling it up in a repeatable way. |
CI has been stuck now for a few days since test-equinix-ubuntu2004_container-armv7l-1 has been unavailable. I don't suppose anyone is around who knows how to fix it? I'm guessing not and either I'll figure it out somehow (if I even have the right permissions) or (more likely) I'll mess it up worse and/or we will otherwise have to wait until January to sort this out. |
Container came back properly after managing to get the host started again and is chewing through a backlog of jobs. Interested to know why the container is going down though. While adding extra redundancy into the system is obviously a good thing, I'm nervous about the fact this isn't the first time that container has disconnected itself... |
The host is offline again and blocking CI: https://ci.nodejs.org/job/node-test-commit-arm/nodes=ubuntu2004-armv7l/ |
Logged into the host machine:
root@test-equinix-ubuntu2004-docker-arm64-1:~# docker logs node-ci-test-equinix-ubuntu2004_container-armv7l-1
...
Jan 14, 2022 4:53:27 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Jan 15, 2022 5:47:26 PM org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer onRecv
WARNING: [JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913]
java.lang.NullPointerException
at org.jenkinsci.remoting.util.DirectByteBufferPool.acquire(DirectByteBufferPool.java:78)
at org.jenkinsci.remoting.protocol.IOHub.acquire(IOHub.java:165)
at org.jenkinsci.remoting.protocol.ProtocolStack.acquire(ProtocolStack.java:439)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:331)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:677)
at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:49)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:291)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:122)
at java.lang.Thread.run(Thread.java:748)
Jan 15, 2022 5:47:26 PM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run
SEVERE: [JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913] Reader thread killed by NullPointerException
java.lang.NullPointerException
at org.jenkinsci.remoting.util.DirectByteBufferPool.acquire(DirectByteBufferPool.java:78)
at org.jenkinsci.remoting.protocol.IOHub.acquire(IOHub.java:165)
at org.jenkinsci.remoting.protocol.ProtocolStack.acquire(ProtocolStack.java:439)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:331)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:677)
at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:49)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:291)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:122)
at java.lang.Thread.run(Thread.java:748)
Jan 15, 2022 5:49:29 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913.
java.util.concurrent.TimeoutException: Ping started at 1642268849904 hasn't completed by 1642268969905
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Jan 15, 2022 5:51:29 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913.
java.util.concurrent.TimeoutException: Ping started at 1642268969907 hasn't completed by 1642269089908
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Jan 15, 2022 5:53:28 PM hudson.Launcher$RemoteLaunchCallable$1 join
INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@a60e45:JNLP4-connect connection to ci.nodejs.org/107.170.240.62:41913
hudson.remoting.ChannelClosedException: Channel "unknown": Protocol stack cannot write data anymore. It is not open for write
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.write(ChannelApplicationLayer.java:331)
at hudson.remoting.AbstractByteBufferCommandTransport.write(AbstractByteBufferCommandTransport.java:301)
at hudson.remoting.Channel.send(Channel.java:766)
at hudson.remoting.Request.call(Request.java:167)
at hudson.remoting.Channel.call(Channel.java:1000)
at hudson.remoting.Channel.syncIO(Channel.java:1739)
at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1406)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:929)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:902)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:854)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:376)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:122)
at java.lang.Thread.run(Thread.java:748)
Jan 15, 2022 5:53:28 PM hudson.remoting.Request$2 run
INFO: Failed to send back a reply to the request UserRequest:UserRPCRequest:hudson.Launcher$RemoteProcess.join[](55): hudson.remoting.ChannelClosedException: Channel "unknown": Protocol stack cannot write data anymore. It is not open for write
I've restarted the container ( cc @sxa |
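For anyone checking this container in future, the failure pattern in the log above (reader thread killed, then ping timeouts, then a channel that can no longer be written to) can be picked out mechanically. The following is a minimal sketch, not part of the build tooling: the function names `find_disconnect_lines` and `ping_timeouts` are hypothetical, and the marker strings are simply copied from the log output pasted here, so other remoting versions may word them differently.

```python
import re
from datetime import datetime, timezone

# Message fragments copied from the log above. Treating them as signs of a
# dead agent channel is an assumption based on this one incident.
DISCONNECT_MARKERS = (
    "Reader thread killed by",
    "Ping failed. Terminating the channel",
    "Protocol stack cannot write data anymore",
)

# Matches e.g. "Ping started at 1642268849904 hasn't completed by 1642268969905"
PING_TIMEOUT = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")


def find_disconnect_lines(log_text):
    """Return the log lines that contain any known disconnect marker."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in DISCONNECT_MARKERS)
    ]


def ping_timeouts(log_text):
    """Return (start_utc, waited_seconds) for each ping timeout in the log."""
    results = []
    for match in PING_TIMEOUT.finditer(log_text):
        started_ms, deadline_ms = (int(group) for group in match.groups())
        # The values in the message are epoch milliseconds.
        start = datetime.fromtimestamp(started_ms / 1000, tz=timezone.utc)
        results.append((start, (deadline_ms - started_ms) / 1000))
    return results
```

Run against the text saved from the `docker logs` command shown earlier, the first ping timeout works out to a wait of about 120 seconds starting at 17:47:29 UTC, which lines up with the "5:49:29 PM" timestamp on the `onDead` entry.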
This is offline again 😞. |
@sxa has got the host and its containers back online ❤️. |
I've just run the Ansible playbooks on the second Equinix hosted docker host and we now have |
I had thought that we added containers on our large machine to help with the arm build, but it looks like we have an additional job to test on those containers instead.
Since we only have one of those containers set up, when it lost its Jenkins connection the build backed up (I just figured that out and restarted the container that had lost the Jenkins connection).
We should likely add other instances and consider whether we can use them in the fanned testing as well.
@sxa what do you think?