Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operation timed out (Read failed) during push to us.gcr.io #2014

Closed
praveen4463 opened this issue Sep 22, 2019 · 10 comments
Closed

Operation timed out (Read failed) during push to us.gcr.io #2014

praveen4463 opened this issue Sep 22, 2019 · 10 comments

Comments

@praveen4463
Copy link

praveen4463 commented Sep 22, 2019

Environment:

  • Jib version: 1.6.1
  • Build tool: Maven 3.6.1
  • OS: macOS 10.14.6

Description of the issue:
I tried building an image for the first time for a new project. It failed thrice in a row with Operation timed out (Read failed). I increased the timeout to 60 sec but it failed again.
I then used dockerBuild that built to docker daemon and tried pushing using docker push. It took very long time (~ 13 minutes) and reattempted after a failure in middle but was able to successfully push. The image size showed by docker is 169MB, GCR shows virtual size 86.5MB.

After docker successfully pushed to GCR, I executed jib buildagain with default timeout and surprisingly it succeeded and pushed in no time (32 sec) (as there was no change to project) whereas it was timing out just before docker pushed the image.

Expected behavior:
I believe jib should've pushed it without any exception.

Steps to reproduce:

  1. It's a private project, thus I can't share the project to build the image from, but if there is something I should check configuration wise, please let me know.
  2. I just executed mvn -X -Djib.serialize=true -Djib.console=plain compile jib:build with the given configuration.
  3. I've given the log output post processing of project classes that may be helpful.

jib-maven-plugin Configuration:

    <plugin>
        <groupId>com.google.cloud.tools</groupId>
        <artifactId>jib-maven-plugin</artifactId>
        <version>1.6.1</version>
        <configuration>
          <from>
            <credHelper>gcloud</credHelper>
          </from>
          <to>
            <image>us.gcr.io/${projectId}/zl-wzgp:0.1.0-SNAPSHOT</image>
            <credHelper>gcloud</credHelper>
            <tags>
              <tag>latest</tag>
            </tags>
          </to>
          <container>
            <jvmFlags>
              <jvmFlag>-Xms512m</jvmFlag>
              <jvmFlag>-XX:+UnlockExperimentalVMOptions</jvmFlag>
              <jvmFlag>-XX:+UseCGroupMemoryLimitForHeap</jvmFlag>
              <jvmFlag>-Dspring.profiles.active=production</jvmFlag>
            </jvmFlags>
            <mainClass>com.zl.wzgp.Launcher</mainClass>
            <ports>
              <port>4000</port>
            </ports>
          </container>
        </configuration>
     </plugin>

Log output:

[DEBUG] TIMING Building and pushing image
[INFO] Retrieving registry credentials for us.gcr.io...
[DEBUG] TIMING Retrieving registry credentials for us.gcr.io
[DEBUG] Using credentials from docker-credential-gcloud for us.gcr.io
DEBUG] TIMING Retrieving registry credentials for us.gcr.io
[DEBUG] Using credentials from docker-credential-gcloud for us.gcr.io
[DEBUG] TIMED Retrieving registry credentials for us.gcr.io : 1161.0 ms
[DEBUG] TIMING Authenticating push to us.gcr.io
[DEBUG] TIMED Authenticating push to us.gcr.io : 1923.0 ms
[INFO] Getting base image gcr.io/distroless/java:8...
[DEBUG] TIMING Pulling base image manifest
[DEBUG] TIMED Pulling base image manifest : 6022.0 ms
[DEBUG] TIMING Preparing base image layer pullers
[DEBUG] TIMED Preparing base image layer pullers : 0.0 ms
[DEBUG] TIMING Pulling base image layer sha256:e8d8785a314f385d3675a017f4e2df1707c528c06e7a7989663fdab4900bd8ff
[DEBUG] TIMED Pulling base image layer sha256:e8d8785a314f385d3675a017f4e2df1707c528c06e7a7989663fdab4900bd8ff : 714.0 ms
[DEBUG] TIMING Pulling base image layer sha256:e005d777a298a3529b1c8cf890883359e050cc966089ce84fea4d17b111907db
[DEBUG] TIMED Pulling base image layer sha256:e005d777a298a3529b1c8cf890883359e050cc966089ce84fea4d17b111907db : 709.0 ms
[DEBUG] TIMING Pulling base image layer sha256:3e010093287c245d72a774033b4cddd6451a820bfbb1948c97798e1838858dd2
[DEBUG] TIMED Pulling base image layer sha256:3e010093287c245d72a774033b4cddd6451a820bfbb1948c97798e1838858dd2 : 717.0 ms
[DEBUG] TIMING Pulling base image layer sha256:408ad02cbf46b38f36ac42c09d911cf359a5be4dc3fee6e5956b1e4f96e98bdf
[DEBUG] TIMED Pulling base image layer sha256:408ad02cbf46b38f36ac42c09d911cf359a5be4dc3fee6e5956b1e4f96e98bdf : 964.0 ms
[DEBUG] TIMING Preparing application layer builders
[DEBUG] TIMED Preparing application layer builders : 1.0 ms
[INFO] Building dependencies layer...
[DEBUG] TIMING Building dependencies layer
[DEBUG] TIMED Building dependencies layer : 16.0 ms
[INFO] Building resources layer...
[DEBUG] TIMING Building resources layer
[DEBUG] Building resources layer built sha256:40230baf97c5b3d6f026fc9f734a12b9b485c7337d15c5c35e8a3e3ba7eab64f
[DEBUG] TIMED Building resources layer : 56.0 ms
[INFO] Building classes layer...
[DEBUG] TIMING Building classes layer
[DEBUG] TIMED Building classes layer : 3.0 ms
[DEBUG] TIMING Building container configuration
[INFO]
[INFO] Container entrypoint set to [java, -Xms512m, -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -Dspring.profiles.active=production, -cp, /app/resources:/app/classes:/app/libs/*, com.zl.wzgp.Launcher]
[DEBUG] TIMED Building container configuration : 1.0 ms
[DEBUG] TIMING Preparing layer pushers
[DEBUG] TIMED Preparing layer pushers : 0.0 ms
[DEBUG] TIMING Preparing layer pushers
[DEBUG] TIMED Preparing layer pushers : 0.0 ms
[DEBUG] TIMING Pushing BLOB digest: sha256:8b37076acdbeb0264388f03aefe4a023d47fc59c426df22c6a4268ae07b5d280, size: 39479382
[DEBUG] BLOB : digest: sha256:8b37076acdbeb0264388f03aefe4a023d47fc59c426df22c6a4268ae07b5d280, size: 39479382 already exists on registry
[DEBUG] TIMED Pushing BLOB digest: sha256:8b37076acdbeb0264388f03aefe4a023d47fc59c426df22c6a4268ae07b5d280, size: 39479382 : 1143.0 ms
[DEBUG] TIMING Pushing BLOB digest: sha256:40230baf97c5b3d6f026fc9f734a12b9b485c7337d15c5c35e8a3e3ba7eab22f, size: 4250
[ERROR] I/O error for image [us.gcr.io/zl-ln-wz/zl-wzgp]:
[ERROR] Operation timed out (Read failed)
[DEBUG] TIMED Pushing BLOB digest: sha256:40230baf97c5b3d6f026fc9f734a12b9b485c7337d15c5c35e8a3e3ba7eab22f, size: 4250 : 43745.0 ms
[DEBUG] TIMING Pushing BLOB digest: sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6, size: 45543
[DEBUG] TIMING pushBlob
[DEBUG] TIMING pushBlob POST sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6
[DEBUG] TIMED pushBlob PATCH sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6 : 427.0 ms
[DEBUG] TIMED pushBlob PUT sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6 : 3646.0 ms
[DEBUG] TIMED pushBlob POST sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6 : 8499.0 ms
[DEBUG] TIMED pushBlob : 12572.0 ms
[DEBUG] TIMED Pushing BLOB digest: sha256:c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a99c6, size: 45543 : 54100.0 ms
[DEBUG] TIMING Pushing container configuration
[DEBUG] TIMING Pushing BLOB digest: sha256:dff9b703c0343d28d384747d74159f1a57cd947bd0abebc09d138ab6f448602f, size: 1736
[DEBUG] BLOB : digest: sha256:dff9b703c0343d28d384747d74159f1a57cd947bd0abebc09d138ab6f448602f, size: 1736 already exists on registry
[DEBUG] TIMED Pushing BLOB digest: sha256:dff9b703c0343d28d384747d74159f1a57cd947bd0abebc09d138ab6f448602f, size: 1736 : 678.0 ms
[DEBUG] TIMED Pushing container configuration : 715.0 ms
[DEBUG] TIMED Building and pushing image : 112051.0 ms
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:54 min
[INFO] Finished at: 2019-09-21T22:51:52+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.6.1:build (default-cli) on project us.gcr.io/zl-ln-wz/zl-wzgp: Operation timed out (Read failed) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal com.google.cloud.tools:jib-maven-plugin:1.6.1:build (default-cli) on project us.gcr.io/zl-ln-wz/zl-wzgp: Operation timed out (Read failed)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoExecutionException: Operation timed out (Read failed)
at com.google.cloud.tools.jib.maven.BuildImageMojo.execute (BuildImageMojo.java:158)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: java.net.SocketException: Operation timed out (Read failed)
at java.net.SocketInputStream.socketRead0 (Native Method)
at java.net.SocketInputStream.socketRead (SocketInputStream.java:116)
at java.net.SocketInputStream.read (SocketInputStream.java:171)
at java.net.SocketInputStream.read (SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully (InputRecord.java:465)
at sun.security.ssl.InputRecord.read (InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord (SSLSocketImpl.java:975)
at sun.security.ssl.SSLSocketImpl.readDataRecord (SSLSocketImpl.java:933)
at sun.security.ssl.AppInputStream.read (AppInputStream.java:105)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer (AbstractSessionInputBuffer.java:161)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer (SocketInputBuffer.java:82)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine (AbstractSessionInputBuffer.java:278)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse (AbstractMessageParser.java:259)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader (AbstractHttpClientConnection.java:286)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader (DefaultClientConnection.java:257)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader (AbstractClientConnAdapter.java:230)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse (HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute (HttpRequestExecutor.java:125)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute (DefaultRequestDirector.java:684)
at org.apache.http.impl.client.DefaultRequestDirector.execute (DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute (AbstractHttpClient.java:835)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:56)
at com.google.api.client.http.apache.ApacheHttpRequest.execute (ApacheHttpRequest.java:65)
at com.google.api.client.http.HttpRequest.execute (HttpRequest.java:1011)
at com.google.cloud.tools.jib.http.Connection.send (Connection.java:197)
at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:270)
at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.callWithAllowInsecureRegistryHandling (RegistryEndpointCaller.java:186)
at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:163)
at com.google.cloud.tools.jib.registry.RegistryClient.callRegistryEndpoint (RegistryClient.java:474)
at com.google.cloud.tools.jib.registry.RegistryClient.checkBlob (RegistryClient.java:330)
at com.google.cloud.tools.jib.builder.steps.PushBlobStep.call (PushBlobStep.java:81)
at com.google.cloud.tools.jib.builder.steps.PushLayerStep.call (PushLayerStep.java:95)
at com.google.cloud.tools.jib.builder.steps.PushLayerStep.call (PushLayerStep.java:34)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:78)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute (MoreExecutors.java:322)
at java.util.concurrent.AbstractExecutorService.submit (AbstractExecutorService.java:134)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit (AbstractListeningExecutorService.java:66)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit (AbstractListeningExecutorService.java:36)
at java.util.stream.ReferencePipeline$3$1.accept (ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining (Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto (AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential (ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate (AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect (ReferencePipeline.java:566)
at com.google.cloud.tools.jib.builder.steps.StepsRunner.scheduleCallables (StepsRunner.java:423)
at com.google.cloud.tools.jib.builder.steps.StepsRunner.lambda$pushApplicationLayers$11 (StepsRunner.java:360)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:78)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute (MoreExecutors.java:322)
at java.util.concurrent.AbstractExecutorService.submit (AbstractExecutorService.java:134)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit (AbstractListeningExecutorService.java:66)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit (AbstractListeningExecutorService.java:36)
at com.google.cloud.tools.jib.builder.steps.StepsRunner.pushApplicationLayers (StepsRunner.java:358)
at java.util.ArrayList.forEach (ArrayList.java:1257)
at com.google.cloud.tools.jib.builder.steps.StepsRunner.run (StepsRunner.java:175)
at com.google.cloud.tools.jib.api.Containerizer.run (Containerizer.java:324)
at com.google.cloud.tools.jib.api.JibContainerBuilder.containerize (JibContainerBuilder.java:496)
at com.google.cloud.tools.jib.api.JibContainerBuilder.containerize (JibContainerBuilder.java:477)
at com.google.cloud.tools.jib.plugins.common.JibBuildRunner.runBuild (JibBuildRunner.java:226)
at com.google.cloud.tools.jib.maven.BuildImageMojo.execute (BuildImageMojo.java:108)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

@briandealwis
Copy link
Member

briandealwis commented Sep 23, 2019

Hi @praveen4463. This is a known issue (#691). Your Jib build works after a docker-push as Jib skips layers that already exist in the remote registry.

Duplicate of #691

@praveen4463
Copy link
Author

praveen4463 commented Sep 23, 2019

Hi @briandealwis, do you really think its a duplicate of #691? That issue relates to expired credentials and 'unauthorized' exception while I am getting 'operation time-out'. Also, I am using the default distroless OpenJDK image not any large image like the other issue originally states. Thus there shouldn't be anything taking very long time leading to expired credentials (also I am using 'gcloud' as credential helper).

How come it's a duplicate? I am curious. I am not able to understand the cause of this issue and if this is going to happen how will I create images using jib? Can you please explain what could have made my build fail as everything look normal?

@briandealwis
Copy link
Member

I apologize: I leapt to a conclusion as you posted that the docker push took 13 minutes, and the common failure case there is credential expiry.

We know that Jib needs to be more robust in the face of network failures such as timeouts like you're reporting. We have an issue open to retry pushes in the face of timeouts (#1409). I'll add a note there that we should also investigate using chunked uploads.

@chanseokoh
Copy link
Member

chanseokoh commented Sep 23, 2019

Just wanted to answer your question below, just in case you didn't realize:

I am not able to understand the cause of this issue ... Can you please explain what could have made my build fail as everything look normal?

When a Java application makes network connections and/or read/write data, the application may set some timeout values for such network operations at the JVM level. And if such a network operation does not progress by the set timeout, the JVM network stack cuts off the operation. In your case, a simple HTTP GET connection to GCR (not even uploading anything) timed out at the socket level after Jib was not able to read any data from the GCR server in a "timely" fashion, hence the following failure:

Caused by: java.net.SocketException: Operation timed out (Read failed)
at java.net.SocketInputStream.socketRead0 (Native Method)
...
at com.google.cloud.tools.jib.registry.RegistryClient.checkBlob (RegistryClient.java:330)

From your accounts and logs, looks like it is your network that is universally slow, not that Jib's network operations are unexpectedly slow.

You can increase the Java net timeout values to whatever you want for Jib (for example, -Djib.httpTimeout=7200000 for 2 hours). However, your OS kernel may impose various additional network timeouts at the operating system level (outside a JVM).

BTW, it is a bit surprising that a simple HTTP GET may actually time out in your environment. If I were you, I would avoid using GCR, given the extreme degree of your network unreliability with respect to GCR.

@briandealwis
Copy link
Member

I think @chanseokoh specifically means network unreliability.

@praveen4463
Copy link
Author

praveen4463 commented Sep 24, 2019

@chanseokoh Thanks for looking into it, I agree with some of your observations regarding the read operation failure but its worth noting from the logs that jib was able to push all layers except just one 40230baf97c5b3d6f026fc9f734a12b9b485c7337d15c5c35e8a3e3ba7eab22f where an I/O error occurred and due to that it termed the whole operation as a failure. Let me explain this in detail:

There were 4 layers and jib tried to push them one by one to GCR, it first read the registry to see if a layer exists, if exist and same, it skips the push otherwise it tries to push. Here are those layers as per my log,

8b37076acdbeb0264388f03aefe4a023d47fc59c426df22c6a4268ae07b5d280 - already exists on registry (pushed successfully by last build failure), size 39479382

40230baf97c5b3d6f026fc9f734a12b9b485c7337d15c5c35e8a3e3ba7eab22f - I/O error, Operation timed out, after spending 43745.0 ms (over 43 seconds), size 4250

c6c66cab87735c35fa5845823954ed52b01d1df7c7a064a655ae98eb1f2a11c6 - pushed successfully during the build, size 45543

dff9b703c0343d28d384747d74159f1a57cd947bd0abebc09d138ab6f448602f - already exists on registry (pushed successfully by last build failure), size 1736

so the layer starting with '402' specifically was the one that failed. Let's now work on why that layer failed. Please read the exception from the bottom to see what was on the stack before the I/O exception occurred. I can read that jib invoked GCR to check the layer, it got a response, but probably that response was not what jib expected leading to failure in reading within allotted time.

  • at com.google.cloud.tools.jib.registry.RegistryClient.checkBlob

  • at com.google.cloud.tools.jib.registry.RegistryClient.callRegistryEndpoint

  • at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse

  • at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine

  • at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer

  • at sun.security.ssl.SSLSocketImpl.readRecord (SSLSocketImpl.java:975)

  • at sun.security.ssl.InputRecord.readFully (InputRecord.java:465)

  • at java.net.SocketInputStream.socketRead0 (Native Method)

  • Caused by: java.net.SocketException: Operation timed out (Read failed)

So, it seems the issue is not with the network but with the code that reads the response from registry, probably some encoding issue, wrong calculation of input stream size or something else that's causing the read failure after a response is received. There doesn't really seems anything with the network to me.

Also, looks like there is no retry mechanism in jib after such a failure where jib try to read from the registry again in hope of getting expected response just like docker did. if you read my comments regarding docker push, you will note that I said docker got stuck on a particular layer for several minutes, it failed on that and reattempted as well, before it was able to push it.

So my observation says, it is not my network fault really or something that I and GCR has any connectivity issues (cause I do lot of stuff on GCR only, for example starting tens of VM's together, sql operations, pushing big files to buckets etc. without any issue). It could be something within the 'layer read operation within jib' that might need to be fixed.

Regarding increasing the timeout, I had already made it to 10 minutes that day but it failed around the same time when it was 1 minute, which tells that it's something within the 'response read' operation of jib and not related with network.

please let me know what you think. Can this be reopened and looked into?

@briandealwis
Copy link
Member

@praveen4463 at this point, the evidence points at network issues:

  • Your docker push encountered a failure mid-way. Jib's jib:build does not use Docker, so Docker encountering a failure in push is a red flag.
  • Your push took 13 minutes. I just pushed a test app based on distroless/java to GCR and it took 7s.
  • You are using Jib 1.6.1, which uses versions of Google HTTP Client and Apache HTTP Client that have been well-tested and validated by us.
  • We test Jib against GCR extensively and do not see similar issues.

If you can provide a reproducible test case we can definitely look at it more. It might be better to try using Wireshark or some other networking tool to to investigate underlying network reports.

@praveen4463
Copy link
Author

praveen4463 commented Sep 24, 2019

@briandealwis I heard you there, I'll try do investigate that way and inform if something is worth sharing.
Thanks @briandealwis, @chanseokoh for your time

@chanseokoh
Copy link
Member

chanseokoh commented Sep 24, 2019

I'll try my best to make you convinced that it was the network read timeout while attempting to read an HTTP response from the server, so please bear with me if what I'm going to explain sounds too pedantic.

Please read the exception from the bottom to see what was on the stack before the I/O exception occurred. I can read that jib invoked GCR to check the layer, it got a response,

that's causing the read failure after a response is received. There doesn't really seems anything with the network to me.

It is unclear to me from which part of the log above or by what kind of reasoning you came to the conclusion that Jib got some kind of "response" (be it a complete HTTP response with all valid HTTP headers or just a partial segment of it) from the HTTP GET request initiated by Jib for checking a blob, but rather, what is evident from the log is that the there was a read timeout (socket timeout) while the Apache HTTP client library ( java.net.SocketInputStream.read() in the end) was attempting to read any incoming bytes (HTTP response when completed) through the socket input stream. Perhaps you were confused by the method name doReceiveResponse() and assumed it already got a "response" (whatever it is), but as you can see in this Apache code, the Apache library was in the middle of trying to read and parse a response from the server after sending a request. Jib delegates the low-level data exchange on the HTTP protocol to the Apache HTTP library, and the stacktrace proves that the whole cycle of one HTTP request-response round-trip pair did not complete successfully but rather cut short in the middle. It is not that Jib futilely tried to read more bytes from the inactive (empty) socket input stream only to be stuck after one clean HTTP request-response cycle has completed. The cycle did not complete, and the library code was stuck on attempting to complete the response part at a lower level.

the 'layer read operation within jib' that might need to be fixed.

The blob check HTTP GET is only for checking if a layer exists on the registry. The server does not return any content but rather just returns an HTTP code. There isn't a layer or something else to read. Jib just checks the return code.

But don't get me wrong. I don't rule out something really weird is gong on on your side, if this is more or less consistently reproducible (e.g., you always time out with that particular HTTP GET). For example,

  • The Apache HTTP library is buggy and not handling the HTTP protocol correctly.
  • Someone in the middle (proxy, firewall, etc.) blocks data exchange.
  • GCR responds with a wrong content length or the content is cut short or incorrect for whatever reason.
  • For whatever reason, GCR doesn't like that particular HTTP GET request and doesn't send data.
  • Your resources (including network-related resources) are saturated.
  • Something else.

But given your accounts, I think the chances of these are slimmer than a common read timeout, so I'm just leaning heavily toward the network issue. But as @briandealwis said, if this is consistently reproducible, there are ways we can dig into the low level HTTP protocol stuff to better understand what is really happening.

Also, looks like there is no retry mechanism in jib after such a failure where jib try to read from the registry again in hope of getting expected response just like docker did.

Your observation is right. This is a known issue (#1409, mostly) as @briandealwis mentioned. I can think of a few workarounds for now, just in case you are blocked:

  1. while ! mvn jib:build; do echo "trying again"; done
    • Not that bad, since layers are cached and once layers are uploaded, they will be skipped.
  2. Push a built image once with docker push and run Jib. Subsequent Jib runs will be fast.
  3. mvn jib:dockerBuild && docker push ...
    • You build to your local Docker, and let docker push to do the upload.

Regarding increasing the timeout, I had already made it to 10 minutes that day but it failed around the same time when it was 1 minute, which tells that it's something within the 'response read' operation of jib and not related with network.

As I said, often you need to lift the timeouts imposed by the kernel. On my machine for example, I found the connection timeout is set to 30 seconds, so setting the timeout over 30 seconds at the JVM level is meaningless.

@praveen4463
Copy link
Author

praveen4463 commented Sep 25, 2019

@chanseokoh

It is unclear to me from which part of the log above or by what kind of reasoning you came to the conclusion that Jib got some kind of "response"

Sorry, I think its clear now that it failed before a chunk was filled into buffer, I read readLine/fillBuffer and came to conclusion that a response was received but it seems those several method calls during socket read operation that were on stack before the exception.

The blob check HTTP GET is only for checking if a layer exists on the registry. The server does not return any content but rather just returns an HTTP code. There isn't a layer or something else to read. Jib just checks the return code.

Oh, I got you, didn't know it checks just the return code, but I was also not specifically expecting it to read a returned layer, I thought it matches the hash of it that was returned by the server to see whether its stale or latest.

Thanks for those insights, I'll try some more build and let you guys know if anything is reproducible given I get a similar issue.

Thanks for the workarounds as well, I'd implement the first one to invoke jib in a loop.

As I said, often you need to lift the timeouts imposed by the kernel. On my machine for example, I found the connection timeout is set to 30 seconds, so setting the timeout over 30 seconds at the JVM level is meaningless.

Yeah I remember you said that in last post, that's a good piece of information. I'll bump that too.

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants