Maven Connection timed out #1499

Closed
BrightRan opened this issue Aug 27, 2020 · 25 comments · Fixed by hashgraph/hedera-mirror-node#2501
Labels
investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: Ubuntu

Comments

@BrightRan

Associated community ticket: https://github.community/t/maven-connection-timed-out/129040

Recently the customer has been receiving many "Connection timed out" errors in CI builds when Maven pulls dependencies from https://repo.maven.apache.org and similar public repositories. He has never experienced this error locally, even after purging the local .m2/repository.
The customer is using GitHub-hosted runners in his CI workflow.

@andy-mishechkin
Contributor

Hello, @BrightRan
Could you please provide a link to the workflow where you got the connection timed out errors? Could you also clarify when these errors occurred?
These details will help us investigate this issue.
Thank you.

@Ginxo

Ginxo commented Aug 28, 2020

Hi, I'm facing the same issue in several projects; here is an example:
https://github.com/kiegroup/appformer/pull/1037/checks?sha=29e3f45162a5e1310d38ec1982adbe0102378662
(I've attached the log and the pull_request.yml file in case the run is removed: appformer_log_and_flow.zip)

We faced the same problem in the past in our internal CI environment (running Jenkins). It seems Maven Central bans and holds requests when there are too many of them. We solved the problem by adding a Nexus proxy mirroring Maven Central. Do you think it's the same here? Do you (GitHub) have any kind of internal proxy to deal with it?

@andy-mishechkin andy-mishechkin added investigate Collect additional information, like space on disk, other tool incompatibilities etc. Area: Image administration OS: Ubuntu and removed needs triage Area: Image administration labels Aug 28, 2020
@Ginxo

Ginxo commented Aug 28, 2020

Just to give you more info: the same job works well (java8 only; the java11 failure is a known issue unrelated to this) on a self-hosted runner (my laptop):
https://github.com/kiegroup/appformer/pull/1037/checks?check_run_id=1040516642
So it seems there's some GitHub stuff in the middle 🤔

@Darleev
Contributor

Darleev commented Aug 31, 2020

Hello @Ginxo
How often do you see this problem? Does it appear only for a particular package?
I believe the issue can be addressed directly to Maven Central Community Support, but I need to clarify the details above first.

We are looking forward to your reply.

@Ginxo

Ginxo commented Sep 1, 2020

Hi @Darleev,
we have several cases here:

  • appformer case: it is very consistent and happens at the same point, on the same artifact, every time the job is run. It can't get the org.codehaus.plexus:plexus-interpolation:jar:1.25 artifact while running org.apache.maven.plugins:maven-war-plugin:3.2.2:war
  • kie-wb-common case: the failing artifact is not consistent, but the job fails every time it is run
    • Failed to collect dependencies at org.seleniumhq.selenium:selenium-java:jar:3.13.0: Failed to read artifact descriptor for org.seleniumhq.selenium:selenium-java:jar:3.13.0: Could not transfer artifact org.seleniumhq.selenium:selenium-java:pom:3.13.0 from/to central
    • Failed to collect dependencies at commons-jxpath:commons-jxpath:jar:1.3: Failed to read artifact descriptor for commons-jxpath:commons-jxpath:jar:1.3: Could not transfer artifact commons-jxpath:commons-jxpath:pom:1.3 from/to central
    • you can take any Maven Build (8) from here https://github.com/kiegroup/kie-wb-common/pull/3401/checks?check_run_id=1050198863
  • drools case: this is even less consistent than the kie-wb-common case. It works most of the time but sometimes fails
    • Failed to collect dependencies at org.asciidoctor:asciidoctorj:jar:2.2.0 -> org.jruby:jruby:jar:9.2.9.0 -> org.jruby:jruby-core:jar:9.2.9.0 -> org.jruby.joni:joni:jar:2.1.30: Failed to read artifact descriptor for org.jruby.joni:joni:jar:2.1.30: Could not transfer artifact org.jruby.joni:joni:pom:2.1.30 from/to central
    • Failed to read artifact descriptor for org.antlr:antlr4-maven-plugin:jar:4.8: Could not transfer artifact org.antlr:antlr4-maven-plugin:pom:4.8 from/to central
    • you can take any Maven Build (8) from here https://github.com/kiegroup/drools/pull/3063

@Darleev Except for the appformer case, which is very consistent, it does not happen for the same artifact each time. As I told you, I tried with a self-hosted runner and could not reproduce the error, which makes me wonder whether you have some kind of proxy configuration on your side.
@Darleev thanks for the support; this is blocking our CI.
@mareknovotny ^

@Ginxo

Ginxo commented Sep 1, 2020

Just to provide more information: I ran the build inside a docker container from a GitHub Action using build-chain@openjdk8, and it works: https://github.com/kiegroup/appformer/pull/1037/checks?check_run_id=1055423092
Every step I take suggests there's something "weird" on your (GitHub) side.

@Darleev
Contributor

Darleev commented Sep 1, 2020

@Ginxo thank you for the information provided.
We are in the midst of the investigation. I will keep you informed.

@Darleev
Contributor

Darleev commented Sep 2, 2020

@Ginxo let me clarify some details to speed up the investigation:

  1. Is it possible to implement retry logic in your pipeline for the Maven build operation?
  2. How can we reproduce the issue on our side? Could you provide step-by-step instructions or a repository with a code sample where the issue occurs?

Since everything works fine in docker, I believe a network issue on the agent machines is not the cause here.
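
The retry logic asked about in question 1 could look something like the sketch below. It is not taken from any workflow in this thread: the function name `retry`, the attempt count, the delay, and the Maven goal in the usage comment are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local n=1
  until "$@"; do
    if (( n >= attempts )); then
      echo "command failed after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$(( n + 1 ))
  done
}

# Example usage in a workflow "run:" step (goal chosen only for illustration):
#   retry 3 10 mvn -B dependency:go-offline
```

A retry only hides intermittent failures; as noted later in the thread, it would not help a build that fails on the same artifact every time.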

@Ginxo

Ginxo commented Sep 2, 2020

Hi @Darleev, replying to your questions:

  1. Yes, it is, but I'm afraid it would not make sense, since the appformer build fails consistently
  2. Do this:
name: Build Chain

on: [pull_request]

jobs:
  build-chain:
    strategy:
      matrix:
        java-version: [8]
      fail-fast: false
    runs-on: ubuntu-latest
    name: Maven Build
    steps:
      - name: Set up JDK
        uses: actions/setup-java@v1
        with:
          java-version: ${{ matrix.java-version }}
      - name: Build Chain ${{ matrix.java-version }}
        id: build-chain
        uses: kiegroup/github-action-build-chain@v1.4
        with:
          build-command: 'mvn -e -nsu -Dfull -Pwildfly install -Prun-code-coverage  -Dcontainer.profile=wildfly -Dcontainer=wildfly -Dintegration-tests=true -Dmaven.test.failure.ignore=true'
          workflow-file-name: "pull_request.yml"
        env:
          GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
  • Create a PR over this forked project
  • Wait for job completion/failure

In case you want to try the docker example this would be the action

name: Build Chain

on: [pull_request]

jobs:
  build-chain:
    runs-on: ubuntu-latest
    name: Maven Build (8)
    steps:
      - name: Build Chain
        id: build-chain
        uses: kiegroup/github-action-build-chain@openjdk8
        with:
          build-command: 'mvn -e -nsu -Dfull -Pwildfly install -Prun-code-coverage  -Dcontainer.profile=wildfly -Dcontainer=wildfly -Dintegration-tests=true -Dmaven.test.failure.ignore=true'
          workflow-file-name: "pull_request.yml"
        env:
          GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

let me know if you need anything else. Cheers, Kike.

@LeonidLapshin
Contributor

Hi, @Ginxo!
Thank you for the detailed build instructions. I tried to reproduce the problem (network timeouts) with Appformer (the problem was persistent with this repo), but had no luck: the build stage is successful :(
Link to successful builds (openjdk-8 only) in a forked repo:
https://github.com/LeonidLapshin/appformer/actions
Used steps:

  1. Forked the Appformer repo as well as some others, because they are involved in the build process:
  • droolsjbpm-build-bootstrap
  • kie-soup
  • lienzo-tests
  • lienzo-core
  2. Made a PR with the provided workflow (the one without docker):
name: Build Chain
on: [push]
jobs:
 build-chain:
   strategy:
     matrix:
       java-version: [8]
     fail-fast: false
   runs-on: ubuntu-latest
   name: Maven Build
   steps:
     - name: Set up JDK
       uses: actions/setup-java@v1
       with:
         java-version: ${{ matrix.java-version }}
     - name: Build Chain ${{ matrix.java-version }}
       id: build-chain
       uses: kiegroup/github-action-build-chain@v1.4
       with:
         build-command: 'mvn -e -nsu -Dfull -Pwildfly install -Prun-code-coverage -Dcontainer.profile=wildfly -Dcontainer=wildfly -Dintegration-tests=true -Dmaven.test.failure.ignore=true'
         workflow-file-name: "pull_request.yml"
       env:
         GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
  3. The build stage for openjdk-8 completed successfully 3 times in a row. openjdk-11 failed, but as I understand it that is not relevant here; please correct me if I am wrong.
    I guess it is possible that the successful docker builds happened at a moment when the usual build would have succeeded too (just a hypothesis).
    It seems that the network problem was temporary, or that it happens under special conditions that are unknown.
    Could you please try to build Appformer once again with the GitHub Actions ubuntu-latest image (without docker) to determine whether the problem is still there?
    Thank you!

@Ginxo

Ginxo commented Sep 3, 2020

I have created a new PR on my own, let's see how it goes LeonidLapshin/appformer#4

@Ginxo

Ginxo commented Sep 4, 2020

So that one (LeonidLapshin/appformer#4) works, and suddenly this one (which was failing consistently) also works: https://github.com/kiegroup/appformer/pull/1037/checks?check_run_id=1070610025
Now the question is why it works now when it did not before. Is it random? Can we trust the GitHub runner?
Thanks guys @LeonidLapshin @Darleev

@Ginxo

Ginxo commented Sep 4, 2020

This case (same flow, different project) persists https://github.com/kiegroup/drools/pull/3063/checks?check_run_id=1071097964

@LeonidLapshin
Contributor

Hey, @Ginxo !
I did some research into this problem, and it seems to be more of an Azure-related issue than a GitHub-related one. I guess that the root cause of these read timeouts is the SNAT behavior for network connections on Azure.

Similar problems:
https://developercommunity.visualstudio.com/content/problem/357696/maven-project-build-failing-with-connection-reset.html
https://issues.apache.org/jira/browse/WAGON-486
https://issues.apache.org/jira/browse/WAGON-545

Maven creates long-lived connections, and if they stay idle for more than 4 minutes (while Maven is busy with other work), they are flushed from the Azure VM balancer's SNAT. No RST packet is sent to Maven (on the VM side) or to the remote host (the package source), so the socket stays open but no data is sent over it.

A few assumptions behind that error:

  • Maven keeps connections open for pooling
  • Connections are sometimes idle for more than 4 minutes
  • Maven doesn’t implement application-layer health checks, so no data is sent over the open connection (not sure)
  • The Azure balancer’s SNAT flushes connections that are idle for more than 4 minutes and does not send an RST
  • The socket is open but no data comes from the destination
  • Maven throws an error because no data arrives

You can use a workaround. Please add:
-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false -Dmaven.wagon.httpconnectionManager.ttlSeconds=120
to your build command. It will force Maven to create new TCP connections from scratch every 2 minutes, well under the 4-minute threshold.

For now, I hope you can try these Maven flags. They will slow the build, but the time spent re-creating TCP connections will be tiny (less than 1% of total build time, I guess).
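
As a rough sketch of how the flags above might be assembled in a build script: the three flags are quoted from this comment, while the function name `wagon_flags`, the variable `SNAT_IDLE_SECONDS`, and the guard against too-large TTL values are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Idle timeout reported in this thread for the Azure SNAT: 4 minutes.
SNAT_IDLE_SECONDS=240

# Hypothetical helper: print the workaround flags for a given TTL in seconds
# (default 120). Rejects a TTL that is not below the idle timeout, because a
# pooled connection could then still be dropped silently before it expires.
wagon_flags() {
  local ttl=${1:-120}
  if (( ttl >= SNAT_IDLE_SECONDS )); then
    echo "ttlSeconds=${ttl} is not below the ${SNAT_IDLE_SECONDS}s idle timeout" >&2
    return 1
  fi
  echo "-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false -Dmaven.wagon.httpconnectionManager.ttlSeconds=${ttl}"
}

# Example usage, left unquoted on purpose so the flags split into words:
#   mvn $(wagon_flags) -e -nsu install
```

Keeping the TTL check next to the flags makes the relationship between the 120-second TTL and the 4-minute idle timeout explicit for anyone who later changes either value.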

The SNAT feature (I guess this feature is absent on current VMs):
https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-reset

In future we'll discuss the SNAT properties with the team and will try to implement this feature.
Thank you!

@Ginxo

Ginxo commented Sep 14, 2020

Thanks @LeonidLapshin
https://issues.redhat.com/browse/BXMSPROD-996 created to test it. I will let you know as soon as we test it.

@LeonidLapshin
Contributor

@Ginxo, happy to hear it, please feel free to open a new ticket if the problem persists :)

StevenMassaro added a commit to liquibase/liquibase that referenced this issue May 4, 2022
…0239) (#2821)

Oracle integration tests frequently fail with the following error:

transfer failed for https://repo.maven.apache.org/maven2/org/firebirdsql/jdbc/jaybird/4.0.6.java8/jaybird-4.0.6.java8.pom: Connection timed out (Read failed)

This issue in another repository provided the suggestion on how to fix this: actions/runner-images#1499 (comment)
@jeacott1

@Ginxo this issue is not resolved for me. I have a large, long-running build with hundreds of modules, and the suggested workaround does not work for me. Indeed, it has become noticeably worse in the past few months.

@Ginxo

Ginxo commented May 12, 2022

@jeacott1 could you please share your job URL, or paste your job content? I have had this working for almost two years on really huge Maven builds, so maybe I can help 🤔

@jeacott1

jeacott1 commented May 12, 2022

@Ginxo it's a private repo. Will a job URL help?
Also, I'm running into GHA stalling for hours like this:

2022-05-12T05:04:56.2647854Z Waiting for a runner to pick up this job...
2022-05-12T05:04:56.9134609Z Job is waiting for a hosted runner to come online.
2022-05-12T05:04:58.9053850Z Job is about to start running on the hosted runner: Hosted Agent (hosted)

Checking a failed run from earlier today that I had to cancel; note the 2.5 hours between the last Maven log line and me cancelling it.

2022-05-12T02:38:22.0146228Z Downloading from central: https://repo1.maven.org/maven2/org/eclipse/platform/org.eclipse.core.contenttype/maven-metadata.xml
2022-05-12T02:38:22.0159911Z Downloading from nexus-3rd-party: http://172.22.22.9/nexus/content/repositories/thirdparty/org/eclipse/platform/org.eclipse.core.contenttype/maven-metadata.xml
2022-05-12T02:38:23.5242763Z Progress (1): 818 B
2022-05-12T05:04:50.1727931Z ***[error]The operation was canceled.

@Ginxo

Ginxo commented May 12, 2022

@jeacott1 you can share the GHA workflow YAML content. Anyway, the logs you shared do not seem to be related to the Maven timeout issue but to the job waiting for an available runner. This could be because you have already consumed your GHA quota. In that case you can always increase the quota or use your own runners, see https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners, but this is a different topic.

@jeacott1

@Ginxo check above (sorry, I edited it).

The config file is large, but this is the crux of it at the failure point (always in the mvn deploy ...).
FYI, this config used to work just fine; lately it doesn't work at all.
I used to run it with mvn -T 8, but that seems to make things much worse.
With -T 8 it used to take just 30 minutes to run.

on: 
  pull_request:
      types: [opened, synchronize, reopened]
  push:
    branches:
      - main

env:
    MAVEN_OPTS: >-
        -Dhttp.keepAlive=false
        -Dmaven.wagon.http.pool=false
        -Dmaven.wagon.httpconnectionManager.ttlSeconds=120
jobs:
  build:
  ...
    - name: Set up JDK 8
      uses: actions/setup-java@v2
      with:
        distribution: 'temurin'
        java-version: 8
        # cache: 'maven'

    - name: Get version
      id: get_version
      run: |
          VERSION=$( mvn -pl :generator -P !include-agg help:evaluate -Dexpression=project.version -Dbuild.num=${{steps.deploy_qual.outputs.value}} -q -DforceStdout --file ls-mvn/pom.xml )
          echo "::set-output name=version::$VERSION"

    - name: Get version coords
      id: get-coords
      run: |
          VERSION=$(bash -l -c 's="${{ steps.get_version.outputs.version }}"; echo ${s%-*}')
          echo "::set-output name=version::$VERSION"
          
    - name: Build models
      if:  (steps.deploy_qual.outputs.build == 'true')
      run: mvn -am -P generate-build -pl ":generator" -Dbuild.num=${{ steps.deploy_qual.outputs.value }}  compile  --file ls-mvn/pom.xml

    - name: Build with Maven
      if:  (steps.deploy_qual.outputs.build == 'true')
      run: mvn -T 1 deploy -Ddeploy.skip=false -Dmaven.install.skip=true -DfailIfNoTests=false  -Dbuild.num=${{ steps.deploy_qual.outputs.value }} --file ls-mvn/pom.xml
      

@Ginxo

Ginxo commented May 12, 2022

@jeacott1 thanks for sharing. I would say your error is not related to this topic. I suggest you open a new query/request/issue with GitHub support.

@jeacott1

FWIW, removing the other suggested options and just setting -Dmaven.wagon.httpconnectionManager.ttlSeconds=60 has largely fixed my issue. The other options just break the thing altogether. @lhotari was right here, I think.
-Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false aren't useful, and in my case make things worse.
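
For a build that passes options through MAVEN_OPTS, the TTL-only variant described above might be applied as in this sketch. The helper name `add_wagon_ttl` is hypothetical, and the choice to leave an existing ttlSeconds setting untouched is an assumption made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append only the connection TTL flag to MAVEN_OPTS
# (default 60 seconds). If a ttlSeconds flag is already present, MAVEN_OPTS
# is left unchanged so that an explicit setting is not overridden.
add_wagon_ttl() {
  local ttl=${1:-60}
  local flag="-Dmaven.wagon.httpconnectionManager.ttlSeconds=${ttl}"
  case " ${MAVEN_OPTS:-} " in
    *" -Dmaven.wagon.httpconnectionManager.ttlSeconds="*)
      ;;  # already configured; keep the existing value
    *)
      MAVEN_OPTS="${MAVEN_OPTS:+${MAVEN_OPTS} }${flag}"
      ;;
  esac
  export MAVEN_OPTS
}

# Example usage before invoking Maven:
#   add_wagon_ttl 60
#   mvn -T 1 deploy
```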

dkfellows added a commit to SpiNNakerManchester/JavaSpiNNaker that referenced this issue Aug 10, 2023
Apparently, the root cause of the super-slow download problem is the
Azure SNAT interacting extremely badly with Maven
actions/runner-images#1499