[bug] Yet another slow/broken artifacts upload #429

Closed
ab-spingo opened this issue Aug 21, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@ab-spingo

What happened?

The deployment process hangs at the upload step. We're using v2 in a Windows (Azure App Service) environment. It worked fine three days ago.
Normally the upload takes about 30 seconds; this time we had to cancel the run after 1.5 hours. Nothing has changed in the file count or code structure.

Container for artifact ".net-app" successfully created. Starting upload of file(s)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
Total file count: 568 ---- Processed file #2 (0.3%)
A 503 status code has been received, will attempt to retry the upload
Exponential backoff for retry #1. Waiting for 5467 milliseconds before continuing the upload at offset 0
...
A 503 status code has been received, will attempt to retry the upload
Exponential backoff for retry #1. Waiting for 4549 milliseconds before continuing the upload at offset 0
Finished backoff for retry #1, continuing with upload
Total file count: 568 ---- Processed file #351 (61.7%)
Total file count: 568 ---- Processed file #351 (61.7%)
A 503 status code has been received, will attempt to retry the upload
Exponential backoff for retry #2. Waiting for 9089 milliseconds before continuing the upload at offset 0

We also got a couple of these in the log:

An error has been caught http-client index 1, retrying the upload
Error: connect ECONNREFUSED 20.253.95.3:443
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1278:16) {
  errno: -4078,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '20.253.95.3',
  port: 443
}
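
The waits in the log above (5467 ms and 4549 ms for retry #1, 9089 ms for retry #2) are consistent with a randomized delay window that grows with the retry count. A minimal sketch of such a schedule follows. The base interval of 3000 ms and the multiplier of 1.5 are assumptions picked to fit the logged values, and `retryDelayMs` and `retryWindowMs` are hypothetical names; nothing in this thread confirms the action's actual formula.

```typescript
// Assumed constants, chosen only because they fit the waits seen in the log.
const BASE_INTERVAL_MS = 3000;
const MULTIPLIER = 1.5;

// Returns a jittered wait for the given retry (1-based).
// Lower bound: base * multiplier * retryCount; upper bound: lower * multiplier.
// `random` is injectable so the function can be tested deterministically.
function retryDelayMs(
  retryCount: number,
  random: () => number = Math.random
): number {
  if (!Number.isInteger(retryCount) || retryCount < 1) {
    throw new RangeError("retryCount must be a positive integer");
  }
  const min = BASE_INTERVAL_MS * MULTIPLIER * retryCount;
  const max = min * MULTIPLIER;
  return Math.floor(min + random() * (max - min));
}

// Bounds of the wait for a given retry, as [lower, upper].
function retryWindowMs(retryCount: number): [number, number] {
  return [retryDelayMs(retryCount, () => 0), retryDelayMs(retryCount, () => 1)];
}
```

Under these assumptions, retry #1 waits between 4500 and 6750 ms and retry #2 between 9000 and 13500 ms, which covers all three values in the log. The jitter keeps many runners that failed at the same moment from retrying in lockstep.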

What did you expect to happen?

We should either get a much faster deployment or a clear message that there is a problem with the target server.

Container for artifact ".net-app" successfully created. Starting upload of file(s)
Total file count: 568 ---- Processed file #164 (28.8%)
Total file count: 568 ---- Processed file #390 (68.6%)
Total size of all the files uploaded is 25860935 bytes
File upload process has finished. Finalizing the artifact upload
Artifact has been finalized. All files have been successfully uploaded!
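
One way to get the "clear message" behaviour asked for above is to cap retries with a budget and stop with an explicit error once it is exhausted. The sketch below is purely illustrative: every type and function name (`UploadFailure`, `RetryBudget`, `decide`) and the lists of transient statuses and codes are assumptions, not part of upload-artifact.

```typescript
// Hypothetical shapes; none of these names come from the action itself.
interface UploadFailure {
  status?: number; // HTTP status, e.g. 503
  code?: string; // Node.js error code, e.g. "ECONNREFUSED"
}

interface RetryBudget {
  maxRetries: number;
  maxElapsedMs: number;
}

type Decision = { action: "retry" } | { action: "fail"; message: string };

// Assumed sets of failures worth retrying.
const TRANSIENT_STATUSES = new Set([429, 502, 503, 504]);
const TRANSIENT_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"]);

// Decides whether to retry or to stop with a message that names the cause.
function decide(
  failure: UploadFailure,
  retries: number,
  elapsedMs: number,
  budget: RetryBudget
): Decision {
  const what =
    failure.status !== undefined
      ? `HTTP ${failure.status}`
      : failure.code ?? "unknown error";
  const transient =
    (failure.status !== undefined && TRANSIENT_STATUSES.has(failure.status)) ||
    (failure.code !== undefined && TRANSIENT_CODES.has(failure.code));
  if (!transient) {
    return { action: "fail", message: `Upload failed with non-retryable ${what}.` };
  }
  if (retries >= budget.maxRetries || elapsedMs >= budget.maxElapsedMs) {
    const seconds = Math.round(elapsedMs / 1000);
    return {
      action: "fail",
      message: `Upload server keeps failing (${what}) after ${retries} retries and ${seconds} s; giving up.`,
    };
  }
  return { action: "retry" };
}
```

With a budget such as `{ maxRetries: 5, maxElapsedMs: 120000 }`, a run like the one reported here would stop after about two minutes with a message pointing at the server instead of looping for 1.5 hours.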

How can we reproduce it?

This is the first time we have encountered that issue, not sure how to reproduce it - but from the issue log I see that others have had this problem before.

Anything else we need to know?

No response

What version of the action are you using?

v2

What are your runner environments?

Windows

Are you on GitHub Enterprise Server? If so, what version?

No response

@ab-spingo ab-spingo added the bug Something isn't working label Aug 21, 2023
@Chimpaya

Yep, we are also having this problem

@chernyadev

Also observing massive slowdown today

@chernyadev

Seems to be ok by now: https://www.githubstatus.com/incidents/q8swpy90g6pp

@yauhen-vastraknutau-epam

Same slowdown for us for a couple of days already.

github-merge-queue bot pushed a commit to axonweb3/axon that referenced this issue Aug 23, 2023
<!--  Thanks for sending a pull request! -->

## What this PR does / why we need it?

1. ci: `build.yml` won't need to be triggered by push event
   Because `build.yml` is already triggered by axon-start-with-short-genesis.yml, it won't be triggered by push and pull_request events.

2. ci(clippy): just cache cargo and finish `clippy` quickly
   - clear the complicated ci steps in `clippy.yml` -> let's keep things simple
   - use concurrency to ensure that only one clippy job runs in a group

3. start-test ci: only upload logs to GitHub artifact when the test failed
   Because actions/upload-artifact#429

### What is the impact of this PR?

No Breaking Change

<details><summary>CI Settings</summary><br/>

<!--  Have I run `make ci`? -->
### **CI Usage**

**Tip**: Check the CI you want to run below, and then comment `/run-ci`.

**CI Switch**

- [ ] Cargo Clippy
- [ ] Coverage Test
- [ ] E2E Tests
- [ ] Code Format
- [ ] Unit Tests
- [ ] Web3 Compatible Tests
- [ ] OCT 1-5 And 12-15
- [ ] OCT 6-10
- [ ] OCT 11
- [ ] OCT 16-19
- [ ] v3 Core Tests

### **CI Description**

| CI Name | Description |
| --- | --- |
| *Chaos CI* | Test the liveness and robustness of Axon under terrible network condition |
| *Cargo Clippy* | Run `cargo clippy --all --all-targets --all-features` |
| *Coverage Test* | Get the unit test coverage report |
| *E2E Test* | Run end-to-end test to check interfaces |
| *Code Format* | Run `cargo +nightly fmt --all -- --check` and `cargo sort -gwc` |
| *Web3 Compatible Test* | Test the Web3 compatibility of Axon |
| *v3 Core Test* | Run the compatibility tests provided by Uniswap V3 |
| *OCT 1-5 \| 6-10 \| 11 \| 12-15 \| 16-19* | Run the compatibility tests provided by OpenZeppelin |
<!--
#### Deprecated CIs
- [ ] Chaos CI
-->
</details>
KaoImin pushed a commit to axonweb3/axon that referenced this issue Aug 25, 2023
@Alan-Jowett

+1

https://github.com/microsoft/ebpf-for-windows/actions/runs/6239448484/job/16938150347

Uploading a few GB of profile tracing as an artifact started failing with 503.

@LeviPesin

In my case the problem was with an outdated version of the harden-runner.

@rectified95

@konradpabjan please reach out - MS projects are hitting this.

@konradpabjan
Collaborator

V4 upload-artifact and download-artifact have officially dropped today!
See https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/

They're effectively total rewrites from scratch that improve on every aspect. v4 is a major re-architecture that we've been working on for well over a year. Uploads and downloads are orders of magnitude faster and more reliable (up to 98% faster uploads in our testing). The risk of corrupted artifacts is basically zero now as well. There is a computed SHA-256 output for each upload, and we're even planning on adding it to our APIs in the future (for now you can see it in the logs).

Here is just a random example of a 1GB artifact

https://gh.io/artifact-v3-vs-v4

Because of the major rearchitecture, uploads and downloads should no longer see any ECONNREFUSED or ECONNRESET errors!

Recommend switching to v4 today! 😃
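
The SHA-256 digest mentioned above can be compared against a locally computed one. A minimal sketch, assuming Node.js and assuming the logged digest covers exactly the bytes you hash (the thread does not say what is hashed); `sha256Hex` and `matchesDigest` are hypothetical helper names, and the optional `sha256:` prefix handling is a guess at how the value might be printed.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compares the digest of `data` with an expected value copied from a log.
// Case and an optional "sha256:" prefix are ignored (an assumption about
// how the digest may be formatted).
function matchesDigest(data: Uint8Array, expected: string): boolean {
  const normalized = expected.trim().toLowerCase().replace(/^sha256:/, "");
  return sha256Hex(data) === normalized;
}
```

In practice `data` would be the downloaded artifact read with `fs.readFileSync`; a mismatch would then indicate corruption somewhere between upload and download.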
