Yarn fails with ESOCKETTIMEDOUT while installing a large package on a slow disk #8242

darkk · 2020-07-17T10:21:48Z

Bug description

The Windows build of our Electron app is consistently failing. yarn install --frozen-lockfile failed to download https://registry.yarnpkg.com/date-fns/-/date-fns-2.12.0.tgz and https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz. The failure to download @material-ui/icons was reported as ESOCKETTIMEDOUT. However, I expected the build host to have a well-provisioned network, as it was a GitHub Actions runner. The Linux build was working fine.

I suspected that high-latency disk IO might be the cause and managed to build a test case that reproduces the issue reliably: ESOCKETTIMEDOUT is triggered on Linux when a small, realistic delay (8ms) is injected into disk IO system calls.

Reporting ESOCKETTIMEDOUT because of slow disk IO is very confusing behavior: it sounds like a temporary network error, while the root cause is different. It does not match my understanding of the "Developing Javascript projects shouldn't leave the door open to surprises" motto, so I'm reporting this test case as a separate issue despite possible duplicates in the issue tracker. 🙂

Command

ubuntu $ docker run --privileged --rm -ti --net=host node:12-buster /bin/bash
docker # apt update && apt install -y strace
docker # mkdir ~/prj && cd ~/prj
docker # strace -f -o /dev/null -e trace=stat -e inject=stat:delay_exit=8000 yarn add @material-ui/icons@^4.5.1

What is the current behavior?

yarn add v1.22.4
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
error An unexpected error occurred: "https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz: ESOCKETTIMEDOUT".
info If you think this is a bug, please open a bug report with the information provided in "/root/prj/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/add for documentation about this command.

What is the expected behavior?
If I run exactly the same command with delay_exit=1 (0.001ms) instead of delay_exit=8000 (8ms), I get the expected behavior:

yarn add v1.22.4
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
warning " > @material-ui/icons@4.9.1" has unmet peer dependency "@material-ui/core@^4.0.0".
warning " > @material-ui/icons@4.9.1" has unmet peer dependency "react@^16.8.0".
warning " > @material-ui/icons@4.9.1" has unmet peer dependency "react-dom@^16.8.0".
[4/4] Building fresh packages...
success Saved lockfile.
success Saved 3 new dependencies.
info Direct dependencies
└─ @material-ui/icons@4.9.1
info All dependencies
├─ @babel/runtime@7.10.5
├─ @material-ui/icons@4.9.1
└─ regenerator-runtime@0.13.5
Done in 53.01s.

Steps to Reproduce

See the "Command" section above. The test case may need some small changes depending on the environment.

First, strace adds some overhead of its own, which may affect reproducibility. E.g. yarn add @material-ui/icons@^4.5.1 is "Done in 5.76s." in the very same environment without the strace wrapper. That's why I compare strace-with-delay to strace-without-delay rather than to a "clean" run.

Second, I picked the stat() call as follows:

I launched strace -f -o ~/yarn-trace yarn add @material-ui/icons@^4.5.1
I looked at the output of grep -F AccessAlarmsRounded.d.ts ~/yarn-trace. It had 5 openat() calls, 4 lstat() calls, 1 stat() call and 1 chmod() call. So I took stat(/usr/local/share/.cache/yarn/v6/.../AccessAlarmsRounded.d.ts) as the place to inject the delay.
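
The per-file tally can be reproduced with a short pipeline. This is only a sketch: the helper name count_syscalls is made up for illustration, and it assumes the usual strace -f -o layout in which every line reads "PID syscall(args) = result".

```sh
# Tally the syscalls that mention a given path in an `strace -f -o FILE` trace.
# Assumption: every trace line has the form  PID syscall(args...) = result
count_syscalls() {
  trace_file=$1
  needle=$2
  grep -F -- "$needle" "$trace_file" |
    # Keep only lines whose second field starts a syscall, then strip the
    # argument list so that just the syscall name remains.
    awk '$2 ~ /^[a-z0-9_]+\(/ { sub(/\(.*/, "", $2); print $2 }' |
    sort | uniq -c | sort -rn
}
```

For the trace described here it would be called as count_syscalls ~/yarn-trace AccessAlarmsRounded.d.ts. The output is one "count name" pair per syscall, which makes it easy to spot a call that happens once per file.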

Third, I chose an 8ms delay assuming that there is a single stat() system call per unpacked file and that I was emulating an HDD-based system with 125 IOPS. It's all a ballpark estimate: a 1ms delay works on my system, 2ms fails with ESOCKETTIMEDOUT once but manages to install the package after a retry, and 4ms and 8ms fail reliably.
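
The ballpark can be made concrete with a little arithmetic. This is a rough sketch under loud assumptions: one delayed call per unpacked file, and a file count of 15667, which is the figure reported for @material-ui/icons@4.0.1 in #7455 (the count for 4.9.1 was not measured here and is only assumed to be similar). The helper name added_delay_seconds is made up. The 8ms figure itself is simply 1000ms / 125 IOPS.

```sh
# Rough estimate of the total latency added by one delayed syscall per file.
# Arguments: file count, per-call delay in milliseconds. Prints whole seconds.
added_delay_seconds() {
  echo $(( $1 * $2 / 1000 ))
}

# 15667 is the file count reported for @material-ui/icons@4.0.1 in #7455;
# using it for 4.9.1 is an assumption.
for delay_ms in 1 2 4 8; do
  echo "${delay_ms}ms per file: ~$(added_delay_seconds 15667 "$delay_ms")s added"
done
```

This prints roughly 15s, 31s, 62s and 125s. Against the 30s default network timeout mentioned later in this thread, 1ms stays under it, 2ms sits right at the edge, and 4ms and 8ms are well over, which is at least consistent with the observations reported here. The sum is only the total added latency, not the exact time the socket sits idle, so treat it as an order-of-magnitude check.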

Fourth, since TCP buffering is involved (see the comment on TCP ZeroWindow below), the available network bandwidth and the size of the socket buffer may also play a role in reproducibility. I've reproduced the bug with these exact values on an Ubuntu 16.04 laptop connected by a 100 Mbit/s link in St. Petersburg, Russia, and on a Linode VM in Newark (see below).

Fifth, your node build may interact with the OS kernel a bit differently, e.g. it may use open() instead of openat(). So, if the test case does not reproduce the failure for you, try increasing the injected latency or injecting it into a different disk-related system call. I reproduced the issue on an Ubuntu 18.04 VM in the Linode Newark availability zone, but I had to use openat as the latency-injection point instead of stat. 4 statx() and 3 openat() syscalls were made for the aforementioned filename on that VM.

Comments and assumptions

SQLite's "faster than FS" benchmarks show that Windows performed pretty badly (compared to Linux) when operating on lots of small files. Both date-fns and @material-ui/icons contain thousands of files, as do the packages mentioned in the "Possible duplicates" section. That would explain why Windows users suffer far more from ESOCKETTIMEDOUT when installing packages with thousands of files.

@FredyC came to the same idea in #6221 (comment): a high-latency HDD used instead of a low-latency SSD triggers the ESOCKETTIMEDOUT.

@Hinaser made an excellent comment describing a packet capture in #5259 (comment): yarn probably stops reading from the socket (so the client OS sends TCP ZeroWindow) and eventually closes the socket from the client side.

I assume that node or yarn is busy unpacking a well-compressed tarball full of small files and does not resume reading from the socket for long enough that ESOCKETTIMEDOUT is triggered. I also assume that the code does not disable the socket timeout while the stream is paused.

I assume a possible fix is to download the .tgz to a temporary file with timeouts applied to the network interactions, and then to unpack it without any timeouts, since the disk can't write any faster anyway. Unfortunately, I'm not familiar enough with the yarn codebase to provide a good PR.
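
The two-phase idea can be sketched outside of yarn with curl and tar. This is not yarn's code and not a patch for it; the function name fetch_then_unpack and the 30-second limits are assumptions chosen for illustration.

```sh
# Phase 1 downloads under network timeouts; phase 2 unpacks with no timeout.
# Usage: fetch_then_unpack URL DEST_DIR
fetch_then_unpack() {
  url=$1
  dest=$2
  tmp=$(mktemp) || return 1

  # Network phase: give up if connecting takes more than 30 s, or if the
  # transfer stays below 1 byte/s for 30 s (an idle-style timeout).
  if ! curl --fail --silent --show-error --location \
        --connect-timeout 30 --speed-limit 1 --speed-time 30 \
        --output "$tmp" "$url"; then
    rm -f "$tmp"
    return 1
  fi

  # Disk phase: the download has finished, so slow IO here cannot
  # cause a socket timeout.
  mkdir -p "$dest" && tar -xzf "$tmp" -C "$dest"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The point of the split is that the only clock running during the slow part (creating thousands of small files) is the disk's own.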

Environment

Node Version: 12.18.2
Yarn v1 Version: 1.22.4
OS and version: Docker container node:12-buster running on top of Ubuntu 16.04 or 18.04

yarn-error.log is the following:

Arguments: 
  /usr/local/bin/node /opt/yarn-v1.22.4/bin/yarn.js add @material-ui/icons@^4.5.1

PATH: 
  /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Yarn version: 
  1.22.4

Node version: 
  12.18.2

Platform: 
  linux x64

Trace: 
  Error: https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz: ESOCKETTIMEDOUT
      at ClientRequest.<anonymous> (/opt/yarn-v1.22.4/lib/cli.js:141375:19)
      at Object.onceWrapper (events.js:421:28)
      at ClientRequest.emit (events.js:315:20)
      at TLSSocket.emitRequestTimeout (_http_client.js:709:9)
      at Object.onceWrapper (events.js:421:28)
      at TLSSocket.emit (events.js:327:22)
      at TLSSocket.Socket._onTimeout (net.js:481:8)
      at listOnTimeout (internal/timers.js:549:17)
      at processTimers (internal/timers.js:492:7)

npm manifest: 
  No manifest

yarn manifest: 
  No manifest

Lockfile: 
  No lockfile

Possible duplicates:

There appears to be trouble with your network connection. Retrying... #4890 (comment) — 27 MiB grid-styled@4.1.0 with 29090 files on Windows
There appears to be trouble with your network connection. Retrying... #5259 (comment) — 3 MiB nyc@11.7.3 with 4742 files on Windows 10
ESOCKETTIMEDOUT on 31 MB package material-design-icons #5540 — 32 MiB material-design-icons@3.0.1 on macOS 10.13; 14 MiB rxjs-6.5.3 on Windows
Yarn fails to install material-design-icons package everytime on appveyor #5546 — 31 MiB material-design-icons@3.0.1 on Windows
ESOCKETTIMEDOUT for package material-design-icons #5950 — large material-design-icons on Windows
rxjs-compat-6.2.2.tgz: ESOCKETTIMEDOUT #6115 — 0.2 MiB rxjs-compat@6.2.2 with 3115 files on Windows 10
Failing to install big sized module #6221 — 31 MiB material-design-icons-3.0.1 with 89814 files on Windows 10
There appears to be trouble with your network connection. Retrying... (for create-react-native-app command) #6392 — 15 MiB lottie-react-native-2.3.2 with 6275 files on Windows 10
yarn add expo-cli --network-timeout 100000 gives rxjs-5.5.12.tgz: ESOCKETTIMEDOUT #7171 — 1.5 MiB rxjs-5.5.12 with 3661 files on Windows
Problem updating project packages #7455 — 1.2 MiB @material-ui/icons@4.0.1 with 15667 files on Windows
yarn install on vue project is hitting timeout issue #7581 — 4.6 MiB npm-6.11.3 with 4086 files on Windows
Yarn is unable to fetch icons-react-0.0.1-beta.5.tgz package, possible network connection problem ESOCKETTIMEDOUT #7738 — 0.6 MiB @carbon/icons-react@0.0.1-beta.5 with 6165 files on Windows
Packages not installing #7873 — 0.5 MiB core-js@2.6.11 with 1489 files on Windows

ievgennaida · 2020-07-20T14:00:04Z

Windows 10.
yarn v1.22.4
node v10.19.0
WSL

Executed:
npm init react-app my-ap

Failed:
[2/4] Fetching packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/rxjs/-/rxjs-6.5.4.tgz: ESOCKETTIMEDOUT".

AmirTugi · 2020-07-21T12:51:22Z

I'm experiencing the same exact issue.
It's a massive issue for us, since it's blocking deploys to production.
Does anyone have any idea how to solve this?
What changed since last week?

darkk · 2020-07-21T14:03:56Z

@AmirTugi the workaround that works for me is to run yarn config set network-timeout 300000, which raises the timeout from 30s to 5m.

However, that's just a workaround, not a fix.
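
For reference, network-timeout is given in milliseconds. A tiny sketch of the arithmetic, plus a per-command form of the same workaround; the --network-timeout flag is the one that appears in the title of #7171, and the variable name timeout_ms is made up.

```sh
# 5 minutes expressed in milliseconds, the unit network-timeout expects.
timeout_ms=$(( 5 * 60 * 1000 ))
echo "network-timeout = ${timeout_ms}"

# Per-command alternative to `yarn config set` (left commented out so the
# snippet has no side effects):
# yarn install --frozen-lockfile --network-timeout "$timeout_ms"
```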

AmirTugi · 2020-07-21T15:25:51Z

Right, I tried to raise it to 1000000 and it didn't work.
So that's an arbitrary work-around :)

m0hamm3d-ac · 2020-07-24T15:19:04Z

We are facing this issue on our pipeline servers too. Has anything changed in yarn?

AlanKnightly · 2020-07-27T07:07:06Z

Me too. I have tried more than 30 times over the last few days and always got a timeout, which is so annoying.

ankasani · 2020-09-07T07:29:16Z

I see the same problem in the App Center build services as well. Can anyone please look into this issue?

Is there any yarn status page available?

m0hamm3d-ac · 2020-09-07T07:43:08Z

> I see the same problem in the App Center build services as well. Can anyone please look into this issue?
>
> Is there any yarn status page available?

The previous failures I observed corresponded to npm outages shown on this page - https://status.npmjs.org/

anhvut · 2020-09-30T11:22:12Z

@darkk wrote a wonderful description of the bug. I opened a PR with his proposed fix in mind:

First, download the tgz file into the cache without uncompressing it at the same time. This part is still constrained by the network timeout.
Then read the tgz file from the cache folder to uncompress it and do the other work (compute checksums, build metadata ...).

toby-griffiths · 2020-10-05T18:17:28Z

I'm also seeing this on my builds on a Digital Ocean (SSD) build server the last couple of days (since setting the build server up).

marikaner · 2020-12-15T11:56:43Z

This is happening on GH actions for us as well. Every day a few of our checks fail, because of that. Current solution: rerun...

mikehardy · 2020-12-18T15:04:13Z

For GitHub Actions failures you might like https://github.com/nick-invision/retry/
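
Outside of GitHub Actions, or without a third-party action, the same idea is a few lines of shell. This is a generic sketch, not the API of the action linked above; the function name retry and the RETRY_DELAY variable are made up for illustration. Retrying only papers over the timeout, it does not fix it.

```sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# RETRY_DELAY (seconds between attempts, default 5) is a hypothetical knob.
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

A CI step could then run retry 3 yarn install --frozen-lockfile instead of calling yarn directly.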

merceyz · 2021-01-02T13:29:11Z

Closing as fixed in v2 where the timeout logic is less susceptible to this sort of issue

https://yarnpkg.com/getting-started/migration

jjangga0214 · 2021-02-10T09:11:25Z

@merceyz

I have a few questions.

Is this not supposed to be fixed in v1?
Does this happen only on a specific OS (e.g. macOS)?
Why does this happen and how is this fixed in v2?

I would appreciate it if you could share some information.

Thanks!

darkk · 2021-02-10T09:15:14Z

@jjangga0214 WRT Q#2. It might happen on any OS. It's just more probable to trigger the bug on macOS and Windows due to performance characteristics of the filesystems. HDD (or any other high-latency medium) instead of SSD also increases the probability.

ben1one · 2021-06-19T06:39:57Z

thx so much!!!!

lwhiteley mentioned this issue Jul 31, 2020

Yarn not installing fontawesome FortAwesome/Font-Awesome#16047

Closed

3 tasks

anhvut mentioned this issue Sep 30, 2020

avoid ESOCKETTIMEDOUT while installing a large package on a slow disk #8363

Open

oliversturm mentioned this issue Oct 27, 2020

yarn install - "trouble with your network connection" ish-app/ish#929

Closed

Brooooooklyn mentioned this issue Dec 17, 2020

Hosted macOS workflows will experience longer wait times the week of December 14th. actions/runner-images#2247

Closed

9 tasks

merceyz closed this as completed Jan 2, 2021

merceyz added the fixed-in-modern label ("This issue has been fixed / implemented in Yarn 2+.") Jan 2, 2021

wimax-grapl mentioned this issue Feb 3, 2021

Upgrade engagement_view to Yarn v2, Typescript 3.8.0 grapl-security/grapl#574

Merged

This was referenced Feb 11, 2021

pin node version in github actions cylc/cylc-ui#592

Merged

upgrade to yarn 2 cylc/cylc-ui#593

Closed

kevinold mentioned this issue Feb 26, 2021

Error during yarn install procedure cypress-io/cypress-realworld-app#799

Closed

Tobbe mentioned this issue Mar 15, 2021

The CI improvements we need [tracking issue] redwoodjs/redwood#1886

Open

16 tasks

valentino-amadeus mentioned this issue Apr 20, 2021

Fix / prohibit the use of old ABTester versions vtex/cli-plugin-abtest#12

Merged

6 tasks

yarnpkg locked as resolved and limited conversation to collaborators Jun 19, 2021
