Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Since v20.13.0 fetch() ing small files without reading their body delays process termination for 8 seconds #52909

Open
unilynx opened this issue May 8, 2024 · 8 comments
Labels
fetch Issues and PRs related to the Fetch API v20.x v20.x Issues that can be reproduced on v20.x or PRs targeting the v20.x-staging branch.

Comments

@unilynx
Copy link

unilynx commented May 8, 2024

Version

v20.13.0

Platform

Linux 33f5daf61896 6.4.16-linuxkit #1 SMP PREEMPT Thu Nov 16 10:49:20 UTC 2023 aarch64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

docker run --rm -ti node:20.13.0 /bin/bash

 cat > test.js
async function main() {
  const x = await fetch ('http://cms6-sites.webhare.com/test-arnold/empty.docx');
  console.log(x);
}
main()

^D

node test.js

The node process will not terminate after downloading the document

How often does it reproduce? Is there a required condition?

100%

What is the expected behavior? Why is that the expected behavior?

Process should terminate

What do you see instead?

Process does not terminate

Additional information

  • downgrading to 20.12.2 fixes the issue
  • the reported uname above is on a Mac M3 with Linux inside docker, but I can reproduce it on native x86_64 linux hosts too
  • it does not fail when the document is served under a different mimetype (eg try with http://cms6-sites.webhare.com/test-arnold/empty.docx)
  • it also hangs when the URL is changed to https
  • I can also reproduce it with other docx files

Edit: fixed version number where it still works, there is no 20.12.3, oops.

@unilynx unilynx changed the title with v20.13.0, fetching a docx hangs the process. wrled with v20.13.0, fetching a docx prevents process ternination. worked in 20.12.3 May 8, 2024
@unilynx unilynx changed the title with v20.13.0, fetching a docx prevents process ternination. worked in 20.12.3 with v20.13.0, fetching a docx prevents process ternination. worked in 20.12.2 May 8, 2024
@RedYetiDev RedYetiDev added fetch Issues and PRs related to the Fetch API v20.x v20.x Issues that can be reproduced on v20.x or PRs targeting the v20.x-staging branch. labels May 8, 2024
@climba03003
Copy link
Contributor

climba03003 commented May 9, 2024

I see connection timeout instead of hanging with 20.13.0 and 22.1.0.
The problem is not related to fetch but more fundamental network connection.

You can see the same happen on http.request.

import { request } from 'http'

request('http://cms6-sites.webhare.com/test-arnold/empty.docx', (response) => {
  console.log(response)
  response.resume()
}).end()

@unilynx
Copy link
Author

unilynx commented May 9, 2024

more fundamental network connection

Possibly. To exclude the issue with any servers I control (although I originally found the issue on localhost with a different server setup) I've also tested against external servers.

It seems file size related. now with this code

async function main() {
  const x = await fetch (process.argv[2]);
  console.log(x);
}
main()

These are the results I see now, where Fails means 'it downloads but the process never terminates'

To confirm it's not my server:

# FAILS: 20034 bytes, completely different server I know nothing about
node test.js https://www.mhlw.go.jp/content/11120000/000908344.docx
# FAILS: 20306 bytes
node test.js https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
# Works: 92682 bytes
node test.js https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx

To demonstrate it appears to be size related (all these docs are random data)

# Works: 164864 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
# Fails: 81920 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
# Fails: 93184 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx
# Works: 94208 bytes
node test.js http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx

And it's not the file size per se but seems to have to do with the request decoding

# Works: 20608 but this file got gzipped by nginx as it didn't recgnoze the extension, and transferred as chunked
node test.js http://cms6-sites.webhare.com/test-arnold/empty.stuff

As far as I can tell, docs < 80KB fail, > 160 KB work, and I can confirm this with a few random servers found in google. For me the exact cutoff point is somewhere between 91KB and 92KB but if it's network related, the exact cutoff might differ for others

EDIT: now also tested from a Rocky 9 host at Linode, ie in a DC, which seems to cut out somewhere between 120KB and 160KB

@climba03003
Copy link
Contributor

climba03003 commented May 9, 2024

It is a bit different from my environment.
All of the link from http://cms6-sites.webhare.com result in network timeout.
But all the others works fine.

The address I connected to for cms6-sites.webhare.com are
104.248.92.95 and 2a03:b0c0:2:f0::a4:5001

import { request as http } from 'http';
import { request as https } from 'https';
import { fetch } from 'undici';

const sites = [
  'http://cms6-sites.webhare.com/test-arnold/empty.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx',
  'http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx',
  'https://www.mhlw.go.jp/content/11120000/000908344.docx',
  'https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx',
  'https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx'
]

for(const site of sites) {
  _fetch(site)
    .then(() => { console.log('fetch done: %s', site) })
    .catch((err) => { console.log('fetch fail: %s', site) })
  _request(site)
    .then(() => { console.log('request done: %s', site) })
    .catch((err) => { console.log('request fail: %s', site) })
}

function _fetch(href) {
  return fetch (href)
}

function _request(href) {
  return new Promise((resolve, reject) => {
    const request = href.startsWith('https') ? https : http
    request(href, (response) => {
      resolve(response)
      response.resume()
    }).on('error', (err) => {
      reject(err)
    }).end()
  })
}
request done: https://www.mhlw.go.jp/content/11120000/000908344.docx
fetch done: https://www.mhlw.go.jp/content/11120000/000908344.docx
request done: https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx
fetch done: https://www.euro-divers.com/wp-content/uploads/2020/05/scuba-diving-instructor-resume.docx
request done: https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
fetch done: https://www.careers.govt.nz/assets/pages/cv-and-cover-letter-templates/Cover-letter-template.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx
fetch fail: http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-80kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-92kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx
request fail: http://cms6-sites.webhare.com/test-arnold/empty-91kb.docx

@climba03003
Copy link
Contributor

climba03003 commented May 9, 2024

The problem related to the auto select family attempt timeout.
You can increase the timeout to a more reasonable value if your network is slow.

import { setDefaultAutoSelectFamilyAttemptTimeout } from 'net';
setDefaultAutoSelectFamilyAttemptTimeout(1000)

cc @ShogunPanda
I see you added the command-line option --network-family-autoselection-attempt-timeout but it is actually not taking effect while I am testing.
Only the setDefaultAutoSelectFamilyAttemptTimeout does affect the timeout.

@unilynx
Copy link
Author

unilynx commented May 9, 2024

The problem related to the auto select family attempt timeout. You can increase the timeout to a more reasonable value if your network is slow.

I'm not seeing network timeouts. I always see the requests complete immediately (well at least < 300ms, I haven't properly measured it).

My problem is the process is not terminating (apprently because of a lingering/leaking TCPSocketWrap object because that's what my CI reported, but I haven't yet built a testcase that would also pinpoint it)

If you're seeing network timeouts that's a pity, because then you're not reproducing the issue I'm seeing. I see you're in Hong Kong and all the testserves I have easy acces to are in Western Europe, so latency may be a factor.

If noone else chimes in to confirm or deny being able to reproduce the issue, I'll try to set up the files elsewhere

@climba03003
Copy link
Contributor

climba03003 commented May 9, 2024

My problem is the process is not terminating (apprently because of a lingering/leaking TCPSocketWrap object because that's what my CI reported, but I haven't yet built a testcase that would also pinpoint it)

Yes, it is not terminating with the built-in one.
I am checking along with the timeout problem and using the undici one.
If you are using the undici one, it is actually exit properly with keep-alive timeout is over.
I assume the problem is fixed somehow in the upstream and will eventually be landed in v20 release line.

Said too early, it will close in both case but need to wait for a bit long if using the built-in.
Built-in wait until the keep-alive timeout 60 seconds, undici exit early when no more connection after the keep-alive checking threshold.
I assume the problem is fixed somehow in the upstream and will eventually be landed in v20 release line

Built-in ends in 65839.6259ms
Undici ends in 8334.4074ms

@unilynx
Copy link
Author

unilynx commented May 9, 2024

keep-alive might indeed be part of the issue, but that doesn't directly explain why the downloaded file size (or more likely: timing differences caused by file size) changes the behavior. The server doesn't change its keep-alive behavior on the size so I'd expect the process to either always terminate or always hang, but not depend on the downloaded size.

Thanks for the pointer. If I go straight for undici: { const fetch = require("undici") }

npm i undici@6.6.0
node test.js http://cms6-sites.webhare.com/test-arnold/empty-40kb.docx  # works
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx. # works

npm i undici@6.6.1
node test.js http://cms6-sites.webhare.com/test-arnold/empty-40kb.docx  # FAILS: process hangs
node test.js http://cms6-sites.webhare.com/test-arnold/empty-160kb.docx. # works

the change: GHSA-9f24-jqhm-jfcw

aha. backpressure is probably the keyword here, as that would be related to file sizes. The workaround is to consume the incoming body, which I indeed don't, but that being the workaround would suggest that I don't need to do that in 6.6.1.

if I consume the body I don't see the timeout anymore, but I didn't care about the body. As undici 6.16.0 still show the same behavior, I'd probably better file a bug there first if this behavior is indeed intentional.

thanks for your help and pointing me in the right direction

PS: the process does terminate after 8 seconds, but for my use case that's an eternity :-)

@unilynx unilynx changed the title with v20.13.0, fetching a docx prevents process ternination. worked in 20.12.2 Since v20.13.0 fetch() ing small files without reading their body delays process termination for 8 seconds May 9, 2024
@unilynx
Copy link
Author

unilynx commented Aug 20, 2024

I posted the issue to undici here - nodejs/undici#3230 - but without them changing anything, it's now fixed in node 22 (And still broken in 20 and 21)

I've run a bisect and as far as I can tell this commit to node 22 fixed the issue: 1abff07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetch Issues and PRs related to the Fetch API v20.x v20.x Issues that can be reproduced on v20.x or PRs targeting the v20.x-staging branch.
Projects
None yet
Development

No branches or pull requests

3 participants