requestTimeout and long running uploads #46574
Comments
Try this.
@theanarkh Why? What do you expect to see with this? From what I understand, there is still an issue in how …
Fixes https://xcp-ng.org/forum/post/58146. Caused by nodejs/node#46574, which made requests time out after 0-30 seconds and broke all uploads.
@ShogunPanda Any opinions on this?
Inside the connection handling algorithm I always use … @nodejs/libuv, do you think this might explain it?
@ShogunPanda explain what exactly?
@bnoordhuis I re-read it and probably got it wrong. I'll try to reproduce and see what happens. I'll get back here shortly.
I have been experiencing the same issue. I set a … I noticed, though, that after 2 hours of uptime on the system it started to work correctly. After this I set the … So I opted to set the …
Unfortunately I cannot reproduce this. I tried both on a Mac and on Linux.
I don't see anything related to the uptime of my machine. @ShogunPanda What behavior do you get with my test script above? On my system (and others like Ubuntu), …
@julien-f In my case the server keeps receiving the request without issues.
So I altered the @julien-f script to work more closely with how we have encountered it.

Node Version: 18.12.1

```js
const http = require('node:http');

const srv = http
  .createServer((req, res) => {
    console.log(req.method, req.url);
    req
      .resume()
      .on('end', () => {
        res.end('Ok');
      })
      .on('error', console.warn);
  })
  .listen(10080);

srv.requestTimeout = 10 * 60 * 1000; // 10 minute timeout
```

I get the following output before 10 minutes uptime:

```console
$ uptime
 07:44:46 up 5 min,  0 users,  load average: 29.40, 19.92, 8.74
$ curl -T /dev/urandom http://localhost:10080 | cat
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  211M    0     0    0  211M      0  20.0M --:--:--  0:00:10 --:--:-- 20.6M
curl: (55) Send failure: Broken pipe
$ curl -T /dev/urandom http://localhost:10080 | cat
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  256M    0     0    0  256M      0  20.8M --:--:--  0:00:12 --:--:-- 21.3M
curl: (55) Send failure: Broken pipe
```

After 10 minutes of uptime it looks like this:

```console
$ uptime
 07:56:26 up 16 min,  0 users,  load average: 2.90, 6.76, 8.16
$ curl -T /dev/urandom http://localhost:10080 | cat
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.3G    0     0    0 13.3G      0  21.9M --:--:--  0:10:25 --:--:-- 22.3M
curl: (55) Send failure: Broken pipe
$ curl -T /dev/urandom http://localhost:10080 | cat
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.8G    0     0    0 12.8G      0  21.9M --:--:--  0:10:01 --:--:-- 21.2M
curl: (55) Send failure: Broken pipe
```

I'm not sure if I'm missing anything or if setting the …
Running into this as well, it seems. It's an ARMv7 architecture. So far I have only manually reproduced it by uploading a 230 MB file, which takes about 10 seconds to complete, handled by express and multer (certainly not a minimal reproduction setup). The timeout can happen anywhere within the transfer. This only seems to happen when the uptime of the system is between 2 and 5 minutes. Running node …
Getting this issue as well; there seems to be some upper limit to requestTimeout beyond which it stops working and any connection gets timed out whenever connectionsCheckingInterval fires. EDIT: Can't say for sure what's causing this, or under what circumstances, but requestTimeout is definitely bugged.
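For context on the ~30-second cutoffs mentioned in this thread: the server scans open connections for expired request/headers timeouts on a fixed interval (connectionsCheckingInterval, 30000 ms by default). A minimal sketch of tuning it, assuming a recent Node.js 18 release where the option is accepted by http.createServer; the values below are illustrative and not taken from this thread:

```js
const http = require('node:http');

// connectionsCheckingInterval controls how often Node.js checks open
// connections for expired request/headers timeouts (default: 30000 ms).
// requestTimeout is the per-request deadline discussed in this issue.
const srv = http.createServer(
  { connectionsCheckingInterval: 5_000, requestTimeout: 60_000 },
  (req, res) => {
    req.resume().on('end', () => res.end('Ok'));
  }
);

srv.listen(10080);
```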
Hello,

We are seeing a lot of unexpected status 408 responses as well, most notably in our CI environment running Node.js 18.17.1 in Google Cloud Build on n1-highcpu-32 VMs (x86_64). We see 408s for pretty much all requests, even very simple GET requests. The test HTTP servers use default options/timeouts. Our busiest test pipeline saw at least one unexpected 408 request timeout in about 50% of the builds.

We traced this down to a bug in the logic behind …

node/deps/uv/docs/src/misc.rst, Lines 563 to 572 in c9c958e
By definition, uv_hrtime() readings may be less than the configured request or header timeouts. In turn, the subtractions may overflow and flag all connections as stale. I'll add a few example readings below.
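The commenter's example readings are not preserved in this copy of the thread, but the arithmetic being described can be sketched as follows. This is only an illustration of an unsigned 64-bit subtraction wrapping around, not the actual Node.js/libuv source, and the numbers are made up:

```js
// All values in nanoseconds, as uv_hrtime() would report them.
const NS_PER_SEC = 1_000_000_000n;

const requestTimeout = 300n * NS_PER_SEC; // a 5-minute timeout, for illustration
const now = 120n * NS_PER_SEC;            // monotonic clock: machine up for only 120 s

// Signed arithmetic gives a negative threshold, i.e. nothing can be expired yet.
console.log(now - requestTimeout);                // -180000000000n

// The same subtraction on unsigned 64-bit integers wraps to a huge value...
const threshold = BigInt.asUintN(64, now - requestTimeout);
console.log(threshold);                           // 18446743893709551616n

// ...so every connection's start timestamp compares as older than the
// threshold, and the request is flagged as stale and answered with a 408.
const connectionStart = 100n * NS_PER_SEC;
console.log(connectionStart < threshold);         // true
```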
The default request timeout is …
The overflow bug has been fixed on the … Also, WDYT about updating the comment in the above patch to explain that this affects ALL environments, not just IoT/embedded devices? Something like this: …
The #48291 fix will set the timeout to infinite if the hrtime is too low? If a shorter timeout is requested, is that not unexpected?
You can see …
Version
v18.14.0
Platform
Linux prometheus 6.1.9-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 2 00:21:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http
What steps will reproduce the bug?
Create an HTTP server with a big requestTimeout (a day in this example): …

Do a long running request to it: …
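The original reproduction snippets are not preserved in this copy of the issue; the following is a hedged reconstruction of what such a setup looks like, based on the description above and the scripts quoted in the comments (the handler body and the port are assumptions):

```js
const http = require('node:http');

// An HTTP server with a very large requestTimeout (one day) that simply
// drains the request body and replies once the upload finishes.
http
  .createServer({ requestTimeout: 24 * 60 * 60 * 1000 }, (req, res) => {
    req.resume().on('end', () => res.end('Ok'));
  })
  .listen(10080);
```

A long running request can then be driven with the same command used in the comments above, e.g. `curl -T /dev/urandom http://localhost:10080`.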
How often does it reproduce? Is there a required condition?
It happens every time, and I haven't been able to pinpoint the exact value at which it becomes problematic.
What is the expected behavior?
It should interrupt the request after 24 hours.
What do you see instead?
It interrupts the request after ~30 seconds.
Additional information
I can work around this by disabling the timeout completely (requestTimeout: 0), but setting a value, even a high one, seemed preferable. Any other suggestions on how to handle long running uploads?
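A minimal sketch of the workaround described above, reusing the server shape from the scripts earlier in the thread. The per-socket inactivity timeout in the comments is only one possible way to keep an upper bound and is not taken from this thread:

```js
const http = require('node:http');

const srv = http.createServer({ requestTimeout: 0 }, (req, res) => {
  // With requestTimeout disabled, a per-socket inactivity timeout can still
  // abort uploads that stall, without capping the total transfer time.
  req.setTimeout(5 * 60 * 1000, () => req.destroy(new Error('upload stalled')));
  req.resume().on('end', () => res.end('Ok')).on('error', console.warn);
});

srv.listen(10080);
```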