Skip to content

Conversation

@maskit
Copy link
Member

@maskit maskit commented Jul 11, 2023

ATS keeps reading H2 frames until either it reads all data in the receive buffer or it processes 128 H2 frames. This is not great if a client sends many requests at once. The first request will not be responded until ATS finishes/stops reading incoming frames, and that leads to a long TTFB. This also means ATS does not fully utilize a connection. ATS should be able to read following frames while response data is being transferred by Kernel/NIC.

This PR extends Http2CommonSession::_should_do_something_else, so ATS can stop reading frames regardless of the number of frames it read. I added Http2CommonSession::interrupt_reading_frames(), and by calling it after Http2Stream::send_request, ATS can get a chance to send response data before reading following H2 frames.

This benchmark was done on my laptop so rps and tps fluctuate and unreliable, but I can consistently see lower TTFB.

h2load -n 3000 -m 100 -c 10 "https://localhost:8443/256k"

Before

finished in 739.71ms, 4055.66 req/s, 1014.56MB/s
requests: 3000 total, 3000 started, 3000 done, 3000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 3000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 750.47MB (786929450) total, 37.20KB (38090) headers (space savings 96.20%), 750.00MB (786432000) data
                     min         max         mean         sd        +/- sd
time for request:    34.20ms    364.11ms    192.23ms     75.67ms    72.47%
time for connect:    11.43ms     67.43ms     40.86ms     18.82ms    60.00%
time to 1st byte:   103.09ms    284.44ms    188.50ms     58.36ms    60.00%
req/s           :     405.98      458.36      432.78       16.96    60.00%

After

finished in 736.39ms, 4073.92 req/s, 1019.12MB/s
requests: 3000 total, 3000 started, 3000 done, 3000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 3000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 750.47MB (786929450) total, 37.20KB (38090) headers (space savings 96.20%), 750.00MB (786432000) data
                     min         max         mean         sd        +/- sd
time for request:    14.43ms    371.40ms    200.59ms     66.48ms    75.73%
time for connect:     9.19ms     66.73ms     39.34ms     19.22ms    60.00%
time to 1st byte:    67.39ms    103.57ms     75.14ms     14.92ms    80.00%
req/s           :     407.46      517.14      458.63       36.60    60.00%

And this is from production. A service that makes a lot of concurrent requests. Download time per host within a same pop, and the first row is the host with this change.
image

@maskit maskit added this to the 10.0.0 milestone Jul 11, 2023
@maskit maskit self-assigned this Jul 11, 2023
@maskit
Copy link
Member Author

maskit commented Jul 17, 2023

[approve ci autest]

@bryancall bryancall self-requested a review July 17, 2023 22:39
@maskit maskit force-pushed the h2_respond_first branch from b4c7a7c to dd5336f Compare July 28, 2023 21:00
@maskit
Copy link
Member Author

maskit commented Aug 22, 2023

image

@duke8253 duke8253 self-requested a review August 31, 2023 20:53
verify: {as: equal}

# Unrelated request just to keep the connection open
- client-request:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Going to wait for the tests to be changed to utilize the new PV feature that keeps connection open.

@maskit
Copy link
Member Author

maskit commented Sep 6, 2023

Updated the tests.

@maskit maskit merged commit 62abe06 into apache:master Sep 7, 2023
cmcfarlen pushed a commit to cmcfarlen/trafficserver that referenced this pull request Jun 3, 2024
* asf/master: (22 commits)
  fix: check whether a protocol is enabled during the length calculation in create_npn_advertisement (apache#10381)
  Coverity 1518612: Remove dead code (apache#10384)
  prefetch_cmcd: make autests more robust by removing need for gold file wildcard (apache#10382)
  Give a chance to send a response before receiving next request on H2 (apache#9997)
  CID 1516688: Fix uninitialized member of AcceptOptions (apache#10152)
  Fix slice head request memory issue (apache#10285)
  Fixes the TSMgmt metrics APIs for new API metrics (apache#10379)
  Minor parent.config a/an change (apache#10372)
  Allow DbgCtl tag to be set after instance construction. (apache#10375)
  Fix more build dep issues, for later PRs to work (apache#10376)
  money_trace cid 1518569: string not null terminated (apache#10373)
  Fix a couple of Coverity issues in health check plugin, around filenames (apache#10371)
  Fixes some build issues that happens with  other changes (apache#10374)
  Eliminate unreachable code covered by switch default (apache#10370)
  Add tests for disk failure (apache#10192)
  Disable copying/moving for DbgCtl. (apache#10321)
  Cmake autest (apache#10327)
  cmake: add unit tests from mgmt/rpc (apache#10366)
  Adjust CMakeLists with git worktree (apache#10298)
  Fix example plugins build (apache#10326)
  ...
bneradt added a commit to bneradt/trafficserver that referenced this pull request Jul 12, 2024
Revert "Give a chance to send a response before receiving next request on H2 (apache#9997)"

This reverts commit 62abe06.

I'm reverting this commit because it seems to introduce the crash seen in YTSATS-4273.
bneradt pushed a commit to bneradt/trafficserver that referenced this pull request Jul 12, 2024
…pache#9997)

* Give a chance to send a response before receiving next request on H2

* Keep client connections open to let ATS receive RST_STREAM

* Use new ProxyVerifier that supports keep-connection-open
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants