
HTTP resources fail at a lower concurrency than expected. #6666

Closed
manuranga opened this issue Jun 27, 2024 · 7 comments
Assignees: TharmiganK
Labels: module/http, Priority/Highest, Team/PCM (Protocol connector packages related issues), Type/Improvement

Comments

@manuranga

Description:
HTTP resources fail at a lower concurrency than expected. The failure starts at around 30 threads on my laptop, while a similar Java application can tolerate more than 200 (possibly even higher).

Steps to reproduce:
Create a simple service:

import ballerina/http;

listener http:Listener apiListener = new(8080);

isolated service / on apiListener {
    isolated resource function get hello() returns error|json {
        return {"hello": "world"};
    }
}

Stress test with JMeter: test-bal-threads.txt (rename .txt to .jmx). A quick sanity check of the service is sketched below.
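
Before stress testing, the service can be verified with a quick sanity check. The following is a minimal sketch (not part of the original report; the class name HelloCheck is arbitrary) using the HttpClient built into Java 11+:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HelloCheck {
    public static void main(String[] args) throws Exception {
        // Requesting HTTP_2 on a plain-text URL makes the JDK client attempt
        // the HTTP/1.1 -> h2c upgrade, falling back to HTTP/1.1 if the server
        // does not accept it.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/hello")).build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Expected output when the listener accepts the upgrade: HTTP_2 200 {"hello":"world"}
        System.out.println(response.version() + " " + response.statusCode() + " " + response.body());
    }
}
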
Affected Versions:
Ballerina 2201.9.1

Related Issues (optional):
https://discord.com/channels/957996897782616114/1017493204326686860/threads/1255528309769506969

@manuranga
Author

manuranga commented Jun 27, 2024

On my machine, the issue doesn't happen in 2201.4.3, but it does exist in 2201.5.0.

@TharmiganK self-assigned this Jun 27, 2024
@TharmiganK added the module/http and Team/PCM (Protocol connector packages related issues) labels Jun 27, 2024
@TharmiganK
Contributor

The JMX script uses a stepping thread group to identify the maximum number of concurrent threads the server can handle without a timeout.

There seem to be some interruption errors on the JMeter client side. Will get a thread dump for further analysis.

@TharmiganK
Contributor

The issue here is the behaviour of the JMeter plugin's client. I saw that, regardless of the HTTP2Sampler.http1_upgrade property, the client sends a mix of plain HTTP/1.1 requests and HTTP/1.1 upgrade requests, with no direct HTTP/2 requests (there is also a known issue in this plugin). This was verified with Wireshark.

[Screenshot: Wireshark capture, 2024-06-28 13:57:32]

In the above screenshot, the very first request comes in as an HTTP/1.1 upgrade, but the next request is a plain HTTP/1.1 request. There is also an unexpected GO_AWAY from the client that closes the established HTTP/2 connection. This is when the HTTP2Sampler.http1_upgrade property is set to true. When it is set to false, the very first request is plain HTTP/1.1 and the next request is an HTTP/1.1 upgrade. (I used a single user to check this.)

Due to these factors, the number of established connections keeps increasing, since the Ballerina server does not limit the number of connections a client can open. As a result, an interruption exception occurs in the client used by the JMeter plugin.

As core JMeter does not have an HTTP/2 client, it is not advisable to trust the results of these plugins without verifying their behaviour. We recommend h2load as a benchmarking tool for HTTP/2. You can do a similar stress test using the following script:

#!/bin/bash
# Step h2load's concurrency up from INITIAL_CONCURRENCY to MAX_CONCURRENCY,
# sending TOTAL_REQUESTS requests at each step.

URL="http://localhost:8080/hello"
TOTAL_REQUESTS=100000
INITIAL_CONCURRENCY=1
MAX_CONCURRENCY=200
STEP=1

for ((c = INITIAL_CONCURRENCY; c <= MAX_CONCURRENCY; c += STEP))
do
  echo "Running with concurrency: $c"
  h2load -n "$TOTAL_REQUESTS" -c "$c" "$URL"
  sleep 2  # Pause between steps; adjust as necessary
done

@TharmiganK
Contributor

TharmiganK commented Jul 3, 2024

Keeping this issue open, to check the server's behaviour when we get an unexpected GO_AWAY from the client. @dilanSachi is checking on this

@manuranga
Author

From my observation, this GO_AWAY comes after 20s. It looks like the client is just holding the connection and closing it after a set timeout.

@dilanSachi
Contributor

Keeping this issue open, to check the server's behaviour when we get an unexpected GO_AWAY from the client. @dilanSachi is checking on this

At the moment, the Ballerina service does not react to the GO_AWAY from the client. The connection is kept open until the client initiates a connection close. This can be seen in the following server trace logs.

[2024-07-03 15:06:02,057] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] ACTIVE  
[2024-07-03 15:06:02,110] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=false settings={INITIAL_WINDOW_SIZE=65535, MAX_HEADER_LIST_SIZE=8192}  
[2024-07-03 15:06:02,113] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] OUTBOUND: << Payload could not be decoded >>  
[2024-07-03 15:06:02,120] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=25700}  
[2024-07-03 15:06:02,122] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=true  
[2024-07-03 15:06:02,928] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=true  
[2024-07-03 15:06:02,934] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:path: /hello, :method: POST, content-type: text/plain, user-agent: wso2] padding=0 endStream=false  
[2024-07-03 15:06:03,929] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND DATA: streamId=3 padding=0 endStream=true length=11 data=11B
hellosworld  
Payload received hellosworld
1
[2024-07-03 15:06:03,974] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:status: 201, content-type: application/json, server: ballerina, date: Wed, 3 Jul 2024 15:06:03 +0530] padding=0 endStream=false  
[2024-07-03 15:06:03,980] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=17 data=17B
{"hello":"world"}  
[2024-07-03 15:06:05,930] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND GO_AWAY: lastStreamId=3 errorCode=11 length=0 bytes=

However, this can cause issues with misbehaving clients. @TharmiganK observed something similar with the above client. Here, the client opens a connection, does some communication, and sends a GO_AWAY, but does not close the connection. Then the client opens another connection, does some communication, and sends a GO_AWAY again. If this happens repeatedly, there will be too many open connections that cannot carry any traffic, which is a waste of resources. So we discussed offline adding a feature to the Ballerina HTTP/2 server that acts upon receiving GO_AWAY frames. The proposed approach is to wait for some time for the client to close the connection (or for the existing streams to complete) and then close the connection from the server side, as sketched below.
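
As a rough illustration of this proposal (a sketch under assumptions, not the actual Ballerina transport code), a Netty-based HTTP/2 server could register an Http2Connection.Listener and evict the connection after a grace period. The class name and the 20-second grace period are hypothetical; the latter mirrors the client timeout observed above.

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http2.Http2Connection;
import io.netty.handler.codec.http2.Http2ConnectionAdapter;
import java.util.concurrent.TimeUnit;

class GoAwayStaleEvictionListener extends Http2ConnectionAdapter {

    private static final long GRACE_PERIOD_SECONDS = 20; // hypothetical timeout

    private final ChannelHandlerContext ctx;
    private final Http2Connection connection;

    GoAwayStaleEvictionListener(ChannelHandlerContext ctx, Http2Connection connection) {
        this.ctx = ctx;
        this.connection = connection;
    }

    @Override
    public void onGoAwayReceived(int lastStreamId, long errorCode, ByteBuf debugData) {
        // The client has promised not to open new streams. Give it a grace
        // period to finish in-flight streams and close the socket itself;
        // if it does not, close from the server side so the connection
        // is not leaked.
        ctx.executor().schedule(() -> {
            if (ctx.channel().isActive() && connection.numActiveStreams() == 0) {
                ctx.close(); // evict the stale connection
            }
        }, GRACE_PERIOD_SECONDS, TimeUnit.SECONDS);
    }
}

The listener would be registered via connection.addListener(...). A real implementation would also need to handle streams that are still active when the timer fires, for example by rescheduling the check until they complete.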

@TharmiganK
Contributor

Improved the behaviour by handling the GO_AWAY on the server side with a stale-eviction implementation.
