
HTTP resources fail at a lower concurrency than expected. #6666

Closed
manuranga opened this issue Jun 27, 2024 · 7 comments
Assignees: TharmiganK
Labels: module/http, Priority/Highest, Team/PCM (Protocol connector packages related issues), Type/Improvement

Comments

@manuranga

Description:
HTTP resources fail at a lower concurrency than expected. The failure starts at around 30 threads on my laptop, while a similar Java application can tolerate more than 200 (possibly even higher).

Steps to reproduce:
Create a simple service:

import ballerina/http;

listener http:Listener apiListener = new(8080);

isolated service / on apiListener {
    isolated resource function get hello() returns error|json {
        return {"hello": "world"};
    }
}

Stress test with JMeter: test-bal-threads.txt (rename .txt to .jmx). A quick sanity check of the service is sketched below.
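
Before stress testing, the service can be verified with a quick sanity check. The following is a minimal sketch (not part of the original report; the class name HelloCheck is arbitrary) using the HttpClient built into Java 11+:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HelloCheck {
    public static void main(String[] args) throws Exception {
        // Requesting HTTP_2 on a plain-text URL makes the JDK client attempt
        // the HTTP/1.1 -> h2c upgrade, falling back to HTTP/1.1 if the server
        // does not accept it.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/hello")).build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Expected output when the listener accepts the upgrade: HTTP_2 200 {"hello":"world"}
        System.out.println(response.version() + " " + response.statusCode() + " " + response.body());
    }
}
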
Affected Versions:
Ballerina 2201.9.1

Related Issues (optional):
https://discord.com/channels/957996897782616114/1017493204326686860/threads/1255528309769506969

@manuranga
Author

manuranga commented Jun 27, 2024

On my machine, the issue doesn't happen in 2201.4.3, but it does exist in 2201.5.0.

@TharmiganK self-assigned this Jun 27, 2024
@TharmiganK added the module/http and Team/PCM (Protocol connector packages related issues) labels Jun 27, 2024
@TharmiganK
Contributor

The JMX script uses a stepping thread group to identify the maximum number of concurrent threads the server can handle without a timeout.

There seem to be some interruption errors on the JMeter client side. Will get a thread dump for further analysis.

@TharmiganK
Contributor

The issue here is the behaviour of the JMeter plugin's client. I saw that, regardless of the HTTP2Sampler.http1_upgrade property, the client sends a mix of plain HTTP/1.1 requests and HTTP/1.1 upgrade requests, with no direct HTTP/2 requests (there is also a known issue in this plugin). This was verified with Wireshark.

[Screenshot: Wireshark capture, 2024-06-28 13:57:32]

In the above screenshot, the very first request comes in as an HTTP/1.1 upgrade, but the next request is a plain HTTP/1.1 request. There is also an unexpected GO_AWAY from the client that closes the established HTTP/2 connection. This is when the HTTP2Sampler.http1_upgrade property is set to true. When it is set to false, the very first request is plain HTTP/1.1 and the next request is an HTTP/1.1 upgrade. (I used a single user to check this.)

Due to these factors, the number of established connections keeps increasing, since the Ballerina server does not limit the number of connections a client can open. As a result, an interruption exception occurs in the client used by the JMeter plugin.

As core JMeter does not have an HTTP/2 client, it is not advisable to trust the results of these plugins without verifying their behaviour. We recommend h2load as a benchmarking tool for HTTP/2. You can do a similar stress test using the following script:

#!/bin/bash
# Step h2load's concurrency up from INITIAL_CONCURRENCY to MAX_CONCURRENCY,
# sending TOTAL_REQUESTS requests at each step.

URL="http://localhost:8080/hello"
TOTAL_REQUESTS=100000
INITIAL_CONCURRENCY=1
MAX_CONCURRENCY=200
STEP=1

for ((c = INITIAL_CONCURRENCY; c <= MAX_CONCURRENCY; c += STEP))
do
  echo "Running with concurrency: $c"
  h2load -n "$TOTAL_REQUESTS" -c "$c" "$URL"
  sleep 2  # Pause between steps; adjust as necessary
done

@TharmiganK
Contributor

TharmiganK commented Jul 3, 2024

Keeping this issue open, to check the server's behaviour when we get an unexpected GO_AWAY from the client. @dilanSachi is checking on this

@manuranga
Author

From my observation, this GO_AWAY comes after 20s. It looks like the client is just holding the connection and closing it after a set timeout.

@dilanSachi
Contributor

Keeping this issue open, to check the server's behaviour when we get an unexpected GO_AWAY from the client. @dilanSachi is checking on this

At the moment, the Ballerina service does not react to the GO_AWAY from the client. The connection is kept open until the client initiates a connection close. This can be seen in the following server trace logs.

[2024-07-03 15:06:02,057] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] ACTIVE  
[2024-07-03 15:06:02,110] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=false settings={INITIAL_WINDOW_SIZE=65535, MAX_HEADER_LIST_SIZE=8192}  
[2024-07-03 15:06:02,113] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, correlatedSource: n/a, host:/127.0.0.1:8080 - remote:/127.0.0.1:37470] OUTBOUND: << Payload could not be decoded >>  
[2024-07-03 15:06:02,120] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=25700}  
[2024-07-03 15:06:02,122] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND SETTINGS: ack=true  
[2024-07-03 15:06:02,928] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND SETTINGS: ack=true  
[2024-07-03 15:06:02,934] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:path: /hello, :method: POST, content-type: text/plain, user-agent: wso2] padding=0 endStream=false  
[2024-07-03 15:06:03,929] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND DATA: streamId=3 padding=0 endStream=true length=11 data=11B
hellosworld  
Payload received hellosworld
1
[2024-07-03 15:06:03,974] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND HEADERS: streamId=3 headers=DefaultHttp2Headers[:status: 201, content-type: application/json, server: ballerina, date: Wed, 3 Jul 2024 15:06:03 +0530] padding=0 endStream=false  
[2024-07-03 15:06:03,980] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=17 data=17B
{"hello":"world"}  
[2024-07-03 15:06:05,930] TRACE {http.tracelog.downstream} - [id: 0x24bd4c93, L:/127.0.0.1:8080 - R:/127.0.0.1:37470] INBOUND GO_AWAY: lastStreamId=3 errorCode=11 length=0 bytes=

However, this can cause issues with misbehaving clients. @TharmiganK observed something similar with the above client. Here, the client opens a connection, does some communication, and sends a GO_AWAY, but does not close the connection. Then the client opens another connection, does some communication, and sends a GO_AWAY again. If this happens repeatedly, there will be too many open connections that cannot carry any traffic, which is a waste of resources. So we discussed offline adding a feature to the Ballerina HTTP/2 server that acts upon receiving GO_AWAY frames. The proposed approach is to wait for some time for the client to close the connection (or for the existing streams to complete) and then close the connection from the server side, as sketched below.
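
As a rough illustration of this proposal (a sketch under assumptions, not the actual Ballerina transport code), a Netty-based HTTP/2 server could register an Http2Connection.Listener and evict the connection after a grace period. The class name and the 20-second grace period are hypothetical; the latter mirrors the client timeout observed above.

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http2.Http2Connection;
import io.netty.handler.codec.http2.Http2ConnectionAdapter;
import java.util.concurrent.TimeUnit;

class GoAwayStaleEvictionListener extends Http2ConnectionAdapter {

    private static final long GRACE_PERIOD_SECONDS = 20; // hypothetical timeout

    private final ChannelHandlerContext ctx;
    private final Http2Connection connection;

    GoAwayStaleEvictionListener(ChannelHandlerContext ctx, Http2Connection connection) {
        this.ctx = ctx;
        this.connection = connection;
    }

    @Override
    public void onGoAwayReceived(int lastStreamId, long errorCode, ByteBuf debugData) {
        // The client has promised not to open new streams. Give it a grace
        // period to finish in-flight streams and close the socket itself;
        // if it does not, close from the server side so the connection
        // is not leaked.
        ctx.executor().schedule(() -> {
            if (ctx.channel().isActive() && connection.numActiveStreams() == 0) {
                ctx.close(); // evict the stale connection
            }
        }, GRACE_PERIOD_SECONDS, TimeUnit.SECONDS);
    }
}

The listener would be registered via connection.addListener(...). A real implementation would also need to handle streams that are still active when the timer fires, for example by rescheduling the check until they complete.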

@TharmiganK
Contributor

Improved the behaviour by handling the GO_AWAY on the server side with a stale-eviction implementation.
