
Possible performance regression in the latest versions of locust #2690

Closed
morrisonli76 opened this issue Apr 26, 2024 · 17 comments

Labels
bug, stale (Issue had no activity. Might still be worth fixing, but don't expect someone else to fix it)

Comments

@morrisonli76

Prerequisites

Description

I used to use Amazon Linux 2 as the base OS for my load tests. Because the Python available on that OS is 3.7, the latest Locust I could get was 2.17.0. With 5 c5n.xlarge EC2 instances (each with 4 vCPUs) as workers, I could spawn 1200 users. The wait_time for the test was set to constant_throughput(1) so that a total load of 1200 RPS could be achieved.

Recently, I updated the base OS to Amazon Linux 2023. The Python version became 3.11, so I could use the latest version of Locust, 2.26.0. However, the above setup (5 c5n.xlarge EC2 instances) could not provide the desired load. It could only spawn about 830 users in total, and the total RPS was only around 330 even though the wait_time was still constant_throughput(1). I noticed that the CPU usage of each worker process was already close to 100%.

The server being tested did not change and the same locustfile was used for the tests, yet the performance of the two Locust setups was night-and-day different. This looks like a regression.

Here is the package list of the Python 3.11 environment:
Package Version

blinker 1.7.0
Brotli 1.1.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
ConfigArgParse 1.7
Flask 3.0.3
Flask-Cors 4.0.0
Flask-Login 0.6.3
gevent 24.2.1
geventhttpclient 2.2.1
greenlet 3.0.3
idna 3.7
itsdangerous 2.2.0
Jinja2 3.1.3
locust 2.26.0
MarkupSafe 2.1.5
msgpack 1.0.8
pip 22.3.1
psutil 5.9.8
pyzmq 26.0.2
requests 2.31.0
roundrobin 0.0.4
setuptools 65.5.1
urllib3 2.2.1
Werkzeug 3.0.2
zope.event 5.0
zope.interface 6.3

Command line

master side: locust -f /opt/locustfile.py --master
worker side: locust -f - --worker --master-host <master_ip> --processes -1

Locustfile contents

# Imports assumed from the rest of the original locustfile (only the user class was pasted);
# generate_event_id() and the custom command-line options are defined elsewhere in that file.
import random

from locust import HttpUser, task, constant_throughput


class QuickstartUser(HttpUser):
    wait_time = constant_throughput(2)

    def on_start(self):
        self.pixel_ids = self.environment.parsed_options.pixel_ids.split(",")
        self.client.verify = self.environment.parsed_options.verify_cert.lower() == "true"

    @task
    def cloudbridge(self):
        pixel_id = random.choice(self.pixel_ids)
        event_body = {
            "fb.pixel_id": pixel_id,
            "event_id": generate_event_id(),  # helper defined elsewhere in the original locustfile
            "event_name": self.environment.parsed_options.event_name,
            "conversion_value": {
                "value": "9",
                "currency": "USD",
            },
        }
        self.client.post(self.environment.parsed_options.path, json=event_body, name="event")
        # Closing the session discards the connection, so every task iteration
        # opens a new TCP connection (and a new TLS handshake for https).
        self.client.close()

Python version

3.11

Locust version

2.26.0

Operating system

Amazon Linux 2023

@cyberw
Collaborator

cyberw commented Apr 26, 2024

Hmm... there IS a known performance regression in OpenSSL 3.x (which usually comes with Python 3.12, but maybe your Python build is different somehow?), see #2555

The issue hits tests that close/reopen the connection especially hard (as the slowdown arises during SSL negotiation).

Can you check which SSL version you are running?
python -c "import ssl; print(ssl.OPENSSL_VERSION)"

As a workaround, see if you can run another Python version or keep connections alive (I know, not as realistic, but better than nothing).
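
For illustration, a minimal sketch of the "keep connections alive" workaround, assuming a task similar to the one in the locustfile above (the class name, path, and payload here are placeholders, not from the original test):

from locust import HttpUser, task, constant_throughput

class ReuseConnectionUser(HttpUser):
    wait_time = constant_throughput(1)

    @task
    def post_event(self):
        # No self.client.close() here: requests keeps the underlying connection
        # alive, so later iterations reuse the TCP/TLS session instead of paying
        # for a new SSL negotiation every time.
        self.client.post("/event", json={"value": "9"}, name="event")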

@morrisonli76
Author

Hi, I used Ubuntu 20.04 for the Amazon EC2 instances. I managed to install Python 3.10 and the latest Locust.

The CPU usage became low. However, the throughput did not follow the constant_throughput(1) spec: 1500 users only gave me less than 800 RPS.

Here is my python env:

(locust_env) ubuntu@ip-172-31-10-204:$ locust -V
locust 2.26.0 from /opt/locust_env/lib/python3.10/site-packages/locust (python 3.10.14)
(locust_env) ubuntu@ip-172-31-10-204:$ python3.10 -m pip list
Package Version


blinker 1.8.1
Brotli 1.1.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
ConfigArgParse 1.7
Flask 3.0.3
Flask-Cors 4.0.0
Flask-Login 0.6.3
gevent 24.2.1
geventhttpclient 2.2.1
greenlet 3.0.3
idna 3.7
itsdangerous 2.2.0
Jinja2 3.1.3
locust 2.26.0
MarkupSafe 2.1.5
msgpack 1.0.8
pip 24.0
psutil 5.9.8
pyzmq 26.0.2
requests 2.31.0
roundrobin 0.0.4
setuptools 69.5.1
tomli 2.0.1
urllib3 2.2.1
Werkzeug 3.0.2
wheel 0.43.0
zope.event 5.0
zope.interface 6.3

@cyberw
Collaborator

cyberw commented May 10, 2024

Hi! Did you check your ssl version?

python -c "import ssl; print(ssl.OPENSSL_VERSION)"

@morrisonli76
Author

Yes, I did that. In fact, I used Ubuntu 20.04, which uses OpenSSL 1.1.1f. I also updated Python to 3.10. With this setup the CPU usage was lower; however, I found that even if I set wait_time = constant_throughput(1) for the test user, 1500 users only gave me less than 800 RPS (I already mentioned this in my previous reply). I did not see this issue when I used Locust 2.17.0.

@cyberw
Collaborator

cyberw commented May 11, 2024

What are your response times like? Wait times can only limit throughput, not increase it, so if a task takes more than 1 s to complete you won't get 1 request/user/s.
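
As a rough back-of-the-envelope check (the helper and the numbers below are illustrative, not measurements from this thread), each user is capped both by its wait_time target and by how fast its synchronous requests actually complete:

def max_rps(users: int, target_per_user: float, avg_response_s: float) -> float:
    # Each user can do at most target_per_user requests/s (wait_time cap),
    # but also at most 1/avg_response_s requests/s (it waits for each response).
    return users * min(target_per_user, 1.0 / avg_response_s)

print(max_rps(1200, 1.0, 0.7))  # ~1200 rps while responses stay under 1 s
print(max_rps(1200, 1.0, 1.5))  # ~800 rps once responses take 1.5 s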

@morrisonli76
Author

The average response time is less than 700 ms. Also, when I used an older version of Locust (e.g. 2.17.0), I did not have this issue.

@cyberw
Collaborator

cyberw commented May 13, 2024

Hmm... the only thing I can think of is if Amazon is throttling somehow. What if you skip closing the session/connection? Can you see how many DNS lookups are made (using tcpdump or something else)? If you close the session, then maybe there is a new DNS lookup for each task iteration?
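
If it helps, one rough client-side way to spot-check DNS behaviour (separate from tcpdump; the hostname below is a placeholder) is to time a few lookups from a worker machine:

import socket
import time

host = "example.com"  # replace with the host of the system under test
for _ in range(5):
    start = time.perf_counter()
    socket.getaddrinfo(host, 443)
    print(f"DNS lookup took {(time.perf_counter() - start) * 1000:.1f} ms")

Consistently slow lookups here would point at name resolution rather than at Locust itself.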

@morrisonli76
Author

I can take a look to see if there is a new DNS lookup. However, with the same target server and the same tests, why did Locust 2.17.0 not have the issue? Was there any major change to the connection logic?

@cyberw
Collaborator

cyberw commented May 13, 2024

Not that I can think of :-/ But does 2.17.0 not exhibit this problem on python 3.11/Amazon Linux 2023?

@morrisonli76
Author

Just reporting back. I changed my system combination. Right now, I am using Amazon Linux 2 with Python 3.10. The SSL version is 1.1.1g. I also followed the instructions at https://repost.aws/knowledge-center/dns-resolution-failures-ec2-linux to enable the local DNS cache. With this setup, the latency is much lower and the CPU usage per worker is at a low level as well.

However, even with this setup, the RPS does not hold. I ran a test with 1200 users, each with a constant_throughput(1) request rate, and the RPS was quite far from 1200. It stopped around 800 and then started to drop on its own.

@cyberw
Collaborator

cyberw commented Jun 21, 2024

What are the response times? If a task takes more than the constant_pacing time, you’ll get falling throughput.

@morrisonli76
Author

I tried to run Locust 2.17 on the exact same OS (Amazon Linux 2 with Python 3.10). It also showed the same issue. I think the issue is on the load-test side, because the server being tested is the same. I suspect there could be something in the OS environment that slows down the connections.

However, one thing I don't understand is that when the number of users reaches the desired count, the RPS cannot reach the expected number; it starts to drop and eventually falls to a very low number. It seems Locust loses control of creating new connections.

I have enabled the local DNS cache. Is there anything else you would suggest I try?

Thanks

@cyberw
Collaborator

cyberw commented Jun 24, 2024

The main thing I would like to investigate is on the receiving end. Is there some throttling going on? How many Locust workers are you using? Are they spread out over multiple machines? Are they passing through a NAT?

> However, one thing I don't understand is that when the number of users reaches the desired count, the RPS cannot reach the expected number; it starts to drop and eventually falls to a very low number. It seems Locust loses control of creating new connections.

Again I ask: what are your response times? If response times increase enough, you'll get falling RPS. This is nothing to do with Locust, it is just math: if you have a certain number of concurrent users and response times go up, you'll get falling throughput.


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions bot added the "stale" label Aug 24, 2024
@morrisonli76
Author

I just got the latest Locust, 2.31. Everything else was the same, and the above issue was resolved. Was there any major improvement in 2.31?

@cyberw
Collaborator

cyberw commented Aug 26, 2024

There was a performance fix in requests 2.32.0, but it should really only be needed for OpenSSL 3.x, which you didn't have :) https://github.com/psf/requests/releases/tag/v2.32.0

But it's nice that it works for you now :) OK to close?

@cyberw
Collaborator

cyberw commented Aug 26, 2024

Or maybe what you were experiencing was a version of this: #2812? That was fixed in Locust 2.31.
