User Count Drops when Worker Abruptly Leaves The Test In Distributed Mode #1766
Comments
I think this is expected at the moment (maybe that open PR would change it). IMHO, autoscaling workers while the test is running (especially scaling down) is not really useful because it introduces too much noise and risk into the test.
I do not think PR #1621 will address this issue. The code at https://github.com/mboutet/locust/blob/f9d0f96a3303f8dd9c202c17755f38fab93f1eb9/locust/runners.py#L771-L773 addresses the re-distribution of users when a new worker joins and maintains the total desired user count, but there is nothing to address the case where a worker node simply leaves abruptly (auto-scaled) and sends no notification to the master node. That client simply "goes away" and takes its allocated users with it, therefore reducing the number of total users by that number. It seems like there should be a
OK, you are probably right. But either way I don't really see this as a major issue, and in the more common use cases (not using nodes that may be scaled down) a disappearing worker means the test is no longer valid. If anything, redistributing could make things extremely confusing: if you lose connection to one or more workers during the test (perhaps because they became overloaded), the remaining workers would start to ramp up, which could overload them in turn.
To add to the discussion, I also think that there should be a task running in the master node to ensure that the total user count (and distribution) is always kept at the desired state. Synchronization primitives should be used so that this task does nothing if spawning is in progress and only acts when the test is in a steady state. I don't think this would be difficult to add. It's almost the same thing as the shape worker.
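As a rough illustration of the reconciliation task proposed above, here is a minimal sketch. It is not Locust code: `UserCountReconciler`, `spawn_lock`, `users_per_worker`, and `reconcile_once` are hypothetical names, and the standard-library `threading.Lock` is used only to keep the example self-contained (Locust itself is built on gevent, so a real version would presumably use gevent's equivalents). The spawning code is assumed to hold the lock while it runs, so the reconciler skips its turn instead of interfering.

```python
import threading


class UserCountReconciler:
    """Hypothetical master-side helper; illustrative only, not part of Locust."""

    def __init__(self, target_user_count):
        self.target_user_count = target_user_count
        # Assumed bookkeeping: users currently assigned to each connected worker.
        self.users_per_worker = {}
        # Assumed to be held by the spawning code for the duration of a spawn.
        self.spawn_lock = threading.Lock()

    def desired_distribution(self):
        """Split the target as evenly as possible across the known workers."""
        workers = sorted(self.users_per_worker)
        if not workers:
            return {}
        base, extra = divmod(self.target_user_count, len(workers))
        return {w: base + (1 if i < extra else 0) for i, w in enumerate(workers)}

    def reconcile_once(self):
        """Return True if the distribution was corrected, False otherwise."""
        # Non-blocking acquire: do nothing while spawning is in progress.
        if not self.spawn_lock.acquire(blocking=False):
            return False
        try:
            desired = self.desired_distribution()
            if desired == self.users_per_worker:
                return False  # steady state already matches the target
            self.users_per_worker = desired
            return True
        finally:
            self.spawn_lock.release()


reconciler = UserCountReconciler(target_user_count=40)
# A fourth worker disappeared and took its 10 users with it.
reconciler.users_per_worker = {"w1": 10, "w2": 10, "w3": 10}

with reconciler.spawn_lock:
    skipped = reconciler.reconcile_once()  # spawn in progress, so no action

corrected = reconciler.reconcile_once()  # steady state, so the total is restored
```

In a real runner, `reconcile_once` would be called periodically from a background task, much like the shape worker mentioned above, and would send spawn messages to workers rather than just updating a dictionary.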
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Will be fixed in 2.0
When running in distributed mode with workers in Docker containers, if a worker leaves the test, the total target user count should remain the same and the users should be redistributed among the remaining workers.
Expected behavior
When using Docker in an auto-scaled environment, workers can join and leave the test based on factors such as CPU usage. Currently, when workers leave the test because they are no longer needed to maintain the test parameters, they take their assigned users with them. For example, in a test with 4 worker nodes and 40 users, when 1 worker drops (leaving 3), the total number of remaining users is 30 instead of 40.
When a worker leaves, the behavior should be the same as when a worker joins, i.e. the total target users should be distributed among the current workers.
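To make the arithmetic concrete, here is a small sketch of the 4-worker, 40-user example. `distribute_users` is a hypothetical helper written for this illustration, not Locust's actual distribution logic, and the worker names are made up; an even split is assumed.

```python
def distribute_users(total_users, worker_ids):
    """Split total_users as evenly as possible across worker_ids (hypothetical helper)."""
    if not worker_ids:
        return {}
    base, extra = divmod(total_users, len(worker_ids))
    # The first `extra` workers each take one additional user.
    return {
        worker_id: base + (1 if index < extra else 0)
        for index, worker_id in enumerate(worker_ids)
    }


workers = ["worker-1", "worker-2", "worker-3", "worker-4"]
before = distribute_users(40, workers)  # 10 users on each of 4 workers

# Reported behavior: the departing worker's users vanish with it.
reported = {w: n for w, n in before.items() if w != "worker-4"}

# Requested behavior: recompute the split over the remaining workers.
requested = distribute_users(40, workers[:-1])

print(sum(reported.values()))   # 30
print(sum(requested.values()))  # 40
```

With three workers left, the even split of 40 users is 14, 13, and 13, so the target is preserved at the cost of a higher load per remaining worker.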
Environment
Docker image based on locustio/locust:master.