User Count Drops when Worker Abruptly Leaves The Test In Distributed Mode #1766

Closed
SoonerLR opened this issue May 24, 2021 · 6 comments

@SoonerLR

User Count Drops when Worker Abruptly Leaves The Test In Distributed Mode

When running in distributed mode with workers in Docker containers, if a worker leaves the test, the total target user count should remain the same and the users should be redistributed among the remaining workers.

Expected behavior

In an auto-scaled Docker environment, workers can join and leave the test based on factors such as CPU usage. When workers leave because they are no longer needed to maintain the test parameters, they take their assigned users with them. For example, in a test with 4 worker nodes and 40 users, when 1 worker drops (leaving 3), the total user count falls to 30 instead of staying at 40.

A worker leaving should be handled the same way as a worker joining, i.e. the total target users should be redistributed among the current workers.
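To make the arithmetic concrete, here is a minimal sketch of the difference between the reported and the requested behaviour. The `distribute_users` helper and the worker names are hypothetical and only illustrate an even split; this is not how Locust's dispatcher is implemented.

```python
def distribute_users(total_users, worker_ids):
    """Split total_users as evenly as possible across worker_ids."""
    if not worker_ids:
        raise ValueError("at least one worker is required")
    base, extra = divmod(total_users, len(worker_ids))
    # The first `extra` workers (in sorted order) take one additional user each.
    return {
        worker_id: base + (1 if index < extra else 0)
        for index, worker_id in enumerate(sorted(worker_ids))
    }


workers = ["worker-1", "worker-2", "worker-3", "worker-4"]
before = distribute_users(40, workers)

# Reported behaviour: the lost worker's users simply vanish.
reported = {w: n for w, n in before.items() if w != "worker-4"}

# Requested behaviour: recompute the split over the surviving workers.
requested = distribute_users(40, [w for w in workers if w != "worker-4"])

print(sum(reported.values()))  # 30
print(requested)               # {'worker-1': 14, 'worker-2': 13, 'worker-3': 13}
```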

Environment

Using locustio/locust:master as the base for the Docker image.

@SoonerLR SoonerLR added the bug label May 24, 2021
@cyberw
Collaborator

cyberw commented May 24, 2021

I think this is expected at the moment (maybe that open PR would change it). IMHO, autoscaling workers while the test is running (especially scaling down) is not really useful because it introduces too much noise/risk into the test.

@SoonerLR
Author

SoonerLR commented May 24, 2021

I do not think the PR #1621 will address this issue.

The code at https://github.com/mboutet/locust/blob/f9d0f96a3303f8dd9c202c17755f38fab93f1eb9/locust/runners.py#L771-L773 addresses the re-distribution of users when a new worker joins and maintains the total desired user count, but there is nothing to address when a worker node simply leaves abruptly (auto-scaled) and sends no notification to the master node. That client simply "goes away" and takes its allocated users with it, therefore reducing the number of total users by that number.

It seems like there should be a call to self.start(self.target_user_count, self.spawn_rate) in the heartbeat message section, or wherever Locust finally notices that a client is gone.
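A rough sketch of that idea follows, using a toy stand-in for the master rather than Locust's actual runner. Only `start`, `target_user_count` and `spawn_rate` are names taken from the comment above; `ToyMaster`, `on_heartbeat`, `check_heartbeats`, the timeout value and the even-split logic are assumptions made purely for illustration.

```python
import time


class ToyMaster:
    """Toy stand-in for a master runner; not Locust's actual implementation."""

    def __init__(self, heartbeat_timeout=3.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}  # worker id -> time of last heartbeat
        self.assignment = {}      # worker id -> number of users assigned
        self.target_user_count = 0
        self.spawn_rate = 1.0

    def on_heartbeat(self, worker_id, now=None):
        self.last_heartbeat[worker_id] = time.monotonic() if now is None else now

    def start(self, user_count, spawn_rate):
        """Record the target and split it evenly over the known workers."""
        self.target_user_count = user_count
        self.spawn_rate = spawn_rate
        workers = sorted(self.last_heartbeat)
        if not workers:
            self.assignment = {}
            return
        base, extra = divmod(user_count, len(workers))
        self.assignment = {
            w: base + (1 if i < extra else 0) for i, w in enumerate(workers)
        }

    def check_heartbeats(self, now=None):
        """Drop workers that went silent, then restart towards the same target."""
        now = time.monotonic() if now is None else now
        missing = [
            w for w, seen in self.last_heartbeat.items()
            if now - seen > self.heartbeat_timeout
        ]
        for w in missing:
            del self.last_heartbeat[w]
        if missing and self.last_heartbeat:
            # The call suggested above: same target, fewer workers.
            self.start(self.target_user_count, self.spawn_rate)
        return missing


master = ToyMaster(heartbeat_timeout=3.0)
for name in ("w1", "w2", "w3", "w4"):
    master.on_heartbeat(name, now=0.0)
master.start(40, spawn_rate=4)

# w4 stops sending heartbeats; the others report in again at t=5.
for name in ("w1", "w2", "w3"):
    master.on_heartbeat(name, now=5.0)
gone = master.check_heartbeats(now=5.0)

print(gone)                             # ['w4']
print(sum(master.assignment.values()))  # 40
```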

@cyberw
Collaborator

cyberw commented May 24, 2021

I do not think the PR #1621 will address this issue.

Ok, you are probably right. But either way I don't really see this as a major issue, and in the more common use cases (not using nodes that may be scaled down) a disappearing worker means the test is no longer valid. If anything, it could be extremely confusing if you lose the connection to one or more workers during the test (perhaps because they became overloaded) and the remaining workers start to ramp up, causing them to overload as well.

@mboutet
Contributor

mboutet commented May 31, 2021

To add to the discussion, I also think that there should be a task running in the master node to ensure that the total user count (and distribution) is always kept at the desired state. Synchronization primitives should be used so that this task does nothing while a spawn is in progress and only acts when the test is in a steady state.

I don't think this would be a difficult thing to add. It's almost the same thing as the shape worker.
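As a sketch of what such a task could look like: the `Reconciler` class below and all of its names are hypothetical, and it uses a plain `threading.Lock` from the standard library to keep the example self-contained, whereas Locust itself is built on gevent. The reconciler only ever tries the lock without blocking, so it skips its turn whenever a spawn holds it.

```python
import threading


class Reconciler:
    """Hypothetical steady-state reconciler; all names are illustrative."""

    def __init__(self, desired_users, workers):
        self.desired_users = desired_users
        self.workers = set(workers)
        self.assignment = {}
        # Held for the duration of any spawn; the reconciler never waits on it.
        self.spawning_lock = threading.Lock()

    def _rebalance(self):
        workers = sorted(self.workers)
        base, extra = divmod(self.desired_users, len(workers))
        self.assignment = {
            w: base + (1 if i < extra else 0) for i, w in enumerate(workers)
        }

    def spawn(self):
        """Normal spawn path: takes the lock while it changes the assignment."""
        with self.spawning_lock:
            self._rebalance()

    def reconcile_once(self):
        """Return True if a correction was applied, False otherwise."""
        if not self.spawning_lock.acquire(blocking=False):
            return False  # a spawn is in progress, so leave it alone
        try:
            if not self.workers:
                return False
            live = {w: n for w, n in self.assignment.items() if w in self.workers}
            in_sync = (
                set(live) == self.workers
                and sum(live.values()) == self.desired_users
            )
            if in_sync:
                return False
            self._rebalance()
            return True
        finally:
            self.spawning_lock.release()


r = Reconciler(40, ["w1", "w2", "w3", "w4"])
r.spawn()
r.workers.discard("w4")  # worker vanishes without notice

with r.spawning_lock:  # simulate a spawn in progress
    skipped = r.reconcile_once()
corrected = r.reconcile_once()

print(skipped)                     # False
print(corrected)                   # True
print(sum(r.assignment.values()))  # 40
print(r.reconcile_once())          # False
```

In a real master, `reconcile_once` would be called periodically from a background task, much like the shape worker mentioned above polls for the next stage.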

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale Issue had no activity. Might still be worth fixing, but dont expect someone else to fix it label Jul 31, 2021
@cyberw cyberw removed the stale Issue had no activity. Might still be worth fixing, but dont expect someone else to fix it label Jul 31, 2021
@cyberw
Collaborator

cyberw commented Jul 31, 2021

Will be fixed in 2.0

@cyberw cyberw closed this as completed Jul 31, 2021