Scaling delayed or doesn't happen at all #667

Closed

rafalglowacz opened this issue Sep 19, 2019 · 10 comments

rafalglowacz commented Sep 19, 2019

  • Horizon Version: v3.3.2
  • Laravel Version: 5.8.22
  • PHP Version: 7.2.4
  • Redis Driver & Version: phpredis 4.3.0
  • Database Driver & Version: MySQL 5.7.22

Description:

When Horizon is freshly started and all workers have been scaled down, scaling up only happens after the first job finishes (which might take a long time) or, if minProcesses is set to 0, it never happens and no jobs get processed. I thought I was doing something wrong, but I asked a colleague to check it for me and he got the same results.

Steps To Reproduce:

  • have a supervisor config like this:
'supervisor-3' => [
	'connection' => 'redis',
	'queue' => ['myqueue'],
	'balance' => 'auto',
	'minProcesses' => 1, // or 0
	'maxProcesses' => 10,
	'tries' => 1,
	'timeout' => 24 * 3600,
],
  • clear Redis for good measure
  • start Horizon
  • crucial: wait for all workers for myqueue to scale down to the min number (for instance watch 'ps faux | grep horizon')
  • put a job in myqueue - make it run for half a minute or so in order to see what happens next (a minimal sketch of such a job follows this list)
  • put another job in myqueue
  • depending on minProcesses:
    • if it was 1, the second job (and any subsequent job) won't start until the first one is done. After that happens, it'll work as expected. If the workers scale down to minimum now, it'll also work fine. The problem is only with the first job after starting Horizon and initially scaling down to minimum.
    • if it was 0, no worker processes will start and no jobs will be executed
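
For reference, here is a minimal sketch of the kind of long-running job used in the steps above (class name, sleep duration and dispatch calls are illustrative only, not from my actual project):

    // app/Jobs/SlowJob.php
    namespace App\Jobs;

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;
    use Illuminate\Queue\InteractsWithQueue;

    class SlowJob implements ShouldQueue
    {
        use Dispatchable, InteractsWithQueue, Queueable;

        public function handle()
        {
            sleep(30); // long enough to observe the (lack of) scaling
        }
    }

    // e.g. from tinker:
    App\Jobs\SlowJob::dispatch()->onQueue('myqueue'); // first job
    App\Jobs\SlowJob::dispatch()->onQueue('myqueue'); // second job - won't start until the first finishes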
@driesvints
Member

I'm not sure myself. @themsaid maybe?

@rafalglowacz
Author

A little update - today I tested it with a vanilla installation of Laravel 6.1 and Horizon 3.4. All fresh files, nothing from my work project that could potentially influence it in subtle ways. Same problem. I forgot to mention before that it's all running in Docker, specifically Laradock.

@driesvints You labelled the issue as needs more info. What kind of information should I provide?

@driesvints
Member

@rafalglowacz it's just that I don't know if this is a bug or not. It needs to be investigated more before we label it as a bug.

@rafalglowacz
Author

I've investigated the issue; here are my findings.

The problem is in the numberOfWorkersPerQueue method in src/AutoScaler.php:

            } elseif ($timeToClearAll == 0 &&
                      $supervisor->options->autoScaling()) {
                return [$queue => $supervisor->options->minProcesses];
            }

If $timeToClearAll is 0, it won't scale beyond the minimum number of processes. So where does $timeToClearAll come from? It's the sum of individual times to clear every separate queue, which are calculated in timeToClearPerQueue:

            return [$queue => ($size * $this->metrics->runtimeForQueue($queue))];

The time to clear the queue depends on the runtime metric for the queue, but the metric will be empty until the first job finishes. As a result $timeToClearAll will be 0, so we'll be stuck at the minimum number of workers until the first job is done. If we set the minimum to 0, the first job will never be done, so this queue will be basically non-functional (unless it executed jobs while it was initially scaling down and still had workers).
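
To make the arithmetic concrete, here is a minimal sketch of what happens right after a fresh start (queue name and queue size are made up; the point is that the runtime metric contributes 0 until a job completes):

    // Fresh Horizon start: two jobs waiting on 'myqueue', none finished yet.
    $size    = 2;
    $runtime = $this->metrics->runtimeForQueue('myqueue'); // 0 - no runtime recorded yet

    $timeToClear    = $size * $runtime; // 2 * 0 = 0
    $timeToClearAll = $timeToClear;     // only one queue, so the sum is still 0

    // $timeToClearAll == 0, so the branch quoted above returns
    // minProcesses for the queue even though jobs are waiting.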

A potential solution

This could probably be fixed by sending information about queue sizes to numberOfWorkersPerQueue and scaling towards the maximum if we don't have the metrics yet but there are jobs waiting:

            } elseif ($timeToClearAll == 0 &&
                      $supervisor->options->autoScaling()) {
                return [
                    $queue => $sizes[$queue] > 0
                        ? $supervisor->options->maxProcesses
                        : $supervisor->options->minProcesses
                ];
            }

I'm not sure if scaling towards maximum could result in creating too many workers. It will get corrected as soon as at least one job finishes. What do you guys think?

Here's a full AutoScaler.php needed to test this potential fix:
https://gist.github.com/rafalglowacz/702dfb92651130e23f5caa9d5a814247

@graemlourens
Contributor

graemlourens commented Nov 26, 2019

We are seeing the same behaviour with the following setup:
Horizon Version: v3.4.2
Laravel Version: v6.0.3
PHP Version: 7.2.23
Redis Driver & Version: phpredis 5.1.1

If we have a queue with 0 minProcesses, that queue never scales; it just stays like that forever. In some cases it does start scaling, but only if we restart Horizon and the first job happens to be processed during 'bootup' (when several processes are initially spawned). After that it works, but as Rafal mentioned, when you initially have 0 processes, scaling only works once the first job has completed.

This is indeed quite problematic for us, as we would like to separate multiple responsibilities into different queues/supervisors (especially dedicated batch-supervisors) where 'standby' processes are a waste of resources, and immediate response to pending jobs is not a requirement in any case.
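
For illustration, the kind of dedicated batch supervisor meant here would look something like this (supervisor name, queue name and limits are examples only):

'supervisor-batch' => [
	'connection' => 'redis',
	'queue' => ['batch'],
	'balance' => 'auto',
	'minProcesses' => 0,  // no standby workers between batch runs
	'maxProcesses' => 20, // scale up only while batch jobs are pending
	'tries' => 1,
	'timeout' => 3600,
],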

@graemlourens
Contributor

@driesvints is it intended to stay labelled as 'needs more info'? We're able to reproduce this every time (as described by Rafal and myself), and since the option exists to set minProcesses to 0, I would tend to declare this a bug.
Hoping for your input.

@driesvints
Member

@graemlourens unfortunately I don't have time to deep dive into this atm. If you really believe this is a bug, then the best thing you can do is send in a PR to fix the broken behavior and let it be reviewed by Taylor.

@themsaid
Member

themsaid commented Dec 6, 2019

A fix is proposed in #718

@graemlourens
Contributor

@themsaid thank you very much for that PR. We tried out the changes and they work as intended. Hopefully this will get merged soon. I would still, however, have classified this as a bug and not an enhancement.

@driesvints driesvints added bug and removed enhancement labels Dec 6, 2019
@driesvints
Member

PR was merged. Thanks for reporting.
