Scalers don't retry if connection is lost #2415

Closed
pascallap opened this issue Dec 22, 2021 · 3 comments

Comments

@pascallap

Report

If the RabbitMQ scaler is installed or started while RabbitMQ isn't up, there is no reconnection attempt.

We only get an error:
ERROR scalehandler error resolving auth params {"scalerIndex": 0, "object": {"apiVersion": "keda.sh/v1alpha1", "kind": "ScaledJob", "namespace": "anamespace", "name": "servicename"}, "trigger": 0, "error": "error establishing rabbitmq connection: dial tcp xxx.xxx.xx.xx:5672: i/o timeout"}

Even when RabbitMQ becomes available afterwards, there is no reconnection attempt.
The only way to recover is to restart the keda-operator.

Expected Behavior

A reconnection retry.

Actual Behavior

No retry.

Steps to Reproduce the Problem

  1. Install KEDA in namespace keda.
  2. Install RabbitMQ in namespace default.
  3. Configure a network policy on the default namespace that blocks all external traffic.
  4. Install a scaler in the default namespace.
  5. Failure: an error appears in the KEDA operator log.
  6. Delete the network policy to allow traffic.
  7. Nothing happens: KEDA does not reconnect to RabbitMQ.

Logs from KEDA operator

ERROR	scalehandler	error resolving auth params	{"scalerIndex": 0, "object": {"apiVersion": "keda.sh/v1alpha1", "kind": "ScaledJob", "namespace": "anamespace", "name": "servicename"}, "trigger": 0, "error": "error establishing rabbitmq connection: dial tcp xxx.xxx.xx.xx:5672: i/o timeout"}

KEDA Version

2.5.0

Kubernetes Version

1.19

Platform

Amazon Web Services

Scaler Details

RabbitMQ

Anything else?

This is a big problem for us, since we run RabbitMQ in the cluster and we shut the clusters down each night.

KEDA restarts too quickly, so RabbitMQ is not yet fully up when KEDA tries to establish the connection.
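
To make the expected behavior concrete, here is a minimal sketch of a connection attempt that retries with exponential backoff. It is written in Python for brevity and is not KEDA's code; `connect_with_retry` and all of its parameters are hypothetical names, and `connect` stands in for whatever call opens the RabbitMQ connection.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted.

    The delay doubles after every failure, capped at `max_delay`.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            # Timeouts and refused connections are OSError subclasses in Python.
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

With something like this in place, a RabbitMQ that comes up a few seconds after KEDA would be picked up on a later attempt instead of requiring an operator restart. The `sleep` parameter is injectable only so the waiting can be stubbed out in tests.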

@pascallap pascallap added the bug Something isn't working label Dec 22, 2021
@zroubalik zroubalik added this to the v2.6.0 milestone Jan 3, 2022
@zroubalik
Member

Thanks for opening this issue. It seems that this bug was introduced by the caching mechanism implemented in 2.5.0 (#2187), and it is not exclusive to RabbitMQ.
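
As a rough illustration of how a scaler cache can interact with retries, the sketch below shows a cache that stores only successfully built clients and drops an entry when a connection error occurs, so the next polling iteration rebuilds it. This is a Python sketch under stated assumptions and not the code from #2187: `ScalerCache`, `poll_once`, `build`, and `read_metric` are hypothetical names.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

S = TypeVar("S")


class ScalerCache(Generic[S]):
    """Hypothetical per-object cache of scaler clients (not KEDA's real type)."""

    def __init__(self, build: Callable[[str], S]) -> None:
        self._build = build
        self._entries: Dict[str, S] = {}

    def get(self, key: str) -> S:
        # Only successful builds are stored, so a failed build is retried
        # the next time the polling loop asks for this key.
        if key not in self._entries:
            self._entries[key] = self._build(key)
        return self._entries[key]

    def invalidate(self, key: str) -> None:
        # Drop a cached client, e.g. after its connection was lost.
        self._entries.pop(key, None)


def poll_once(
    cache: ScalerCache[S], key: str, read_metric: Callable[[S], int]
) -> Optional[int]:
    """One polling iteration: the metric, or None if the source is down."""
    try:
        return read_metric(cache.get(key))
    except OSError:
        cache.invalidate(key)
        return None
```

The point of the sketch is that a failure should never leave the cache in a state where later iterations skip the rebuild; whether that matches the actual root cause here is an assumption.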

@Lyrositor

I can confirm this; I have experienced it with the Redis Lists scaler as well.

Reverting to 2.4.0 fixes the issue.

@zroubalik zroubalik changed the title Rabbitmq Scaler doesn't retry if connection is lost Scalers don't retry if connection is lost Jan 3, 2022
@JorTurFer JorTurFer self-assigned this Jan 7, 2022
@JorTurFer
Member

I think that this PR solves the problem

4 participants