Strange upstream stickiness issue #7246
Unanswered
StarwingRC
asked this question in
Help
Replies: 1 comment
-
Hi @CrystianRC! We've never had a report like this. Since this behavior seems non-deterministic, it will be really hard to debug. Can you share some logs from around the time the issue happens?
-
Hello,
We are encountering a strange situation with Kong 2.3.3 (we encountered the same with 2.1.3) involving upstreams and stickiness.
A bit of context:
We have our Kong deployed in Kubernetes (EKS), db-less mode, admin interface disabled, running on Amazon Linux 2.
In our egress config file we have defined an upstream object with 4 targets to send requests to. The requirement was that traffic should be sent to the same target based on the value of a cookie, so we used the `hash_on: "cookie"` and `hash_on_cookie: "---cookie_name---"` directives. We have also enabled active healthchecks for the targets.
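For reference, a minimal db-less declarative config with this shape looks roughly like the sketch below. The upstream name, service, route, cookie name, target addresses, and healthcheck thresholds are all placeholders, not our actual values:

```yaml
_format_version: "2.1"

upstreams:
  - name: my-upstream              # placeholder name
    hash_on: cookie
    hash_on_cookie: my_cookie      # placeholder cookie name
    healthchecks:
      active:
        http_path: /healthz        # placeholder probe path
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 5
          http_failures: 2
    targets:                       # placeholders for the 4 targets
      - target: 10.0.0.1:8080
      - target: 10.0.0.2:8080
      - target: 10.0.0.3:8080
      - target: 10.0.0.4:8080

services:
  - name: my-service
    host: my-upstream              # service points at the upstream by name
    routes:
      - name: my-route
        paths:
          - /app
```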
The issue that we are encountering is the following:
From time to time, during a "user session" that is already "stuck" to a target and working fine, Kong redirects one or more calls of that same session to another target, without any warning or apparent reason (or we haven't looked in the right place). This, in turn, has the potential of totally thrashing the user session.
Our initial thought was that the cookie being used in the stickiness decision had somehow changed. We changed the Kong access log format to also output the value of the cookie, restarted Kong, and let it run for some time (days) until the error came back. To our surprise, the cookie values are identical.
We weren't able to reproduce this, and from our observations it's totally random: it takes some time to appear, doesn't happen for a specific call, and is driving us nuts :)
Another idea we had was that maybe there is a network glitch and, because of the healthchecks, the upstream target the session is stuck to gets removed from load balancing. This doesn't hold, because calls from other user sessions, at the same time, are directed to the same target and work fine. Also, the logs do not show any healthchecks failing.
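To sanity-check that reasoning, it helps to look at what consistent hashing does when a target is dropped. The sketch below uses generic rendezvous hashing, not Kong's actual ring-balancer implementation, just to illustrate the property: removing a target remaps all of the sessions hashed to it, and only those. So seeing some sessions on a target move while other sessions on the same target stay put argues against a healthcheck removal.

```python
import hashlib

def _hash(key: str) -> int:
    # Stable hash so results are reproducible across runs
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def pick_target(cookie_value: str, targets: list[str]) -> str:
    # Rendezvous (highest-random-weight) hashing: each cookie maps to
    # the target with the highest combined hash. Removing a target can
    # only change the result for cookies that mapped to that target.
    return max(targets, key=lambda t: _hash(cookie_value + "|" + t))

targets = ["t1", "t2", "t3", "t4"]
cookies = [f"session-{i}" for i in range(1000)]
before = {c: pick_target(c, targets) for c in cookies}

# Simulate an active healthcheck marking one target down
removed = "t3"
after = {c: pick_target(c, [t for t in targets if t != removed])
         for c in cookies}

# Every session that moved was one that had been stuck to the
# removed target; everyone else stays put.
moved = [c for c in cookies if before[c] != after[c]]
assert all(before[c] == removed for c in moved)
```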
When this happens, not all users are affected, just some.
This whole situation resolves itself if we restart Kong, but after a few days, we would have it back again.
By any chance, from your experience, did you encounter anything similar?
Thank you all for reading and for your help :)