Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(balancer): respect max retries #12346

Merged
merged 1 commit into from
Jan 18, 2024
Merged

fix(balancer): respect max retries #12346

merged 1 commit into from
Jan 18, 2024

Conversation

tzssangglass
Copy link
Member

Summary

In the balancer phase, when obtaining a connection from the upstream connection pool, the cached attribute of the peer connection is set to 1(pc->cached = 1;), indicating that the connection is obtained from the cache.

If an error occurs during the use of this connection, such as "upstream prematurely closed connection" the system will increase the tries attribute of the peer connection by executing u->peer.tries++.

tries represents the maximum number of attempts to connect to an upstream server. It is equal to the normal 1 attempt + retries (default value is 5) = 6.
The occurrence of u->peer.tries++ is unexpected and it results in the actual retry count exceeding 6 in worst cases.

This PR restores tries by callbacks to the balancer when u->peer.tries++ is unexpectedly set.

Checklist

Issue reference

Fix FTI-5616

In the balancer phase, when obtaining a connection from the upstream
connection pool, the `cached` attribute of the peer connection is set
to 1(`pc->cached = 1;`), indicating that the connection is obtained
from the cache.

If an error occurs during the use of this connection, such as
"upstream prematurely closed connection" the system will increase the
`tries` attribute of the peer connection by executing
`u->peer.tries++`.

`tries` represents the maximum number of attempts to connect to an
upstream server. It is equal to the normal 1 attempt + `retries`
(default value is 5) = 6.
The occurrence of `u->peer.tries++` is unexpected and it results
in the actual retry count exceeding 6 in worst cases.

This PR restores tries by callbacks to the balancer when
`u->peer.tries++` is unexpectedly set.

FIX [FTI-5616](https://konghq.atlassian.net/browse/FTI-5616)

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
@tzssangglass tzssangglass marked this pull request as ready for review January 15, 2024 06:22
@tzssangglass
Copy link
Member Author

The previously reviewed PR #12242 had some issues with the latest CI configuration file, so I pulled out a new branch from the master and ported it to this PR.

@windmgc windmgc merged commit ad62d3b into master Jan 18, 2024
30 checks passed
@windmgc windmgc deleted the FTI-5616-2 branch January 18, 2024 06:09
chronolaw added a commit that referenced this pull request Jan 18, 2024
@chronolaw chronolaw added cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee and removed cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee labels Jan 18, 2024
@Kong Kong deleted a comment from team-gateway-bot Jan 18, 2024
tzssangglass added a commit to tzssangglass/kong that referenced this pull request Jan 19, 2024
In the balancer phase, when obtaining a connection from the upstream
connection pool, the `cached` attribute of the peer connection is set
to 1(`pc->cached = 1;`), indicating that the connection is obtained
from the cache.

If an error occurs during the use of this connection, such as
"upstream prematurely closed connection" the system will increase the
`tries` attribute of the peer connection by executing
`u->peer.tries++`.

`tries` represents the maximum number of attempts to connect to an
upstream server. It is equal to the normal 1 attempt + `retries`
(default value is 5) = 6.
The occurrence of `u->peer.tries++` is unexpected and it results
in the actual retry count exceeding 6 in worst cases.

This PR restores tries by callbacks to the balancer when
`u->peer.tries++` is unexpectedly set.

FIX [FTI-5616](https://konghq.atlassian.net/browse/FTI-5616)

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
tzssangglass added a commit that referenced this pull request Jan 31, 2024
…d when handling cached connection errors

address comments of #12346

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
kikito pushed a commit that referenced this pull request Jan 31, 2024
…d when handling cached connection errors (#12468)

* fix(balancer): ensure the `notify` callback is invoked only if defined when handling cached connection errors

address comments of #12346

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

* fix

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

---------

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
github-actions bot pushed a commit that referenced this pull request Jan 31, 2024
…d when handling cached connection errors (#12468)

* fix(balancer): ensure the `notify` callback is invoked only if defined when handling cached connection errors

address comments of #12346

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

* fix

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

---------

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
(cherry picked from commit 415ca0f)
kikito pushed a commit that referenced this pull request Jan 31, 2024
…d when handling cached connection errors (#12468)

* fix(balancer): ensure the `notify` callback is invoked only if defined when handling cached connection errors

address comments of #12346

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

* fix

Signed-off-by: tzssangglass <tzssangglass@gmail.com>

---------

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
(cherry picked from commit 415ca0f)
dndx added a commit that referenced this pull request Jan 31, 2024
* chore(patches): revert the "respect max retries" patch

We have discovered potential segfault risk with the feature and we do
not have enough time to review this in more depth, therefore we have
decided to revert the change temporarily to further investigate.

This reverts PR #12346.

FTI-5616
github-actions bot pushed a commit that referenced this pull request Jan 31, 2024
* chore(patches): revert the "respect max retries" patch

We have discovered potential segfault risk with the feature and we do
not have enough time to review this in more depth, therefore we have
decided to revert the change temporarily to further investigate.

This reverts PR #12346.

FTI-5616

(cherry picked from commit 99a9aa2)
bungle pushed a commit that referenced this pull request Jan 31, 2024
* chore(patches): revert the "respect max retries" patch

We have discovered potential segfault risk with the feature and we do
not have enough time to review this in more depth, therefore we have
decided to revert the change temporarily to further investigate.

This reverts PR #12346.

FTI-5616

(cherry picked from commit 99a9aa2)
tzssangglass added a commit to tzssangglass/kong that referenced this pull request Feb 1, 2024
…when handling cached connection errors

address comments of Kong#12346

Signed-off-by: tzssangglass <tzssangglass@gmail.com>
tzssangglass added a commit that referenced this pull request Feb 2, 2024
tzssangglass added a commit that referenced this pull request Feb 2, 2024
tzssangglass added a commit that referenced this pull request Feb 2, 2024
bungle pushed a commit that referenced this pull request Feb 2, 2024
bungle pushed a commit that referenced this pull request Feb 2, 2024
bungle pushed a commit that referenced this pull request Feb 2, 2024
AndyZhang0707 added a commit that referenced this pull request Jul 18, 2024
AndyZhang0707 added a commit that referenced this pull request Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants