GEP-1619: Refactor SessionPersistencePolicy into BackendLBPolicy #2634
Skipping CI for Draft Pull Request.
Force-pushed 5cb9850 to 265ae56.
Once #2689 merges, this will need a rebase and update: the GEP files have moved.
Force-pushed 265ae56 to 34ab84b.
Force-pushed 34ab84b to 0669c1e.
Force-pushed 5f5583a to 2d11cb6.
Force-pushed bd37e24 to 611363c.
Force-pushed 611363c to 1817284.
@ginayeh @costinm @robscott Next round of updates (diff):
Force-pushed 795f04c to 4a3c09b.
geps/gep-1619/index.md
When using multiple backends in traffic splitting, all backend services should have session persistence enabled.
-Nonetheless, implementations should also support traffic splitting scenarios in which one service has persistence
+Nonetheless, implementations MUST also support traffic splitting scenarios in which one service has persistence
enabled while the other does not. This support is necessary, particularly in scenarios where users are transitioning
to or from an implementation version designed with or without persistence.
Ideally, all backend services should have session persistence enabled in traffic splitting scenarios. However, I want some clarity on the scenario where one service (svc1) has persistence enabled while the others (svc2, svc3) do not. If we apply session persistence only to svc1, all traffic will eventually go to svc1 because persistence must be maintained, which could overload svc1 and possibly break the system. A more reasonable interpretation is that session persistence MUST be applied to all backends (svc1, svc2, svc3) involved in traffic splitting if any of them has session persistence enabled.
I think you have a good point that it may be much more practical to apply session persistence to all backends when one backend has session persistence configured, but as @costinm mentioned, I think it might be too prescriptive to say it must be done that way.
Do you see any problem with leaving the spec open so that implementations can do either of the following?
- Maintain session persistence only for svc1 (eventually transitioning all traffic to svc1, possibly overloading it)
- Apply session persistence to all of svc1-3
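The two options above can be sketched in Go. This is a hypothetical model, not Gateway API code: `Backend`, `Mode`, `pick`, and the `intn` random-source parameter are illustrative names, and a client's existing session is reduced to the name of the backend it is pinned to. The sketch also covers the weight=0 migration case discussed in the thread: a weight-0 backend receives no new sessions but keeps serving sessions already pinned to it.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Backend is a hypothetical model of one weighted backendRef in a route rule.
type Backend struct {
	Name       string
	Weight     int
	Persistent bool // session persistence configured for this backend
}

// Mode selects between the two behaviors discussed in the thread.
type Mode int

const (
	// PerBackend pins a session only if its own backend has persistence enabled.
	PerBackend Mode = iota
	// AllBackends treats persistence on any backend as persistence for all of them.
	AllBackends
)

// pick returns the backend name for one request. session is the backend the
// client is pinned to ("" for a new client); intn returns a random int in [0, n).
func pick(backends []Backend, session string, mode Mode, intn func(int) int) string {
	anyPersistent := false
	for _, b := range backends {
		anyPersistent = anyPersistent || b.Persistent
	}
	// Honor an existing session if this mode treats its backend as persistent.
	for _, b := range backends {
		if b.Name == session && (b.Persistent || (mode == AllBackends && anyPersistent)) {
			return b.Name
		}
	}
	// Otherwise use the configured weights. A weight-0 backend never gets
	// new sessions, but still serves sessions pinned to it above.
	total := 0
	for _, b := range backends {
		total += b.Weight
	}
	if total == 0 {
		return ""
	}
	n := intn(total)
	for _, b := range backends {
		if n < b.Weight {
			return b.Name
		}
		n -= b.Weight
	}
	return ""
}

func main() {
	backends := []Backend{
		{Name: "svc1", Weight: 0, Persistent: true},
		{Name: "svc2", Weight: 50},
		{Name: "svc3", Weight: 50},
	}
	fmt.Println(pick(backends, "svc1", PerBackend, rand.Intn)) // pinned session stays on svc1
	fmt.Println(pick(backends, "", PerBackend, rand.Intn))     // new session: svc2 or svc3
}
```

`PerBackend` corresponds to the first option and `AllBackends` to the second; the thread leaves the choice between them to implementations.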
Let's leave the spec open and let implementations decide which behavior they want, since it's an edge case.
Done in a new commit: 13eaa83
I can confirm that, at least in Cloud Run session persistence (which has been around for quite some time), persistence can be enabled on svc1 while the others (svc2, svc3) do not have it enabled. I don't think we can impose a MUST for the opposite behavior; other implementations probably follow the same model.
Applying persistence to svc2 and svc3 may also break them, because the traffic distribution (as well as draining, etc.) will differ from what a backend that does not use persistent sessions expects.
Traffic may move to svc1, which is the intended behavior if the user is transitioning to a new version that uses session persistence, and it should not cause any problems with autoscaling. If the user is moving from persistent sessions to stateless, it is possible to set svc1's weight to 0 so that no new requests go to it, and the user can transition to svc2.
I don't mind if we allow some implementations to treat one backend having persistence as all backends having it; migration is transitory and fairly rare. That is also the current Istio behavior.
So far we are not aware of any customers having problems with either approach.
(In reply to Gina Yeh, Thu, Mar 7, 2024 at 12:55 PM.)
svc1 would only be overloaded if it doesn't use autoscaling. High traffic can overload any service that doesn't correctly autoscale or shed load.
Load balancers can (and should) also use load reporting to track each backend's capacity. Regardless of traffic splitting weights, a properly configured workload behind a good load balancer should never be overloaded: new sessions should not be sent to an endpoint that is at capacity.
Even without sessions, if you have a 10% to v1, 90% to v2 split but v1 is undersized and can only handle 5%, I would expect a load-aware LB to send only 5% to v1 and prioritize the health and reliability of the system over the API. At some point we may need to discuss load shedding (not in this GEP), which is different from draining and is another thing that may make an LB ignore the split settings.
(In reply to Grant Spence, Thu, Mar 7, 2024 at 5:18 PM.)
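The 10%/90% example above can be made concrete with a water-filling sketch: each backend gets its configured share unless that exceeds its capacity, and the excess is redistributed to backends with headroom in proportion to their weights. This is our own illustration of one way a load-aware LB could behave, not something the GEP specifies; `Target` and `effectiveShares` are hypothetical names, and a real LB would derive capacity from load reports, as the comment suggests, rather than from a static field.

```go
package main

import "fmt"

// Target is a hypothetical backend with a configured split weight and a
// capacity expressed as the largest fraction of total traffic it can absorb.
type Target struct {
	Name     string
	Weight   float64
	Capacity float64
}

// effectiveShares returns the fraction of new traffic sent to each target.
// Targets whose weighted share exceeds their capacity are capped, and the
// remainder is re-split among uncapped targets by weight (water-filling).
// If total capacity is below 1, the shares sum to less than 1 and the rest
// would have to be shed.
func effectiveShares(ts []Target) []float64 {
	shares := make([]float64, len(ts))
	capped := make([]bool, len(ts))
	remaining := 1.0
	for {
		var w float64
		for i, t := range ts {
			if !capped[i] {
				w += t.Weight
			}
		}
		if w == 0 {
			return shares
		}
		level := remaining / w // traffic per unit of weight in this round
		anyCapped := false
		for i, t := range ts {
			if !capped[i] && level*t.Weight > t.Capacity {
				shares[i] = t.Capacity
				capped[i] = true
				remaining -= t.Capacity
				anyCapped = true
			}
		}
		if !anyCapped {
			for i, t := range ts {
				if !capped[i] {
					shares[i] = level * t.Weight
				}
			}
			return shares
		}
	}
}

func main() {
	split := []Target{
		{Name: "v1", Weight: 10, Capacity: 0.05}, // v1 can only handle 5% of traffic
		{Name: "v2", Weight: 90, Capacity: 1.0},
	}
	for i, s := range effectiveShares(split) {
		fmt.Printf("%s: %.2f\n", split[i].Name, s)
	}
}
```

With the numbers from the comment, v1 is held at 5% and v2 absorbs the other 95%, so the LB favors system health over the configured split.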
Updates #1619
Fix the confusing "TTL" section to reflect the Expires/Max-Age cookie attributes. Add a `LifetimeType` API field to choose between session and persistent cookies. Relates to kubernetes-sigs#2747
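The session versus persistent cookie distinction can be illustrated with Go's `net/http.Cookie`. `LifetimeType`, its `Session`/`Permanent` values, and `persistenceCookie` are hypothetical stand-ins for the API field named in the commit, and taking `Max-Age` from a timeout in seconds is an assumption made for the example.

```go
package main

import (
	"fmt"
	"net/http"
)

// LifetimeType is a hypothetical stand-in for the LifetimeType API field.
type LifetimeType string

const (
	SessionCookie   LifetimeType = "Session"
	PermanentCookie LifetimeType = "Permanent"
)

// persistenceCookie builds the cookie a gateway might set to pin a client to a
// backend. A session cookie carries neither Expires nor Max-Age; a permanent
// cookie carries Max-Age so it outlives the browsing session.
func persistenceCookie(name, backend string, lt LifetimeType, maxAgeSeconds int) *http.Cookie {
	c := &http.Cookie{Name: name, Value: backend, Path: "/", HttpOnly: true}
	if lt == PermanentCookie && maxAgeSeconds > 0 {
		c.MaxAge = maxAgeSeconds // serialized as "Max-Age=<seconds>"
	}
	return c
}

func main() {
	fmt.Println(persistenceCookie("sp", "backend-a", SessionCookie, 3600).String())
	fmt.Println(persistenceCookie("sp", "backend-a", PermanentCookie, 3600).String())
}
```

Per RFC 6265, a cookie with neither Expires nor Max-Age is a session cookie, which the user agent discards when the session ends.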
Update AbsoluteTimeoutSeconds and IdleTimeoutSeconds to use the standard Duration field from GEP-2257. Rename them to AbsoluteTimeout and IdleTimeout to reflect this change.
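The switch to a Duration field can be illustrated with a small validator. It assumes the GEP-2257 format is one to four `<number><unit>` pairs, each number at most five digits, with units `h`, `m`, `s`, or `ms`, written as the regular expression below. `parseGatewayDuration` is a hypothetical helper; it calls `time.ParseDuration` only after the stricter pattern passes, because `time.ParseDuration` accepts a wider syntax (fractions, signs, `us`, `ns`).

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// durationPattern is our reading of the GEP-2257 Duration format.
var durationPattern = regexp.MustCompile(`^([0-9]{1,5}(h|m|s|ms)){1,4}$`)

// parseGatewayDuration rejects anything outside the restricted format, then
// converts the value to a time.Duration.
func parseGatewayDuration(s string) (time.Duration, error) {
	if !durationPattern.MatchString(s) {
		return 0, fmt.Errorf("invalid Gateway API duration %q", s)
	}
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"1h30m", "10ms", "1.5h"} {
		d, err := parseGatewayDuration(s)
		fmt.Println(s, d, err)
	}
}
```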
Force-pushed 13eaa83 to d77629a.
/retest
RFC 8174 clarifies that keywords such as MUST, SHOULD, and MAY carry their specific meaning only when written in all capitals.
Add details about failure-case behavior, and fix up the TODO and Open Questions sections
The approach now uses the more generic BackendLBPolicy, which supports attaching only to a Service. Add an API for configuring session persistence inline in HTTPRoute and GRPCRoute rules. Add new edge cases for naming collisions and for two route rules sharing persistence.
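The Service-only attachment scope can be sketched as a target check. `PolicyTargetRef` and `validateBackendLBPolicyTarget` are hypothetical and much simpler than the real Gateway API policy attachment types; the sketch only illustrates rejecting any target that is not a core-group Service.

```go
package main

import "fmt"

// PolicyTargetRef is a hypothetical, simplified policy target reference.
type PolicyTargetRef struct {
	Group string // "" denotes the Kubernetes core API group
	Kind  string
	Name  string
}

// validateBackendLBPolicyTarget accepts only a named, core-group Service.
func validateBackendLBPolicyTarget(ref PolicyTargetRef) error {
	if ref.Group != "" || ref.Kind != "Service" {
		return fmt.Errorf("BackendLBPolicy can only target a core Service, got group %q kind %q", ref.Group, ref.Kind)
	}
	if ref.Name == "" {
		return fmt.Errorf("targetRef name must not be empty")
	}
	return nil
}

func main() {
	fmt.Println(validateBackendLBPolicyTarget(PolicyTargetRef{Kind: "Service", Name: "foo"}))
	fmt.Println(validateBackendLBPolicyTarget(PolicyTargetRef{Group: "gateway.networking.k8s.io", Kind: "HTTPRoute", Name: "bar"}))
}
```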
Relax the session persistence traffic splitting edge case guidelines to allow implementations flexibility for scenarios within the same route rule.
Force-pushed d77629a to bbab28a.
Last update just capitalized a
Thanks for all the persistence on this, @gcs278! This all LGTM. Will defer to @shaneutt or @youngnick for final sign-off. /approve
/ok-to-test
/unhold
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gcs278, robscott, shaneutt The full list of commands accepted by this bot can be found here. The pull request process is described here
What type of PR is this?
/kind gep
Optionally add one or more of the following kinds if applicable:
N/A
What this PR does / why we need it:
The main goal of this PR is to get GEP-1619 ready for experimental by answering the open questions in Graduation Criteria for Implementable Status:
It also includes a variety of other cleanup and clarification work.
The PR currently does the following (broken into commits):
- Refactor `SessionPersistencePolicy` into `BackendLBPolicy`, attach it only to Services, and introduce a new inline route API
Which issue(s) this PR fixes:
#2747
#1778 (doesn't fix, but is related because `LoadBalancerPolicy` is the framework for adding algorithm selection)
Does this PR introduce a user-facing change?: