Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1619: Refactor SessionPersistencePolicy into BackendLBPolicy #2634

Merged
merged 9 commits into from
Mar 15, 2024

Conversation

gcs278
Copy link
Contributor

@gcs278 gcs278 commented Dec 1, 2023

What type of PR is this?
/kind gep

Optionally add one or more of the following kinds if applicable:
N/A

What this PR does / why we need it:

The main goal of this PR is to get GEP-1619 ready for experimental by answering the open questions in Graduation Criteria for Implementable Status:

  1. Should we leave room in this policy to add additional concepts in the future such as Session Affinity? If so, how would we adjust the naming and overall scope of this policy?
  2. Should we leave room for configuring different forms of Session Persistence? If so, what would that look like?
    It also has variety of other clean up and clarification work.

The PR currently does the following (broken in to commits):

  1. Fix links, formatting, metadata.yaml
  2. Add new implementations with their session persistence APIs
  3. Answer question on forms of Session Persistence
  4. Distinguish between session vs. persistent cookies (see GEP 1619: distinguish 'session' cookies. #2747)
  5. Update timeouts to use Duration field
  6. Adopt RFC8174 language
  7. Answer open question about unhealthy backends
  8. Refactor the API from SessionPersistencePolicy into BackendLBPolicy and attach only to services, and introduce a new inline route API

Which issue(s) this PR fixes:
#2747
#1778 (Doesn't fix, but is related because LoadBalancerPolicy is the framework for adding algorithm selection)

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 1, 2023
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 1, 2023
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 5cb9850 to 265ae56 Compare December 18, 2023 23:21
@youngnick
Copy link
Contributor

Once #2689 merges, this will need a rebase and update - the GEP files have moved.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2023
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 265ae56 to 34ab84b Compare February 6, 2024 19:16
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 34ab84b to 0669c1e Compare February 13, 2024 21:36
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 13, 2024
@gcs278 gcs278 force-pushed the gep1619-api-revision branch 3 times, most recently from 5f5583a to 2d11cb6 Compare February 15, 2024 00:36
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 15, 2024
@gcs278 gcs278 force-pushed the gep1619-api-revision branch 6 times, most recently from bd37e24 to 611363c Compare February 15, 2024 06:56
@gcs278 gcs278 changed the title WIP: GEP-1619: API revision and tweaks WIP: GEP-1619: Refactor SessionPersistencePolicy to LoadBalancerPolicy Feb 15, 2024
@gcs278 gcs278 changed the title WIP: GEP-1619: Refactor SessionPersistencePolicy to LoadBalancerPolicy WIP: GEP-1619: Refactor SessionPersistencePolicy into LoadBalancerPolicy Feb 15, 2024
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 611363c to 1817284 Compare February 15, 2024 17:56
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 7, 2024

@ginayeh @costinm @robscott Next round of updates (diff):

  • Updated title of GEP
  • Changed name from BackendLoadBalancePolicy to BackendLBPolicy
  • Made the route API common with BackendLBPolicy
  • Added a clause to route API to basically say, it's not done, reevaluate and added another Implementable criteria for this
  • Changed Timeouts to Extended and use Duration field (added as new commit)

@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 795f04c to 4a3c09b Compare March 7, 2024 19:48

When using multiple backends in traffic splitting, all backend services should have session persistence enabled.
Nonetheless, implementations should also support traffic splitting scenarios in which one service has persistence
Nonetheless, implementations MUST also support traffic splitting scenarios in which one service has persistence
enabled while the other does not. This support is necessary, particularly in scenarios where users are transitioning
to or from an implementation version designed with or without persistence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idealy, we should have all backend services enable session persistence in traffic splitting scenarios. However, I want to get some clarity on this traffic splitting scenario of one service (svc1) has persistence enabled while the other (svc2, svc3) does not. In this scenario, if we keep session persistence only applied to svc1, all traffic will eventually goes to svc1 because the persistence should be maintained, resulting in traffic overloading svc1 and may break the system. A more reasonable interpretation of this is that session persistence MUST be applied to all backends (svc1, svc2, svc3) involved in traffic splitting if any of them have session persistence enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have a good point that it may be much more practical to apply session persistence to all backends given one backend has session persistence configured, but like @costinm mentioned, I think it might be too prescriptive to say it must be done that way.

Do you see any problem with leaving the spec open for implementations to do both?:

  1. Maintain only session persistence to svc1 (eventually transitioning all traffic to svc1, possibly overloading it)
  2. Apply session persistence to all svc1-3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave the spec open and have implementations to decide which behavior they want since it's an edge case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a new commit: 13eaa83

@costinm
Copy link

costinm commented Mar 7, 2024 via email

@costinm
Copy link

costinm commented Mar 8, 2024 via email

@youngnick
Copy link
Contributor

Updates #1619

gcs278 added 2 commits March 11, 2024 13:00
Fix confusing "TTL" section to reflect the Expires / Max-Age cookie
attribute. Add `LifetimeType` API field to specify between session and
persistent cookies.

Relates to kubernetes-sigs#2747
Update AbsoluteTimeoutSeconds and IdleTimeoutSeconds to use the standard
Duration field from GEP-2257. Rename them to AbsoluteTimeout and
IdleTimeout to reflect this change.
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from 13eaa83 to d77629a Compare March 11, 2024 17:01
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Mar 11, 2024
@shaneutt
Copy link
Member

/retest

gcs278 added 4 commits March 12, 2024 13:46
RFC8174 defines words such MUST, SHOULD, MAY need to be in all caps in
order to convey a specific meaning.
Add details about how to handle failure case behavior and fix up TODO
and Open Questions sections
Approach now uses more generic BackendLBPolicy which supports only
attaching to Service. Add API for adding session persistence
configuration inline to HTTPRoute and GRPCRoute rule.

Add new edge cases for naming collision and two route rules sharing
persistence.
Relax the session persistence traffic splitting edge case guidelines
to allow implementations flexibility for scenarios within the same route
rule.
@gcs278 gcs278 force-pushed the gep1619-api-revision branch from d77629a to bbab28a Compare March 12, 2024 17:47
@gcs278
Copy link
Contributor Author

gcs278 commented Mar 12, 2024

Last update just capitalized a MUST.

@robscott robscott added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 14, 2024
@robscott
Copy link
Member

Thanks for all the persistence on this @gcs278! This all LGTM. Will defer to @shaneutt or @youngnick for final sign off.

/approve

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test
/unhold
/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 15, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 15, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gcs278, robscott, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 7c672a7 into kubernetes-sigs:main Mar 15, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants