Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2086: alpha prod readiness review #2441

Merged
merged 2 commits into from
Feb 9, 2021

Conversation

andrewsykim
Copy link
Member

The PRR for KEP-2086 is already filled out for alpha but it needs to be approved

Signed-off-by: Andrew Sy Kim kim.andrewsy@gmail.com

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. labels Feb 6, 2021
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 6, 2021
@andrewsykim andrewsykim changed the title KEP-2086: prod readiness review KEP-2086: alpha prod readiness review Feb 7, 2021
@andrewsykim
Copy link
Member Author

/assign @johnbelamaric @thockin

@@ -0,0 +1,3 @@
kep-number: 2086
alpha:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

@@ -0,0 +1,3 @@
kep-number: 2086
alpha:
approver: "@johnbelamaric"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually claim we should rethink this proposal with the new approach that we're proceed with decribed in:
#2434

The arguments described in the proposal (EndpointSlice per node) will no longer hold with that (we can easily imagine "forNode" hint too).
So instead of rushing with this one, I would really like to take a step back and ensure that we can't converge this with #2434

@thockin @robscott @aojea @johnbelamaric

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the slices have already a hostname field, IIUIC this is a just a matter of signalling kube-proxy to leverage it, it can be an annotation or an api field, no?

	// hostname of this endpoint. This field may be used by consumers of
	// endpoints to distinguish endpoints from each other (e.g. in DNS names).
	// Multiple endpoints which use the same hostname should be considered
	// fungible (e.g. multiple A values in DNS). Must pass DNS Label (RFC 1123)
	// validation.
	// +optional
	Hostname *string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the problem is that the API field can be contradictory with the new annotations topology.kubernetes.io/topology-aware-routing, right?

Maybe we should add a new value to the annotation to implement local traffic ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point that we don't need any additional hints for that.

Regarding API - given it's explicitly stated there as "local-traffic will require careful handing" (or sth like that), I think we should think if it doesn't make sense to unify (using single annotation is what i was thinking, but I think we should rely on API approvers (@thockin - back to you) to recommend if that makes sense).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding API - given it's explicitly stated there as "local-traffic will require careful handing" (or sth like that), I think we should think if it doesn't make sense to unify (using single annotation is what i was thinking, but I think we should rely on API approvers (@thockin - back to you) to recommend if that makes sense).

There was a lot of back and forth on this and the conclusion IIRC is that "route to node-local" is a distinct and common enough use-case to warrant a different API from topology-aware routing. More on this email thread: https://groups.google.com/g/kubernetes-sig-network/c/wXd1D_fKjqU. But that was also under the assumption that we were going to use EndpointSlice subsetting and a EndpointSlice per-node didn't make sense.

So, the problem is that the API field can be contradictory with the new annotations topology.kubernetes.io/topology-aware-routing, right?

There is an overlap if you consider Local or PreferLocal. The assumption so far has been that this annotation would only be used with Cluster (now All). My personal preference would actually be to have TopologyAware be another possible value of internalTrafficPolicy as opposed to an annotation but I'll leave that feedback in the other KEP PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @maplain who is working on the initial implementation for internalTrafficPolicy kubernetes/kubernetes#96600

Copy link
Member

@swetharepakula swetharepakula Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to try to combine these. There is a lot of overlap. I prefer PreferZone over Topology as it leaves room for expansion in the future if we add other topology algorithms.

In regards to "Auto" meaning "decide for me"- IIUC that means that based on the endpoints available, choosing between Cluster, Local, PreferLocal, and PreferZone. Would that require an additional field or annotation to state what was chosen? PreferZone is the only one that would be detectable because of the the topologyHints. How would kube-proxy differentiate between the others?

Copy link
Member

@thockin thockin Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest thing to understand here might be adding PreferZone

The semantics of PreferNode are "if there is an endpoint on-node, use it. Only
if there is no endpoint should you overflow to another node". I don't think we
want the same for zone (or at least, that's not the heuristic you have
developed and almost certainly not the best for "auto").

potentially RequireZone as options for InternalTrafficPolicy

If and when we have use-cases.

EndpointSlice hints are only generated when internalTrafficPolicy == PreferZone

We could always generate hints, but why bother unless we are in a mode where
they matter? We might want external traffic to have topology, too?

Limiting this to internalTrafficPolicy removes a potential link to LB config.

Actually - is that a reason why topology and local-only might not be the same?
What happens if I set internalTrafficPolicy: Topology and I take external
traffic? Does that not follow topology? Or does topological subsetting apply
to all "whole cluster" policies?

In the context of external traffic, "Local" has a semantic implication (no
SNAT) as well as a perfomance/cost implication. We have to respect that. But
the default is "Cluster" and it seems to me that topology should apply in those
cases (cost/perf win but no semantic change).

This point is weighing on me - thoughts?

In regards to "Auto" meaning "decide for me"- IIUC that means that based on the endpoints available, choosing between Cluster, Local, PreferLocal, and PreferZone. Would that require an additional field or annotation to state what was chosen? PreferZone is the only one that would be detectable because of the the topologyHints. How would kube-proxy differentiate between the others?

I'm not sure it's the right approach, but "Auto" could look at the endpoints
and decide "I have at least one endpoint on every node, so I will set a hint to
use the same node" vs "I don't have enough for node, but I can bound by zone" or
even some mixed mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should all jump on a call? I don't think we need to block the KEP(s), though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates from Slack thread:

  • rename internalTrafficPolicy to trafficPolicy to reflect increased scope
  • if externalTrafficPolicy=Cluster, fall back to trafficPolicy for external sources
  • if externalTrafficPolicy=Local -- this rule only takes precedent for external sources, with internal traffic following trafficPolicy

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2021
@wojtek-t
Copy link
Member

wojtek-t commented Feb 8, 2021

@thockin - feel free to hold cancel when you're fine with it; I just wanted to flag the overlap and it seems I succeeded. I will not have time to go back to it by feature freeze, so I'm leaving the resolution to you.

@andrewsykim
Copy link
Member Author

@thockin - feel free to hold cancel when you're fine with it; I just wanted to flag the overlap and it seems I succeeded. I will not have time to go back to it by feature freeze, so I'm leaving the resolution to you.

Thanks @wojtek-t, I will also update PRR reviewer to you. I think you still need to approve the PR for changes in keps/prod-readiness?

@johnbelamaric
Copy link
Member

I'll let @wojtek-t take this one...

Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 9, 2021
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this! Just a few nits but nothing that should block this PR from getting in.

InternalTrafficPolicy ServiceInternalTrafficPolicyType `json:"internalTrafficPolicy,omitempty"`
// trafficPolicy denotes if the traffic for a Service should route
// to cluster-wide endpoints or node-local endpoints. "Cluster" routes traffic
// to a Service to all cluster-wide endpoints. "Topology" routes traffic based on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd personally prefer "PreferZone" over "Topology" here but no reason to block this PR.

@@ -34,7 +34,7 @@ milestone:
# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: ServiceInternalTrafficPolicy
- name: ServiceITrafficPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: ServiceITrafficPolicy
- name: ServiceTrafficPolicy

@@ -34,7 +34,7 @@ milestone:
# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: ServiceInternalTrafficPolicy
- name: ServiceITrafficPolicy
components:
- kube-apiserver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This will also need to be read from kube-controller-manager now that topology is dependent on this feature gate.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

1. Cluster (default): route to all cluster-wide endpoints (or use topology aware subsetting if enabled).
2. Topology: route to endpoints using topology-aware routing. See Topology Aware Hints KEP for more details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can argue names in the PR I need to think about it, but I don't think I like this layout. :) Minor

* when `internalTrafficPolicy=PreferLocal`, route to endpoints in EndpointSlice that matches the local node's topology (topology defined by `kubernetes.io/hostname`),
* when `trafficPolicy=Cluster`, default to existing behavior today.
* when `trafficPolicy=Topology`, use topology hints from EndpointSlice API.
* when `trafficPolicy=PreferLocal`, route to endpoints in EndpointSlice that matches the local node's topology (topology defined by `kubernetes.io/hostname`),
fall back to "Cluster" behavior if there are no local endpoints.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to figure out if the fallback is Cluster or Topology or if we need 2 modes. OK for here, the PR will need to explore that.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2021
@thockin
Copy link
Member

thockin commented Feb 9, 2021

/approve


## Proposal

Introduce a new field in Service `spec.internalTrafficPolicy`. The field will have 3 codified values:
Introduce a new field in Service `spec.trafficPolicy`. The field will have 4 codified values:
1. Cluster (default): route to all cluster-wide endpoints (or use topology aware subsetting if enabled).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should #2166 be rebased or we could take the chance to update Cluster to All
cc @howardjohn

@wojtek-t wojtek-t self-assigned this Feb 9, 2021
@wojtek-t
Copy link
Member

wojtek-t commented Feb 9, 2021

/approve PRR

@andrewsykim - please apply the remaining minor comments in the follow up PR

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, thockin, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit 01ffe07 into kubernetes:master Feb 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants