Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Allow splitting traffic to same FQDN using multiple traffic split policies matching different routes #2759

Closed
addozhang opened this issue Mar 6, 2021 · 14 comments
Labels
area/SMI SMI implementation related kind/enhancement priority/P3 P3 priority size/M 7 days (~1.5 week) stale

Comments

@addozhang
Copy link
Contributor

addozhang commented Mar 6, 2021

Please describe the Improvement and/or Feature Request

Scope (please mark with X where applicable)

  • New Functionality [ ]
  • Install [ ]
  • SMI Traffic Access Policy [ ]
  • SMI Traffic Specs Policy [ ]
  • SMI Traffic Split Policy [ ]
  • Permissive Traffic Policy [ ]
  • Ingress [ ]
  • Egress [ ]
  • Envoy Control Plane [X]
  • CLI Tool [ ]
  • Metrics [ ]
  • Certificate Management [ ]
  • Sidecar Injection [ ]
  • Logging [ ]
  • Debugging [ ]
  • Tests [ ]
  • CI System [ ]
  • Project Release [ ]

Possible use cases

In 0.7.0, it works fine. But not work in 0.8.0.

I guess it relates with logic https://github.com/openservicemesh/osm/blob/main/pkg/catalog/routes.go#L89.

- apiVersion: split.smi-spec.io/v1alpha2
  kind: TrafficSplit
  metadata:
    labels:
      target: boot
    name: boot-gray
  spec:
    backends:
    - service: boot-prod
      weight: 100
    service: boot
- apiVersion: split.smi-spec.io/v1alpha2
  kind: TrafficSplit
  metadata:
    labels:
      target: boot
    name: boot-gray-282
  spec:
    backends:
    - service: boot-gray
      weight: 100
    service: boot

I think the logic of 'apex service' should be that if 'existing', append to existing WeightedClusters set (make apexServices variable as map, key is MeshService, and value is OutboundTrafficPolicy).

Current logic limits one service can have ONLY one traffice split.

@shashankram shashankram added kind/discussion Discussing a topic area/SMI SMI implementation related and removed improvement / feature request labels Mar 8, 2021
@shashankram
Copy link
Member

shashankram commented Mar 8, 2021

What's the use case to have multiple traffic splits for the same FQDN? The check you see above is present to ensure conflicting traffic split policies for the same root service are not configured.

To avoid misconfiguration, it is advisable to list all the backends for the root service in the same traffic split policy. I don't see a good use case to support multiple traffic split policies corresponding to the root service. The reason this might be working in v0.7 is because if no checks in place, doesn't mean it was the intended behavior.

@michelleN , do you see a use case for multiple traffic splits for the same root service? Eventually, such ambiguity must be resolved using a validating webhook for SMI.

@addozhang
Copy link
Contributor Author

addozhang commented Mar 9, 2021

@shashankram I think the split is not only applicable for canary release. Canary release usually has traffic splitted between two versions, and is implemented by route. Widely, route can be used among more than two versions, like bellow.

Some times, the environment resources are offten limited. And services may have multi versions being tested in one environment. Assuming that we have a flow A-B-C, and some of them have multi versions: A(A-v1,A-v2, A-main), B-main, C(C-v1, c-v2, c-main), main means the default/main branch.
The route is header with v1 will fall into v1 flow, and v2 is v2, v3 is v3.
Then for FQDN, the will be two traffic splits for service A, one is for v1 and another for v2.

We can call this 'flow dyeing', and the colorful flow likes swiming lane. The traffic may be splited two, maybe more.

I think the route is NOT ONLY for canary/AB/blue-gray testing only. It should be more widely used.

@shashankram
Copy link
Member

@shashankram I think the split is not only applicable for canary release. Canary release usually has traffic splitted between two versions, and is implemented by route. Widely, route can be used among more than two versions, like bellow.

Some times, the environment resources are offten limited. And services may have multi versions being tested in one environment. Assuming that we have a flow A-B-C, and some of them have multi versions: A(A-v1,A-v2, A-main), B-main, C(C-v1, c-v2, c-main), main means the default/main branch.
The route is header with v1 will fall into v1 flow, and v2 is v2, v3 is v3.
Then for FQDN, the will be two traffic splits for service A, one is for v1 and another for v2.

We can call this 'flow dyeing', and the colorful flow likes swiming lane. The traffic may be splited two, maybe more.

I think the route is NOT ONLY for canary/AB/blue-gray testing only. It should be more widely used.

Traffic splitting in SMI is simple, you pick the root service's FQDN for which you want to split traffic, and specify the backends to split traffic to. When you decide to change your traffic splitting configuration, you simply update the existing traffic policy spec corresponding to the root service. This is simple and easy to configure.

Allowing multiple traffic split for the same service FQDN opens the door for misconfiguration, and this can become hard for users to spot and debug.

Currently, with v1alpha2 of TrafficSplit, only canary splitting is possible. With v1alpha4 of TrafficSplit, A/B testing with route matching on traffic splitting will be supported. When route matching can be linked with traffic splitting, you can specify a route match on headers, methods etc.

To be able to do traffic splitting based on routes matching, the following 2 issues need to be resolved:
#2368
#2369

@addozhang let me know if this clarifies your request.

@addozhang
Copy link
Contributor Author

Totally agrees with "Traffic splitting in SMI is simple, you pick the root service's FQDN for which you want to split traffic".
But I think what we should do is not closing the door, it is asking a verification check to avoiding miconfiguration for users insteadl.

The logic in v0.7.0 is ignoring multi splits for one FQDN checing, and combine them at last. That approach is better, and it follows the envoy routing design.

@shashankram
Copy link
Member

The logic in v0.7.0 is ignoring multi splits for one FQDN checing, and combine them at last. That approach is better, and it follows the envoy routing design.

SMI is not meant to follow Envoy routing design. OSM implements SMI with Envoy as the proxy. Envoy allowing something does not imply SMI should behave one way or the other.

As I mentioned before, route matching with traffic splitting will be available with v1alpha4 of SMI traffic splitting.

Is there a reason you can't specify all the backends in a single traffic split resource?

@addozhang
Copy link
Contributor Author

In v1alpha4 split, will it support mult splits for same FQDN?

@shashankram
Copy link
Member

shashankram commented Mar 9, 2021

In v1alpha4 split, will it support mult splits for same FQDN?

As it stands this is something we haven't planned to support. Could you provide sample SMI traffic split configurations for your use case? I don't follow the example in #2759 (comment).

@shashankram
Copy link
Member

@addozhang , I imagine with v1alpha4 of traffic split something like the following will be supported, which will allow splitting same FQDN differently based on routes. I think that's what you are looking for (but not supported with v1alpha2 of split).

kind: TrafficSplit
metadata:
  name: split-1
spec:
  service: foo.bar.svc.cluster.local
  matches:
  - kind: HTTPRouteGroup
    name: route-1
  backends:
  - service: foo-v1
    weight: 10
  - service: foo-v2
    weight: 90
---
kind: TrafficSplit
metadata:
  name: split-2
spec:
  service: foo.bar.svc.cluster.local
  matches:
  - kind: HTTPRouteGroup
    name: route-2
  backends:
  - service: foo-v2
    weight: 100

@shashankram
Copy link
Member

@addozhang , I imagine with v1alpha4 of traffic split something like the following will be supported, which will allow splitting same FQDN differently based on routes. I think that's what you are looking for (but not supported with v1alpha2 of split).

kind: TrafficSplit
metadata:
  name: split-1
spec:
  service: foo.bar.svc.cluster.local
  matches:
  - kind: HTTPRouteGroup
    name: route-1
  backends:
  - service: foo-v1
    weight: 10
  - service: foo-v2
    weight: 90
---
kind: TrafficSplit
metadata:
  name: split-2
spec:
  service: foo.bar.svc.cluster.local
  matches:
  - kind: HTTPRouteGroup
    name: route-2
  backends:
  - service: foo-v2
    weight: 100

ref: #2368

@michelleN michelleN added this to the v0.9.0 milestone Mar 9, 2021
@michelleN michelleN added the priority/P0 P0 priority label Mar 9, 2021
@addozhang
Copy link
Contributor Author

@shashankram yes, that's it. Currently we are developping a UI to implement routes configuring via management TrafficSplit.
As code snippet mentioned in issue description, we create TrafficSplit for each backends individually. In v0.7.0, they are combined to route finally.

@shashankram shashankram changed the title one service has multiple traffic splits not working in 0.8.0 Allow splitting traffic to same FQDN using multiple traffic split policies matching different routes Mar 10, 2021
@michelleN michelleN self-assigned this Mar 15, 2021
michelleN pushed a commit to michelleN/osm that referenced this issue Mar 15, 2021
+ updates OSM to support SMI Traffic Split v1alpha4
which includes HTTPRouteGroup support enabling scenarios
such as A/B testing and routing based on HTTP attributes

* resolves openservicemesh#2759
* resolves openservicemesh#2368

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
michelleN pushed a commit to michelleN/osm that referenced this issue Mar 16, 2021
+ updates OSM to support SMI Traffic Split v1alpha4
which includes HTTPRouteGroup support enabling scenarios
such as A/B testing and routing based on HTTP attributes

* resolves openservicemesh#2759
* resolves openservicemesh#2368

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
michelleN pushed a commit to michelleN/osm that referenced this issue Mar 16, 2021
+ updates OSM to support SMI Traffic Split v1alpha4
which includes HTTPRouteGroup support enabling scenarios
such as A/B testing and routing based on HTTP attributes

* resolves openservicemesh#2759
* resolves openservicemesh#2368

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
michelleN pushed a commit to michelleN/osm that referenced this issue Mar 16, 2021
+ updates OSM to support SMI Traffic Split v1alpha4
which includes HTTPRouteGroup support enabling scenarios
such as A/B testing and routing based on HTTP attributes

* resolves openservicemesh#2759
* resolves openservicemesh#2368

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
michelleN pushed a commit to michelleN/osm that referenced this issue Mar 16, 2021
+ updates OSM to support SMI Traffic Split v1alpha4
which includes HTTPRouteGroup support enabling scenarios
such as A/B testing and routing based on HTTP attributes

* resolves openservicemesh#2759
* resolves openservicemesh#2368

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
@michelleN michelleN modified the milestones: v0.9.0, v0.10.0 May 6, 2021
@michelleN michelleN removed their assignment Jun 14, 2021
@shashankram shashankram removed the kind/discussion Discussing a topic label Jun 25, 2021
@draychev draychev modified the milestones: v0.10.0, vNext Aug 23, 2021
@snehachhabria snehachhabria removed this from the vNext milestone Feb 2, 2022
@github-actions
Copy link

github-actions bot commented Apr 5, 2022

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

@github-actions github-actions bot added the stale label Apr 5, 2022
@shashankram shashankram removed the stale label Apr 5, 2022
@shashankram shashankram added this to the vFuture milestone Apr 5, 2022
@github-actions
Copy link

Added default label size/needed. Please consider re-labeling this issue appropriately.

@steeling steeling added size/M 7 days (~1.5 week) priority/P3 P3 priority and removed size/needed priority/P0 P0 priority labels Jul 15, 2022
@trstringer trstringer removed this from the vFuture milestone Nov 14, 2022
@github-actions
Copy link

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

@github-actions github-actions bot added the stale label Jan 28, 2023
@github-actions
Copy link

github-actions bot commented Feb 4, 2023

Issue closed due to inactivity.

@github-actions github-actions bot closed this as completed Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/SMI SMI implementation related kind/enhancement priority/P3 P3 priority size/M 7 days (~1.5 week) stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants