Better support Istio Mesh Internal to cluster #3485

mwm5945 · 2021-08-12T15:23:10Z

Currently, the VirtualServices created by Seldon are attached to a gateway, which means that all models must be invoked via the hosts in that gateway, thereby also forcing all traffic through the Istio IngressGateway. While this works for traffic coming in from outside the cluster, its cumbersome when traffic is coming from within the same cluster (having to go through multiple side cars/IGW to get to the model).

Ex Current VS:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-model-1-http
  namespace: my-ns
  ownerReferences:
  - apiVersion: machinelearning.seldon.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SeldonDeployment
    name: test-model-1
spec:
  gateways:
  - my-ns/seldon-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /seldon/bonsai-maas/test-model-1/
    rewrite:
      uri: /
    route:
    - destination:
        host: test-model-1-test-model-0
        port:
          number: 9000
        subset: test-model-0

Since gateway is set to my-ns/seldon-gateway, this VS will only apply to traffic coming out of the Istio IngressGateway.

Enabling the mesh gateway means the sidecar can now direct traffic through the Istio mesh, instead of forcing it through the IGW.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-shadow-graph-1-http-2
  namespace: my-ns
spec:
  gateways:
  - mesh
  hosts:
  #  - '*' wildcard hosts aren't allowed with 'mesh' gateways
  - test-shadow-graph-1-test-shadow-graph-0
  http:
  - match:
    - uri:
        prefix: /seldon/my-ns/test-shadow-graph-1/
    mirror:
      host: test-shadow-graph-1-test-shadow-graph-1
      port:
        number: 9001
      subset: test-shadow-graph-1
    rewrite:
      uri: /
    route:
    - destination:
        host: test-shadow-graph-1-test-shadow-graph-0
        port:
          number: 9000
        subset: test-shadow-graph-0

This would allow the models Service to be invoked, and still be able to utilize advance traffic management patterns (i.e. shadow, canary, etc)

The second option is fine for REST, but might be a bit annoying for GRPC (having to create a new GRPC connection / host). Ideally, there'd be one host name that can be invoked without having to go through the IGW, something like:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-model-1-gprc
  namespace: my-ns
spec:
  gateways:
  - mesh
  hosts:
  - my-common-service-hostname
  http:
  - match:
    - headers:
        namespace:
          exact: my-ns
        seldon:
          exact: test-model-q
      uri:
        regex: .*tensorflow.*|.*seldon.protos.*
    mirror:
      host: my-other-model-1-0
      port:
        number: 9001
      subset: my-other-model-0
    route:
    - destination:
        host: test-model-1-test-model-0
        port:
          number: 5001
        subset: test-model-0

The text was updated successfully, but these errors were encountered:

ukclivecox · 2021-08-12T21:20:40Z

This seems reasonable. Do you have suggestions on how this could be configured maybe via some additional annotations?

mwm5945 · 2021-08-12T22:28:44Z

Thats an excellent question...not sure to be honest. For us, I know its more "all or nothing,", meaning either the traffic is coming from within the same cluster, or its not, so it could be set at installation time. I'm also wondering if it would be possible to remove the wildcard hostname, and have multiple gateways so that you can do both. Something like

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-model-1-gprc
  namespace: my-ns
spec:
  gateways:
  - mesh
  - my-ns/seldon-gateway
  hosts:
  - my-common-service-hostname
  - my-external.dns.name.com
  http:
  - match:
    - headers:
        namespace:
          exact: my-ns
        seldon:
          exact: test-model-q
      uri:
        regex: .*tensorflow.*|.*seldon.protos.*
    mirror:
      host: my-other-model-1-0
      port:
        number: 9001
      subset: my-other-model-0
    route:
    - destination:
        host: test-model-1-test-model-0
        port:
          number: 5001
        subset: test-model-0

I think this would probably be an ideal "best of both worlds" situation.

agilgur5 · 2021-08-22T14:57:27Z

Hi, I work with @mwm5945 and after thinking about it a bit, it would likely be non-trivial to have some "common hostname" as that would require Seldon to effectively proxy requests to models. To keep the existing paradigm, Seldon would have to add cluster-local hostname for each model, which should basically be the equivalent of a Service for the model, e.g. test-model-1.namespace.svc.cluster.local.

The other thing I realized is that this current issue actually presents a security issue as well. Normally, the Istio Ingress Gateway (IGW) only takes traffic from its LoadBalancer (LB), and therefore we can craft a specific NetworkPolicy (NP) for that to deny traffic from inside the cluster.

Due to this issue, we have to allow intra-cluster traffic to the IGW. As the IGW might be forwarding traffic to all sorts of Istio enabled services (not just Seldon models), allowing that traffic also allows for existing NPs to be bypassed. Direct communication between two services may be denied via an NP, but if they go through the IGW, then it's allowed.

I think that raises the stakes a decent bit on this issue as the current workaround is a security issue, not just a latency / efficiency issue

ukclivecox · 2021-09-01T08:15:50Z

Can you presently get what you need by specifying the gateway as an annotation on the sdep as described here: https://docs.seldon.io/projects/seldon-core/en/latest/ingress/istio.html#istio-configuration-annotation-reference

mwm5945 · 2021-09-21T21:36:53Z

@cliveseldon sorry for the delay--yes, with a bit of trickery, you can avoid using a Gateway if the traffic is only internal.

You'll need to create a "common service" that can be defined in the virtual service, and set the gateway as mesh.

Ex:

apiVersion: v1
kind: Service
metadata:
  name: seldon
  namespace: my-ns
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ClusterIP

SDEP Annotations

...
spec:
    seldon.io/istio-gateway: mesh
    seldon.io/istio-host: seldon
...

From the same namespace, in a pod that has the istio sidecar enabled, somthing like the following will work

curl -X POST -v -H 'Content-Type: application/json' -d '{"data": { "names": ["a", "b"], "ndarray": [[1,2]]}}' http://seldon/seldon/my-ns/test-model-1/api/v1.0/predictions

mwm5945 · 2021-09-21T22:54:56Z

@cliveseldon however things do break when it comes to GRPC and the Virtual Services. Because the VS's are separate from one another, and (for some reason) merging VS's for Host (mesh) networks isn't supported, you're not able to use the mesh for GRPC.

From their docs:

A VirtualService can only be fragmented this way if it is bound to a gateway. Host merging is not supported in sidecars.

Is there a specific reason why the VS's are split the way they are? I've attempted a simple combind VS and everything works correctly:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-vs
  namespace: my-ns
spec:
  gateways:
  - mesh
  hosts:
  - seldon # or whatever you defined your service to be
  http:
  - match:
    - uri:
        prefix: /seldon/my-ns/test-model-1/
    rewrite:
      uri: /
    route:
    - destination:
        host: test-model-1-test-model-0
        port:
          number: 9000
        subset: test-model-0
  - match:
    - headers:
        namespace:
          exact: my-ns
        seldon:
          exact: test-model-1
      uri:
        regex: .*tensorflow.*|.*seldon.protos.*|.*inference.*
    route:
    - destination:
        host: test-model-1-test-model-0
        port:
          number: 9500
        subset: test-model-0

mwm5945 · 2021-09-21T23:01:05Z

Sure seems like these could just be combined into one:

seldon-core/operator/controllers/seldondeployment_controller.go

Line 229 in 460bc65

Spec: istio_networking.VirtualService{

ukclivecox · 2021-09-23T12:44:42Z

I see you have open a PR for this. Be good to discuss your use case. We are looking for future API updates to create a single SVC entrypoint and allow customers to create whatever service mesh they wish above.

mwm5945 · 2021-09-23T13:06:17Z

With the annotation to set the hostname in the virtual service, we can essentially cut out the gateway (assuming we don't have traffic coming from outside the cluster, only internally, which is the case for us). I've created a service that doesn't have a selector, and listens on the http2 protocol that the virutal services will attach to in the mesh, after specifying the mesh "special" gateway through the other annotation, thereby allowing mesh invocation of the model, instead of having to hairpin through the IngressGateway. I see you've added the v2 label, when is that slated to be created?

ukclivecox · 2021-09-23T13:22:39Z

v2 label is for current early stage discussion on work for updated APIs so is still early stage thinking of how we can improve. So your PR would still be valid for the medium term.

mwm5945 · 2021-09-24T20:00:35Z

@cliveseldon i've verified that the change in the PR works, however i needed to add an additional work around. If there's one common service for all models in a NS, we still encounter the same issue of multiple VS's not being able to be applied in the mesh network. For example:
I create a service seldon in my-ns, but if I try to deploy multiple models into the NS, we encounter the same issue of only one VS being applied to the hostname.

Workaround:
Use the seldon.io/svc-name annotation to create a unique service for each of the predictors, and set the seldon.io/istio-hostname to be the "primary" one to invoke.

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: canary-example-1
spec:
  annotations:
    seldon.io/istio-gateway: mesh            # NOTE
    seldon.io/istio-host: mysvcname        # NOTE
  name: canary-example-1
  predictors:
  - annotations:
      seldon.io/svc-name: mysvcname      # NOTE
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: main
    replicas: 1
    traffic: 75
  - annotations:
      seldon.io/svc-name: mysvcname2        # NOTE
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: canary
    replicas: 1
    traffic: 25

Then something like this will work, and correctly split traffic:

curl -X POST -H 'Content-Type: application/json' -d '{"data": { "names": ["a", "b"], "ndarray": [[1,2]]}}' http://mysvcname:8000/seldon/batest/canary-example-1/api/v1.0/predictions

mwm5945 · 2021-09-28T22:11:53Z

@cliveseldon or @axsaucedo would either of you be able to take a look at my PR for this? We're working on releasing something relatively soon, and it would be awesome if we could invoke the model through the mesh over through the gateway :D

ukclivecox · 2021-09-29T07:24:36Z

Hi @mwm5945 looks good. Added one comment about upgrades and started the tests.

mwm5945 added the triage Needs to be triaged and prioritised accordingly label Aug 12, 2021

ukclivecox added awaiting-feedback and removed triage Needs to be triaged and prioritised accordingly labels Sep 2, 2021

mwm5945 mentioned this issue Sep 21, 2021

combine virtualservices into one #3609

Merged

ukclivecox added the v2 label Sep 23, 2021

axsaucedo closed this as completed in #3609 Oct 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better support Istio Mesh Internal to cluster #3485

Better support Istio Mesh Internal to cluster #3485

mwm5945 commented Aug 12, 2021

ukclivecox commented Aug 12, 2021

mwm5945 commented Aug 12, 2021

agilgur5 commented Aug 22, 2021 •

edited

Loading

ukclivecox commented Sep 1, 2021

mwm5945 commented Sep 21, 2021

mwm5945 commented Sep 21, 2021

mwm5945 commented Sep 21, 2021

ukclivecox commented Sep 23, 2021

mwm5945 commented Sep 23, 2021

ukclivecox commented Sep 23, 2021

mwm5945 commented Sep 24, 2021 •

edited

Loading

mwm5945 commented Sep 28, 2021

ukclivecox commented Sep 29, 2021

Better support Istio Mesh Internal to cluster #3485

Better support Istio Mesh Internal to cluster #3485

Comments

mwm5945 commented Aug 12, 2021

ukclivecox commented Aug 12, 2021

mwm5945 commented Aug 12, 2021

agilgur5 commented Aug 22, 2021 • edited Loading

ukclivecox commented Sep 1, 2021

mwm5945 commented Sep 21, 2021

mwm5945 commented Sep 21, 2021

mwm5945 commented Sep 21, 2021

ukclivecox commented Sep 23, 2021

mwm5945 commented Sep 23, 2021

ukclivecox commented Sep 23, 2021

mwm5945 commented Sep 24, 2021 • edited Loading

mwm5945 commented Sep 28, 2021

ukclivecox commented Sep 29, 2021

agilgur5 commented Aug 22, 2021 •

edited

Loading

mwm5945 commented Sep 24, 2021 •

edited

Loading