combine virtualservices into one #3609
/assign @majolo @axsaucedo
/test integration
/test notebooks
Hi @mwm5945, have you tested this with upgrades? I would hope that seldon-core/operator/controllers/seldondeployment_controller.go, lines 1062 to 1067 in 68a594d, works as expected.
/test notebooks
@cliveseldon let me validate and get back to you. I did test with the canary example and it worked as expected, but I'd like to do some more validation!
@cliveseldon I've confirmed that the behavior is the same as the old operator:

Canary invoked via a gateway:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: canary-example-1
  namespace: my-ns
spec:
  name: canary-example-1
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: main
    replicas: 1
    traffic: 75
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: canary
    replicas: 1
    traffic: 25

Canary with Mesh (invoked from inside the Istio Mesh):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: canary-example-1
  namespace: my-ns
spec:
  annotations:
    seldon.io/istio-gateway: mesh
    seldon.io/istio-host: canary-example-1
  name: canary-example-1
  predictors:
  - annotations:
      seldon.io/svc-name: canary-example-1
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
          securityContext:
            readOnlyRootFilesystem: false
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: main
    replicas: 1
    traffic: 75
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.11.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    labels:
      sidecar.istio.io/inject: "true"
    name: canary
    replicas: 1
    traffic: 25
Creating, updating, and deleting SDEPs also cleans up all resources.
That's great @mwm5945. I was thinking more of the upgrade process for someone running 1.11 and then switching and upgrading to this. I see the upgrade test did fail in the notebooks test.
/test integration
I realized that after posting, haha. I did "upgrade" from the public v1.11.0 image to a v1.11.0 image that included my changes; the old VSs were deleted and the new combined ones were created accordingly. It doesn't look like I have access to view what's failing in the tests, though.
@mwm5945 great
@cliveseldon in a weird coincidence with the code you were referencing before, we discovered a case where the Seldon operator would delete a virtual service that it didn't create if the ownerReference name was the same. I've included an update to only delete the VS if the ownerReference UID matches that of the SDEP. Would you be able to let me know what tests are failing and where? I can't seem to get
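To illustrate the UID check described above, here is a minimal, self-contained sketch. It is not the operator's actual code: the `OwnerRef` and `VirtualService` types, the `staleVirtualServices` helper, and the resource names are all hypothetical stand-ins, used only to show why matching on UID is safer than matching on owner name.

```go
package main

import "fmt"

// OwnerRef is a simplified, hypothetical stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// VirtualService is a hypothetical, trimmed-down view of an Istio VirtualService.
type VirtualService struct {
	Name   string
	Owners []OwnerRef
}

// ownedByUID reports whether any owner reference carries the given UID.
// Owner names are deliberately ignored: two different owners can share a name.
func ownedByUID(vs VirtualService, uid string) bool {
	for _, ref := range vs.Owners {
		if ref.UID == uid {
			return true
		}
	}
	return false
}

// staleVirtualServices returns the names of virtual services that are owned by
// the deployment with sdepUID and are not in the keep set.
func staleVirtualServices(existing []VirtualService, keep map[string]bool, sdepUID string) []string {
	var stale []string
	for _, vs := range existing {
		if keep[vs.Name] || !ownedByUID(vs, sdepUID) {
			continue
		}
		stale = append(stale, vs.Name)
	}
	return stale
}

func main() {
	// All names and UIDs below are made up for illustration.
	existing := []VirtualService{
		{Name: "old-http", Owners: []OwnerRef{{Kind: "SeldonDeployment", Name: "canary-example-1", UID: "uid-sdep"}}},
		{Name: "old-grpc", Owners: []OwnerRef{{Kind: "SeldonDeployment", Name: "canary-example-1", UID: "uid-sdep"}}},
		// Same owner name, different owner object: must not be deleted.
		{Name: "custom-routes", Owners: []OwnerRef{{Kind: "OtherCR", Name: "canary-example-1", UID: "uid-other"}}},
	}
	keep := map[string]bool{"combined": true}
	fmt.Println(staleVirtualServices(existing, keep, "uid-sdep"))
}
```

In this sketch `custom-routes` survives even though its owner shares the SDEP's name, which is the scenario raised in the comment.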
/test notebooks
/test integration
Interesting. So in what circumstances would the name be the same when the operator did not create the virtual service and it's not one of the virtual services listed for that SeldonDeployment?
Two of the tests in integration are known to be flaky, so I think we can ignore those. For the notebook tests, it's also the flaky upgrade test, which does cover this case, but I think it is probably OK to ignore. I'm running the tests again.
Without getting too far into it, we have some other CRs that do lots of other things in clusters, and they are usually named the same as the SDEP. One solution is to rename our custom CRs, but adding this logic to the operator seems safer in the long term :)
Thanks for letting me know!
/test notebooks
/test integration
@axsaucedo I added some documentation on using the mesh; not sure what's making the docs build break, though...
@mwm5945 for the docs test, we've added a PR that fixes it, so if you rebase it should pass again.
/test integration
d227c7a to 2db6631
/test integration
/test notebooks
Nice one @mwm5945, looks good. My current question is this: most e2e tests are currently run with Ambassador, so would you be able to try running a previous version of Seldon Core (i.e. 1.9.1) with a couple of models and then upgrade, just to validate that the conversion of virtualservices is done correctly? I will also try a similar test to validate and send requests, just to make sure that there are no upgrade hiccups and to avoid downtime. Other than that, it looks good from my side.
@axsaucedo I've confirmed that upgrading will cause no downtime, and neither will downgrading :)
While running a perf test, I upgraded from 1.10, watched the new VS be created, and then saw the old ones deleted without errors. The same goes for the reverse.
Nice one @mwm5945, tested locally as well and all seems good.
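The ordering observed here (new virtual service created first, old ones deleted afterwards) can be sketched as below. This is only an illustration under stated assumptions: `cluster` is an in-memory stand-in for the set of virtual services in a namespace, and `migrate` and the names used in `main` are hypothetical, not taken from the operator.

```go
package main

import "fmt"

// cluster is an in-memory, hypothetical stand-in for the virtual services
// that exist in a namespace, keyed by name.
type cluster map[string]bool

// contains reports whether name is present in names.
func contains(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

// migrate creates every desired virtual service before deleting any old one,
// so that some route exists at every step. It returns the steps it took.
func migrate(c cluster, desired, old []string) []string {
	var steps []string
	for _, name := range desired {
		c[name] = true
		steps = append(steps, "create "+name)
	}
	for _, name := range old {
		if contains(desired, name) {
			continue // still wanted, leave it in place
		}
		delete(c, name)
		steps = append(steps, "delete "+name)
	}
	return steps
}

func main() {
	// Names are made up for illustration.
	c := cluster{"old-http": true, "old-grpc": true}
	for _, step := range migrate(c, []string{"combined"}, []string{"old-http", "old-grpc"}) {
		fmt.Println(step)
	}
}
```

Running the same function with the arguments swapped models the downgrade path mentioned in the comment.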
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: axsaucedo
The only thing is that we should make a note in the UPGRADING.md page for 1.12, but we can do this in a separate PR (it would just be a heads-up for people who upgrade that the virtual service will be modified).
* combine virtualservices into one
* fix return vsvcs
* rename vsvc
* update istio cleaner to only delete with matching uid
* add istio mesh docs
* add images
* remove azure metrics file
* remove azure metrics file
What this PR does / why we need it:
Combines the istio virtual services into one to allow for better use of mesh networking.
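As a rough illustration of what a single combined virtual service implies for routing, the sketch below turns a list of predictors into one set of weighted destinations. The `Predictor` and `WeightedDestination` types, the `combinedRoute` function, the `<deployment>-<predictor>` host naming, and the sum-to-100 check are all assumptions made for this example, not the operator's implementation.

```go
package main

import "fmt"

// Predictor holds only the predictor fields this sketch needs (hypothetical type).
type Predictor struct {
	Name    string
	Traffic int
}

// WeightedDestination models one entry of a route: a destination host and
// the share of traffic it should receive (hypothetical type).
type WeightedDestination struct {
	Host   string
	Weight int
}

// combinedRoute builds one weighted destination per predictor, so that a
// single route can split traffic between all of them.
func combinedRoute(sdepName string, predictors []Predictor) ([]WeightedDestination, error) {
	total := 0
	route := make([]WeightedDestination, 0, len(predictors))
	for _, p := range predictors {
		total += p.Traffic
		route = append(route, WeightedDestination{
			Host:   sdepName + "-" + p.Name, // assumed naming scheme
			Weight: p.Traffic,
		})
	}
	// Assumption of this sketch: weights are percentages that must sum to 100.
	if total != 100 {
		return nil, fmt.Errorf("traffic weights sum to %d, want 100", total)
	}
	return route, nil
}

func main() {
	route, err := combinedRoute("canary-example-1", []Predictor{
		{Name: "main", Traffic: 75},
		{Name: "canary", Traffic: 25},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range route {
		fmt.Printf("%s weight=%d\n", d.Host, d.Weight)
	}
}
```

The 75/25 split mirrors the `main` and `canary` predictors from the canary example earlier in the conversation.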
Which issue(s) this PR fixes:
Fixes #3485 #1472
Special notes for your reviewer:
Does this PR introduce a user-facing change?: