Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WasmPlugin API #1940

Merged
merged 17 commits into from
Jul 27, 2021
Merged

Add WasmPlugin API #1940

merged 17 commits into from
Jul 27, 2021

Conversation

dgn
Copy link
Contributor

@dgn dgn commented Apr 6, 2021

@istio-policy-bot
Copy link

😊 Welcome @dgn! This is either your first contribution to the Istio api repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Apr 6, 2021
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Apr 6, 2021
@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 6, 2021
@dgn
Copy link
Contributor Author

dgn commented Apr 6, 2021

/test all

@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 6, 2021
@dgn
Copy link
Contributor Author

dgn commented Apr 6, 2021

/test all

@dgn dgn marked this pull request as ready for review April 7, 2021 07:29
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Apr 7, 2021
// workloads in the same namespace. If the `WasmExtension` is
// present in the config root namespace, it will be applied to all
// applicable workloads in any namespace.
istio.type.v1beta1.WorkloadSelector workload_selector = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the proposed API is only designed for local workloads correct? What is the thought process to selectively select the workload for a or two other clusters?

You may be thinking we could leverage the MCS expose + import service API but I don't think that makes sense as I could be deploying Wasm extensions to workloads that are not exposed outside of the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have such cluster/mesh selectors anywhere else? I would imagine you would label them differently, that would certainly work with this API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linsun what makes you think its local workloads? Everything else selects across all clusters - since the config only lives in one cluster?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason I am asking this question is we have a similar Solo API called WasmDeployment and that API allows users to specify which cluster this WasmDeployment is for. So I'm trying to understand how this could be mapped to Istio API.

IIUC, @dgn, in this API, the workload selector is only for the local cluster in multi primary user case so basically Istio selects the matching workload based on the selector and apply the Wasm extension to the workload. This means if I have reviews in cluster 1 and cluster 2, and if I want this Wasm extension to be applied to reviews in both of these clusters, I would deploy a Wasm resource to both clusters

@howardjohn I think you are referring to primary-remote scenario, which I agree the Istio config is applied to all clusters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all Istio APIs currently, which I would assume this follows suite, workload selectors apply to all clusters. If we were to change it I would want to change it to all APIs which is a major undertaking - but this one should be consistent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linsun we do not have any other api that is cluster specific today. This api uses an existing istio.type.v1beta1.WorkloadSelector message. If you want to propose cluster awareness (or location, or anything other than labels) you should do it as a separate proposal.

@mandarjog I am not proposing adding cluster to this API yet. I am trying to understand the semantics of this API... wanted to see if we could translate Solo's Wasmdeployment resource to this new Wasm resource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mentioning that the semantics here are not just for this API, but for all of Istio. We need to be consistent in our configuration application across the project. So maybe uplevel the conversation out of this PR and into a discussion in TOC or something if we want to talk about how config gets applied to particular endpoints and how to take cluster etc. into account.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it is across all APIs. The question came up specifically here is because the comments were not clear.

I can try create some doc for this. If anyone else is interested, let me know!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does k8s intend to standardize a synthetic cluster label just like they have for namespace IIRC

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone know more about this synthetic cluster label? that is interesting. @howardjohn @smawson @nmittler

@hzxuzhonghu
Copy link
Member

How could i apply this to a specific service port? inbound and outbound?

@dgn
Copy link
Contributor Author

dgn commented Apr 13, 2021

How could i apply this to a specific service port? inbound and outbound?

Very good question. In my current PoC, it will be applied to all inbound HTTP filter chains of the service ports. We might want to limit it to specific ports, but I considered this an extension we can add later.

I don't think we should apply to outbound listeners as it makes the system quite complex to reason about (different behaviour depending on which service is sending the request)

// <namespace>/<name>. Names that do not contain forward-slashes are
// interpreted as the names of secrets in the same namespace as the
// `WasmExtension`.
repeated string image_pull_secrets = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this actually work? How does the proxy get the secret to pull the image?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current design, the agent is responsible for fetching and unpacking the OCI images, so the agent would have to fetch the secrets using client-go prior to that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is viable. Importing client-go into the agent adds major overhead (ie invert this: https://mobile.twitter.com/karlstoney/status/1367134238613663744). Additionally, this would require giving ~every pod RBAC permissions for secrets. The load from every single pod in the cluster opening k8s watches is also likely to kill and non-trivial cluster.

Copy link
Contributor Author

@dgn dgn Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is viable. Importing client-go into the agent adds major overhead (ie invert this: https://mobile.twitter.com/karlstoney/status/1367134238613663744).

that would be an argument for moving the pulling of images to a central component, either istiod itself or something else. I don't think it means the API is unimplementable.

Additionally, this would require giving ~every pod RBAC permissions for secrets. The load from every single pod in the cluster opening k8s watches is also likely to kill and non-trivial cluster.

All pods have access to secrets in their namespace by default. For secrets outside the pod namespace, yes, users would have to make sure they have access. I would also argue that watching isn't necessary, as a) pull secrets rarely change and b) we only need it when pulling the image which, depending on pullPolicy, doesn't necessarily happen often. If we get a 403 back, we can still go and re-fetch the Secret

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it means the API is unimplementable.

I agree, but I would not feel comfortable approving an API that doesn't also have a planned implementation?

All pods have access to secrets in their namespace by default.

For mounting yes, but not for reading through the API, right? Since this is dynamic, they would need to read through the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just compiling in the library adds to the RAM in the tweet, and binary size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also fairly incompatible with VMs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All pods have access to secrets in their namespace by default.

For mounting yes, but not for reading through the API, right? Since this is dynamic, they would need to read through the API.

That's right, nevermind. I had assumed that this was the case by default. Makes sense that it isn't though. So yes, a user would have to guarantee that all pods that need to use an Extension have access to the pullSecret, should a pullSecret be required. It's not a blocker for me personally, but I understand the concern. Pulling from a central component would alleviate this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For istio-agent based fetching, we can make Wasm API only referencing local mounted secrets. That would remove concern of client-go and incompatibility with VMs. Although user would need a way to indicate secret mount, which I original thought ExtensionProvider would be a great fit for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can separate concerns for accessing the extension provider from the use of the extension? E.g. add a Wasm extension provider with pull secrets that is globally defined in MeshConfig, and reference the Wasm provider from this API. We could support both inlined and referenced Wasm extension provider with the intent to allow using secrets only from the referenced provider.

extensions/v1alpha1/wasm.proto Outdated Show resolved Hide resolved
@linsun
Copy link
Member

linsun commented Apr 13, 2021

Another q: If I have multiple Wasm images, do I need to create one Wasm resource for each Wasm image/Wasm filter for a given workload?

extensions/v1alpha1/wasm.proto Outdated Show resolved Hide resolved
// workloads in the same namespace. If the `WasmExtension` is
// present in the config root namespace, it will be applied to all
// applicable workloads in any namespace.
istio.type.v1beta1.WorkloadSelector workload_selector = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mentioning that the semantics here are not just for this API, but for all of Istio. We need to be consistent in our configuration application across the project. So maybe uplevel the conversation out of this PR and into a discussion in TOC or something if we want to talk about how config gets applied to particular endpoints and how to take cluster etc. into account.

istio.type.v1beta1.WorkloadSelector workload_selector = 5;

// Determines where in the filter chain this `WasmExtension` is to be injected.
istio.networking.v1alpha3.EnvoyFilter.Patch.FilterClass phase = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not take a dependency on EnvoyFilter and pull that enum out into its own thing that we can define here.

I think the values/semantics are actually pretty good (which phase to run this was extension during) but this shouldn't depend on EnvoyFilter proto.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we need an option to run an extension before the istio authentication filters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum likely also needs refinement @mandarjog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smawson @louiscryan we will relocate the FilterClass and amend enums in a separate PR after some more discussion here.

@bianpengyuan
Copy link
Contributor

I don't think we should apply to outbound listeners as it makes the system quite complex to reason about (different behaviour depending on which service is sending the request)

@dgn I disagree. Manipulate request at outbound (including gateway) and refresh routing accordingly are probably one of the most common use cases that user would want with Wasm extensions. I think for now, at outbound, we can make it only support injecting filters at listener level. Envoy Wasm extension does not yet support per route config anyway.

@mandarjog
Copy link
Contributor

@dgn want to make sure that it is our collective intention to only use Wasm api and no longer need EnvoyFilter api to manipulate all wasm use cases.

wasm api will not cover aspects of EnvoyFilter api that are unrelated to wasm.

@mdhume we would love if you can come up with a proposal for migration from EF to the new API since you are running things in prod.

// +cue-gen:WasmPlugin:printerColumn:name=Age,type=date,JSONPath=.metadata.creationTimestamp,description="CreationTimestamp is a timestamp
// representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations.
// Clients may not set this value. It is represented in RFC3339 form and is in UTC.
// Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-avery
Copy link
Member

This is looking really good to me.

// If `priority` is not set, or two `WasmPlugins` exist with the same
// value, the ordering will be deterministically derived from name and
// namespace of the `WasmPlugins`.
google.protobuf.Int64Value priority = 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default priority value? In your scenario above, what if I specify one Wasm resource without priority and one with a priority of 1000 (or 10) within the same phase, how do I know which is prioritized first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults to 0, I added a sentence to the comment. Thank you!

@linsun
Copy link
Member

linsun commented Jul 14, 2021

LGTM overall! would like the following to be addressed:

  • use workloadSelector to be consistent with other istio resources?
  • clarify priority default value (when it is not specified) and how it compares with a priority value supplied by user?

@mandarjog
Copy link
Contributor

@linsun the new APIs (telemetry) actually uses selector so if we change this, then we need to change the other too. selector is consistent with other k8s api.

@dgn we have used priority in ascending order in EnvoyFilter. I think we should do the same here. The default value is 0.

@dgn
Copy link
Contributor Author

dgn commented Jul 19, 2021

@dgn we have used priority in ascending order in EnvoyFilter. I think we should do the same here. The default value is 0.

So higher priority means later in the filter chain? That sounds confusing

@linsun
Copy link
Member

linsun commented Jul 19, 2021

@mandarjog thanks, SG!

I agree with @dgn - I had thought priority higher means it is first/earlier in the filter chain. Let's clarify default 0 in the comment.

@mandarjog
Copy link
Contributor

@dgn let’s clarify the comment and merge it!

@mandarjog
Copy link
Contributor

@linsun please lgtm it since you have reviewed it, John and Brian have already indicated so.

@mandarjog mandarjog added the do-not-merge/hold Block automatic merging of a PR. label Jul 19, 2021
@mandarjog
Copy link
Contributor

adding label until we get 3 lgtms.

Copy link
Member

@linsun linsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - pending the priority comment clarification of default.

// Criteria used to select the specific set of pods/VMs on which
// this plugin configuration should be applied. If omitted, this
// configuration will be applied to all workload instances in the same
// namespace. If omitted, the `WasmPlugin` will be applied to all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

extensions/v1alpha1/wasm.proto Show resolved Hide resolved
// the `url` field contains the URL of an OCI image that is referenced
// by tag, the reference will be replaced by the value of the `sha256`
// field (using the `@sha256:` notation).
optional string sha256 = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely more confusing than is necessary.

For non-OCI usage, I'm assuming this is treated as a checksum, and used to verify that what you fetch is what you wanted to fetch. For the OCI usage it is used to fetch a specific version right? Not just as a checksum.

My assumptions would be for OCI:

  • If @sha256 is set in the URL, we raise an error if sha256 is set in the field to something different
  • If sha is set (either in URL or field), we also verify that the downloaded image has the correct sha.
  • If sha field is set, but not in the URL, the image is still downloaded by tag and sha verified after download (not sure about this one).

It is confusing to have fields used for both fetch and verify, better to make it more explicit and have the sha field always used just for verification.

// Insert plugin before Istio authentication filters.
AUTHN = 1;

// Insert plugin before Istio authorization filters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also after Istio authentication filters? Maybe specify that if so to be explicit (same with STATS being after authorization filters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's also after the istio auth filters. I made it more explicit, also in the comment for STATS below

// mirroring K8s behaviour.
//
// buf:lint:ignore ENUM_VALUE_UPPER_SNAKE_CASE
enum PullPolicy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need an UNSPECIFIED = 0 here.

Otherwise I think even explicitly setting it to IfNotPresent would get overridden if latest is in the tag, since we couldn't distinguish between default and explicit set to IfNotPresent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add UNSPECIFIED so we can match k8s behaviour

- Clarify default for priority
- Add UNSPECIFIED_POLICY value
- Change sha256 behaviour
@smawson
Copy link
Contributor

smawson commented Jul 22, 2021

I'm happy with this if you want to remove the hold Mandar.

@dgn dgn changed the title Add WasmExtension API Add WasmPlugin API Jul 27, 2021
@mandarjog mandarjog removed the do-not-merge/hold Block automatic merging of a PR. label Jul 27, 2021
@mandarjog
Copy link
Contributor

/cherry-pick release-1.10

@istio-testing
Copy link
Collaborator

@mandarjog: once the present PR merges, I will cherry-pick it on top of release-1.10 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-testing istio-testing merged commit 5e10e5c into istio:master Jul 27, 2021
@istio-testing
Copy link
Collaborator

@mandarjog: new pull request created: #2067

In response to this:

/cherry-pick release-1.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mandarjog
Copy link
Contributor

/cherry-pick release-1.11

@istio-testing
Copy link
Collaborator

@mandarjog: new pull request created: #2068

In response to this:

/cherry-pick release-1.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dgn dgn mentioned this pull request Oct 14, 2021
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.