Add design doc for Control Plane and Data Plane Separation #344
Overall this makes sense to me. I left a number of questions and comments.
A few comments:
- control plane per GatewayClass/namespace - wouldn't this be burdensome both to manage and in terms of resource consumption/overhead? I don't think we win anything if we introduce a control plane that is as heavy as, or potentially heavier than, the data plane it is managing. This concerns the overall resource consumption of the control plane vs. the data planes. I'm not really a fan of having multiple control planes in this way (I'm envisioning a control plane deployment with replicas, scaled per GatewayClass/namespace, which seems entirely excessive).
- control plane resiliency - can the agent reconnect if a control plane Pod is recycled?
- Why the requirement for the agent as an image ("produce a container image as an artifact")?
- How does the agent at the data plane discover a control plane? Wouldn't any form of auto-discovery make having many control planes a rather complex discovery scenario?
- There is a use case for being able to push nginx access logs to an external collector (rather than sending them to stdout and mixing them with operational logs). Or do we wait for OpenTelemetry to solve that?
- Requiring the customer to provide mTLS certs for the agent is burdensome. There should be a way to generate a key with enough to not share anything common.
- Why does the agent need a service account token in addition to an mTLS cert? Isn't this redundant auth for the sake of being redundant? Maybe the cert should be TLS only, not mTLS.
- "agent container": is it implied that it will run as a sidecar? Is there a reason why? That seems like additional complexity. Not to mention that we should not introduce sidecars just because we can; doing so means managing two images (data plane image and agent image) instead of just one.
- What is the resource burden of the agent on the data plane?
👍
- GatewayClass is a cluster-scoped resource, so I wouldn't suggest one control plane per namespace. When we started working on NKG, we made a conscious decision to support only a single GatewayClass resource. Given this, it follows that we would need one control plane per GatewayClass. If there's a strong use case for supporting multiple GatewayClasses, we can revisit this.
- Yes. If the agent is connected to a control plane Pod and that Pod restarts or is terminated, the agent will attempt to reconnect using the control plane's Service name.
- Because otherwise we would have to maintain a Dockerfile for the agent and produce an additional container image every release.
- The agent reads the control plane address from its config or from a command-line argument on startup. Our installation manifests will specify the DNS name of the control plane Service, either in the config or with a CLI arg.
- My understanding is that this will be implemented with an nginx module, so it shouldn't pertain to the agent work.
- Not sure what you mean by "There should be a way to generate a key with enough to not share anything common." We can provide a K8s Job that generates self-signed certificates for testing and development.
- I guess that could be seen as redundant. If we use the API token to verify the agent's identity, then we could probably use TLS instead of mTLS.
- I used agent and data plane interchangeably in this doc. There will be only one container, running both the agent and nginx.
- This is still unknown. We will have to run some benchmarks once it's implemented. Presumably, if we drop all the unneeded features in the agent, it should be lightweight.
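A minimal sketch of the startup address lookup described in the discovery answer, using Go's standard `flag` package. The `-control-plane-addr` flag name, the `agentConfig` type, and the rule that a non-empty flag overrides the config file are assumptions for illustration. The doc only says the address comes from either the config or a CLI arg.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// agentConfig is a hypothetical stand-in for the parsed agent config file.
type agentConfig struct {
	ControlPlaneAddr string
}

// resolveControlPlaneAddr returns the control plane address.
// Assumption: a non-empty -control-plane-addr flag wins over the config file.
func resolveControlPlaneAddr(args []string, cfg agentConfig) (string, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	addr := fs.String("control-plane-addr", "", "control plane address (host:port)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *addr != "" {
		return *addr, nil
	}
	if cfg.ControlPlaneAddr != "" {
		return cfg.ControlPlaneAddr, nil
	}
	return "", errors.New("no control plane address in flags or config")
}

func main() {
	// Hypothetical Service DNS name, as an installation manifest might set it.
	cfg := agentConfig{ControlPlaneAddr: "nginx-gateway.nginx-gateway.svc:8443"}
	addr, err := resolveControlPlaneAddr(os.Args[1:], cfg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("control plane:", addr)
}
```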
Design doc for issue #292