Replace ConfigMaps with Config CRDs #8114

Open
markusthoemmes opened this issue May 28, 2020 · 24 comments
Labels
  • area/API: API objects and controllers
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • kind/feature: Well-understood/specified features, ready for coding.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • triage/accepted: Issues which should be fixed (post-triage)
Milestone: Icebox

Comments

@markusthoemmes
Contributor

In what area(s)?

/area API

Describe the feature

Currently, all of our configuration relies on well-known ConfigMaps placed next to our control plane. The components pick up the config through an informer on ConfigMaps and then parse the map[string]string typed data into a struct that the individual components use.

This is all nice and it works in principle. It has a few drawbacks though:

  1. Discoverability. We place a huge _example block at the top of each ConfigMap that is supposed to document the available keys. A ton of users have stumbled over this, and it's arguably a hacky way of providing documentation.
  2. Typing. Related to the above, typing is done through parsing, so the expected schema has to be documented separately (see the sketch of the current pattern below).
  3. Compatibility guarantees. Are there any? We have tried not to break backwards compatibility on the config, at least in Serving, but I'm not sure we technically have any guarantees here.
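
For context, this is roughly what the current pattern looks like in Go. The keys and the struct are simplified stand-ins for illustration, not the actual Serving code:

// Simplified sketch of today's pattern: config arrives as untyped
// map[string]string data and every component hand-parses it.
package config

import (
  "strconv"

  corev1 "k8s.io/api/core/v1"
)

// Autoscaler stands in for a component's config struct.
type Autoscaler struct {
  EnableScaleToZero    bool
  ContainerConcurrency int64
}

// NewAutoscalerFromConfigMap does the stringly-typed parsing; defaults and
// the expected schema live only in code and in the _example block.
func NewAutoscalerFromConfigMap(cm *corev1.ConfigMap) (*Autoscaler, error) {
  c := &Autoscaler{EnableScaleToZero: true}
  if v, ok := cm.Data["enable-scale-to-zero"]; ok {
    b, err := strconv.ParseBool(v)
    if err != nil {
      return nil, err
    }
    c.EnableScaleToZero = b
  }
  if v, ok := cm.Data["container-concurrency"]; ok {
    i, err := strconv.ParseInt(v, 10, 64)
    if err != nil {
      return nil, err
    }
    c.ContainerConcurrency = i
  }
  return c, nil
}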

Config CRDs

Instead of ConfigMaps, we could use Configuration CRDs, i.e. CRDs that represent the same data as the ConfigMaps do today, but in a structured way, for example AutoscalerConfiguration, DeploymentConfiguration etc. They could have an OpenAPI schema, telling the user directly which fields are supported and what format we expect for them. Their APIs can be versioned just like any other API we ship, making it easier to do controlled breaking changes while giving clear guarantees on compatibility.
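
To make that a bit more concrete, here is a rough sketch of what such a versioned type could look like. The group, version and field names are purely illustrative, not a committed schema:

// Purely illustrative sketch of a versioned config CRD type; the actual
// group/version and fields would be defined in a proper proposal.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type AutoscalerConfiguration struct {
  metav1.TypeMeta   `json:",inline"`
  metav1.ObjectMeta `json:"metadata,omitempty"`

  Spec AutoscalerConfigurationSpec `json:"spec"`
}

type AutoscalerConfigurationSpec struct {
  // Typed, schema-validated fields instead of stringly-typed ConfigMap keys.
  EnableScaleToZero *bool  `json:"enableScaleToZero,omitempty"`
  MaxScaleUpRate    *int64 `json:"maxScaleUpRate,omitempty"`
}

An OpenAPI schema generated from (or written alongside) such types could then replace the _example block as the source of documentation and validation.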

This is especially important for the operator. Currently, we ship the operator with a spec.config field which is map[string]map[string]string, a map of maps, each able to represent a ConfigMap's content. Essentially, by putting this in the operator API, we kinda commit to a certain schema (maybe not, because it's a map of maps and thus pretty willy-nilly, but you get the idea).

Having versioned Configuration APIs in the upstream (Knative Serving in this case) would be great with regard to guarantees and the construction of the API as a whole.

Further down the line, this could potentially also open up the possibility to define per-namespace configuration as proposed in #7711. Generally agreeing on Configuration CRDs across the project would probably make per-namespace configuration trivial as it'd mostly come down to scoped informer calls to fetch the respective configuration for a given resource. But that's a stretch goal.

@markusthoemmes markusthoemmes added the kind/feature Well-understood/specified features, ready for coding. label May 28, 2020
@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label May 28, 2020
@vaikas
Contributor

vaikas commented May 28, 2020

+1 for exploring this. In our Alternate Brokers, we were expecting some Brokers to implement their own CRD for having their configurations specified in a type safe way:
https://docs.google.com/document/d/1pBLkMptwbdEHUgjn1K5Obf7Xe-G9PFGChQcDCZE9vMQ/edit

Idea being that if you just wanted to have a ConfigMap that you're using for your config, cool, no extra steps, but a Broker that wants type safety could bring its own config CRD. This is probably something that we should mosdef look at doing 👍

@n3wscott
Contributor

kinda sounds like the knative operator CO?

@markusthoemmes
Contributor Author

@n3wscott more or less. The operator API surface is potentially a bit bigger than "just" the plain configuration for the various components (see the image specification and other things that directly affect the shape of the deployments as they are created).

It would probably inline the types I'm alluding to here, and it might even be worth exploring having all configuration be driven directly by the operator CR, if it's present.

I'm imagining a shape like this in the operator CRD:

type KnativeServingSpec struct {
  Autoscaler servingv1alpha1.AutoscalerConfiguration
  Tracing knativev1alpha1.LoggingConfiguration
  ...
}

Tangentially, we could also consider not replicating the configmaps into APIs in a 1:1 mapping but instead creating an uber-configuration KnativeServingConfiguration CRD right away that more or less inlines the existing config types.

@julz
Member

julz commented May 29, 2020

Big fan of this.

An extra reason this would be nice: right now we have a lot of individual properties (particularly in defaults.yaml) for doing things like setting default resource limits/requests. There are a fair few use cases where there'd be other fields of pod/container that we'd like to let operators set without leaking into the ksvc contract (for example, pod affinity, anti-affinity, runtimeClass, securityContext, topologySpreadConstraints).

With ConfigMaps, each field of a struct type like ResourceLimits or SecurityContext needs its own stringy property; it'd be much nicer to just embed the real typed corev1.ResourceList / corev1.SecurityContext struct (or even to allow the operator to set an explicit base containerSpec that could be merged with the user's containerSpec). A rough sketch of the kind of thing meant here follows below.
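
This is only illustrative; the field set and names are made up to show the shape, not a proposed API:

// Illustrative only: a typed deployment-defaults config that embeds real
// Kubernetes types instead of one stringy ConfigMap key per field.
package v1alpha1

import corev1 "k8s.io/api/core/v1"

type DeploymentDefaultsSpec struct {
  // One typed field replaces a whole family of keys like
  // "requests-cpu", "limits-memory", ...
  Resources corev1.ResourceRequirements `json:"resources,omitempty"`

  // Fields operators may want to set without widening the ksvc contract.
  SecurityContext           *corev1.SecurityContext           `json:"securityContext,omitempty"`
  Affinity                  *corev1.Affinity                  `json:"affinity,omitempty"`
  RuntimeClassName          *string                           `json:"runtimeClassName,omitempty"`
  TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}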

@lionelvillard
Member

+1 on exploring the idea of making configuration files part of Knative APIs.

I'm not sure all configuration files are API worthy, for instance config-logging exposes implementation details like zap, unless we call it zaploggings.config.knative.dev.

@evankanderson
Member

Question (based on @n3wscott's comment):

Do we need to create new COs for this configuration, or should we teach the code that currently watches ConfigMaps to instead watch the operator CO if present?

Pros:

  • UX: Fewer new objects -- avoids the Istio proliferation of types
  • Code/Reliability: Avoids a copy-here-to-there dance in the controller

Cons:

  • Packaging: Do we ship the CRD with the core, but the controller for the CRD with the operator?
  • UX: In the non-controller case, how does the user figure out that some fields aren't reacted to?

@evankanderson
Member

Additional question (which probably applies today anyway):

What happens if there are multiple COs for this configuration? Is there some sort of hierarchy (global in kn-serving, then per-namespace), or is only a single CO for each supported, possibly a cluster-level CO?

@markusthoemmes
Contributor Author

@evankanderson yeah the operator relationship seems to be something to explore here. @jcrossley3 and @bbrowning brought up a similar point.

I touched on the multiple CO thing a little bit above. I'd imagine us starting with a global config that the components watch for in their own namespace. We could then in a next step explore per-namespace configuration too.

@aliok
Member

aliok commented Jun 23, 2020

I was just today complaining that there are too many configmaps to configure eventing :)
Big +1 on exploring this issue.

I'm not sure all configuration files are API worthy, for instance config-logging exposes implementation details like zap, unless we call it zaploggings.config.knative.dev.

As a naive solution, we can aim for strong typing as much as possible with a proper API. But for things like logging, we can keep a map.

@julz
Member

julz commented Jul 20, 2020

This seems to have stalled a bit, but I think it'd be super useful to explore... anyone have an idea for the best next step we could take to progress it? (I'm happy to pick up some of the work if the answer is that it needs someone to work on it!)

@markusthoemmes
Contributor Author

I think at this point, this mostly needs an actual design/analysis of what the desired outcome would look like, for Serving for example.

@skonto
Contributor

skonto commented Aug 7, 2020

@markusthoemmes @julz I can work on this.

@markusthoemmes
Contributor Author

This was touched on in the last ToC meeting as part of the work around making the Operator more visible. I think we should handle it as part of that work and potentially explore how we could make the KnativeServing CRD more intrinsic to this repository. As such, this is currently not actionable code-wise but requires a proposal, in conjunction with the proposal @Cynocracy is going to work out regarding pushing the operator further upstream.

@skonto
Contributor

skonto commented Aug 7, 2020

Ok, I can work on the proposal. Anyone willing to help on that?

@markusthoemmes
Contributor Author

I'm thinking about picking this up and playing it through with an example CRD (for autoscaling?) to see where we'd end up with this. Gonna write up a proposal for it too once that solidifies somewhat more.

/assign

@skonto
Contributor

skonto commented Nov 24, 2020

Immutable ConfigMaps are now in beta in Kubernetes 1.19.

Advantages:
  • protects you from accidental (or unwanted) updates that could cause application outages
  • improves performance of your cluster by significantly reducing load on kube-apiserver, by closing watches for ConfigMaps marked as immutable

It would be nice to have immutability and type-safety together on some occasions. In some cases immutability could protect us from scenarios like the one described here. In general, for the Ops persona it makes sense to freeze configuration.
However, beyond control-plane configuration, there are options in a ksvc defined in annotations such as autoscaling.knative.dev/class: kpa.autoscaling.knative.dev that look to me completely bound to an app's lifecycle, and I wonder if they could be frozen, e.g. in a production deployment scenario.
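
For reference, the upstream feature is just a single field on the ConfigMap object; a minimal sketch of the Kubernetes API (not Knative code, the name and data are made up):

// Minimal sketch: marking a ConfigMap immutable is one field. The API server
// then rejects updates to its data, and kubelets close their watches on it.
package config

import (
  corev1 "k8s.io/api/core/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func immutableAutoscalerConfig() *corev1.ConfigMap {
  immutable := true
  return &corev1.ConfigMap{
    ObjectMeta: metav1.ObjectMeta{Name: "config-autoscaler"},
    Data:       map[string]string{"enable-scale-to-zero": "true"},
    Immutable:  &immutable,
  }
}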

@markusthoemmes
Contributor Author

/unassign

I didn't actually get around to looking into this properly and likely won't get to it this year. Unassigning to make way if somebody else wants to pick it up. I might revisit it myself once time permits.

@julz
Member

julz commented Nov 24, 2020

/assign

Can't resist at least having a play 😄

@vaikas
Contributor

vaikas commented Dec 2, 2020

@lionelvillard @matzew since we chatted about this again in Eventing.

@github-actions

github-actions bot commented Mar 3, 2021

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@julz
Member

julz commented Mar 3, 2021

Still slowly noodling at this in spare moments

/reopen
/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@evankanderson
Member

/kind cleanup

/triage accepted

@knative-prow-robot knative-prow-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. triage/accepted Issues which should be fixed (post-triage) labels Mar 22, 2021
@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 24, 2021
@markusthoemmes
Contributor Author

/remove-lifecycle stale
/lifecycle frozen

I think we still want this.

@knative-prow-robot knative-prow-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 24, 2021
@dprotaso dprotaso added this to the Icebox milestone Feb 3, 2022
Projects: None yet
Development: No branches or pull requests
10 participants