This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

Common Auth Config option for Kafka #157

Closed
matzew opened this issue Nov 2, 2020 · 17 comments
Labels: kind/feature-request, lifecycle/stale

Comments

@matzew
Contributor

matzew commented Nov 2, 2020

Motivation

Provide a common AUTH/Config approach for the different Knative Kafka components.

Current Kafka auth config story

The different Knative Kafka components have different AUTH configuration options:

  • KafkaSource:
    • SASL/TLS via API on the SPEC
  • KafkaChannel:
    • none
  • SAP Kafka Channel:
    • ConfigMap and use of Secrets
  • KafkaBroker:
    • none
  • KafkaSink:
    • none

The overall idea should be that there is one unified configuration for all components.

Other configuration options in the Knative ecosystem

Looking at how some other Knative components implement their AUTH configuration:

  • Duck-typing
    • Broker (ConfigMap) (no auth aspects)
    • RabbitMQ knative-broker (Rabbit CRD / Secrets: supports only auth)
  • Secret/ConfigMap:
    • SAP KafkaChannel
  • CRD based
    • GCP Channel (has a Secret field)
    • KafkaSource has various fields for TLS/SASL

Proposal 1: Ducktype (Auth) Configuration

Introduce an AuthConfig field that holds a reference to a Secret, which contains a set of well-defined keys for TLS/SASL configuration.

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
 name: kafka-channel-oney
 namespace: default
spec:
 authConfig:
   apiVersion: v1
   kind: Secret
   name: my-tls-secret
   namespace: default

The referenced Secret would have a structure like:

Data
====
tls.user.crt:      x bytes
tls.user.key:      x bytes
tls.user.ca-cert:  x bytes
sasl.username:     x bytes
sasl.password:     x bytes
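
For illustration only, here is a minimal sketch of such a Secret manifest using the key names above (all values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: my-tls-secret
  namespace: default
type: Opaque
stringData:
  # client certificate/key pair and CA certificate for TLS
  tls.user.crt: <PEM-encoded client certificate>
  tls.user.key: <PEM-encoded client key>
  tls.user.ca-cert: <PEM-encoded CA certificate>
  # SASL credentials
  sasl.username: my-user
  sasl.password: my-password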

Proposal 2: Ducktype global config

Introduce a (global) config that would be able to configure all sorts of aspects:

  • AUTH (TLS/SASL)
  • Kafka settings for consumer/producer
  • Resource request/limits for the dispatchers
  • potentially more things to configure
apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
 name: kafka-channel-oney
 namespace: default
spec:
 authConfig:
   apiVersion: v1
   kind: ConfigMap
   name: my-global-config-that-configures-it-all
   namespace: default

The referenced ConfigMap would contain all sorts of aspects that can be configured for a Kafka “application”, like:

apiVersion: v1
data:
  sarama: |
    Version: 2.0.0
    Net:
      TLS:
        Enable: true
      SASL:
        Enable: true
...
  eventing-kafka: |
    tls-secret: my-tls-secret
...      
    dispatcher:
      cpuLimit: 500m
...
kind: ConfigMap

Discussion: (Auth)Config? Or global config?

Looking at the Knative Broker implementations (e.g. the MT channel-based broker or RabbitMQ), the CRD for brokers.eventing.knative.dev uses Config for this. However, it’s up to the implementation which GVK it supports. The name Config is a pretty generic construct. In the case of the “default” channel-based broker, it configures which channel is used behind the scenes. In the case of Rabbit, it actually deals with auth concerns around the RabbitMQ broker.
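
For reference, the channel-based Broker’s Config typically points at a ConfigMap like config-br-default-channel, which only selects the backing channel implementation and carries no auth concerns:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-br-default-channel
  namespace: knative-eventing
data:
  # selects the channel implementation backing the broker
  channel-template-spec: |
    apiVersion: messaging.knative.dev/v1
    kind: InMemoryChannel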

Question: do we want a more global configuration, or more specific configuration options like authConfig, etc.?

@matzew
Contributor Author

matzew commented Nov 2, 2020

Going with the "global" config for all the components would have the benefit that the same API would work for the Kafka Broker as well...

apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: default
  annotations:
    eventing.knative.dev/broker.class: Kafka
spec:
  config:
    apiVersion: v1
    kind: ConfigMap (or Secret)
    name: my-cfg

But I'm not sure if mixing different configuration concerns is really a good idea 😄

@vaikas
Contributor

vaikas commented Nov 2, 2020

Maybe I didn't see it here, but another option is to create a proper CRD (I see you using ConfigMap in all the examples) that holds the deets about the required connection parameters. If we do it as a CRD, say KafkaConnection, then we can do proper validation, defaulting, etc. over a ConfigMap. Then we could embed that in other constructs, so for example, you could create a KafkaBrokerConfig CRD that has the KafkaConnection embedded in it, and KafkaChannel could also embed this field, etc.
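
A purely hypothetical sketch of what such a KafkaConnection instance could look like (the group/version and field names below are illustrative, not an existing API):

apiVersion: kafka.eventing.knative.dev/v1alpha1  # hypothetical group/version
kind: KafkaConnection
metadata:
  name: my-kafka
  namespace: default
spec:
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9093
  tls:
    enable: true
  sasl:
    enable: true
    # credentials still live in a Secret; the CRD validates/defaults the reference
    secret:
      name: my-tls-secret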

@lionelvillard
Contributor

One thing to consider is that not all Kafka-backed implementations have the same set of configuration parameters. For instance, while there is a lot of commonality between the Sarama library and the Java Kafka library, there are also differences. Better names than KafkaConnection might be KafkaSaramaConnection and KafkaJavaConnection (to mention only those 2 impls).

@lionelvillard
Contributor

I wonder if we should recommend the use of the config duck type across all of Knative. For instance, CouchDBSource defines credentials, GitHubSource defines a bunch of properties under spec. Food for thought :-)

@matzew
Contributor Author

matzew commented Nov 2, 2020

spec:
 authConfig:
   apiVersion: v1
   kind: Secret
   name: my-tls-secret
   namespace: default

Not always using just ConfigMaps - I think a very focused type would be the authConfig, using duck-typing, and that would point to a well-defined Secret. Rather than going w/ the overhead of the different "Kafka impl-lib" CRDs?

@lionelvillard
Contributor

We don't have to do the "Kafka impl-lib" CRD right now or even ever. All depends on the tooling.

I personally prefer Proposal 2, except instead of authConfig I would name it config.

@matzew
Contributor Author

matzew commented Nov 12, 2020

@vaikas @lionelvillard for Serving there is also interest in doing some "config CRDs"

knative/serving#8114

@lionelvillard
Contributor

@travis-minke-sap @eric-sap, would you be ok to extend the KafkaChannel API as follows:

spec:
  config: # Reference to a k8s object
     apiVersion: v1
     kind: ConfigMap
     name: config-kafka

The referenced object could be anything and is implementation-dependent (i.e. not part of the API).

Explicitly referencing the Kafka configuration in the channel allows for multiple Kafkas per k8s cluster. I believe the distributed impl relies on labels to provide this capability. Correct me if I'm wrong.

Eventually I would also like to change the KafkaSource API to use config to be consistent. But first I'd like to hear your thoughts.
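
To illustrate the multi-Kafka point above (a sketch only; the ConfigMap names are illustrative), two channels in the same k8s cluster could each reference a different config:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: channel-cluster-a
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: config-kafka-cluster-a   # points at Kafka cluster A
---
apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: channel-cluster-b
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: config-kafka-cluster-b   # points at Kafka cluster B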

@travis-minke-sap
Contributor

travis-minke-sap commented Nov 18, 2020

The distributed KafkaChannel implementation doesn't currently support connecting to multiple Kafkas. For EventHub usage it does aggregate/pool multiple SASL auths to work around Azure limitations, but the user cannot control which Topics go to which Kafka/EventHub Namespace as they are just round-robined. Other usages (standard Kafka and custom sidecar) only allow for a single Kafka/auth.

Moving the config from the installation level to the per-channel level is an interesting approach. It would be a significant change to the implementation to support such fine-grained brokers/auth. I suppose you could add it to the API and we could ignore it for the short term while we work on refactoring to support it. Let me look at the implementation a bit, get my thoughts together, and get back to you here (soon - I promise).

@travis-minke-sap
Contributor

While the idea of per-channel brokers/authentication is interesting, and I generally don't have anything against the approach, it seems like a big feature/capability shift that merits a design doc and discussion. I have the following questions just to start with...

  • Is the intent to provide multi-tenancy and does this achieve that in a coherent manner? There are different kinds of multi-tenancy in the kafka sense (multi-tenant knative use vs multi-tenant kafka cluster itself).
  • If we make the auth config so granular, then what about providing separate auth config for Subscriptions as well (to separate producer/consumer capabilities from a security perspective)?
  • Is this desire unique to the KafkaChannel in some way, or should ALL Knative Channels provide this capability (add it to the spec)?
  • It wasn't clear to me in eventing.knative.dev/scope: namespace does not work with distributed kafka channel #126 how this used to work (assuming in the eventing-contrib/kafka channel)?
  • There seem to be several conversations/efforts afoot around multi-tenancy and consolidation/sharing between the Kafka Channels (recently delivered Auth commonality attempt in Adding global TLS/SASL for the consolidated KafkaChannel #192, etc). I wonder if we should plan a mini-roadmap for bringing together the "configuration" of the two channels, along with the Sarama client creation and some other items. Such consolidation might reduce some of the redundancy in implementing support for this capability twice.

Supporting this feature would require a large-scale refactoring of the distributed KafkaChannel's controller / receiver / dispatcher implementations. I'm open to doing that work but feel like we shouldn't just jam it in as a means of fixing an issue/bug. Thoughts on how to proceed?

@lionelvillard
Contributor

Here are my current thoughts.

  • Is the intent to provide multi-tenancy and does this achieve that in a coherent manner? There are different kinds of multi-tenancy in the kafka sense (multi-tenant knative use vs multi-tenant kafka cluster itself).

Yes, that's the main goal: being able to connect to multiple Kafka clusters from a single k8s cluster.

  • If we make the auth config so granular, then what about providing separate auth config for Subscriptions as well (to separate producer/consumer capabilities from a security perspective)?

That's a great idea! Definitely something we should support.

  • Is this desire unique to the KafkaChannel in some way, or should ALL Knative Channels provide this capability (add it to the spec)?

All Knative channels and beyond: sources, bindings, brokers, etc... More on this topic soon.

  • It wasn't clear to me in eventing.knative.dev/scope: namespace does not work with distributed kafka channel #126 how this used to work (assuming in the eventing-contrib/kafka channel)?

When the annotation is set to namespace, a Kafka dispatcher is created in the same namespace as the channel and looks for config-kafka in that same namespace.
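
For context, the annotation from #126 is set on the channel itself, roughly like this (a sketch; the channel name and namespace are illustrative):

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: my-channel
  namespace: team-a
  annotations:
    # per-namespace dispatcher; config-kafka is looked up in this namespace
    eventing.knative.dev/scope: namespace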

  • There seem to be several conversations/efforts afoot around multi-tenancy and consolidation/sharing between the Kafka Channels (recently delivered Auth commonality attempt in Adding global TLS/SASL for the consolidated KafkaChannel #192, etc). I wonder if we should plan a mini-roadmap for bringing together the "configuration" of the two channels, along with the Sarama client creation and some other items. Such consolidation might reduce some of the redundancy in implementing support for this capability twice.

Absolutely.

Supporting this feature would require a large-scale refactoring of the distributed KafkaChannel's controller / receiver / dispatcher implementations. I'm open to doing that work but feel like we shouldn't just jam it in as a means of fixing an issue/bug. Thoughts on how to proceed?

Let's come up with a design first and then we'll see.

@lionelvillard
Contributor

lionelvillard commented Jan 5, 2021

Proposal 3: Leverage KafkaBinding


TL;DR: Use KafkaBinding to represent Kafka authorization and config parameters, as well as the binding itself. Kafka resource implementations (Channel dispatcher, Source adapter, Sink receiver, Broker) look for the matching binding and apply it.

Examples

Binds all KafkaChannels to config-kafka:

apiVersion: bindings.knative.dev/v1alpha1
kind: KafkaBinding
metadata:
  name: all-channels
spec:
  subject:
    kind: KafkaChannel
  config:
     apiVersion: v1
     kind: ConfigMap
     name: config-kafka

Binds all channels labeled team: a to config-kafka-team-a:

apiVersion: bindings.knative.dev/v1alpha1
kind: KafkaBinding
metadata:
  name: team-a-channels
spec:
  subject:
    selector:
      matchLabels:
        team: 'a' 
  config:
     apiVersion: v1
     kind: ConfigMap
     name: config-kafka-team-a

Binds all channels labeled team: a to config-kafka-team-a-producer for the producer and config-kafka-team-a-consumer for all subscriptions:

apiVersion: bindings.knative.dev/v1alpha1
kind: KafkaBinding
metadata:
  name: team-a-channels
spec:
  subject:
    selector:
      matchLabels:
        team: 'a' 
  producer-config:
     apiVersion: v1
     kind: ConfigMap
     name: config-kafka-team-a-producer
  consumer-configs:
     subscriptions:
       - '*'
     config: 
       apiVersion: v1
       kind: ConfigMap
       name: config-kafka-team-a-consumer

This is where I'm currently heading. There are lots of details to flesh out, in particular around config aggregation rules (i.e. determining which config applies to a given resource).

In this proposal I'm reusing the concept of KafkaBinding, but the implementation is quite different since in this case the subject is not a PodSpecable. How it's actually done is up to us to decide (maybe the binding controller can add an annotation on the CRs, e.g. bindings.knative.dev: <ref to binding object>, or each component (e.g. the dispatcher) looks for the binding using a common library).
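
To illustrate the first option, the binding controller might stamp the bound resource roughly like this (the annotation key and value format are hypothetical):

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
  name: kafka-channel-oney
  namespace: default
  labels:
    team: 'a'
  annotations:
    # hypothetical: written by the binding controller, resolved by the dispatcher
    bindings.knative.dev/binding: team-a-channels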

I think this proposal offers a lot of flexibility and nicely decouples the configuration from the API. It covers a wide spectrum of configurations, from one config for the entire k8s cluster down to each subscription with its own config. Vendors can decide to fully manage KafkaBindings, or not. And the current API does not have to change too much (only KafkaBinding and KafkaSource).

@matzew @travis-minke-sap WDYT?

@n3wscott do you think that's an acceptable use of the binding concept?

@travis-minke-sap
Contributor

fwiw (not much ; )... I haven't looked at the "bindings" implementations much, but the above seems really cool - I like the flexibility for the user to control the granularity of the configuration. Probably you're already aware, and it likely doesn't impact this, but the ConfigMap will reference a Secret with the Kafka auth - I wasn't sure if there was any reason to de-couple them and specify both in the binding or not? (Meaning, would you want to reuse a ConfigMap with two different sets of credentials?)

Also, since I last posted in this Issue... Matthias and Ali have made good progress on consolidating some of the Configuration logic from the "distributed" KafkaChannel for re-use by both implementations. I am also in the process of removing the distributed channel's "odd" approach of watching Secrets, to align with the consolidated approach of specifying the single Secret in the ConfigMap. This will remove the ability to pool EventHub Namespaces, but the tradeoff is worth it for the moment. Only mentioning it since it means we'll be in a better place to adopt something like this binding approach than we were before.

@lionelvillard
Contributor

More thinking on this. KafkaBinding is cute but bad UX IMO (in this context, not KafkaBinding in general).

To get started, are we ok with:

apiVersion: messaging.knative.dev/v1beta1
kind: KafkaChannel
metadata:
 name: kafka-channel-oney
 namespace: default
spec:
 config:
   apiVersion: v1
   kind: ConfigMap
   name: my-configmap
   namespace: default

config is optional. When not specified, the current implementation behavior is applied (no change).

When config is specified, it fully replaces the default config (no attempt to merge, at least initially).

It is up to the implementation to define which configuration parameters are supported, and even the format. For instance, some implementations might decide to support bootstrapServers, and others not.

For the consolidated implementation, the configmap follows this format: https://github.com/knative-sandbox/eventing-kafka/tree/master/pkg/channel/consolidated#configuring-kafka-client-sarama
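
As an illustration only (the keys follow the Proposal 2 sketch above; the exact schema is implementation-defined), a per-channel ConfigMap might look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap
  namespace: default
data:
  # Sarama client settings passed through to the implementation
  sarama: |
    Version: 2.0.0
    Net:
      TLS:
        Enable: true
      SASL:
        Enable: true
  # implementation-specific settings, e.g. a reference to the auth Secret
  eventing-kafka: |
    tls-secret: my-tls-secret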

Later on we may decide to promote some configuration parameters to the API to improve the UX. IMO bootstrapServers is a candidate and, from a conformance perspective, may be supported.

@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale label Apr 30, 2021
@itsmurugappan
Contributor

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale label May 11, 2021
@github-actions

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale label Aug 10, 2021
@github-actions github-actions bot closed this as completed Sep 9, 2021