Cluster scoped Bus #117

Merged: merged 7 commits on Jul 2, 2018
Conversation

@scothis (Contributor) commented Jun 25, 2018

Add ClusterBus in addition to Bus for buses that are operator provisioned. Channels may target either a Bus by name within the current namespace, or a ClusterBus installed cluster-wide.

The bus monitor is able to manage both Bus and ClusterBus and their Channels/Subscriptions.

For ClusterBus-backed Subscriptions, a short DNS subscriber name will automatically be expanded to the full name.namespace.svc.cluster.local form based on the namespace of the Subscription resource. Other bus implementations will also need to make this change.
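
As an illustration (not part of the PR), here is a minimal sketch of how a Channel's spec would distinguish the two cases, using stand-in Go types; the field names follow the diff discussed below, and the bus names are made up:

```go
package main

import "fmt"

// Stand-in for the ChannelSpec type touched by this PR; the real definition
// lives in pkg/apis/channels/v1alpha1. Bus and ClusterBus are mutually
// exclusive: exactly one of them should be set.
type ChannelSpec struct {
	Bus        string `json:"bus,omitempty"`
	ClusterBus string `json:"clusterBus,omitempty"`
}

func main() {
	// Channel backed by a namespaced Bus in the channel's own namespace.
	namespaced := ChannelSpec{Bus: "stub"}
	// Channel backed by an operator-provisioned, cluster-wide ClusterBus.
	clusterWide := ChannelSpec{ClusterBus: "pubsub"}
	fmt.Println(namespaced, clusterWide)
}
```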

Fixes #106

/assign @evankanderson @ericbottard

@google-prow-robot google-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2018
@vaikas (Contributor) commented Jun 25, 2018

My understanding is that cluster-scoped resources should still be named to indicate that, regardless of whether there are namespace-scoped counterparts. So I think this should be ClusterBus. @pmorie, who has more recent information on it, thoughts?

@pmorie (Member) commented Jun 25, 2018

@vaikas-google you're correct - the current guidance is that cluster-scoped resources should have names prefixed with Cluster. It makes sense to change this now, possibly in a follow-up PR. Prior experience with a very similar problem (services and plans in the k8s service-catalog) tells me there should be a namespaced version of this as well. Some lessons learned previously that may be relevant:

  • There is no ACL-filtering for k8s resources that would let people see different views of the cluster-scoped resource based on their permission to see individual buses
  • The cluster scope should be reserved for things everyone should be able to see
  • Developers will want to create new buses and test them in namespaces before promoting them to the cluster scope

@@ -25,6 +25,7 @@ import (

// +genclient
// +genclient:noStatus
// +genclient:nonNamespaced
Member:

Just a comment for my own edification. I imagine the motivation here is partially due to efficiency for developers - ie, you don't want to have to maintain independent namespaced versions of this resource for every namespace that wants to use the bus - is that accurate? I'm still learning this API, but it also looks like there are concerns for where containers are run (ie, provision and dispatch containers).

If that's accurate, then there's more context I can share that may be helpful, having recently worked through a similar problem in the cluster-registry that may be relevant to discuss.

Member:

I think that we probably want to remove the provisioner and dispatcher containers from the Bus spec -- there are possible efficient multi-tenant implementations of a Bus which wouldn't need long-running management containers for channels, and which could simply have a controller watching the Channel and Subscription resources which configures the underlying system.

The motivation for moving Bus to a cluster-level resource is that the backing implementation for most buses is likely to involve configuring a semi-complex storage backend (e.g. Kafka or the equivalent). Our hypothesis is that provisioning and installing a Kafka cluster is more likely to be an operator concern, shared across multiple development teams, than something each development team manages on its own.

Contributor Author:

The move to cluster scope is because many buses require a backing message broker, which typically should be managed by an operator rather than a developer. We can also gain efficiencies by avoiding the need for redundant deployments in each namespace for functionality that could be provided centrally.

With the namespaced model, we created two deployments (provisioner, dispatcher) in the same namespace as the bus. Now those deployments have moved into the knative-eventing namespace. Because these deployments need to communicate with the k8s API, we also need a service account with a rolebinding to read the related resources. The service account and role binding were created dynamically for the namespaced bus, but can be defined statically for the cluster-wide bus.
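
As a rough illustration of what "defined statically" could look like, here is a sketch of a ClusterRoleBinding for a shared bus service account, written with the Kubernetes Go API types; all names (knative-eventing-bus, eventing-bus, eventing-bus-reader) are placeholders, not the manifests shipped by this PR:

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sharedBusBinding returns a statically defined ClusterRoleBinding that lets a
// shared service account in knative-eventing read the channel-related
// resources. All names here are illustrative.
func sharedBusBinding() *rbacv1.ClusterRoleBinding {
	return &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "knative-eventing-bus"},
		Subjects: []rbacv1.Subject{{
			Kind:      "ServiceAccount",
			Name:      "eventing-bus",
			Namespace: "knative-eventing",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "eventing-bus-reader",
		},
	}
}

func main() {
	fmt.Println(sharedBusBinding().Name)
}
```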

If that's accurate, then there's more context I can share that may be helpful, having recently worked through a similar problem in the cluster-registry that may be relevant to discuss.

Always happy to learn something new.

Contributor:

[...] which could simply have a controller watching the Channel and Subscription resources which configures the underlying system.

To clarify the role of the provisioner and dispatcher containers: this is exactly what the provisioner is for. It is a controller-like container (helped by BusMonitor) that watches for changes to Channel resources and translates those changes into middleware-specific changes (e.g. creating a Kafka topic on an already existing Kafka broker). The role of the provisioner is not to create a new instance of a Kafka broker deployment (although that is an interesting idea).

I can see a world where this is taken care of by a Job (where the k8s custom resource changes are translated, e.g., into program arguments for a container that has middleware-specific logic), but this does not offer the same level of flexibility in terms of responsiveness, error message propagation, etc.

The dispatcher pod, on the other hand, has to be long-running IMHO.
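
A minimal sketch of the provisioner pattern described above, with a hypothetical Broker interface standing in for the middleware-specific client (this is not the actual provisioner code in this repository):

```go
package main

import (
	"context"
	"fmt"
)

// Channel is a minimal stand-in for the Channel resource the provisioner watches.
type Channel struct {
	Namespace, Name string
}

// Broker abstracts the middleware-specific operation, e.g. creating a topic on
// an already existing Kafka broker. The interface is hypothetical.
type Broker interface {
	CreateTopic(ctx context.Context, name string) error
}

// provision reacts to a Channel change by translating it into a
// middleware-specific change, which is the provisioner's job as described above.
func provision(ctx context.Context, b Broker, ch Channel) error {
	topic := fmt.Sprintf("%s.%s", ch.Namespace, ch.Name)
	return b.CreateTopic(ctx, topic)
}
```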

Contributor Author:

@evankanderson would you like to sketch out your architectural idea in an issue or in the Channel CRD doc? I'm having trouble understanding the full implications of your proposal.

Contributor:

FYI, I left a comment in that document on how I was thinking this might look. It inverts this a little bit from CRD => creating the bus. So instead of a CRD creating the buses, the bus would be created by some other means (Helm chart, etc.).
A CRD would then be created representing the Bus as a resource, and a controller would run for that bus, watching for Channels/Subscriptions for that bus and talking to the underlying bus resource in whatever way is native to that bus.

Member:

The move to cluster scope is because many buses require a backing message broker, which typically should be managed by an operator rather than a developer. We can also gain efficiencies by avoiding the need for redundant deployments in each namespace for functionality that could be provided centrally.

This is very similar to the motivation we had for cluster-scoping service brokers in the service-catalog.

So instead of a CRD creating the buses, the bus would be created by some other means (Helm chart, etc.).

I was actually going to ping you on slack @vaikas-google and ask about this. I am majorly +1 on @vaikas-google's suggestion that the bus should be created out of band relative to the CRD.

@scothis (Contributor Author) commented Jun 25, 2018

I'll rename Bus to ClusterBus in this PR and worry about a namespaced Bus in the future.

@evankanderson (Member) left a comment

Mostly LGTM; one direct nit, but also the general comment I left for @pmorie that the provisioner/dispatcher containers feel like an implementation detail that would be better configured by the containers.

Having a declaration of expected parameters (with or without defaults) feels useful here, though.

I can even imagine a bus implementation which behind the scenes sets up and manages a Kafka cluster per namespace, though I'm not signing up to write it.

@@ -95,6 +96,14 @@ func (b *StubBus) dispatchEvent(subscriber string, body []byte, headers http.Hea
	}
}

func (b *StubBus) resolveSubscriber(subscription channelsv1alpha1.SubscriptionSpec, namespace string) string {
	subscriber := subscription.Subscriber
	if strings.Index(subscriber, ".") == -1 {
Member:

Does this indicate that subscriber names should not contain ".", or is this a shortcut way of being able to refer to subscribers in the same namespace by their short names?

The API is somewhat mysterious as to the expected format of this string.

Contributor Author:

I thought we already had an issue to track this, but I couldn't find it. Opened #118

The current stopgap value is a naive string which is treated as a DNS name for a k8s service. Because the dispatcher pods are now running in a different namespace from the subscription, we add the subscription's namespace to the string if the string is a single-segment (label) DNS name.

Contributor:

Isn't that what LocalObjectReference is for?

I think in the long run we'll want to look up the Service object this refers to, because it provides additional info like what port(s) to connect to and how (via the port name, which is likely to indicate the kind of traffic expected). Any higher-level construct (like a route) which can ultimately be mapped to a service is fine, but it has to be a service (not just a DNS name).
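
A sketch of the suggested direction, assuming a LocalObjectReference-style field and a client-go lookup of the Service to pick a port by name; this is not what the PR implements:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// resolveSubscriberService looks up the referenced Service to decide which
// port to dial, instead of treating the subscriber as a bare DNS name.
func resolveSubscriberService(ctx context.Context, kc kubernetes.Interface, namespace string, ref corev1.LocalObjectReference) (string, error) {
	svc, err := kc.CoreV1().Services(namespace).Get(ctx, ref.Name, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	for _, p := range svc.Spec.Ports {
		if p.Name == "http" { // the port name hints at the kind of traffic expected
			return fmt.Sprintf("%s.%s:%d", svc.Name, svc.Namespace, p.Port), nil
		}
	}
	return "", fmt.Errorf("service %s/%s has no http port", namespace, ref.Name)
}
```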

@google-prow-robot google-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2018
@pmorie (Member) commented Jun 26, 2018

Mostly LGTM; one direct nit, but also the general comment I left for @pmorie that the provisioner/dispatcher containers feel like an implementation detail that would be better configured by the containers.

I am very much +1 on allowing Buses to be operated out of band from how they are modeled as resources in knative. People are going to be very opinionated on how they deploy complex packages.

scothis added 3 commits June 26, 2018 20:32
Make the Bus resource cluster scoped instead of namespaced. Bus
related deployments will be put into the knative-eventing namespace
and use a shared serviceaccount for all bus provisioners and
dispatchers.

This PR does not change the name of the Bus resource to ClusterBus.
That should happen later if we desire to have both cluster-wide and
namespaced buses.

Subscribers using a short DNS name will automatically be expanded to
use the full name.namespace.svc.cluster.local form based on the
namespace of the subscription resource. Other bus implementations will
also need to make this change.

Fixes #106
Channels can target either a Bus or a ClusterBus, but never both.
@scothis (Contributor Author) commented Jun 27, 2018

This PR includes both a Bus and ClusterBus implementation. The bus monitor has been updated to handle both types transparently. Tested with the stub bus implementation deployed as a standard Bus and as a ClusterBus.

@@ -78,3 +87,11 @@ type BusList struct {
	meta_v1.ListMeta `json:"metadata"`
	Items []Bus `json:"items"`
}

// GenericBus may be backed by Bus or ClusterBus
type GenericBus interface {
Contributor Author:

@pmorie do you have a better pattern for working with cluster and namespaced resources in a similar way?

Member:

If their specs are going to be the exact same, this seems better than what I've had to do in the past :)
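
For readers following along, a sketch of what such a shared abstraction might look like; the method names are guesses based on how the monitor uses the bus later in this diff (GetObjectMeta().GetNamespace()/GetName()) and may not match the actual interface:

```go
package main

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// BusSpec stands in for the shared spec of Bus and ClusterBus.
type BusSpec struct {
	// parameters, provisioner/dispatcher settings, ...
}

// GenericBus is roughly how the shared abstraction could look: both Bus and
// ClusterBus would satisfy it, so the monitor does not care which kind backs a
// channel.
type GenericBus interface {
	GetObjectMeta() metav1.Object
	GetSpec() *BusSpec
}
```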

errInvalidChannelBusMutation = errors.New("the Channel's Bus may not change")
errInvalidChannelInput = errors.New("failed to convert input into Channel")
errInvalidChannelBusMissing = errors.New("the Channel must reference a Bus or ClusterBus")
errInvalidChannelBusMutation = errors.New("the Channel's ClusterBus may not change")
Member:

looks like this should be Bus instead of ClusterBus

@@ -43,11 +44,16 @@ func ValidateChannel(ctx context.Context) ResourceCallback {
}

func validateChannel(old, new *v1alpha1.Channel) error {
Member:

This should validate that Bus and ClusterBus aren't both set at the same time

Contributor Author:

fixed
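
A minimal sketch of the requested mutual-exclusion check, using stand-in types rather than the real v1alpha1 package:

```go
package main

import "errors"

// ChannelSpec is a stand-in for the real type in the diff above.
type ChannelSpec struct {
	Bus        string
	ClusterBus string
}

// validateBusReference sketches the requested check: a Channel must reference
// exactly one of Bus or ClusterBus.
func validateBusReference(spec ChannelSpec) error {
	switch {
	case spec.Bus == "" && spec.ClusterBus == "":
		return errors.New("the Channel must reference a Bus or ClusterBus")
	case spec.Bus != "" && spec.ClusterBus != "":
		return errors.New("the Channel may not reference both a Bus and a ClusterBus")
	}
	return nil
}
```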

@@ -55,3 +62,15 @@ func TestChannelBusMutation(t *testing.T) {
		t.Errorf("Expected %s got %s", e, a)
	}
}

func TestChannelClusterBusMutation(t *testing.T) {
Member:

There should be a test case for both Bus and ClusterBus being set at the same time.

Contributor Author:

added
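
And a correspondingly small test sketch, building on the stand-in validateBusReference from the previous snippet:

```go
package main

import "testing"

// TestChannelBusAndClusterBusMutuallyExclusive checks that setting both fields
// is rejected, using the validateBusReference sketch above.
func TestChannelBusAndClusterBusMutuallyExclusive(t *testing.T) {
	spec := ChannelSpec{Bus: "stub", ClusterBus: "stub"}
	if err := validateBusReference(spec); err == nil {
		t.Error("expected a validation error when both Bus and ClusterBus are set")
	}
}
```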

@@ -38,8 +38,11 @@ type Channel struct {
// ChannelSpec (what the user wants) for a channel
type ChannelSpec struct {

	// Name of the bus backing this channel (optional)
	Bus string `json:"bus"`
	// Bus name of the bus backing this channel (mutually exclusive with ClusterBus)
Contributor:

I wonder if it would be cleaner to always include the namespace for the bus in the channel and remove clusterBus here.

Or add a busNamespace property to signal which namespace we are referring to.

I have seen this clusterFoo and Foo pattern before and I have always felt like it complicates the understanding of the object over the wire...

Contributor:

In my understanding, a ClusterBus is not the same thing as a (namespaced) Bus in a namespace different from the channel's namespace, at least in usage intention.

It is a Bus that is widely available for anyone to use.

Member:

I don't think there's a good way to use the lack of an explicit namespace to mean cluster scoped. Users are not used to having to explicitly state the namespace - this will complicate templates/charts/etc. I would expect a name with no explicit namespace to mean 'my namespace', not cluster scope.

Contributor Author:

I agree with @pmorie here.

The alternative implementation would be to do something like:

spec:
  bus:
    type: cluster
    name: pubsub

The trade-off is between a single config field that is set, where the user has to know which field to use, and two nested fields that both must be set. I don't have a strong opinion as to which model is best.

Contributor:

Oh nice, I like this idea ^; it gets us closer to label-selector-style syntax.

I could be alone, but it bothers me when there are mutually exclusive fields.

Contributor:

Should type be scope? Could it be a boolean (cluster-scoped), and should it have a default value?

Contributor Author:

scope is better than type. Not crazy about using a boolean, but we could certainly make either namespace or cluster the default.

I'm willing to go with the group consensus here.
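
For comparison, a sketch of the nested-reference alternative as Go types; the scope values and the default are open questions per the thread above:

```go
package main

// BusScope distinguishes where the referenced bus lives; "namespace" could be
// the default, per the discussion above.
type BusScope string

const (
	BusScopeNamespace BusScope = "namespace"
	BusScopeCluster   BusScope = "cluster"
)

// BusReference is the nested alternative: one reference with a scope instead
// of two mutually exclusive string fields.
type BusReference struct {
	Scope BusScope `json:"scope,omitempty"`
	Name  string   `json:"name"`
}

type ChannelSpec struct {
	Bus BusReference `json:"bus"`
}
```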

	meta_v1.TypeMeta   `json:",inline"`
	meta_v1.ObjectMeta `json:"metadata"`
	Spec   ClusterBusSpec    `json:"spec"`
	Status *ClusterBusStatus `json:"status,omitempty"`
Contributor:

Statuses don't use pointers, from my reading of other examples in the k8s codebase

Member:

Accurate, status is not a pointer field.

@@ -58,13 +59,15 @@ const (
errResourceSync = "ErrResourceSync"
)

// Monitor utility to manage channels and subscriptions for a bus
// Monitor utility to manage channels and subscriptions for a clusterbus
Contributor:

I believe this got caught in a find/replace session.

@@ -178,7 +181,7 @@ type subscriptionSummary struct {
Subscription channelsv1alpha1.SubscriptionSpec
}

// NewMonitor creates a monitor for a bus
// NewMonitor creates a monitor for a clusterbus
Contributor:

idem, this is really about a GenericBus, not a clusterbus specifically

channelInformer := informerFactory.Channels().V1alpha1().Channels()
subscriptionInformer := informerFactory.Channels().V1alpha1().Subscriptions()

// Create event broadcaster
// Add bus-controller types to the default Kubernetes Scheme so Events can be
// logged for bus-controller types.
// Add clusterbus-controller types to the default Kubernetes Scheme so Events can be
Contributor:

idem

// Set up an event handler for when Bus resources change
busInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
// Set up an event handler for when ClusterBus resources change
clusterBusInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
Contributor:

Shouldn't this come in addition to the one for Bus?

Contributor:

Scratch that, one Monitor should only track one kind of bus. But this should be for the super type.

I believe this currently works (haven't tested) by coincidence because of the type alias

@scothis (Contributor Author), Jun 27, 2018:

Yep, need to add back the event handlers for Bus.

It works now because the bus is actively looked up rather than waiting for the informer event; however, updates for Bus will be broken until the handlers are restored.

bus, err := m.busesLister.Buses(namespace).Get(name)
if err != nil {
glog.Fatalf("Unknown bus '%s/%s'", namespace, name)
if len(busNamespace) == 0 {
Contributor:

This seems to answer my earlier question: the kind of Bus being monitored is decided by Run() rather than by NewMonitor. This is weird IMHO

Contributor Author:

The core of the monitor doesn't care if it's working with a Bus or a ClusterBus. The Run method starts the informers and gets a reference to the target bus object. When the monitor is created the informers haven't yet synced, so the bus isn't available.

Contributor:

My point is that passing name + namespace should happen at creation time rather than usage time (Run()). Although technically possible, this gives the impression that a monitor can be reused and re-purposed to address a different bus after the fact, which is not the typical use of this helper.
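
A sketch of that suggestion: the bus reference is fixed at construction time and Run() only starts the machinery. The types and signatures here are illustrative, not the ones in this PR:

```go
package main

// BusReference identifies the one bus a Monitor is responsible for; an empty
// Namespace would indicate a ClusterBus.
type BusReference struct {
	Namespace string
	Name      string
}

// Monitor is tied to a single bus for its lifetime.
type Monitor struct {
	bus BusReference
	// informers, workqueue, handlers, ...
}

// NewMonitor receives the bus reference up front, so Run only starts the
// machinery and cannot re-target the monitor at a different bus.
func NewMonitor(bus BusReference) *Monitor {
	return &Monitor{bus: bus}
}

func (m *Monitor) Run(stopCh <-chan struct{}) error {
	// start informers, wait for caches to sync, then look up m.bus
	return nil
}
```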

@@ -636,12 +676,12 @@ func (m *Monitor) getOrCreateChannelSummary(key channelKey) *channelSummary {
}

func (m *Monitor) createOrUpdateBus(bus *channelsv1alpha1.Bus) error {
if bus.Name != m.bus.Name {
if bus.Namespace == m.bus.GetObjectMeta().GetNamespace() && bus.Name != m.bus.GetObjectMeta().GetName() {
Contributor:

Shouldn't this be if ns != ns2 || name != name2 ?

Contributor:

This is verifying that the bus and channel are in the same namespace and there is a named bus change, so I think it is correct. Otherwise you would allow a new name to be assigned to a bus/channel in different namespaces.

Contributor Author:

Eric is correct here; the test should be for either a namespace or name mismatch. Will fix.
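
The corrected guard could look roughly like this (a sketch, not the final commit):

```go
package main

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// matchesMonitoredBus reports whether an observed bus is the one this monitor
// was created for; createOrUpdateBus would return early when this is false.
func matchesMonitoredBus(observed, monitored metav1.Object) bool {
	return observed.GetNamespace() == monitored.GetNamespace() &&
		observed.GetName() == monitored.GetName()
}
```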

@@ -73,7 +73,7 @@ func (b *StubBus) handleEvent(res http.ResponseWriter, req *http.Request) {
safeHeaders.Set("x-bus", b.name)
Contributor:

Technically, this doesn't reliably identify the bus.

Member:

Good point - this should include an indication of cluster/namespace scope and the namespace, if applicable.

Contributor Author:

I added these headers early on for debuggability. I'm comfortable either removing them, or adding richer metadata. Either way the name x-bus and x-channel will need to evolve.
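
One possible shape for richer debug headers, purely as a sketch; the header names are placeholders:

```go
package main

import "net/http"

// setBusHeaders writes debug headers that identify the bus unambiguously by
// including its scope and, for a namespaced bus, its namespace.
func setBusHeaders(h http.Header, scope, namespace, name, channel string) {
	h.Set("x-bus-scope", scope) // "cluster" or "namespace"
	h.Set("x-bus-name", name)
	if scope == "namespace" {
		h.Set("x-bus-namespace", namespace)
	}
	h.Set("x-channel", channel)
}
```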

@@ -73,7 +73,7 @@ func (b *StubBus) handleEvent(res http.ResponseWriter, req *http.Request) {
safeHeaders.Set("x-bus", b.name)
safeHeaders.Set("x-channel", channel)
Contributor:

ditto for the channel

func (b *StubBus) resolveSubscriber(subscription channelsv1alpha1.SubscriptionSpec, namespace string) string {
	subscriber := subscription.Subscriber
	if strings.Index(subscriber, ".") == -1 {
		subscriber = fmt.Sprintf("%s.%s.svc.cluster.local", subscriber, namespace)
Contributor:

My understanding is that the svc.cluster.local part is not required and shouldn't be hardcoded because it could be different on some clusters

Member:

That's accurate - the cluster DNS suffix is programmable.

Contributor Author:

Eww, the Istio docs recommend not using short names and instead using the fully qualified form. See the note under the hosts property: https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService

For the purpose of this PR I can drop down to just the name.namespace form.
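
A sketch of that adjustment, expanding a single-label subscriber to name.namespace instead of hardcoding the cluster DNS suffix:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSubscriber expands a single-label subscriber name with the
// subscription's namespace, without assuming a particular cluster DNS suffix.
func resolveSubscriber(subscriber, namespace string) string {
	if !strings.Contains(subscriber, ".") {
		return fmt.Sprintf("%s.%s", subscriber, namespace)
	}
	return subscriber
}
```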

@scothis (Contributor Author) commented Jun 27, 2018

Thanks for all the feedback, I believe I addressed everything that was actionable.

@scothis (Contributor Author) commented Jun 28, 2018

/test pull-knative-eventing-build-tests

@scothis (Contributor Author) commented Jun 28, 2018

@vaikas-google PTAL

@n3wscott (Contributor) commented Jul 2, 2018

/lgtm

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2018
@evankanderson (Member):

/approve

@vaikas-google is out for the week.

@google-prow-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: evankanderson, scothis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2018
@google-prow-robot google-prow-robot merged commit ef4a2ba into knative:master Jul 2, 2018
matzew pushed a commit to matzew/eventing that referenced this pull request May 23, 2019
creydr pushed a commit to creydr/knative-eventing that referenced this pull request Feb 14, 2023
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files), lgtm (Indicates that a PR is ready to be merged), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files)
8 participants