Cluster scoped Bus #117
Conversation
My understanding is that cluster-scoped resources should still be named to indicate that, regardless of whether there are namespace-scoped counterparts. So, I think this should be ClusterBus. @pmorie, who has more recent information on it, thoughts?
@vaikas-google you're correct - the current guidance is that cluster-scoped resources should have names prefixed with Cluster.
@@ -25,6 +25,7 @@ import (

// +genclient
// +genclient:noStatus
// +genclient:nonNamespaced
Just a comment for my own edification. I imagine the motivation here is partially due to efficiency for developers - ie, you don't want to have to maintain independent namespaced versions of this resource for every namespace that wants to use the bus - is that accurate? I'm still learning this API, but it also looks like there are concerns for where containers are run (ie, provision and dispatch containers).
If that's accurate, then there's more context I can share that may be helpful, having recently worked through a similar problem in the cluster-registry that may be relevant to discuss.
I think we probably want to remove the provisioner and dispatcher containers from the Bus spec -- there are possible efficient multi-tenant implementations of a Bus which wouldn't need long-running management containers for channels, and could instead simply have a controller watching the Channel and Subscription resources which configures the underlying system.
The motivation for moving Bus to a cluster-level resource is that the backing implementation for most buses is likely to involve configuring a semi-complex storage backend (e.g. Kafka or the equivalent). Our hypothesis is that provisioning and installing a Kafka cluster is more likely to be an operator concern shared across multiple development teams than something each development team manages on its own.
The move to cluster scope is because many buses require a backing message broker, which typically should be managed by an operator rather than a developer. We can also gain efficiencies by avoiding the need for redundant deployments in each namespace for functionality that could be provided centrally.
With the namespaces model, we created two deployments (provisioner, dispatcher) in the same namespace as the bus. Now those deployments have moved into the knative-eventing namespace. Because these deployments need to communicate with the k8s api, we also need a service account with a rolebinding to read the related resources. The service account and role binding were created dynamically for the namespaced bus, but can be defined statically for the cluster wide bus.
If that's accurate, then there's more context I can share that may be helpful, having recently worked through a similar problem in the cluster-registry that may be relevant to discuss.
Always happy to learn something new.
[...] which could simply have a controller watching the Channel and Subscription resources which configures the underlying system.

To clarify the role of the provisioner and dispatcher containers: this is exactly what the provisioner is for. It is a controller-like container (helped by BusMonitor) that watches for changes to Channel resources and translates those changes to middleware-specific changes (e.g. create a Kafka topic on an already existing Kafka broker). The role of the provisioner is not to create a new instance of a Kafka broker deployment (although that is an interesting idea).
I can see a world where this is taken care of by a Job (where the k8s custom resource changes are translated, e.g., to program arguments for a container that has middleware-specific logic), but this does not offer the same level of flexibility in terms of responsiveness, error message propagation, etc.
The dispatcher pod, on the other hand, has to be long-running IMHO.
@evankanderson would you like to sketch out your architectural idea in an issue or in the Channel CRD doc? I'm having trouble understanding the full implications of your proposal.
FYI, I left a comment in that document on how I was thinking this might look. It inverts this a little bit from CRD => creating the bus. So instead of a CRD creating the buses, the bus would be created by some other means (helm chart, etc.).
A CRD would get created representing the Bus as a resource, and a controller would run for that bus, watching for Channels / Subscriptions for that bus and talking to the underlying bus resource in whatever the native way is for that bus.
The move to cluster scope is because many buses require a backing message broker, which typically should be managed by an operator rather than a developer. We can also gain efficiencies by avoiding the need for redundant deployments in each namespace for functionality that could be provided centrally.
This is very similar to the motivation we had for cluster-scoping service brokers in the service-catalog.
So instead of a CRD creating the buses, the bus would be created by some other means (helm chart, etc.)
I was actually going to ping you on slack @vaikas-google and ask about this. I am majorly +1 on @vaikas-google's suggestion that the bus should be created out of band relative to the CRD.
I'll rename
Mostly LGTM; one direct nit, but also the general comment I left for @pmorie that the provisioner/dispatcher containers feel like an implementation detail that would be better configured by the containers.
Having a declaration of expected parameters (with or without defaults) feels useful here, though.
I can even imagine a bus implementation which behind the scenes sets up and manages a kafka cluster per namespace, though I'm not signing up to write it.
@@ -95,6 +96,14 @@ func (b *StubBus) dispatchEvent(subscriber string, body []byte, headers http.Hea
	}
}

func (b *StubBus) resolveSubscriber(subscription channelsv1alpha1.SubscriptionSpec, namespace string) string {
	subscriber := subscription.Subscriber
	if strings.Index(subscriber, ".") == -1 {
Does this indicate that subscriber names should not contain ".", or is this a shortcut way of being able to refer to subscribers in the same namespace by their short names?
The API is somewhat mysterious as to the expected format of this string.
I thought we already had an issue to track this, but I couldn't find it. Opened #118
The current stopgap value is a naive string which is treated as a DNS name for a k8s service. Because the dispatcher pods are now running in a different namespace from the subscription, we add the subscription's namespace to the string if it is a single-segment (label) DNS name.
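The expansion described here can be sketched as a small standalone helper. This is a simplified illustration, not the PR's exact code: it takes the subscriber string directly rather than a SubscriptionSpec, and it hardcodes the svc.cluster.local suffix, which a later comment in this thread points out is actually configurable per cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSubscriber expands a single-label subscriber name to a fully
// qualified in-cluster DNS name using the subscription's namespace.
// Multi-segment names are assumed to already be resolvable and pass through.
func resolveSubscriber(subscriber, namespace string) string {
	if !strings.Contains(subscriber, ".") {
		return fmt.Sprintf("%s.%s.svc.cluster.local", subscriber, namespace)
	}
	return subscriber
}

func main() {
	fmt.Println(resolveSubscriber("echo", "default"))
	// → echo.default.svc.cluster.local
}
```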
Isn't that what LocalObjectReference is for?
I think in the long run we'll want to look up the Service object this refers to, because it provides additional info like what port(s) to connect to and how (via the port name, which is likely to indicate the kind of traffic expected). Any higher-level construct (like route) which can ultimately be mapped to a service is fine, but it has to be a service (not just a DNS name).
I am very much +1 on allowing Buses to be operated out of band from how they are modeled as resources in knative. People are going to be very opinionated on how they deploy complex packages.
Make the Bus resource cluster scoped instead of namespaced. Bus related deployments will be put into the knative-eventing namespace and use a shared serviceaccount for all bus provisioners and dispatchers. This PR does not change the name of the Bus resource to ClusterBus. That should happen later if we desire to have both cluster-wide and namespaced buses. Subscribers using a short DNS name will automatically be expanded to use the full name.namespace.svc.cluster.local form based on the namespace of the subscription resource. Other bus implementations will also need to make this change. Fixes #106
Channels can target either a Bus or a ClusterBus, but never both.
This PR includes both a Bus and ClusterBus implementation. The bus monitor has been updated to handle both types transparently. Tested with the stub bus implementation deployed as a standard Bus and as a ClusterBus.
@@ -78,3 +87,11 @@ type BusList struct {
	meta_v1.ListMeta `json:"metadata"`
	Items []Bus `json:"items"`
}

// GenericBus may be backed by Bus or ClusterBus
type GenericBus interface {
@pmorie do you have a better pattern for working with cluster and namespaced resources in a similar way?
If their specs are going to be the exact same, this seems better than what I've had to do in the past :)
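For reference, the pattern under discussion can be sketched in Go. The diff truncates the actual GenericBus method set, so the methods below are illustrative assumptions only — the idea is that a namespaced Bus and a cluster-scoped ClusterBus with identical specs can satisfy one interface, with an empty namespace signalling cluster scope.

```go
package main

import "fmt"

// GenericBus is a hypothetical sketch: one interface backed by either a
// namespaced Bus or a cluster-scoped ClusterBus.
type GenericBus interface {
	GetName() string
	GetNamespace() string // "" for cluster-scoped buses
}

type Bus struct{ Name, Namespace string }

func (b *Bus) GetName() string      { return b.Name }
func (b *Bus) GetNamespace() string { return b.Namespace }

type ClusterBus struct{ Name string }

func (b *ClusterBus) GetName() string      { return b.Name }
func (b *ClusterBus) GetNamespace() string { return "" }

// describe renders a bus key the way controllers often log them:
// "namespace/name" for namespaced resources, bare name otherwise.
func describe(b GenericBus) string {
	if ns := b.GetNamespace(); ns != "" {
		return ns + "/" + b.GetName()
	}
	return b.GetName()
}

func main() {
	fmt.Println(describe(&Bus{Name: "stub", Namespace: "default"})) // → default/stub
	fmt.Println(describe(&ClusterBus{Name: "stub"}))                // → stub
}
```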
pkg/webhook/channel.go
Outdated
errInvalidChannelBusMutation = errors.New("the Channel's Bus may not change")
errInvalidChannelInput = errors.New("failed to convert input into Channel")
errInvalidChannelBusMissing = errors.New("the Channel must reference a Bus or ClusterBus")
errInvalidChannelBusMutation = errors.New("the Channel's ClusterBus may not change")
looks like this should be Bus instead of ClusterBus
@@ -43,11 +44,16 @@ func ValidateChannel(ctx context.Context) ResourceCallback {
	}

func validateChannel(old, new *v1alpha1.Channel) error {
This should validate that Bus and ClusterBus aren't both set at the same time
fixed
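The mutual-exclusion check requested above can be sketched as follows. The function name and the decision to validate bare strings (rather than the full ChannelSpec) are assumptions for illustration; the error messages mirror the ones visible in the webhook diff.

```go
package main

import (
	"errors"
	"fmt"
)

// validateBusReference enforces that exactly one of Bus or ClusterBus
// is set on a Channel: neither is an error, and both is an error.
func validateBusReference(bus, clusterBus string) error {
	if bus == "" && clusterBus == "" {
		return errors.New("the Channel must reference a Bus or ClusterBus")
	}
	if bus != "" && clusterBus != "" {
		return errors.New("the Channel may not reference both a Bus and a ClusterBus")
	}
	return nil
}

func main() {
	fmt.Println(validateBusReference("stub", ""))      // valid: only Bus set
	fmt.Println(validateBusReference("stub", "kafka")) // invalid: both set
}
```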
@@ -55,3 +62,15 @@ func TestChannelBusMutation(t *testing.T) {
		t.Errorf("Expected %s got %s", e, a)
	}
}

func TestChannelClusterBusMutation(t *testing.T) {
Should be a test case for both Bus and ClusterBus being set at the same time
added
@@ -38,8 +38,11 @@ type Channel struct {
// ChannelSpec (what the user wants) for a channel
type ChannelSpec struct {

	// Name of the bus backing this channel (optional)
	Bus string `json:"bus"`
	// Bus name of the bus backing this channel (mutually exclusive with ClusterBus)
I wonder if it would be cleaner to always include the namespace for the bus in the channel and remove clusterBus here.
Or add a busNamespace property to signal which namespace we are referring to.
I have seen this clusterFoo and Foo pattern before and I have always felt like it complicates the understanding of the object over the wire...
In my understanding, a ClusterBus is not the same thing as a (namespaced) Bus in a namespace different from the channel namespace. At least in usage intention.
It is a Bus that is widely available for anyone to use
I don't think there's a good way to use the lack of an explicit namespace to mean cluster scoped. Users are not used to having to explicitly state the namespace - this will complicate templates/charts/etc. I would expect a name with no explicit namespace to mean 'my namespace', not cluster scope.
I agree with @pmorie here.
The alternative implementation would be to do something like:

    spec:
      bus:
        type: cluster
        name: pubsub
The trade off is a single field in config that is set, and the user has to know which field to use, or two nested fields that both must be set. I don't have a strong opinion as to which model is best.
oh nice, I like this idea ^, it gets us closer to label selector style syntax.
I could be alone, but it bothers me when there are mutually exclusive fields
should type be scope? could it be a boolean (cluster-scoped), and should it have a default value?
scope is better than type. Not crazy about using a boolean, but we could certainly make either namespace or cluster the default.
I'm willing to go with the group consensus here.
meta_v1.TypeMeta `json:",inline"`
meta_v1.ObjectMeta `json:"metadata"`
Spec ClusterBusSpec `json:"spec"`
Status *ClusterBusStatus `json:"status,omitempty"`
Statuses don't use pointers, from my reading of other examples in the k8s codebase
Accurate, status is not a pointer field.
pkg/buses/monitor.go
Outdated
@@ -58,13 +59,15 @@ const (
	errResourceSync = "ErrResourceSync"
)

// Monitor utility to manage channels and subscriptions for a bus
// Monitor utility to manage channels and subscriptions for a clusterbus
I believe this got caught in a find/replace session.
pkg/buses/monitor.go
Outdated
@@ -178,7 +181,7 @@ type subscriptionSummary struct {
	Subscription channelsv1alpha1.SubscriptionSpec
}

// NewMonitor creates a monitor for a bus
// NewMonitor creates a monitor for a clusterbus
idem, this is really about a GenericBus, not a clusterbus specifically
pkg/buses/monitor.go
Outdated
channelInformer := informerFactory.Channels().V1alpha1().Channels()
subscriptionInformer := informerFactory.Channels().V1alpha1().Subscriptions()

// Create event broadcaster
// Add bus-controller types to the default Kubernetes Scheme so Events can be
// logged for bus-controller types.
// Add clusterbus-controller types to the default Kubernetes Scheme so Events can be
idem
pkg/buses/monitor.go
Outdated
// Set up an event handler for when Bus resources change
busInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
// Set up an event handler for when ClusterBus resources change
clusterBusInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
Shouldn't this come in addition to the one for Bus?
Scratch that, one Monitor should only track one kind of bus. But this should be for the super type.
I believe this currently works (haven't tested) by coincidence because of the type alias
yep, need to add back the event handlers for Bus.
It works now because the bus is actively looked up rather than waiting for the informer event, however, update for Bus will be broken until restored.
bus, err := m.busesLister.Buses(namespace).Get(name)
if err != nil {
	glog.Fatalf("Unknown bus '%s/%s'", namespace, name)
if len(busNamespace) == 0 {
This seems to answer my earlier question: the kind of Bus being monitored is decided by Run() rather than by NewMonitor. This is weird IMHO
The core of the monitor doesn't care if it's working with a Bus or a ClusterBus. The Run method starts the informers and gets a reference to the target bus object. When the monitor is created the informers haven't yet synced, so the bus isn't available.
My point is that passing name + namespace should happen at creation time rather than usage time (Run()). Although technically possible, this gives the impression that a monitor can be reused and re-purposed to address a different bus after the fact, which is not the typical use of this helper
pkg/buses/monitor.go
Outdated
@@ -636,12 +676,12 @@ func (m *Monitor) getOrCreateChannelSummary(key channelKey) *channelSummary {
}

func (m *Monitor) createOrUpdateBus(bus *channelsv1alpha1.Bus) error {
	if bus.Name != m.bus.Name {
	if bus.Namespace == m.bus.GetObjectMeta().GetNamespace() && bus.Name != m.bus.GetObjectMeta().GetName() {
Shouldn't this be if ns != ns2 || name != name2?
This is verifying that the bus and channel are in the same namespace and there is a named bus change, so I think it is correct. Otherwise you would allow a new name to be assigned to a bus/channel in different namespaces.
Eric is correct here, the test should be for either a ns or name mismatch. Will fix
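The agreed fix can be sketched as a standalone predicate. The function and parameter names here are illustrative (the real code compares the event's bus against the monitor's bus via GetObjectMeta()): an event belongs to the monitored bus only when both namespace and name match, which is the same as rejecting it when ns != ns2 || name != name2.

```go
package main

import "fmt"

// isMonitoredBus reports whether an incoming bus event refers to the
// bus this monitor was created for. Both namespace and name must match;
// the original code only required a name mismatch within the same
// namespace, missing cross-namespace name collisions.
func isMonitoredBus(monitorNS, monitorName, eventNS, eventName string) bool {
	return eventNS == monitorNS && eventName == monitorName
}

func main() {
	fmt.Println(isMonitoredBus("knative-eventing", "stub", "knative-eventing", "stub")) // → true
	fmt.Println(isMonitoredBus("knative-eventing", "stub", "default", "stub"))          // → false
}
```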
@@ -73,7 +73,7 @@ func (b *StubBus) handleEvent(res http.ResponseWriter, req *http.Request) {
	safeHeaders.Set("x-bus", b.name)
Technically, this doesn't reliably id the bus
Good point - this should include an indication of cluster/ns scope and ns, if applicable
I added these headers early on for debuggability. I'm comfortable either removing them, or adding richer metadata. Either way the names x-bus and x-channel will need to evolve.
@@ -73,7 +73,7 @@ func (b *StubBus) handleEvent(res http.ResponseWriter, req *http.Request) {
	safeHeaders.Set("x-bus", b.name)
	safeHeaders.Set("x-channel", channel)
ditto for the channel
pkg/buses/stub/main.go
Outdated
func (b *StubBus) resolveSubscriber(subscription channelsv1alpha1.SubscriptionSpec, namespace string) string {
	subscriber := subscription.Subscriber
	if strings.Index(subscriber, ".") == -1 {
		subscriber = fmt.Sprintf("%s.%s.svc.cluster.local", subscriber, namespace)
My understanding is that the svc.cluster.local part is not required and shouldn't be hardcoded because it could be different on some clusters
That's accurate - the cluster dns suffix is programmable.
eww, the istio docs recommend not using short names and instead using the fully qualified form. See the note under the hosts property https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService
For the purpose of this PR I can drop down to just the name.namespace form.
Thanks for all the feedback, I believe I addressed everything that was actionable.
/test pull-knative-eventing-build-tests
@vaikas-google PTAL
/lgtm
/approve
@vaikas-google is out for the week.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: evankanderson, scothis. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Add ClusterBus in addition to Bus for buses that are operator provisioned. Channels may target either a Bus by name within the current namespace, or a ClusterBus installed cluster wide.
The bus monitor is able to manage both Bus or ClusterBus and their Channels/Subscriptions.
ClusterBus backed Subscriptions using a short DNS name will automatically be expanded to
use the full name.namespace.svc.cluster.local form based on the
namespace of the subscription resource. Other bus implementations will
also need to make this change.
Fixes #106
/assign @evankanderson @ericbottard