Share OSB client for ServiceBroker #2337
Conversation
487afab to 3f33290
/ok-to-test
/test pull-service-catalog-integration
3f33290 to 4c73b89
@@ -119,13 +124,45 @@ func (c *controller) reconcileClusterServiceBrokerKey(key string) error {
	return c.reconcileClusterServiceBroker(broker)
}

func (c *controller) updateClusterServiceBrokerClient(broker *v1beta1.ClusterServiceBroker) (osb.Client, error) {
I'm thinking about a change:
return only one value here - the error
the brokerClientManager UpdateBrokerClient would then also return only an error
It seems better, but I'd like to wait for tests and comments about the main concept - caching OSB clients.
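For illustration, a minimal sketch of that single-return variant, assuming UpdateBrokerClient is also changed to return only an error; the clusterServiceBrokerClientConfig helper and the NewClusterServiceBrokerKey name are placeholders, not the actual code from this PR:

```go
// Hypothetical single-return variant of updateClusterServiceBrokerClient.
func (c *controller) updateClusterServiceBrokerClient(broker *v1beta1.ClusterServiceBroker) error {
	// Build the osb.ClientConfiguration (URL, auth, TLS) for this broker.
	clientConfig, err := c.clusterServiceBrokerClientConfig(broker) // placeholder helper
	if err != nil {
		return err
	}
	// Create or refresh the cached client; only the error is propagated.
	return c.brokerClientManager.UpdateBrokerClient(NewClusterServiceBrokerKey(broker.Name), clientConfig)
}
```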
return client, nil
}

func (m *BrokerClientManager) configHasChanged(cfg1 *osb.ClientConfiguration, cfg2 *osb.ClientConfiguration) bool {
Would this be better if it were a function instead of a method?
I wrote the helper method to be used inside BrokerClientManager and I don't want to expose it. It was not designed to be used by other components.
I've changed it to a function.
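A minimal sketch of the function form, assuming the usual osb alias for github.com/pmorie/go-open-service-broker-client/v2; the real comparison may check individual fields instead of reflect.DeepEqual:

```go
import (
	"reflect"

	osb "github.com/pmorie/go-open-service-broker-client/v2"
)

// configHasChanged reports whether the broker client configuration changed,
// so the manager knows when the cached client must be rebuilt.
func configHasChanged(cfg1 *osb.ClientConfiguration, cfg2 *osb.ClientConfiguration) bool {
	return !reflect.DeepEqual(cfg1, cfg2)
}
```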
@@ -0,0 +1,132 @@
/*
Copyright 2017 The Kubernetes Authors.
nit: 2018
I've only got a couple of minor nits. This looks solid to me. I had concerns around this concept, but on review it looks good. I'll take a closer look at the tests you removed; I would think they are still valid tests but perhaps need to be reworked?
I've mentioned it to @nilebox; it would be beneficial to have additional review from others who have been deep in this code. Also @kibbles-n-bytes if you have cycles.
delete(m.clients, brokerKey)
}

// BrokerClient returns broker client fro a broker specified by the brokerKey
nit: spelling s/fro/for/
fixed
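For context, a rough sketch of how the lookup side can be structured, with a read lock around the client map; the type and field names here are illustrative rather than copied from the PR:

```go
// clientWithConfig pairs a cached OSB client with the configuration it was built from.
type clientWithConfig struct {
	OSBClient    osb.Client
	clientConfig *osb.ClientConfiguration
}

// BrokerClientManager keeps one OSB client per broker key.
type BrokerClientManager struct {
	mu      sync.RWMutex
	clients map[BrokerKey]clientWithConfig
}

// BrokerClient returns the broker client for the broker specified by brokerKey;
// the second value reports whether a client is cached for that key.
func (m *BrokerClientManager) BrokerClient(brokerKey BrokerKey) (osb.Client, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()

	existing, found := m.clients[brokerKey]
	return existing.OSBClient, found
}
```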
@@ -0,0 +1,139 @@
/*
Copyright 2017 The Kubernetes Authors.
nit: 2018
fixed
I think the test I removed and the additional integration test I skipped should be reworked. The old test checked how the controller behaves when processing ServiceInstances while the service broker auth is wrong - the scenario where "the controller fails to locate the broker authentication secret." In the current solution the controller does not need to locate the secret - that is done when the ClusterServiceBroker is processed. I'll try to create such a test for broker resource processing.
I'm thinking about checking the size of the cache in a unit (or integration) test, just to be sure it is not growing without reason (so we don't introduce a new memory leak).
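Something along these lines could work as a white-box unit test in the same package, assuming a NewBrokerClientManager constructor, a NewClusterServiceBrokerKey helper, and an UpdateBrokerClient(key, config) method returning an error; all of those names are assumptions in this sketch:

```go
func TestBrokerClientManagerCacheDoesNotGrow(t *testing.T) {
	m := NewBrokerClientManager()
	key := NewClusterServiceBrokerKey("test-broker")
	config := osb.DefaultClientConfiguration()
	config.URL = "https://broker.example.com"

	// Repeatedly updating the same broker must reuse the single cache entry.
	for i := 0; i < 100; i++ {
		if err := m.UpdateBrokerClient(key, config); err != nil {
			t.Fatalf("UpdateBrokerClient failed: %v", err)
		}
	}

	if got := len(m.clients); got != 1 {
		t.Fatalf("expected 1 cached client, got %d", got)
	}
}
```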
I've added a test for a non-existing broker during ServiceInstance reconciliation. I realized one behavior change - how the controller handles a missing secret with auth credentials. In my solution, when a user creates a ClusterServiceBroker with a reference to a secret which does not exist, the broker client won't be created. After that, if the user creates the secret, nothing changes until the next reconciliation. Maybe that is an issue.
I performed the tests described in issue #2276. I applied the PR on top of the new version 0.1.32 and the result is: no restart. I've also tested the fix with version 0.1.31 and saw the controller manager pod running for more than one day without a restart.
I've added a test for a non-existing broker during ServiceInstance reconciliation. I realized one behavior change - how the controller handles a missing secret with auth credentials. In my solution, when a user creates a ClusterServiceBroker with a reference to a secret which does not exist, the broker client won't be created. After that, if the user creates the secret, nothing changes until the next reconciliation. Maybe that is an issue.
Can you elaborate on this - "nothing changes until the next reconciliation" - the user creates the missing secret, the broker client will be created when the exponential backoff expires and it does the retry, right? If that is the case, it seems pretty correct to me, I'm good with that.
Thanks for the additional analysis and long runs, @piotrmiskiewicz. This is looking good; I'd like to move this forward. @luksa reviewed last week and only had one minor comment, and I discussed briefly with @nilebox and he was on board with the idea. Let's get one more review.
// TestReconcileServiceInstanceWithAuthError tests reconcileInstance when Kube Client
// fails to locate the broker authentication secret.
func TestReconcileServiceInstanceWithAuthError(t *testing.T) {
// TestReconcileServiceInstanceWithNotExistingBroker tests reconcileInstance when the BrokerClientManager instance does not contain client for the broker.
nit: as a rule we wrap all function comments at column 80
fixed
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jboyd01
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
I'll give an example. The way it works is like an image pull secret, when you need credentials for a Docker repository. If you specify an
In my opinion it is not a problem, but I wanted to describe what was changed.
re #2337 (comment)
3196f48 to 156ab76
156ab76 to b717a14
/retest
/test pull-service-catalog-integration
Looks good. I'm just not completely sure about the fact that we now create the client only when reconciling the broker. When reconciling other resources, we now no longer create the client, but simply log an error.
The new way seems more correct, but I need to think about the implications.
The new way also ensures we only retrieve the broker auth secret once instead of every time.
FYI: I tested this manually, and have confirmed that the broker only has one open connection for each ServiceBroker/ClusterServiceBroker instance.
// BrokerKey defines a key which points to a broker (cluster wide or namespaced)
type BrokerKey struct {
	name string
	namespace string
Idea for a future improvement: consider a case where a large multi-user cluster has a large number of ServiceBroker instances all pointing to the same broker (with the same osb.ClientConfiguration). We may want to ensure the connections are shared between all those ServiceBrokers, so we don't hold too many open connections to the same broker.
Yes, the improvement would be that the key is not a namespace/name pair but the configuration (the TLS config). The authentication part (username/password) is not set in the Go http.Client. The best improvement would be a change in the OSB client to share the http.Client even when the username/password differs. That change is much bigger.
Anyway, with the current solution (without my implementation of sharing clients), when the resync is set to 5 minutes for all 1000 registered brokers (3 get-catalog requests per second), the controller manager will be restarting every few minutes (because of "out of memory").
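As a rough illustration of that future improvement (not part of this PR), the cache key could be derived from connection-level settings only, so ServiceBrokers pointing to the same endpoint with the same TLS config share one client; the field choice below is just an example:

```go
// configKey builds a cache key from connection-level settings only.
// Username/password are deliberately excluded, since only the transport
// (URL, TLS, timeout) determines whether an http.Client could be shared.
func configKey(cfg *osb.ClientConfiguration) string {
	return fmt.Sprintf("%s|insecure=%v|timeout=%d", cfg.URL, cfg.Insecure, cfg.TimeoutSeconds)
}
```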
I agree. The improvement should go in later, in a separate PR. We need to get this PR in fast, since it will solve a lot of problems for us.
"Error getting broker auth credentials for broker %q: %s", | ||
broker.Name, err, | ||
"The instance references a broker %q which has no OSB client created", | ||
serviceClass.Spec.ClusterServiceBrokerName, |
When does this happen? Previously, we would create the client here, but now we expect it to always exist at this point. Would panicking be more appropriate, since we're not expecting the client to not exist here?
A 'clusterservicebroker' resource is created at the same time as a 'serviceclass' provisioning request. The provisioning request is processed before the 'clusterservicebroker' addition is processed. From the controller's perspective (or the implementation of the method) it could happen. Another scenario: the deletion of the ClusterServiceBroker reaches the controller at the same time as a deprovisioning request. Before my PR the problem could also happen - the client can be removed before processing. I'm not aware of every detail. If such an error occurs, the call will be retried with the proper backoff policy.
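For reference, a simplified sketch of the usual client-go controller pattern (using k8s.io/client-go/util/workqueue) that produces this behavior - a failed reconcile is re-queued with exponential backoff, so a missing OSB client is retried later. This mirrors the general pattern, not this controller's exact code:

```go
func processNextWorkItem(queue workqueue.RateLimitingInterface, reconcile func(key string) error) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)

	if err := reconcile(key.(string)); err != nil {
		// Requeue with exponential backoff; the next attempt may find the client.
		queue.AddRateLimited(key)
		return true
	}
	// Success: reset the backoff counter for this key.
	queue.Forget(key)
	return true
}
```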
Yes, I realized later that we definitely shouldn't panic.
@@ -107,8 +108,12 @@ func shouldReconcileClusterServiceBroker(broker *v1beta1.ClusterServiceBroker, n
func (c *controller) reconcileClusterServiceBrokerKey(key string) error {
broker, err := c.clusterServiceBrokerLister.Get(key)
pcb := pretty.NewContextBuilder(pretty.ClusterServiceBroker, "", key, "")

glog.V(4).Info(pcb.Message(fmt.Sprintf("Processing service broker %s", key)))
Redundant info. This is how the log line looks:
...icebroker.go:112] ClusterServiceBroker "ups-broker": Processing service broker ups-broker
fixed
It is what I described before. On the other hand, storing credentials in the OSB client is maybe not the best solution. Providing credentials in every call would fix the problem and would allow us to implement better caching - a few brokers with different credentials but one TLS config could share one OSB client.
/lgtm
This PR is a
What this PR does / why we need it:
This PR introduces BrokerClientManager, which stores OSB clients - one client per broker. It allows sharing one OSB client instance across all calls to the broker. It prevents the controller from creating OSB clients for every operation and follows the guidance from the Go HTTP client documentation: "The Client's Transport typically has internal state (cached TCP connections), so Clients should be reused instead of created as needed. Clients are safe for concurrent use by multiple goroutines."
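A minimal sketch of the intended usage, with illustrative names (this helper is not necessarily in the PR): reconcilers resolve the shared client from the manager instead of constructing a new one, and a new HTTP transport, on every operation.

```go
// brokerClientForClusterServiceBroker looks up the shared OSB client for a broker.
func (c *controller) brokerClientForClusterServiceBroker(name string) (osb.Client, error) {
	client, found := c.brokerClientManager.BrokerClient(NewClusterServiceBrokerKey(name))
	if !found {
		return nil, fmt.Errorf("no OSB client created for ClusterServiceBroker %q", name)
	}
	return client, nil
}
```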
Which issue(s) this PR fixes
Fixes #2276
Please leave this checklist in the PR comment so that maintainers can ensure a good PR.
Merge Checklist:
breaking the chart release and existing clients who provide a flag that will get an error when they try to update