-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cluster Peering #1273
Cluster Peering #1273
Conversation
@@ -142,14 +142,13 @@ jobs: | |||
- run: mkdir -p ${{env.TEST_RESULTS}} | |||
- run: echo "$HOME/bin" >> $GITHUB_PATH | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file will need to be reverted before merging-- but we need the Consul release to happen first. Then we need to bump the CONSUL_VERSION variable to be the latest Consul release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we won't be able to merge this until after Consul's release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yup
a9637a9
to
3f6b9c0
Compare
@@ -142,14 +142,13 @@ jobs: | |||
- run: mkdir -p ${{env.TEST_RESULTS}} | |||
- run: echo "$HOME/bin" >> $GITHUB_PATH | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we won't be able to merge this until after Consul's release?
@@ -69,6 +68,8 @@ type ExportedService struct { | |||
type ServiceConsumer struct { | |||
// Partition is the admin partition to export the service to. | |||
Partition string `json:"partition,omitempty"` | |||
// [Experimental] PeerName is the name of the peer to export the service to. | |||
PeerName string `json:"peerName,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this be Peer
to match partition above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have matched the name from the consul config entry so far but I actually do prefer this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think its ok to have a different field name from the consul config entry/have we done this before? Should we consider changing the config entry field in core to also match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could change this in core as well TBH. I think peer
is verbose enough.
control-plane/connect-inject/peering_acceptor_controller_test.go
Outdated
Show resolved
Hide resolved
if specSecret == nil { | ||
err = errors.New("PeeringDialer spec.peer.secret does not exist") | ||
r.updateStatusError(ctx, dialer, err) | ||
return ctrl.Result{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should getSecret
return an error for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, this step will move into a validation webhook. and trying to get a secret that does not exist is a valid use case for when we want to get the secret from the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 In this case though we're getting the secret from the spec not status. And the call above on line 88 gets the secret. Then here we're checking if secret is nil and returning an error, so I'm wondering why not combine these two calls into one and just make getSecret
return an error when the secret doesn't exist instead of returning an error here.
Maybe I'm misunderstanding something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah! sorry. I should have been specific. We use the method to get the status secret (L110) too. And if it is the first reconcile, it is ok if the secret doesn't exist, so it would be ok for getSecret to not return an error if the secret does not exist.
@ndhanushkodi will changelog be added separately or in this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A couple of follow-up comments, but won't block approval as everything looks great!
❤️ ❤️ ❤️
@@ -11,13 +11,13 @@ import ( | |||
// the metrics merging server when metrics merging feature is enabled. | |||
// It always disables service registration because for connect we no longer | |||
// need to keep services registered as this is handled in the endpoints-controller. | |||
func (h *Handler) consulSidecar(pod corev1.Pod) (corev1.Container, error) { | |||
metricsPorts, err := h.MetricsConfig.mergedMetricsServerConfiguration(pod) | |||
func (w *MeshWebhook) consulSidecar(pod corev1.Pod) (corev1.Container, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
if source.Peer != "" && source.Partition != "" { | ||
errs = append(errs, field.Invalid(path.Child("sources").Index(i), source, `Both source.peer and source.partition cannot be set.`)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes I do mean that! Thank you for reading my mind!
|
||
service := parts[0] | ||
|
||
pieces := strings.Split(service, ".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sounds good!
if err := r.deletePeering(ctx, req.Name); err != nil { | ||
return ctrl.Result{}, err | ||
} | ||
if acceptor.Secret().Backend == "kubernetes" { | ||
if err := r.deleteK8sSecret(ctx, acceptor); err != nil { | ||
return ctrl.Result{}, err | ||
} | ||
} | ||
controllerutil.RemoveFinalizer(acceptor, FinalizerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah that is surprising, but I suppose is a good thing for this case since this will just work.
if specSecret == nil { | ||
err = errors.New("PeeringDialer spec.peer.secret does not exist") | ||
r.updateStatusError(ctx, dialer, err) | ||
return ctrl.Result{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 In this case though we're getting the secret from the spec not status. And the call above on line 88 gets the secret. Then here we're checking if secret is nil and returning an error, so I'm wondering why not combine these two calls into one and just make getSecret
return an error when the secret doesn't exist instead of returning an error here.
Maybe I'm misunderstanding something.
|
||
// After reconciliation, Consul should have a peering with its DeletedAt set. | ||
peering, _, err := consulClient.Peerings().Read(context.Background(), "dialer-deleted", nil) | ||
require.NotNil(t, peering.DeletedAt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing the tests fail with a nil pointer exception here. I think its because the deletion logic was implemented in Consul, and the peering is sometimes being deleted before we can read it here. So I don't think we can make this assertion anymore. Maybe we can do a retry to check that the peering is nil? Since it's a deferred delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to build a newer Consul image. I ran into the same issue but turns out, I had an older Consul binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the new binaries for consul/consulent and saw the same issue in the pipeline over multiple runs with the new binaries. But locally with those same binaries I was seeing different results. I think it is the deferred deletion logic that was added since I added a retry that waits 5 seconds and retries asserting that the peering is nil instead of looking at the DeletedAt stamp, and now it doesn't intermittently fail with nil pointer exceptions. I'll try pushing it now and seeing if it passes in the pipeline. Edit: I see it pass the pipeline now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THE DEFERRED DELETION!!! ok ok. that makes sense. thank you so much Nitya! that was an excellent catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woot woot! lets get this puppy merged today!!
4b2ee85
to
46ba198
Compare
Adds new PeeringAcceptor CRD and controller with unit tests. This controller determines whether to generate a new peering token in Consul by reconciling the PeeringAcceptor resource. It currently supports writing the peering token to a K8s secret. Generating the new CRD and API: After updating control-plane/PROJECT L6 to reference the control plane folder, ran the command: `operator-sdk create api --group consul --version v1alpha1 --kind Peering --controller --namespaced=true --make=false --resource=true`
PeeringDialer CRD and controller, not including unit tests.
- Clean up some of the logic in peering dialler and acceptor - Rename handler -> connectWebhook
update source intentions with peer name and bump api module and change initiate -> establish
- Remove ownerRef from secret created by acceptor.
- when global.peering.enabled is true, and connectInject.enabled is true, enable the peering controllers and the peering CRDs.
- rename exported service CRD consumer.peerName field from peerName --> peer - rename ConnectWebhook --> MeshWebhook - add some missing unit test cases - only add permissions to connect-inject service account for peering CRs when peering is enabled
Changes proposed in this PR:
global.peering.enabled
which will deploy the controllers and CRs for peering, if connectInject.enabled is also true.How I've tested this PR:
unit tests and manual testing using steps here
How I expect reviewers to test this PR:
👀
Checklist: