Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Add extra Secret + Password checks to KafkaConnect #3475

Closed
samuel-hawker opened this issue Aug 7, 2020 · 13 comments · Fixed by #5983
Closed

[Enhancement] Add extra Secret + Password checks to KafkaConnect #3475

samuel-hawker opened this issue Aug 7, 2020 · 13 comments · Fixed by #5983

Comments

@samuel-hawker
Copy link
Contributor

Is your feature request related to a problem? Please describe.

For Kafka Connect, one common problem I've seen is people putting the actual password in their KafkaConnect Custom Resource (instead of the name of the password field in the secret)
It'd be a great enhanement if the operator updated the status of the KafkaConnect CR when you do that, so it says "you asked for password 'blah' but there is no key 'blah' in your secret" or something helpful and friendly like that.

Similarly, there has been some confusion around certs. Where a user has been using a KafkaUser cert in spec.tls.trustedCertificates where the KafkaConnect spec was expecting a cluster ca cert
It wasn't at all obvious to the user why his connect cluster was broken.

If we could do anything to tell if we've been given the right type of cert, it'd be a huge usability improvement.

An examples:

kind: KafkaConnectS2I
metadata:
  name: my-connect-cluster
spec:
  # ...
  authentication:
    passwordSecret:
      password: my-password
      secretName: my-secret

Describe the solution you'd like
The solution should be if the operator can, at runtime infer any reasoning why it is not working, this should be propagated into the status of the custom resource.

Describe alternatives you've considered
The only other alternative is changing the format of the CR to be clearer the password is a key, i.e.

    passwordSecret:
     # key: my-password
     or
      #passwordKey: my-password
      secretName: my-secret

I think documentation and examples already exist for this that is more than sufficient, but sometimes a user may still make these mistakes

@samuel-hawker
Copy link
Contributor Author

Up for discussion how it would be implemented, but I think it best belongs in the status.conditions as an error or warning.

@ppatierno
Copy link
Member

Similarly, there has been some confusion around certs. Where a user has been using a KafkaUser cert in spec.tls.trustedCertificates where the KafkaConnect spec was expecting a cluster ca cert
It wasn't at all obvious to the user why his connect cluster was broken.

Tbh maybe it's just a lack of knowledge on how TLS works for people who make the mistake.
When you say "trusted certificates store" it should be clear that it's something you use for trusting what you are getting from the remote peer and you have to verify and trust.
In a TLS handshake, you always get a server certificate and you can trust it:

  • having the certificate used for signing the server certificate
  • having the server certificate itself

The above has to be in the trust store.
Why using a KafkaUser certificate when it's for client authentication and why they should think that it's a "trusted" cert?
I should not put anything about myself (a KafkaUser cert) ... I trust myself :-)

Anyway, what do you expect to have in the status? An error about TLS handshake?

@samuel-hawker
Copy link
Contributor Author

Tbh maybe it's just a lack of knowledge on how TLS works for people who make the mistake.

I think this is a very valid point, but on the other hand, I would like to aspire that as little security knolwedge needs to be known as possible to connect to your cluster (if that makes sense).

I think however the operator is capable of figuring out bad secrets from what I might define as red flags, for instance the operator doesn't even need to look at the data inside the secret to realize it is bad, i.e.
if a secret has the label: kafka.strimzi.io/kind: KafkaUser we can immediately inform them of their mistake.

I think also the operator has the liberty of being able to infer if the provided bootstrp belongs to a strimzi instance (i.e. via the name construction - if it suspects that it is a strimzi bootstrap, it could search the listeners in the Kafka object associated, and check that the certificate provided is correct either the default -cluster-ca-cert or the provided ca cert in the spec.

This logic would then lead to a warning being put in th status.conditions of the KafkaConnect if the cert smells wrong. Better yet, it could even suggest the correct configuration of the Strimzi Kafka cluster-ca

@samuel-hawker
Copy link
Contributor Author

Similarly for the password, the operator could simply check that the provided secret name contains the provided key.

@fjbecerra
Copy link
Contributor

hey, i'd like to give this a go, do you think is ready for dev? if so, assign to me please

@scholzj
Copy link
Member

scholzj commented Oct 7, 2021

I do not think anyone implemented this yet. But it might be good if @samuel-hawker helps to clarify what exactly should be implemented.

@samuel-hawker
Copy link
Contributor Author

I'm glad someone is picking this up!

I've been thinking on this, and normally the flow is:
User supplied secret reference via the CR
Operator reads the CR and propagates this reference to an underlying resource (normally a deployment or statefulset)

An error is normally only observed through noticing the deployment is erroring with cm/secret not found or something similar.
It would be nice to have a utility something like SecretReferenceValidator (and similar for CMs) that validates a user supplied secret reference + key are indeed present and existing.
Think of it as a basic sanity check equivalent of running kubectl get secret my-secret -ojsonpath={.data.my-key}

If we made it fairly generic, we would have the opportunity to implement it for other configurations, such as MetricsConfig etc in other CRs.

If the validator fails to find what its looking for, it passes back a user friendly error to the user saying, secret my-secret does not contain key my-key and individual callers could add additional context like this secret key pair is used for the truststore and is normally generated by doing the following https://strimzi.io/docs/operators/latest/using.html#security-str

Does that seem like a reasonable value add still?
It doesn't even need to error a reconcile, a warning may be sufficient in most cases.

@scholzj
Copy link
Member

scholzj commented Oct 7, 2021

Does that seem like a reasonable value add still?
It doesn't even need to error a reconcile, a warning may be sufficient in most cases.

The Secrets or ConfigMaps would normally cause something to fail if they are not valid ... either the Pods will be pending or some exceptions pops up in the operator. So maybe failing with better message is better.

@samuel-hawker Did you checked if something like this is already implemented elsewhere? I also wonder how muhc it conflicts with the work in #5549

@samuel-hawker
Copy link
Contributor Author

Did you check if something like this is already implemented elsewhere?

I have not, it would definitely be worth looking into.
As for #5549 I am unsure if this conflicts with it, in theory replacing certs would just be replacing keys in a pre-existing secret, or making a new secret, right?
If there is a conflict of interest here though we could always now apply validation to secret keys that we know might change.

@scholzj
Copy link
Member

scholzj commented Oct 7, 2021

As for #5549 I am unsure if this conflicts with it, in theory replacing certs would just be replacing keys in a pre-existing secret, or making a new secret, right?

That is correct. But at the same time, to collect the fingerprints it would need to query the secrets. That is where it intersects since we should make sure it doesn't for example query the secrets multiple times etc.

@samuel-hawker
Copy link
Contributor Author

That makes sense, I will look further into the fingerprints, and will do some searching around other operators and kube projects to see if a similar thing to this has already been done.

@fjbecerra
Copy link
Contributor

I think there's no need to do an extra query to secrets since that info it's already present while reconciling here .
However, by stepping into Util.authTlsHash(...) here , for both cases AuthenticationPlain & KafkaClientAuthenticationScramSha512, it checks whether the secret is present or not, however in the case when the secret is present, it doesn't check if the password key exists, but in the case of, it does exist, it's already in memory as secretOperator.getAsync(...) returns a Future<Secret>, where secret.getData() is a Map which contains the secret queried.
Maybe by checking if the password key exists in that Map, secret.getData(), a Failure could be returned mentioning both the secret & the secret key given, wdyt?

@scholzj
Copy link
Member

scholzj commented Dec 2, 2021

Maybe by checking if the password key exists in that Map, secret.getData(), a Failure could be returned mentioning both the secret & the secret key given, wdyt?

You mean from the getPasswordAsync? Yeah, I think that would work.

fjbecerra added a commit to fjbecerra/strimzi-kafka-operator that referenced this issue Dec 5, 2021
…3475)

Signed-off-by: Francisco Becerra <fran.bcra@gmail.com>
@scholzj scholzj linked a pull request Dec 5, 2021 that will close this issue
8 tasks
scholzj pushed a commit that referenced this issue Dec 6, 2021
…5983)

Signed-off-by: Francisco Becerra <fran.bcra@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants