Add security layer negotiation to the GSSAPI authentication. #1283
Conversation
When trying to establish a connection with Kafka using SASL with the GSSAPI authentication mechanism, the connection was hanging and timing out after 60 seconds. On the Kafka broker side I noticed that the SaslServerAuthenticator was going from the AUTHENTICATE to the FAILED state. The GSSAPI auth implementation was missing the second handshake defined in RFC 2222, which happens after the security context is established. This handshake is used by the client and server to negotiate the security layer (QoP) to be used for the connection. Kafka currently only supports the "auth" QoP, so the implementation in this commit doesn't make it configurable, but this can be extended later. With this change I was able to successfully connect to a Kerberos-enabled Kafka broker using the SASL_PLAINTEXT protocol and the GSSAPI mechanism.
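For readers unfamiliar with the RFC 2222 exchange described above, the final negotiation step can be sketched at the byte level like this (a simplified illustration, not the PR's actual code; the GSSAPI wrap/unwrap of the tokens is omitted, and the constant names are assumptions):

```python
import struct

# Per RFC 2222 section 7.2: after the GSSAPI security context is established,
# the server sends a (wrapped) 4-byte challenge: one octet of supported QoP
# flags, followed by a 3-byte maximum message size.
SASL_QOP_AUTH = 0x01       # authentication only (all Kafka supports)
SASL_QOP_AUTH_INT = 0x02   # authentication + integrity
SASL_QOP_AUTH_CONF = 0x04  # authentication + confidentiality

def build_qop_response(server_payload):
    """Select the 'auth' QoP and echo the server's proposed max size back."""
    chosen = server_payload[0] & SASL_QOP_AUTH
    return struct.pack('>B', chosen) + server_payload[1:4]
```

For example, if the server offers all three QoP levels (`0x07`) with some max size, the client replies with just `0x01` and the same max size, which is what unblocks the broker's `SaslServerAuthenticator`.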
Thanks so much for the PR. I'm a little wary of changes to the raw protocol parser, but will let tests pass and then see if it impacts benchmark performance. Do you have any quick docs that we could add re: setting up a client for a GSSAPI connection? Also, any suggestions re: putting additional configuration in kafka-python, like keytab etc.? I assume that you are setting these externally via environment variables or system defaults?
No worries. I understand your concerns about changing this at such a low level. Nevertheless, without the change I could not connect to a Kerberos-enabled broker over SASL_PLAINTEXT using GSSAPI. The connection would hang forever because the broker would keep waiting for the QoP negotiation. How have you tested the GSSAPI authentication? Have you ever run into this situation? It would be good to understand the environment where it was tested before, to see if I could reproduce it in my own cluster. BTW, the cluster I'm testing this on is running the Cloudera Distribution of Apache Kafka 2.1.1, which is based on Kafka 0.10.0.0 with a bunch of fixes on top of it. All the changes I made were derived directly from RFC 2222. I wouldn't call that a quick doc, though :) All we need for this change, though, is explained in sections 7.2.[1-3], and it's quite well explained there. Let me know if I can help make it clearer.
Oops, sorry, I misunderstood your question about the documentation.
@dpkp, here's what I used to test the connections before and after my changes:

```python
from kafka import KafkaProducer

params = ...
producer = KafkaProducer(**params)
msg = 'Hello, World!'
producer.send(cfg['topic'], bytes(msg))
producer.flush()
producer.close()
```

The only thing that needs to change between PLAIN and SASL authentication is the set of parameters passed to the producer. Without SASL, I used these parameters:

```python
params = {
    'bootstrap_servers': ['broker1:9092', 'broker2:9092'],
    'security_protocol': 'PLAINTEXT',
}
```

To connect to a broker using SASL, the parameters need to be changed to this:

```python
params = {
    'bootstrap_servers': ['broker1:9092', 'broker2:9092'],
    'security_protocol': 'SASL_PLAINTEXT',
    'sasl_mechanism': 'GSSAPI',
}
```

To authenticate successfully with the broker, the user needs to acquire a valid Kerberos ticket. Both of the methods below work fine with the Python example:
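The methods themselves were lost above; for illustration, the two usual ways of acquiring a ticket with MIT Kerberos look like this (the principal and keytab path are placeholders, not the author's values):

```shell
# Interactively, with a password:
kinit user@EXAMPLE.COM

# Non-interactively, with a keytab (e.g. from cron or a service account):
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

# Verify the ticket was acquired:
klist
```

Either way, the Python client then picks up the ticket from the default credential cache without any extra configuration.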
Adding a keytab option to kafka-python would be a plus but not essential, since this can be easily worked around as per the above. Currently the QoP level set by this patch is hard-coded to 'auth', since that's the only one supported by Kafka. It could easily be parameterized later when/if Kafka is extended to support other levels. I'm not sure if this is on the roadmap, though, since Kafka already supports SASL_SSL, which provides integrity and confidentiality. For reference, this is my SASL-enabled broker configuration:
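The configuration itself is elided above; for illustration, a minimal SASL_PLAINTEXT/GSSAPI broker setup typically includes properties like these (hostnames and the service name are placeholders, not the author's actual values):

```properties
# server.properties (illustrative, not the author's config)
listeners=SASL_PLAINTEXT://broker1:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.enabled.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka
```

The broker additionally needs a JAAS configuration pointing at its keytab, usually supplied via `-Djava.security.auth.login.config`.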
HTH
I saw the build error above and re-ran the tests on my own machine, and they all passed:
Would you know what happened to the automated build?
Hold on a bit on this PR. I'm looking at adding some support for SASL integration tests. I noticed that all tests run in PLAINTEXT, and some changes are needed to allow SASL tests.
Don't mind the pypy builds, they do fail from time to time, we just restart them. You can see the errors on Travis by clicking on the failed check.
Test coverage for SASL would be greatly appreciated! The original support was a community submission, which we decided to accept w/o tests, hoping that others might find it useful and eventually add test coverage.
```python
try:
    if type(data) is not str:
```
What about `if not isinstance(data, str)` rather than limiting yourself with `type()`?
Yep. I'll change that. Thanks!
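For readers following along, the difference the reviewer is pointing at can be shown in a couple of lines:

```python
# type() does an exact-type check, so it rejects subclasses that
# isinstance() correctly accepts.
class MyStr(str):
    pass

s = MyStr('hello')
print(type(s) is str)      # False: s is a MyStr, not exactly a str
print(isinstance(s, str))  # True: MyStr is a subclass of str
```

This matters for protocol code, since callers may legitimately pass str subclasses.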
Hi, guys. In regards to the GSSAPI integration tests, should André …
The structure so far has been that …
@asdaraujo Given that we use … And again, thank you for being willing to put some time into these, it is very much appreciated!
Thanks, @dpkp and @jeffwidman.

I have a few thoughts/questions about this:

**About the Kerberos build**

The Kerberos "fixture" differs a bit from Kafka in that it needs to be built for the local architecture. I was leaning towards having … The alternative would be to download the Kerberos source from MIT and build it as required, but I think this could be overkill. Assuming that the integration tests will always run on RedHat or Debian OSes, having Kerberos installed through packages should suffice.

**Kerberos instances and configuration**

I would like to avoid having multiple Kerberos fixtures running at once on a single server using different ports and configurations. This is very unusual and could lead to unforeseen issues. It also seems unnecessary to me to start/stop the Kerberos KDC server before and after every test. I believe it's better to have a single Kerberos realm set up and used for all the integration tests.

The approach that I had in mind was to complete the Kerberos configuration and startup in the … The Kerberos server and keytab would then be configured and started only once. When the tests are launched, if the …

A Python fixture could still be used to start/stop Kerberos, but I'm not sure how much value that would add. I'd be inclined to let the OS manage the service, but if you think it's better to have a fixture to spawn/stop it, I can look into it. In that case, I wonder if it would be better to build it from source to ensure the installation prefix is controlled and not OS-dependent.
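For context, the one-time realm setup being discussed boils down to something like the following MIT Kerberos commands (the realm name, principals, and paths are placeholders for illustration):

```shell
# Create the test realm database (assumes MIT Kerberos installed via OS packages)
kdb5_util create -r KAFKA.TEST -s -P masterpassword

# Start the KDC
krb5kdc

# Create principals for the broker and a client, and export their keytabs
kadmin.local -q "addprinc -randkey kafka/broker1.kafka.test@KAFKA.TEST"
kadmin.local -q "ktadd -k /tmp/kafka.keytab kafka/broker1.kafka.test@KAFKA.TEST"
```

Doing this once per CI run, rather than per test, is what the single-realm approach above would amount to.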
Yeah, I've seen that. I found some inconsistency in the integration tests. Not sure if that was intentional or not. Many of the tests use fixtures, but most of the existing integration tests are based on TestCase classes instead, setting up and tearing down fixtures explicitly. I've decided to break down my PR into two: some changes to tests to pave the way for the GSSAPI tests, and then the GSSAPI changes (this PR). I'm currently working on the improvements to the integration tests. However, since most of the integration tests were using TestCase classes, I kept that convention and extended from there. Let me know your thoughts.
Re: TestCase, IMHO, a lot of that is just legacy cruft from before we used pytest. Personally, I would rather standardize on pytest fixtures. I've been meaning to migrate the TestCase classes over to pytest-style fixtures, but decided to wait because many of the old SimpleClient/SimpleConsumer/SimpleProducer tests use TestCase, and we are intending to remove those altogether (#1193 / #1196), so there's no sense migrating them. But that in turn is held up by #633, which probably requires writing an AdminClient (#935) first. None of these are absolute blockers of each other in any way; it's just the most efficient order to tackle all of these things without doing extra work.
Thanks, @jeffwidman, I'll take that into account. Let me know your thoughts about the approach for the Kerberos use.
I'd love to get this GSSAPI fix merged before the next release.
@dpkp I was holding off on this because I wanted to first introduce SASL coverage to the integration tests (#1293). I did test SASL connections with my cluster successfully, but the integration tests all run with PLAINTEXT and don't cover SASL at all. Just trying to be on the safe side. If you want to go ahead with this before the tests, I can finish and push the minor changes mentioned above. Let me know what you prefer.
```python
# Kafka currently doesn't support integrity or confidentiality security layers, so we
# simply set QoP to 'auth' only (first octet). We reuse the max message size proposed
# by the server
msg = Int8.encode(SASL_QOP_AUTH & Int8.decode(msg[0])) + msg[1:]
```
my only concern is that this single call to `decode()` is requiring a very big change to the low-level protocol decoding that will touch everything. And while I agree that the protocol decoding probably should not require BytesIO (or some other stream), I am not excited about including that protocol change here. How about `Int8.decode(BytesIO(msg[0]))`?
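To make the suggestion concrete, here is a minimal sketch of a stream-based `Int8` in the style of kafka-python's protocol types (the shape is assumed for illustration, not copied from the library):

```python
import struct
from io import BytesIO

class Int8:
    @staticmethod
    def encode(value):
        # '>b' = big-endian signed 8-bit integer
        return struct.pack('>b', value)

    @staticmethod
    def decode(data):
        # expects a stream such as BytesIO, matching the reviewer's suggestion
        # of wrapping a single byte: Int8.decode(BytesIO(msg[0:1]))
        return struct.unpack('>b', data.read(1))[0]
```

For example, `Int8.decode(BytesIO(b'\x07'))` yields `7`, so the existing stream-based decoder can be reused without changing the whole protocol layer.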
I still haven't been able to get a functioning Kerberos auth setup locally, but given that the current code is broken I would like to merge these changes as-is. The one change I'm going to make is to revert the protocol decoding changes in favor of a smaller change in the place we need it.
Reverted protocol changes and merged as 4cfeaca