-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bpo-28414] In SSL module, store server_hostname as an A-label #3010
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
cc: @reapererhulk, @Lukasa
Modules/_ssl.c
Outdated
@@ -637,8 +637,9 @@ newPySSLSocket(PySSLContext *sslctx, PySocketSockObject *sock, | |||
self->owner = NULL; | |||
self->server_hostname = NULL; | |||
if (server_hostname != NULL) { | |||
/* server_hostname was encoded to ascii by our caller */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment should explain this in terms of a-labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, done
@@ -0,0 +1,2 @@ | |||
The ssl module now correctly supports validating hostnames that contain non- | |||
ASCII characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to refer to these as IDNs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just that IDN is pretty jargony and this is the end-user-oriented news text. I made it "hostnames that contain non-ASCII characters (IDNs)".
Lib/test/test_ssl.py
Outdated
if support.verbose: | ||
sys.stdout.write("\n") | ||
|
||
server_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is there a reason to use TLSv1
instead of SSLv23
(lol)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I just made it match all the other tests around it (aka this is blatantly copy/paste code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have it be PROTOCOL_TLS
, surely.
CC @reaperhulk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks reasonable to me. Of course, I don't have the commit bit, so my enthusiasm is almost worthless. 😉
Lib/test/test_ssl.py
Outdated
if support.verbose: | ||
sys.stdout.write("\n") | ||
|
||
server_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have it be PROTOCOL_TLS
, surely.
Lib/test/test_ssl.py
Outdated
with context.wrap_socket(socket.socket(), | ||
server_hostname="python.example.org") as s: | ||
with self.assertRaises(ssl.CertificateError): | ||
s.connect((HOST, server.port)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't speak to the CPython tests, but...it'd be really nice if this was four tests with common setup instead of one test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tell you what, you port CPython to pytest, and I'll write a parametrized test :-). (Though honestly I don't really see what difference it makes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need a parametrised test: just have a method (or even a function!) that does all the setup and returns the wrapped socket.
As to "what difference it makes", the answer is that it makes it easier to isolate a test failure. For example, if this test fails with an assertion error that says Can't get peer certificate
, what hostname failed? All of them? The only way to know is to re-run the tests under pdb and wait until the error is hit.
It also shadows failures: for example, if the test fails with one of the acceptable hostnames you don't know whether the functionality that fails on incorrect hostnames is still working, because that part of the test never runs.
Regardless, I wouldn't block the patch on this because it's a prevalent pattern: it's just a bad one. 😁
Fine, fine, all right, done. (And why won't github let me reply to your comment? The reply box is just missing. It Is a Mystery 😕) |
I suggested the same solution in pyca/cryptography#3357 (comment) and in my WIP PEP. I'm glad you had the same idea. There is also the SNI callback. I like to change it to IDN A-Label, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is backwards incompatible but the best option we have. Please update the documentation of SSLSocket.server_hostname and state that the variable now contains the hostname in ACE instead of IDN U-label. The whatsnew doc for 3.7 needs an entry with a incompatibility warning, too.
@tiran: done |
@njsmith this PR is not abandoned, I hope? 😟 |
test_ssl has changed a lot because CPython now required threading. In the past code was indented one additional level. |
Historically, our handling of international domain names (IDNs) in the ssl module has been very broken. The flow went like: 1. User passes server_hostname= to the SSLSocket/SSLObject constructor. This gets normalized to an A-label by using the PyArg_Parse "et" mode: bytes objects get passed through unchanged (assumed to already be A-labels); str objects get run through .encode("idna") to convert them into A-labels. 2. newPySSLSocket takes this A-label, and for some reason decodes it *back* to a U-label, and stores that as the object's server_hostname attribute. 3. Later, this U-label server_hostname attribute gets passed to match_hostname, to compare against the hostname seen in the certificate. But certificates contain A-labels, and match_hostname expects to be passed an A-label, so this doesn't work at all. This PR fixes the problem by removing the pointless decoding at step 2, so that internally we always use A-labels, which matches how internet protocols are designed in general: A-labels are used everywhere internally and on-the-wire, and U-labels are basically just for user interfaces. This also matches the general advice to handle encoding/decoding once at the edges, though for backwards-compatibility we continue to use 'str' objects to store A-labels, even though they're now always ASCII. Technically there is a minor compatibility break here: if a user examines the .server_hostname attribute of an ssl-wrapped socket, then previously they would have seen a U-label like "pythön.org", and now they'll see an A-label like "xn--pythn-mua.org". But this only affects non-ASCII domain names, which have never worked in the first place, so it seems unlikely that anyone is relying on the old behavior. This PR also adds an end-to-end test for IDN hostname validation. Previously there were no tests for this functionality. Fixes bpo-28414.
Okay, I think I managed to rebase this. @tiran, want to hit merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for PR!
@tiran are there other things need to be fixed?
I love to have the patch in 3.7
Maybe we need a backport to 3.6 as well.
Asyncio for non-ascii SSL hosts is broken now :(
@tiran ping |
@tiran ? |
@tiran please review. |
@asvetlov I'll make sure a complete patch lands in 3.7. This is a) a breaking change and b) I like to fix all of the SSL module at once. The SNI callback is still broken. I'll drop a mail to python-dev as soon as I have some spare time. |
Thanks! |
The patch does not work as I expected. Shouldn't
|
I just looked at the code again, and I can't see how that could happen...
Are you positive you're running the right branch?
…On Dec 30, 2017 5:30 AM, "Christian Heimes" ***@***.***> wrote:
The patch does not work as I expected. Shouldn't sock.server_hostname
return www.strasse.de instead of 'www.straße.de <http://www.strasse.de>'`?
>>> import ssl, socket
>>> ctx = ssl.create_default_context()
>>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de <http://www.strasse.de>')
>>> sock.server_hostname
'www.straße.de <http://www.strasse.de>'
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3010 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAlOaG6Zvr1nIbKGK0teXWevZsuG8VrUks5tFjrhgaJpZM4Ou2c2>
.
|
Absolutely, I'm running 948bc68 on branch pr/3010:
|
You can trust me. I speak German, so I cannot be an evil man :) |
Oh, blah, I see. It's a problem in the interaction between the Python and C level modules. I fixed the C level
Two options occur to me:
The latter is a larger change and riskier, but maybe also an independently valuable simplification? I'm looking through the code trying to figure out what would change if we eagerly allocated the Most or all of the socket methods have explicit checks like: "if This should be extremely rare. In practice, What else would change? I guess right now, you can wrap an unconnected socket, and then assign to the I guess right now you can wrap a UDP socket in an What do you think? |
The SSLSocket / SSLObject relationship became complicated over the years. I don't like the fact that SSLObject has so many levels of indirection, see https://bugs.python.org/issue24334 We could also go for a variation of your first idea and perform IDNA encoding in Python code only. In this variant, C code accepts just ASCII text or ASCII bytes. I've to think about it and will get back to you next year. |
I don't think doing it all in Python would be simpler. We'd still have to
do it twice (once for SSLObject and once for SSLSocket), and then we'd need
some additional hoop-jumping in the C code to make sure we were working
with an ascii-only Unicode string.
…On Dec 31, 2017 3:21 AM, "Christian Heimes" ***@***.***> wrote:
The SSLSocket / SSLObject relationship became complicated over the years.
I don't like the fact that SSLObject has so many levels of indirection, see
https://bugs.python.org/issue24334
We could also go for a variation of your first idea and perform IDNA
encoding in Python code *only*. In this variant, C code accepts just
ASCII text or ASCII bytes. I've to think about it and will get back to you
next year.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3010 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAlOaFmUPafjJKn3hXc8O_dP-H2XXcprks5tF24zgaJpZM4Ou2c2>
.
|
I've spent some time on the train to refactor code and to implement your second proposal. A first proof of concept implementation works mostly. I was able to get rid of SSLObject for socket-based I/O and to instantiate the SSL object directly. I'm now hitting a road block: |
Oh yeah, SSL shutdown is a whole other mess. It's not as simple as "the current behavior is buggy" – the semantics Python implements now are what HTTPS implementations universally use and expect, and if We definitely don't want to touch that in this branch, or in 3.7 – we should preserve the current behavior where (If you're curious, I did come up with some kind of solution to these issues in Trio, see the docs on |
We have to touch the close part in order to fix session handling. I haven't looked into the I/O part of SSLSocket yet ... I'm now starting to regret that I ever did. I understand why sessions are partly broken with OpenSSL 1.1.0. A forceful close of I/O without any proper shutdown invalidates the session. There are fourish ways to close a TLS connection
How do you feel about a |
@tiran Please make a separate issue/PR for any changes in close handling, it's orthogonal to IDNA support in server_hostname and deserves its own proper attention :-) For right now, maybe I should just do the quick hack version so this can go in, and then we'll worry about the refactoring separately? |
I have opened PR #5128, which addresses the issue of |
Replaced by #5128 |
Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:
User passes server_hostname= to the SSLSocket/SSLObject
constructor. This gets normalized to an A-label by using the
PyArg_Parse "et" mode: bytes objects get passed through
unchanged (assumed to already be A-labels); str objects get run
through .encode("idna") to convert them into A-labels.
newPySSLSocket takes this A-label, and for some reason decodes
it back to a U-label, and stores that as the object's
server_hostname attribute.
Later, this U-label server_hostname attribute gets passed to
match_hostname, to compare against the hostname seen in the
certificate. But certificates contain A-labels, and match_hostname
expects to be passed an A-label, so this doesn't work at all.
This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.
This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.
This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.
Fixes bpo-28414.
https://bugs.python.org/issue28414