Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bpo-28414] In SSL module, store server_hostname as an A-label #3010

Closed
wants to merge 3 commits into from

Conversation

njsmith
Copy link
Contributor

@njsmith njsmith commented Aug 6, 2017

Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:

  1. User passes server_hostname= to the SSLSocket/SSLObject
    constructor. This gets normalized to an A-label by using the
    PyArg_Parse "et" mode: bytes objects get passed through
    unchanged (assumed to already be A-labels); str objects get run
    through .encode("idna") to convert them into A-labels.

  2. newPySSLSocket takes this A-label, and for some reason decodes
    it back to a U-label, and stores that as the object's
    server_hostname attribute.

  3. Later, this U-label server_hostname attribute gets passed to
    match_hostname, to compare against the hostname seen in the
    certificate. But certificates contain A-labels, and match_hostname
    expects to be passed an A-label, so this doesn't work at all.

This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.

This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.

This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.

Fixes bpo-28414.

https://bugs.python.org/issue28414

Copy link
Member

@alex alex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

cc: @reapererhulk, @Lukasa

Modules/_ssl.c Outdated
@@ -637,8 +637,9 @@ newPySSLSocket(PySSLContext *sslctx, PySocketSockObject *sock,
self->owner = NULL;
self->server_hostname = NULL;
if (server_hostname != NULL) {
/* server_hostname was encoded to ascii by our caller */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should explain this in terms of a-labels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

@@ -0,0 +1,2 @@
The ssl module now correctly supports validating hostnames that contain non-
ASCII characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to refer to these as IDNs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that IDN is pretty jargony and this is the end-user-oriented news text. I made it "hostnames that contain non-ASCII characters (IDNs)".

if support.verbose:
sys.stdout.write("\n")

server_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is there a reason to use TLSv1 instead of SSLv23 (lol)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I just made it match all the other tests around it (aka this is blatantly copy/paste code).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have it be PROTOCOL_TLS, surely.

@njsmith
Copy link
Contributor Author

njsmith commented Aug 6, 2017

CC @reaperhulk

Copy link

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks reasonable to me. Of course, I don't have the commit bit, so my enthusiasm is almost worthless. 😉

if support.verbose:
sys.stdout.write("\n")

server_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have it be PROTOCOL_TLS, surely.

with context.wrap_socket(socket.socket(),
server_hostname="python.example.org") as s:
with self.assertRaises(ssl.CertificateError):
s.connect((HOST, server.port))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the CPython tests, but...it'd be really nice if this was four tests with common setup instead of one test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell you what, you port CPython to pytest, and I'll write a parametrized test :-). (Though honestly I don't really see what difference it makes.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a parametrised test: just have a method (or even a function!) that does all the setup and returns the wrapped socket.

As to "what difference it makes", the answer is that it makes it easier to isolate a test failure. For example, if this test fails with an assertion error that says Can't get peer certificate, what hostname failed? All of them? The only way to know is to re-run the tests under pdb and wait until the error is hit.

It also shadows failures: for example, if the test fails with one of the acceptable hostnames you don't know whether the functionality that fails on incorrect hostnames is still working, because that part of the test never runs.

Regardless, I wouldn't block the patch on this because it's a prevalent pattern: it's just a bad one. 😁

@njsmith
Copy link
Contributor Author

njsmith commented Aug 7, 2017

@Lukasa:

PROTOCOL_TLS

Fine, fine, all right, done. (And why won't github let me reply to your comment? The reply box is just missing. It Is a Mystery 😕)

@tiran
Copy link
Member

tiran commented Aug 7, 2017

I suggested the same solution in pyca/cryptography#3357 (comment) and in my WIP PEP. I'm glad you had the same idea. There is also the SNI callback. I like to change it to IDN A-Label, too.

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is backwards incompatible but the best option we have. Please update the documentation of SSLSocket.server_hostname and state that the variable now contains the hostname in ACE instead of IDN U-label. The whatsnew doc for 3.7 needs an entry with a incompatibility warning, too.

@njsmith
Copy link
Contributor Author

njsmith commented Aug 13, 2017

@tiran: done

@decaz
Copy link
Contributor

decaz commented Oct 16, 2017

@njsmith this PR is not abandoned, I hope? 😟

@njsmith
Copy link
Contributor Author

njsmith commented Oct 16, 2017

@decaz AFAIK I'm waiting for @tiran or @alex to make the next move :-).

I guess some conflicting change has been merged in the mean time. I don't understand the conflict in test_ssl.py at all. @tiran I guess this must be your change, does it mkae sense to you?

@tiran
Copy link
Member

tiran commented Oct 16, 2017

test_ssl has changed a lot because CPython now required threading. In the past code was indented one additional level.

Historically, our handling of international domain names (IDNs) in the
ssl module has been very broken. The flow went like:

1. User passes server_hostname= to the SSLSocket/SSLObject
   constructor. This gets normalized to an A-label by using the
   PyArg_Parse "et" mode: bytes objects get passed through
   unchanged (assumed to already be A-labels); str objects get run
   through .encode("idna") to convert them into A-labels.

2. newPySSLSocket takes this A-label, and for some reason decodes
   it *back* to a U-label, and stores that as the object's
   server_hostname attribute.

3. Later, this U-label server_hostname attribute gets passed to
   match_hostname, to compare against the hostname seen in the
   certificate. But certificates contain A-labels, and match_hostname
   expects to be passed an A-label, so this doesn't work at all.

This PR fixes the problem by removing the pointless decoding at step
2, so that internally we always use A-labels, which matches how
internet protocols are designed in general: A-labels are used
everywhere internally and on-the-wire, and U-labels are basically just
for user interfaces.

This also matches the general advice to handle encoding/decoding once
at the edges, though for backwards-compatibility we continue to use
'str' objects to store A-labels, even though they're now always
ASCII. Technically there is a minor compatibility break here: if a
user examines the .server_hostname attribute of an ssl-wrapped socket,
then previously they would have seen a U-label like "pythön.org", and
now they'll see an A-label like "xn--pythn-mua.org". But this only
affects non-ASCII domain names, which have never worked in the first
place, so it seems unlikely that anyone is relying on the old
behavior.

This PR also adds an end-to-end test for IDN hostname
validation. Previously there were no tests for this functionality.

Fixes bpo-28414.
@njsmith
Copy link
Contributor Author

njsmith commented Oct 17, 2017

Okay, I think I managed to rebase this. @tiran, want to hit merge?

Copy link
Contributor

@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for PR!
@tiran are there other things need to be fixed?
I love to have the patch in 3.7
Maybe we need a backport to 3.6 as well.
Asyncio for non-ascii SSL hosts is broken now :(

@asvetlov
Copy link
Contributor

asvetlov commented Nov 7, 2017

@tiran ping

@decaz
Copy link
Contributor

decaz commented Nov 22, 2017

@tiran ?

@asvetlov
Copy link
Contributor

@tiran please review.
The patch is quite important, I love to have it in Python 3.7 at least.
Backport to Python 3.6 would be awesome also.

@tiran
Copy link
Member

tiran commented Nov 30, 2017

@asvetlov I'll make sure a complete patch lands in 3.7. This is a) a breaking change and b) I like to fix all of the SSL module at once. The SNI callback is still broken. I'll drop a mail to python-dev as soon as I have some spare time.

@asvetlov
Copy link
Contributor

Thanks!

@tiran
Copy link
Member

tiran commented Dec 30, 2017

The patch does not work as I expected. Shouldn't sock.server_hostname return www.strasse.de instead of 'www.straße.de'`?

>>> import ssl, socket
>>> ctx = ssl.create_default_context()
>>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de')
>>> sock.server_hostname
'www.straße.de'

@njsmith
Copy link
Contributor Author

njsmith commented Dec 30, 2017 via email

@tiran
Copy link
Member

tiran commented Dec 30, 2017

Absolutely, I'm running 948bc68 on branch pr/3010:

$ ./python 
Python 3.7.0a3+ (heads/pr/3010:948bc687ec, Dec 30 2017, 22:48:25) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl, socket
>>> ctx = ssl.create_default_context()
>>> sock = ctx.wrap_socket(socket.socket(), server_hostname=u'www.pythön.de')
>>> sock.server_hostname
'www.pythön.de'
>>> sock = ctx.wrap_socket(socket.socket(), server_hostname=u'www.straße.de')
>>> sock.server_hostname
'www.straße.de'

@tiran
Copy link
Member

tiran commented Dec 30, 2017

You can trust me. I speak German, so I cannot be an evil man :)

@njsmith
Copy link
Contributor Author

njsmith commented Dec 31, 2017

Oh, blah, I see. It's a problem in the interaction between the Python and C level modules. I fixed the C level PySSLSocket, but ssl.SSLSocket is a different class entirely, implemented in Python. When an ssl.SSLSocket object is created, it sets self.server_hostname to whatever was passed in, so that's what you're seeing. The only thing this is used for is to later construct a PySSLSocket, and then after that everything's handled correctly by my code, so that's why we didn't notice.

ssl.SSLObject is also a Python class, but it just delegates its server_hostname field to the wrapped PySSLSocket, so it doesn't have this problem. However, we can't trivially do the same with ssl.SSLSocket, because it waits until the socket is connected before creating a PySSLSocket.

Two options occur to me:

  • Implement the same U-label-str → A-label-str normalization in ssl.SSLSocket.__init__. Not the prettiest, but pretty simple.

  • Instead of lazily allocating the PySSLSocket, always allocate it immediately in ssl.SSLSocket.__init__.

The latter is a larger change and riskier, but maybe also an independently valuable simplification?

I'm looking through the code trying to figure out what would change if we eagerly allocated the PySSLSocket. Concretely, this means ssl.SSLSocket._sslobj would always be initialized to a non-None value.

Most or all of the socket methods have explicit checks like: "if self._sslobj is None, delegate to the regular socket method, else, use SSL". So there's some idea that a ssl.SSLSocket could be used for both regular unencrypted communication and also encrypted communication. Except... these are methods like send and recv, which require a connected socket. And when you wrap a connected socket, or connect a wrapped socket, then self._sslobj gets unconditionally set to non-None. So AFAICT these checks are almost meaningless. There is one case where they could fire: if you set up an SSL connection, and then call ssl.SSLSocket.unwrap to cleanly close the SSL layer, and then keep using the socket object. In this case the docs are explicit that you cannot keep using the ssl.SSLSocket object: "The [socket returned by unwrap] should always be used for further communication with the other side of the connection, rather than the original socket." But if you ignored those docs and kept using it anyway, then I think that's the one case where you could end up depending on the if self._sslobj is None checks.

This should be extremely rare. In practice, unwrap is essentially useless, because why would you ever want to switch from encrypted back to plaintext. No mainstream protocols require it. Twisted doesn't even implement it. And even if you are using it, you're not supposed to keep using the SSLSocket afterwards anyway.

What else would change? I guess right now, you can wrap an unconnected socket, and then assign to the server_hostname attribute like wrapped_socket.server_hostname = whoops_changed_my_mind, and it works; switching to eager initialization would make that fail. (Note that this case is also problematic for my other proposed fix – if we put IDNA normalization in ssl.SSLSocket.__init__ then direct assignment to the attribute would skip it. Do we care?)

I guess right now you can wrap a UDP socket in an ssl.SSLSocket and then never initiate SSL at all, and it'll work? But this would be extremely perverse.

What do you think?

@tiran
Copy link
Member

tiran commented Dec 31, 2017

The SSLSocket / SSLObject relationship became complicated over the years. I don't like the fact that SSLObject has so many levels of indirection, see https://bugs.python.org/issue24334

We could also go for a variation of your first idea and perform IDNA encoding in Python code only. In this variant, C code accepts just ASCII text or ASCII bytes. I've to think about it and will get back to you next year.

@njsmith
Copy link
Contributor Author

njsmith commented Dec 31, 2017 via email

@tiran
Copy link
Member

tiran commented Dec 31, 2017

I've spent some time on the train to refactor code and to implement your second proposal. A first proof of concept implementation works mostly. I was able to get rid of SSLObject for socket-based I/O and to instantiate the SSL object directly.

I'm now hitting a road block: SSLSocket.close() and SSLSocket.__exit__ never called SSL_shutdown(SSL*). The methods only killed the socket fd forcefully. WIth correct shutdown, test_server_accept is hanging indefinitely. What a mess :/

@njsmith
Copy link
Contributor Author

njsmith commented Jan 1, 2018

Oh yeah, SSL shutdown is a whole other mess. It's not as simple as "the current behavior is buggy" – the semantics Python implements now are what HTTPS implementations universally use and expect, and if close and __exit__ become blocking operations then you need to think about timeouts, and if we're fixing ssl to send shutdown notifications correctly then should we also make it expect to receive them, and so on and so on. And this is behavior that people are definitely not expecting to change under their feet.

We definitely don't want to touch that in this branch, or in 3.7 – we should preserve the current behavior where close does a hard-shutdown, and you have to call unwrap to do a clean shutdown.

(If you're curious, I did come up with some kind of solution to these issues in Trio, see the docs on https_compatible, and my tests are set up accordingly etc.)

@tiran
Copy link
Member

tiran commented Jan 1, 2018

We have to touch the close part in order to fix session handling. I haven't looked into the I/O part of SSLSocket yet ... I'm now starting to regret that I ever did. I understand why sessions are partly broken with OpenSSL 1.1.0. A forceful close of I/O without any proper shutdown invalidates the session.

There are fourish ways to close a TLS connection

  1. bidirectional close, both sides wait for the close notifiy confirmation from the other side. bidirectional close is required to downgrade a connection from TLS/SSL to plain. AFAIK a bidirectional close needs to calls to SSL_shutdown.
  2. unidirectional close, one sides sends a close notification and then closes I/O.
  3. force close, one side just closes I/O without any notification or state change of its SSL object
  4. quiet close, like force close but SSL_CTX_set_quiet_shutdown or SSL_CTX_set_quiet_shutdown are used to properly modify the state of the SSL object before I/O is closed.

SSLSocket.__exit__ and SSLSocket.close currently use force close, which is is clearly the worst option. We cannot default to bidirectional close because that may cause issues with blocking connections or bad peers. unidirectional close may be good enough for HTTPS but I don't know if it may block if the send buffer is full.

How do you feel about a close_mode flag on SSLContext with modes quiet (default), unidirectional and bidirectional, with SSLSocket.unwrap() enforcing bidirectional shutdown?

@njsmith
Copy link
Contributor Author

njsmith commented Jan 2, 2018

@tiran Please make a separate issue/PR for any changes in close handling, it's orthogonal to IDNA support in server_hostname and deserves its own proper attention :-)

For right now, maybe I should just do the quick hack version so this can go in, and then we'll worry about the refactoring separately?

@tiran
Copy link
Member

tiran commented Jan 7, 2018

I have opened PR #5128, which addresses the issue of server_hostname attributes, the missing fix for the callback, and proper test cert.

@njsmith
Copy link
Contributor Author

njsmith commented Feb 24, 2018

Replaced by #5128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants