-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New PEP 546: Backport MemoryBIO to Python 2.7 #272
Conversation
cc @Lukasa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small notes; obviously I'm 👍 on the idea :-)
pep-0546.txt
Outdated
For practical reasons, Cory Benfield would like to first implement an | ||
I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for the | ||
:pep:`543`, and provide a second class based on the first one to use | ||
sockets or file descriptors. This design would help to factorize the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "factorize" is a word; maybe just "help to structure the code" :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't blame. I don't know well the whole project, I only tried to make my small contribution! ;-)
I tried to summarize Cory's big plan. I asked him to review my PEP and proposed him to even rewrite it from scratch if needed. I'm not sure of the rationale of writing a first I/O-less class.
Obvious, I'm open to any kind of changes on this PEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pep-0546.txt
Outdated
|
||
While Python 2.7 is getting closer to its end-of-line (scheduled for | ||
2020), it is still used on production and the Python community is still | ||
responsible for its security. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to facilitate the future adoption of PEP 543, which will improve security for Python3 users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
production on top of Python 2.7. To make the new TLS API more widely | ||
used, it should be usable on all Python versions currently supported: | ||
Python 2.7, 3.5, 3.6. Otherwise, some applications would have to wait | ||
until they drop Python 2 support to be able to use the new TLS API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to connect all the dots: delaying adoption of the PEP 543 API means delaying the adoption for security improvements for Python3 users as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
The code will be backported and adapted from the master branch | ||
(Python 3). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be too in the weeds for a PEP, but when I worked on this in 2014, it also significantly reduced the size of the Python2/Python3 diff of the _ssl
module, which I would expect to make maintenance easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pep-0546.txt
Outdated
that it would have to embed a copy of pyOpenSSL. That would imply | ||
usability pain to install pip. Currently, pip doesn't support embedding | ||
C extensions which must be compiled on each platform and so require a C | ||
compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This chain goes further: CPython embeds a copy of pip both for default installation and for use in virtual environments, so if pip ends up bundling PyOpenSSL, then CPython will end up bundling PyOpenSSL, and at that point it makes sense to ask "Err, maybe we should just make the necessary changes in 2.7 itself and skip the whole PyOpenSSL bundling idea for sufficiently recent Python versions?"
I think that's really the heart of the argument - Python 2.7 is going to end up shipping this feature anyway, and it's going to be significantly easier to maintain as a security related backport than it is as a full bundled copy of PyOpenSSL.
While I don't think it should be a determining factor in resolving the PEP, I do think it's worth noting that while redistributors will be able to handle a security backport relatively easily (this will be significantly less intrusive than PEP 466 or 476, and we managed to cope with those, albeit with the PEP 493 amendments), outright bunding of PyOpenSSL would be a genuine pain to handle sensibly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added something to explain the ensurepip issue, I tried to summarize your comment ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments inline, but the substantive requests are to backport from 3.6 instead of the unreleased development branch, and to explicitly reference PEP 466 (and quote the policy statement in that PEP that makes this one necessary).
Python 2.7. | ||
|
||
The code will be backported and adapted from the master branch | ||
(Python 3). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to backport from the Python 3.6 maintenance branch rather than from the development branch - that way it's a true backport of a released version, rather than potentially including code that hasn't previously been published in a stable release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the code to be exactly the same. I don't think that MemoryBIO or SSLObject changed much since Python 3.5.
I prefer to backport from master to ease comparison of 2.7 and master branches, to easy re-sync later. As explained in another paragraph: reduce the diff between these two branches.
pep-0546.txt
Outdated
While Python 2.7 is getting closer to its end-of-line (scheduled for | ||
2020), it is still used on production and the Python community is still | ||
responsible for its security. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding a section about PEP 466 here, starting with the statement that it synchronised the Python 2.7 SSL module with the Python 3.4 implementation, but included the following limitation on the precedent it set:
This PEP does NOT propose a general exception for backporting new features to Python 2.7 - every new feature proposed for backporting will still need to be justified independently. In particular, it will need to be explained why relying on an independently updated backport on the Python Package Index instead is not an acceptable solution.
Then PEP 543, the potential dependency on PyOpenSSL from pip
(by way of requests
), and the impracticality of pip
(and hence CPython) bundling PyOpenSSL become the rationale for why the PyOpenSSL backport isn't adequate in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
implementation will be based on the ``ssl`` module of the Python | ||
standard library. | ||
|
||
In a perfect world, all applications would already run on Python 3 since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this paragraph is necessary - wanting pip
, which we ship as part of CPython 2.7, to be able to reliably access the system provided TLS APIs, which we want it to do by way of PEP 543, is the real reason we want to update the standard library instead of just relying on PyOpenSSL.
Relying on PyOpenSSL used to be major problem due to the difficulty of building it on Windows and Mac OS X, but the introduction of the wheel format largely addressed that problem (you still need to build the underlying cryptography
library for *nix systems, but that isn't that difficult).
On second thoughts, I don't think my review comments should block publication, so just treat them as potential improvements to consider :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 👍 on this. Having a MemoryBIO backport to 2.7 will greatly ease the work required for PEP 543.
--------------------------- | ||
|
||
There are plans afoot to look at moving Requests to a more event-loop-y | ||
model, and doing so basically mandates a MemoryBIO. In the absence of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tornado has been doing TLS in an event-loop model in python 2.5+ with just wrap_socket
, no MemoryBIO
necessary. What am I missing? MemoryBIO
certainly gives some extra flexibility, but nothing I can see that's strictly required for an HTTP client. (Maybe it comes up in some proxy scenarios that Tornado hasn't implemented?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa can maybe reply to this question. In my experience, on Windows, you really want to use IOCP rather than select() to implement an event loop, and you need MemoryBIO for IOCP. (Hum, but you also need C code to access to IOCP.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdarnell So the short answer is that wrap_socket interferes awkwardly with event loop management. A wrapped socket does not respond to selecting like a regular socket does, in the following ways:
- A wrapped socket that select considers readable may not be.
- A wrapped socket that select considers writable may not be.
- A wrapped socket that select does not consider readable may be.
Essentially, for all selecting models that use level-triggering as their approach, a wrapped socket behaves very strangely. It's at best an edge-triggered object (due to point 3), but even then it's an edge triggered object that may consistently refuse to behave the way you want it to due to the fact that triggering the socket into either readable or writable state may still prevent you from reading or writing any data at all.
The MemoryBIO object gives you much more predictable behaviour because it doesn't intercept socket calls. When the FD is marked readable, it really is: there just may be no data to transfer further up the chain. This means that the event loop doesn't dirty its hands with special knowledge about the way TLS works and handle all of the wacky TLS edge case behaviours.
Is it possible to write an event loop with just a wrapped socket? Sure. But the MemoryBIO provides a much more reasonable interface to do so. Most notably, Twisted does not use the wrapped socket approach any longer and I wouldn't propose that they should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the answer @Lukasa :-) IMHO it's worth it to include this answer into the PEP since it was a very good question :-) (For example, I was unable to find the proper answer.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also note that @Haypo's concerns about alternative styles of event loop also come into play here and are worth including. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, Twisted used to implement TLS via the wrap_socket-esque approach (since this is what OpenSSL strongly encourages you to do). Despite extensive test coverage and plenty of real-world usage, it never really worked right, and when we finally managed to switch over to the in-memory BIO model, dozens of bugs were fixed overnight, the whole system got considerably more reliable.
These edge cases manifest most significantly when using embedded systems, low-spec hardware (think raspberry pi), or weird operating systems. I do still occasionally experience weird flakiness when using Tornado event loops on this kind of hardware that doesn't happen with Twisted, and tellingly, doesn't happen with Twisted's TLS support using tornado.platform.twisted
. I can't prove that there are bugs (the bugs tend to be extremely squirrely race-conditions that are hard to reproduce in-vitro) but I would definitely suspect there are some there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, while this PEP doesn't mention it: wrap_socket doesn't work with pipes, and it's helpful to be able to speak wire protocols over UNIX pipes (or other non-socket transports) between processes for things like multi-process parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it support Unix domain sockets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I agree with @bdarnell that the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO" is basically wrong.
Besides, the idea that future directions for Requests should guide our 2.7 backport strategy also sounds entirely bogus.
@glyph "For what it's worth, Twisted used to implement TLS via the wrap_socket-esque approach (since this is what OpenSSL strongly encourages you to do). Despite extensive test coverage and plenty of real-world usage, it never really worked right, (...)" Well, I never understood the concept of the "readiness" of a SSLSocket. It's different of readable low-level socket. "Readable" at SSL level may mean "need to write". I was always confused by that :-) For an event loop, I like having MemoryBIO layer to have a clear split between I/O and SSL. Mixing I/O and SSL as does SSLSocket leads to confusion and pain. |
Antoine:
Can you please elaborate? Do you mean that you are convinced by what @Lukasa and @glyph experience with wrap_socket()?
Can you please elaborate? Do you mean that the indirection relation between requests dependencies and ensurepip "BLOB" is wrong? Or do you suggest to Requests devs to not take the event-loop road? Or do you mean that it's possible to implement TLS on Python 2.7 without MemoryBIO? Or do you mean that Requests can use MemoryBIO from a third-party module (or even include an implementation)? In term of performance, I like the idea of using an event loop for network requests like HTTP. So I understand why Requests wants to use an event loop. While @Lukasa can probably build something "working" using the existing Python 2.7 without MemoryBIO, I expect a lot of painful corner cases. If the PEP is rejected, IMHO the most reliable option is to use blocking code on Python 2, and only use an event loop on Python 3 when all required TLS features are available in the stdlib. Maybe Requests might use an event loop on Python 2 if all requests are done in clear text... but mixing blocking and non-blocking code is painful :-( |
General-comment: I would prefer that this discussion occurs on the python-dev mailing list ;-) |
Ditto.
I'm not sure what @Lukasa experienced with wrap_socket(). I trust @glyph that there may be delicate issues that are non-trivial to solve in a full-fledged way. But there are tons of fragile or delicate things in 2.7, yet we don't backport entire subsystems from 3.x in order to solve that.
I'd like to see benchmarks of an event loop for remote HTTP requests against a simple pool of threads, because I'm skeptical that would make much of a difference.
Requests can do whatever it wants as long as it doesn't try to coerce python-dev into questionable maintenance decisions. |
@bdarnell Thanks for adding that to python-dev btw! ❤️ |
No description provided.