Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New PEP 546: Backport MemoryBIO to Python 2.7 #272

Merged
merged 4 commits into from
May 30, 2017
Merged

New PEP 546: Backport MemoryBIO to Python 2.7 #272

merged 4 commits into from
May 30, 2017

Conversation

vstinner
Copy link
Member

No description provided.

@vstinner
Copy link
Member Author

cc @Lukasa

@vstinner vstinner requested a review from ncoghlan May 30, 2017 11:48
@vstinner
Copy link
Member Author

cc @njsmith @1st1

Copy link
Member

@alex alex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small notes; obviously I'm 👍 on the idea :-)

pep-0546.txt Outdated
For practical reasons, Cory Benfield would like to first implement an
I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for the
:pep:`543`, and provide a second class based on the first one to use
sockets or file descriptors. This design would help to factorize the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "factorize" is a word; maybe just "help to structure the code" :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't blame. I don't know well the whole project, I only tried to make my small contribution! ;-)

I tried to summarize Cory's big plan. I asked him to review my PEP and proposed him to even rewrite it from scratch if needed. I'm not sure of the rationale of writing a first I/O-less class.

Obvious, I'm open to any kind of changes on this PEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pep-0546.txt Outdated

While Python 2.7 is getting closer to its end-of-line (scheduled for
2020), it is still used on production and the Python community is still
responsible for its security.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to facilitate the future adoption of PEP 543, which will improve security for Python3 users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

production on top of Python 2.7. To make the new TLS API more widely
used, it should be usable on all Python versions currently supported:
Python 2.7, 3.5, 3.6. Otherwise, some applications would have to wait
until they drop Python 2 support to be able to use the new TLS API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to connect all the dots: delaying adoption of the PEP 543 API means delaying the adoption for security improvements for Python3 users as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


The code will be backported and adapted from the master branch
(Python 3).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be too in the weeds for a PEP, but when I worked on this in 2014, it also significantly reduced the size of the Python2/Python3 diff of the _ssl module, which I would expect to make maintenance easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pep-0546.txt Outdated
that it would have to embed a copy of pyOpenSSL. That would imply
usability pain to install pip. Currently, pip doesn't support embedding
C extensions which must be compiled on each platform and so require a C
compiler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chain goes further: CPython embeds a copy of pip both for default installation and for use in virtual environments, so if pip ends up bundling PyOpenSSL, then CPython will end up bundling PyOpenSSL, and at that point it makes sense to ask "Err, maybe we should just make the necessary changes in 2.7 itself and skip the whole PyOpenSSL bundling idea for sufficiently recent Python versions?"

I think that's really the heart of the argument - Python 2.7 is going to end up shipping this feature anyway, and it's going to be significantly easier to maintain as a security related backport than it is as a full bundled copy of PyOpenSSL.

While I don't think it should be a determining factor in resolving the PEP, I do think it's worth noting that while redistributors will be able to handle a security backport relatively easily (this will be significantly less intrusive than PEP 466 or 476, and we managed to cope with those, albeit with the PEP 493 amendments), outright bunding of PyOpenSSL would be a genuine pain to handle sensibly.

Copy link
Member Author

@vstinner vstinner May 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something to explain the ensurepip issue, I tried to summarize your comment ;-)

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments inline, but the substantive requests are to backport from 3.6 instead of the unreleased development branch, and to explicitly reference PEP 466 (and quote the policy statement in that PEP that makes this one necessary).

Python 2.7.

The code will be backported and adapted from the master branch
(Python 3).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to backport from the Python 3.6 maintenance branch rather than from the development branch - that way it's a true backport of a released version, rather than potentially including code that hasn't previously been published in a stable release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the code to be exactly the same. I don't think that MemoryBIO or SSLObject changed much since Python 3.5.

I prefer to backport from master to ease comparison of 2.7 and master branches, to easy re-sync later. As explained in another paragraph: reduce the diff between these two branches.

pep-0546.txt Outdated
While Python 2.7 is getting closer to its end-of-line (scheduled for
2020), it is still used on production and the Python community is still
responsible for its security.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a section about PEP 466 here, starting with the statement that it synchronised the Python 2.7 SSL module with the Python 3.4 implementation, but included the following limitation on the precedent it set:

This PEP does NOT propose a general exception for backporting new features to Python 2.7 - every new feature proposed for backporting will still need to be justified independently. In particular, it will need to be explained why relying on an independently updated backport on the Python Package Index instead is not an acceptable solution.

Then PEP 543, the potential dependency on PyOpenSSL from pip (by way of requests), and the impracticality of pip (and hence CPython) bundling PyOpenSSL become the rationale for why the PyOpenSSL backport isn't adequate in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

implementation will be based on the ``ssl`` module of the Python
standard library.

In a perfect world, all applications would already run on Python 3 since
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this paragraph is necessary - wanting pip, which we ship as part of CPython 2.7, to be able to reliably access the system provided TLS APIs, which we want it to do by way of PEP 543, is the real reason we want to update the standard library instead of just relying on PyOpenSSL.

Relying on PyOpenSSL used to be major problem due to the difficulty of building it on Windows and Mac OS X, but the introduction of the wheel format largely addressed that problem (you still need to build the underlying cryptography library for *nix systems, but that isn't that difficult).

@ncoghlan
Copy link
Contributor

On second thoughts, I don't think my review comments should block publication, so just treat them as potential improvements to consider :)

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 👍 on this. Having a MemoryBIO backport to 2.7 will greatly ease the work required for PEP 543.

@vstinner vstinner merged commit f06c5be into python:master May 30, 2017
@vstinner vstinner deleted the pep546 branch May 30, 2017 12:58
@vstinner
Copy link
Member Author

I addressed most comments except one of the last Nick's comment. I suggested to @Lukasa to write a new PR. @ncoghlan: I let you see with @Lukasa to rephrase some parts if you want.

I merged the PR to reserve the PEP number to ease discussion on a unique PR number.

---------------------------

There are plans afoot to look at moving Requests to a more event-loop-y
model, and doing so basically mandates a MemoryBIO. In the absence of a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tornado has been doing TLS in an event-loop model in python 2.5+ with just wrap_socket, no MemoryBIO necessary. What am I missing? MemoryBIO certainly gives some extra flexibility, but nothing I can see that's strictly required for an HTTP client. (Maybe it comes up in some proxy scenarios that Tornado hasn't implemented?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa can maybe reply to this question. In my experience, on Windows, you really want to use IOCP rather than select() to implement an event loop, and you need MemoryBIO for IOCP. (Hum, but you also need C code to access to IOCP.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdarnell So the short answer is that wrap_socket interferes awkwardly with event loop management. A wrapped socket does not respond to selecting like a regular socket does, in the following ways:

  1. A wrapped socket that select considers readable may not be.
  2. A wrapped socket that select considers writable may not be.
  3. A wrapped socket that select does not consider readable may be.

Essentially, for all selecting models that use level-triggering as their approach, a wrapped socket behaves very strangely. It's at best an edge-triggered object (due to point 3), but even then it's an edge triggered object that may consistently refuse to behave the way you want it to due to the fact that triggering the socket into either readable or writable state may still prevent you from reading or writing any data at all.

The MemoryBIO object gives you much more predictable behaviour because it doesn't intercept socket calls. When the FD is marked readable, it really is: there just may be no data to transfer further up the chain. This means that the event loop doesn't dirty its hands with special knowledge about the way TLS works and handle all of the wacky TLS edge case behaviours.

Is it possible to write an event loop with just a wrapped socket? Sure. But the MemoryBIO provides a much more reasonable interface to do so. Most notably, Twisted does not use the wrapped socket approach any longer and I wouldn't propose that they should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answer @Lukasa :-) IMHO it's worth it to include this answer into the PEP since it was a very good question :-) (For example, I was unable to find the proper answer.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also note that @Haypo's concerns about alternative styles of event loop also come into play here and are worth including. ;)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, Twisted used to implement TLS via the wrap_socket-esque approach (since this is what OpenSSL strongly encourages you to do). Despite extensive test coverage and plenty of real-world usage, it never really worked right, and when we finally managed to switch over to the in-memory BIO model, dozens of bugs were fixed overnight, the whole system got considerably more reliable.

These edge cases manifest most significantly when using embedded systems, low-spec hardware (think raspberry pi), or weird operating systems. I do still occasionally experience weird flakiness when using Tornado event loops on this kind of hardware that doesn't happen with Twisted, and tellingly, doesn't happen with Twisted's TLS support using tornado.platform.twisted. I can't prove that there are bugs (the bugs tend to be extremely squirrely race-conditions that are hard to reproduce in-vitro) but I would definitely suspect there are some there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while this PEP doesn't mention it: wrap_socket doesn't work with pipes, and it's helpful to be able to speak wire protocols over UNIX pipes (or other non-socket transports) between processes for things like multi-process parallelism.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it support Unix domain sockets?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, I agree with @bdarnell that the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO" is basically wrong.

Besides, the idea that future directions for Requests should guide our 2.7 backport strategy also sounds entirely bogus.

@vstinner
Copy link
Member Author

vstinner commented Jun 1, 2017

@glyph "For what it's worth, Twisted used to implement TLS via the wrap_socket-esque approach (since this is what OpenSSL strongly encourages you to do). Despite extensive test coverage and plenty of real-world usage, it never really worked right, (...)"

Well, I never understood the concept of the "readiness" of a SSLSocket. It's different of readable low-level socket. "Readable" at SSL level may mean "need to write". I was always confused by that :-)

For an event loop, I like having MemoryBIO layer to have a clear split between I/O and SSL. Mixing I/O and SSL as does SSLSocket leads to confusion and pain.

@vstinner
Copy link
Member Author

vstinner commented Jun 1, 2017

Antoine:

Still, I agree with @bdarnell that the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO" is basically wrong.

Can you please elaborate? Do you mean that you are convinced by what @Lukasa and @glyph experience with wrap_socket()?

Besides, the idea that future directions for Requests should guide our 2.7 backport strategy also sounds entirely bogus.

Can you please elaborate? Do you mean that the indirection relation between requests dependencies and ensurepip "BLOB" is wrong? Or do you suggest to Requests devs to not take the event-loop road? Or do you mean that it's possible to implement TLS on Python 2.7 without MemoryBIO? Or do you mean that Requests can use MemoryBIO from a third-party module (or even include an implementation)?

In term of performance, I like the idea of using an event loop for network requests like HTTP. So I understand why Requests wants to use an event loop.

While @Lukasa can probably build something "working" using the existing Python 2.7 without MemoryBIO, I expect a lot of painful corner cases.

If the PEP is rejected, IMHO the most reliable option is to use blocking code on Python 2, and only use an event loop on Python 3 when all required TLS features are available in the stdlib. Maybe Requests might use an event loop on Python 2 if all requests are done in clear text... but mixing blocking and non-blocking code is painful :-(

@vstinner
Copy link
Member Author

vstinner commented Jun 1, 2017

General-comment: I would prefer that this discussion occurs on the python-dev mailing list ;-)

@pitrou
Copy link
Member

pitrou commented Jun 1, 2017

General-comment: I would prefer that this discussion occurs on the python-dev mailing list ;-)

Ditto.

Do you mean that you are convinced by what @Lukasa and @glyph experience with wrap_socket()?

I'm not sure what @Lukasa experienced with wrap_socket(). I trust @glyph that there may be delicate issues that are non-trivial to solve in a full-fledged way. But there are tons of fragile or delicate things in 2.7, yet we don't backport entire subsystems from 3.x in order to solve that.

In term of performance, I like the idea of using an event loop for network requests like HTTP.

I'd like to see benchmarks of an event loop for remote HTTP requests against a simple pool of threads, because I'm skeptical that would make much of a difference.

Or do you mean that Requests can use MemoryBIO from a third-party module (or even include an implementation)?

Requests can do whatever it wants as long as it doesn't try to coerce python-dev into questionable maintenance decisions.

@bdarnell
Copy link

bdarnell commented Jun 1, 2017

OK, I replied on python-dev here

@glyph, a bug report with any details you remember about the issues you ran into with SSLIOStream (even if it's far from reproducible) would be welcome - no issues like that have come to my attention in a long time.

@Lukasa
Copy link
Contributor

Lukasa commented Jun 1, 2017

@bdarnell Thanks for adding that to python-dev btw! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants