New PEP 546: Backport MemoryBIO to Python 2.7 #272

Merged
merged 4 commits into from
May 30, 2017
141 changes: 141 additions & 0 deletions pep-0546.txt
@@ -0,0 +1,141 @@
PEP: 546
Title: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2017


Abstract
========

Backport ssl.MemoryBIO and ssl.SSLObject classes from Python 3 to Python
2.7 to enhance the overall security of Python 2.7.


Rationale
=========

While Python 2.7 is getting closer to its end-of-life (scheduled for
2020), it is still used in production and the Python community is still
responsible for its security. Backporting these classes would also
facilitate the future adoption of :pep:`543`, which will improve
security for Python 3 users.

This PEP does NOT propose a general exception for backporting new
features to Python 2.7 - every new feature proposed for backporting will
still need to be justified independently. In particular, it will need to
be explained why relying on an independently updated backport on the
Python Package Index instead is not an acceptable solution.


PEP 543
-------

:pep:`543` defines a new TLS API for Python which would enhance
Python's security by giving access to the root certificate authorities
on Windows and macOS through native APIs instead of OpenSSL. A side
effect is that it gives access to certificates installed locally by
system administrators, so that "company certificates" can be used
without modifying each Python application, and TLS certificates are
validated correctly (instead of having to ignore or bypass TLS
certificate validation).

For practical reasons, Cory Benfield would like to first implement an
I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for
:pep:`543`, and then provide a second class, based on the first one,
that uses sockets or file descriptors. This design would help to
structure the code to support more backends and simplify testing and
auditing. Later, optimized classes that use sockets or file descriptors
directly may be added for performance.
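
The layering described above can be sketched with the existing Python 3
``ssl`` API. This is only an illustration under stated assumptions, not
the PEP 543 design: the ``SocketTransport`` class, its method names and
the ``example.org`` hostname are hypothetical, and a local
``socket.socketpair()`` stands in for a real network connection.

```python
import socket
import ssl


class SocketTransport:
    """Hypothetical socket-based layer on top of an I/O-less SSLObject."""

    def __init__(self, sock, context, server_hostname):
        self.sock = sock
        self.incoming = ssl.MemoryBIO()   # TLS bytes received from the peer
        self.outgoing = ssl.MemoryBIO()   # TLS bytes waiting to be sent
        self.tls = context.wrap_bio(
            self.incoming, self.outgoing, server_hostname=server_hostname)

    def flush(self):
        """Send whatever the SSLObject has produced; return the byte count."""
        data = self.outgoing.read()
        if data:
            self.sock.sendall(data)
        return len(data)

    def feed(self, nbytes=65536):
        """Move bytes from the socket into the incoming BIO."""
        data = self.sock.recv(nbytes)
        if data:
            self.incoming.write(data)
        else:
            self.incoming.write_eof()     # peer closed the connection

    def handshake_step(self):
        """Advance the handshake once; return True when it has completed."""
        try:
            self.tls.do_handshake()
            done = True
        except ssl.SSLWantReadError:
            done = False                  # more data from the peer is needed
        self.flush()
        return done


# Usage: the other end of the socket pair plays the role of the network.
left, right = socket.socketpair()
transport = SocketTransport(left, ssl.create_default_context(), "example.org")
done = transport.handshake_step()         # sends the ClientHello, then waits
record_header = right.recv(5)             # start of the first TLS record
left.close()
right.close()
```

Only ``flush()`` and ``feed()`` touch the socket, so the same ``SSLObject``
logic could be reused with another transport by replacing those two
methods.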

While :pep:`543` defines an API, the PEP only makes sense if it comes
with at least one complete, good-quality implementation. The first
implementation will be based on the ``ssl`` module of the Python
standard library.

In a perfect world, all applications would already run on Python 3 since

Review comment (Contributor):

I don't think this paragraph is necessary - wanting pip, which we ship as part of CPython 2.7, to be able to reliably access the system provided TLS APIs, which we want it to do by way of PEP 543, is the real reason we want to update the standard library instead of just relying on PyOpenSSL.

Relying on PyOpenSSL used to be a major problem due to the difficulty of building it on Windows and Mac OS X, but the introduction of the wheel format largely addressed that problem (you still need to build the underlying cryptography library for *nix systems, but that isn't that difficult).

Python 3.0 was released. In practice, many applications still run in
production on Python 2.7. For the new TLS API to be widely used, it
should be usable on all currently supported Python versions: Python 2.7,
3.5 and 3.6. Otherwise, some applications would have to wait until they
drop Python 2 support before they could use the new TLS API.

Review comment (Member):

Just to connect all the dots: delaying adoption of the PEP 543 API means delaying the adoption for security improvements for Python3 users as well.

Review comment (Member Author):

done


Delaying adoption of the :pep:`543` API means delaying the adoption of
security improvements for Python 3 users as well.


requests, pip and ensurepip
---------------------------

There are plans afoot to look at moving Requests to a more event-loop-y
model, and doing so basically mandates a MemoryBIO. In the absence of a

Review comment:

Tornado has been doing TLS in an event-loop model in python 2.5+ with just wrap_socket, no MemoryBIO necessary. What am I missing? MemoryBIO certainly gives some extra flexibility, but nothing I can see that's strictly required for an HTTP client. (Maybe it comes up in some proxy scenarios that Tornado hasn't implemented?)

Review comment (Member Author):

@Lukasa can maybe reply to this question. In my experience, on Windows, you really want to use IOCP rather than select() to implement an event loop, and you need MemoryBIO for IOCP. (Hum, but you also need C code to access IOCP.)

Review comment (Contributor):

@bdarnell So the short answer is that wrap_socket interferes awkwardly with event loop management. A wrapped socket does not respond to selecting like a regular socket does, in the following ways:

  1. A wrapped socket that select considers readable may not be.
  2. A wrapped socket that select considers writable may not be.
  3. A wrapped socket that select does not consider readable may be.

Essentially, for all selecting models that use level-triggering as their approach, a wrapped socket behaves very strangely. It's at best an edge-triggered object (due to point 3), but even then it's an edge triggered object that may consistently refuse to behave the way you want it to due to the fact that triggering the socket into either readable or writable state may still prevent you from reading or writing any data at all.

The MemoryBIO object gives you much more predictable behaviour because it doesn't intercept socket calls. When the FD is marked readable, it really is: there just may be no data to transfer further up the chain. This means that the event loop doesn't have to dirty its hands with special knowledge about the way TLS works, or handle all of the wacky TLS edge-case behaviours.

Is it possible to write an event loop with just a wrapped socket? Sure. But the MemoryBIO provides a much more reasonable interface to do so. Most notably, Twisted does not use the wrapped socket approach any longer and I wouldn't propose that they should.
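
A minimal sketch of the in-memory model described above, using the Python 3 `ssl` API that this PEP proposes to backport. The hostname `example.org` is only a placeholder, and no socket is created at all: the `SSLObject` reports what it needs through `SSLWantReadError`, and the event loop stays in charge of the actual I/O.

```python
import ssl

ctx = ssl.create_default_context()
incoming = ssl.MemoryBIO()   # bytes received from the peer are written here
outgoing = ssl.MemoryBIO()   # bytes to send to the peer are read from here
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.org")

try:
    tls.do_handshake()
    needs_peer_data = False
except ssl.SSLWantReadError:
    # The handshake cannot progress until the peer's reply is fed
    # into `incoming`; nothing has blocked and no FD was touched.
    needs_peer_data = True

# The ClientHello is now sitting in memory, ready for the event loop
# to write to whatever transport it manages.
client_hello = outgoing.read()
```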

Review comment (Member Author):

Thanks for the answer @Lukasa :-) IMHO it's worth it to include this answer into the PEP since it was a very good question :-) (For example, I was unable to find the proper answer.)

Review comment (Contributor):

I should also note that @Haypo's concerns about alternative styles of event loop also come into play here and are worth including. ;)

Review comment:

For what it's worth, Twisted used to implement TLS via the wrap_socket-esque approach (since this is what OpenSSL strongly encourages you to do). Despite extensive test coverage and plenty of real-world usage, it never really worked right, and when we finally managed to switch over to the in-memory BIO model, dozens of bugs were fixed overnight and the whole system got considerably more reliable.

These edge cases manifest most significantly when using embedded systems, low-spec hardware (think raspberry pi), or weird operating systems. I do still occasionally experience weird flakiness when using Tornado event loops on this kind of hardware that doesn't happen with Twisted, and tellingly, doesn't happen with Twisted's TLS support using tornado.platform.twisted. I can't prove that there are bugs (the bugs tend to be extremely squirrely race-conditions that are hard to reproduce in-vitro) but I would definitely suspect there are some there.

Review comment:

Also, while this PEP doesn't mention it: wrap_socket doesn't work with pipes, and it's helpful to be able to speak wire protocols over UNIX pipes (or other non-socket transports) between processes for things like multi-process parallelism.

Review comment (Member):

Does it support Unix domain sockets?

Review comment (Member):

Still, I agree with @bdarnell that the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO" is basically wrong.

Besides, the idea that future directions for Requests should guide our 2.7 backport strategy also sounds entirely bogus.

Python 2.7 backport, Requests is required to basically use the same
solution that Twisted currently does: namely, a mandatory dependency on
`pyOpenSSL <https://pypi.python.org/pypi/pyOpenSSL>`_.

The `pip <https://pip.pypa.io/>`_ program has to embed all its
dependencies for practical reasons. Since pip depends on requests, it
would have to embed a copy of pyOpenSSL, which would make pip painful to
install: pip currently doesn't support embedding C extensions, which
must be compiled on each platform and so require a C compiler.

Since Python 2.7.9, Python embeds a copy of pip both for default
installation and for use in virtual environments, through the new
``ensurepip`` module. If pip ends up bundling pyOpenSSL, then Python
will end up bundling pyOpenSSL. Backporting only ``ssl.MemoryBIO`` and
``ssl.SSLObject`` would avoid having to embed pyOpenSSL: it would add
just the minimum features required by requests and fix the bootstrap
issue (python -> ensurepip -> pip -> requests -> MemoryBIO).


Changes
=======

Add ``MemoryBIO`` and ``SSLObject`` classes to the ``ssl`` module of
Python 2.7.
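
For readers unfamiliar with ``MemoryBIO``, the following sketch shows its
buffer semantics as they exist in Python 3 (``write``, ``read``,
``pending``, ``eof`` and ``write_eof``). It assumes the backport would
expose the same interface; the variable names are illustrative only.

```python
import ssl

bio = ssl.MemoryBIO()

written = bio.write(b"hello world")   # bytes are buffered in memory
pending_after_write = bio.pending     # number of bytes waiting to be read

first = bio.read(5)                   # read at most 5 bytes
rest = bio.read()                     # read everything that is left

eof_before = bio.eof                  # empty, but EOF not signalled yet
bio.write_eof()                       # no more data will ever be written
eof_after = bio.eof                   # empty and EOF has been signalled
```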

The code will be backported and adapted from the master branch
(Python 3).

Review comment (Contributor):

I'd prefer to backport from the Python 3.6 maintenance branch rather than from the development branch - that way it's a true backport of a released version, rather than potentially including code that hasn't previously been published in a stable release.

Review comment (Member Author):

I expect the code to be exactly the same. I don't think that MemoryBIO or SSLObject changed much since Python 3.5.

I prefer to backport from master to ease comparison of the 2.7 and master branches, and to ease re-syncing later. As explained in another paragraph: reduce the diff between these two branches.


Review comment (Member):

This may be too in the weeds for a PEP, but when I worked on this in 2014, it also significantly reduced the size of the Python2/Python3 diff of the _ssl module, which I would expect to make maintenance easier.

Review comment (Member Author):

done

The backport also significantly reduces the size of the Python 2/Python
3 difference of the ``_ssl`` module, which makes maintenance easier.


Links
=====

* :pep:`543`
* `[backport] ssl.MemoryBIO
  <https://bugs.python.org/issue22559>`_: Implementation of this PEP
  written by Alex Gaynor (first version written in October 2014)
* :pep:`466`


Discussions
===========

* `[Python-Dev] Backport ssl.MemoryBIO on Python 2.7?
  <https://mail.python.org/pipermail/python-dev/2017-May/147981.html>`_
  (May 2017)


Copyright
=========

This document has been placed in the public domain.




..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: