
Request that secures file uploads #862

Merged
garrettr merged 5 commits into develop from request-that-secures-file-uploads on Feb 6, 2015

Conversation

garrettr
Contributor

@garrettr garrettr commented Feb 5, 2015

Background

Initially, this PR was intended to revert the temporary revert of request_that_secures_file_uploads.py in f42765e. That commit demonstrated using Flask's request wrappers to replace the stream class used for requests (including POSTs, which may contain submission plaintext) with a custom stream class.

Flask's default approach is to buffer requests in memory (with io.BytesIO) if they are small (less than 512KB), and otherwise buffer them to disk (with tempfile.TemporaryFile). This does not work well for us because large requests are usually POSTs containing submission plaintext, and we want to prevent plaintext from ever being written to disk due to the threat of forensic recovery.
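
This spill-to-disk policy closely resembles the standard library's tempfile.SpooledTemporaryFile, which keeps data in memory until it crosses a size threshold and then rolls over to a real file on disk. A minimal sketch of the idea (the 512KB threshold matches the one described above; `_rolled` is a CPython implementation detail, used here only to make the rollover visible):

```python
import tempfile

SPOOL_THRESHOLD = 512 * 1024  # 512KB, matching the threshold described above

def make_request_buffer():
    # Data stays in an in-memory buffer until it exceeds max_size,
    # then it is transparently rolled over to a temporary file on disk.
    return tempfile.SpooledTemporaryFile(max_size=SPOOL_THRESHOLD)

small = make_request_buffer()
small.write(b"x" * 1024)                   # well under the threshold
in_memory = not small._rolled              # True: never touched disk

large = make_request_buffer()
large.write(b"x" * (SPOOL_THRESHOLD + 1))  # crosses the threshold
spilled = large._rolled                    # True: now backed by a file on disk
```

Werkzeug's actual stream factory differs in its details, but the memory-then-disk rollover is the same idea.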

Original approach

The original approach in f42765e was to always buffer into memory, regardless of the upload size. Since SecureDrop servers have swap disabled, this prevents plaintext from being written to disk. This commit was temporarily reverted while I was investigating one of the issues from the audit (which had to do with forensic recovery of plaintext from memory).

Drawbacks

My original plan was simply to revert the temporary revert of the original commit, and return to using memory for buffering. However, I was somewhat concerned about the ability of this approach to scale for large submissions. Some testing confirmed that it does not: it would fail with MemoryError and other memory-related errors on sufficiently large submissions (100-200MB seemed to do it on a development VM with 1GB of memory allocated).

We actually needed 2x the submission size in RAM (!!) for each submission because:

  1. The initial submission needed to be fully buffered in RAM

  2. We store all file submissions in zipfiles (see store.save_file_submission) to:

    1. compress them if possible
    2. retain their original filename in a manner that is encrypted yet user-friendly to decrypt and use on Tails.

    Since we didn't want to write that zipfile (containing plaintext) to disk either, we also allocated it in memory. In the worst case, the submission was already compressed and so we would need to allocate memory equal to the full size of the submission to store the corresponding zipfile.

Our current maximum file size is 500MB (and we'd actually like to increase that to 1GB or even more). You can see how this all-in-memory technique implies:

  1. SecureDrop servers would need lots of RAM just to operate normally.
  2. They would be very easy to DoS: a couple of large concurrent uploads could trigger OOM and fail the uploads.

Other possibilities

Ideally, we would like to use Flask's original approach, including buffering large submissions to disk, which usually has much greater capacity than memory. However, we want to prevent a forensic attacker from recovering the plaintext of submissions. Once a submission has been received and encrypted, it would ideally be impossible for an attacker with access to the computer (but not the key used to encrypt the submission) to recover the plaintext - even if they have forensic disk analysis capabilities.

One possible solution that I considered (and actually implemented, but is not the final solution presented here) was to use a forensic disk erasure tool (e.g. shred) to erase the plaintext submission after it's been encrypted. The problem is that these tools are of uncertain efficacy, especially when used on (increasingly widespread) SSDs. I decided this uncertainty was unacceptable for such an important security property.

Final solution

The final solution implemented in this pull request is, at a high level:

  1. As in Flask, small requests (< 512KB) are buffered in memory.
  2. When a larger submission is received, SecureDrop:
    1. Generates a random, ephemeral AES key that is only stored in memory.
    2. Creates a temporary file on disk.
    3. Streams the submission onto disk, encrypting it with the ephemeral key using AES-CTR mode.
    4. When the submission needs to be read, it is similarly decrypted on the fly.
    5. When the stream class (SecureTemporaryFile) is closed:
      1. The file on disk is deleted (normally, no need for shred)
      2. The AES key is only stored in memory (on the stream class), and will eventually be GC'ed and overwritten. Unfortunately we can't precisely control this behavior, at least at the moment. We might be able to use this approach to zero-fill the key, but we would need to do some forensic analysis to make sure that really works as expected.
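
The steps above can be sketched with the standard library alone. To be clear, this is not SecureDrop's actual SecureTemporaryFile: a hash-based counter-mode keystream stands in for AES-CTR so the sketch has no dependencies, and the class name and methods are illustrative only.

```python
import hashlib
import os
import tempfile

class SecureTemporaryFileSketch:
    """Illustrative only: a random ephemeral key lives in memory, and every
    byte is XORed with a counter-mode keystream before touching disk.
    SHA-256 in counter mode stands in for AES-CTR to keep this stdlib-only."""

    BLOCK = 32  # bytes of keystream per counter value (one SHA-256 digest)

    def __init__(self):
        self.key = os.urandom(32)             # ephemeral, memory-only
        self.file = tempfile.TemporaryFile()  # unlinked as soon as it is created

    def _keystream(self, offset, n):
        # Keystream block i = H(key || i), so the keystream for any byte
        # range can be regenerated independently for on-the-fly decryption.
        out = b""
        for i in range(offset // self.BLOCK, (offset + n - 1) // self.BLOCK + 1):
            out += hashlib.sha256(self.key + i.to_bytes(16, "big")).digest()
        start = offset % self.BLOCK
        return out[start:start + n]

    def _xor(self, data, offset):
        return bytes(a ^ b for a, b in zip(data, self._keystream(offset, len(data))))

    def write(self, data):
        # Encrypt with the ephemeral key *before* the bytes reach disk.
        self.file.write(self._xor(data, self.file.tell()))

    def seek(self, pos):
        self.file.seek(pos)

    def read(self, n=-1):
        offset = self.file.tell()
        ciphertext = self.file.read(n)
        return self._xor(ciphertext, offset)  # decrypt on the fly

    def close(self):
        self.file.close()  # already unlinked; only ciphertext ever hit disk
        self.key = None    # drop the only reference to the key
```

The on-disk file never contains plaintext, and once the key is gone the ciphertext is unreadable, which is the property the shred-based approach could not guarantee.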

Additional changes required for zipfiles

An additional change was required to handle the zip file correctly (without buffering the whole thing into memory). Ideally, you would also be able to build a zipfile with this new SecureTemporaryFile class, and you can. Unfortunately, you cannot stream a file-like object into a zip file; you must instead write it all in one go with zipfile.ZipFile.writestr. This returns us to buffering the whole thing in memory (argh!) and caused stability problems in my testing.
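
The constraint is visible directly in the stdlib API: writestr takes the entire payload as a bytes object, so the full plaintext has to be materialized in memory first (a streaming ZipFile.open(..., mode="w") only arrived much later, in Python 3.6).

```python
import io
import zipfile

payload = b"entire submission" * 1000  # must already be fully in memory

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # writestr() accepts only a complete str/bytes payload; there was no way
    # (in the Python of this era) to feed it a file-like object chunk by chunk.
    zf.writestr("report.pdf", payload)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    roundtrip = zf.read("report.pdf")
```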

To fix this issue, I switched to using gzip, since it supports writing to and reading from file-like objects. This enabled me to build a gzip.GzipFile from a SecureTemporaryFile and write its output to another SecureTemporaryFile, which could then be read by gpg.encrypt to perform the final encryption.
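
In contrast to writestr, gzip.GzipFile wraps a file-like object on both the write and read side, so data can move through in fixed-size chunks. A sketch with io.BytesIO standing in for SecureTemporaryFile (the filename "report.pdf" is made up):

```python
import gzip
import io

source = io.BytesIO(b"submission plaintext " * 1000)  # stand-in for SecureTemporaryFile
compressed = io.BytesIO()                             # likewise

# `filename=` embeds the original name in the gzip header, which is how the
# original filename survives into the eventual .gz.gpg.
with gzip.GzipFile(filename="report.pdf", mode="wb", fileobj=compressed) as gz:
    while True:
        chunk = source.read(64 * 1024)  # bounded chunks: nothing fully buffered
        if not chunk:
            break
        gz.write(chunk)

compressed.seek(0)
with gzip.GzipFile(mode="rb", fileobj=compressed) as gz:
    recovered = gz.read()
```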

Using gzip achieves the other primary goals of using zipfiles as well:

  1. The files are compressed (if possible). gzip uses the same zlib compression as zipfile, so there is no tradeoff here.
  2. The original filename is embedded inside the gzip'ed file. This filename is encrypted inside the final .gz.gpg file, so it is not stored in plaintext on the Application Server's disk. Additionally, double-clicking a .gz in Tails leads to the same outcome (thank god) as double-clicking a .zip: it opens an archive window showing the file contained inside the archive (with the original filename), which can then be opened by double-clicking. So there is no change in the workflow and no usability tradeoff.

Additional changes to templates

While testing this, I noticed that the default filesizeformat template filter formats file sizes in base 10 by default. Most browsers (and most computer things in general...) format file sizes in base 2. This led to confusing discrepancies between the file sizes listed on the Document Interface and the file sizes shown in the browser UI when downloading a file (in both Firefox/Tor Browser and Chrome). Fortunately, this was easy to fix with a built-in argument meant just for this purpose 👍
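
The discrepancy is easy to reproduce. A hypothetical helper (mirroring what the template filter does; not SecureDrop code) shows how the same byte count reads under each convention:

```python
def human_size(num_bytes, binary=False):
    # base-10 (SI) units by default, base-2 (IEC) units when binary=True:
    # the same switch the filter's built-in argument flips.
    base = 1024 if binary else 1000
    units = ("KiB", "MiB", "GiB") if binary else ("kB", "MB", "GB")
    size = float(num_bytes)
    for unit in units:
        size /= base
        if size < base:
            return "%.1f %s" % (size, unit)
    return "%.1f %s" % (size, units[-1])

limit = 500 * 1000 ** 2  # the 500MB upload limit mentioned above
decimal = human_size(limit)                  # reads as "500.0 MB"
binary = human_size(limit, binary=True)      # reads as "476.8 MiB"
```

In a Jinja template the equivalent is passing the filter's binary flag, e.g. `{{ size|filesizeformat(true) }}`.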

Acknowledgements

Thanks to Globaleaks for thinking of this first (and implementing it)! I based this pull request in part on their implementation.

Also thanks to @jacksingleton for implementing the original request_that_secures_file_uploads.py and agitating to take this approach, like, a long time ago.

@dolanjs dolanjs mentioned this pull request Feb 5, 2015
@garrettr garrettr force-pushed the request-that-secures-file-uploads branch from 791d4c0 to 58e21cf on February 6, 2015 17:55
@garrettr
Contributor Author

garrettr commented Feb 6, 2015

Originally, this PR used the Python cryptography library to implement AES-CTR. Unfortunately, cryptography relies on libffi for its bindings to crypto libraries like OpenSSL, and libffi is incompatible with grsecurity's memory protections. While it is possible to use a combination of patches to Python and libffi to safely circumvent this problem, I decided instead to replace the use of cryptography with pycrypto, which we already have as a dependency for its convenient cryptographically secure random module.

I have tested this PR and confirmed that it works with grsecurity. It works with small and large files, and if there was any performance difference between the two libraries it was not noticeable.

garrettr added a commit that referenced this pull request Feb 6, 2015
…-uploads

Request that secures file uploads
@garrettr garrettr merged commit 7ed392a into develop Feb 6, 2015
dolanjs pushed a commit to dolanjs/securedrop that referenced this pull request Feb 6, 2015
added the apparmor entries required for pr freedomofpress#862
@psivesely
Contributor

Have you considered using AES in CBC mode instead of CTR mode for performance reasons?

Although the parameters are a bit wonky (1,000 16,384-byte blocks are encrypted), here are the results from the pct-speedtest.py test from pycrypto, run on my i7-5600U in a VM:

AES256-CBC key setup: 345.11 kKeys/sec
AES256-CBC encryption: 267.46 MBps
AES256-CTR key setup: 45.38 kKeys/sec
AES256-CTR encryption: 192.34 MBps

Since CBC does encryption serially, but decryption in parallel, decryption benchmarks should show an even more marked increase in performance. I can modify the file later to get some results with more realistic parameters and to show decryption as well.

I know CBC has some downsides in certain contexts, but I don’t believe any that I am aware of apply here. Some further contemplation is warranted, however:

(1) Padding oracle attacks shouldn’t be possible because the SecureDrop server in no way serves as a decryption oracle.
(2) Adaptive chosen-plaintext attacks are certainly not possible with random, ephemeral keys, so attacks such as the one described second here should not be possible.

If you have already considered CBC and found a reason to choose CTR over it, I’d be curious to know why.

Lastly, I think shred should still be called on these encrypted files, because they may still reveal valuable metadata to forensic analysts, including time of write and file size.

@redshiftzero redshiftzero deleted the request-that-secures-file-uploads branch November 9, 2016 01:11
redshiftzero added a commit that referenced this pull request Nov 8, 2017
These tests were added in PR #476, but will not pass as we do have
secrets in memory (but not on disk, see PR #862). Anyway, these tests
are clever, so let's keep them but mark them xfail to prevent
confusion in the future.