-
Notifications
You must be signed in to change notification settings - Fork 685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request that secures file uploads #862
Conversation
This class is intended to be used to safely buffer uploads (which may contain submission plaintext) to disk.
791d4c0
to
58e21cf
Compare
Originally, this pr used the Python cryptography library to implement AES-CTR. Unfortunately, cryptography relies on libffi for its bindings to crypto libraries like OpenSSL, and libffi is incompatible with grsecurity's memory protections. While it is possible to use a combination of patches to Python and libffi to safely circumvent this problem, I decided instead to replace the use of cryptography with pycrypto, which we already have as a dependency for their convenient cryptographically secure random module. I have tested this PR and confirmed that it works with grsecurity. It works with small and large files, and if there was any performance difference between the two libraries it was not noticeable. |
…-uploads Request that secures file uploads
added the apparmor entries required for pr freedomofpress#862
Have you considered using AES in CBC mode instead of CTR mode for performance reasons? Although the parameters are a bit wonky—1,000 16,384-byte blocks are encrypted—here are the results from the pct-speedtest.py test from pycrypto run on my i7-5600U in a VM:
Since CBC does encryption serially, but decryption in parallel, decryption benchmarks should show an even more marked increase in performance. I can modify the file later to get some results with more realistic parameters and to show decryption as well. I know CBC has some downsides in certain contexts, but I don’t believe any that I am aware of apply here. Some further contemplation should be necessary, however: (1) Padding oracle attacks shouldn’t be possible because the SecureDrop server in no way servers as a decryption oracle. If you have already considered CBC and found a reason to choose CTR over it, I’d be curious to know why. Lastly, I think |
Background
Initially, this PR was intended to revert the reverting of
request_that_secures_files_uploads.py
in f42765e. That commit demonstrated using Flask's request wrappers to replace the stream class used for requests (including POSTs, which may contain submission plaintext) with a custom stream class.Flask's default approach is to buffer requests in memory (with
io.BytesIO
) if they are small (less than 512KB), and otherwise buffer them to disk (withtempfile.TemporaryFile
). This does not work well for us because large requests are usually POSTs containing submission plaintext, and we want to prevent plaintext from ever being written on disk due to the threat of forensic recovery.Original approach
The original approach in f42765e was to always buffer into memory, regardless of the upload size. Since SecureDrop servers have swap disabled, this prevents plaintext from being written to disk. This commit was temporarily reverted while I was investigating one of the issues from the audit (which had to do with forensic recovery of plaintext from memory).
Drawbacks
My original plan was simply to revert the temporary revert of original commit, and return to using memory for buffering. However, I was somewhat concerned about the ability of this approach to scale for large submissions. Some testing confirmed that it does not, and would fail with
MemoryError
and other, memory-related errors on sufficiently large submissions (100-200MB seemed to do it on a development VM with 1GB of memory allocated).We actually needed 2x the submission size in RAM (!!) for each submission because:
The initial submission needed to be fully buffered in RAM
We store all file submissions in zipfiles (see
store.save_file_submission
) to:Since we didn't want to write that zipfile (containing plaintext) to disk either, we also allocated it in memory. In the worst case, the submission was already compressed and so we would need to allocate memory equal to the full size of the submission to store the corresponding zipfile.
Our current maximum file size is 500MB (and we'd actually like to increase that to 1GB or even more). You can see how this all-in-memory technique implies:
Other possibilities
Ideally, we would like to use Flask's original approach, including buffering large submissions to disk, which usually has much greater capacity than memory. However, we want to prevent a forensic attacker from recovering the plaintext of submissions. Once a submission has been received and encrypted, it would ideally be impossible for an attacker with access to the computer (but not the key used to encrypt the submission) to recover the plaintext - even if they have forensic disk analysis capabilities.
One possible solution that I considered (and actually implemented, but is not the final solution presented here) was to use a forensic disk erasure tool (e.g.
shred
) to erase the plaintext submission after it's been encrypted. The problem is that these tools are of uncertain efficacy, especially when used on (increasingly widespread) SSDs. I decided this uncertainty was unacceptable for such an important security property.Final solution
The final solution implemented in this pull request is, at a high level:
SecureTemporaryFile
) is closed:Additional changes required for zipfiles
An additional change was required to handle the zip file correctly (without buffering the whole thing into memory). Ideally, you would also be able to build a zipfile with this new
SecureTemporaryFile
class, and you can - but unfortunately, you cannot buffer a file-like object into a zip file, you must instead do it all in one go withzipfile.ZipFile.writestr
. This returns us to needing to buffer the whole thing in memory (argh!) and caused stability problems in my testing.To fix this issue, I switched to using
gzip
, since it supported writing and reading from file-like objects. This enabled me to build agzip.GzipFile
from aSecureTemporaryFile
and have it return aSecureTemporaryFile
, which could then be read bygpg.encrypt
to perform the final encryption.Using
gzip
achieves the other primary goals of using zipfiles as well:gzip
uses the samezlib
compression aszipfile
so there is no tradeoff here.gzip
'ed file. This filename is encrypted inside the final.gz.gpg
file, so it is not stored in plaintext on the Application Server's disk. Additionally, double-clicking a.gz
in Tails leads to the same outcome (thank god) as double-clicking a.zip
: it opens an archive window which shows the file contained inside the archive (with the original filename), which can then by opened by double-clicking. So there is also no change in the workflow or usability tradeoff.Additional changes to templates
While testing this, I noticed that the default
filesizeformat
template filter formats file sizes in base-10 by default. Most browsers (and most computer things in general...) format file sizes in base 2. This lead to confusing discrepancies between the file sizes listed on the Document Interface and the files sizes shown in the browser UI when downloading a file (in both Firefox/Tor Browser and Chrome). Fortunately, this was easy to fix with a built-in argument meant just for this purpose 👍Acknowledgements
Thanks to Globaleaks for thinking of this first (and implementing it)! I based this pull request in part on their implementation.
Also thanks to @jacksingleton for implementing the original
request_that_secures_file_uploads.py
and agitating to take this approach, like, a long time ago.