Ensure that files never touch an actual disk unencrypted #14

Closed
ioerror opened this issue May 16, 2013 · 24 comments
Comments

@ioerror

ioerror commented May 16, 2013

It appears that in https://github.com/deaddrop/deaddrop/blob/master/crypto.py#L97 the data is unencrypted on the disk. Even if the disk is encrypted, I think this is sub-optimal. I would suggest either a tmpfs (not so solid for various forensics reasons), encrypting the files on the client as they're uploaded, or encrypting them in chunks once they're in the server's memory.

@djon3s

djon3s commented Oct 8, 2013

The link to the code ioerror references above is broken :-( New link: https://github.com/freedomofpress/securedrop/blob/master/modules/deaddrop/files/deaddrop/crypto.py#L97

@fpietrosanti

At GlobaLeaks we also had a discussion about the same kind of problem; in this ticket you can find the different steps and analysis that have been done on the topic: globaleaks/globaleaks-whistleblowing-software#672

@garrettr
Contributor

@fpietrosanti Thanks for the detailed design doc! I have encountered this problem in my own work and ultimately used solution c) from your list. I believe it would be a good solution for this project as well. Concretely, the two options that I see as being useful are:

  1. tmpfs (@ioerror, can you clarify the forensic concerns you mentioned?)
  2. Temporary, per-upload, encrypted loopback filesystems using ephemeral keys.

In the second option, we would need to write a utility to prepare the temporary storage area and hook it into the web server/web framework in question to ensure that cleartext is never written to disk. It would probably have to be written in C or another low-level language so we can guarantee the ephemeral key is securely wiped from memory. This is non-trivial but certainly doable.

The next step would be looking into the details of how web.py/our web server (nginx) handles file uploads. IIRC they typically buffer requests in memory up to a certain request size, and larger requests are streamed onto disk (in the clear) instead.
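To make the streaming concern concrete, here is a minimal sketch (not this project's actual code) of a WSGI application that consumes the raw upload in fixed-size chunks from wsgi.input and hands them to an encryption sink, so the application layer itself never materializes a plaintext temp file. The chunk size and the handle_chunk helper are illustrative placeholders; whether the server in front of the app spools the body to disk before the app sees it is exactly what still needs to be verified.

```python
CHUNK_SIZE = 64 * 1024  # illustrative buffer size


def handle_chunk(chunk):
    """Placeholder encryption sink, e.g. a write into a gpg pipe."""
    pass


def application(environ, start_response):
    """Minimal WSGI sketch: consume the upload stream chunk-by-chunk,
    never writing plaintext from the application layer."""
    remaining = int(environ.get("CONTENT_LENGTH") or 0)
    stream = environ["wsgi.input"]
    while remaining > 0:
        chunk = stream.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            break
        handle_chunk(chunk)          # plaintext only ever lives in these buffers
        remaining -= len(chunk)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]
```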

@Taipo

Taipo commented Oct 16, 2013

(@ioerror, can you clarify the forensic concerns you mentioned?)

Perhaps it's a reference to cold boot attacks.

@stsquad

stsquad commented Oct 17, 2013

@garrettr /@Taipo: tmpfs is usually backed by swap, so if you fill up tmpfs enough and create memory pressure on the machine, parts of the filesystem will be swapped out onto disk. Once there it would be recoverable unless other measures are taken.

@fpietrosanti

@garrettr I think that keeping everything on a CryptoFS is good, but it is not good to create a cryptofs for each uploaded file, because that would amplify a DoS with tons of file submissions done in parallel. I updated globaleaks/globaleaks-whistleblowing-software#672 with some final considerations coming from this thread about a possible complete solution.

@garrettr
Contributor

@stsquad We can get around this by either a) encrypting the swap space, or b) not having swap at all, which is what we do now (the installation scripts call swapoff -a on the servers). There is a stability/privacy trade-off here (if the server gets DDoS'ed, it will crash hard instead of going into swappy hell) but I think it is worth it.

@fpietrosanti There is a lot of overhead in that scheme - but it gets you perfect forward secrecy.

It sounds like the best compromise would be to create a tmpfs, and direct the web server to use that for caching requests. We could secure-wipe/zerofill the tmpfs, either periodically or after every successfully processed upload, to mitigate cold boot attacks. I am not sure if it would even be helpful to encrypt it (since a decrypted version would be maintained elsewhere in memory by the cryptfs anyway).
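As a rough sketch of the "secure-wipe/zerofill after every processed upload" idea: on tmpfs, overwriting a file's contents replaces the in-memory pages that held the cleartext, so a best-effort helper like the one below (the name and chunk size are made up for illustration) is about what can be done from userland. This mitigates, rather than defeats, cold boot recovery.

```python
import os


def wipe_and_unlink(path, chunk_size=64 * 1024):
    """Overwrite a file with zeros, flush it, then unlink it.

    On tmpfs this replaces the pages that held the cleartext; it is a
    mitigation against cold boot recovery, not a guarantee."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        written = 0
        while written < size:
            step = min(chunk_size, size - written)
            f.write(b"\x00" * step)
            written += step
        f.flush()
        os.fsync(f.fileno())
    os.unlink(path)
```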

@garrettr
Contributor

Another related concern is reading plaintext into processes whose memory we cannot securely erase, especially long-running processes like the Python web server (which is garbage-collected and, depending on its activity level, may keep things in memory for a long time).

@fpietrosanti

@garrettr you can also achieve PFS without creating a new loopfs; the only important element is that a "temporary encryption key per file" is used to symmetrically encrypt the files, even just going through a simple AES pipe (see the sketch at the end of this comment).

The points IMHO are:

  • Setting (or not) the requirement to survive an "application restart", which means being resilient against application bugs and being able to keep processing files properly while under a denial of service.
    It's easier not to be resilient to application restarts, because then you only need to keep the key in RAM. But that impacts the "availability" criterion, making the application less resilient.
    For that reason IMHO the temporary key must be written to disk (encrypted) and then deleted; otherwise, if something bad happens (crash, out of memory, a forced app restart to tweak the app while resisting a DoS) between the moment a file is uploaded and the moment it is encrypted to the journalists' PGP keys, that file is gone.
  • The amount of processing that has to be done before receiving a file
  • The privileges required to use this method to process the file

Setting up a LUKS container requires high privileges because it needs to interact with the kernel, so this operation must be an extraordinary action (likely to be done only once at first boot of the server/application) and not an ordinary action (like once for each file).

So I'd suggest considering only methods that can be managed from the application itself, under its low privileges, with only application logic (while extraordinary stuff such as mounting a LUKS container is something to be done by the root operator at first server/application startup time).
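A rough sketch of the "temporary encryption key per file" / AES-pipe variant, assuming the pyca/cryptography package and AES-CTR so chunks can be encrypted as they arrive. This is the simpler, non-restart-resilient option where the ephemeral key lives only in RAM; the function name is illustrative.

```python
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def stream_encrypt(chunks, out_path):
    """Encrypt an incoming stream chunk-by-chunk under a per-file ephemeral key.

    Only ciphertext ever reaches out_path. CTR mode provides no integrity,
    which is acceptable only as a temporary wrap before the real PGP encryption."""
    key = os.urandom(32)       # ephemeral 256-bit key, one per upload, RAM only
    nonce = os.urandom(16)     # initial counter block; not secret
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    with open(out_path, "wb") as out:
        out.write(nonce)
        for chunk in chunks:
            out.write(encryptor.update(chunk))
        out.write(encryptor.finalize())
    return key                 # caller wraps this with the journalist PGP key, then discards it
```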

@djon3s

djon3s commented Oct 20, 2013

Named pipe: pipe through GPG, then over SSH. A FIFO does not (shouldn't) write to disk. "File does not touch the disk unencrypted" (or even encrypted). Double-check with strace.
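A rough sketch of that FIFO idea in Python (hedged: the paths, recipient, and function name are placeholders, and error handling is minimal). The point is that the FIFO inode carries no file data, so only gpg's ciphertext output ever reaches disk.

```python
import os
import subprocess
import tempfile


def encrypt_via_fifo(chunks, recipient, out_path):
    """Feed plaintext chunks to gpg through a named pipe; only ciphertext hits disk."""
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "plaintext.fifo")
    os.mkfifo(fifo)                       # a FIFO stores no file data on disk
    try:
        gpg = subprocess.Popen(
            ["gpg", "--batch", "--encrypt", "--recipient", recipient,
             "--output", out_path, fifo])
        with open(fifo, "wb") as writer:  # blocks until gpg opens the read side
            for chunk in chunks:
                writer.write(chunk)
        return gpg.wait()
    finally:
        os.unlink(fifo)
        os.rmdir(workdir)
```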

As for the concerns about plain-text remaining in memory mentioned by @garrettr, perhaps that should be a different issue.

As for the concerns about DDoSing the service: use some sort of polling / asynchronous I/O to make sure there is always a process on the "read" side of the named pipe, ready to accept what is written to it.

@fpietrosanti

@djon3s The problem with pipes comes from the need to do buffering, which will cause a DoS if a file bigger than RAM is uploaded. In GlobaLeaks we use a scheduler based on Python APScheduler to asynchronously handle all activities that can lead to a DoS during processing and that need to be managed with a parallelism/threshold of execution.

@djon3s

djon3s commented Oct 21, 2013

@fpietrosanti With regard to pipes: I have a feeling that pipes have become more like streams, rather than copying data (https://lwn.net/Articles/118750/), and according to the man page the splice() syscall was implemented in the 2.6.17 kernel (I think this kernel change is what is relevant; please tell me if I'm wrong). Either way, it's academic: I am doing some stress testing now by piping some random data over SSH and monitoring memory consumption with top. It does not seem to increase memory consumption.

With regards to piping through gpg / whatever else, that might be an issue. I could guess why the buffer will perhaps fill if we are encrypting to a public key (this is not a stream cipher after all)... however, I did a test like the above with the following: cat /dev/urandom | gpg --output -a - --encrypt --recipient user@email.tld - | ssh djon3s@othercomputer "cat - > ~/random.data" and then monitored both computers with top to watch memory consumption, and the memory consumption did not seem to rise. Results are the same for gpg as with a plaintext pipe.

If my reasoning is correct (confidence level: medium), I believe this would solve the issue of not touching the disk unencrypted, with the issues of submissions/sensitive data cached in RAM by the server mentioned by @garrettr still to resolve, and perhaps best to open as a new issue.

Ways I could be wrong:

  • I'm using GPG wrongly (most likely)
  • I have not done sufficient performance monitoring.
  • ???

@garrettr
Contributor

The problem with pipes comes from the need to do buffering, which will cause a DoS if a file bigger than RAM is uploaded.

@fpietrosanti While this is a problem, it could be temporarily mitigated by limiting the maximum size of uploads to something reasonable and within the limit of server memory (say, 1 or 2 GB); a config sketch follows the list below. While there are legitimate cases in which sources may want to upload larger files (video, etc.), we could special-case those in a variety of ways since I do not expect them to make up the bulk of submissions. Two ways to do this would be:

  1. Advise sources with files larger than the limit to first contact journalists, who can advise them on how to split the file into pieces and upload each piece, or set up a separate one-time channel for the larger upload.
  2. Allow file uploads larger than the limit if they are already encrypted (mitigating the risk of plaintext touching disk)
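A minimal sketch of that interim size cap, assuming a Flask/werkzeug stack (the thread predates the project's move to Flask; at the server layer Apache's LimitRequestBody directive would serve the same purpose). The limit value and the error message are illustrative.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import RequestEntityTooLarge

app = Flask(__name__)
# Reject request bodies over ~2 GB before they are buffered anywhere.
app.config["MAX_CONTENT_LENGTH"] = 2 * 1024 * 1024 * 1024


@app.errorhandler(RequestEntityTooLarge)
def too_large(error):
    # 413: tell the source to contact a journalist, split, or pre-encrypt the file.
    return jsonify(error="File too large; please contact a journalist first"), 413
```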

@garrettr
Contributor

and the memory consumption did not seem to rise

@djon3s Why does this indicate to you that the file is not touching disk? To me, this seems indicative of one of two cases:

  1. The file is being streamed and is only present in kernel memory pages (good)
  2. The file is being written to disk temporarily (bad!). This is, for example, what web.py does by default with file uploads (the cause of this issue).

The only way to be sure is to examine the source and use strace to make sure nothing sensitive is being written to disk.
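For the strace part, something along these lines could serve as a quick (and admittedly crude) first pass: run a command under strace and flag any file opened for writing outside an allow-list. The allow-list and the flag matching are simplifications; a real review would still read the traced program's source.

```python
import subprocess
import sys

# Paths where writes are considered harmless for this check (illustrative).
ALLOWED_PREFIXES = ("/dev/null", "/proc/", "/sys/")


def check_for_disk_writes(cmd):
    """Run cmd under strace and print any file opened for writing outside the allow-list."""
    trace = subprocess.run(
        ["strace", "-f", "-e", "trace=open,openat,creat"] + cmd,
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
    for line in trace.stderr.splitlines():
        if not any(flag in line for flag in ("O_WRONLY", "O_RDWR", "O_CREAT", "creat(")):
            continue
        if any(prefix in line for prefix in ALLOWED_PREFIXES):
            continue
        print("possible plaintext write:", line)


if __name__ == "__main__":
    check_for_disk_writes(sys.argv[1:])
```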

@garrettr
Contributor

I believe that the "pipe to GPG" method is the best method of solving this problem. To do this properly, we will need to check each point in the chain of programs handling the file to make sure they don't write anything to disk, via examining the source and using strace. We need to check:

  1. The web server (Apache). Does it ever buffer uploads to disk?
  2. The web application/framework. At the moment, the web application writes the plaintext file to a temporary file on disk, then passes the filename of the plaintext file to gpg, which encrypts it. We will need to patch the web application (and the underlying web framework) to stream to GPG instead.
  3. GPG - does it ever buffer to disk in the process of encrypting a file?

With regards to piping through gpg / whatever else, that might be an issue. I could guess why the buffer will perhaps fill if we are encrypting to a public key (this is not a stream cipher after all)...

This is not necessarily a problem. GPG actually generates a random symmetric session key, encrypts the file with that (using a symmetric cipher mode that can be applied to data as it streams in), then encrypts the session key with the chosen asymmetric key and combines the two to create the final encrypted file. So it is not impossible for it to encrypt a stream as it comes in... but we must confirm this.
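A sketch of what point 2 could look like: stream the upload into gpg over a pipe instead of handing it a plaintext filename, so no plaintext path ever exists on the application's side (gpg writes the ciphertext directly to the output path). The recipient and paths are placeholders, and gpg's own buffering behavior still has to be confirmed as per point 3.

```python
import subprocess


def pgp_encrypt_stream(chunks, recipient, out_path):
    """Stream plaintext chunks into gpg's stdin; gpg writes only ciphertext to out_path."""
    gpg = subprocess.Popen(
        ["gpg", "--batch", "--encrypt", "--recipient", recipient,
         "--output", out_path],
        stdin=subprocess.PIPE)
    for chunk in chunks:
        gpg.stdin.write(chunk)
    gpg.stdin.close()
    if gpg.wait() != 0:
        raise RuntimeError("gpg exited with a non-zero status")
```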

@garrettr
Contributor

with the issues of submissions/sensitive data cached in RAM by the server mentioned by @garrettr still to resolve, and perhaps best to open as a new issue

Opened #99 to address this.

@ageis
Contributor

ageis commented Feb 6, 2014

Interesting discussion. Either way, /proc/sys/vm/swappiness should be set to 0 in sysctl.conf.

Did a little research and it appears neither Apache nor GPG buffers to disk in normal operation. How they behave in an OOM state is another question...

@garrettr
Contributor

Resolved in #862.

@evilaliv3
Contributor

@garrettr @redshiftzero : was the analysis of Apache2 completed in relation to this topic?

I think that by default the same issue may exist in Apache's multipart implementation, and this could cause plaintext data to be written for both POST bodies and file uploads: https://github.com/apache/httpd/blob/5f32ea94af5f1e7ea68d6fca58f0ac2478cc18c5/server/apreq_parser_multipart.c

Probably, while this is being analyzed, it would be better to enlarge the buffer size and limit the request size to the size of the buffer?

@jimjag @wrowe: would you please point us to the right person inside Apache who could help identify the best configuration possibilities and, if necessary, design a patch for this?

@redshiftzero
Contributor

After your report @evilaliv3, I just uploaded a very large file in a SecureDrop production-like environment (to trigger the logic introduced in #862) and monitored Apache's behavior with strace (to look for open calls) and lsof (to see if it's writing files to disk in unexpected locations; Apache's capabilities here are fortunately limited by our AppArmor profile, though it can write to e.g. /tmp). In our configuration I'm only seeing files written in the expected place in the expected manner, i.e. files stream AES-encrypted to /tmp/<foo>.aes. Are other preconditions necessary to trigger the case you describe above?

@evilaliv3
Contributor

evilaliv3 commented Aug 27, 2019

@redshiftzero: I'm actually not sure, because I'm still trying to understand whether Apache, when using libapache2-mod-wsgi, works in a streaming fashion, passing the stream directly to Flask (where #862 is effective), or whether it may be doing some multipart parsing as well.

In that case, I worry that when the stream is bigger than the brigade_limit (https://github.com/apache/httpd/blob/5f32ea94af5f1e7ea68d6fca58f0ac2478cc18c5/server/apreq_parser_multipart.c#L272) the file could be parsed using the filesystem as temporary storage.

@GrahamDumpleton: would you please take a look at this ticket?

@GrahamDumpleton

Nothing in mod_wsgi should be trying to parse the content of a request. It gets passed through straight to the WSGI application as is. Whether the WSGI framework will send it to disk is a different matter.

@evilaliv3
Contributor

Thank you @GrahamDumpleton for checking this already! This is a relief.

The implementation of the WSGI Framework (Flask) was fixed in #862

We should be fine then, @redshiftzero!

@GrahamDumpleton

If not already done, what you perhaps want to look at is any input filters for Apache that may sensibly be applied when using mod_wsgi.

One that comes to mind is mod_deflate, when request content is compressed. That said, that is actually hard to use with WSGI applications because you need extensions in the WSGI framework to step around the WSGI spec, ignore CONTENT_LENGTH on input, and read to the end of the input. Flask does support that, though.

legoktm pushed a commit that referenced this issue Sep 18, 2024