This repository was archived by the owner on Feb 8, 2023. It is now read-only.

Convergent Encryption #63

Open
prusnak opened this issue Oct 7, 2015 · 5 comments

Comments

@prusnak

prusnak commented Oct 7, 2015

A node operator can currently read the contents of the blocks stored on their node, because the blocks are not encrypted. The operator cannot recover the whole file without its IPFS hash, but individual blocks might still reveal sensitive data.

One approach to deal with this situation in a way that is a perfect match for IPFS is Convergent Encryption.

It would work like this:

  1. Compute the IPFS hash of the plaintext file; let's call it H_p (plaintext hash).
  2. Encrypt the plaintext file with AES, using H_p (or KDF(H_p), with a KDF such as scrypt) as the encryption key.
  3. Add the resulting ciphertext file to IPFS; this produces the hash H_c (ciphertext hash).

To read the file contents, one would need both H_c (to retrieve the ciphertext file from IPFS) and H_p (to decrypt the ciphertext file and to confirm that the resulting file is indeed the wanted one).
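The steps above, plus the read path, can be sketched roughly as follows. This is only an illustration under loud assumptions: `content_hash` uses plain SHA-256 as a stand-in for a real IPFS hash, and `keystream_xor` is a toy HMAC-SHA256 stream cipher swapped in for AES so the example needs nothing beyond the Python standard library. All function names are hypothetical, and the cipher is not meant for real use.

```python
import hashlib
import hmac
from typing import Tuple


def content_hash(data: bytes) -> bytes:
    """Stand-in for an IPFS hash: plain SHA-256 of the bytes (assumption)."""
    return hashlib.sha256(data).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher standing in for AES (illustration only).

    XORs the data with an HMAC-SHA256 keystream generated in counter mode.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


def ce_encrypt(plaintext: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (H_p, H_c, ciphertext) for the given plaintext."""
    h_p = content_hash(plaintext)                # step 1: plaintext hash
    ciphertext = keystream_xor(h_p, plaintext)   # step 2: H_p is the key
    h_c = content_hash(ciphertext)               # step 3: ciphertext hash
    return h_p, h_c, ciphertext


def ce_decrypt(h_p: bytes, ciphertext: bytes) -> bytes:
    """Decrypt with H_p and confirm the result hashes back to H_p."""
    plaintext = keystream_xor(h_p, ciphertext)
    if not hmac.compare_digest(content_hash(plaintext), h_p):
        raise ValueError("decrypted data does not match H_p")
    return plaintext
```

Because nothing random enters the process, calling `ce_encrypt` twice on the same bytes yields the same ciphertext and the same H_c, which is what makes identical encrypted files deduplicate.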

Because the encryption key depends on the plaintext file and is generated in a deterministic manner, this solution has the nice property that it also allows block-level deduplication of encrypted files.

I am posting this idea here because I am not sure whether it would be interesting to implement in the IPFS library directly, or whether it belongs at the application level, built on top of IPFS.

@eminence

eminence commented Oct 7, 2015

So let's say Alice has data she wants to encrypt. She has plaintext P1, and produces H_p, H_c, and the ciphertext C1 as per the approach you describe. She wants to make P1 available to Bob, but wants to ensure that all nodes (including her own node, Bob's node, and any other intermediate nodes) never see P1; they only see H_c and C1.

Without this encryption scheme, Alice would normally communicate H_p to Bob. But in this scenario, she communicates H_c (which is used to get the raw data out of IPFS), and also H_p (which must be treated as a secret). Bob uses H_p to recover P1 from C1, but since H_p is a secret, no one else can do this.

Do I have this right?

@prusnak
Author

prusnak commented Oct 7, 2015

@eminence Yes, you do have this right.

@prusnak
Author

prusnak commented Oct 7, 2015

Created a simple PoC here: https://github.com/prusnak/ipfs-ce

The plaintext hash is used directly as the encryption key. A real implementation should probably use a key derivation function.
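For illustration, deriving the key through scrypt (the KDF suggested earlier in this issue) might look like the sketch below. The salt value, the cost parameters, and the name `derive_key` are assumptions made for this example, not something taken from the PoC. The salt has to be fixed and shared by everyone: a random per-file salt would give different keys for the same plaintext and break convergence.

```python
import hashlib

# Hypothetical fixed, public salt. It must be identical for all users,
# otherwise the same plaintext would no longer map to the same key.
KDF_SALT = b"ipfs-convergent-encryption-example"


def derive_key(h_p: bytes) -> bytes:
    """Derive a 32-byte encryption key from the plaintext hash H_p.

    The scrypt cost parameters (n, r, p) are illustrative assumptions,
    not tuned recommendations.
    """
    return hashlib.scrypt(h_p, salt=KDF_SALT, n=2**12, r=8, p=1, dklen=32)
```

The output is 32 bytes, which matches the key size of AES-256; anyone holding H_p can recompute the same key.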

@jbenet
Member

jbenet commented Oct 9, 2015 via email

@frankbraun

It's good to hear that "object level crypto" is planned for IPFS. I think object level encryption should be the default for a distributed file system like IPFS.

I agree with @jbenet's comment about the Tahoe-LAFS folks; they figured this out a long time ago.

As a reference: https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html describes the two problems with the simple approach to convergent encryption. But the proposed solution of using an added_secret is good and very easy to implement.
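A minimal sketch of the added_secret idea, assuming the secret is mixed into key derivation with HMAC-SHA256. The construction and the name `convergent_key` are assumptions chosen for illustration, not necessarily the exact scheme Tahoe-LAFS uses. Here SHA-256 again stands in for the IPFS plaintext hash.

```python
import hashlib
import hmac


def convergent_key(added_secret: bytes, plaintext: bytes) -> bytes:
    """Derive the encryption key from both the plaintext and a secret.

    Only parties sharing the same added_secret derive the same key,
    so ciphertexts converge (and deduplicate) only within that group.
    """
    h_p = hashlib.sha256(plaintext).digest()  # stand-in for the IPFS hash
    return hmac.new(added_secret, h_p, hashlib.sha256).digest()
```

With this change, someone who can guess the plaintext but does not know the secret can no longer recompute the key or the resulting ciphertext. The trade-off is that deduplication stops working across users who hold different secrets.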
