[RFC] v2 cryptographic overview #54

Open
franky47 opened this issue Apr 13, 2023 · 0 comments

Overview

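v1.x used a naive AES-GCM cipher with a single key and random nonces, which does not scale: with random 96-bit nonces, NIST SP 800-38D limits a single key to 2^32 encryptions to keep the risk of a nonce collision negligible. Another issue was a vulnerability to confused deputy attacks (CDA), where a valid ciphertext is moved to another location and still decrypts.

v2 aims to improve the cryptographic layer with the following properties: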

  • AEAD construction where AAD is set to the path of the field (model name + field name)
  • Key derivation to increase the cipher input entropy (pseudorandom key + random IV)

Caveats

Note that the following operations will still not be supported on encrypted fields in v2, and are not planned to be:

  • Partial matching (startsWith, endsWith, contains, etc.)
  • Ordering

AEAD

In order to strongly bind a ciphertext to its storage location - to defend against CDA and field value swapping - the path where a record is stored should be part of the additional authenticated data (AAD).

This path is made of three dimensions:

  • The table
  • The column
  • The row

While it is fairly easy to pin the column (by setting the table and column name in AAD), pinning the row is more challenging. Usually, pinning the row is done by setting the row ID as AAD. However, this does not work in cases where the row ID is not available.
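Pinning the table and column can be sketched as follows. This is a minimal illustration using Node.js `crypto`, not the v2 implementation: the helper names (`columnAad`, `encryptField`, `decryptField`) and the `Model.field` AAD layout are assumptions made for the example. The dot separator is only safe here because Prisma model and field names are identifiers that cannot contain a dot.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helper: the AAD pins only the table (model) and column (field).
function columnAad(model: string, field: string): Buffer {
  return Buffer.from(`${model}.${field}`, "utf8");
}

interface Sealed {
  nonce: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function encryptField(key: Buffer, aad: Buffer, plaintext: string): Sealed {
  const nonce = randomBytes(12); // 96-bit nonce
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  cipher.setAAD(aad); // authenticated, but not stored in the ciphertext
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

function decryptField(key: Buffer, aad: Buffer, sealed: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.nonce);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the tag does not verify for this key, nonce, AAD and ciphertext
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}
```

A ciphertext written for `User.email` then fails to decrypt when read back as `User.name`, since the AAD differs. It still decrypts in the `email` column of any other row, which is the gap row pinning is meant to close.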

When encrypting a new record, the row ID may be omitted so that the database engine generates it automatically (e.g. auto-incrementing integer or UUID primary keys).

When decrypting a record, the ID may be absent from either the query or the returned data.

Options here are:

  1. Do not include any row pinning, and only pin table & column. Pros: simple to implement. Cons: no practical defense against CDA.
  2. Use option 1 by default for auto-incrementing (database-generated) IDs, but pin the row for runtime-generated IDs (e.g. UUIDs, CUIDs), detected by parsing the Prisma schema. Pros: allows defense against CDA. Cons: does not cover auto-incrementing IDs, may be difficult to implement, especially regarding connections & relations. Won't work with queries that don't supply the ID (where clause on other @unique fields).
  3. Perform operations in multiple steps. E.g. write an empty record to the database to obtain a row and its ID, then encrypt fields with full AAD pinning, and write the ciphertexts, all in a transaction. Pros: can use autogenerated IDs. Cons: performance cost, increased risk of conflicts leaving data in an inconsistent state, possible data races.

Rejected ideas:

  • Having a separate column managed internally by the middleware to serve as an AAD row reference. Rejected as it would be trivial for an attacker to swap these references along with ciphertext and still cause a valid decryption across rows.

Note: model and column renaming may also cause AAD mismatches. Such cases should be covered by data migrations.

Composite IDs (using @@id) could be supported, with extra care about canonicalisation attacks. For example, with a naive string concatenation, these two rows would have the same AAD:

model User {
  firstName String
  lastName  String
  @@id([firstName, lastName])
}

| firstName | lastName | Resulting AAD |
| --- | --- | --- |
| John | Doe | UserJohnDoe |
| Joh | nDoe | UserJohnDoe |
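
One common way to avoid this ambiguity is to prefix every component with its length before concatenating. The sketch below assumes a hypothetical `encodeAad` helper and a 4-byte big-endian length prefix. It is one possible encoding, not a format decided for v2.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical encoder: each component is prefixed with its byte length
// (4 bytes, big-endian), so component boundaries are unambiguous.
function encodeAad(parts: string[]): Buffer {
  const chunks: Buffer[] = [];
  for (const part of parts) {
    const bytes = Buffer.from(part, "utf8");
    const length = Buffer.alloc(4);
    length.writeUInt32BE(bytes.length, 0);
    chunks.push(length, bytes);
  }
  return Buffer.concat(chunks);
}

const naiveA = ["User", "John", "Doe"].join("");
const naiveB = ["User", "Joh", "nDoe"].join("");
const safeA = encodeAad(["User", "John", "Doe"]);
const safeB = encodeAad(["User", "Joh", "nDoe"]);

console.log(naiveA === naiveB); // true: naive concatenation collides
console.log(safeA.equals(safeB)); // false: the length prefixes differ
```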

Algorithm selection

The use of AES-GCM with 256-bit keys will be maintained, not for backward compatibility (there won't be any, due to the additional use of AAD), but because it's a common cipher available in most implementations. A non-NIST alternative native to Node.js would be ChaCha20-Poly1305, which conveniently has the same nonce and auth tag sizes as AES-GCM (96 and 128 bits respectively).
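
For illustration, here is how the ChaCha20-Poly1305 alternative looks with Node.js `crypto`. The function names are hypothetical, and the sketch assumes a Node.js build whose OpenSSL exposes `chacha20-poly1305`. The `plaintextLength` option is passed mainly to satisfy the TypeScript typings.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface ChaChaBox {
  nonce: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function sealChaCha(key: Buffer, aad: Buffer, plaintext: Buffer): ChaChaBox {
  const nonce = randomBytes(12); // 96 bits, same size as the AES-GCM nonce
  const cipher = createCipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  cipher.setAAD(aad, { plaintextLength: plaintext.length });
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() }; // 128-bit tag
}

function openChaCha(key: Buffer, aad: Buffer, box: ChaChaBox): Buffer {
  const decipher = createDecipheriv("chacha20-poly1305", key, box.nonce, { authTagLength: 16 });
  decipher.setAAD(aad, { plaintextLength: box.ciphertext.length });
  decipher.setAuthTag(box.tag);
  // final() throws if the Poly1305 tag does not verify
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}
```

Because key, nonce and tag sizes match, swapping ciphers would not change the size of stored values.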

Key derivation

Rather than specifying a single encryption key in the middleware configuration and using random nonces, a root secret will be used to derive individual keys and nonces, using HKDF-SHA-256.

This still assumes a cryptographically strong root secret.

Key derivation reduces the reuse of keys across a database. While a per-field derivation would be possible, it may be costly and redundant with the use of AAD, so a per-row derivation may be preferable. The upper bound for the number of encryptions under one key would then be defined by the number of columns on a particular table, the size of the data to encrypt, and the number of edits per row. In most applications, this is far below the threshold where using random nonces becomes problematic.
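
A per-row derivation could look like the sketch below, using `hkdfSync` from Node.js `crypto`. Several things here are assumptions and not part of the RFC: the `deriveRowKey` name, the use of a random per-row salt stored next to the ciphertexts, and the `info` label. The sketch derives only the key and leaves nonce handling out.

```typescript
import { hkdfSync, randomBytes } from "node:crypto";

// Hypothetical per-row derivation. Assumptions: the salt is random, generated
// once per row and stored with it; the info string separates models.
function deriveRowKey(rootSecret: Buffer, rowSalt: Buffer, model: string): Buffer {
  const info = Buffer.from(`field-encryption:v2:${model}`, "utf8"); // illustrative label
  // hkdfSync returns an ArrayBuffer; wrap it to get a 32-byte AES-256 key
  return Buffer.from(hkdfSync("sha256", rootSecret, rowSalt, info, 32));
}

const rootSecret = randomBytes(32); // must come from a cryptographically strong source
const saltRow1 = randomBytes(16);
const saltRow2 = randomBytes(16);

const keyRow1 = deriveRowKey(rootSecret, saltRow1, "User");
const keyRow2 = deriveRowKey(rootSecret, saltRow2, "User");
console.log(keyRow1.length); // 32
console.log(keyRow1.equals(keyRow2)); // false, barring a salt collision
```

Derivation is deterministic, so decryption only needs the root secret and the stored salt to recover the same key.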

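Update 2024-06-29: A possibly simpler alternative would be to use XAES-256-GCM, a recently published extended-nonce construction that its author describes as FIPS 140 compliant, since it is built from NIST-approved primitives.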

Key commitment

In order to mitigate the invisible salamanders attack, where multiple keys can decrypt the same ciphertext and verify the authentication tag, the key itself (and probably the nonce too) should be part of the AAD.

Note: AES-GEM does this, but increases the size of the authentication tag. Can we ensure commitment with a standard 16-byte authentication tag?

Edit: no, see https://crypto.stackexchange.com/questions/108200/key-commitment-in-gcm-or-aead-in-general
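
Since a 16-byte tag alone cannot provide commitment, one option is to store a separate commitment value next to the ciphertext, at the cost of extra bytes per value. The sketch below derives the encryption key and a commitment from the same HKDF inputs under different labels. The function names, labels and the 32-byte commitment size are assumptions for illustration, not a decision made in this RFC.

```typescript
import { hkdfSync, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical derivation: two outputs from the same secret and salt,
// separated by distinct info labels.
function deriveKeyAndCommitment(rootSecret: Buffer, salt: Buffer) {
  const encryptionKey = Buffer.from(
    hkdfSync("sha256", rootSecret, salt, Buffer.from("encryption-key"), 32),
  );
  const commitment = Buffer.from(
    hkdfSync("sha256", rootSecret, salt, Buffer.from("key-commitment"), 32),
  );
  return { encryptionKey, commitment };
}

// Run before decrypting: returns the key only if the stored commitment
// matches the one derived from the supplied secret.
function keyForCommitment(rootSecret: Buffer, salt: Buffer, stored: Buffer): Buffer {
  const { encryptionKey, commitment } = deriveKeyAndCommitment(rootSecret, salt);
  if (stored.length !== commitment.length || !timingSafeEqual(stored, commitment)) {
    throw new Error("key commitment mismatch");
  }
  return encryptionKey;
}
```

At encryption time the commitment is stored with the salt and ciphertext. At decryption time a ciphertext crafted to verify under a second key is rejected, because that key does not reproduce the stored commitment.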

Rotation

Rotation of the root derivation secret is planned, just as multiple keys were supported in v1. Ciphertext rotation will require the same data migration technique.

Ciphertext format

todo: Document v2 ciphertext format

v1 to v2 migration

While the two versions may require different configurations (root secrets vs. keys), it would be advisable to provide a data migration strategy to ease adoption.

That being said, the migration workflow may require several deployment phases (expand & contract pattern, akin to database migrations) if a zero-downtime upgrade is desired.

Therefore, v2 will ship with a read-only compatibility layer for v1, which will be removed altogether in a later update.

Resources

https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/
https://scottarc.blog/2022/10/17/lucid-multi-key-deputies-require-commitment/
https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/supported-algorithms.html
https://words.filippo.io/dispatches/xaes-256-gcm/

@franky47 franky47 pinned this issue Apr 13, 2023