Skip to content

Conversation

kevinAlbs
Copy link
Contributor

@kevinAlbs kevinAlbs commented Apr 5, 2023

Summary

  • Do not create or drop eccCollection.
  • Require escCollection and ecocCollection not be documented.
  • Update spec tests to assert eccCollection is not created.
  • Return error if attempting to create a QE collection with server version < 7.0.0 (maxWireVersion < 21)

Additional Improvements

  • Use YAML anchors for repeated collection names and encryptedFields.
  • Remove no longer applicable test: encryptedFieldsMap with cyclic entries does not loop
  • Add test encryptedFields are consulted for metadata collection names

Changes have been tested in the C driver here: mongodb/mongo-c-driver#1232

Tests require libmongocrypt 1.8.0-alpha0 or newer. Binaries for 1.8.0-alpha0 are available here: https://spruce.mongodb.com/task/libmongocrypt_publish_upload_all_042603b1d72f49d9034e7059b82a65e843c7e38a_23_03_29_15_26_54/logs?execution=0

Background & Motivation

Terminology

Queryable Encryption is also referred to as QE or FLE2.
The versions of the QE protocols are noted as QEv1 and QEv2.
QEv1 was introduced in 6.0.0. QEv2 is introduced in 7.0.0.

Removal of eccCollection

This is requested by DRIVERS-2524.
In QEv1, each QE collection required three additional metadata collections: eccCollection, ecocCollection, and escCollection.
In QEv2, the eccCollection is no longer required.

Removal of documenting escCollection and ecocCollection

This is requested by DRIVERS-2586.

Addition of Wire Version check

QEv2 is a backwards breaking change to QEv1. Drivers supporting QEv2 will not support QEv1. Drivers supporting QEv1 will not support QEv2. QEv2 is introduced in server 7.0.0. I expect QEv2 will not be backported.

Sending an QEv1 payload to mongod 7.0.0-alpha-1139-gc9e3390 results in a server error. Example:
(Location7292602) Encountered a Queryable Encryption find payload type that is no longer supported: 5

Sending a QEv2 payload to mongod 6.1.0 results in a server error. Example:
Enumeration value '11' for field 'subtype' is not a valid value.

Creating a QE collection using encryptedFields without including eccCollection does not result in a server error from 6.1.0. Instead, the eccCollection is implicitly created by the server. The wire version check is intended to give users a helpful error if attempting to use a QEv2 driver on a QEv1 server. Without the wire version check, the user may not discover the incompatibility until a QEv2 payload is sent. This may result in the QE collections being created but being unusable.

Creating a collection with encryptedFields with eccCollection does not result in a server error from 7.0.0-alpha-1139-gc9e3390. SERVER-75683 requests returning a helpful error if attempting to create a QEv1 collection on a QEv2 server.

Interaction with createEncryptedCollection

The wire version check is proposed in createCollection. createEncryptedCollection calls createCollection after creating the data keys. If the wire version check fails, this may result in data keys being created and returned. This behavior is expected to be improved by the proposal in DRIVERS-2540 to create data keys and collections in a transaction.

Please complete the following before merging:

  • Update changelog.
  • Make sure there are generated JSON files from the YAML test files.
  • Test changes in at least one language driver. Tested in C driver here
    - [] Test these changes against all server versions and topologies (including standalone, replica set, sharded clusters, and serverless). QE is skipped on serverless pending DRIVERS-2589. C driver does not currently test QE on sharded

@kevinAlbs kevinAlbs marked this pull request as ready for review April 5, 2023 14:20
@kevinAlbs kevinAlbs requested a review from a team as a code owner April 5, 2023 14:20
@kevinAlbs kevinAlbs requested review from jmikola, katcharov and markbenvenuto and removed request for a team April 5, 2023 14:20
@kevinAlbs kevinAlbs requested a review from jmikola April 7, 2023 13:48
@kevinAlbs kevinAlbs requested a review from jmikola April 10, 2023 13:39
Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggested YAML formatting changes that I missed in earlier reviews. I'll leave you to address these on your own and merge w/o another round of review.

Leaving an LGTM to you're not blocked.

Comment on lines 18 to 26
default.encryptedCollection: &encrypted_fields0 {
"fields": [
{
"path": "firstName",
"bsonType": "string",
"keyId": { "$binary": { "subType": "04", "base64": "AAAAAAAAAAAAAAAAAAAAAA==" }}
}
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default.encryptedCollection: &encrypted_fields0 {
"fields": [
{
"path": "firstName",
"bsonType": "string",
"keyId": { "$binary": { "subType": "04", "base64": "AAAAAAAAAAAAAAAAAAAAAA==" }}
}
]
}
default.encryptedCollection: &encrypted_fields
fields:
- path: "firstName"
bsonType: "string"
keyId: { $binary: { base64: "AAAAAAAAAAAAAAAAAAAAAA==", subType: "04" }}

Sorry for not noticing this before. There should be no reason to use [ and { syntax across multiple lines in YAML. I also swapped the order of the $binary fields above to be consistent with our other test files.

I also removed the trailing 0 from the anchor name to be consistent with other files. If you accept that then make sure other references in the file are also updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not noticing this before. There should be no reason to use [ and { syntax across multiple lines in YAML.

I do not think rewriting the JSON adds value. The JSON more closely matches the the encryptedFields shown in output of listCollections from the server. The JSON may be easier for developers to visually compare. I prefer not to change.

I also swapped the order of the $binary fields above to be consistent with our other test files.

Done.

I also removed the trailing 0 from the anchor name to be consistent with other files. If you accept that then make sure other references in the file are also updated.

encrypted_fields0 was unused. 3fc008e removes the anchor.

"escCollection": "enxcol_.encryptedCollection.esc",
"eccCollection": "enxcol_.encryptedCollection.ecc",
"ecocCollection": "enxcol_.encryptedCollection.ecoc",
default.encryptedCollection: &encrypted_fields {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, no reason to use { here. Consider the more idiomatic YAML syntax I suggested above:

default.encryptedCollection: &encrypted_fields
  fields:
    - path: "firstName"
      bsonType: "string"
      keyId: { $binary: { base64: "AAAAAAAAAAAAAAAAAAAAAA==", subType: "04" }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think rewriting the JSON adds value. The JSON more closely matches the the encryptedFields shown in output of listCollections from the server. The JSON may be easier for developers to visually compare. I prefer not to change.

Comment on lines +103 to +115
encryptedFields: &encrypted_fields_expectation {
# Expect state collections are not included in the encryptedFields sent to the server.
"escCollection": null,
"ecocCollection": null,
"eccCollection": null,
"fields": [
{
"path": "firstName",
"bsonType": "string",
"keyId": { "$binary": { "subType": "04", "base64": "AAAAAAAAAAAAAAAAAAAAAA==" }}
}
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
encryptedFields: &encrypted_fields_expectation {
# Expect state collections are not included in the encryptedFields sent to the server.
"escCollection": null,
"ecocCollection": null,
"eccCollection": null,
"fields": [
{
"path": "firstName",
"bsonType": "string",
"keyId": { "$binary": { "subType": "04", "base64": "AAAAAAAAAAAAAAAAAAAAAA==" }}
}
]
}
encryptedFields: &encrypted_fields_expectation
# Expect state collections are not included in the encryptedFields sent to the server.
escCollection: null
ecocCollection: null
eccCollection: null
fields:
- path: "firstName"
bsonType: "string"
keyId: { $binary: { base64: "AAAAAAAAAAAAAAAAAAAAAA==", subType: "04" }}

Removing the JSON-like syntax with trailing commas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think rewriting the JSON adds value. The JSON more closely matches the the encryptedFields shown in output of listCollections from the server. The JSON may be easier for developers to visually compare. I prefer not to change.

kmsProviders:
aws: {} # Credentials filled in from environment.
encryptedFieldsMap:
default.encryptedCollection: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same criticism as the above regarding JSON-like syntax. I'll leave this to you to address. The fields array can be copied over from an earlier suggestion for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think rewriting the JSON adds value. The JSON more closely matches the the encryptedFields shown in output of listCollections from the server. The JSON may be easier for developers to visually compare. I prefer not to change.

kmsProviders:
local: {'key': {'$binary': {'base64': 'Mng0NCt4ZHVUYUJCa1kxNkVyNUR1QURhZ2h2UzR2d2RrZzh0cFBwM3R6NmdWMDFBMUN3YkQ5aXRRMkhGRGdQV09wOGVNYUMxT2k3NjZKelhaQmRCZGJkTXVyZG9uSjFk', 'subType': '00'}}}
encryptedFieldsMap: {
"default.default": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is outside the scope of the PR, but appears to be more JSON-like syntax in YAML. This probably deserves another ticket to clean up in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think rewriting the JSON adds value. The JSON more closely matches the the encryptedFields shown in output of listCollections from the server. The JSON may be easier for developers to visually compare. I prefer not to change.

{
"escCollection": "enxcol_.default.esc",
"eccCollection": "enxcol_.default.ecc",
"ecocCollection": "enxcol_.default.ecoc",
Copy link
Member

@jmikola jmikola Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinAlbs: I just noticed this, but if you're removing the metadata collections here, doesn't that make this file identical to its range-encryptedFields-<type>.json counterpart used by Prose Test 22? The only difference I see is that the range- files also specify contention: 0, which is the default and may be irrelevant.

If so, would it make sense to just stick with a single file and update the yamlfile() calls in the test templates?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related PR where I encountered this: #1399

Copy link
Contributor Author

@kevinAlbs kevinAlbs Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference I see is that the range- files also specify contention: 0, which is the default and may be irrelevant.

Another difference: the field names for encryptedDouble / encryptedDecimal in encryptedFields-Range-<type>.json are encryptedDoubleNoPrecision / encryptedDecimalNoPrecision in range-encryptedFields-<type>.json

I suggest filing a DRIVERS ticket for the improvement. This change can be made in a separate PR to avoid adding unrelated changes to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@katcharov katcharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants