INTPYTHON-527 Add Queryable Encryption support #329
base: main
Wrong commit message for 65bd15a and I don't want to force push yet. It should have said:
I'm aware that
|
It's not working as you think it is. As I said elsewhere, Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because I'd suggest to use my patch is as a starting point for maintaining two connections. |
I don't disagree, but it feels a lot like
Yes it works by design, not a side effect. I'm
I've made a few passes at it but did not get anywhere. I'll try again, though.
Your "stumble" theory of how it's working isn't correct.
Copy that, thanks! I've removed
Still working on an unencrypted connection, but perhaps the only time we need it is for the version check.
@ShaneHarvey @Jibola @timgraham FYI here is the
And here is the error again with some additional debug:
And the full traceback:
Test settings:
This is happening in the |
def get_encrypted_fields_map(self, connection):
    return {
        "fields": [
            field
            for app_config in apps.get_app_configs()
            for model in router.get_migratable_models(
                app_config, connection.alias, include_auto_created=False
            )
            if getattr(model, "encrypted", False)
            for field in connection.schema_editor()._get_encrypted_fields_map(model)
        ]
    }
This format doesn't look correct. Doesn't it have to include the database and collection information? Look at the pymongo example.
Maybe. If I pass the results as `schema_map` to `AutoEncryptionOpts` I don't get an error, but I'll try with namespace.
Please add the database name too. Quoting the linked example:
# The MongoDB namespace (db.collection) used to store the
# encrypted documents in this example.
encrypted_namespace = "test.coll"
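The namespace point above can be sketched as follows. pymongo's `AutoEncryptionOpts` accepts an `encrypted_fields_map` keyed by `"<database>.<collection>"`. The helper name `build_encrypted_fields_map`, the `test` database, and the `ssn` field definition are illustrative assumptions, not code from this PR.

```python
def build_encrypted_fields_map(db_name, collections):
    """Key each collection's encrypted-fields document by its full namespace.

    `collections` maps a collection name to a list of field definitions.
    Hypothetical helper for illustration only.
    """
    return {
        f"{db_name}.{coll_name}": {"fields": list(fields)}
        for coll_name, fields in collections.items()
    }


# Illustrative field definition in the Queryable Encryption format.
patient_fields = [
    {"path": "ssn", "bsonType": "string", "queries": {"queryType": "equality"}},
]

encrypted_fields_map = build_encrypted_fields_map("test", {"patient": patient_fields})
print(sorted(encrypted_fields_map))  # ['test.patient']
```

A map shaped like this would be passed as `encrypted_fields_map=...` rather than as a bare `{"fields": [...]}` document.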
        connection.features.__dict__.pop("supports_queryable_encryption", None)

    def tearDown(self):
        connection.features.__dict__.pop("supports_queryable_encryption", None)
This can use the `del ...` version since it will exist by now.
> This can use the `del ...` version since it will exist by now.
By now? I still get an attribute error …
I mean in `tearDown()` since each test will (presumably) initialize the attribute.
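The `del` versus `pop` exchange above comes down to where a cached property stores its value. A minimal sketch, using `functools.cached_property` as a stand-in (Django's `cached_property` also stores its result in the instance `__dict__`); the `Features` class here is hypothetical:

```python
from functools import cached_property


class Features:
    """Stand-in for a database features class with a cached flag."""

    @cached_property
    def supports_queryable_encryption(self):
        return True


features = Features()

# Before first access nothing is cached, so `del` raises AttributeError.
try:
    del features.supports_queryable_encryption
    del_failed = False
except AttributeError:
    del_failed = True

# pop() with a default is safe whether or not the value is cached.
features.__dict__.pop("supports_queryable_encryption", None)

# After access the value lives in the instance __dict__ and `del` works.
assert features.supports_queryable_encryption is True
del features.supports_queryable_encryption
cached_after_del = "supports_queryable_encryption" in features.__dict__
```

So `del` is only safe at points where the attribute is guaranteed to have been read already, which is the assumption being made about `tearDown()`.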
django_mongodb_backend/encryption.py
Outdated
class EncryptedRouter:
    def _get_db_for_model(self, model):
        if getattr(model, "encrypted", False):
Sorry if this was previously discussed, but how does it distinguish an encrypted model from a model that has a field `encrypted = models.BooleanField()`? Maybe `getattr(model, "encrypted", False) is True` could avoid some false positives.
Good catch, thanks. I don't think we've finalized using that conditional yet and, more importantly, there is one in `schema._create_collection`.
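The false-positive concern raised above can be illustrated without Django. In this sketch the classes are plain stand-ins (all names are hypothetical): a model with a field named `encrypted` exposes a truthy descriptor object at class level, so a bare truthiness check matches it, while an identity check against `True` does not.

```python
class FakeField:
    """Stand-in for a field descriptor; any plain object is truthy."""


class EncryptedPatient:
    # Marker attribute, as a hypothetical EncryptedModel base might set it.
    encrypted = True


class AuditLog:
    # An ordinary model that happens to have a field named "encrypted".
    encrypted = FakeField()


class Plain:
    pass


def is_encrypted_truthy(model):
    # Mirrors `if getattr(model, "encrypted", False):`
    return bool(getattr(model, "encrypted", False))


def is_encrypted_strict(model):
    # Mirrors the suggested `getattr(model, "encrypted", False) is True`
    return getattr(model, "encrypted", False) is True


for model in (EncryptedPatient, AuditLog, Plain):
    print(model.__name__, is_encrypted_truthy(model), is_encrypted_strict(model))
```

Only `AuditLog` differs between the two checks, which is exactly the case the stricter comparison is meant to exclude.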
        return name, path, args, kwargs


class EncryptedCharField(EncryptedFieldMixin, models.CharField):
I didn't find in the docs whether an encrypted collection can be queried with an aggregate query. So my question is: does it support all the lookups from `CharField`?
No aggregation stages are supported and two tests from Django's `test_charfield` are failing, though only one is an aggregation stage failure:
======================================================================
ERROR: test_assignment_from_choice_enum (encryption_.test_charfield.TestEncryptedCharField.test_assignment_from_choice_enum)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
yield
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
return run_state_machine(ctx, self.callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
result = callback.mark_command(ctx.database, mongocryptd_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
res = self.mongocryptd_client[database].command(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
return self._command(
^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
return conn.command(
^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
return command(
^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
helpers_shared._check_command_response(
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Comparison disallowed between fields where one is randomly encrypted; field 'title' is randomly encrypted., full error: RawBSONDocument(b"\xae\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00k\x00\x00\x00Comparison disallowed between fields where one is randomly encrypted; field 'title' is randomly encrypted.\x00\x10code\x00\xb6y\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location31158\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))
ERROR: test_lookup_integer_in_charfield (encryption_.test_charfield.TestEncryptedCharField.test_lookup_integer_in_charfield)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
yield
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
return run_state_machine(ctx, self.callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
result = callback.mark_command(ctx.database, mongocryptd_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
res = self.mongocryptd_client[database].command(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
return self._command(
^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
return conn.command(
^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
return command(
^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
helpers_shared._check_command_response(
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Aggregation stage $internalFacetTeeConsumer is not allowed or supported with automatic encryption., full error: RawBSONDocument(b'\xa6\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00c\x00\x00\x00Aggregation stage $internalFacetTeeConsumer is not allowed or supported with automatic encryption.\x00\x10code\x00#y\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location31011\x00\x00', codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))
Cool. Are lookups needed? We could try to add something, but it won't be easy given that we cannot use aggregate.
Only testing EncryptedIntegerField
Both checks cannot exist in the same class else one may be interrupted by the other and fail as a result. Instead, check the version once and cache the results so subsequent checks can check the cache instead of the connection.
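The "check once and cache" approach described in this commit message could look roughly like the sketch below. `FakeConnection`, `get_database_version`, and the `(7, 0)` minimum are assumptions made for illustration and are not the backend's actual API.

```python
from functools import cached_property


class FakeConnection:
    """Hypothetical stand-in that counts how often the server is queried."""

    def __init__(self, version):
        self._version = version
        self.calls = 0

    def get_database_version(self):
        self.calls += 1
        return self._version


class Features:
    # Assumed minimum server version for Queryable Encryption.
    minimum_qe_version = (7, 0)

    def __init__(self, connection):
        self.connection = connection

    @cached_property
    def supports_queryable_encryption(self):
        # Runs once per instance; later reads come from the instance __dict__.
        return self.connection.get_database_version() >= self.minimum_qe_version


conn = FakeConnection((8, 0, 4))
features = Features(conn)
results = [features.supports_queryable_encryption for _ in range(3)]
print(results, conn.calls)  # [True, True, True] 1
```

Subsequent checks read the cached value, so they never touch the connection and cannot interrupt one another.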
Regression in daa9a8e
We need a test that verifies the fields are:
- In the collection
- Encrypted
Possibly via comparing type to bson
Let's leave `EncryptedRouter` in for now and bikeshed at the end. It seems to me as appropriate as any other helper to include, but I'm open to discussion. Also `KMS_CREDENTIALS` are not in use yet. I will test with Azure and AWS prior to the end of this month.
Regression in 8a05af8
Also,
- Use db_table in management command
- Move feature check to base class
    class Meta:
        abstract = True
        required_db_features = {"supports_queryable_encryption"}
I don't think `required_db_features` is appropriate for `EncryptedModel` since that will silently cause encrypted models not to be created in user projects.
    class Meta:
        db_table = "billing"
See https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/coding-style/#model-style on positioning of `Meta`.
class Patient(EncryptedModel):
    class Meta:
        db_table = "patient"
If the custom table names are only temporary for debugging or something that's fine, but it's not really appropriate to use an unprefixed table name that could collide with other test apps.
    class Meta:
        db_table = "billing"

    cc_type = EncryptedCharField("cc_type", max_length=20, queries=QueryType.equality())
Why do you add an explicit verbose name to every field (e.g. `"cc_type"`)?
:class:`~pymongo.encryption_options.AutoEncryptionOpts` requires a key vault
namespace to store encryption keys. The key vault namespace is typically a
combination of a database and collection name. ``KEY_VAULT_COLLECTION_NAME``
and ``KEY_VAULT_DATABASE_NAME`` are defined in :mod:`~django_mongodb_backend.encryption`
and used to create the key vault namespace, which can be imported and used as follows.

``KEY_VAULT_NAMESPACE``
~~~~~~~~~~~~~~~~~~~~~~~

E.g.::

    AutoEncryptionOpts(
        key_vault_namespace=encryption.KEY_VAULT_NAMESPACE,
        ...
    )
I think it would be simpler and more transparent to document an example:
AutoEncryptionOpts(
    key_vault_namespace="keyvault.__keyvault",
rather than to provide and use a constant.
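For readers unfamiliar with the format, a key vault namespace is simply `"<database>.<collection>"`. The sketch below shows how the literal string in the suggestion above decomposes; `split_namespace` is a hypothetical helper, not part of pymongo or this PR.

```python
def split_namespace(namespace):
    """Split "<database>.<collection>" at the first dot.

    Database names cannot contain dots, while collection names can,
    so only the first dot is treated as the separator.
    """
    db_name, _, coll_name = namespace.partition(".")
    if not db_name or not coll_name:
        raise ValueError(f"Not a valid namespace: {namespace!r}")
    return db_name, coll_name


key_vault_namespace = "keyvault.__keyvault"
db_name, coll_name = split_namespace(key_vault_namespace)
print(db_name, coll_name)  # keyvault __keyvault
```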
KMS_CREDENTIALS = {
    "aws": {
        "key": os.getenv("AWS_KEY_ARN", ""),
        "region": os.getenv("AWS_KEY_REGION", ""),
    },
    "azure": {
        "keyName": os.getenv("AZURE_KEY_NAME", ""),
        "keyVaultEndpoint": os.getenv("AZURE_KEY_VAULT_ENDPOINT", ""),
    },
    "gcp": {
        "projectId": os.getenv("GCP_PROJECT_ID", ""),
        "location": os.getenv("GCP_LOCATION", ""),
        "keyRing": os.getenv("GCP_KEY_RING", ""),
        "keyName": os.getenv("GCP_KEY_NAME", ""),
    },
    "kmip": {},
    "local": {},
}
Is there some documentation we can link to help users know how to configure credentials, providers, etc? It doesn't feel like Django's job to document and maintain this sort of mapping.
I also read:
To enable the driver’s behavior to obtain credentials from the environment, add the appropriate key (“aws”, “gcp”, or “azure”) with an empty map to “kms_providers” in either AutoEncryptionOpts or ClientEncryption options.
so this won't work for that use case (I think).
I'd suggest trying to minimize the number of "helpers" in this PR. We can always add things later if there are user pain points, but I feel these things shouldn't be our focus for v1. Really, we should enhance MongoDB/pymongo docs if it's unclear how to construct the providers dictionary. I don't think a solution of "set these environment variables instead" is making things simpler.
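To make the "providers dictionary" concrete, here are three common shapes of `kms_providers` as a hedged sketch. The empty-map form is the one described in the pymongo passage quoted above. Reading the explicit AWS secrets from `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` is an assumption about where a project keeps them, and the variable names `local_providers`, `aws_explicit`, and `aws_on_demand` are illustrative.

```python
import os

# Local provider: a 96-byte master key supplied directly (testing only).
local_providers = {"local": {"key": os.urandom(96)}}

# Explicit AWS credentials; where the secrets come from is an assumption.
aws_explicit = {
    "aws": {
        "accessKeyId": os.getenv("AWS_ACCESS_KEY_ID", ""),
        "secretAccessKey": os.getenv("AWS_SECRET_ACCESS_KEY", ""),
    }
}

# Empty map: ask the driver to obtain credentials from the environment.
aws_on_demand = {"aws": {}}
```

Any of these dictionaries is passed as the `kms_providers` argument to `AutoEncryptionOpts` or `ClientEncryption`. A helper that always fills in values cannot express the empty-map case.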
    kms_providers = options._kms_providers
    codec_options = CodecOptions()

    ce = ClientEncryption(kms_providers, key_vault_namespace, client, codec_options)
There is some example code that uses `codec_options=client.codec_options`, which might be more appropriate (though you wonder why `codec_options` is a required argument if the options can be retrieved from the also-passed `client`).
    ce = ClientEncryption(kms_providers, key_vault_namespace, client, codec_options)

    # TODO: Validate schema! `create_encrypted_collection` appears to
What would validating it involve?
class Tee(StringIO):
    """Print the output of management commands to stdout."""

    def write(self, txt):
        sys.stdout.write(txt)
        super().write(txt)


out = Tee()
`Tee` is for temporary debugging?
(see previous attempts in #318, #319 and #323 for additional context)