Skip Indexing Custom Privacy Request Field Arrays #5127
…the value for search if the value is a list. This avoids indexing values that may be too large. Given the way we index, a list value would not be useful for search anyway.
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##             main    #5127   +/-   ##
=======================================
  Coverage   86.56%   86.56%
=======================================
  Files         357      357
  Lines       22349    22349
  Branches     2955     2954    -1
=======================================
  Hits        19347    19347
  Misses       2480     2480
  Partials      522      522
OK, I consolidated some code and added test coverage.
Looks good, Dawn! Thank you for making this change so quickly. Nice test coverage as usual, and the sample requests were a nice touch to help with manual testing. I submitted both requests and verified that hashed_value is no longer set for list values.
fides=# select privacy_request_id, field_name, hashed_value, left(encrypted_value, 16) as encrypted_value from custom_privacy_request_field;
privacy_request_id | field_name | hashed_value | encrypted_value
------------------------------------------+--------------+--------------+------------------
pri_71b295b7-cbf6-4621-a855-162fcf07f345 | internal_ids | | 780gDzPwNE/KKhU0
pri_71b295b7-cbf6-4621-a855-162fcf07f345 | evidence_box | | UbSq53Bt1EB4hIV+
pri_3f8a3c78-a9a0-4fab-8f9a-ebc0e6321989 | internal_ids | | 7hF5S/N7hcJMgZo9
pri_3f8a3c78-a9a0-4fab-8f9a-ebc0e6321989 | evidence_box | | r3pw7c4AgQ5eGG/n
(4 rows)
Thank you @galvana, this was quick in part because your CustomPrivacyRequestField.hashed_value was already nullable ⭐
Closes #PROD-2359
Description Of Changes
Sending a large list value for a single custom privacy request field results in an extremely long response time that can time out, because we 1) hash the list and 2) then try to index that list for search.
We're going to skip hashing and indexing custom privacy request field values for search if the supplied value is an array. This keeps us from hashing or indexing a value that is too large. Further, indexing arrays doesn't really serve a purpose: we index the entire array as a single value, and there's no full-text search or anything of that nature enabled.
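To make the intended behavior concrete, here is a minimal sketch of the skip-on-list rule. It is not the code from this PR: the helper name `hash_for_search`, the salt argument, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from typing import Any, Optional


def hash_for_search(value: Any, salt: str = "example-salt") -> Optional[str]:
    """Hypothetical helper: hash scalar values for search, skip lists.

    Returning None mirrors leaving a nullable hashed_value column unset.
    """
    if isinstance(value, list):
        # Arrays are neither hashed nor indexed; only the encrypted
        # value would be stored for them.
        return None
    # Assumption: a salted SHA-256 digest stands in for the real hashing scheme.
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


large_list = [f"id-{i}" for i in range(10_000)]
print(hash_for_search(large_list))        # None
print(len(hash_for_search("some-value"))) # 64
```

With a rule like this, the cost of a request no longer grows with the hashing or indexing of a large array, and scalar fields stay searchable by hash.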
Code Changes
Steps to Confirm
This request should be extremely quick locally. Previously, it would hang for several minutes and then fail with: (psycopg2.errors.ProgramLimitExceeded) index row requires 51720 bytes, maximum size is 8191
Or POST http://localhost:8080/api/v1/privacy-request/authenticated
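For manual testing, a request body with a large list value can be generated rather than written by hand. The sketch below only builds and measures a payload; it does not send it. The payload shape, the `policy_key` value, the email, and the helper name `build_large_request` are all assumptions for illustration and should be checked against the actual API schema. Only the field name `internal_ids` comes from the query output in the review above.

```python
import json
from typing import Any, Dict, List


def build_large_request(n_values: int = 5000) -> List[Dict[str, Any]]:
    """Hypothetical helper: build a privacy request body with a large list field."""
    return [
        {
            # Assumed identity and policy values; replace with ones valid locally.
            "identity": {"email": "user@example.com"},
            "policy_key": "default_access_policy",
            "custom_privacy_request_fields": {
                "internal_ids": {
                    "label": "Internal IDs",
                    # The large array that previously triggered the slow path.
                    "value": [f"id-{i}" for i in range(n_values)],
                }
            },
        }
    ]


body = json.dumps(build_large_request())
print(f"payload size: {len(body)} bytes")
```

The resulting JSON string can be pasted into an API client or sent with any HTTP library to the endpoint used in the steps above.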
Pre-Merge Checklist
CHANGELOG.md
main
downgrade() migration is correct and works