API-26689: VBADocuments Data Migration to Remove doc_type from UploadSubmission Records (Take 2) #12864
Summary
The `doc_type` stored in `UploadSubmission` records is a consumer-provided string. We previously stopped storing this key-value pair due to PII/PHI concerns, since we can't control what the consumer sends us in this field. To build upon that work, this PR scrubs the `doc_type` key and value from the `uploaded_pdf` jsonb column of all `UploadSubmission` records.

A prior PR attempted to do the same thing, but it wasn't performant enough to run successfully in the Production environment (statement timed out).
The updates in this PR use a combination of `in_batches` (with the default batch size of 1_000) and `update_all` to scrub the `doc_type` key. `in_batches` was used rather than `find_in_batches` because `in_batches` yields each batch as an `ActiveRecord::Relation`, which is compatible with `update_all`, while `find_in_batches` yields each batch as an `Array`.

The documentation for `in_batches` uses an example with the default batch size of 1_000 combined with `update_all`, which makes me think this query may be performant enough to run successfully. The alternative is going more toward raw SQL and temporary tables for the ID lookup, which is what I will try next if the updates here aren't enough.

Related issue(s)
API-26689
Testing done
The rake task was run locally against my database containing a variety of `UploadSubmission` records. The contents of the rake task were also run on the development server and confirmed to work as expected, removing the `doc_type` from the `uploaded_pdf` jsonb column and leaving the rest of the column's contents intact.

The task will be run in the following order on the `vets-api` environments:

Screenshots
None
What areas of the site does it impact?
This PR impacts the data stored in `UploadSubmission` records (`VBADocuments` module).

Acceptance criteria
Requested Feedback
Interested in feedback related to this migration's ability to run successfully in Production against ~2.5 million records.
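
To make the batching approach easier to reason about during review, here is a minimal plain-Ruby sketch of it. It is not the rake task from this PR: it models the records as in-memory hashes, stands in `each_slice` for `in_batches`, and stands in a per-record `delete` for the single `update_all` statement each batch would issue. The `scrub_doc_type` helper, the sample rows, and the `total_pages` key are all hypothetical names introduced only for illustration.

```ruby
# Plain-Ruby model of a batched doc_type scrub. No Rails or database needed.
# In a real task each batch would presumably be one UPDATE issued through
# update_all; here a batch is just a slice of an in-memory array.

BATCH_SIZE = 1_000 # matches the in_batches default cited in the Summary

# Removes the 'doc_type' key from every record's :uploaded_pdf hash, working
# through the records one batch at a time. Returns the number of batches.
def scrub_doc_type(records, batch_size: BATCH_SIZE)
  batches = 0
  records.each_slice(batch_size) do |batch|
    batch.each { |record| record[:uploaded_pdf]&.delete('doc_type') }
    batches += 1
  end
  batches
end

# Hypothetical sample rows; the 'total_pages' key is made up to stand in for
# the other contents of the jsonb column.
records = Array.new(2_500) do |i|
  { id: i + 1, uploaded_pdf: { 'doc_type' => 'consumer text', 'total_pages' => 3 } }
end

batch_count = scrub_doc_type(records)

# Rough batch count for the Production run, using the ~2.5 million figure.
estimated_batches = (2_500_000 / BATCH_SIZE.to_f).ceil

puts "sample batches: #{batch_count}"
puts "estimated production batches: #{estimated_batches}"
```

The point of the model is that each unit of work touches at most one batch of rows, so no single statement has to cover the whole table. Whether each of those per-batch statements stays under the Production statement timeout is the open question this PR is asking about.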