API-26689: VBADocuments Data Migration to Remove doc_type from UploadSubmission Records #12843
Summary
The `doc_type` stored in `UploadSubmission` records is a consumer-provided string. We previously stopped storing this key-value pair due to PII/PHI concerns, since we can't control what the consumer sends us in this field. To build upon that work, this PR scrubs the `doc_type` key and value from the `uploaded_pdf` jsonb column of all `UploadSubmission` records.

Related issue(s)
API-26689
Testing done
The rake task was run locally against my database containing a variety of `UploadSubmission` records. The contents of the rake task were also run on the development server and confirmed to work as expected: they remove the `doc_type` from the `uploaded_pdf` jsonb column and leave the rest of the column's contents intact.

The task will be run in the following order on the `vets-api` environments:

Screenshots
None
What areas of the site does it impact?
This PR impacts the data stored in `UploadSubmission` records (`VBADocuments` module). It also renames a couple of existing `VBADocuments` rake task files to move the date stamp to the front of the file name (for file sorting) and updates the namespace of the data migration tasks so that they're not under a generic `temp` namespace.

Acceptance criteria
Requested Feedback
Given the large difference in `UploadSubmission` record count between the lower environments and Production, I request that this PR be reviewed through the lens of performance. I also hope the reviewer can help identify any data loss risks (beyond the intended data loss) in running the migration as written on the Production server. From my research, it appears to be a performant query (and certainly more performant than updating the records via `ActiveRecord`), but a second set of eyes/expertise would be appreciated. Thanks!
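
For reviewers weighing the performance question, here is a minimal sketch of what a single set-based scrub could look like. It is not the rake task from this PR. The table name, the assumption that `doc_type` is a top-level key of `uploaded_pdf`, and the helper `scrub_doc_type` are all hypothetical and used only for illustration. The SQL relies on PostgreSQL's jsonb `-` operator (delete a top-level key) and `?` operator (key exists); the pure-Ruby helper models the effect on one row so the behavior can be checked without a database.

```ruby
require 'json'

# Assumed table name; the PR description does not state it.
TABLE = 'vba_documents_upload_submissions'

# One statement for all rows: jsonb `-` drops the top-level 'doc_type' key,
# and the `?` check skips rows that no longer carry the key.
SCRUB_SQL = <<~SQL
  UPDATE #{TABLE}
  SET uploaded_pdf = uploaded_pdf - 'doc_type'
  WHERE uploaded_pdf ? 'doc_type'
SQL

# Hypothetical pure-Ruby model of the statement's effect on one row's
# uploaded_pdf value: only the top-level key is removed, and the input
# hash is left unmodified.
def scrub_doc_type(uploaded_pdf)
  uploaded_pdf.reject { |key, _value| key == 'doc_type' }
end

# Made-up sample value, not real record data.
before = JSON.parse('{"doc_type":"Unknown","total_pages":2,"content":{"page_count":2}}')
after = scrub_doc_type(before)
puts JSON.generate(after)
# prints {"total_pages":2,"content":{"page_count":2}}
```

Compared with loading each record through `ActiveRecord` and saving it, a statement like this avoids instantiating models and issuing one `UPDATE` per row. It also bypasses callbacks, validations, and `updated_at` changes, which may or may not be desirable here.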