Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API-26689: VBADocuments Data Migration to Remove doc_type from UploadSubmission Records (Take 2) #12864

Merged

Conversation

kristen-brown
Copy link
Contributor

@kristen-brown kristen-brown commented Jun 1, 2023

Summary

The doc_type stored in UploadSubmission records is a consumer-provided string. We previously stopped storing this key-value pair due to PII/PHI concerns, since we can't control what the consumer sends us in this field. To build upon that work, this PR scrubs the doc_type key and value from the uploaded_pdf jsonb column of all UploadSubmission records.

A prior PR attempted to do the same thing, but it wasn't performant enough to run successfully in the Production environment (statement timed out).

The updates in this PR use a combination of in_batches (with the default batch size of 1_000) and update_all to scrub the doc_type key. in_batches was used rather than find_in_batches because in_batches returns an ActiveRecord::Result, which is compatible with update_all, while find_in_batches returns an Array.

The documentation for in_batches uses an example with the default batch size of 1_000 combined with update_all, which makes me feel this query may be performant enough to run successfully. The alternative is going more toward raw SQL and temporary tables for the ID lookup, which is what I will try next if the updates here aren't enough.

Related issue(s)

API-26689

Testing done

The rake task was run locally against my database containing a variety of UploadSubmission records. The contents of the rake task were also run on the development server and confirmed to work as expected to remove the doc_type from the uploaded_pdf jsonb column and leave the rest of the column's contents intact.

The task will be run in the following order on the vets-api environments:

  1. Development (6,415 records) ✅
  2. Staging (21,715 records) ✅
  3. Sandbox (166,619 records) ✅
  4. Production (2,577,864 records)

Screenshots

None

What areas of the site does it impact?

This PR impacts the data stored in UploadSubmission records (VBADocuments module).

Acceptance criteria

  • I fixed|updated|added unit tests and integration tests for each feature (if applicable). – N/A
  • No error nor warning in the console.
  • Events are being sent to the appropriate logging solution – N/A
  • Documentation has been updated (link to documentation) – N/A
  • No sensitive information (i.e. PII/credentials/internal URLs/etc.) is captured in logging, hardcoded, or specs
  • Feature/bug has a monitor built into Datadog or Grafana (if applicable) – N/A
  • If app impacted requires authentication, did you login to a local build and verify all authenticated routes work as expected – N/A
  • I added a screenshot of the developed feature – N/A

Requested Feedback

Interested in feedback related to this migration's ability to run successfully in Production against ~2.5 million records.

Copy link
Contributor

@mathisto mathisto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, GREAT call @kristen-brown. update_all + in_batches is much neater than the raw SQL building we discussed. NICE.

@kristen-brown kristen-brown merged commit df0ff7d into master Jun 1, 2023
@kristen-brown kristen-brown deleted the API-26689-vba-documents-doc-type-data-migration-take-2 branch June 1, 2023 19:31
ryan-mcneil pushed a commit that referenced this pull request Dec 11, 2023
…ploadSubmission` Records (Take 2) (#12864)

* API-26689: Update 'doc_type' removal data migration to perform updates in batches of 1_000

* API-26689: Skip rubocop requirement for use of 'update_all'

* API-26689: Move rubocop enable statement for clarity
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
banana-peels Lighthouse Banana Peels Team Lighthouse lighthouse
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants