
📍 [Epic] Migration data loading: final push #4240

Open
4 of 8 tasks
danswick opened this issue Aug 28, 2024 · 8 comments

Comments

@danswick
Contributor

danswick commented Aug 28, 2024

What problems would we like to solve?

  1. Some historical data is still missing from our public dissemination data.
  2. All migration records (information about how we interpreted data while migrating) have yet to be loaded into a production database.
  3. SF-SAC data cannot be loaded using the same techniques we used for loading dissemination data. It's not clear how to load it while keeping a clean foreign key relationship between the user table and the other SF-SAC tables, given that the database is live and new submissions are still being added.

How do we know we’re done?

  1. Data from each of the data categories described below has been loaded into production and can be verified using methods that are still to be determined and documented.
  2. Scripts and other data loading processes are documented in a single place and can be replicated later if needed.

Who will work on this epic?

@gsa-suk @rocheller123 @sambodeme @jadudm

Where are we now?

This table describes the status of each category of data-that-needs-loading.

| Field | Count | Load method | Status | Date completed |
|---|---|---|---|---|
| singleauditchecklists (and friends) | ~3,000 | Utility | ✅ In prod | Late June ’24 |
| dissemination_ | ~3,000 | Shell script | ✅ In prod | Late June ’24 |
| singleauditchecklists (and friends) | ~300 | Utility | Backed up to Drive and need to be loaded. These are leftovers from the ~3,000 batch. | |
| dissemination_ | ~300 | Shell script | Backed up to Drive and need to be loaded. These are leftovers from the ~3,000 batch. ✅ In prod | 09/09/24 |
| dissemination_ | ~277,000 | | ✅ In prod | Late Jan ’24 |
| singleauditchecklists (and friends) | ~277,000 | Shell script | Backed up to S3 and need to be loaded. | |
| migrationstatus | ~280,000 | Shell script | Backed up to Drive and need to be loaded. ✅ In prod | 09/09/24 |
| Historic_ tables | ? | ? | Backed up to S3 and need to be loaded. These are the Census tables. ✅ In prod | 09/12/24 |
| PDF-only | 212 | ? | Needs decision on how to handle. These are leftover reports that are just PDFs with no SF-SAC data. | |

Links! Tickets, documents, repos, etc. Things we’ve used to track work in recent months:

What needs to happen next

This list divides the work up into three categories: housekeeping, SF-SAC strategy and loading, and loading everything else. The project team should break the work up into more specific/detailed tickets if needed.

Next steps (4 of 9 tasks)

Previous notes: https://docs.google.com/document/d/1wC8PC3_VeAz09-msIL_uIRO9a3tzz1nOgqnbMdWoczE/edit

@gsa-suk
Contributor

gsa-suk commented Sep 9, 2024

09/09/24 - Load 318 dissemination data and migration status to Prod (Sudha, Hassan, Rochelle, Matt)
#4266

@gsa-suk
Contributor

gsa-suk commented Sep 12, 2024

09/12/24 - Load census historical tables to Prod - (Sudha, Matt, Rochelle)

#4279

@danswick
Contributor Author

  • Testing should be completed in the next couple of days.
  • Need to determine communications plan: how long to give notice after testing is complete.

@gsa-suk
Contributor

gsa-suk commented Oct 21, 2024

@gsa-suk
Contributor

gsa-suk commented Oct 21, 2024

Testing SAC load procedure with Prod data loaded into Staging:

  1. Updated audit_singleauditchecklist with user IDs from the Prod auth_user table.
  2. Loaded 275K + 316 historic SACs into the prod table in Staging.
  3. Working on exporting audit_singleauditchecklist (330K+ audits) from Staging to the GFE.
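Step 1 above (pointing SAC rows at Prod users) is what keeps the foreign key to auth_user clean. A minimal sketch of that remapping, assuming users are matched by email address and rows are plain dicts; the field names `submitted_by_email` and `submitted_by_id` are hypothetical and not taken from the actual FAC schema or load scripts:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RemapResult:
    rows: list[dict] = field(default_factory=list)       # rows ready to load
    unmatched: list[dict] = field(default_factory=list)  # rows with no Prod user


def remap_user_ids(sac_rows: list[dict], prod_users: list[dict]) -> RemapResult:
    """Point each SAC row's user FK at the Prod auth_user row with the same email.

    Hypothetical field names: "submitted_by_email" (matching key) and
    "submitted_by_id" (the FK column being rewritten).
    """
    # Case-insensitive email -> Prod auth_user id lookup.
    id_by_email = {u["email"].strip().lower(): u["id"] for u in prod_users}
    result = RemapResult()
    for row in sac_rows:
        key = row["submitted_by_email"].strip().lower()
        if key in id_by_email:
            # Copy the row so the caller's data is not mutated.
            result.rows.append({**row, "submitted_by_id": id_by_email[key]})
        else:
            # Set aside rather than load a dangling foreign key.
            result.unmatched.append(row)
    return result


prod_users = [{"id": 7, "email": "a@example.gov"}, {"id": 9, "email": "b@example.gov"}]
sacs = [
    {"report_id": "R1", "submitted_by_email": "A@example.gov", "submitted_by_id": 101},
    {"report_id": "R2", "submitted_by_email": "c@example.gov", "submitted_by_id": 102},
]
result = remap_user_ids(sacs, prod_users)
print([r["submitted_by_id"] for r in result.rows])  # [7]
print([r["report_id"] for r in result.unmatched])  # ['R2']
```

Collecting unmatched rows instead of failing outright lets a reviewer decide how to handle submitters who have no Prod account before anything is loaded.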

@gsa-suk
Contributor

gsa-suk commented Oct 22, 2024

10/22/24 -

  1. Copied 275K+316 reportfile and access data to Staging S3.
  2. Loaded 275K+316 reportfile and access data into Staging.
  3. Ran tests to verify loaded data in Staging.
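The verification tests themselves are not described here. One generic check, sketched under the assumption that each record has a report ID available both in the source export and in the loaded table, is to compare the two ID lists for missing, unexpected, and double-loaded records:

```python
from collections import Counter


def verify_load(source_ids, loaded_ids):
    """Compare IDs from a source export with IDs read back from the loaded table."""
    source = Counter(source_ids)
    loaded = Counter(loaded_ids)
    missing = sorted(set(source) - set(loaded))     # exported but never loaded
    unexpected = sorted(set(loaded) - set(source))  # loaded but not in the export
    # Loaded more times than they appear in the export (e.g. a batch run twice).
    duplicated = sorted(i for i, n in loaded.items() if i in source and n > source[i])
    return {
        "source_count": sum(source.values()),
        "loaded_count": sum(loaded.values()),
        "missing": missing,
        "unexpected": unexpected,
        "duplicated": duplicated,
        "ok": not (missing or unexpected or duplicated),
    }


report = verify_load(["R1", "R2", "R3"], ["R1", "R1", "R3", "R9"])
print(report)
```

Counting occurrences rather than just comparing sets catches a batch that was loaded twice, which a plain set difference would miss.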

@gsa-suk
Contributor

gsa-suk commented Oct 25, 2024

10/25/24 -

Tested the 'Maintenance mode toggle' in Staging. It worked well: when maintenance mode was turned on, a SAC could not be created; when it was turned off, a SAC could be created.
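The behavior tested above amounts to one flag checked before any SAC is created, which freezes new submissions while historic SACs and their user foreign keys are loaded. A minimal sketch using a hypothetical in-memory flag and list; this is not the FAC implementation, where the flag would presumably live in shared storage and be checked in the request path:

```python
class MaintenanceModeError(RuntimeError):
    """Raised when someone tries to create a SAC while maintenance mode is on."""


class MaintenanceMode:
    """Hypothetical in-memory stand-in for a site-wide maintenance flag."""

    def __init__(self) -> None:
        self.enabled = False

    def turn_on(self) -> None:
        self.enabled = True

    def turn_off(self) -> None:
        self.enabled = False


def create_sac(mode: MaintenanceMode, store: list, report_id: str) -> str:
    """Create a SAC record unless maintenance mode blocks writes."""
    if mode.enabled:
        # Refuse new submissions so live writes cannot race with the bulk load.
        raise MaintenanceModeError("SAC creation is disabled during maintenance")
    store.append(report_id)
    return report_id


mode = MaintenanceMode()
sacs: list = []

mode.turn_on()
try:
    create_sac(mode, sacs, "R1")
    blocked = False
except MaintenanceModeError:
    blocked = True

mode.turn_off()
create_sac(mode, sacs, "R2")
print(blocked, sacs)  # True ['R2']
```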

@gsa-suk
Contributor

gsa-suk commented Oct 30, 2024

10/29/24 -

Copied ~4.5 GB of SAC data files to Prod S3. #4421
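A checksum manifest is one generic way to confirm that a bulk copy arrived intact: hash every file before the copy, hash again afterward (for example, on a re-downloaded copy), and compare. The sketch uses only the Python standard library; the file names are illustrative, and nothing here reflects how the copy in #4421 was actually performed or checked:

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> dict[str, str]:
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    return {
        p.relative_to(directory).as_posix(): sha256_of(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def diff_manifests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return files that are missing, extra, or whose contents changed."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sac_001.json").write_text('{"report_id": "R1"}')
    (root / "sac_002.json").write_text('{"report_id": "R2"}')
    before = build_manifest(root)
    # Simulate one file arriving corrupted after the copy.
    (root / "sac_002.json").write_text('{"report_id": "R2?"}')
    after = build_manifest(root)
    changed = diff_manifests(before, after)

print(changed)  # ['sac_002.json']
```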

Projects
Status: In Progress
Development

No branches or pull requests

2 participants