Adding tools for workbook generation and testing #1723

Merged: jadudm merged 5 commits into main from jadudm/workbook-testing-tools on Aug 4, 2023

Conversation

jadudm (Contributor) commented Aug 4, 2023

This brings two tools into the tree that have been used for exploring workbook generation.

`generate-sqlite-files` takes public, pipe-delimited Census data and turns it into an SQLite3 database. It improves on the previous tools that did this conversion.
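
The conversion itself is conceptually simple. Here is a minimal sketch of the idea, using a hypothetical export file and table name; the real tool handles many Census tables and is more careful about column types and encodings:

```python
import csv
import sqlite3

# Hypothetical file and table names; the real Census export files
# have their own names, which are not shown here.
SOURCE = "census_gen.txt"
TABLE = "census_gen"
DB = "census.sqlite3"

# Read the pipe-delimited export: first row is the header.
with open(SOURCE, newline="", encoding="utf-8") as fh:
    reader = csv.reader(fh, delimiter="|")
    header = next(reader)
    rows = list(reader)

# Create a table with one (untyped) column per header field and load the rows.
conn = sqlite3.connect(DB)
columns = ", ".join(f'"{name}"' for name in header)
placeholders = ", ".join("?" for _ in header)
conn.execute(f"CREATE TABLE IF NOT EXISTS {TABLE} ({columns})")
conn.executemany(f"INSERT INTO {TABLE} VALUES ({placeholders})", rows)
conn.commit()
conn.close()
```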

`workbook-generator` is a script that takes a DBKEY and an SQLite3 database containing public Census data, and outputs a set of populated, GFAC-style XLSX workbooks.

To do this, `workbook-generator`:

  1. Loads one of our templates using `openpyxl`
  2. Loads data from the SQLite3 database into the named ranges in the workbook template
  3. Saves the template out to a new filename, populated with data (a rough sketch of these steps follows below).
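
As a rough sketch of those three steps: the template path, database name, query, and named range below are illustrative only (the real templates and ranges differ), and the dict-style `defined_names` lookup assumes openpyxl 3.1 or later:

```python
import sqlite3
from openpyxl import load_workbook
from openpyxl.utils import range_boundaries

DBKEY = "123456"  # example DBKEY, not a real submission

wb = load_workbook("federal-awards-template.xlsx")   # 1. load a template
conn = sqlite3.connect("census.sqlite3")
amounts = [row[0] for row in conn.execute(
    "SELECT AMOUNT FROM cfda WHERE DBKEY = ?", (DBKEY,)
)]

# 2. pour the query results into a named range, one value per row
defn = wb.defined_names["amount_expended"]           # dict lookup, openpyxl >= 3.1
sheet_name, coord = next(defn.destinations)          # e.g. ("Form", "$B$2:$B$500")
ws = wb[sheet_name]
min_col, min_row, _, _ = range_boundaries(coord)
for offset, value in enumerate(amounts):
    ws.cell(row=min_row + offset, column=min_col, value=value)

# 3. save the populated template under a new name
wb.save(f"federal-awards-{DBKEY}.xlsx")
```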

The generator attempts to get the details right. For example, it follows old linking IDs (e.g. `ELECAUDITID`) and replaces them with AWARD-####-style references. Similarly, it unpacks the odd design of the Notes to SEFA tables to convert the Census data back into a functional workbook.
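
The linking-ID rewrite is essentially a renumbering. With made-up ELECAUDITID values, and assuming the references are zero-padded sequence numbers, it amounts to something like:

```python
# Made-up ELECAUDITID values; the real ones come from the Census data.
elec_audit_ids = [4487123, 4487124, 4487125]
award_refs = {
    eid: f"AWARD-{n:04}" for n, eid in enumerate(elec_audit_ids, start=1)
}
# {4487123: "AWARD-0001", 4487124: "AWARD-0002", 4487125: "AWARD-0003"}
```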

This is not a fully validated tool. However, it does generate workbooks with authentic data, and they have been used to drive upload processes into our system. Using them, we have found errors in our validations that we did not know about previously.

If we continue using this tool, we will likely want to build it into our testing automation process. In theory, we could have hundreds (or thousands) of workbooks generated and ready for testing. We can also generate workbooks that explicitly exercise specific properties, for example the presence (or absence) of secondary auditors, all from existing, previously-validated data.

The generator also spits out a JSON document. That document records:

  1. The table the data was pulled from
  2. The fields pulled
  3. The values in those fields (an example record is sketched below)
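
A record in that document might look something like the following; the shape, field names, and output filename here are assumptions for illustration, not the tool's actual format:

```python
import json

# Assumed shape; the generator's actual JSON layout may differ.
record = {
    "table": "cfda",
    "fields": ["CFDA", "AMOUNT"],
    "values": [["84.010", 250000], ["10.553", 125000]],
}
with open("workbook-source-data.json", "w") as fh:
    json.dump([record], fh, indent=2)
```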

The purpose of that document is to be able to do something like:

  1. Load the workbooks through our pipeline
  2. Have them undergo validation, cross-val, and ETL
  3. Use the JSON document to compare what we pulled from the SQLite DB to what ended up in our dissemination DB.

In other words, the JSON document is to enable end-to-end testing of the dissemination pipeline. Ideally, we would do that final check using the API. This would let us use the JSON document to generate API calls that query the DB (from the "outside"), and verify that the API produces data we expect.
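
As a sketch of what that final check could look like, assuming the JSON document shaped as above and a hypothetical API endpoint and query parameters (the real dissemination API and its field names are not specified here):

```python
import json
import requests

# Hypothetical endpoint, report id, and field names, for illustration only.
API = "https://fac.example.gov/api/federal_awards"
REPORT_ID = "2023-TEST-0001"

with open("workbook-source-data.json") as fh:
    records = json.load(fh)

# Query the dissemination DB from the "outside", via the API.
resp = requests.get(API, params={"report_id": f"eq.{REPORT_ID}"})
resp.raise_for_status()
disseminated = {row["amount_expended"] for row in resp.json()}

# Compare what we pulled from SQLite to what the API returns.
for rec in records:
    if rec["table"] != "cfda":
        continue
    amount_idx = rec["fields"].index("AMOUNT")
    expected = {row[amount_idx] for row in rec["values"]}
    missing = expected - disseminated
    assert not missing, f"values missing from dissemination: {missing}"
```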

github-actions bot commented Aug 4, 2023

Terraform plan for dev

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

✅ Plan applied in Deploy to Development and Management Environment #74

github-actions bot commented Aug 4, 2023

Terraform plan for management

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

📝 Plan generated in Pull Request Checks #288

Comment on lines +24 to +25
# Do everything in a temp dir.
# It will disappear when we hit the end of the with block.
asteel-gsa (Contributor) commented:

This is a non-blocker, but what I would personally like is to have these comments removed from the file and put in tools/generate-sqlite-files/readme.md as code snippets with the comments attached.

So, in readme.md: `with tempfile.TemporaryDirectory('_fac') as tdir:` will create a temp dir for running this, etc.
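
For reference, a minimal form of the pattern being discussed (the `'_fac'` suffix and the quoted comments come from the code above; the rest is illustrative):

```python
import tempfile
from pathlib import Path

# Do everything in a temp dir.
# It will disappear when we hit the end of the with block.
with tempfile.TemporaryDirectory("_fac") as tdir:
    scratch = Path(tdir) / "extracted-data.txt"
    scratch.write_text("...")   # downloads, extraction, etc. happen here
# At this point tdir and everything under it has been removed.
```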

jadudm (Contributor, Author) replied:

Let's pick this up in an iteration. I have a bunch of questions, actually, as to how we would even work some of this into a GH workflow. That would necessarily drive changes to this. For example, we might want to make it a Django command, so it is part of the FAC app.

The number of changes coming for this script could be many. Or, we might dump it. So, at that point, we have opportunities to iterate/reshape the whole thing. Which we might have to.

asteel-gsa (Contributor) commented:

The move of comments into the readme is very much a personal thing; it is not at all a necessity and can be ignored. I like the idea of dropping Stack Overflow links for references, and it's really up to you whether you feel those comments can be documented in a readme vs. inside the code. Whoever you ask will tell you yes or no, so I think what I'm trying to say is: if you feel those are worth expanding upon in a readme, great; if not, leave them in the code if you feel they are necessary.

Otherwise, LGTM @jadudm

@jadudm jadudm merged commit 1bda9c6 into main Aug 4, 2023
@jadudm jadudm deleted the jadudm/workbook-testing-tools branch August 4, 2023 21:24