How to run PUF-based tests on PRs? #2826

MattHJensen · 2020-08-25T11:02:38Z

MattHJensen
Aug 25, 2020
Maintainer

Currently, only CPS-based tests (and other tests not requiring external tests data) are run automatically on PRs to GH.

This issue is to encourage the exploration of running PUF-based tests on PRs as well.

Running the full test suite on PRs would give contributors greater ownership of their commits, it would result in fewer problems/bugs founds during the release process, and it would be a significant step towards adopting continuous delivery or continuous deployment (to be discussed further in a forthcoming issue).

Challenges:

Tests will run longer and require more resources.
The automated testing will be a new security threat for the PUF.

(1) is unlikely to present a significant problem, but it is something to be aware of.

For (2), a primary concern is that someone without access to the PUF could write a new test that prints or otherwise discloses secure data and then use the automated tests, triggered by a PR, to release those data.

One approach to resolving (2) would be that the tests only run automatically on PRs from whitelisted developers; for other developers, a reviewer would need to manually trigger the tests. We would need to find a testing service / automation service that offers this. There may be other solutions. All ideas regarding automation services or other solutions welcome!

hdoupe · 2020-08-25T13:55:39Z

hdoupe
Aug 25, 2020
Maintainer

@MattHJensen I dropped a similar comment in @Peter-Metz's state-taxdata project, but you could read the PUF file from the OSPC AWS bucket like C/S does. To do this, you'd just need to set your AWS access variables as Github secrets. And then you can access them from a Github Actions run.

At some point, I read that only users with collaborator-level access to a repository have the ability to read GH secrets. However, I'm having trouble finding that link right now. There is this quote from the Github Actions docs (emphasis is mine):

You can use and read encrypted secrets in a workflow file if you have access to edit the file. For more information, see "Access permissions on GitHub."

However, I'd like to test this out before making the PUF available.

GitHub will redact secret values if they are printed, too. There's some more info here: https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets#naming-your-secrets

0 replies

MaxGhenis · 2020-08-25T14:59:58Z

MaxGhenis
Aug 25, 2020

Could the synthetic PUF help?

0 replies

jdebacker · 2020-08-25T16:36:37Z

jdebacker
Aug 25, 2020
Maintainer

The challenges of data privacy seem hard to overcome with outputs from tests being printed to the pytest log files on Travis.

What is the reason behind testing with the PUF exactly? Do these tests pick up (1) errors in the logic? (2) errors related to using variables that don't exist in the PUF? (3) errors in assuming certain values for variables that might differ in the PUF vs CPS? (4) something else?

I think if you identify the issues that are behind your motivation for wanting to include the PUF-based tests, you could perhaps write tests that check against such issues without using the PUF.

E.g., I don't see very good testing of the internal logic of Tax-Calculator. test_calcfunctions.py mostly checks number of arguments, but does not test that given certain values for inputs, certain output values are returned. If each function were tested in that way, errors in tax logic would be caught quickly, without the need to use any particular dataset.

0 replies

rickecon · 2020-08-25T18:43:52Z

rickecon
Aug 25, 2020

We have had this question in OG-USA both with regard to the CI computing time restraint and with regard to the PUF. And we have an additional question about storing data files used only for testing outside of the GitHub repository.

CI Time constraint. Travis CI allows for two processors and 50 minutes of testing per pull request. The full OG-USA testing suite takes between 12-13 hours using 7 processors on a fast machine. So our CI battery is a select group of tests. For major refactoring, we run the full suite of tests on either Jason's or my local machine. Those tests offline tests include tests that use the PUF.
Proprietary data in testing. The only way we see that you can use the PUF in your testing is if the tests are all run remotely on a protected server. Compute.studio can do this. Most most of our other PSLmodels projects do not currently pay for that service. For right now, Jason and I just decided to run all our big refactoring pull requests through the full testing suite manually. But as projects mature, we will need to invest in some more sophisticated infrastructure. And to @MaxGhenis' comment, a synthetic PUF would solve this privacy problem.
Storing testing data off repo. OG-USA has GBs of non-private data used for verification in our unit testing suite. If we had a good way to store this off of the repository and just use the requests library to load it into tests, that would streamline the memory footprint of the repo. However, it takes time to download data, which would eat into our Travis CI compute time limit described in point 1. Furthermore, a package probably doesn't need all the testing data along with it.

cc: @MattHJensen @jdebacker @hdoupe

0 replies

MattHJensen · 2020-08-28T12:28:27Z

MattHJensen
Aug 28, 2020
Maintainer Author

@jdebacker asked why the PUF is used in testing:

What is the reason behind testing with the PUF exactly? Do these tests pick up (1) errors in the logic? (2) errors related to using variables that don't exist in the PUF? (3) errors in assuming certain values for variables that might differ in the PUF vs CPS? (4) something else?

The PUF tests help with (1) and (3) in that we keep track of baseline and can see deviations form them, but those purposes are also backstopped with our validation scripts against TAXSIM, the CPS-based tests, and some other unit testing. As you note, additional unit testing could be added as a further backstop, but I don't see that as a panacea because there's nothing that would prevent logic errors introduced in calcfunctions from being duplicated in the unit tests. The TAXSIM validation scripts and comparison of PUF-based results against other modeling outfits provides 3rd party validation, if incomplete and imperfect.

We also use PUF for (2) to identify which parameters are "compatible" with the PUF, i.e., which parameters influence PUF-based results. See https://github.com/PSLmodels/Tax-Calculator/blob/3b9f4642ba53ed8e5d01155f543ad5d0e6f2f526/taxcalc/tests/test_compatible_data.py.

Mostly we use the PUF for (4), "something else." Specifically, we keep an extensive record of baseline and reform results for Tax-Calculator when used with the puf.csv. These results give users a historical review of the Tax-Calculator + TaxData integrated capabilities and allow users to roughly compare Tax-Calculator baseline and reform against other sources like CBO. We rely on the reform comparisons regularly for development purposes when adding new parameters and reform capabilities.
See https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_reforms.py,
https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_puf_var_stats.py, https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_pufcsv.py.

0 replies

MattHJensen · 2020-09-09T11:45:14Z

MattHJensen
Sep 9, 2020
Maintainer Author

Any reactions to this approach?

Set up a private repository that mirrors Tax-Calculator.
Configure testing in the private mirror to only run PUF-based tests.
Tax-Calculator maintainers check the private repo before merging PRs.

Downsides (and mitigating factors).

Would require figuring out mirroring with an additional wrinkle of different test configs. (But this will be good knowledge for PSL)
Expenses may be associated with testing for the private repo. (But OSPC can likely fund it)
Requires maintainers to check tests in two repos before merging. (But that's not much work)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to run PUF-based tests on PRs? #2826

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 6 comments

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

How to run PUF-based tests on PRs? #2826

MattHJensen Aug 25, 2020 Maintainer

Replies: 6 comments

hdoupe Aug 25, 2020 Maintainer

MaxGhenis Aug 25, 2020

jdebacker Aug 25, 2020 Maintainer

rickecon Aug 25, 2020

MattHJensen Aug 28, 2020 Maintainer Author

MattHJensen Sep 9, 2020 Maintainer Author

MattHJensen
Aug 25, 2020
Maintainer

hdoupe
Aug 25, 2020
Maintainer

MaxGhenis
Aug 25, 2020

jdebacker
Aug 25, 2020
Maintainer

rickecon
Aug 25, 2020

MattHJensen
Aug 28, 2020
Maintainer Author

MattHJensen
Sep 9, 2020
Maintainer Author