Use sampling seed to standardize records subsample in test_pufcsv.py #869

martinholmer · 2016-08-19T20:51:23Z

This pull request eliminates randomness in the sub-sample selected in the sampling test added in pull request #844 (as fixed in #864). It also does two other things. First, it moves the comparison of the combined tax liabilities generated by the sub-sample and the full sample into the test_agg() function to reduce test execution time. Second, it uses a sampling random-number seed that produces a relatively small difference under current-law policy between the sub-sample and full-sample combined tax liability. Below is the maximum relative difference (over the ten years of results from 2013 to 2022) between the sub-sample and full-sample combined tax liability for each of nine tested sampling random-number seeds:

SEED  MAX RELATIVE DIFFERENCE (%)
---------------------------------
  10  -1.66
  20  -2.65
  30  +3.18
  40  -5.90
  50  +2.20
  60  -2.77
  70  +3.81
  80  +0.61
  90  -2.41
---------------------------------

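These results show that the choice of sampling seed matters: the maximum relative difference ranges in absolute value from 0.61% (seed 80) to 5.90% (seed 40), so picking the "right" sampling seed will make a big difference in user satisfaction.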
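For readers who want to see the idea in miniature, here is a rough sketch of seeded sub-sampling and of the "maximum relative difference" statistic tabulated above. It is not the code in test_pufcsv.py, which this pull request does not show. The names `draw_subsample`, `weighted_total` and `signed_max_rel_diff_pct` are hypothetical, records are simplified to `(tax, weight)` pairs, and Python's standard `random` module stands in for whatever sampling call the test actually uses.

```python
import random


def draw_subsample(records, frac, seed):
    """Return a reproducible sub-sample of (tax, weight) pairs.

    The same seed always selects the same records. Weights are scaled up
    so that the sub-sample still represents the full population.
    (Hypothetical helper, not part of Tax-Calculator.)
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    k = int(round(len(records) * frac))
    picked = sorted(rng.sample(range(len(records)), k))
    scale = len(records) / k
    return [(records[i][0], records[i][1] * scale) for i in picked]


def weighted_total(records):
    """Weighted sum of tax liability over (tax, weight) pairs."""
    return sum(tax * weight for tax, weight in records)


def signed_max_rel_diff_pct(sub_totals, full_totals):
    """Relative difference (%) with the largest magnitude, sign preserved.

    Each list holds one total per year; the sign convention assumed here
    is (sub-sample minus full sample) divided by full sample.
    """
    diffs = [100.0 * (s - f) / f for s, f in zip(sub_totals, full_totals)]
    return max(diffs, key=abs)


if __name__ == "__main__":
    # Made-up data for one year; real PUF records are not reproduced here.
    data_rng = random.Random(0)
    full = [(data_rng.uniform(0.0, 50000.0), 1.0) for _ in range(10000)]
    for seed in (10, 20, 30):
        sub = draw_subsample(full, frac=0.02, seed=seed)
        diff = signed_max_rel_diff_pct([weighted_total(sub)], [weighted_total(full)])
        print(seed, round(diff, 2))
```

Fixing the seed makes the sub-sample, and therefore the test's expected results, identical from run to run; comparing several seeds, as in the table, is then just a loop over candidate values.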

@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @zrisher @codykallen

codecov-io · 2016-08-19T20:54:58Z

Current coverage is 98.12% (diff: 100%)

Merging #869 into master will increase coverage by <.01%

@@             master       #869   diff @@
==========================================
  Files            13         13          
  Lines          1816       1818     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           1782       1784     +2   
  Misses           34         34          
  Partials          0          0

Powered by Codecov. Last update 9772cd1...430e82f

Use sampling seed to standardize records subsample.

430e82f

talumbau added the in progress label Aug 19, 2016

martinholmer changed the title ~~Use sampling seed to standardize Records subsample in test_pufcsv.py~~ Use sampling seed to standardize records subsample in test_pufcsv.py Aug 19, 2016

martinholmer mentioned this pull request Aug 19, 2016

Known issues with "Quick Calculation" ospc-org/ospc.org#306

Closed

2 tasks

martinholmer merged commit 51f0ccc into PSLmodels:master Aug 22, 2016

talumbau removed the in progress label Aug 22, 2016

martinholmer deleted the sample0 branch August 22, 2016 14:50
