I've been giving some thought as to how we refactor TaxData and wanted to share what I've come up with here. The general goal is to remove redundant code and set it up to be spun off into its own package or made part of a tax data generator app on Compute Studio. Here's my proposed structure, with each main directory followed by its subdirectories and files:
cps_data: everything needed to create tax units from CPS files will be here. Rather than puf_data having its own set of scripts to make tax units for statistical matching, it will import the needed functions from here.
cps_data/data: will contain all of the CPS and C-TAM files used in the
all of the files currently in cps_data/pycps will be moved up
statmatch: this is a new one. It'll have all of the code used to run a statistical match, generalized to work with more than just the PUF and CPS. I actually have already written most of this. Code can be found here (could also just be a single file, rather than a whole directory)
puf_data:
All of the scripts to prepare the PUF for matching, scripts that call the functions in cps_data to create CPS tax units, run the statistical match, and do all the final prep work.
stage1:
stage1/data: contains all of the population projections, SOI estimates, CBO projections, etc. used in stage 1 of the extrapolation process.
cps_stage1.py, puf_stage1.py. Since there's some overlap in what these files do, it should be possible to boil them down into something more generalized that accepts alternative inputs for things like the CBO projections.
stage2:
cps_stage2.py, puf_stage2.py, solve_lp_for_year.py. The last one will be rewritten so that both the PUF and CPS files can use the same functions. This would mean moving the PUF to the LP model that the CPS uses. All of the specialized code that's currently in each individual solve_lp_for_year.py file will be moved to the specific stage 2 files.
stage3:
PUF stage 3 script. Parameterize to take different distributional targets.
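To make the "alternative inputs" idea for stage 1 a bit more concrete, here's a rough sketch of what a shared entry point could look like. Everything in it is hypothetical: `Stage1Inputs`, `growth_factors`, and `run_stage1` are names I made up for illustration, the numbers are invented, and computing simple growth factors is only a stand-in for whatever cps_stage1.py and puf_stage1.py actually do.

```python
from dataclasses import dataclass, replace
from typing import Dict, Mapping


@dataclass(frozen=True)
class Stage1Inputs:
    """Hypothetical bundle of the projection series stage 1 reads."""
    population: Mapping[int, float]  # year -> projected population
    cbo_gdp: Mapping[int, float]     # year -> projected GDP (illustrative series)


def growth_factors(series: Mapping[int, float], base_year: int) -> Dict[int, float]:
    """Express each year's value relative to the base year."""
    base = series[base_year]
    return {year: series[year] / base for year in sorted(series)}


def run_stage1(inputs: Stage1Inputs, base_year: int) -> Dict[str, Dict[int, float]]:
    """Shared entry point: the same code runs whichever inputs are supplied."""
    return {
        "population": growth_factors(inputs.population, base_year),
        "gdp": growth_factors(inputs.cbo_gdp, base_year),
    }


# Default inputs (made-up numbers), then a run with an alternative projection.
default = Stage1Inputs(
    population={2011: 100.0, 2012: 110.0},
    cbo_gdp={2011: 200.0, 2012: 210.0},
)
alternative = replace(default, cbo_gdp={2011: 200.0, 2012: 220.0})
baseline_factors = run_stage1(default, base_year=2011)
alternative_factors = run_stage1(alternative, base_year=2011)
```

The point of bundling the inputs into one object is that swapping in a different CBO projection is a one-line change for the caller and requires no edits to the stage 1 code itself.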
All of this is just a rough sketch. Down to change any of it. The general steps to take to get here are:
1. Just move all the files to the new directory structure, but avoid any major changes. After this, all the files we produce should still be exactly the same.
2. Remove redundant code. Swap to new LP model, use code in cps_data to make all CPS tax units.
3. Generalize as many of the pieces as possible. Move statistical matching to a standalone module, parameterize as many of the inputs as possible.
Parts of the tasks in point 3 could probably be done in conjunction with steps 1 and 2.
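For the first step, where the files we produce should stay exactly the same, one way to check is to hash the outputs from before and after the move and compare them. Here's a minimal sketch using only the standard library. The function names, the directory arguments, and the `*.csv` pattern are assumptions for illustration, not anything that exists in the repo today.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_outputs(before_dir: str, after_dir: str, pattern: str = "*.csv") -> List[str]:
    """List file names whose bytes differ, or that exist in only one directory."""
    before = {p.name: file_digest(p) for p in Path(before_dir).glob(pattern)}
    after = {p.name: file_digest(p) for p in Path(after_dir).glob(pattern)}
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

An empty list from `changed_outputs` would mean every matching file is byte-for-byte identical, which is the condition step 1 is aiming for.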