Measuring quality of the weighted file for its intended uses #8
Broad approach to short-term goals -- getting base, syn, and synadj. I anticipate the general approach will be:
General approach to comparisons of base, syn, and synadj
Why do all 4 comparisons?
One possible result of test 4 is that we may learn that a constructed file is good for analyzing some kinds of reforms but not others. That would be valuable information for users.
One possible summary measure for Comparison of base year tax law
One summary measure for item 3 in the list above (Comparison of base year tax law) would be to compute the cumulative distribution of weighted total tax (e06500) vs. AGI for our three files (the PUF, the synthetic PUF with synthesized weights, and the synthetic PUF with adjusted weights), where tax and AGI are obtained by running the data files through Tax-Calculator.

An exploratory look would put all 3 distributions on a graph (similar to the graph in issue #16, but with 3 lines). The comparison could be formalized with two goodness-of-fit statistics: one comparing the fit of syn to base, and one comparing the fit of synadj to base. I don't think it would be a Kolmogorov-Smirnov test, because that test is univariate whereas this comparison involves 2 variables (total income tax and AGI), but I am sure we can choose an appropriate test. It might also make sense to do the same comparisons on some of the underlying variables that strongly affect tax calculations, such as major components of income and deductions.
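As a rough illustration of the idea above, the sketch below computes the cumulative share of weighted total tax ordered by AGI for two files and summarizes the difference with a KS-style maximum-gap statistic evaluated on a common AGI grid. This is a minimal assumption-laden sketch, not the project's chosen test: the function names, the interpolation grid, and the max-gap statistic are all illustrative, and the inputs stand in for columns (e06500, AGI, weight) that would come from Tax-Calculator output.

```python
# Hypothetical sketch: compare weighted cumulative distributions of total
# income tax across AGI for two files (e.g., base vs. syn). The statistic
# here (max gap between cumulative tax shares on a common AGI grid) is an
# illustrative stand-in for whatever goodness-of-fit test is chosen.
import numpy as np

def weighted_cumshare(agi, tax, weight):
    """Cumulative share of weighted total tax, with records ordered by AGI."""
    order = np.argsort(agi)
    cum = np.cumsum((tax * weight)[order])
    return cum / cum[-1]

def max_cum_gap(agi_a, tax_a, w_a, agi_b, tax_b, w_b, grid_size=1000):
    """KS-style statistic: maximum absolute gap between two cumulative
    weighted-tax shares, interpolated onto a shared AGI grid."""
    grid = np.linspace(min(agi_a.min(), agi_b.min()),
                       max(agi_a.max(), agi_b.max()), grid_size)

    def share_on_grid(agi, tax, w):
        order = np.argsort(agi)
        cum = np.cumsum((tax * w)[order])
        return np.interp(grid, agi[order], cum / cum[-1])

    return float(np.max(np.abs(share_on_grid(agi_a, tax_a, w_a)
                               - share_on_grid(agi_b, tax_b, w_b))))
```

In practice the two statistics described above would be `max_cum_gap(base, syn)` and `max_cum_gap(base, synadj)`, and the same routine could be reused for income and deduction components.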
@donboyd responded that he is looking forward to seeing it.
I think two of our most important file-quality goals should be having a file that is good for:
Are these both crucial file-quality goals?
Are there other crucial file-quality goals?
How should we operationalize measuring file quality with these goals in mind?