Should we tweak variables to capture structural relationships between them? #35

MaxGhenis · 2019-01-07T21:30:11Z

We currently avoid synthesizing some invalid relationships, such as between wages and EITC, by calculating variables like EITC via Tax-Calculator rather than synthesizing them.

We also tweak some variables to work better in synthesis, by modeling the differences e00600 - e00650 and e01500 - e01700 rather than e00600 and e01500 themselves. As long as the synthesized differences are nonnegative, this ensures that e00600 >= e00650 and e01500 >= e01700, as required by Tax-Calculator (see #17).
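As a rough illustration of the difference trick described above, here is a minimal sketch in plain Python. The helper names and the `*_minus_*` keys are hypothetical (they are not taken from this repository), and the clip at zero is an added assumption to guard against a synthesizer that emits a slightly negative difference.

```python
# Hypothetical helpers; names and keys are illustrative only.
PAIRS = [("e00600", "e00650"), ("e01500", "e01700")]  # (total, component)


def to_synthesis_form(record):
    """Replace each total with total - component before synthesis."""
    out = dict(record)
    for total, part in PAIRS:
        out[f"{total}_minus_{part}"] = out.pop(total) - out[part]
    return out


def from_synthesis_form(record):
    """Rebuild each total as component + difference after synthesis."""
    out = dict(record)
    for total, part in PAIRS:
        diff = out.pop(f"{total}_minus_{part}")
        # Assumption: clip negative differences so total >= component holds.
        out[total] = out[part] + max(0.0, diff)
    return out


original = {"e00600": 100.0, "e00650": 40.0, "e01500": 500.0, "e01700": 500.0}
restored = from_synthesis_form(to_synthesis_form(original))
```

Because each total is rebuilt as the component plus a nonnegative amount, the restored record cannot violate the ordering, whatever values the synthesizer produces for the difference.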

This issue is to explore whether we should engineer other features to better capture relationships between variables during synthesis, as in the latter example. It is motivated by a recent call with Benedetto and Stinson from Census, who recommended thinking through important structural relationships.


feenberg · 2019-01-07T23:59:24Z


I think there are 2 major places where this might be significant (I don't know if it really is, though). These would be itemization status and AMT status. It is possible that synthesis might produce insufficient itemized deductions to justify itemization in the appropriate number of taxpayers, and AMT income might be attributed to taxpayers with sufficient regular tax to avoid paying AMT.

My suggestion is to synthesize extra records, and substitute from among the extras for any synthetic records that can't balance.

Again, I am not sure this is really a big problem. If a taxpayer is synthesized with one deduction early in the process, then he is likely to be synthesized with other deductions. So the shortfall may be minor. If we have some extra synthetic records, though, we can use them instead (and throw away the unused extra synthetic records).

Dan
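The substitution idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields (`itemizes`, `itemized`, `standard`), the consistency rule, and the function names are all hypothetical, and real records would be checked against whatever Tax-Calculator actually computes.

```python
def itemization_is_justified(rec):
    """Hypothetical check: an itemizer's deductions must exceed the standard deduction."""
    return (not rec["itemizes"]) or rec["itemized"] > rec["standard"]


def fill_with_spares(records, is_consistent, n_target):
    """Keep the first n_target records, swapping inconsistent ones for consistent spares."""
    primary, spares = records[:n_target], records[n_target:]
    good_spares = (r for r in spares if is_consistent(r))  # consumed lazily, in order
    result = []
    for rec in primary:
        if is_consistent(rec):
            result.append(rec)
            continue
        replacement = next(good_spares, None)
        if replacement is None:
            raise ValueError("not enough consistent spare records")
        result.append(replacement)
    return result  # unused spares are simply discarded


synthetic = [
    {"id": 1, "itemizes": True, "itemized": 20000, "standard": 12000},
    {"id": 2, "itemizes": True, "itemized": 5000, "standard": 12000},   # can't balance
    {"id": 3, "itemizes": False, "itemized": 0, "standard": 12000},
    {"id": 4, "itemizes": True, "itemized": 3000, "standard": 12000},   # spare, unusable
    {"id": 5, "itemizes": True, "itemized": 30000, "standard": 12000},  # spare, usable
]
kept = fill_with_spares(synthetic, itemization_is_justified, n_target=3)
```

In this toy run, record 2 is replaced by spare record 5, and the unusable spare is thrown away along with any spares left over.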

MaxGhenis · 2019-01-08T00:09:05Z

Interesting, we are indeed synthesizing f6251 (Form 6251, Alternative Minimum Tax) and fded (Form of Deduction Code, itemized/standard/neither), both required by Tax-Calculator (spreadsheet). Does Tax-Calculator need these though, or could they be determined by whatever minimizes tax burden? Seems like this would be a valuable Tax-Calculator feature regardless of our project. @andersonfrailey
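To make the question concrete, here is a toy sketch of deriving both statuses from amounts rather than synthesizing them. It is not a description of what Tax-Calculator does: the function and field names are hypothetical, the inputs are assumed to be already computed, and the only tax rules used are that the larger deduction is taken and that AMT is the excess of tentative minimum tax over regular tax.

```python
def derive_statuses(itemized_total, standard_deduction, regular_tax, tentative_minimum_tax):
    """Hypothetical helper: infer deduction form and AMT status from amounts."""
    # Taking the larger deduction lowers taxable income the most.
    if itemized_total > standard_deduction:
        form, deduction = "itemized", itemized_total
    else:
        form, deduction = "standard", standard_deduction
    # AMT is owed only when tentative minimum tax exceeds regular tax.
    amt = max(0.0, tentative_minimum_tax - regular_tax)
    return {
        "deduction_form": form,
        "deduction": deduction,
        "amt": amt,
        "owes_amt": amt > 0,
    }


example = derive_statuses(
    itemized_total=20000.0,
    standard_deduction=12000.0,
    regular_tax=5000.0,
    tentative_minimum_tax=6500.0,
)
```

In practice the two choices interact, since some itemized deductions are disallowed under AMT, so a real implementation would likely need to compare total liability under each option rather than treating them separately as this sketch does.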
