Ensure e00600 >= e00650 and e01500 >= e01700 #17

MaxGhenis · 2018-12-11T19:06:04Z

This is a condition to run Tax-Calculator, and as reported by @donboyd5, the current synthesis throws the error:

ValueError: expression "e00600 >= e00650" is not true for every record

Don proposed two approaches:

Compute non-qualified dividends as nqd=e00600 - e00650,
then synthesize nqd and e00650 independently,
then compute the total as the sum of the two.
This is the way the synthpop authors tend to do this, per my email conversation with them.

compute the ratio of qualified to total -- ratio=e00650 / e00600
synthesize e00600 and ratio independently
compute e00650 from the two
I found this to yield higher quality results, but I did not examine it rigorously.

I think we'll need more logic for each of these methods, since the model could still produce cases where nqd < 0 or ratio < 1. In particular, nqd should produce identical outcomes to the current approach of synthesizing e00600 and e00650 separately on average. So we'd probably want to amend these approaches to nqd=max(nqd, 0) (equivalent to e00600=max(e00650, e00600)) or ratio=max(ratio, 1).

That said, ideally the prediction models would figure out this relationship, so it'll be worth re-evaluating after increasing the number of trees (currently 20).

I'd suggest starting with e00600=max(e00650, e00600) as an interim approach.

The text was updated successfully, but these errors were encountered:

donboyd5 · 2018-12-11T20:27:24Z

That sounds like a fine interim approach to me. We can do that with the synthesized file until you implement something else in the synthesis phase.

Here is a table with the distribution of the 4 variables (nqd and ratio calculated as per above) in the training (full puf) set and the first-cut synthesis. It's not a terribly common problem:

One simple way to impose the logic is to use the ratio approach (i.e., predict e00650 and also the ratio e00650 / e00600), fitting and predicting the ratio with CART. Because all of the ratio values computed from the training data will be in [0, 1], the fitted values also will be in [0, 1], and we won't have a problem (that's one reason I used the ratio approach).

MaxGhenis · 2018-12-11T20:40:53Z

Ah you're right, I suppose tree-based methods should make your approaches work, even if the nqd approach would be equivalent to the current behavior in linear models.

donboyd5 · 2018-12-11T21:37:51Z

I've been going through the command-line-interface documentation (https://pslmodels.github.io/Tax-Calculator/) to make sure I prepare the synthesized file properly to run through Tax-Calculator.

Per the documentation, one other variable set raises the same kind of issue as dividends. Tax-Calculator:

also expects that the value of total pension and annuity income (e01500) will be no less than the value of taxable pension and annuity income (e01700) for each filing unit

We can use the same sort of interim approach for this for now, too, adjusting the total (e01500) after synthesis, as needed. I don't think we need a separate issue for this.

The documentation also mentions splitting wages and 2 other variables between prime and spouse. I do that after synthesis, and I think we always will want that to occur in a post synthesis stage (as part of taxdata, normally).

MaxGhenis · 2018-12-15T00:23:48Z

synpuf5 and 6 fix this:

(synth.E00600 < synth.E00650).sum()  # 0
(synth.E01500 < synth.E01700).sum()  # 0

@donboyd5 could you test it out with Tax-Calculator?

FYI these files also use several more seeds for #21 and use 50 rather than 20 trees (runtime near 3 hours on the full set).

donboyd5 · 2018-12-15T00:53:54Z

Yes, tomorrow.

…

On Fri, Dec 14, 2018 at 7:23 PM Max Ghenis ***@***.***> wrote: synpuf5 and 6 fix this: (synth.E00600 < synth.E00650).sum() # 0 @donboyd5 <https://github.com/donboyd5> could you test it out with Tax-Calculator? FYI these files also use several more seeds for #21 <#21> and use 50 rather than 20 trees (runtime near 3 hours on the full set). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#17 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AGPEmE8cVNciHSaEyusoRe3sIu9Da-lWks5u5EEUgaJpZM4ZOC4O> .

donboyd5 · 2018-12-15T09:55:24Z

synpuf in its entirety runs through Tax-Calculator without a hitch, without needing any adjustments to the variables above.

donboyd5 changed the title ~~Ensure e00600 >= e00650~~ Ensure e00600 >= e00650 and e01500 >= e01700 Dec 13, 2018

MaxGhenis mentioned this issue Dec 15, 2018

Synthesize binary and categorical features as strings or seeds #21

Closed

MaxGhenis self-assigned this Dec 15, 2018

MaxGhenis assigned donboyd5 and unassigned MaxGhenis Dec 15, 2018

donboyd5 mentioned this issue Dec 17, 2018

Comparing results from R package synthpop to our results from synthimpute #27

Open

MaxGhenis mentioned this issue Jan 7, 2019

Should we tweak variables to capture structural relationships between them? #35

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure e00600 >= e00650 and e01500 >= e01700 #17

Ensure e00600 >= e00650 and e01500 >= e01700 #17

MaxGhenis commented Dec 11, 2018

donboyd5 commented Dec 11, 2018

MaxGhenis commented Dec 11, 2018

donboyd5 commented Dec 11, 2018 •

edited

Loading

MaxGhenis commented Dec 15, 2018 •

edited

Loading

donboyd5 commented Dec 15, 2018 via email

donboyd5 commented Dec 15, 2018

Ensure e00600 >= e00650 and e01500 >= e01700 #17

Ensure e00600 >= e00650 and e01500 >= e01700 #17

Comments

MaxGhenis commented Dec 11, 2018

donboyd5 commented Dec 11, 2018

MaxGhenis commented Dec 11, 2018

donboyd5 commented Dec 11, 2018 • edited Loading

MaxGhenis commented Dec 15, 2018 • edited Loading

donboyd5 commented Dec 15, 2018 via email

donboyd5 commented Dec 15, 2018

donboyd5 commented Dec 11, 2018 •

edited

Loading

MaxGhenis commented Dec 15, 2018 •

edited

Loading