Ensure e00600 >= e00650 and e01500 >= e01700 #17
Comments
That sounds like a fine interim approach to me. We can do that with the synthesized file until you implement something else in the synthesis phase.
Here is a table with the distribution of the 4 variables (nqd and ratio calculated as per above) in the training (full puf) set and the first-cut synthesis. It's not a terribly common problem.
One simple way to impose the logic is to use the ratio approach (i.e., predict e00650 and also the ratio e00650 / e00600), fitting and predicting the ratio with CART. Because all of the ratio values computed from the training data will be in [0, 1], the fitted values also will be in [0, 1], and we won't have a problem (that's one reason I used the ratio approach).
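To illustrate why a tree-fitted ratio stays in [0, 1], here is a minimal sketch using a hand-rolled one-split regression tree rather than a real CART library. The training records, the `fit_stump` helper, and the choice to synthesize e00600 first and derive e00650 as e00600 * ratio are all assumptions for illustration, not the method used in the actual synthesis.

```python
from statistics import mean

def fit_stump(x, y):
    """Fit a one-split regression tree (hypothetical helper).

    Each leaf predicts the mean of the training targets that fall in it,
    so every prediction lies between min(y) and max(y).
    """
    best = None
    for t in sorted(set(x))[1:]:  # candidate thresholds; needs >= 2 distinct x values
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi < threshold else right_mean

# Made-up training records: (e00600, e00650), all with e00600 > 0.
train = [(100.0, 0.0), (200.0, 50.0), (400.0, 300.0), (800.0, 800.0)]
train_e00600 = [total for total, _ in train]
train_ratio = [qualified / total for total, qualified in train]  # all in [0, 1]

predict_ratio = fit_stump(train_e00600, train_ratio)

# Made-up synthesized totals, including values outside the training range.
synth_e00600 = [50.0, 150.0, 500.0, 5000.0]
synth_ratio = [predict_ratio(v) for v in synth_e00600]
synth_e00650 = [v * r for v, r in zip(synth_e00600, synth_ratio)]
```

Because each leaf value is an average of training ratios, the derived e00650 can never exceed e00600, even for inputs far outside the training range. The same argument holds for deeper trees and for averages over many trees.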
Ah you're right, I suppose tree-based methods should make your approaches work, even if the |
I've been going through the command-line-interface documentation (https://pslmodels.github.io/Tax-Calculator/) to make sure I prepare the synthesized file properly to run through Tax-Calculator. Per the documentation, one other variable set raises the same kind of issue as dividends. Tax-Calculator:
We can use the same sort of interim approach for this for now, too, adjusting the total (e01500) after synthesis, as needed. I don't think we need a separate issue for this. The documentation also mentions splitting wages and 2 other variables between prime and spouse. I do that after synthesis, and I think we will always want that to occur in a post-synthesis stage (as part of taxdata, normally).
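A minimal sketch of that post-synthesis adjustment, assuming each synthesized record is a plain dict keyed by lowercase variable name. The `enforce_total` helper and the sample records are hypothetical; with pandas the same step could be written as a row-wise maximum of the two columns.

```python
def enforce_total(records, total="e01500", part="e01700"):
    """Return copies of the records with `total` raised to at least `part`.

    Hypothetical helper: only the total is adjusted, the part is left as synthesized.
    """
    fixed = []
    for rec in records:
        rec = dict(rec)  # copy so the synthesized input is left untouched
        rec[total] = max(rec[total], rec[part])
        fixed.append(rec)
    return fixed

# Made-up synthesized records; the second one violates e01500 >= e01700.
synth = [
    {"e01500": 1000.0, "e01700": 400.0},
    {"e01500": 200.0, "e01700": 350.0},
    {"e01500": 0.0, "e01700": 0.0},
]
adjusted = enforce_total(synth)
```

Passing `total="e00600", part="e00650"` would apply the same rule to the dividend pair.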
Yes, tomorrow.
On Fri, Dec 14, 2018 at 7:23 PM Max Ghenis wrote:
> synpuf5 and 6 fix this:
> (synth.E00600 < synth.E00650).sum()  # 0
> @donboyd5 could you test it out with Tax-Calculator?
> FYI these files also use several more seeds for #21 and use 50 rather than 20 trees (runtime near 3 hours on the full set).
synpuf in its entirety runs through Tax-Calculator without a hitch, without needing any adjustments to the variables above.
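The check quoted in the thread, (synth.E00600 < synth.E00650).sum(), can be extended to cover both constraints in the issue title before handing a file to Tax-Calculator. The sketch below is an illustration only: it assumes records are plain dicts with lowercase keys, and `CONSTRAINTS` and `count_violations` are hypothetical names.

```python
# (total, part) pairs from the issue title: total must be >= part.
CONSTRAINTS = [("e00600", "e00650"), ("e01500", "e01700")]

def count_violations(records, constraints=CONSTRAINTS):
    """Count records where a total is smaller than its part (hypothetical helper)."""
    return {
        f"{total} < {part}": sum(1 for rec in records if rec[total] < rec[part])
        for total, part in constraints
    }

# Made-up records: the second violates the dividend constraint only.
sample = [
    {"e00600": 500, "e00650": 200, "e01500": 900, "e01700": 900},
    {"e00600": 100, "e00650": 150, "e01500": 300, "e01700": 0},
]
violations = count_violations(sample)
```

A file that is ready to run should give zero for every entry.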
This is a condition to run Tax-Calculator, and as reported by @donboyd5, the current synthesis throws the error:
Don proposed two approaches:
I think we'll need more logic for each of these methods, since the model could still produce cases where nqd < 0 or ratio > 1 (with ratio = e00650 / e00600). In particular, nqd should on average produce outcomes identical to the current approach of synthesizing e00600 and e00650 separately. So we'd probably want to amend these approaches to nqd = max(nqd, 0) (equivalent to e00600 = max(e00650, e00600)) or ratio = min(ratio, 1).
That said, ideally the prediction models would figure out this relationship, so it'll be worth re-evaluating after increasing the number of trees (currently 20).
I'd suggest starting with e00600 = max(e00650, e00600) as an interim approach.
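As a rough check on the equivalence claimed above, the sketch below compares clamping the total directly with clamping nqd = e00600 - e00650 at zero and rebuilding the total. The function names and sample values are hypothetical, and whole-dollar integers are assumed so that the two results match exactly.

```python
def clamp_total(e00600, e00650):
    """Interim approach: raise total dividends to at least qualified dividends."""
    return max(e00650, e00600)

def clamp_via_nqd(e00600, e00650):
    """Clamp non-qualified dividends at zero, then rebuild the total."""
    nqd = max(e00600 - e00650, 0)
    return e00650 + nqd

# Made-up (e00600, e00650) pairs; the second and fourth violate e00600 >= e00650.
pairs = [(500, 200), (100, 150), (0, 0), (0, 75)]
direct = [clamp_total(t, q) for t, q in pairs]
via_nqd = [clamp_via_nqd(t, q) for t, q in pairs]
```

Both routes leave e00650 unchanged and only raise e00600 where needed, so records that already satisfy the constraint are unaffected.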