Improve handling of non-positive incomes in tables and graphs #1902
Codecov Report
@@ Coverage Diff @@
## master #1902 +/- ##
======================================
Coverage 100% 100%
======================================
Files 37 37
Lines 3371 3312 -59
======================================
- Hits 3371 3312 -59
Continue to review full report at Codecov.
|
Are incomes of zero included in the separate bucket to avoid dividing by zero? For this to be an issue in decile graphs, one would have to exclude benefits (set all weights to zero) and then the share of tax units with zero income would have to nearly triple (e.g. aging into the future with odd dynamic effects). #1888 only concerns negative incomes (aside from my brief comment), which are most important because we don't think they're actually poor, and because other tax analysis groups give them special treatment. @codykallen's research did not show that other groups treat tax units with zero income differently, i.e. they're included as normal members of the bottom decile. It seems unlikely that they're not actually poor, since their business loss would have to be exactly zero, but it could be worth follow-up investigation if anyone's concerned. If you want to offer flexibility for zero incomes, what do you think about splitting into two separate excluded groups, zero and negative? |
On Sun, 4 Mar 2018, Max Ghenis wrote:
Are incomes of zero included in the separate bucket to avoid dividing by zero? For this
If there are any summary measures that are the average of ratios, that is
probably a bad idea. Summaries should be the ratio of averages. That
avoids dividing by zero, and it avoids giving excess weight to observations
with near-zero denominators.
dan
|
@feenberg agreed, which is why the entire bottom decile would have to have zero income in order for any zeros to be problematic. Tax units with zero expanded income currently make up ~0.7% of tax units when including benefits, or ~4% without. This could still affect percentile plots, though: zeroing out benefits would create full percentiles with a potentially infinite change. Is this the use case you're thinking of, @martinholmer? |
@feenberg said in a comment on pull request #1902:
As the source code shows, Tax-Calculator never constructs table statistics as "the average of ratios" and always constructs statistics as "the ratio of averages [or weighted sums]". For example, the main "ratio" statistic is the difference table's percentage change in after-tax expanded income. This statistic is calculated by computing the subgroup's weighted sum of after-tax expanded income in the baseline and the subgroup's weighted sum of after-tax expanded income in the reform, and then using these two numbers to compute the percentage change for the subgroup. There is never a case when Tax-Calculator computes the percentage change in after-tax expanded income for individual filing units. |
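To make the distinction concrete, here is a minimal sketch contrasting the two calculations. The function names and the three-unit subgroup are hypothetical illustrations, not Tax-Calculator's actual code or data.

```python
def pct_change_ratio_of_sums(weights, base, reform):
    """Percentage change in the subgroup's weighted total (ratio of sums)."""
    base_total = sum(w * b for w, b in zip(weights, base))
    reform_total = sum(w * r for w, r in zip(weights, reform))
    if base_total == 0:
        return None  # undefined: the subgroup's baseline total is zero
    return 100.0 * (reform_total - base_total) / base_total


def pct_change_average_of_ratios(weights, base, reform):
    """Weighted average of unit-level percentage changes (for contrast only).

    Raises ZeroDivisionError if any unit has a baseline amount of zero.
    """
    total_weight = sum(weights)
    return sum(w * 100.0 * (r - b) / b
               for w, b, r in zip(weights, base, reform)) / total_weight


# Hypothetical subgroup in which one unit has a near-zero baseline amount.
weights = [1.0, 1.0, 1.0]
base = [1.0, 20000.0, 30000.0]
reform = [101.0, 20000.0, 30000.0]

ratio_of_sums = pct_change_ratio_of_sums(weights, base, reform)
average_of_ratios = pct_change_average_of_ratios(weights, base, reform)
print(ratio_of_sums, average_of_ratios)
```

In this made-up subgroup the ratio of sums is about 0.2 percent, while the average of ratios is dominated by the single unit with a near-zero denominator.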
Could be, but at 0.8% it's at least a similar order of magnitude to the Survey of Income and Program Participation, which estimated 0.2% in 2012 (see PSLmodels/C-TAM#61).
True but tc reports income, not consumption. As long as there's a clear distinction made, as there must be for this group, I think it's a reasonably legitimate data point. Their assets are also unlikely to be considerable since they don't report capital gains taxes, right?
I'd suggest including them in buckets that also contain tax units with positive income, which would be all deciles and percentiles when including benefits. This part is consistent with other tax analysis groups. Buckets that contain no tax units with positive income (e.g. some percentiles when excluding benefits) would have an undefined % change, so would not be plotted. |
In the PUF data, can you tell why the person with no income is filing a
return? Is it EIC, withholding or AMT?
In the CPS data, are the taxpayers with no income their own household, or
do they live in another's household? If they live in another's household,
I would believe they have low lifetime income.
I do wish we had the SCF (with assets) or the CEX (with consumption)
datasets.
dan
On Sun, 4 Mar 2018, Martin Holmer wrote:
Let me provide some information about the weighted number of filing units with zero or negative
expanded_income in order to inform the discussion of pull request #1902. First, I show the tabulation
program, then use it to tabulate 2015 baseline policy dump results, and then offer some thoughts on
the results.
These tabulations use the version of Tax-Calculator that is in pull request #1902, which includes the
pension fix for non-filers in the newest puf.csv file described in issue #1892.
First, the SQL tabulation program:
$ head -50 neginc.sql
/*
DEFINITION OF EI:
expanded_income = (
e00200 + # wage and salary income
e00300 + # taxable interest income
e00400 + # non-taxable interest income
e00600 + # dividends
e00700 + # state and local income tax refunds
e00800 + # alimony received
e00900 + # Sch C business net income/loss ***
e01100 + # capital gain distributions not reported on Sch D ***
e01200 + # Form 4797 other net gain/loss ***
e01400 + # taxable IRA distributions
e01500 + # total pension and annuity income
e02000 + # Sch E total rental, ..., partnership, S-corp income/loss ***
e02100 + # Sch F farm net income/loss ***
p22250 + # Sch D: net short-term capital gain/loss ***
p23250 + # Sch D: net long-term capital gain/loss ***
cmbtp + # other AMT taxable income items from Form 6251
0.5 * ptax_was + # employer share of FICA taxes
benefit_value_total + # consumption value of all benefits received
ubi # total UBI benefit
*/
select "#m in total", round(sum(s006)*1e-6,3)
from dump;
select "#m with EI=0", round(sum(s006)*1e-6,3)
from dump where expanded_income = 0;
select "#m with EI<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0;
select "#m with EI<0 & e00900<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and e00900 < 0;
select "#m with EI<0 & e01200<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and e01200 < 0;
select "#m with EI<0 & e02000<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and e02000 < 0;
select "#m with EI<0 & e02100<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and e02100 < 0;
select "#m with EI<0 & CapGain<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and (p22250+p23250) < 0;
select "#m with EI<0 & cmbtp<0", round(sum(s006)*1e-6,3)
from dump where expanded_income < 0 and cmbtp < 0;
Next are 2015 PUF results (with some hand calculations after the ==>):
$ tc puf.csv 2015 --sqldb
$ cat neginc.sql | sqlite3 puf-15-#-#-#.db
#m in total|164.306
#m with EI=0|3.2 ==> 1.95%
#m with EI<0|1.254 ==> 0.76%
#m with EI<0 & e00900<0|0.444
#m with EI<0 & e01200<0|0.17
#m with EI<0 & e02000<0|0.461
#m with EI<0 & e02100<0|0.039
#m with EI<0 & CapGain<0|0.496
#m with EI<0 & cmbtp<0|0.103
And now the 2015 CPS results (with some hand calculations after the ==>):
$ tc cps.csv 2015 --sqldb
$ cat neginc.sql | sqlite3 cps-15-#-#-#.db
#m in total|163.196
#m with EI=0|1.229 ==> 0.75%
#m with EI<0|0.087 ==> 0.05%
#m with EI<0 & e00900<0|0.083
#m with EI<0 & e01200<0|
#m with EI<0 & e02000<0|
#m with EI<0 & e02100<0|0.007
#m with EI<0 & CapGain<0|
#m with EI<0 & cmbtp<0|
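The hand calculations after the ==> markers can be reproduced from the weighted counts above. This is only a sketch: the counts are copied from the two tabulations, and the helper name is hypothetical.

```python
def share_pct(count_millions, total_millions):
    """Share of all filing units, in percent, rounded to two decimals."""
    return round(100.0 * count_millions / total_millions, 2)


# Weighted counts (millions of filing units) copied from the tabulations above.
puf_total, puf_zero, puf_neg = 164.306, 3.2, 1.254
cps_total, cps_zero, cps_neg = 163.196, 1.229, 0.087

puf_zero_pct = share_pct(puf_zero, puf_total)
puf_neg_pct = share_pct(puf_neg, puf_total)
cps_zero_pct = share_pct(cps_zero, cps_total)
cps_neg_pct = share_pct(cps_neg, cps_total)

# Share of PUF units with EI<0 that also have negative Schedule C income.
puf_schc_share_pct = 100.0 * 0.444 / puf_neg

print(puf_zero_pct, puf_neg_pct, cps_zero_pct, cps_neg_pct)
```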
And finally here are some of my thoughts.
My understanding of the discussion in issue #1888 is that negative expanded_income is thought to be a
poor indicator of some more sensible (but unmeasurable with our data) notion of income, and therefore,
filing units with negative expanded_income should be somehow separated out from those with low
positive expanded_income. There is no doubt that in some cases this is true. But the above tabulation
results suggest to me that others with negative expanded_income are quite similar to those with low
positive expanded_income. For example, slightly more than a third of those with negative
expanded_income in the PUF data have negative Schedule C income. In the #1888 discussion much was made
of large loss carryforwards, but surely these (Trump-like) cases are not doing this through an
unincorporated Schedule C business, are they? And then there is the much smaller group whose negative
expanded_income seems to be related to farming losses. Do we not think at least some of these farmers
filing Schedule F are similar to those with low positive expanded_income?
Others may differ, but my conclusion is that it is impossible to tell with our data whether those with
negative expanded_income are similar or not to those with low positive expanded_income. So, I don't
think there is any solid economic argument one way or the other on this matter.
However, there is a practical reason to segregate those with negative expanded_income from those with
low positive expanded_income. It has to do with the misleading results that can be generated for the
key percentage change in after-tax expanded income statistic if a subgroup's baseline after-tax
expanded income is negative (which is common among those with negative expanded_income). This issue
was first raised by @MaxGhenis in #1806. It is this practical reason that is the rationale for this
pull request #1902. Note that in the 2015 PUF results only 0.72 percent of those with negative
expanded_income have positive after-tax expanded income; in the 2015 CPS data that fraction is zero
(in fact, every filing unit with negative expanded_income has negative after-tax expanded income).
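A small worked example of the misleading sign, using made-up subgroup totals rather than Tax-Calculator output: when the baseline total is negative, a reform that raises the subgroup's after-tax income shows up as a negative percentage change.

```python
def pct_change(base_total, reform_total):
    """Percentage change in a subgroup's weighted total; None if undefined."""
    if base_total == 0:
        return None
    return 100.0 * (reform_total - base_total) / base_total


# Hypothetical subgroup totals (not Tax-Calculator output).
base_total = -1000.0    # negative baseline after-tax expanded income
reform_total = -900.0   # the reform raises the subgroup's income by 100

change = pct_change(base_total, reform_total)
print(change)  # the sign is negative even though income went up
```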
There is an additional set of questions about filing units with exactly zero expanded_income. You can
see in the above results that the group with zero expanded_income is much larger than the group with
negative expanded_income in both the PUF and CPS data. In this pull request they have also been
segregated from those with small positive expanded_income and grouped with those with negative
expanded_income. Why? For practical reasons. When we have many expanded_income subgroups (as in the
graph of percentage change in after-tax expanded income by percentile), some of them can consist of
all zeros, which may leave any ratio statistic undefined. This is the reason those percentiles are not
shown in the graph of percentage change in after-tax expanded income by percentile. Those with zero
expanded_income are treated the same way in this pull request: they are segregated away from those
with low positive expanded_income and grouped with those with negative expanded_income for practical
reasons.
From an economic point of view, it is difficult to determine with our data whether or not they are
similar to those with low positive expanded_income. My guess is that many with zero expanded_income
have missing (or incorrectly imputed) income amounts. But others with zero expanded_income probably
really do have zero annual income but are supporting their consumption by drawing down their (perhaps
considerable) assets. So, again we are back to the practical reasons for segregating them.
If you want to propose an alternative way of handling those with zero expanded_income in this pull
request, you need to suggest a way to construct the graph of percentage change in after-tax expanded
income by percentile in a way that is consistent with your proposal in this pull request.
|
Dan @feenberg asked in the discussion of pull request #1902:
Below are some tabulations that answer some of your questions.
In the PUF data, 80% of those with zero expanded_income [...]
In the CPS data, 51% of those with zero expanded_income [...]
It seems to me that these results suggest that not all of those with zero expanded_income [...]
|
@MaxGhenis said in the discussion of pull request #1902:
But when grouping filing units by percentiles, this leaves several percentiles with a percentile total for baseline after-tax expanded income of zero. In those cases the percentage change in after-tax expanded income is not defined because of the attempted division by zero. For this practical reason, it seems reasonable to segregate those with zero expanded_income [...] |
What in these results suggest that? Can you run the numbers for those with expanded income of, say, $1-100? These tax units may be different in some way, but the question I think is whether they're miscategorized as being in the bottom decile. Negative income tax units are believed to not truly belong there (plus messing up the sign of %chg). Is there evidence that those with zero are actually richer than those with low positive income?
Understood, and those could be nulled out / not shown. If the tradeoff is nulling out a null value, vs. excluding them from buckets we believe them to belong to (unless other data shows otherwise) and deviating from other tax analysis groups, I don't see the big harm in nulling out the x/0 cases. And again, these x/0 cases don't actually exist in tc's current form. |
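The "null out the x/0 cases" alternative could look roughly like the sketch below. Everything here is a hypothetical illustration (function name, bucket assignment rule, and data), not code from this pull request: units are sorted by baseline income, assigned to buckets by cumulative weight, and any bucket whose baseline total is zero gets None, so a plotting routine could simply skip it.

```python
def bucket_pct_changes(units, num_buckets):
    """units: list of (weight, base_income, reform_income) tuples.

    Returns one percentage change per bucket, or None where it is undefined.
    """
    ordered = sorted(units, key=lambda u: u[1])
    total_weight = sum(w for w, _, _ in ordered)
    base_totals = [0.0] * num_buckets
    reform_totals = [0.0] * num_buckets
    cum_weight = 0.0
    for w, b, r in ordered:
        # Assign the unit by the midpoint of its span in cumulative weight.
        mid = cum_weight + 0.5 * w
        idx = min(int(num_buckets * mid / total_weight), num_buckets - 1)
        base_totals[idx] += w * b
        reform_totals[idx] += w * r
        cum_weight += w
    return [None if bt == 0 else 100.0 * (rt - bt) / bt
            for bt, rt in zip(base_totals, reform_totals)]


# Ten equally weighted hypothetical units, two with zero baseline income;
# the reform adds 1 to every unit's income. Five buckets keep the example short.
base_incomes = [0, 0, 10, 20, 30, 40, 50, 60, 70, 80]
units = [(1.0, float(b), float(b) + 1.0) for b in base_incomes]
changes = bucket_pct_changes(units, 5)
print(changes)
```

Here the first bucket contains only zero-income units, so its entry is None, while every other bucket has a defined percentage change.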
@MaxGhenis said in the discussion of pull request #1902:
I'm not sure what you mean by the "current form" of |
Is that from PUF? Using CPS I see 0.73% of tax units having zero expanded income when advancing to 2018 (notebook). |
@MaxGhenis asked in the discussion of #1902:
Yes. You need to read my earlier comment in this discussion. |
Thanks, I see now that 1.95% of PUF tax units have zero expanded income, or 1.96% of those with nonnegative expanded income. So one or maybe two percentiles after aging would show up as null if they're included. |
I just read through the PR code a bit more, and see that the parameter is named |
@MaxGhenis said:
OK, that's a good suggestion. |
This pull request, which is built on #1901 and consists of commits 4aa34ce, 31bc38f, 84a453d, 4054ebf and 10b274a, attempts to resolve issue #1888. It does this by dividing the bottom decile in the distribution and difference tables into two subgroups: one containing filing units with negative or zero income (either expanded_income or AGI c00100) and the other containing filing units with positive income. The tables compute all statistics for both subgroups, leaving to Tax-Calculator users the decision about whether or not to show the statistics for the non-positive subgroup (only one of which is misleading).
The decile graph of percentage change in after-tax expanded income has been revised to allow the user to decide whether or not to show the percentage change for the bottom decile subgroup with non-positive income. The default option is to hide the percentage change of the non-positive subgroup and to scale the width of the positive subgroup's bar so that it is proportional to the weighted number of filing units in that subgroup.
The approach taken in this pull request produces tables in which the components add up to the total, which is an essential logical requirement.
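As a rough illustration of the adding-up property and the bar-width scaling, here is a minimal sketch. The function, field names, and the three filing units are hypothetical and are not taken from the pull request's code.

```python
def split_bottom_decile(units):
    """units: list of (weight, income, aftertax_income) for the bottom decile.

    Returns summaries for the non-positive subgroup, the positive subgroup,
    and the whole decile.
    """
    def summarize(rows):
        return {
            'count': sum(w for w, _, _ in rows),
            'aftertax': sum(w * a for w, _, a in rows),
        }
    nonpos = [u for u in units if u[1] <= 0]
    pos = [u for u in units if u[1] > 0]
    return summarize(nonpos), summarize(pos), summarize(units)


# Hypothetical bottom-decile filing units: (weight, income, after-tax income).
bottom = [(2.0, -500.0, -600.0), (1.0, 0.0, 0.0), (7.0, 4000.0, 3900.0)]
nonpos, pos, whole = split_bottom_decile(bottom)

# Width of the positive subgroup's bar as a fraction of a full decile bar.
pos_bar_width = pos['count'] / whole['count']
print(nonpos, pos, whole, pos_bar_width)
```

Because every unit lands in exactly one of the two subgroups, the subgroup counts and weighted totals sum to the decile's totals.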