TaxBrain and Tax-Calculator results are not the same in 0.7.3 #1119
Comments
When I run with taxcalc version 0.7.3 on my Mac with the latest PUF in my possession (from November 16 of size 53246969), I get the following results:
When I go to the Reform tab on the results page when running this reform on the latest TaxBrain, I see these results: so I get the same answer as TaxBrain. |
I'm guessing from these lines:
you are not running on the release, but since it's only a few commits behind master, I don't think that's the issue. I wonder if your PUF version is more recent. Can you say how many bytes your |
This is the link to my TaxBrain job: http://www.ospc.org/taxbrain/9380/ |
@talumbau said:
Not sure what
Notice none of them contain 53246969 bytes. When I go to the OSPC developer's private Google Doc and download the most recent version available (the one dated 11.21.2016 for taxcalc #1077 and taxdata #48) it is exactly the same size as the last two in my listing above. Where did you get your |
Using @martinholmer's JSON file I was able to reproduce his results on my computer as well. I'm also using the PUF file uploaded on November 21st of size 50953138. |
@martinholmer I forwarded you the email containing the link to the PUF I used. The email was sent by @andersonfrailey on November 7th. It seems clear this is not the best PUF to use. Is that correct? |
@talumbau I also sent out an email on November 21st after TaxData PR 48 was merged with a link to the latest PUF. I will forward that email on to you. |
I can confirm that using the PUF from November 21st produces the same results that @martinholmer and @andersonfrailey report. All of the worker nodes for TaxBrain have the PUF that I was using from the November 7th email. Should those worker nodes be updated with the PUF from November 21st? This issue raises the urgency level of the issue @PeterDSteinberg raised on taxdata: |
@talumbau said:
Yes. @talumbau also said:
But there is still no answer to the question posed by @zrisher thirteen days ago. Why not always use the latest |
The exchange of information in Tax-Calculator issue #1119 suggests several ideas for improving the way we distribute the private puf.csv file. First, the description of each version of the puf.csv file should include the size of the file in bytes. Second, we need a better method for distributing this file. One idea is to create a conda package containing the
That nobody has offered an answer in the two weeks since this concern was raised suggests there might not be an "easy answer". What about a different approach to distributing the puf.csv file that relies not on conda-package technology but rather uses a GitHub private repository? Perhaps there is a better approach. Can anybody suggest a better approach? @MattHJensen @talumbau @feenberg @Amy-Xu @andersonfrailey @zrisher @PeterDSteinberg |
Using Combining those ideas, what if we:
So there's no significant change for contributors without access and a single additional step for contributors with access. It won't automatically ensure your package version is correct, but it will throw an error until you make it so. For automated environments, you'll probably always be throwing away your pre-deploy environment and pulling the latest anyway. |
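The "throw an error until you make it so" behavior could look roughly like the following. This is a minimal sketch under stated assumptions, not code from any of the repositories discussed here: `PufVersionError`, `require_puf_version`, and the date-style version labels are all hypothetical names chosen for illustration.

```python
class PufVersionError(RuntimeError):
    """Raised when the local puf data is not the version the caller requires."""


def require_puf_version(installed, required):
    """Return `installed` if it equals `required`; otherwise raise.

    Both arguments are plain version strings. Nothing is downloaded or
    repaired automatically: the caller must install the right version.
    """
    if installed != required:
        raise PufVersionError(
            "puf version {!r} is installed but {!r} is required; "
            "install the required version and rerun".format(installed, required)
        )
    return installed


# Hypothetical version labels, chosen only for this example.
ok = require_puf_version("2016.11.21", "2016.11.21")

try:
    require_puf_version("2016.11.07", "2016.11.21")
    mismatch_message = None
except PufVersionError as err:
    mismatch_message = str(err)
```

The check does not fix anything by itself. It only turns a silent data mismatch into a loud failure at startup.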
@zrisher, Thanks for contributing to the conversation in #1119 about how to improve the methods we use to distribute the private (1) Automation of
I think we need an explicit conversation about why we should "automate the distribution" of the (2) Data format of the puf distribution:
Not sure I understand your point. Are you suggesting that the private puf data be a binary Pandas DataFrame rather than an ASCII CSV-formatted file? If so, I think this is not a good idea. Yes, it is true that the Records class constructor can accept a DataFrame (rather than a CSV file name). But changing the file format to a DataFrame makes it impossible to analyze the puf data using command-line tools. There is plenty of this sort of work that goes on every day. At this stage of the project, I think we should change the file format of the puf data only if the benefits are quite substantial and exceed by a wide margin the costs of changing the file format. Can that case be made? Or, if I have completely misunderstood your suggestion, please set me straight on your thinking. (3) Process of puf distribution:
What does "no significant change" mean? Does it mean absolutely no change? If not, what would be the changes for those without puf access? For those with puf access, why impose additional costs on them if the puf distribution process "won't automatically ensure your package version is correct"? What's the point of it all? Why not notify each user of the new puf on the private GitHub repo and let them download and install it on their computer(s)? The problem identified in issue #1119 seems to be caused primarily by a lack of communication within the development team and the absence of an easily accessible (but private) place to store new versions of the puf data. I'm not yet convinced that the #1119 problem has much to do with a lack of automation. @MattHJensen @feenberg @Amy-Xu @andersonfrailey @codykallen |
On Wed, 4 Jan 2017, Martin Holmer wrote:
The exchange of information in Tax-Calculator issue #1119 suggests several
ideas for improving the way we distribute the private puf.csv file.
First, the description of each version of the puf.csv file should include
the size of the file in bytes. And it would probably be better to distribute
The number of bytes will depend on the line-ending convention (2 bytes
for Windows, 1 byte for Mac/Linux) - probably the number of records is a
more useful indicator of completeness.
What about a different approach to distributing the puf.csv file that relies
not on conda-package technology but rather uses a GitHub private repository?
Is this the CPS derived file? Then why private?
dan feenberg
|
@feenberg said:
Yes, end-of-line is different on Windows, but one
No, the |
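The line-ending point can be checked with a few lines of Python. This is a minimal sketch on a tiny made-up CSV, not on the real puf.csv: `describe_csv` and `_write_sample` are hypothetical helpers, and the idea of hashing the file after normalizing line endings is an added assumption that was not proposed in this thread.

```python
import hashlib
import os
import tempfile


def describe_csv(path):
    """Return (size_in_bytes, record_count, normalized_sha256) for a CSV file.

    record_count excludes the header row. The digest is computed after
    converting CRLF line endings to LF, so Windows and Mac/Linux copies of
    the same data produce the same digest.
    """
    with open(path, "rb") as f:
        raw = f.read()
    normalized = raw.replace(b"\r\n", b"\n")
    lines = [ln for ln in normalized.split(b"\n") if ln]
    record_count = max(len(lines) - 1, 0)
    digest = hashlib.sha256(normalized).hexdigest()
    return len(raw), record_count, digest


def _write_sample(line_ending):
    """Write a tiny made-up CSV with the given line ending; return its path."""
    rows = [b"id,value", b"1,50000", b"2,72000"]
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as f:
        f.write(line_ending.join(rows) + line_ending)
    return path


unix_path = _write_sample(b"\n")
windows_path = _write_sample(b"\r\n")
unix_info = describe_csv(unix_path)
windows_info = describe_csv(windows_path)
os.remove(unix_path)
os.remove(windows_path)
```

Here the two copies differ in byte size (one extra byte per line on the CRLF copy) while the record count and the normalized digest agree, which is why size in bytes alone is a fragile way to identify a version of the file.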
In order to make an intelligent decision about how to improve My earlier comment in the #1119 discussion has already identified the need to make email notifications of new But there seems to be another factor involved in the #1119 mixup. It would seem as if TaxBrain developers are not running the Here is the evidence that suggests that the full Tax-Calculator test suite has not been executed by TaxBrain developers since sometime in November, which is when the @MattHJensen @talumbau @PeterDSteinberg @Amy-Xu @andersonfrailey @zrisher |
I have created a private Anaconda package
Get the Anaconda token file from me
I'll send you all an email with this token. Save it at
Conda install with token
Simple usage
Write the most recent
Now run Tax-Calculator, deploy, etc repositories like usual.
Optional Usage - latest puf file as a dataframe
More examples
cc @MattHJensen @talumbau @PeterDSteinberg @Amy-Xu @andersonfrailey @zrisher @martinholmer |
@PeterDSteinberg said:
I haven't received the email you promised. |
Sent. |
@PeterDSteinberg, Thanks for the taxpuf package and the secret token you sent via email. (1) If we ever got a developer who was working on Windows, how would the install work (with (2) How would a member of the development team, or anyone else with access to the OSPC-developed (3) How would a notified user update? Using the The @MattHJensen @talumbau @feenberg @Amy-Xu @andersonfrailey @zrisher @codykallen |
@martinholmer Here are answers to each of your questions:
(again replacing 1b) Yes, Windows users can do the
I think if Tax-Calculator and other repos eventually used |
@PeterDSteinberg said:
Thanks for your answers to my questions. This is nice work: a clear improvement over the current (1) Where is the source code (including the (2) I think it would be a high priority to add the automatic email notification feature into the (3) Once users get the email notification, they would execute this command:
Is that correct? @talumbau @Amy-Xu @andersonfrailey @zrisher @feenberg @GoFroggyRun @codykallen |
@martinholmer Answers:
|
@PeterDSteinberg, Thanks for your speedy and complete answers to my questions in issue #1119. You should be asking @MattHJensen about this, but my initial thought is that putting the |
+1 for the private GH repo and email notifications. The repo should go in the opensourcepolicycenter GH organization rather than open-source-economics. |
@talumbau, do you have an estimate for when you'll be able to update TaxBrain to the nov-21 puf.csv? |
I can do this by end of day today. I'll update this issue when it's done. |
The PUF has been updated on the worker nodes and the services restarted. This taxbrain job: http://www.ospc.org/taxbrain/9428/ confirms that TaxBrain now gives the same results as taxcalc 0.7.3. |
@martinholmer said:
This is a good idea and indeed the |
@talumbau said:
Thanks. Your ideas for more comprehensive pre-deployment TaxBrain testing sound good. |
@martinholmer @talumbau @PeterDSteinberg @MattHJensen It could be useful to document the aforementioned setup and update steps for the |
For a simple income and payroll tax reform, the results generated by TaxBrain and Tax-Calculator are not the same for income taxes. I get the same results on my local computer using either a custom Python script (shown below) or the inctax.py command-line interface to taxcalc. And I get the same results from TaxBrain using either the JSON reform file upload capability or hand-entered parameters.
Notice that payroll taxes under the reform are the same in TaxBrain and Tax-Calculator, but the income tax revenues are lower in TaxBrain than in Tax-Calculator by about 1.1 percent or about $19.1 billion.
I'm not sure where the problem is. Can others replicate my local results on their computers?
Here is the JSON reform file (file0.json):
Here are the taxcalc-inctax results:
And here is the custom Python script that generates the same local results:
which produces the following results:
@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher