File Upload Fixes #593
Also, I think it's a good idea if we move the discussion here from now on. |
@hdoupe, could you explore whether the cps.csv.gz file in tax-calculator could be used to write failing tests for the bugs described in #578? |
@martinholmer I think maybe I am missing some understanding here. So when I use the regular taxbrain form (not file input) to do calculations, and I set my start year, it calculates 9 years from that start date. So, if my start date is 2015, it calculates up to 2024. When parsing the So what does the ideal outcome look like for you? If we use that sample file, do you want calculations from 2018 until 2026? That is a straightforward fix. Please let me know. |
@brittainhard said:
Yes, exactly.
That's one of the problems. The drop-down menu of start_year values should work exactly the same on the file upload page as it does on the regular TaxBrain form. The start_year is the first of the ten budget years for which we show results. There is no problem if the tax reform is not implemented until several years into the ten-budget-year window; the pre-reform years will just show no change in tax revenue. Perhaps some of the confusion is caused by the fact that the way to specify, on the regular TaxBrain form, a reform that is first implemented after the start_year is not very well documented. If start_year is 2017 (the default) and you want to make all earnings taxable under the payroll tax beginning in 2020, you would enter the following in the appropriate box:
The asterisk symbol means use the value under current law policy and 9e99 is a very large number for the maximum social security taxable earnings parameter. I recommend that you try this to see how it works from the regular TaxBrain form.
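To make the asterisk convention concrete, here is a minimal Python sketch. The exact entry was not preserved in this thread, so the string below is an assumption: a comma-separated list with one item per year, beginning at start_year. The helper name expand_entry and the current-law numbers are made up for illustration and are not TaxBrain code.

```python
def expand_entry(entry, start_year, current_law):
    """Map a comma-separated form entry to {year: value}.

    Assumption: the i-th item applies to start_year + i, and "*" means
    "keep the current-law value for that year".
    """
    values = {}
    for offset, item in enumerate(entry.split(",")):
        year = start_year + offset
        item = item.strip()
        if item == "*":
            values[year] = current_law[year]
        else:
            values[year] = float(item)
    return values


# Illustrative current-law values only (not real policy numbers).
current_law = {2017: 100.0, 2018: 110.0, 2019: 120.0, 2020: 130.0}

# Assumed entry: three placeholders for 2017-2019, then the new 2020 value.
result = expand_entry("*,*,*,9e99", 2017, current_law)
```

Under these assumptions, 2017 through 2019 keep their current-law values and only 2020 changes.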
No, we always want ten budget years. What we need is exactly the same as on the regular TaxBrain form: start_year specified by the pull-down menu (which has 2017 as the maximum allowable value) and nine more years after the start_year. Does this make sense? If not, ask another question before doing any more coding. |
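As a small sketch of the rule just stated (start_year plus nine more years, with 2017 as the largest start year the menu allows), something like the following would do. The function and constant names are hypothetical and do not come from the webapp code.

```python
MAX_START_YEAR = 2017  # largest value offered by the pull-down menu, per the comment above
NUM_BUDGET_YEARS = 10  # start_year plus nine more years


def budget_years(start_year):
    """Return the ten budget years that begin at start_year."""
    if start_year > MAX_START_YEAR:
        raise ValueError("start_year may not exceed %d" % MAX_START_YEAR)
    return list(range(start_year, start_year + NUM_BUDGET_YEARS))


window = budget_years(2017)
```

The window always has ten entries, whatever year the reform itself first takes effect.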
@MattHJensen Sure thing. I'll look into it. |
I think this highlights a fundamental problem with the start-year dropdown. First, if you want to display an entire 10-year window and your parameters start at a year later than the year specified in the dropdown, you're going to end up with columns in the table that contain no useful information. Second, suppose you specified your start year as 2013 and your first parameters in the reform file started at 2023. You would get a table containing only columns without data, and your parameters would be completely ignored -- the table would only show data up to the year 2022. Moreover, in this case, if we were to make this change to show all 10 years even if there is no data, that can be considered a regression in itself.
From a user experience standpoint, this is a problem. Having a table that contains useless information or no information at all does not help the user understand the effects of their reform, and it can confuse the user about how these file-based reforms work. If they are receiving useless information, they might think there is something wrong with their parameters file, when that is not at all the case.
The solution I propose is this: include some code that will limit the number of years to be calculated if the start date is after 2017 (this will handle the error), and remove the start year drop-down entirely. This will mean the start year will be the earliest year in the reform file. With that in mind, I have reopened issue #557. I think this is the best way to handle this problem. @MattHJensen @PeterDSteinberg @hdoupe let me know what you think. |
The start_year drop-down should specify the beginning of the budget window. If the user makes no parameter changes within the ten-year budget window, then all revenue changes should be zero (not missing), and the current_law and reform revenues should be the same (but not missing). If the user's first parameter change is for the 6th year of the budget window (as defined by the start year drop-down), then the revenue changes for the first five years should be 0 and the change in year 6 should be nonzero. All of this is the same as what @martinholmer describes in this comment and what I described in this comment.
We could add a warning that "your first parameter change happens outside of the budget window," but this should probably be discussed and dealt with in a separate issue as an enhancement. |
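To illustrate the expected shape of the results (this is only a toy model, not the test mentioned in the next comment and not Tax-Calculator output), the sketch below assumes a reform with a constant yearly effect once it starts. The function name and the effect size are hypothetical.

```python
def revenue_changes(start_year, first_reform_year, yearly_effect, num_years=10):
    """Toy model of the revenue-change row.

    Every budget year gets a number: 0.0 before the reform takes effect
    and yearly_effect from first_reform_year onward. Nothing is missing.
    """
    changes = {}
    for year in range(start_year, start_year + num_years):
        changes[year] = yearly_effect if year >= first_reform_year else 0.0
    return changes


# First parameter change in the 6th year of a window that starts in 2017.
changes = revenue_changes(2017, 2022, 5.0)

# First parameter change after the window ends: all zeros, still ten columns.
no_effect = revenue_changes(2013, 2025, 5.0)
```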
Here is a test that should pass:
|
@MattHJensen I can make that fix. All I need to do is supply dummy information prepended to the reform data depending on what start year is selected. After that, we will need some tests to handle the case I mentioned before, with the start year of 2013. This will require some change in the |
Thanks.
Conforming to the goal of a unified API for reform files and GUI based inputs, I'd suggest following the same technique that is used for the asterisk (*) operator.
I'd suggest handling this as a separate issue and supplying a warning rather than an error. There are legitimate reasons why a user would set the start year at 2013 and use a reform file with provisions starting in 2025. For example, s/he might be running two simulations with the same reform file, one with the start year at 2013 and one with the start year at 2017, in order to cobble together revenue projections from 2013 through 2026. |
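A warning rather than an error could look roughly like the sketch below. It uses Python's standard warnings module; the function name is hypothetical, and the message text is borrowed from the suggestion earlier in this thread.

```python
import warnings


def check_reform_years(start_year, reform_years, num_years=10):
    """Warn (do not raise) when the earliest reform year is past the window.

    Returns True when the first parameter change falls inside the window.
    """
    last_year = start_year + num_years - 1
    first_change = min(reform_years)
    if first_change > last_year:
        warnings.warn(
            "your first parameter change happens outside of the budget window"
        )
        return False
    return True


# The two-simulation case described above: same reform file, two start years.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    inside_2013 = check_reform_years(2013, [2025])  # window 2013-2022
    inside_2017 = check_reform_years(2017, [2025])  # window 2017-2026
```

Both runs go ahead; only the 2013 run produces a warning.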
@MattHJensen said:
Yes, I agree, this is not an error. |
Not quite sure I understand this part, can you clarify a bit? Where do you see this asterisk coming into the JSON pushed forward to the server? Is it something like |
The asterisk is not used in the JSON files. My point is just that these two are equivalent
So ideally their API representation is the same. |
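The pair of inputs being compared was not preserved in this thread, so the two inputs below are assumptions: a form entry of asterisks followed by a value, and a file-style mapping from a year string to a one-element list. Under those assumptions, a sketch of reducing both to one representation might be (helper names are hypothetical):

```python
def gui_entry_to_years(entry, start_year):
    """Turn a form entry into {year: value}, dropping "*" placeholders."""
    out = {}
    for offset, item in enumerate(entry.split(",")):
        item = item.strip()
        if item != "*":
            out[start_year + offset] = float(item)
    return out


def file_param_to_years(param):
    """Turn a file-style {"2020": [9e99]} mapping into {year: value}."""
    return {int(year): float(vals[0]) for year, vals in param.items()}


from_gui = gui_entry_to_years("*,*,*,9e99", 2017)  # assumed form entry
from_file = file_param_to_years({"2020": [9e99]})  # assumed file entry
```

Both calls produce the same year-keyed mapping, which is the sense in which the two inputs are equivalent.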
This fix is pretty simple. Going to add some regression testing to ensure that first_budget_year is being sent forward (which was the source of the problem). |
@hdoupe did you have any luck using |
@brittainhard Yes,
In the future, it would be nice to have access to the raw cps file. I'm not sure how to import that from
@martinholmer I would be interested in hearing your thoughts on this. |
Something like the above might be useful. But exactly when would it be useful to have CPS data in a pandas DataFrame rather than in a Records object? |
@martinholmer said:
After giving this some more thought, I can't think of a good reason for why we would want to use the raw data. At first, I thought that we would need it since we need the puf data in the webapp, but I realized that we have it because it can't be in the However, for testing the dropq function in the
Or, there could be some other complications there that I don't immediately see. |
@hdoupe said:
Why? What advantage would this provide? |
@martinholmer It would be useful for unit tests. We can set up some regression tests using the full pipeline from webapp to taxcalc. Right now I can only test the API functions in webapp. That's mainly why I was asking the question. |
@martinholmer I definitely do not have a complete understanding of how the travis testing on github works. But, the way I understand it is that it does not use the However, I could be completely off base here. If so, I am pair programming with @brittainhard tomorrow and maybe he can help clear up some of my confusion. |
@hdoupe said:
That is correct. The
Why not just use the
and all the tests that need
in order to run all the tests. Does this procedure make sense? Can't you use this approach for TaxBrain testing? I don't see an easy way to adapt dropq to work with the |
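The commands referred to in this comment were not preserved here. As a rough illustration of the general idea only (tests that need the private file are skipped when it is absent, and everything else still runs), here is a sketch using the standard library's unittest.skipUnless. pytest offers the comparable pytest.mark.skipif decorator, as well as custom markers that can be selected with -m. The file location, class name, and test bodies are assumptions for illustration.

```python
import os
import unittest

PUF_PATH = "puf.csv.gz"  # assumed location: the repository root
HAVE_PUF = os.path.exists(PUF_PATH)


class ExampleTests(unittest.TestCase):
    def test_without_private_data(self):
        # Runs everywhere, whether or not the private file is present.
        self.assertEqual(len(range(2017, 2027)), 10)

    @unittest.skipUnless(HAVE_PUF, "requires puf.csv.gz")
    def test_with_private_data(self):
        # Runs only on machines that have the private file.
        self.assertTrue(os.path.getsize(PUF_PATH) > 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On a machine without the file, the second test is reported as skipped rather than failed, so the run as a whole still succeeds.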
@martinholmer I like this idea. It would be a huge benefit for making the app more stable. @PeterDSteinberg and I have talked before about setting up a more extensive testing suite that will use the services to run actual calculations. Some of these tests can be used in Issue #543 makes mention of a larger mocked system for testing interactions, which is good, but we could also benefit from a much larger services-based test structure. I think that should be tabled for now and we should go ahead with the tests we have now. My test covers the regression, which was failing to include the budget year value, and we should add an issue for that pytest decorator and local tests. I'll also open a larger discussion issue about ways to enhance testing beyond simple unit testing. |
@PeterDSteinberg @martinholmer @MattHJensen @hdoupe
I was unable to add a unit test that could help us solve the problem with this file, mainly because it would require saving puf.csv.gz. I was, however, able to make a small script that duplicates the behavior that is causing the problem. You have to run it from the repository root, and puf.csv.gz also has to be in the root. This way, you can test out the function and make changes without having to use celery. Let me know if it helps.
EDIT: You can cause the error by increasing the year_n argument, the first argument, to something like 9 or 10.