-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve dissemination / deployment related to Public Use File (puf.csv) #53
Comments
In my version of Idea 2,
|
In your view @talumbau of Idea 2, should |
the first one - that is, when executing |
@PeterDSteinberg said:
What is the nature of these problems exactly? |
@martinholmer We have set up Jenkins CI testing in a way that allows running any OG-USA / Tax-Calculator new versions with the same set of reforms that were tested with OG-USA 0.5.5 and Tax-Calc 0.6.6 several months ago. We made an automated system of installing the right versions of the whole stack but found full automation of the regression tests on every PR to OG-USA were not possible because the puf.csv is often changed in conjunction with OG-USA and Tax-Calculator changes. |
The issue is more precisely defined as follows: the PUF is a moving target and the PUF comes with no metadata that allows one to know, for a given instance of the PUF sitting on your computer, which version(s) of Tax-Calculator that PUF is compatible with. This issue presents as a problem in primarily two contexts:
|
@talumbau said in the conversation about taxdata issue #53:
Yes, I can see that there is no metadata embedded in the I can imagine that various clients of Tax-Calculator may need to have a mechanism to select an older version of the In any case, whatever solution you come up with for the clients should leave Tax-Calculator logic unchanged. |
@martinholmer The Anaconda private package would accomplish essentially what the database you mentioned would accomplish, but in a more portable way. Then the Tax-Calculator, B-Tax, and OG-USA repositories can each choose how they want to make use of the new |
@PeterDSteinberg said:
Can you be more specific? What do you mean by "Then the Tax-Calculator, B-Tax, and OG-USA repositories can each choose how they want to make use of the new taxpuf package"? |
@martinholmer I mean that Tax-Calculator, B-Tax and OG-USA could name in their conda.recipe's meta.yaml, a specific version of |
@PeterDSteinberg, I'm also having trouble understanding the specific problems after Tax-Calculator PSLmodels/Tax-Calculator#1035. Is this an accurate example of a problem situation?
If so, why not just always adopt the latest version of puf.csv whenever it comes out and trigger a test at that time? Is the problem that "always adopt the latest version of puf.csv whenever it comes" is a fraught human process? @jdebacker, could you also chime in whith a description of when and why puf.csv releases are painful for you, particularly after PSLmodels/Tax-Calculator#1035? |
@martinholmer, could you describe the downside you see to having a Conda-versioned puf.csv file? |
@PeterDSteinberg @talumbau, would we be able to (easily) maintain backwards compatibility between the current Tax-Calculator and earlier versions of the puf.csv files that have already been distributed with the Anaconda private package system? |
@talumbau @MattHJensen If we put something like |
I asked:
Which @PeterDSteinberg answered:
Got it, thanks. |
@MattHJensen says:
Yes. If we one is not following each change to the TC, it's hard to know a new PUF is needed. And when one is aware, several manual steps are necessary to acquire the file. In my case, I then need to drop any updated PUF into three repos on each machine I'm running TC, B-Tax, and OG-USA on. With 4+ versions of the PUF since July, it can be difficult to keep track of which version is where.
PR #1035 has been very helpful. Now model runs don't break. But as @PeterDSteinberg and @talumbau note above, this PR doesn't solve issues with testing when values in the PUF change across versions. To me, @PeterDSteinberg and @talumbau propose a very elegant solution to this without altering the logic of any of the models and with only a few minor changes to the source code to use the proposed conda package. |
One area to clarify would be to describe how this would impact Tax-Calculator, as this seemed to be an area of concern for @martinholmer. To answer that, I would say that it impacts Tax-Calculator to the extent that the Tax-Calculator package is concerned with where the PUF comes from. To my knowledge, the current repo cares very little about this. A Records object can either be created by passing a path to the PUF file, or passing a DataFrame that one gets from reading in the PUF. There's essentially no logic, as far as I'm aware, as to what happens before this in a user's script. So, as far as I can tell, I don't think it would really matter to the Tax-Calculator, if we added the following capability:
instead of what is often done:
or
Or similar functionality addition if we use Idea 1 above. In the example above, |
@talumbau @martinholmer @MattHJensen @PeterDSteinberg @jdebacker I like the idea of using conda to manage the My biggest concern is this:
Right now We could distribute a separate pip requirements file (which we just got rid of) to contributors with access and make it part of their setup process. It could also hold the package access credentials. But that's replacing a manual dependency deployment problem with a less frequent manual deployment. And it would mean Hoping someone has an easy answer to this. |
@zrisher asked:
This is an excellent question, particularly because the change And I have two more questions about this plan: (1) Will the (2) Will there be absolutely no changes to the If we are going to make this sort of change, the discussion |
The creation of the private taxpuf repository resolves taxdata issue #53, right? Unless somebody corrects my understanding, I'll close this issue on Monday, Feb 6th. |
@PeterDSteinberg, The creation of the private taxpuf repository resolves taxdata issue #53, right? |
Yes, this is resolved by |
Issue - Better track and distribute
puf.csv
We have found OG-USA regression testing difficult because the Tax-Calculator, OG-USA and B-Tax packages that require a
puf.csv
public use file do not maintain metadata about which version of thatpuf.csv
is okay.We have discussed the idea of coming up with a
taxpuf
private Anaconda package to help ensure installations of Tax-Calculator, B-Tax, OG-USA get the rightpuf.csv
. Each of the packages may require a version oftaxpuf
just like requiring a specific version of numpy.There are a few ideas about how to do this with minimal interruption of existing code in Tax-Calculator, B-Tax and OG-USA.
Idea 1 -
write_puf
At the top of our modules in B-Tax, Tax-Calculator, and OG-USA, just put the following:
write_puf()
would writepuf.csv
to the top level of the repo where it is currently expected, but would not write it if themd5sum
indicated no change was needed.Idea 2 -
from taxpuf import PUF
from taxpuf import PUF
thenPUF
is the string contents of the CSV filepuf.csv
. This would require some modifications in the related repositories where currently a CSV path is expected. I don't think we wantfrom taxpuf import PUF
to import a path to a CSV because the path may need to be processed differently, depending on whether it is a path within an egg.Auto-build
taxpuf
packageIn either case of Idea 1 or 2 above, I have some helper scripts started for automatically building a new
taxpuf
package whenever we need to distribute a newpuf.csv
. It is a one line package creation script used like:The above would take puf.csv from the current directory and make it the
0.0.4
version oftaxpuf
package, adding the metadata "testing it out".Thoughts on Idea 1 vs 2?
The text was updated successfully, but these errors were encountered: