
Pinning package versions for dependencies? #418

Closed
jdebacker opened this issue Nov 5, 2018 · 6 comments
@jdebacker
Member

Following standards from the Tax-Calculator project, the virtual environment for OG-USA only pins packages with "at least as recent as" constraints. E.g.,

- scipy>=0.18.1
- matplotlib
- taxcalc>=0.19.0
- numba
- numpy>=1.12.1
- pandas>=0.20.1

The question I have is why use the >=? Why not pin to a specific version where you know the results are going to be produced as expected - and then update to new versions of the dependency packages as they are tested in OG-USA (not as they are released)?
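For comparison, a fully pinned conda specification would use `=` instead of `>=` (a sketch only; the pinned versions below just echo the minimums above, and matplotlib and numba would also need specific versions chosen after testing):

```yaml
- scipy=0.18.1
- matplotlib        # would also need a tested, pinned version
- taxcalc=0.19.0
- numba             # would also need a tested, pinned version
- numpy=1.12.1
- pandas=0.20.1
```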

Over the last few weeks, I've run into API changes for dependent packages (Dask, Pandas) breaking B-Tax and OG-USA. Because I hadn't updated my package versions since creating my virtual environment some time ago, I didn't encounter these issues until other users had runs fail or I had Travis CI testing failures. I've also had regression tests failing because of differences in output from more recent Tax-Calculator versions.

Should the standard for OG-USA be different from Tax-Calculator's (i.e., should we pin to specific package versions), since I'm not testing against newer packages at a high frequency? I'd prefer an environment where the model runs and produces expected results over one that has the most recent package versions but may cause failures. Can models in the PSL have different package dependencies? (I think common dependency needs for PolicyBrain caused me to follow the TC standard.)

cc @rickecon @hdoupe @martinholmer

@martinholmer

@jdebacker said in OG-USA issue #418:

Over the last few weeks, I've run into API changes for dependent packages (dask, pandas) breaking B-Tax and OG-USA. Because I hadn't updated my package versions since creating my virtual environment some time ago, I hadn't encountered these issues before other users had runs fail or I had Travis CI testing failures. I've also had regression tests failing because of differences in output from more recent Tax-Calculator versions.

You're describing an issue we all deal with, so it is good you've raised this issue for discussion. For the discussion to be useful, we need a little more background on your experiences with OG-USA.

Can you give us a few examples of OG-USA tests failing because you moved to a "more recent Tax-Calculator version"? Sounds like your problems with dask and pandas are those packages changing their API, right? I've certainly had my share of problems with pandas changing their API, but remember they are (relative to numpy) an immature project. The leading zero in their version number 0.N.P is the way in semantic versioning that projects signal their API is still fluid.

But you seem to say that the test failures caused by "more recent Tax-Calculator versions" were caused not by API changes, but by "differences in output." Is that correct?

If there was a Tax-Calculator API change that affected OG-USA in the past year, please share it with us so we can be more aware of what OG-USA is relying on in the Tax-Calculator API.

If there was a change in Tax-Calculator results, those changes could have been caused either by better data or by bug fixes. In either case, it would seem that any model using Tax-Calculator would want correct output as soon as possible.

When you say:

I hadn't encountered these issues before other users had runs fail or I had Travis CI testing failures.

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

How often do you submit a pull request to GitHub (or add another commit to an existing pull request)? That is almost always the way I find that pandas changes have broken Tax-Calculator, or whatever else is happening.

When you say:

I hadn't updated my package versions since creating my virtual environment some time ago

that makes me think that perhaps the easiest way to prevent the problems you are describing is to update your environment frequently (at least once a month). The whole conda environment capability has its advantages, but you're describing one of its biggest disadvantages: it can go stale, leaving you with an out-of-date package environment.
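One lightweight guard against a stale environment is a script that compares installed versions against the project's pins. This is a minimal sketch, not part of OG-USA; the package names, pins, and "installed" versions below are illustrative placeholders:

```python
def parse_version(v):
    """Turn a dotted version string like '1.12.1' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def check_pins(installed, pins):
    """Return the packages whose installed version is older than its pin."""
    stale = []
    for pkg, minimum in pins.items():
        if pkg in installed and parse_version(installed[pkg]) < parse_version(minimum):
            stale.append(pkg)
    return stale

# Hypothetical pins (from an environment.yml) and installed versions.
pins = {"numpy": "1.12.1", "pandas": "0.20.1", "taxcalc": "0.19.0"}
installed = {"numpy": "1.11.3", "pandas": "0.23.4", "taxcalc": "0.19.0"}
print(check_pins(installed, pins))  # numpy is older than its pin
```

In practice the `installed` dict could be filled from the live environment (e.g. via `importlib.metadata`), and the check run in CI so drift is caught before a user's run fails.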

@MattHJensen @rickecon @hdoupe

@martinholmer

@jdebacker asked in OG-USA issue #418:

The question I have is why use the >= [in the package version specification]? Why not pin to a specific version where you know the results are going to be produced as expected?

Have you read the discussion from October 2017 in PolicyBrain issue 697?

@jdebacker
Member Author

If there was a Tax-Calculator API change that affected OG-USA in the past year, please share it with us so we can be more aware of what OG-USA is relying on in the Tax-Calculator API.

It's been a long time since any API changes have caused me to update OG-USA. So no issue here.

If there was a change in Tax-Calculator results, those changes could have been caused either by better data or by bug fixes. In either case, it would seem that any model using Tax-Calculator would want correct output as soon as possible.

This is true. My "problems" then are just about getting a set of comparable results for regression testing, which is on OG-USA developers to fix. We need to (1) save more metadata about which versions of TC and the PUF file were used for prior regression tests and (2) build more flexibility into the tests so that we can understand how output changes when a new TC/PUF is used. I've opened a separate issue on the former topic (Issue #417). I'll need to think more about the latter. One idea is to run the new code with the TC/PUF used for the last set of regression test results and then again with the latest TC/PUF, and compute deltas between the old results we are testing against and each of the two sets of new results; the drawback is that this takes a lot of compute time.
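The two-run idea can be sketched as follows. This is a hedged illustration, not OG-USA code: the output names and numbers are made-up placeholders, and a real version would compare the model's actual output files:

```python
def deltas(baseline, candidate):
    """Percentage change of each shared scalar output relative to baseline."""
    return {
        k: (candidate[k] - baseline[k]) / baseline[k]
        for k in baseline.keys() & candidate.keys()
        if baseline[k] != 0
    }

# Hypothetical scalar outputs from three runs.
old_results = {"gdp": 100.0, "wage": 1.50}       # saved regression baseline
new_code_old_tc = {"gdp": 100.0, "wage": 1.50}   # new code, old TC/PUF
new_code_new_tc = {"gdp": 101.0, "wage": 1.53}   # new code, latest TC/PUF

# Comparing against the old TC/PUF isolates the effect of code changes;
# comparing the two new runs isolates the effect of the TC/PUF update.
code_effect = deltas(old_results, new_code_old_tc)
data_effect = deltas(new_code_old_tc, new_code_new_tc)
print(code_effect, data_effect)
```

If `code_effect` is zero everywhere, any remaining differences in `data_effect` can be attributed to the new TC/PUF rather than to OG-USA changes.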

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

They were building the package from source. That's generally the method I encourage since the conda packages of OG-USA and B-Tax haven't been kept up to date (my fault since I was never able to successfully build a package with the PB builder - maybe it's easier now, I haven't tried since April).

How often do you submit a pull request to GitHub (or add another commit to an existing pull request)? That is almost always the way I find that pandas changes have broken Tax-Calculator, or whatever else is happening.

It varies quite a bit. Probably at least every few weeks with OG-USA, but it had been since May with B-Tax.

Thanks for telling me about your experiences. I'm not able to keep the development pace of the TC project, but I'll aim to update the projects I maintain more frequently. That sounds like the best solution.

@jdebacker
Member Author

Have you read the discussion from October 2017 in PolicyBrain issue 697?

I hadn't. That is a great summary of this issue. Thanks! I'll close this discussion in light of what was said there.

@martinholmer

@jdebacker said in the discussion of OG-USA issue #418:

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

They were building the package from source. That's generally the method I encourage since the conda packages of OG-USA and B-Tax haven't been kept up to date (my fault since I was never able to successfully build a package with the PB builder - maybe it's easier now, I haven't tried since April).

In just the last few days, I've made several changes to the package-builder that do make it easier to build, and upload to the Anaconda Cloud, packages for models that rely on Tax-Calculator. Formerly, those models included only B-Tax and OG-USA, but now there is another one, Behavioral-Responses.

The latest version of the package-builder can build and upload packages for Behavioral-Responses and B-Tax (haven't yet tried OG-USA). My B-Tax test was to use the current version of Tax-Calculator (0.22.2) and the B-Tax release 0.2.1, which had never had packages created for it (even though there are packages for 0.2.2 in the Anaconda Cloud). Building and uploading the four packages for B-Tax 0.2.1 took about three minutes on my computer.

I plan to test the packages for 0.2.1 and 0.2.2 by installing them (one at a time) on my computer and executing the B-Tax/run_examples/run_btax_example.py script. Is there a file somewhere in the B-Tax repo that contains the results I should expect to get from that example script using B-Tax version 0.2.2?

@jdebacker
Member Author

Is there a file somewhere in the B-Tax repo that contains the results I should expect to get from that example script using B-Tax version 0.2.2?

There is not. Feel free to open an issue on this. Are the validation scripts for TC something to follow for this type of test?
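Whatever form the expected-results file takes, the comparison itself could be as simple as the sketch below: load a saved CSV of expected values and the script's actual output, and flag any cell that differs beyond a relative tolerance. The file layout and column names are hypothetical, not B-Tax's actual output format:

```python
import csv
import io
import math

def compare_results(expected_csv, actual_csv, rel_tol=1e-6):
    """Return a list of (row, column) cells that differ beyond rel_tol."""
    exp = list(csv.DictReader(io.StringIO(expected_csv)))
    act = list(csv.DictReader(io.StringIO(actual_csv)))
    mismatches = []
    for i, (e_row, a_row) in enumerate(zip(exp, act)):
        for col in e_row:
            if not math.isclose(float(e_row[col]), float(a_row[col]),
                                rel_tol=rel_tol):
                mismatches.append((i, col))
    return mismatches

# Hypothetical expected vs. actual output for an example script run.
expected = "metc,mettr\n0.321,0.254\n"
actual = "metc,mettr\n0.321,0.254\n"
print(compare_results(expected, actual))  # [] means the outputs match
```

A tolerance-based check avoids spurious failures from platform-dependent floating-point noise while still catching real changes in results.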
