
Pinning package versions for dependencies? #418

Closed
jdebacker opened this issue Nov 5, 2018 · 6 comments
@jdebacker
Member

Following standards from the Tax-Calculator project, the virtual environment for OG-USA only pins packages with "at least as recent as" constraints. E.g.,

- scipy>=0.18.1
- matplotlib
- taxcalc>=0.19.0
- numba
- numpy>=1.12.1
- pandas>=0.20.1

The question I have is why use the >=? Why not pin to a specific version where you know the results are going to be produced as expected - and then update to new versions of the dependency packages as they are tested in OG-USA (not as they are released)?
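For comparison, a fully pinned conda specification would use `=` instead of `>=` (a sketch only; the pinned versions below just echo the minimums above, and matplotlib and numba would also need specific versions chosen after testing):

```yaml
- scipy=0.18.1
- matplotlib        # would also need a tested, pinned version
- taxcalc=0.19.0
- numba             # would also need a tested, pinned version
- numpy=1.12.1
- pandas=0.20.1
```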

Over the last few weeks, I've run into API changes for dependent packages (Dask, Pandas) breaking B-Tax and OG-USA. Because I hadn't updated my package versions since creating my virtual environment some time ago, I didn't encounter these issues until other users had runs fail or I had Travis CI testing failures. I've also had regression tests failing because of differences in output from more recent Tax-Calculator versions.

Should the standard for OG-USA be different from Tax-Calculator's (i.e., should we pin to specific package versions), since I'm not testing against newer packages at a high frequency? I'd prefer an environment where the model runs and produces expected results over one that has the most recent package versions but may cause failures. Can models in the PSL have different package dependencies? (I think common dependency needs for PolicyBrain caused me to follow the TC standard.)

cc @rickecon @hdoupe @martinholmer

@martinholmer

@jdebacker said in OG-USA issue #418:

Over the last few weeks, I've run into API changes for dependent packages (dask, pandas) breaking B-Tax and OG-USA. Because I hadn't updated my package versions since creating my virtual environment some time ago, I hadn't encountered these issues before other users had runs fail or I had Travis CI testing failures. I've also had regression tests failing because of differences in output from more recent Tax-Calculator versions.

You're describing an issue we all deal with, so it is good you've raised this issue for discussion. For the discussion to be useful, we need a little more background on your experiences with OG-USA.

Can you give us a few examples of OG-USA tests failing because you moved to a "more recent Tax-Calculator version"? Sounds like your problems with dask and pandas are those packages changing their API, right? I've certainly had my share of problems with pandas changing their API, but remember they are (relative to numpy) an immature project. The leading zero in their version number 0.N.P is the way in semantic versioning that projects signal their API is still fluid.

But you seem to say that the test failures caused by "more recent Tax-Calculator versions" were caused not by API changes, but by "differences in output." Is that correct?

If there was a Tax-Calculator API change that affected OG-USA in the past year, please share it with us so we can be more aware of what OG-USA is relying on in the Tax-Calculator API.

If there was a change in Tax-Calculator results, those changes could have been caused either by better data or by bug fixes. In either case, it would seem that any model using Tax-Calculator would want correct output as soon as possible.

When you say:

I hadn't encountered these issues before other users had runs fail or I had Travis CI testing failures.

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

How often do you submit a pull request to GitHub (or add another commit to an existing pull request)? That is almost always the way I find that pandas changes have broken Tax-Calculator, or whatever else is happening.

When you say:

I hadn't updated my package versions since creating my virtual environment some time ago

that makes me think that perhaps the easiest way to prevent the problems you are describing is to update your environment frequently (at least once a month). The whole conda environment capability has its advantages, but you're describing one of its biggest disadvantages: it can go stale, leaving you with an out-of-date package environment.
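One lightweight guard against a stale environment is a script that compares installed versions against the project's pins. This is a minimal sketch, not part of OG-USA; the package names, pins, and "installed" versions below are illustrative placeholders:

```python
def parse_version(v):
    """Turn a dotted version string like '1.12.1' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def check_pins(installed, pins):
    """Return the packages whose installed version is older than its pin."""
    stale = []
    for pkg, minimum in pins.items():
        if pkg in installed and parse_version(installed[pkg]) < parse_version(minimum):
            stale.append(pkg)
    return stale

# Hypothetical pins (from an environment.yml) and installed versions.
pins = {"numpy": "1.12.1", "pandas": "0.20.1", "taxcalc": "0.19.0"}
installed = {"numpy": "1.11.3", "pandas": "0.23.4", "taxcalc": "0.19.0"}
print(check_pins(installed, pins))  # numpy is older than its pin
```

In practice the `installed` dict could be filled from the live environment (e.g. via `importlib.metadata`), and the check run in CI so drift is caught before a user's run fails.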

@MattHJensen @rickecon @hdoupe

@martinholmer

@jdebacker asked in OG-USA issue #418:

The question I have is why use the >= [in the package version specification]? Why not pin to a specific version where you know the results are going to be produced as expected?

Have you read the discussion from October 2017 in PolicyBrain issue 697?

@jdebacker
Member Author

If there was a Tax-Calculator API change that affected OG-USA in the past year, please share it with us so we can be more aware of what OG-USA is relying on in the Tax-Calculator API.

It's been a long time since any API changes have caused me to update OG-USA. So no issue here.

If there was a change in Tax-Calculator results, those changes could have been caused either by better data or by bug fixes. In either case, it would seem that any model using Tax-Calculator would want correct output as soon as possible.

This is true. My "problems" then are just about getting a set of comparable results for regression testing, which is on OG-USA developers to fix. We need to (1) save more metadata about which versions of TC and the PUF file were used for prior regression tests and (2) build more flexibility into the tests so that we can understand how output changes when a new TC/PUF is used. I've opened a separate issue on the former topic (Issue #417). I'll need to think more about the latter. One idea is to run the new code with the TC/PUF used for the last set of regression test results and then again with the latest TC/PUF, and compute deltas between the old results we are testing against and each of the two sets of new results; the drawback is that this takes a lot of compute time.
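The two-run idea can be sketched as follows. This is a hedged illustration, not OG-USA code: the output names and numbers are made-up placeholders, and a real version would compare the model's actual output files:

```python
def deltas(baseline, candidate):
    """Percentage change of each shared scalar output relative to baseline."""
    return {
        k: (candidate[k] - baseline[k]) / baseline[k]
        for k in baseline.keys() & candidate.keys()
        if baseline[k] != 0
    }

# Hypothetical scalar outputs from three runs.
old_results = {"gdp": 100.0, "wage": 1.50}       # saved regression baseline
new_code_old_tc = {"gdp": 100.0, "wage": 1.50}   # new code, old TC/PUF
new_code_new_tc = {"gdp": 101.0, "wage": 1.53}   # new code, latest TC/PUF

# Comparing against the old TC/PUF isolates the effect of code changes;
# comparing the two new runs isolates the effect of the TC/PUF update.
code_effect = deltas(old_results, new_code_old_tc)
data_effect = deltas(new_code_old_tc, new_code_new_tc)
print(code_effect, data_effect)
```

If `code_effect` is zero everywhere, any remaining differences in `data_effect` can be attributed to the new TC/PUF rather than to OG-USA changes.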

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

They were building the package from source. That's generally the method I encourage since the conda packages of OG-USA and B-Tax haven't been kept up to date (my fault since I was never able to successfully build a package with the PB builder - maybe it's easier now, I haven't tried since April).

How often do you submit a pull request to GitHub (or add another commit to an existing pull request)? That is almost always the way I find that pandas changes have broken Tax-Calculator, or whatever else is happening.

It varies quite a bit. Probably at least every few weeks with OG-USA, but it had been since May with B-Tax.

Thanks for telling me about your experiences. I'm not able to keep the development pace of the TC project, but I'll aim to update the projects I maintain more frequently. That sounds like the best solution.

@jdebacker
Member Author

Have you read the discussion from October 2017 in PolicyBrain issue 697?

I hadn't. That is a great summary of this issue. Thanks! I'll close this discussion in light of what was said there.

@martinholmer

@jdebacker said in the discussion of OG-USA issue #418:

Were the "other users" writing Python scripts that imported the OG-USA conda package ogusa?

They were building the package from source. That's generally the method I encourage since the conda packages of OG-USA and B-Tax haven't been kept up to date (my fault since I was never able to successfully build a package with the PB builder - maybe it's easier now, I haven't tried since April).

In just the last few days, I've made several changes to the package-builder that do make it easier to build, and upload to the Anaconda Cloud, packages for models that rely on Tax-Calculator. Formerly, those models included only B-Tax and OG-USA, but now there is another one, Behavioral-Responses.

The latest version of the package-builder can build and upload packages for Behavioral-Responses and B-Tax (haven't yet tried OG-USA). My B-Tax test was to use the current version of Tax-Calculator (0.22.2) and the B-Tax release 0.2.1, which had never had packages created for it (even though there are packages for 0.2.2 in the Anaconda Cloud). Building and uploading the four packages for B-Tax 0.2.1 took about three minutes on my computer.

I plan to test the packages for 0.2.1 and 0.2.2 by installing them (one at a time) on my computer and executing the B-Tax/run_examples/run_btax_example.py script. Is there a file somewhere in the B-Tax repo that contains the results I should expect to get from that example script using B-Tax version 0.2.2?

@jdebacker
Member Author

Is there a file somewhere in the B-Tax repo that contains the results I should expect to get from that example script using B-Tax version 0.2.2?

There is not. Feel free to open an issue on this. Are the validation scripts for TC something to follow for this type of test?
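Whatever form the expected-results file takes, the comparison itself could be as simple as the sketch below: load a saved CSV of expected values and the script's actual output, and flag any cell that differs beyond a relative tolerance. The file layout and column names are hypothetical, not B-Tax's actual output format:

```python
import csv
import io
import math

def compare_results(expected_csv, actual_csv, rel_tol=1e-6):
    """Return a list of (row, column) cells that differ beyond rel_tol."""
    exp = list(csv.DictReader(io.StringIO(expected_csv)))
    act = list(csv.DictReader(io.StringIO(actual_csv)))
    mismatches = []
    for i, (e_row, a_row) in enumerate(zip(exp, act)):
        for col in e_row:
            if not math.isclose(float(e_row[col]), float(a_row[col]),
                                rel_tol=rel_tol):
                mismatches.append((i, col))
    return mismatches

# Hypothetical expected vs. actual output for an example script run.
expected = "metc,mettr\n0.321,0.254\n"
actual = "metc,mettr\n0.321,0.254\n"
print(compare_results(expected, actual))  # [] means the outputs match
```

A tolerance-based check avoids spurious failures from platform-dependent floating-point noise while still catching real changes in results.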
