Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

Merged
merged 9 commits into from
Sep 15, 2020

Conversation

rokm
Copy link
Member

@rokm rokm commented Sep 13, 2020

This PR reworks the tensorflow hook to improve compatibility across different tensorflow versions, with aim to fix issues such as pyinstaller/pyinstaller#5002.

Currently, the tensorflow hook collects the submodules from tensorflow_core in a rather roundabout way: by specifying tensorflow_core as a hidden import, and relying on this to pull in tensorflow_core.python. The hook for tensorflow_core.python is the one that actually collects data and submodules from tensorflow_core.

This mechanism is brittle, and seems to break on:

  • tensorflow 1.15.x: due to import rerouting between tensorflow_core and tensorflow, the tensorflow_core.python is not pulled in
  • tensorflow 2.2.0+: the tensorflow_core module has been removed, and its contents now reside within tensorflow itself.

To fix this issue and improve compatibility across different versions of tensorflow, the tensorflow_core.python hook is removed, and tensorflow hook is modified to collect data and submodules from either tensorflow or tensorflow_core, depending on version:

  • 1.14.x and earlier: collect from tensorflow
  • 1.15.x - 2.1.x: collect from tensorflow_core
  • 2.2x and later: collect from tensorflow

The PR also adds two new tests for verifying that basic functionality of tensorflow actually works, and aim to uncover submodule-not-found errors that are not picked up by the existing (import) test.

In addition, all three tensorflow tests in onefile mode are now skipped, because the latest versions of tensorflow are quite large (~1.5 GB), and the onefile tests sometimes timeout due to compression/decompression times.

Added two new tests for tensorflow in order to verify that at least
the basic tensorflow functionality is both importable and functional:
* a basic use of a Conv2D layer
* a basic model training and validation on MNIST dataset
…fferent versions

Clean and up and rework the tensorflow hook to improve compatibility
across different tensorflow versions.

Currently, the `tensorflow` hook collects the submodules from
`tensorflow_core` in a rather roundabout way: by specifying
`tensorflow_core` as a hidden import, and relying on this to pull in
`tensorflow_core.python`. The hook for `tensorflow_core.python` is
the one that actually collects data and submodules from `tensorflow_core`.

This mechanism is brittle, and seems to break on:
* tensorflow 1.15.x: due to import rerouting between `tensorflow_core`
and `tensorflow`, the `tensorflow_core.python` is not pulled in
* tensorflow 2.2.0+: the `tensorflow_core` module has been removed, and
its contents now reside within `tensorflow` itself.

To fix this issue and improve compatibility across different versions
of tensorflow, the `tensorflow_core.python` hook is removed, and
`tensorflow` hook is modified to collect data and submodules from
either `tensorflow` or `tensorflow_core`, depending on version:
* 1.14.x and earlier: collect from `tensorflow`
* 1.15.x - 2.1.x: collect from `tensorflow_core`
* 2.2x and later: collect from `tensorflow`
Skip tensorflow tests in modes other than onedir mode. With recent
tensorflow versions, the resulting dist size could be up to 1.5 GB,
which may cause onefile tests to timeout due to relatively long
compression and decompression time.
@rokm rokm requested review from a team and bwoodsend and removed request for a team September 13, 2020 16:45
@rokm rokm marked this pull request as draft September 13, 2020 16:45
Copy link
Member

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good work. Only issue I see is you forgot the changelog entry 😝 :.

@bwoodsend
Copy link
Member

Ohh and there's that linter to appease. God forbid we allow a 95 character long line into our repository...

@bwoodsend
Copy link
Member

Oh, one genuine problem: You forgot to add tensorflow to the test-requirement.txt so your tests won't be run on CI or by any user unless they already have tensorflow.

Although, in the interest of keeping the CI reasonably quick, I might be tempted to leave it this way. I believe we've already run these tests on all platforms? Is there any need to keep re-running them? @rokm @Legorooj Thoughts on this?

@rokm
Copy link
Member Author

rokm commented Sep 13, 2020

Thanks for the feedback!

Yeah, I figured I'll add the changelog entry after the linter is appeased... (and to add insult to injury, the tests file already contains lines longer than the ones I added here :))

Two questions:

  • Is there a better way to compare versions in the hook file than using is_module_satisfies repeatedly?
    If not, I was planning to work around the linter by defining four booleans:
tf_pre_1_15_0 = is_module_satisfies("tensorflow < 1.15.0")
tf_post_1_15_0 = is_module_satisfies("tensorflow >= 1.15.0")
tf_pre_2_0_0 = is_module_satisfies("tensorflow < 2.0.0")
tf_pre_2_2_0 = is_module_satisfies("tensorflow < 2.2.0")

if tf_pre_1_15_0:
    ...

and using those within the if/else clauses. Would that be OK?

  • For the added tests, would it make more sense to put them in the external scripts instead of having them inline? It might be nicer to break lines in a stand-alone script than within the string...

As for requirements-test.txt, this means that the existing import test wasn't being run, either. We could add it, but it would probably mean testing only against one version of tensorflow. In order to test all different versions, we would need a setup like the one you used last time (different platforms, different python versions, different tf versions) - but I imagine this is infeasible in the grand scheme of things.

On the other hand, with tensorflow, people will likely be using specific versions instead of the latest one due to incompatibilities and the fact that academic code usually doesn't get maintained after the paper publication...

@bwoodsend
Copy link
Member

Is there a better way to compare versions in the hook file than using is_module_satisfies repeatedly?

Umm, well we can circumvent PyInstaller slightly:

from pkg_resources import get_distribution, Requirement
tf_version = get_distribution("tensorflow").version

Then bizarrely you use in to test version constraints. And you still have to write package>1.15 although the name itself is ignored:

tf_version in Requirement.parse("x>1.15")

To be honest, I'd go with what you were suggesting, just because it's less confusing to read.

the existing import test wasn't being run

That's reassuring 🤣. And you're right in that we can only test one version with our current CI setup. I say we leave this one out then. If/when a significantly different update hits either PyInstaller or tensorflow then we can do another 1-shot bulk test again. I'll be sure not to delete that branch - although I have big plans for our CI to make that kind custom of one-off testing more accessible.

would it make more sense to put them in the external scripts
I'd personally prefer all the non-trivial (not just import xxx) test scripts were separate files. That test_packages.py is going to end up enormous otherwise. @Legorooj What do you think?

I think your last point is an important one. Normally we'd be able to forget about much older versions of packages but I guess in this case we can't. Ohh well...

@Legorooj
Copy link
Member

Ignore the linter failures - I've been meaning to adjust the limit to 100 chars.

A minor cleanup; use `is_module_satisfies()` to define version
related booleans, and then use those in the if/else blocks to make
them easier to read.
Also grouped the tensorflow tests together in the test_libraries.py
@rokm
Copy link
Member Author

rokm commented Sep 14, 2020

OK, I moved those is_module_satisfies from if/else block into booleans, and moved the tests into separate scripts. I ended up going with the decorator as well, because with test bodies simplified, it looks better this way.

@rokm rokm marked this pull request as ready for review September 14, 2020 11:24
news/46.update.rst Outdated Show resolved Hide resolved
Copy link
Member

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm happy. @Legorooj Ready to merge?

Copy link
Member

@Legorooj Legorooj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants