Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial tensorflow stubs #8974

Merged
merged 22 commits into from
Jan 14, 2023

Conversation

hmc-cs-mdrissi
Copy link
Contributor

@hmc-cs-mdrissi hmc-cs-mdrissi commented Oct 24, 2022

Summary

Initial tensorflow stubs to start with. These are very incomplete given that tensorflow's api is very large. I'm reviving this now as supporting non-types dependencies has made progress and is a requirement for these stubs as they strongly rely on numpy. Similar to opencv stubs this should not merge until non type dependencies work in CI.

These are all manually made based on tensorflow documentation, testing in repl, and usage in large codebase reliant on tensorflow. mypy stubgen last I tried, crashes on tensorflow's codebase and pyright works but produces a large number of false positives. Both mypy and pyright share a weakness in that tensorflow relies on a lot of dynamic module magic with imports. It uses a decorator(tf_export) to re-export class in a different module sometimes even though module may not exist by itself. A secondary issue is most tensorflow classes are defined in internal locations without any private prefixing, and re-exported elsewhere that's documented. And some core tensorflow classes rely on dynamic patching of methods. As an example tf.Tensor class one of the most commonly used types in tensorflow many of it's basic methods like __add__ and __mul__ are dynamically patched on.

I consider this ready for discussion as there are a couple questions I have before moving forward.

  1. What would be best approach for review? I currently have roughly ~3k lines of incomplete stubs for tensorflow and this pr adds some of the basic pieces. Would you prefer a large pr with all I currently have or chop it up into dozen smaller prs of a few hundred lines each? I'm using getattr(name: str) -> Incomplete pattern to minimize false positives users would experience. If I do make a pr with full stubs I have currently I'll need to be careful of that as my current local stubs often missed that because at work I followed a philosophy of false positives are better then missing an error and fix stub as needed. A full version of what I have would look roughly like this pr.
  2. Is it fine to leave default values for functions in stubs? My current stubs include default values and they are mostly int/str/None. I find them useful for IDE usage, but I'm ok dropping them if that's preferred for consistency with most other typeshed stubs. If including defaults is ok, where should I adjust mypy/pyright/etc settings to not produce an error of Default values in stub files should be specified as "..."
  3. Tensor related classes probably should be generic over data type and shape. I'm avoiding that as both would add a lot more complexity to stubs. Shapes especially as typevartuple support is in progress and a lot of shape types may require further peps. DTypes are mostly possible now just may add a lot more overloads/verbosity to track. I could make class generic today but mostly use Any for the type arguments? Not sure on recommended practice here.

The files/classes added here are a couple of the most commonly used ones. If small pr approach is good I'll likely continue by filling in more of tensorflow/__init__.pyi and tensorflow/math.pyi followed by adding in other commonly used files.

@github-actions

This comment has been minimized.

@hauntsaninja
Copy link
Collaborator

You're missing a layer of nesting. It should be stubs/distribution_name/package_name. This explains many of the mypy_primer errors you're seeing.

For the questions you ask, my non-authoritative opinions are:

  1. I'm mostly agnostic. If there's stuff that you expect to be controversial, I would separate it from stuff that is non-controversial. Note that while typeshed's philosophy is that false negatives are better than false positives, it's not the biggest deal if we have some false positives as we work through your PRs. As long as we do this quickly, it won't hurt users, since there are currently zero users of types-tensorflow.
  2. I would prefer omitting default values from stubs.
  3. Let's stick with the simple non-generic stuff for now.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@gramster
Copy link

gramster commented Oct 25, 2022

  1. I would prefer omitting default values from stubs.

@hauntsaninja, an advantage of including default values is that tools like pyright or pylance will use stubs (if available) for the signatures they show to users when hovering over a function call, and users want to see the default values. I'd like to understand your point of view though as to why you would prefer them removed?

@hmc-cs-mdrissi
Copy link
Contributor Author

I took a look at primer hits.

The jax hit mypy's redefinition rule and they have code like,

try:
  import tensorflow
except:
  tensorflow = None

I'm less sure about what streamlit hit means about note ... here. The code they have is,

try:
    import tensorflow as tf
except ImportError:
    pass

The remaining errors look to be expected from lack of numpy in environment.

@hauntsaninja
Copy link
Collaborator

@gramster I'm personally okay in theory with default values in stubs if we restrict to simple literals, but it should be its own discussion — my comment here mostly just reflected the status quo, changing of which would need discussion. I also figured we're not really losing that much work on this PR: simple literal default values should be easy to automatically reintroduce to all non-overload stubs and we'd need to do so for all existing stubs anyway.

@hauntsaninja
Copy link
Collaborator

Anyway, opened #8988 to discuss default values

@github-actions

This comment has been minimized.

@AlexWaygood
Copy link
Member

@hmc-cs-mdrissi, it's now possible to declare a dependency on numpy, if you're still interested in working on this!

@github-actions

This comment has been minimized.

@hmc-cs-mdrissi
Copy link
Contributor Author

I am interested. Pyright now passes. mypy primer looks reasonable and is mainly patterns like,

try:
    import tensorflow as tf
except ImportError:
    pass

Next on mypy test, one error I'm unsure how to deal with is,

/tmp/tmpi6bf00j9/.venv--5516585432835101127/lib/python3.10/site-packages/numpy/__init__.pyi:641: error: Positional-only parameters are only supported in Python 3.8 and greater  [syntax]

As tensorflow depends on numpy and numpy already uses positional only syntax. Can I skip mypy 3.7 runs for tensorflow? Does Metadata.toml for typeshed packages have a way to indicate minimum python version?

From the defaults issue discussion sounds like including simple defaults is fine now. flake8 fails but I can make a separate small pr to drop that rule.

The other errors I'll tackle fixing and make sense to me.

@hmc-cs-mdrissi
Copy link
Contributor Author

@rchen152 Any recommendations on pytype error? Looks like it crashes handling numpy.

ValueError: Unreplaced NamedType: 'np.ndarray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 588, in finish_and_verify_ast
    self._resolver.verify(mod_ast)
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 290, in verify
    raise BadDependencyError(str(e), name) from e
pytype.load_pytd.BadDependencyError: Unreplaced NamedType: 'np.ndarray', referenced from 'tensorflow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/typeshed/typeshed/./tests/pytype_test.py", line 74, in run_pytype
    loader.finish_and_verify_ast(ast)
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 602, in finish_and_verify_ast
    raise (
pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@hmc-cs-mdrissi
Copy link
Contributor Author

I think this is ready for review. CI now almost all passes. Only failures left I'm unsure how to deal with.

  1. Mypy 3.7 fails as numpy uses 3.8 syntax (positional arguments)
  2. Pytype crashes with numpy usage.

@Avasam Avasam mentioned this pull request Jan 11, 2023
@Avasam
Copy link
Collaborator

Avasam commented Jan 11, 2023

pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

I wonder if it's caused by aliasing a top-level import (import numpy as np).
pytype_test uses the list of external requires found in METADATA.toml to feed pytype modules to ignore.
But if pytype sees the module as np, and ignoring numpy, then it won't match.

@AlexWaygood
Copy link
Member

Okay, I fixed the mypy issue! It required actually running mypy_test.py on Python 3.7 in CI (rather than just running it with the --python-version=3.7 flag). Running it on Python 3.7 means pip's dependency resolver will install an older, 3.7-compatible version of numpy when 3.7 is selected from the version matrix in CI.

@Avasam
Copy link
Collaborator

Avasam commented Jan 11, 2023

Okay, I fixed the mypy issue! It required actually running mypy_test.py on Python 3.7 in CI (rather than just running it with the --python-version=3.7 flag). Running it on Python 3.7 means pip's dependency resolver will install an older, 3.7-compatible version of numpy when 3.7 is selected from the version matrix in CI.

Or that. Nice :)

@AlexWaygood
Copy link
Member

AlexWaygood commented Jan 11, 2023

@AlexWaygood Alternatively you could try pinning numpy to "<=1.21" until mypy fixes this bug or python 3.7 support is dropped.

Yup, but upper bounds are (in my opinion) very bad and best avoided where possible in Python packaging, especially when you're packaging a library rather than an app 🙂 See https://iscinumpy.dev/post/bound-version-constraints/ for the (very) long version.

@Avasam
Copy link
Collaborator

Avasam commented Jan 11, 2023

Yup, but upper bounds are (in my opinion) very bad and best avoided where possible in Python packaging 🙂 See iscinumpy.dev/post/bound-version-constraints for the very long version.

I edited my previous comment instead. But yeah, I remembered after writing this that requires isn't used only for typeshed and the version restriction would propagate to users of types-tensorflow

@github-actions

This comment has been minimized.

@AlexWaygood
Copy link
Member

I'll separate out the infrastructure changes into another PR (it was useful to edit this PR directly so I could test whether the changes fixed the problem 🙂)

@AlexWaygood
Copy link
Member

I'll separate out the infrastructure changes into another PR (it was useful to edit this PR directly so I could test whether the changes fixed the problem 🙂)

@hmc-cs-mdrissi
Copy link
Contributor Author

pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

I wonder if it's caused by aliasing a top-level import (import numpy as np). pytype_test uses the list of external requires found in METADATA.toml to feed pytype modules to ignore. But if pytype sees the module as np, and ignoring numpy, then it won't match.

Thank you that fix works. Not using alias for numpy fixes the error and pytype now passes. All checks are now good (one mypy 3.8 error in last run is transient error).

@github-actions

This comment was marked as outdated.

@AlexWaygood
Copy link
Member

(Marking with do-not-merge for now, as I'd like a decision to be made on #9499 first :)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

jax (https://github.com/google/jax)
- jax/collect_profile.py:25: error: Cannot find implementation or library stub for module named "tensorflow.python.profiler"  [import]
- jax/experimental/jax2tf/examples/tf_js/quickdraw/quickdraw.py:30: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/model_harness.py:36: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/tools/jax_to_ir.py:86: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/converters.py:20: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/models_test_main.py:48: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
+ jax/experimental/jax2tf/tests/call_tf_test.py:36: error: Incompatible types in assignment (expression has type "None", variable has type Module)  [assignment]

streamlit (https://github.com/streamlit/streamlit)
+ lib/tests/streamlit/runtime/caching/hashing_test.py:48: note: ... from here:

@hmc-cs-mdrissi
Copy link
Contributor Author

Anything left to merge this? @AlexWaygood After this is merged I'll continue to make tensorflow stub prs of a similar size (few hundred lines) until all the ones I have accumulated are landed.

@AlexWaygood
Copy link
Member

Anything left to merge this?

This all looks good to me. I don't really feel very qualified to review stubs outside of my pure-Python comfort zone, but if there are any issues here, I think they can be resolved in follow-up PRs as and when they're discovered. This PR has been open long enough now :)

After this is merged I'll continue to make tensorflow stub prs of a similar size (few hundred lines) until all the ones I have accumulated are landed.

That sounds like a great strategy, and will make it much easier to review!

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@AlexWaygood AlexWaygood merged commit ea0ae21 into python:main Jan 14, 2023
@alkatar21 alkatar21 mentioned this pull request Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants