Initial tensorflow stubs #8974

hmc-cs-mdrissi · 2022-10-24T04:26:05Z

Summary

Initial tensorflow stubs to start with. These are very incomplete given that tensorflow's api is very large. I'm reviving this now as supporting non-types dependencies has made progress and is a requirement for these stubs as they strongly rely on numpy. Similar to opencv stubs this should not merge until non type dependencies work in CI.

These are all manually made based on tensorflow documentation, testing in repl, and usage in large codebase reliant on tensorflow. mypy stubgen last I tried, crashes on tensorflow's codebase and pyright works but produces a large number of false positives. Both mypy and pyright share a weakness in that tensorflow relies on a lot of dynamic module magic with imports. It uses a decorator(tf_export) to re-export class in a different module sometimes even though module may not exist by itself. A secondary issue is most tensorflow classes are defined in internal locations without any private prefixing, and re-exported elsewhere that's documented. And some core tensorflow classes rely on dynamic patching of methods. As an example tf.Tensor class one of the most commonly used types in tensorflow many of it's basic methods like __add__ and __mul__ are dynamically patched on.

I consider this ready for discussion as there are a couple questions I have before moving forward.

What would be best approach for review? I currently have roughly ~3k lines of incomplete stubs for tensorflow and this pr adds some of the basic pieces. Would you prefer a large pr with all I currently have or chop it up into dozen smaller prs of a few hundred lines each? I'm using getattr(name: str) -> Incomplete pattern to minimize false positives users would experience. If I do make a pr with full stubs I have currently I'll need to be careful of that as my current local stubs often missed that because at work I followed a philosophy of false positives are better then missing an error and fix stub as needed. A full version of what I have would look roughly like this pr.
Is it fine to leave default values for functions in stubs? My current stubs include default values and they are mostly int/str/None. I find them useful for IDE usage, but I'm ok dropping them if that's preferred for consistency with most other typeshed stubs. If including defaults is ok, where should I adjust mypy/pyright/etc settings to not produce an error of Default values in stub files should be specified as "..."
Tensor related classes probably should be generic over data type and shape. I'm avoiding that as both would add a lot more complexity to stubs. Shapes especially as typevartuple support is in progress and a lot of shape types may require further peps. DTypes are mostly possible now just may add a lot more overloads/verbosity to track. I could make class generic today but mostly use Any for the type arguments? Not sure on recommended practice here.

The files/classes added here are a couple of the most commonly used ones. If small pr approach is good I'll likely continue by filling in more of tensorflow/__init__.pyi and tensorflow/math.pyi followed by adding in other commonly used files.

hauntsaninja · 2022-10-24T04:46:45Z

You're missing a layer of nesting. It should be stubs/distribution_name/package_name. This explains many of the mypy_primer errors you're seeing.

For the questions you ask, my non-authoritative opinions are:

I'm mostly agnostic. If there's stuff that you expect to be controversial, I would separate it from stuff that is non-controversial. Note that while typeshed's philosophy is that false negatives are better than false positives, it's not the biggest deal if we have some false positives as we work through your PRs. As long as we do this quickly, it won't hurt users, since there are currently zero users of types-tensorflow.
I would prefer omitting default values from stubs.
Let's stick with the simple non-generic stuff for now.

gramster · 2022-10-25T00:34:43Z

I would prefer omitting default values from stubs.

@hauntsaninja, an advantage of including default values is that tools like pyright or pylance will use stubs (if available) for the signatures they show to users when hovering over a function call, and users want to see the default values. I'd like to understand your point of view though as to why you would prefer them removed?

hmc-cs-mdrissi · 2022-10-25T01:34:18Z

I took a look at primer hits.

The jax hit mypy's redefinition rule and they have code like,

try:
  import tensorflow
except:
  tensorflow = None

I'm less sure about what streamlit hit means about note ... here. The code they have is,

try:
    import tensorflow as tf
except ImportError:
    pass

The remaining errors look to be expected from lack of numpy in environment.

hauntsaninja · 2022-10-26T01:45:32Z

@gramster I'm personally okay in theory with default values in stubs if we restrict to simple literals, but it should be its own discussion — my comment here mostly just reflected the status quo, changing of which would need discussion. I also figured we're not really losing that much work on this PR: simple literal default values should be easy to automatically reintroduce to all non-overload stubs and we'd need to do so for all existing stubs anyway.

hauntsaninja · 2022-10-26T01:48:56Z

Anyway, opened #8988 to discuss default values

AlexWaygood · 2023-01-11T00:42:16Z

@hmc-cs-mdrissi, it's now possible to declare a dependency on numpy, if you're still interested in working on this!

hmc-cs-mdrissi · 2023-01-11T04:00:37Z

I am interested. Pyright now passes. mypy primer looks reasonable and is mainly patterns like,

try:
    import tensorflow as tf
except ImportError:
    pass

Next on mypy test, one error I'm unsure how to deal with is,

/tmp/tmpi6bf00j9/.venv--5516585432835101127/lib/python3.10/site-packages/numpy/__init__.pyi:641: error: Positional-only parameters are only supported in Python 3.8 and greater  [syntax]

As tensorflow depends on numpy and numpy already uses positional only syntax. Can I skip mypy 3.7 runs for tensorflow? Does Metadata.toml for typeshed packages have a way to indicate minimum python version?

From the defaults issue discussion sounds like including simple defaults is fine now. flake8 fails but I can make a separate small pr to drop that rule.

The other errors I'll tackle fixing and make sense to me.

hmc-cs-mdrissi · 2023-01-11T04:24:18Z

@rchen152 Any recommendations on pytype error? Looks like it crashes handling numpy.

ValueError: Unreplaced NamedType: 'np.ndarray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 588, in finish_and_verify_ast
    self._resolver.verify(mod_ast)
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 290, in verify
    raise BadDependencyError(str(e), name) from e
pytype.load_pytd.BadDependencyError: Unreplaced NamedType: 'np.ndarray', referenced from 'tensorflow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/typeshed/typeshed/./tests/pytype_test.py", line 74, in run_pytype
    loader.finish_and_verify_ast(ast)
  File "/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pytype/load_pytd.py", line 602, in finish_and_verify_ast
    raise (
pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

stubs/tensorflow/tensorflow/__init__.pyi

hmc-cs-mdrissi · 2023-01-11T07:20:13Z

I think this is ready for review. CI now almost all passes. Only failures left I'm unsure how to deal with.

Mypy 3.7 fails as numpy uses 3.8 syntax (positional arguments)
Pytype crashes with numpy usage.

Avasam · 2023-01-11T11:15:48Z

pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

I wonder if it's caused by aliasing a top-level import (import numpy as np).
pytype_test uses the list of external requires found in METADATA.toml to feed pytype modules to ignore.
But if pytype sees the module as np, and ignoring numpy, then it won't match.

AlexWaygood · 2023-01-11T12:19:16Z

Okay, I fixed the mypy issue! It required actually running mypy_test.py on Python 3.7 in CI (rather than just running it with the --python-version=3.7 flag). Running it on Python 3.7 means pip's dependency resolver will install an older, 3.7-compatible version of numpy when 3.7 is selected from the version matrix in CI.

Avasam · 2023-01-11T12:19:45Z

Okay, I fixed the mypy issue! It required actually running mypy_test.py on Python 3.7 in CI (rather than just running it with the --python-version=3.7 flag). Running it on Python 3.7 means pip's dependency resolver will install an older, 3.7-compatible version of numpy when 3.7 is selected from the version matrix in CI.

Or that. Nice :)

AlexWaygood · 2023-01-11T12:24:07Z

@AlexWaygood Alternatively you could try pinning numpy to "<=1.21" until mypy fixes this bug or python 3.7 support is dropped.

Yup, but upper bounds are (in my opinion) very bad and best avoided where possible in Python packaging, especially when you're packaging a library rather than an app 🙂 See https://iscinumpy.dev/post/bound-version-constraints/ for the (very) long version.

Avasam · 2023-01-11T12:26:12Z

Yup, but upper bounds are (in my opinion) very bad and best avoided where possible in Python packaging 🙂 See iscinumpy.dev/post/bound-version-constraints for the very long version.

I edited my previous comment instead. But yeah, I remembered after writing this that requires isn't used only for typeshed and the version restriction would propagate to users of types-tensorflow

AlexWaygood · 2023-01-11T12:49:52Z

I'll separate out the infrastructure changes into another PR (it was useful to edit this PR directly so I could test whether the changes fixed the problem 🙂)

AlexWaygood · 2023-01-11T15:27:36Z

I'll separate out the infrastructure changes into another PR (it was useful to edit this PR directly so I could test whether the changes fixed the problem 🙂)

Allow stubs to depend on numpy #9499

hmc-cs-mdrissi · 2023-01-11T17:14:13Z

pytype.load_pytd.BadDependencyError: Can't find pyi for 'np', referenced from 'tensorflow'

I wonder if it's caused by aliasing a top-level import (import numpy as np). pytype_test uses the list of external requires found in METADATA.toml to feed pytype modules to ignore. But if pytype sees the module as np, and ignoring numpy, then it won't match.

Thank you that fix works. Not using alias for numpy fixes the error and pytype now passes. All checks are now good (one mypy 3.8 error in last run is transient error).

AlexWaygood · 2023-01-11T17:25:48Z

(Marking with do-not-merge for now, as I'd like a decision to be made on #9499 first :)

stubs/tensorflow/METADATA.toml

stubs/tensorflow/tensorflow/__init__.pyi

stubs/tensorflow/tensorflow/dtypes.pyi

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

.github/workflows/tests.yml

github-actions · 2023-01-12T10:18:51Z

Diff from mypy_primer, showing the effect of this PR on open source code:

jax (https://github.com/google/jax)
- jax/collect_profile.py:25: error: Cannot find implementation or library stub for module named "tensorflow.python.profiler"  [import]
- jax/experimental/jax2tf/examples/tf_js/quickdraw/quickdraw.py:30: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/model_harness.py:36: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/tools/jax_to_ir.py:86: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/converters.py:20: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
- jax/experimental/jax2tf/tests/models_test_main.py:48: error: Cannot find implementation or library stub for module named "tensorflow"  [import]
+ jax/experimental/jax2tf/tests/call_tf_test.py:36: error: Incompatible types in assignment (expression has type "None", variable has type Module)  [assignment]

streamlit (https://github.com/streamlit/streamlit)
+ lib/tests/streamlit/runtime/caching/hashing_test.py:48: note: ... from here:

hmc-cs-mdrissi · 2023-01-14T20:48:39Z

Anything left to merge this? @AlexWaygood After this is merged I'll continue to make tensorflow stub prs of a similar size (few hundred lines) until all the ones I have accumulated are landed.

AlexWaygood · 2023-01-14T20:58:34Z

Anything left to merge this?

This all looks good to me. I don't really feel very qualified to review stubs outside of my pure-Python comfort zone, but if there are any issues here, I think they can be resolved in follow-up PRs as and when they're discovered. This PR has been open long enough now :)

After this is merged I'll continue to make tensorflow stub prs of a similar size (few hundred lines) until all the ones I have accumulated are landed.

That sounds like a great strategy, and will make it much easier to review!

AlexWaygood

Thank you!

Initial tensorflow stubs

fc7a2c0