Ensure pyarrow/interchange passes type checks by mypy, pyright and ty #2

mpelko · 2025-08-20T08:33:55Z

Rationale for this change

Example of the fixes required for the inline-typed Python files in pyarrow/interchange to pass type checking.

With these changes, the following checks pass:

pushd arrow/python
mypy pyarrow/interchange
pyright pyarrow/interchange
ty check pyarrow/interchange
popd
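The same three checks can also be driven from a small Python helper, e.g. for CI. This is only a sketch: `build_commands` and `run_all` are hypothetical names, not part of pyarrow, and the helper assumes it is run from arrow/python with the checkers installed on PATH.

```python
import shutil
import subprocess
import sys

TARGET = "pyarrow/interchange"


def build_commands(target: str) -> list[list[str]]:
    """Return the three type-checker invocations used in this PR."""
    return [
        ["mypy", target],
        ["pyright", target],
        ["ty", "check", target],
    ]


def run_all(target: str = TARGET) -> int:
    """Run each checker found on PATH and return how many of them failed."""
    failures = 0
    for cmd in build_commands(target):
        if shutil.which(cmd[0]) is None:
            # Missing tools are reported and skipped rather than counted as failures.
            print(f"skipping {cmd[0]}: not installed", file=sys.stderr)
            continue
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling `sys.exit(1 if run_all() else 0)` from arrow/python would mirror the pushd/popd sequence above, skipping any checker that is not installed.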

Results of running the above check

BEFORE (main):

...
Found 44 errors in 8 files (checked 5 source files)  # by mypy
...
17 errors, 0 warnings, 0 informations  # by pyright
...
Found 14 diagnostics  # by ty

AFTER (this branch):

Success: no issues found in 5 source files  # mypy
0 errors, 0 warnings, 0 informations  # pyright
WARN ty is pre-release software and not ready for production use. Expect to encounter bugs, missing features, and fatal errors.
Checking ------------------------------------------------------------ 5/5 files
All checks passed!  # ty

What changes are included in this PR?

Combination of required type ignores, mainly for missing pyarrow type stubs for modules compiled from Cython files.
Some type ignores for untyped external dependencies.
Minor changes that either fix the uncovered issues or add more precision to type handling.

Are these changes tested?

All tests still pass locally. Additionally, type checks using mypy, pyright and ty on pyarrow/interchange/dataframe.py are now passing.

Are there any user-facing changes?

No.

mpelko · 2025-08-20T08:43:55Z

python/pyarrow/__init__.py

    # Package is not installed, parse git tag at runtime
    try:
-        import setuptools_scm
+        import setuptools_scm  # type: ignore[import-untyped]


Example of an external dependency that is neither typed nor ships type stubs.

mpelko · 2025-08-20T08:44:10Z

python/pyarrow/__init__.py

+        __version__ = ""

-import pyarrow.lib as _lib
+import pyarrow.lib as _lib  # type: ignore[import-not-found]


Example of missing stubs for an internal (Cython-compiled) module.

mpelko · 2025-08-20T08:45:45Z

python/pyarrow/compute.py

            return func.call(args, None, memory_pool)
    else:
-        def wrapper(*args, memory_pool=None, options=None, **kwargs):
+        def wrapper(*args, memory_pool=None, options=None, **kwargs):  # type: ignore


This is a nasty one: the wrapper signature differs between the two branches, and I couldn't find a quick fix.
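One possible workaround, sketched under assumptions: mypy rejects conditionally defined functions that share a name but have different signatures ("All conditional function variants must have identical signatures"), but it accepts two differently named local functions assigned to a variable declared once as `Callable[..., Any]`. The `make_wrapper` factory and its `arity` switch are hypothetical; they are not the actual branching condition in compute.py.

```python
from typing import Any, Callable


def make_wrapper(func: Callable[..., Any], arity: int) -> Callable[..., Any]:
    # Declare the common type once; each branch assigns a distinctly named function.
    wrapper: Callable[..., Any]
    if arity == 1:
        def _unary(arg: Any, *, memory_pool: Any = None, options: Any = None) -> Any:
            return func(arg, memory_pool=memory_pool, options=options)

        wrapper = _unary
    else:
        def _variadic(
            *args: Any, memory_pool: Any = None, options: Any = None, **kwargs: Any
        ) -> Any:
            return func(*args, memory_pool=memory_pool, options=options, **kwargs)

        wrapper = _variadic
    return wrapper
```

The trade-off is that callers only see `Callable[..., Any]`, so some precision is lost, and each variant keeps its own `__name__`, which may matter if the wrapper's name is used elsewhere.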

mpelko · 2025-08-20T08:48:13Z

python/pyarrow/interchange/column.py


    @property
-    def null_count(self) -> int:
+    def null_count(self) -> int | None:


Using the | operator here, which is only available from Python 3.10. Do we want to stick to Optional[X], which is supported in earlier versions, or rather add from __future__ import annotations? In any case, we should choose one and be consistent across the project.

The spec uses Optional, we currently support Python 3.9 and later, and pyarrow-stubs uses |. I'd go with |.

Sounds good. Given that we support 3.9, where | is not yet available, I suppose we add from __future__ import annotations?

Switched all the Optional to | within the interchange folder. The annotations import was already there.
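For reference, a minimal sketch of why the future import is enough on 3.9: with PEP 563 semantics, annotations are stored as strings and are not evaluated at definition time, so `int | None` is accepted. `ColumnSketch` is a hypothetical stand-in for the interchange column class.

```python
from __future__ import annotations  # PEP 563: annotations are kept as strings


class ColumnSketch:
    """Hypothetical stand-in for the interchange column class."""

    def __init__(self, null_count: int | None) -> None:
        self._null_count = null_count

    @property
    def null_count(self) -> int | None:
        # None means the number of nulls is not known.
        return self._null_count
```

The caveat is that anything evaluating the annotation at runtime on 3.9 (e.g. `typing.get_type_hints`, or `int | None` used outside an annotation) still raises a TypeError.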

mpelko · 2025-08-20T08:49:00Z

python/pyarrow/interchange/column.py

        The metadata for the column. See `DataFrame.metadata` for more details.
        """
-        pass
+        return {}


This didn't break any tests and seemed "the right thing to do", but it would be great if somebody with more insight into the project looks at this.

Empty dict seems ok as per spec this is implementing. Actual metadata would be even better, but that's out of scope :).
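To illustrate why `pass` had to go: a bare `pass` body implicitly returns None, which contradicts the declared return type, while an empty dict is a valid "no metadata" value. The class names are hypothetical, and the `Dict[str, Any]` return annotation is an assumption based on the interchange spec.

```python
from typing import Any, Dict


class BeforeSketch:
    @property
    def metadata(self) -> Dict[str, Any]:
        # Type checkers flag this body: it implicitly returns None.
        pass


class AfterSketch:
    @property
    def metadata(self) -> Dict[str, Any]:
        # An empty dict satisfies the declared type and means "no metadata".
        return {}
```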

mpelko · 2025-08-20T08:50:52Z

python/pyarrow/interchange/from_dataframe.py

+    )
+    offset_buff, offset_dtype = (
+        buffers["offsets"] if buffers["offsets"] else (None, None)
+    )


I'm fairly sure this fixes the actual implementation, as I'm not convinced a TypedDict ever raises the TypeError the original logic was counting on. I might be wrong, and there may have been some other reason a TypeError was expected. Both versions pass the unit tests.

Would this approach work?

validity_buff, validity_dtype = buffers.get("validity", (None, None))

Sadly no: per the signature of ColumnBuffers, the validity field is Optional, so the key is always present and the getter returns its value (None) instead of the (None, None) default. See:

arrow/python/pyarrow/interchange/column.py, line 125 in a606011:

validity: Optional[Tuple[_PyArrowBuffer, Dtype]]

Same for offsets.

We could rework ColumnBuffers to make this nicer (e.g. Tuple[_PyArrowBuffer, Dtype] is a good candidate for a dedicated type, given that it's used in several places), but I'm trying to keep the changes to a minimum here.

Thanks for the explanation! Agreed, let's keep changes to a minimum.
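A minimal sketch of the point above, using simplified hypothetical types: because `validity` is a required key whose value may be None, `.get()` returns None rather than the default, so the explicit conditional is needed. `BufferWithDtype` stands in for `Tuple[_PyArrowBuffer, Dtype]` and illustrates the dedicated-type idea; it is not existing pyarrow code.

```python
from typing import Optional, Tuple, TypedDict

# Hypothetical alias standing in for Tuple[_PyArrowBuffer, Dtype].
BufferWithDtype = Tuple[str, str]


class ColumnBuffersSketch(TypedDict):
    data: BufferWithDtype
    validity: Optional[BufferWithDtype]
    offsets: Optional[BufferWithDtype]


def split_validity(
    buffers: ColumnBuffersSketch,
) -> Tuple[Optional[str], Optional[str]]:
    # The key always exists, so .get("validity", (None, None)) would return None here.
    validity = buffers["validity"]
    return validity if validity else (None, None)
```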

rok

Thanks for doing this @mpelko!
Some comments, but nothing major.
This will be useful to discuss the inline vs stubs annotation for .py files.

rok · 2025-08-21T18:42:20Z

python/pyarrow/interchange/from_dataframe.py



Would this approach work?

validity_buff, validity_dtype = buffers.get("validity", (None, None))

rok · 2025-08-21T18:46:00Z

python/pyarrow/interchange/column.py



Empty dict seems ok as per spec this is implementing. Actual metadata would be even better, but that's out of scope :).

rok · 2025-08-21T18:48:29Z

python/pyarrow/interchange/column.py



The spec uses Optional, we currently support Python 3.9 and later, and pyarrow-stubs uses |. I'd go with |.

rok · 2025-08-21T18:52:16Z

python/pyarrow/interchange/column.py

+            raise ValueError(
+                "Column offsets buffer must have 2 or 3 buffers, "
+                f"but has {n} buffers: {array.buffers()}"
+            )


Are we sure this won't break anything? Same question for the change above.

We're sure it didn't break the tests. :-) I lack the insight into the project to know whether it would break anything else. Are there cases where a pa.Array can have more than 3 buffers? If so, the existing logic (on main) would return None from _get_data_buffer or _get_offsets_buffer, whereas the new logic raises a ValueError.

Alternatively we could simply change the signatures of the corresponding functions to allow for returning optional values?

For offsets I now allow None as a return value of this function, as that is actually fine where it is used. As discussed above, _get_data_buffer really should return something, because its result is packed into a structure that expects data, so I am keeping an explicit ValueError for the case where the pa.Array has no data.

I think we never have more than three (I believe variadic buffers are handled differently).

Yeah, I was thinking maybe someone is depending on this returning None.

Let's leave this as is.
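A sketch of the behaviour settled on in this thread, under stated assumptions: a plain sequence stands in for `pa.Array.buffers()`, which for fixed-width arrays is `[validity, data]` and for variable-length string or binary arrays is `[validity, offsets, data]`. The helper names are hypothetical, and treating any other buffer count as an error is an assumption rather than the exact code in column.py.

```python
from __future__ import annotations

from typing import Sequence


def _num_buffers(buffers: Sequence[object | None]) -> int:
    # Only the two layouts discussed in the thread are accepted.
    n = len(buffers)
    if n not in (2, 3):
        raise ValueError(
            f"Column must have 2 or 3 buffers, but has {n} buffers: {list(buffers)}"
        )
    return n


def get_offsets_buffer(buffers: Sequence[object | None]) -> object | None:
    # [validity, data] carries no offsets, so None is a legitimate result.
    return buffers[1] if _num_buffers(buffers) == 3 else None


def get_data_buffer(buffers: Sequence[object | None]) -> object:
    # The data buffer is always last; a missing one is an error, not None.
    data = buffers[_num_buffers(buffers) - 1]
    if data is None:
        raise ValueError("Column has no data buffer")
    return data
```

The asymmetry mirrors the discussion: a missing offsets buffer is an expected case for fixed-width columns, while a missing data buffer would produce an unusable column, so it fails loudly.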

rok · 2025-08-22T15:58:55Z

This looks good now. I think we need to wait for the discussion to catch up :)

Ensure pyarrow/interchange passes type checks by mypy, pyright and ty

a606011

mpelko marked this pull request as ready for review August 20, 2025 08:38

mpelko commented Aug 20, 2025


rok suggested changes Aug 21, 2025


Address review comments.

32f1d08


3 participants
