@WillAyd @jbrockmendel @simonjayhawkins How would you describe the current state of things? I've seen that many core components already have pretty good annotation coverage. Is it realistic to expect PEP 561 compatibility in the foreseeable future (let's say the next release or two)?

To explain the context of this question: for the last three years I've been working on stub files for Apache Spark. Over this time the contact surface between Pandas and PySpark has grown significantly, mostly due to the introduction and active development of so-called Pandas UDFs. Since Pandas doesn't advertise its annotations, this effectively creates a growing gap that type checkers don't cover in practice.
Additionally, the latest upstream developments utilize type hints in Pandas-dependent components, leading to conflicts between static type checking and upstream runtime requirements.

Furthermore, the lack of actionable annotations leads to rather ugly escalations in the case of polymorphic functions, which accept Pandas objects as well as other types.

Now... For some time, to partially address the problem, I've been using Protocols and dummy compatibility imports. The idea is basically to:

- Extract existing annotations from Pandas (example).
- Add `# type: ignore` to the missing Pandas imports.
- Provide intermediate annotations.
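To make the three steps a bit more concrete, here is a minimal sketch of how they can fit together. It is not the actual code from the stubs: `SeriesLike` and `describe_input` are hypothetical names made up for illustration, and the protocol members are just a small, assumed subset of what a real Pandas `Series` exposes.

```python
from typing import Any, Protocol, Union, runtime_checkable

# Dummy compatibility import: the type checker is told to ignore the
# missing annotations, and the sketch still runs if Pandas is absent.
try:
    import pandas as pd  # type: ignore
except ImportError:
    pd = None


@runtime_checkable
class SeriesLike(Protocol):
    """Hypothetical intermediate annotation standing in for pandas.Series.

    Only a few members are declared here; a real protocol would mirror
    the annotations extracted from Pandas itself.
    """

    @property
    def dtype(self) -> Any: ...

    def __len__(self) -> int: ...

    def to_numpy(self) -> Any: ...


def describe_input(value: Union[SeriesLike, str]) -> str:
    """Polymorphic function accepting a Pandas-like object or a column name."""
    if isinstance(value, str):
        return "column name"
    return "series-like object"
```

Because the protocol is structural, anything with a matching shape satisfies it, so the checker sees a concrete type instead of `Any` without Pandas having to ship annotations.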

This approach is not without its own problems, but it does the trick. If Pandas is going to adopt PEP 561, these workarounds will become obsolete, but if such a move is not going to happen any time soon, I will consider formalizing this approach and agitating for the required adjustments in core PySpark. However, given the amount of red tape, I'd really like to avoid it :)

Typing Stubs and PEP 561 compatibility #28142
