Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap for Static Type Hints in Low-level Algorithms #11170

Closed
chrisconlan opened this issue May 30, 2018 · 11 comments
Closed

Roadmap for Static Type Hints in Low-level Algorithms #11170

chrisconlan opened this issue May 30, 2018 · 11 comments

Comments

@chrisconlan
Copy link

chrisconlan commented May 30, 2018

Will scikit-learn accept pull requests for low-level code with static type hints? Does it ever plan to?

I don't know where the core team stands on static type hints in Python, but I think they make it 10x easier to grok code and convert users to contributors.

2018 seems like a great time to make this change, in light of Numpy dropping support for new features in Python 2.7. #11115

Supported version would need to change from Python (>= 2.7 or >= 3.4) to Python (>= 3.5) to support static type hints.

@rth
Copy link
Member

rth commented May 30, 2018

That's an interesting question.

From what I understand, static typing won't be useful much (e.g. for mypy) as long as scipy and numpy don't have type annotations. Looking at numpy/numpy#7370 some annotations were added as a stand-alone package in https://github.com/numpy/numpy-stubs . If that works well, maybe a similar approach could be used for scikit-learn? Which mean that interested parties could start working on it (in a separate package) without waiting for #11115, I think...

@chrisconlan
Copy link
Author

chrisconlan commented May 30, 2018

I think something like this could be done in Python 3.5+ without any extra dependencies:

import numpy as np
from typing import List, NewType

# Accepts either a one-dimensional array or a list of floats ...
Array1D = NewType('Array1D', np.ndarray, List[np.float64])

# Accepts either a two-dimensional array or a list of floats ...
Array2D = NewType('Array2D', np.ndarray, List[np.float64, np.float64])

# Define the function with appropriate type hints ...
def gradient_descent(some_input: Array2D, param_one: int, param_two:float) -> Array1D:
    # some logic ...

Just to clarify, this wouldn't help end users by catching compile-time typing errors. This would only be for readability.

I'd jump in on this for sure.

@chrisconlan
Copy link
Author

P.S. I'm happy to see some big organizations supporting mypy. Type-annotated Python is definitely gonna be big: https://github.com/dropbox/mypy-PyCharm-plugin

@rth
Copy link
Member

rth commented May 30, 2018

I guess, a start could be to some write stab files (.pyi) for the scikit-learn API (in a separate package) similarly to how it's done in https://github.com/numpy/numpy-stubs/blob/master/numpy-stubs/__init__.pyi for numpy. Some of it can be probably auto-generated.

There was some previous discussion about this in https://mail.python.org/pipermail/scikit-learn/2016-July/000306.html also see related discussion in #6599 (comment)

@rth
Copy link
Member

rth commented May 18, 2019

I have been experimenting with mypy a bit, and in my experience results are mixed. It works quite well for pure python types and code relying on the stdlib where mypy stubs are built-in, though there are a few quirks and the general experience is still some way off from that of using a modern strongly typed language with type inference. For scientific computing, I think it is of limited use as numpy and scipy still don't have proper type annotations.

Still, I wonder if now that scikit-learn is Python 3.5+ only, starting to progressively add types at least on init parameters of estimators could be worthwhile. Given that keeping standalone stub files it in sync with the main repo would probably be somewhat difficult. cc @thomasjpfan

@jnothman
Copy link
Member

jnothman commented May 21, 2019 via email

@thomasjpfan
Copy link
Member

Keeping stubs in sync would be hard to maintain. Placing them in __init__ using a new indentation style should work. We would need to come up with common union types to be use throughout the codebase, such as RandomStateType.

@adrinjalali
Copy link
Member

Do we want to allow having hints for some input params and not the others?

@amueller
Copy link
Member

I hadn't seen this issue before and I'm confused by the "low level algorithms" part. Our low level algorithms are all in Cython and already fully statically typed, right?

@adrinjalali
Copy link
Member

Yeah I kinda assumed we don't really mean "low level" here. I guess the others did too.

@NicolasHug
Copy link
Member

closing as duplicate of #16705 which was updated more recently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants