Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Mypy type checking #1121

Closed
1 task done
qgallouedec opened this issue Oct 14, 2022 · 13 comments · Fixed by #1143
Closed
1 task done

[Feature Request] Mypy type checking #1121

qgallouedec opened this issue Oct 14, 2022 · 13 comments · Fixed by #1143
Labels
enhancement New feature or request

Comments

@qgallouedec
Copy link
Collaborator

qgallouedec commented Oct 14, 2022

🚀 Feature

Use mypy as the type checker.

Follow-up of #1119 (comment)

Motivation

Currently, type checking is done with pytype. It seems that many errors are not correctly detected: #894 #1027 #1039 #1040 #1117 #1119

Pitch

Replace pytype by mypy.
Or just add mypy.

Alternatives

No response

Additional context

This would require a lot of changes, in particular removing the use of None for undefined variables:

current usage

from typing import Optional

class A:
    def __init__(self):
        self.a = None # type: Optional[int]

what should be done, according to PEP526 (if I understand properly)

class A:
    def __init__(self):
        self.a: int

Checklist

  • I have checked that there is no similar issue in the repo
@qgallouedec qgallouedec added the enhancement New feature or request label Oct 14, 2022
@Rocamonde
Copy link
Contributor

I think having both is better. Once we fix all the issues of adding mypy, there is little to no cost in keeping both. Very occasionally they conflict, but normally the solution is fairly simple.

@Rocamonde
Copy link
Contributor

Rocamonde commented Oct 14, 2022

I am happy to contribute to adding mypy as I have done this for imitation already and I have some experience in how to do and not do things. Some of the takeaways I would say are:

  1. Install mypy and do a preliminary check on how many errors each file has
  2. Add mypy to CI and ignore all the files with errors immediately as PR 1, to prevent more errors being introduced in other files.
  3. Divide fixing the types of all the files into chunks roughly following the number of errors and how related they are. Submit a separate PR for each chunk. Each PR should get started and merged quickly (~under 1 week) to avoid hundreds of merge conflicts, since this will likely change thousands of SLOC in total. So adjust the size based on that too.
  4. Every time a chunk gets fixed and merged, remove those files from the CI ignore.

Additionally, one could also check the diff of ignored files so when e.g. adding new classes these don't introduce more errors. But I would not expect this to be very useful given that the project is quite mature & unlikely to have significant API changes in the near future (afaik), and it seems hard to set up.

@qgallouedec
Copy link
Collaborator Author

This roadmap sounds reasonable.

unlikely to have significant API changes in the near future (afaik)

#780 introduces changes to a significant number of files, but I think these changes should not interfere too much.

@araffin
Copy link
Member

araffin commented Oct 17, 2022

what should be done, according to PEP526 (if I understand properly)

oh nice, I didn't know that syntax. Yes, we do a lot of a = None in the code...

I think having both is better.

what is the reasoning behind?
pytype is smart but slow, mypy is faster? have you tried pyright btw?

This roadmap sounds reasonable.

sounds good =)

@Rocamonde
Copy link
Contributor

what is the reasoning behind?

Pytype and mypy work in fundamentally different ways. Pytype tries to infer types and is generally lenient. Mypy relies on annotations exclusively and is generally strict. Also see https://docs.google.com/presentation/d/1GYqLeLkknjYaYX2JrMzxX8LGw_rlO-6kTk-VNPVG9gY/htmlpresent

@qgallouedec
Copy link
Collaborator Author

Do we want this strict typing? I would swing 75% for a yes.

@qgallouedec
Copy link
Collaborator Author

2. Add mypy to CI and ignore all the files with errors immediately as PR 1, to prevent more errors being introduced in other files.

What is the most effective process in your opinion?

  1. Process by error type?
  2. Process by file?
  3. Process by error type in file?

This determines how you specify the ignores in setup.cfg.

For 1.

[mypy]
disable_error_code = error-code1, error-code2, error-code3

For 2.

[mypy]
exclude = (?x)(
    file1.py
    | file2.py
  )

For 3.

[mypy-file1]
disable_error_code = error-code1, error-code2

[mypy-file2]
disable_error_code = error-code2, error-code3

@araffin
Copy link
Member

araffin commented Oct 27, 2022

Process by error type?

I would prefer that one but I don't know how easy it is compared to option 2.

@Rocamonde
Copy link
Contributor

Rocamonde commented Oct 27, 2022

I don't think it makes much sense to ignore a certain type of error across the entire codebase.

In my experience doing this for imitation, you don't fix a certain error type across different files, you fix a whole file. This is because the same error across different files is usually caused by totally different reasons and requires a lot of context switching. Whereas fixing single files has much less friction and is faster, since the different sections of the code are much more closely related, and often fixing one thing depends on fixing another.

It also ensures that whole pieces of the API are fully type safe, rather than potentially having bugs across the entire API until everything is fixed.

Also, ignoring an error type for all files prevents the type checker from detecting new errors that are added in clean files. And because the codebase is quite large, I expect that the vast majority of possible error types are somewhere in the codebase. So ignoring them all would make adding mypy basically useless.

@qgallouedec
Copy link
Collaborator Author

So we would go with option 3? The only problem is that the setup.cfg file might be a bit ugly... But it allows you to take the best of the other two options.

@Rocamonde
Copy link
Contributor

I don't mind either way. We can go with 3 for now, if it gets too messy we can switch to 2.

@araffin
Copy link
Member

araffin commented Oct 27, 2022

So we would go with option 3?

i thought that @Rocamonde described option 2, no?

This is because the same error across different files is usually caused by totally different reasons

I was expecting the opposite actually (that's why I was suggesting option 1).

@qgallouedec
Copy link
Collaborator Author

qgallouedec commented Oct 27, 2022

I don't mind either way. We can go with 3 for now, if it gets too messy we can switch to 2.

Let's do the opposite: we go for option 2 and if fixing is too hard, we switch to 3. Ok for you?

@qgallouedec qgallouedec linked a pull request Oct 27, 2022 that will close this issue
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants