Proposed refactoring of mypy/build.py (and cache metadata) #4365
Comments
There is also #4277 in flight for supporting PEP 420 namespace packages, which extensively reworks the import parts of build.py.
Oh, thanks for the pointer! That seems to improve the structure of the module-finding code and cache, which is most welcome. I will limit my ideas to the dependency management. I'll also try to wait for #4277 to land.
I'm also going to try to wait for #4278 to land.
I think it would also be nice to keep in mind (but perhaps not change in the same diff) that abstracting the build from "it runs on files" to "it runs on one or more file-like objects" has many benefits. Allowing StringIO as a way to tell mypy "type check this text, and only this text" is a nice way to a) speed up tests and b) make editor integrations easier.
I believe this was basically done in #5686.
Over the years, build.py has become a dumping ground for everything to do with module dependencies and caching, making it the fourth-largest file in mypy. Prompted by #4353, I think it's time to refactor build.py.
One particular idea I'd like to focus on is the distinction between imports and dependencies. Imports are determined (almost) purely syntactically by pass one of the semantic analyzer, and have priorities. Dependencies include indirect dependencies, and are associated with the interface hash of the depended-upon module (once available). Dependencies are seeded from the imports, minus missing modules, in the load_graph() phase. They are extended (after type checking of the SCC) with indirect dependencies (computed as always by TypeIndirectionVisitor). Both tables (imports and full dependencies) are then written to the cache metadata, together with a bit representing the presence of errors in this particular module.
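To make the two tables concrete, here is a rough sketch of what the per-module metadata might look like. All names here (`ModuleMeta`, `seed_dependencies`, `extend_dependencies`) are hypothetical and for illustration only; they are not existing mypy APIs, and the real layout may well end up different.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ModuleMeta:
    """Hypothetical per-module cache metadata (not mypy's actual format)."""
    # Imports found syntactically, mapped to their import priority.
    imports: Dict[str, int] = field(default_factory=dict)
    # Full dependencies, mapped to the interface hash of the depended-upon
    # module, or None while that hash is not yet available.
    dependencies: Dict[str, Optional[str]] = field(default_factory=dict)
    # The single bit recording whether this module had errors.
    has_errors: bool = False


def seed_dependencies(meta: ModuleMeta, missing: Set[str]) -> None:
    """Seed dependencies from the imports, minus missing modules."""
    for name in meta.imports:
        if name not in missing:
            meta.dependencies.setdefault(name, None)


def extend_dependencies(meta: ModuleMeta, indirect: Set[str],
                        interface_hashes: Dict[str, str]) -> None:
    """After type checking, add indirect dependencies and record hashes."""
    for name in set(meta.dependencies) | indirect:
        meta.dependencies[name] = interface_hashes.get(name)
```

Keeping `imports` and `dependencies` as separate tables means the priorities stay attached to what was written in the source, while the interface hashes stay attached to what the module actually depends on.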
A module for which a cache file exists is then considered fresh (no need to process) if all of the following hold:
For SCCs this needs to be tweaked somewhat -- dependencies within the SCC don't count, and the condition must hold for every module in the SCC. (There are other tweaks needed to account for changed options and changes in the "library path".)
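The SCC tweak could be sketched roughly as follows. The per-module condition itself is deliberately left as a caller-supplied predicate (`module_is_fresh`, a hypothetical name), so this only illustrates the two SCC rules: dependencies inside the SCC are ignored, and every module in the SCC must pass. It does not account for changed options or the library path.

```python
from typing import Callable, Dict, Set


def scc_is_fresh(
    scc: Set[str],
    dependencies: Dict[str, Set[str]],
    module_is_fresh: Callable[[str, Set[str]], bool],
) -> bool:
    """Return True only if every module in the SCC is fresh.

    `module_is_fresh(module, external_deps)` is a hypothetical predicate
    standing in for the per-module freshness condition.
    """
    for module in scc:
        # Dependencies within the SCC don't count.
        external = dependencies.get(module, set()) - scc
        if not module_is_fresh(module, external):
            return False
    return True
```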
One benefit of this algorithm is that we no longer have to depend on linear mtimes for cache data files to compute freshness. Another is that we may be able to skip processing modules even if there are errors upstream, as long as those errors don't affect the interface hash.
Other things to refactor include the "stat cache" that's used by find_module(), logging, and the fact that the constructor of the State class does way too much work.
This is a big refactoring and I expect it will take a few weeks at least. But I think it's time to start this operation. [UPDATE: I won't start until January 2018 at the earliest.]