-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Make incremental not propagate staleness if the public interface of a module is unchanged (v2) #2014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make incremental not propagate staleness if the public interface of a module is unchanged (v2) #2014
Conversation
A note about the "chained modules" problem: Suppose module "a" imports module "b" which also imports module "c", and we change module "c" such that it causes an interface change. This means that we would also need to recheck "b", but would like to avoid having to recheck "a" if module "b"'s interface didn't end up changing. Unfortunately, we're currently forced to, due to this case: import b
b.c.some_function(3) # what if c.some_function was changed to accept str? If the change in "c" propagates only to "b", then the new error in "a" would not be caught. I attempted adding a basic check to try and forbid cases where people chained module references like this, but unfortunately it caused too many failures in real-life code, mainly due to import os
os.path.join("a", "b", "c") # argh However, I wanted to turn what I have right now into a pull request/get at least some improvements reviewed while I think more about the best way to handle this problem |
I've thought about the chained modules problem previously, and here's one idea I had. I'm not sure if it would work, but it might. We could perhaps store all accessed chained modules in module metadata as "implicit dependencies". Based on the example above, metadata of module We can find implicit dependencies by processing every member expression and type reference in the module body. If a module accesses an attribute via multiple modules, such as When a module We'd still include changes in the names of imported modules as changes in the module interface. If we remove an import from a module without changing anything else, that would mark the public interface as changed. Also, if we'd change |
def write_cache(id: str, path: str, tree: MypyFile, | ||
dependencies: List[str], suppressed: List[str], | ||
child_modules: List[str], dep_prios: List[int], | ||
manager: BuildManager) -> None: | ||
manager: BuildManager) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the meaning of the return value to the docstring please.
I'm not sure if you're satisfied with this or if you'd like to ponder the chained modules solution more? Also, you have a merge conflict. |
c6435fb
to
3a7622b
Compare
@@ -774,10 +785,14 @@ def random_string() -> str: | |||
return binascii.hexlify(os.urandom(8)).decode('ascii') | |||
|
|||
|
|||
def compute_md5_hash(text: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't have "md5" in its name -- just compute_hash()
sounds better, in case we want to switch hash implementations in the future.
This commit modifies the incremental tests so that changes that don't change the "public interface" of the file will not trigger a recheck in any dependees.
This commit updates the incremental tests to use the 'rechecked' keywords and recalibrates a few tests to work against a more precise interface change detector.
This commit makes the incremental tests distinguish between "stale" and "rechecked" files. Stale files are ones where the "public interface" of that method is changed; rechecked files are ones that must be re-analyzed (perhaps because one of their dependencies changed?) but did not necessarily have a changed public interface. This functionality is currently unused, but paves a path for upcoming incremental-mode changes.
Previously, the incremental checks did not correctly handle the case where... 1. Module "A" imports and uses module "B" 2. Incremental mode is successfully run 3. Module "B" is changed such that module "B" typechecks, but module "A" now throws an error 4. Incremental mode is run again What happens in this case is that the cache files for "A" and "B" are produced in the first run, and only the cache files for "B" is updated in the second run. Previously, the checks made the assumption that if a file has an error, it was due to a change within that file. This is not actually always the case.
64c7e38
to
aaf0362
Compare
This pull request modifies incremental mode so that if a module is modified in a way such that the types for any of its publicly accessible symbols are unchanged, we recheck the file without queueing up the importers to be rechecked. Or, more precisely, if module A imports module B and module B's public interface is unchanged, then we consider module B to be internally stale but not externally stale and consider module A to still be fresh.
It does so by producing a hash of the serialized version of the output before writing the cache data to a file, and saving that hash to the meta file. The hashes are then compared against each other in subsequent runs.
This pull request is a strictly better version of #2002, although it also does not solve the "chained modules" problem.