Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mypy: allow explicit specification of package roots #9632

Closed
wants to merge 10 commits into from

Conversation

hauntsaninja
Copy link
Collaborator

@hauntsaninja hauntsaninja commented Oct 23, 2020

This is the next in a series of import handling improvements. (The overall vision is #8584 + there are some important bugs that should be fixed as part of this)

This, along with #9614 and #9616, fix #5759.
It turns out we already have an explicit package root flag (with tests :-) ) that we can co-opt for our purposes. It looks like the intention behind it was to satisfy some Dropbox build requirements, but as far as I can tell those requirements can be met by treating things as namespace packages. And we get to drop some pretty hacky code!
While this PR in itself is probably a breaking change for Dropbox, the good news is that the next item on #8584 is to make --namespace-packages the default. However, if there's a reason to have the old Dropbox behaviour and have namespace packages turned off that would be concerning, but I'm betting that's not the case.

While much of the groundwork for this was laid in #9614, there were a couple issues that needed revisiting now that the explicit package root code paths get hit. The first is that the crawling up logic needs to take package roots into account. The second is that because of that, we make the package roots an instance attribute so that the caching logic is cleaner. While we no longer call crawl very much (due to #9614), the one case this can happen is if users pass in many files to mypy. I switch to functools.lru_cache because there are more return points now. The third is that we need to normalise paths before checking whether they're a package root (An aside here is I wanted to change things to absolute paths so that behaviour is consistent regardless of cwd — we've had one or two issues because of this — but unit tests really did not like that).

I've also started to make a list of places we could put suggestions to use this (but let me know if you have something in mind) + I'll make a docs PR once I'm at a stopping point.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Oct 24, 2020

Also maybe there should be some interaction with MYPYPATH? E.g. looking at testConfigMypyPath. One idea is that eg --explicit-package-root could be a boolean flag and if set the package roots are what's on MYPYPATH. This would be more consistent with -p too.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 27, 2020

However, if there's a reason to have the old Dropbox behaviour and have namespace packages turned off that would be concerning, but I'm betting that's not the case.

The old behavior is used in a fairly hacky Bazel integration, but I'm not very familiar with how it works internally. I suspect we could enable namespace packages when using the Bazel integration and disable it otherwise, but it hard to say without trying it out.

I wonder how hard would it be to retain the old behavior when using --no-namespace-packages, at least for a while?

Also maybe there should be some interaction with MYPYPATH?

Maybe we can always make every directory in MYPYPATH a package root when using namespace packages? I.e., it would be an alternative to using --package-root.

@hauntsaninja
Copy link
Collaborator Author

If using namespace packages causes Dropbox's Bazel to crash and burn, I'll try to reinstate all the old hacky code.

I'll make the change to treat all MYPYPATH as package roots if the flag --explicit-package-root is passed (name suggestions welcome). We need the extra flag here a) for backward compatibility, b) because the backward compatible crawling behaviour is more ergonomic, c) as long as we support both the crawling behaviour and the explicit package root behaviour, it's unintuitive and undesirable that setting MYPYPATH should trigger a switch between the two.
The flag --package-root can then continue to be undocumented (it can implicitly entail --explicit-package-root and --namespace-packages if we're unable to get #9636 working), and users will have consistency between passing files to mypy and passing packages to mypy.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 28, 2020

If using namespace packages causes Dropbox's Bazel to crash and burn, I'll try to reinstate all the old hacky code.

Sounds good! Providing backward compatibility is usually the right call if we make major breaking changes, since there are so many users and use cases that we can't possibly predict. But if this makes maintenance harder, it may be better to avoid this. Extra code that has tests and rarely needs to be updated isn't necessarily much of a burden, though.

I'll make the change to treat all MYPYPATH as package roots if the flag --explicit-package-root is passed (name suggestions welcome).

Hmm I'd prefer to avoid another flag. What if MYPYPATH defines a package root only if namespace packages are enabled, and otherwise we fall back to the current behavior? I.e. when not using namespace packages, we'd keep the old behavior as closely as reasonable. With namespace packages enabled we can change some behavior, since the old behavior isn't a good match.

Another thought: with namespace packages enabled MYPYPATH could be similar to how PYTHONPATH works. PYTHONPATH defines additional package roots, right?

I don't remember all the details of namespace packages, so I may be mistaken above. If the idea doesn't work, it would be good to have an example use case which is affected. Another thing that could help is a matrix showing how the various configuration settings affect various behaviors. I've made a partial attempt below (again, not sure if this is actually correct).

If namespace package are enabled:

  • Presence of __init__.py[i] has no impact on import resolution.
  • Module search path directories are the same as namespace package roots (i.e. there would be only one list internally). --package-root and MYPYPATH behave the same (but with different priorities?).
  • We can infer the full name of a file by crawling up to one of the package roots. Each module must be below one of the package roots.
  • If a target module passed on the command line isn't under any of the package roots, we'll give an error and ask the user to specify another package root.

If namespace packages are disabled:

  • Each package much have __init__.py[i], or otherwise it can't be found (unless --package-root is used?).
  • If --package-root is used, __init__.py can be omitted from packages below this root only.
  • We can infer the full name of a file by crawling up looking at __init__.py (unless --package-root is used).
  • Arbitrary files can be passed on the command line, and we can always infer a full name for any path pointing to a .py[i] file.

@hauntsaninja
Copy link
Collaborator Author

I think that would work... One common use case for where the old __init__.py is easier is src layouts:

.
├── src
│   └── proj
│       ├── __init__.py
│       └── ...
└── tests
    └── ...

Here the breaking change is the command would go from mypy . / mypy src tests to MYPYPATH=src mypy . / MYPYPATH=src mypy src tests

While the concepts are related, I'm also still a little hesitant to solely rely on --namespace-packages to switch the behaviour here (especially since we're trying to make it the default). Users wouldn't be able to get back the old CLI behaviour and have their namespace package dependencies followed. One guiding thought for me is that there shouldn't even be any half-decent reasons to want to use --no-namespace-packages.

Here's my attempt and writing down some options (apologies for repetition and verbosity!). In all options, the following apply:
a) Namespace package roots are determined by module search paths (in particular MYPYPATH and cwd)
b) --package-root remains an undocumented, hidden option and we just twist its behaviour to preserve Dropbox's existing builds. We shall speak of it no further!
c) These all assume that --namespace-packages is made the default in the same release as this change (if this assumption doesn't hold, the interactions become messier)
d) The behaviour when passed directories is determined by the behaviour when passed files: we recursively find all files and use the package root we would have if we'd been passed that file (this was implemented in #9614; it occurs to me there's a case where this might not be ideal, but I'll save it for another time unless asked).

Option A (your suggestion)

By default, when given files we:

  • Crawl files up to a package root

If --no-namespace-packages is passed, we:

  • Crawl files up to a directory without __init__.py
  • Incur the other effects of --no-namespace-packages, e.g., losing the ability to follow imports that are namespace packages

Effect on src layout example:

  • Breaking change, mypy . needs to be changed to MYPYPATH=src mypy .

Pros:

  • Namespace packages don't need special flags when passed as files/directories
  • No new flags introduced
  • By default, no confusion resulting from missing __init__.py (e.g. source module found duplicate times or new errors resulting from find_sources: find build sources recursively #9614)

Cons:

  • Fairly annoying breaking change. I feel that passing directories / files is the quick way of calling mypy (as opposed to -p), and this makes it slower (I've seen a number of issues where people just always want mypy . to work).
  • There's some selection effect, but I think this would be inconvenient for many more users than the number of users who want to check namespace packages.
  • Don't have an easy way to get the old behaviour back without other side effects

Option B

By default, when given files we:

  • Crawl files up to a directory without __init__.py

If --explicit-package-root is passed, we:

  • Crawl files up to a package root

Effect on src layout example:

  • No change, mypy . continues to work

Pros:

  • Very similar to how things work today

Cons:

  • Namespace packages need a special flag when passed as files/directories (but we save having to specify MYPYPATH in many other cases)

Option C

By default, when given files we:

  • Crawl files up to a package root

If --init-package-root is passed, we:

  • Crawl files up to a directory without __init__.py

Effect on src layout example:

  • Breaking change, mypy . needs to be changed to MYPYPATH=src mypy . or mypy --init-package-root .

Pros & Cons:

  • Similar to Option A, but provides an easy and explicit way to get the __init__.py behaviour back, at the cost of a new flag

@JukkaL
Copy link
Collaborator

JukkaL commented Nov 4, 2020

(Sorry for the slow reply, I wanted to think about this carefully since this is a potentially major breaking change.)

You've convinced me that my proposal needlessly breaks too many existing use cases.

Let me reiterate how I understand your alternative proposal, to make sure I haven't missed anything:

  • By default, namespace packages are enabled (they can be disabled with --no-namespace-packages)
  • The cwd, entries in MYPYPATH, and the default module search path items are namespace package roots
    • Additional roots can also be added with --package-root DIR
  • We use namespace package roots to resolve imports
  • For files passed on the command line, we rely on __init__.py files to determine the module prefix
    • This includes mypy dir and mypy foo/bar.py
  • For files passed using -m or -p we use namespace package roots to find the module/package
  • If --explicit-package-root is given, __init__.py files have no effect on module prefix inference, and all files must be under one of the namespace package roots
    • This is required when passing files/directories on the command line that live in namespace packages
  • --explicit-package-root only makes sense if namespace packages are enabled
  • When --no-namespace-packages is used, we use the exact same semantics as the current default

Open questions:

  1. The roots inferred from crawling __init__.py files are not treated as namespace package roots, but they would have to be module search path items, for compatibility with existing behavior?

  2. Do PEP 561 packages need some additional rules?

  3. How often might users need to use --explicit-package-root?

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Nov 11, 2020

Thanks, I agree we need to be careful, and sorry for any slowness on my part!

Yes, that basically sums it up. To address the open questions brought up:

  1. Yes, we'd still need to treat crawled roots as module search path items:
    if source.base_dir:
  2. I might be underthinking it, but I can't see any different needs for PEP 561 packages.
  3. Users would need --explicit-package-root anytime they weren't using -p and wanted to check packages missing __init__.pys.

I think there is a case where things get pretty annoying and so I have an additional proposal. The case is if you have a top level __init__.py, but you're missing an __init__.py somewhere in your tree. E.g. mypy pkg on:

.
└── pkg
    ├── __init__.py
    └── subpkg
        └── a.py

Before #9614, we had really bad behaviour, which is we just completely ignore a.py. With #9614, we will check a.py, but we'll check it as module a, not pkg.subpkg.a. We can't treat it as pkg.subpkg.a without --namespace-packages, but if we do have --namespace-packages, I think we should.
Doing so would mean users wouldn't have to use --explicit-package-root (and specify package roots) to ensure correct module prefixes in the face of missing not top level __init__.py / you'd probably only have to use --explicit-package-root if you don't have a top level __init__.py.
The downsides of this are a) it's a little more complicated, b) maybe you do actually want to nest packages for whatever reason, c) it means files found recursively might use a different module prefix than if you passed the file directly (this is assumption d in my previous message, but it can already happen if your cwd is different)

Concretely, this would mean adding a term to

# If the current directory is an explicit package root, explore it as such.
that looks at if --namespace-packages are on and if there's a current mod_prefix.

I'll try and implement this so we'll have a concrete thing to play around and discover UX issues with.

We might also want to implement some kind of ignore feature, e.g. so mypy can check mypy with --namespace-packages without also checking mypy/typeshed (see #9683)

hauntsaninja added 4 commits November 11, 2020 14:25
Merge for the logging improvements in 9672
This doesn't do much for us here, but makes things easier for my future
change to crawl using absolute paths
@JukkaL
Copy link
Collaborator

JukkaL commented Nov 20, 2020

We can't treat it as pkg.subpkg.a without --namespace-packages, but if we do have --namespace-packages, I think we should.

Agreed. This is likely the common case, at least for projects that are using namespace packages.

c) it means files found recursively might use a different module prefix than if you passed the file directly (this is assumption d in my previous message, but it can already happen if your cwd is different)

Could we make this consistent, for example by always assuming that we are inside a namespace package if one of the parent directories contains a __init__.py file? So running mypy pkg/subpkg/a.py would treat the module as pkg.subpkg.a if namespace packages are enabled.

This would be a backward incompatible change, so we'd have to document this carefully in release notes.

e.g. so mypy can check mypy with --namespace-packages without also checking mypy/typeshed

Maybe we should just move <root>/mypy/typeshed to <root>/typeshed? I wonder if this would make packaging tricky.

Brainstorming: We could write a tool to help debug issues with wrongly configured namespace package roots. Here are some things we could report:

  • If there's a .py file in a directory without __init__.py, but one of the parent directories contains __init__.py, report this as a potential namespace root. (Maybe also look for potential imports targeting this file.)
  • Look for a top-level directory foo without __init__.py, so that one of the subdirectories contains a .py file. In this case we can look for all imports in all source files, and if some of them targets foo.*, report a warning that passing files on the command line will require --explicit-package-root to be used, or the addition of foo/__init__.py. (Maybe also extend this to subdirectories, e.g. src/foo when src/foo/__init__.py doesn't exist.)

@JukkaL
Copy link
Collaborator

JukkaL commented Nov 20, 2020

Also, sorry for the delay in review. This is complicated by needing to make sure the Bazel integration we use at Dropbox isn't broken by this, and I'm not very familiar how the integration works. Also, we can't easily try GitHub master with internal repos right now since there are some issues we need to investigate and fix/work around first. This will take at least until the end of month, because of vacations next week.

I think that I might be able to merge this sooner if you can preserve the old hacky code used in our Bazel integration (i.e. without enabling namespace packages). How much trouble would this be? We could try removing the legacy code later, separately from this PR. Otherwise, we may need wait until master is clean with Dropbox internal repos.

@hauntsaninja
Copy link
Collaborator Author

Sounds good, I'll break up this PR. Decoupling these changes from a) Bazel integration, b) making namespace packages the default, seems like an increasingly good idea.

I'll make the change so that crawling up files will crawl all the way up in case of __init__.py in an arbitrarily high parent directory.

Agreed that good diagnostics and documentation will be important here.

I think moving /mypy/typeshed to /typeshed will indeed potentially cause packaging issues. I'm going to hold off on looking into that, especially with modular typeshed looming on the horizon.

@hauntsaninja
Copy link
Collaborator Author

This is succeeded by #9742

hauntsaninja added a commit that referenced this pull request Dec 12, 2020
This is the successor to #9632. Things should basically be as discussed in that PR. Since #9616 is merged, this should now resolve #5759.

We leave the Bazel integration with `--package-root` almost entirely untouched, save for a) one change that's a bugfix / doesn't affect the core of what `--package-root` is doing, b) another drive by bugfix that's not related to this PR.
Change a) fixes the package root `__init__.py` hackery when passed absolute paths. Change b) fixes the validation logic for package roots above the current directory; it was broken if you passed `..` as a package root

Since we're leaving `--package-root` alone for now, I named the new flag `--explicit-package-base` to try and avoid confusion. Doing so also matches the language used by BuildSource a little better.

The new logic is summarised in the docstring of `SourceFinder.crawl_up`.

Some commentary:
- I change `find_sources_in_dir ` to call `crawl_up` directly to construct the BuildSource. This helps codify the fact that files discovered will use the same module names as if you passed them directly.
- Doing so keeps things DRY with the more complicated logic and means, for instance, that we now do more sensible things in some cases when we recursively explore directories that have invalid package names.
- Speaking of invalid package names, if we encounter a directory name with an invalid package name, we stop crawling. This is necessary because with namespace packages there's no guarantee that what we're crawling was meant to be a Python package. I add back in a check in the presence of `__init__.py` to preserve current unit tests where we raise InvalidSourceList.
- The changes to modulefinder are purely cosmetic and can be ignored (there's some similar logic between the two files and this just makes sure they mirror each other closely)
- One notable change is we now always use absolute paths to crawl. This makes the behaviour more predictable and addresses a common complaint: fixes #9677, fixes #8726 and others.
- I figured this could use more extensive testing than a couple slow cmdline tests. Hopefully this test setup also helps clarify the behaviour :-)

Co-authored-by: hauntsaninja <>
@hauntsaninja hauntsaninja deleted the find2 branch January 16, 2021 22:04
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as implemented
across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as previously
discussed / implemented across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as previously
discussed / implemented across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)
hauntsaninja added a commit that referenced this pull request Jan 19, 2021
This should cover the current state on master, as previously
discussed / implemented across #9742, #9683, #9632, #9616, #9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)

Co-authored-by: hauntsaninja <>
ilevkivskyi pushed a commit that referenced this pull request Jan 20, 2021
This should cover the current state on master, as previously
discussed / implemented across #9742, #9683, #9632, #9616, #9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)

Co-authored-by: hauntsaninja <>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support namespace packages from command line
2 participants