Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature-request] enhanced first-party detection #1332

Open
Tracked by #6190
smackesey opened this issue Dec 22, 2022 · 9 comments
Open
Tracked by #6190

[feature-request] enhanced first-party detection #1332

smackesey opened this issue Dec 22, 2022 · 9 comments
Labels
core Related to core functionality isort Related to import sorting

Comments

@smackesey
Copy link

smackesey commented Dec 22, 2022

Probably the most common use-case for marking a package as a "known first party" import is when you have an associated test package. In the world of pytest, the associated test package is typically called <TARGET_PKG>_tests. It would be cool if auto-first-party detection was smart enough to understand this.

For example:

### foo_tests/test_foo.py

from foo import bar  # this import should be first-party because it's from `foo`

This could be implemented by a tool.ruff.isort option called something like first-party-grouping-suffixes (certainly not wedded to the name). The idea would be that any package matching a suffix is treated identically to its "stem" for purposes of first-party grouping. So:

[tool.ruff.isort]

# This causes any package matching "<STEM>_tests" to be treated identically to "<STEM>" for the purposes of first-party grouping.
first-party-grouping-suffixes = ["_tests"]

IMO ["_tests"] would be a reasonable default for this option. There might also be other suffixes to include used by other test frameworks.

@andersk
Copy link
Contributor

andersk commented Dec 26, 2022

foo is detected as first-party anyway as long as it exists in one of the paths in src (defaults to ["."], for the typical case where foo is in the same directory as pyproject.toml). Is that not sufficient? It seems better for this classification to rely on facts than heuristics.

@smackesey
Copy link
Author

smackesey commented Dec 26, 2022

Hi @andersk, we disagree here.

foo is detected as first-party anyway as long as it exists in one of the paths in src

Unless something changed very recently, this is only true if the package is a direct child of one of the paths on src.

the typical case where foo is in the same directory as pyproject.toml

While this might be typical, it is not necessarily the most common design for large projects using monorepos. I work on the large monorepo Dagster. Our python packages are not siblings of pyproject.toml.

It seems better for this classification to rely on facts than heuristics.

There's nothing ambiguous about a setting that tells ruff to classify packages as first-party depending on a suffix.

Also, I think by your definition Ruff is already using a heuristic that works 99% of the time for first-party detection (i.e. for non-namespace packages), which is to trace __init__.py files up the directory hierarchy to identify the root package for a module, and match against this root package to determine first-party status. The feature requested here is an addition to that functionality.

FWIW, I requested that functionality specifically to support situations where there are many python packages in a monorepo but one doesn't want to add a pyproject.toml for each one. I can understand an argument in favor of requiring explicit config here, but that is just a philosophical difference-- in the vast majority of cases, if I have foo_tests next to foo, foo should be first-party in foo_tests. It makes sense to achieve this with minimal configuration using a clear rule, given that first-party auto-detection already happens.

@andersk
Copy link
Contributor

andersk commented Dec 26, 2022

My goal here isn’t to agree or disagree, just to figure out whether the existing mechanisms are sufficient and where they fall short.

Looking at your monorepo, it seems to me that you should be able to configure

[tool.ruff]
src = [
    "docs/dagit-screenshot",
    "examples/*",
    "helm/dagster/schema",
    "integration_tests/python_modules/*",
    "python_modules/*",
    "python_modules/libraries/*",
]

which seems manageable?

Meanwhile, your proposed _tests heuristic would fail to capture that foo should also be first-party within itself.

@smackesey
Copy link
Author

The src config you posted would have the effect of making all packages captured by src first-party in every other package, which I'm trying to avoid-- only foo should be first-party inside of foo.

Meanwhile, your proposed _tests heuristic would fail to capture that foo should also be first-party within itself.

See my previous comment-- ruff already detects foo is first-party within itself:

I think by your definition Ruff is already using a heuristic that works 99% of the time for first-party detection (i.e. for non-namespace packages), which is to trace init.py files up the directory hierarchy to identify the root package for a module, and match against this root package to determine first-party status. The feature requested here is an addition to that functionality.

@andersk
Copy link
Contributor

andersk commented Dec 27, 2022

Ah, my apologies for misreading—I hadn’t noticed that update (#1266).

It seems to me that if we’re going to trace up the __init__.py chain to find project-root/foo, then we might as well detect project-root/*.py and project-root/*/__init__.py as first-party, not just project-root/foo itself. That would solve your issue, right?

@charliermarsh charliermarsh added core Related to core functionality and removed enhancement labels Dec 31, 2022
@condemil
Copy link

condemil commented Mar 2, 2023

I have a similar problem with bazel monorepo setup, where pyproject.toml file is located in the root of the monorepo. I don't have __init__.py files, but project root could be identified by BUILD or BUILD.bazel files. Probably the solution could be a configuration variable in pyproject.toml that allows to specify filenames that marks project root?

Example (variable name could be better):

[tool.ruff]
root-file = ["BUILD", "BUILD.bazel"]

Example structure:

.
├── app
│   └── project1
│       └── backend
│           ├── BUILD.bazel
│           └── main.py
├── lib
│   └── util
│       ├── BUILD.bazel
│       └── log.py
└── pyproject.toml

@charliermarsh
Copy link
Member

@condemil - Oh yeah that's an interesting use-case. Is it possible to solve with src, and wildcarding? Like:

[tool.ruff]
src = [
  "app/*",
  "lib/*",
]

@condemil
Copy link

condemil commented Mar 2, 2023

Unfortunately, the solution you mentioned isn't working. I don't know the exact reason, but I may guess it is because of the following:

  1. The "app/*" wildcard doesn't support folding, and in monorepo the project could be under app/project1/subproject. Folding could be solved by introducing some wildcard pattern like "app/**/*".
  2. When projects is specified under src, all of those is threated as a first-party imports for each other instead of being threated as a third-party imports.

@charliermarsh
Copy link
Member

Ah yeah that makes sense -- I didn't realize that you wanted each package to consider each other package as third-party (which is reasonable but not always the case for monorepos). Lemme try a few things out before I make any other suggestions.

@charliermarsh charliermarsh added the isort Related to import sorting label Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality isort Related to import sorting
Projects
None yet
Development

No branches or pull requests

4 participants