Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta issue: plugin system #283

Open
sobolevn opened this issue Sep 29, 2022 · 71 comments
Open

Meta issue: plugin system #283

sobolevn opened this issue Sep 29, 2022 · 71 comments
Labels
core Related to core functionality

Comments

@sobolevn
Copy link
Contributor

I think that the main thing why Flake8 is so popular is its plugin system.
You can find plugins for every possible type of problems and tools.

Right now docs state:

Beyond rule-set parity, ruff suffers from the following limitations vis-à-vis Flake8:

1. Flake8 has a plugin architecture and supports writing custom lint rules.

I propose designing and implementing plugin API.
This way ruff can compete with flake8 in terms of adoption and usability.

Plugin API

I think that there are some flake8 problems that should be fixed and also there are some unique chalenges that should be addressed.

  1. Explicit "opt-in" for plugins. Right now flake8 suffers from a problem when you install some tool and it has a flake8 plugin definition. This plugin is automatically enabled due to how setuptools hooks work. I think that all rules must be explicit. So, eslint's explicit plugins: looks like a better way.
  2. Special "fix" API and tooling: so many typical problems can be solved easily by plugin authors
  3. Plugin order. Since plugins will change the source code, we must be very strict about what order they run in. Do they run in parallel while checking files?
  4. Plugin configuration: current way of configuring everything in [flake8] section can cause conflicts between plugins. Probably, the way eslint does that is better
  5. Packaging: how to buld and package rust extensions? How to build wheels?

Please, share your ideas and concerns.

@charliermarsh
Copy link
Member

Strongly agree. Will write up some thoughts on this later!

@charliermarsh
Copy link
Member

The first big decision is: should plugins be written in Rust? Or in Python? I believe that either could be possible (though I haven't scoped out the work at all), e.g., using pyo3. It may even be easier to support plugins in Python given that loading code dynamically is much easier in a scripted language...

However, I'm partial to requiring plugins be written in Rust. It will lead to a more cohesive codebase, allow us to maintain a focus on performance, and avoid requiring extensive cross-language FFI. I'm open to being convinced otherwise here though.

Here are a few relevant resources on implementing a plugin system for Rust:

One of the main challenges seem to be around the lack of ABI stability in Rust. In many of the above write-ups, they discuss how both the plugins and calling library need to use the same versions of Rust in order to be compatible, which feels like a tall order. (From that perspective, one thing that's interesting to me is: could we compile plugins to WASM?)

@sobolevn
Copy link
Contributor Author

sobolevn commented Sep 29, 2022

I think that instead of Python based plugins, it is better to provide some kind of query language to make easy plugins very easy. Like in https://github.com/hchasestevens/astpath

In my opinion, any complex stuff should be in Rust. This way it can reuse existing APIs and be fast. But, I don't know how many Python developers actually know Rust 🤔

I think another way of dealing with it is to ask exisiting flake8 plugin authors about their prefered way of writting it. Their feedback would be very valuable!

@charliermarsh
Copy link
Member

Interesting, Fixit / LibCST has something kind of like that too. It's not quite a distinct query language, but it's effectively a DSL (in Python) to pattern-match against AST patterns.

@thejcannon
Copy link
Contributor

thejcannon commented Oct 25, 2022

My 2c. flake8's plugin support is pretty rudimentary (operates on tokens/lines plus a handfull of metadata). Therefore if you supported Python plugins, you likely could craft it in a way that supporting flake8 plugins out-of-the-box(-ish) would be feasible.

Then you could have most flake8 plugins available through ruff, and wouldn't need to support new plugins by porting the code to Rust (only if you want to because yummy yummy perf).

(lightly related to #414)

@charliermarsh
Copy link
Member

Another idea: we could build a plug-in system atop https://github.com/ast-grep/ast-grep. This would allow users to express lint rules in YAML or via a simple DSL.

@charliermarsh
Copy link
Member

(That tool is itself built atop tree-sitter.)

@ljodal
Copy link

ljodal commented Nov 8, 2022

I’m coming to this from a position of having written a flake8 plugin for a very specific need at work, and as part of a larger project. This is not something generic, so it’d never make sense as a built-in feature in ruff. I’d love a way to write plugins to ruff in Python, mostly because it’s convenient as someone familiar with Python and not so much rust, but also because it would be nice to keep a project in pure Python even while interacting with rust.

The specific plugin in my case, oida, was first written as a standalone thing before I discovered how easy it was to add it as a plugin to flake8. It also uses LibCST, for its ability to round trip code, where we do codemodding for its. If it would be possible to expose a similar ast based Python-interface for plugins that would be awesome.

I also have use cases where I’d like to do auto fixing, which it would also be nice to support. The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.

I understand that writing plugin in Python would be a slowdown compared to writing them in rust, but I think that tradeoff would be very much acceptable in many cases.

@charliermarsh
Copy link
Member

Very helpful and all makes sense.

Maybe just as another data point for the thread: when I was at Spring Discovery, we wrote a few Flake8 plugins to enforce highly codebase-specific rules.

For example:

  • "Always late-import TensorFlow" (i.e., import it within a function that depends on it, rather than at module top-level)
  • "If you ever import module X, make sure the file also imported module Y"
  • "Imports to module Z should always use import from structure"

@charliermarsh
Copy link
Member

So in that light, I think there are different categories of plugins:

  • Plugins that are custom to a codebase
  • Plugins that may apply to many codebases, but don't make sense to include directly in Ruff (e.g., Django-specific stuff could qualify here)

@charliermarsh
Copy link
Member

I think most of those "custom" plugins / checks could be built atop something like ast-grep, but more complex checks (like rewriting absolute and relative imports) would be limited by that approach.

@charliermarsh
Copy link
Member

The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.

(Separately: this could arguably make sense to include in Ruff directly.)

@ljodal
Copy link

ljodal commented Nov 8, 2022

I think most of those "custom" plugins / checks could be built atop something like ast-grep, but more complex checks (like rewriting absolute and relative imports) would be limited by that approach.

Yeah, I wouldn’t really be able to implement any of Oida using ast-grep, as all the rules depend on the context of the project. I use in-process caching to keep that state ready between files in the current flake8 plugin btw, forgot to mention that above, so the flake8 interface isn’t ideal for that kind of plugin.

The first thing I’d like to do is normalize import statements (relative vs absolute). In order to do that I’d need an interface where I get import statements or ast nodes and the path to the file so I can locate it in relation to other files on the system.

(Separately: this could arguably make sense to include in Ruff directly.)

I guess some rules could be, again not for my specific case. What we’re considering at work is to enforce relative imports within a Django app and use absolute imports for everything else. Our structure will be project.component.app or project.app, so it would be very specific to our use case how that rule should be applied. I’ve already played with implementing it in isort, but I found that code base hard to navigate and would love a clean ast/cst based plugin interface where I could add this logic :)

@peterjc
Copy link

peterjc commented Nov 14, 2022

You asked for feedback from other flake8 plugin authors, so:

  • https://github.com/peterjc/flake8-black (620k downloads/month on PyPI), not needed if you can run black directly as well as running flake8, for example via the tool pre-commit or otherwise. Currently this reloads each Python file from disk (scope here to refactor to let black use its cache), it would not be possible to use the AST from flake8 directly. Does not make sense to plug into ruff.

  • https://github.com/peterjc/flake8-rst-docstrings/ (238k downloads/month on PyPI), uses the AST to extract docstrings, which are passed as strings to the Python library docutils to be validated as RST. My code is essentially a wrapper, and since docutils is written in Python that would have to be used internally if this plugin were to be ported to ruff.

  • https://github.com/peterjc/flake8-sfs (15k downloads/month on PyPI), uses the AST directly looking for particular kinds of node. Probably could be done in either Python or Rust, although unlikely to be popular enough to deserve including in ruff itself.

@charliermarsh
Copy link
Member

Thank you @peterjc! Really appreciate your engagement here as a plugin author!

(Regarding RST: it looks like there's at least one Rust crate for parsing RST, though it doesn't look super popular.)

@ofek
Copy link
Contributor

ofek commented Nov 15, 2022

Is this possible, or supported currently? https://github.com/adamchainz/flake8-tidy-imports

@charliermarsh
Copy link
Member

@ofek - Not currently supported but it’s a pretty small surface area so should be easy to add some time in the next few days.

@ofek
Copy link
Contributor

ofek commented Nov 17, 2022

Thanks! I've been enforcing absolute imports recently (except in tests) https://github.com/pypa/hatch/blob/b0911bb0eaa8d331c24eda940b97bf244ecd5ac3/.flake8#L8-L11

After that I'll switch over, and make new projects generated by Hatch use this.

@charliermarsh
Copy link
Member

Sweet! The banned relative import rule I can definitely do today.

@charliermarsh
Copy link
Member

@ofek -- I252 (banned relative imports) just went out in v0.0.125.

You can use it in Hatch by adding this to your pyproject.toml:

[tool.ruff]
select = [
  "B",
  "C",
  "E",
  "F",
  "W",
  # Ruff doesn't have this, but it does have E722.
  # "B001",
  "B003",
  "B006",
  "B007",
  # These don't exist in newer flake8-bugbear versions IIUC.
  # "B301",
  # "B305",
  # "B306",
  # "B902",
  "Q000",
  "Q001",
  "Q002",
  "Q003",
  "I252",
]
ignore = [
  "B027",
  # "E203",
  # "E722",
  # "W503",
]
line-length = 120
# tests can use relative imports
per-file-ignores = {"tests/*" = ["I252"], "tests/**/*" = ["I252"]}

[tool.ruff.flake8-tidy-imports]
ban-relative-imports = "all"

Let me know if it works, or doesn't! :)

@ofek
Copy link
Contributor

ofek commented Nov 20, 2022

Thank you!!! pypa/hatch#607

@ljodal
Copy link

ljodal commented Nov 28, 2022

@charliermarsh You wrote somewhere that libcst is significantly slower than the current ast implementation in ruff (can't find it right now). Do you know why? Is it because it's a cst or is it because the classes it exposes are Python "compatible"?

I'm asking because I've started looking into pyo3 and from what I see the only way to expose an ast to a Python plugin would be to make the ast classes Python classes in pyo3. If that's what's slow with libcst I guess there's not really any point in investigating that route too much, but if we could make that fast enough I guess it could be one way to make plugins work.

That doesn't resolve auto-fixing, but as I suggested in another thread I think maybe doing auto-fixing on the token level could be made to work. Maybe an interface like this:

def visit_Import(node: ast.Import, tokens: list[str]) -> list[str]:
    # Check ast (or tokens) for violations and return updated token
    return ["import", " ", "foo"]

Or maybe have tokens as an attribute on the ast nodes 🤔

@charliermarsh
Copy link
Member

charliermarsh commented Nov 28, 2022

@ljodal - This was all based on LibCST as a Rust crate, with no Python FFI -- so I think it's just the CST and parser, and not anything to do with the the serialization. (I also hacked in some RustPython vs. LibCST benchmarks into the existing LibCST benches and got similar results. As with all benchmarking, though, I could definitely be doing something wrong!)

@charliermarsh
Copy link
Member

@ljodal - I don't have great intuition for whether the PyO3 FFI would add much overhead and what the performance impact would be. I think it's worth exploring!

@ljodal
Copy link

ljodal commented Nov 29, 2022

Aight, then I'll continue investigating :)

I haven't written any rust before, so it's slow going (thinking of doing advent of code in rust to get a kickstart). My plan was to use the Python ASDL definitions to generate AST classes, but it's been years since last I touched compilers so I'll have to see how I go about the tokenization and conversion to ast

@monotkate
Copy link

It sounds like the jury is still out when it comes to creating custom rules, is that correct? I've seen custom linting rules be a valuable tool when modularizing a monolith, with rules very customized for the codebase you're working in. As far as I can tell, Ruff does not allow you to develop custom rules at this time, so we'd have to run another linter alongside Ruff for that ability.

We just switched our codebase to using Ruff, and are also looking to start modularizing. I'm trying to figure out if I need to chose a second tool alongside Ruff for customizations.

@adam-azarchs
Copy link

I think the main point of contention is not whether it should be allowed but rather how.

IMO if people want to author a plugin in python they should probably use a python-based tool (e.g. flake8 or pylint) to run it. One of the things I like about ruff is that it doesn't have a dependency on a python runtime, and not unrelatedly that it is very fast. A plugin architecture for ruff would be nice, certainly, but I'd advocate for it being either native plugins (e.g. .so libs that can be dynamically loaded into the process and register themselves), WASM, or maybe some kind of DSL (ast-grep was mentioned earlier).

I strongly suspect that a DSL would be sufficient for 80+% of the kind of use cases people are describing where a repo has rules very specific to their code base that wouldn't be sufficiently broadly applicable to upstream. Especially nice about that is that such custom rules could just be included in the pyproject.toml (toml is isomorphic to yaml, though it does get awkward compared to yaml for more deeply nested structures).

@morgante
Copy link

We've been working with Biome to integrate GritQL as an extension/plugin system and I'd love to offer the same for Ruff. The problem space is similar and I think GritQL provides a few advantages:

  • Preserve some of the best things about Ruff: no runtime Python dependency, pure Rust, no separate installation steps, etc.
  • Most custom/codebase-specific rules can be expressed as simple AST-based transforms.
  • Traversal still happens in pure Rust and, because declarative queries are used, they could be optimized to maintain Ruff's excellent performance

Here's a few example of how @charliermarsh's earlier custom suggestions could be implemented directly:

  • Always late import TensorFlow - studio
    `import $import` where {$import <: contains `tensorflow`, $import <: not within block()}
    
  • If you ever import module X, make sure the file also imported module Y - studio
    `import $import` where {
        $import <: contains `moduleX`,
        $program <: not contains `import moduleY`
    }
    
  • Imports to module Z should always use import from structure - studio
    `import $import` where {$import <: contains `module`, $import <: not within `from $_ import $_`}
    

@JobaDiniz
Copy link

I'm looking to write a few fitness functions by extending ruff linters. I think that would be ideal, and I'd like to use python.

Since ruff does not support plugins, I'm writing these functions as tests that run on CI using pytest, but these specific fitness functions I'm writing are linters, so it would make sense to write them as part of ruff linters.

@jhosmanfriasbravo
Copy link

@charliermarsh hi! do you think that ruff will consider a plugin system in the short-medium term?

@MichaReiser
Copy link
Member

Thank you, @morgante, for offering your support to help us build a GritQL-based plugin system.

GritQL is undoubtedly at the top of my mind when it comes to designing a plugin system for Ruff, and I'm following the work in the Biome repository from a distance (but I must admit, not very closely).

It will probably be a while before we evaluate solutions for a plugin system because we're currently in the middle of rewriting Ruff's compiler infrastructure to support multifile analysis (and more ;)). But I'll come back to your offer when we're ready to explore Ruff plugins.

@ssbarnea
Copy link

One flake8 plugin that could be very useful to be covered by ruff would be pydoclint. Even having external plugins, like called as shell commands would prove very useful, speed should not be no1 priority. In time plugin authors might rewrite them in rust, but for start a way to hook external ones would prove very useful.

@adam-azarchs
Copy link

I am far from convinced that there's value in having ruff launch such external processes, including hosting a python runtime (which while technically could be in-process, there wouldn't really be much of an advantage over forking it as a subprocess). The primary value that flake8 gets out of integrating a lot of plugins is that they all can share the same parsed AST representation, to avoid redundant work. Likewise with ruff's integrated linters. That would not be the case for ruff plugins unless those plugins were also written in rust, or some other language that could consume ruff's in-memory representations with limited or no transformations.

It sort of sounds to me that what some people are looking for is a way to run a bunch of arbitrary checkers on python files in a repository, and they don't really care whether those commands are integrated into the same executable or not. If that's what you're looking for, maybe look at something like pre-commit? pre-commit is quite good at that job. ruff just feels like the wrong layer for doing that - it's a thing that gets run by pre-commit, and shouldn't be trying to do the same job.

@flying-sheep
Copy link
Contributor

A new blog post investigating Rust plugin systems, probably helpful! https://benw.is/posts/plugins-with-rust-and-wasi

@2bndy5
Copy link

2bndy5 commented Aug 30, 2024

FYI, using external processes (like a rust-based standalone binary or a python-based entrypoint script) is how mdbook implements plugins for preprocessing Markdown. This is how mdbook also supports plugins implemented in python or possibly other languages. If designed well enough, one could write a ruff plugin that also acts as a standalone linter. This means adding unsupported rules (from linters that ruff is non-compliant with) could be implemented externally in the (unsupported) third-party linter.

@carlpaten
Copy link

Members of my team have been agitating for Ruff, but without support for custom rules it's a tough sell. We need to be able to implement bespoke rules to support our own coding style - stuff like forbidding top-level statements not guarded by if __name__ == "__main__":, for example. We have the choice between maintaining our own fork of Ruff, with limited in-house Rust experience, or use Flake8 + Black.

@Dreamsorcerer
Copy link

We have the choice between maintaining our own fork of Ruff, with limited in-house Rust experience, or use Flake8 + Black.

Or (and I don't necessarily recommend it), Ruff + Flake8 (using the external config option for rules handled by Flake8).

@muglug
Copy link

muglug commented Sep 4, 2024

Members of my team have been agitating for Ruff, but without support for custom rules it's a tough sell.

If they want to monetise what they’ve built (and add this as a paid feature) then Astral might welcome this sort of feedback, but otherwise it feels a little off-key for a free tool with a permissive open-source license.

@bverhoeve
Copy link

I have the same problem as @Dreamsorcerer for my team. flake8 + black works fast enough for us, so the speed of ruff isn't a good enough trade off for the lack of custom plugins.

@jd-solanki
Copy link

Along with plugin system I would like to share great idea for beginners to lint their code who don't know how to write linting rules using AST or something else.

"Lint using regex"

In JS, ESLint has plugin system and we used to write custom rules for our solutions. We wrote some rules using official rules docs but using regex to lint the code allowed all of us in our team to writing linting rules without any additional learning.

Regex plugin that allowed beginner devs to write linting rules: https://www.npmjs.com/package/eslint-plugin-regex

@chadrik
Copy link

chadrik commented Oct 12, 2024

I had a look at ast-grep and it's pretty fantastic. It is fast, it is written in rust and provides multiple high-level APIs, including both python and YAML. But it has one giant problem: it doesn't preserve comments, making it pretty useless for real-world code manipulation.

Would it be possible to publish ruff's python ast parser as a crate, which ast-grep could be modified to use, and then ruff could use ast-grep's high level APIs for its plugin system?

huonw added a commit to pantsbuild/pants that referenced this issue Oct 30, 2024
This switches the Ruff subsystem to download and run it via the
artifacts published to GitHub releases.

Ruff is published to PyPI and thus runnable as a PEX, which is what
Pants currently does... but under the hood ruff is a single binary, and
the PyPI package is for convenience. The package leads to a bit of extra
overhead (e.g. have to build a `VenvPex` before being able to run it).
It is also fiddly to change the version of a Python tool, requiring
building a resolve and using `python_requirement`s.

By downloading from GitHub releases, we can:

- more easily support multiple ruff versions and allow users to pin to
one/switch between them with a `version = "..."` (this PR demonstrates
this, by including both 0.6.4 as in `main`, and 0.4.9 as in 2.23, i.e.
if someone upgrades to 2.24 and wants to stick with 0.4.9, they can just
add `version = "0.4.9"`, no need to fiddle with `known_versions`)
- eliminate any Python/pex overhead by invoking the binary directly
- side-step fiddle like interpreter constraints
(#21235 (comment))

Potential issues:

- If Ruff adds plugins via Python in future, maybe we will want it
installed in a venv so that the venv can include those plugins...
astral-sh/ruff#283 seems to be inconclusive
about the direction, so I'm inclined to not worry and deal with it in
future, if it happens.

This PR does:

- Switches the ruff subsystem to be a `TemplatedExternalTool` subclass,
not `PythonToolBase`
- Plumbs through the requisite changes, including: setting up some
`default_known_versions` for the `main` and 2.23 versions (0.6.4 and
0.4.9 respectively), changing how the tool is installed, removing
metadata about interpreter constraints that is no longer relveant or
computable
- Eases the upgrade by adding deprecated instances of the fields
provided by `PythonToolBase`: people may've customised the Ruff version
with `install_from_resolve`. If the field was just removed, running any
pants command will fail on start-up with `Invalid option
'install_from_resolve' under [ruff] in path/to/pants.toml`, which isn't
very helpful. By having the fields exist, we ensure the user gets a
descriptive warning about what to do. NB. the backend is labelled
experimental, so I haven't tried to preserve behaviour, just focused on
ensuring we can show a removal error message to them.
@dgilmanAIDENTIFIED
Copy link

It hasn't been mentioned explicitly in this thread but a plugin system gives the opportunity to lint non-Python languages that have been embedded in a Python file. This does show up in Python files somewhat often (rST in docstrings, SQL string literals) but implementing those formatters is clearly out of scope for ruff itself.

It still makes sense to have ruff as the host for these linters, though: you'd get to take advantage of ruff's configuration (file exclude rules), you could piggyback on the # noqa functionality, and I think there's even an opportunity to extend the linter comment syntax for string literals: something like # lint: mysqlplugin to indicate which string literals you want linted.

Even when you're just linting a string literal you may want to expose the AST to that linter. A SQL statement might want to know what level of indentation a string literal was declared at to match vertical alignment:

def foo():
    x = """
select *
    """

could be linted/formatted into

def foo():
    x = """
    select *
    """

But maybe there is also opportunity to have a simpler "if this string literal is annotated for my linter (# lint: mysqlplugin), pass just the string literal's body to the plugin" API, or at least provide the AST-consuming glue for that as sample code.

I see a bit of discussion in this thread over forking vs in-process processing: if you are providing an AST or AST-lite API you can let the plugin decide if they want to fork or not. It would be unfortunate for performance if they had to fork but it is entirely possible that a plugin might want to use an existing linter and be practically constrained to fork to run it.

@MichaReiser
Copy link
Member

@dgilmanAIDENTIFIED supporting embedded languages that are commonly used with Python is a long-term goal. See #8237

@bverhoeve
Copy link

Maybe it's a bit too naive of an idea - but wouldn't the embedding also immediately solve the issue of needing Python linting/formatting plugins to be written in Rust? I.e. if you have a flake8 plugin that uses the AST as an interface you could use ruff which interprets and provides the AST to the plugin.

@adam-azarchs
Copy link

That sort of implies that there's some universal representation for ASTs, which isn't the case. ruff's in-memory representation of the AST is not something that python code could consume without conversion into python objects, and in particular python objects using the representation that flake8 wants. It doesn't seem obvious to me that such conversion would be significantly less time-consuming than simply having flake8 parse the source itself, and would certainly be a lot more effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality
Projects
None yet
Development

No branches or pull requests