Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using a shell command/adhoc tool as part of any goal (check, lint, fmt, fix, deploy, ...) #17729

Open
huonw opened this issue Dec 6, 2022 · 42 comments
Assignees

Comments

@huonw
Copy link
Contributor

huonw commented Dec 6, 2022

Is your feature request related to a problem? Please describe.

Shell is a universal glue, that allows augmenting pants adhoc with a lower barrier to entry (and, for simple tasks, lower maintenance burden) than writing a custom plugin. It'd be nifty to generalise the current codegen experimental_shell_command, run experimental_run_shell_command and recently test experimental_test_shell_command (#17640) to allow using a shell script for 'anything'.

For example, in a polyglot repo, pants may support some of the languages/tools natively, but not others, but it'd still be nice to have basic commands like ./pants lint :: and ./pants fmt :: work across the whole system. Additionally, it would allow faster experiments and migration of an existing repo into pants, if glue scripts (and/or makefiles, or whatever) can be reused, and switch to having ./pants to more.

(We 'tripped' over both of these in our migration into pants, but I personally have liked pants enough to suffer through split tooling, and convince the rest of the team too, as well 😅 )

Describe the solution you'd like

Something way to have a shell command hook into other goals. For instance:

# in a directory with CFN templates
experimental_shell_command(
   name="my-linter",
   command="cfn-lint",  # https://github.com/aws-cloudformation/cfn-lint, installed somehow
   tools=["..."],
   goal="check",
   dependencies=["./*.yaml"]
)

# in a docs/ directory
experimental_shell_command(
   name="my-formatter",
   command="mdformat",  # https://github.com/executablebooks/mdformat
   tools=["..."],
   goal="fmt",
   dependencies=["./**/*.md"]
)
  • The dependencies would be the input files (for file-based commands).
  • The outputs field could be reused for fmt and fix to write back to the codebase.

Some questions that seem relevant (although some can probably be deferred/considered potential future enhancements?):

  1. How do these sort of shell commands get ordered with other fmt/fix subsystems (e.g. if my-formatter touches python files, does it run before or after black)?
  2. Can a shell command delete files as part of fmt and fix? How does that work?
  3. Maybe there can be opt-in or opt-out for invocations to run in parallel on subsets of the input?
    1. How does the target distinguish between configuration file that needs to be in all invocations, and an input file should be in exactly one invocation?
    2. Are outputs just naively merged (i.e. if one outputs a.txt and another outputs b.txt, the final output has both)? What happens if two invocations generate the same path (with different or identical content)?
  4. Maybe lumping all goals into one command isn't quite right, since they may have very different options?
  5. Is it easy/possible to express dependencies like **/*.py (to be able to run a custom linter on all Python files in descendent directories, without writing out the path to their individual targets)?
  6. How do custom commands get installed? Some possibilities:

Describe alternatives you've considered

We are:

  • Actively using: shell scripts outside pants, with some sort of orchestrator for them (e.g. GitHub Actions YAML, internal documentation to copy-paste from for us; potentially Make or similar too)
  • Considering: custom plugins

Additional context
N/A

@benjyw
Copy link
Contributor

benjyw commented Dec 8, 2022

cc @chrisjrn who is already waist deep in shell command stuff.

@chrisjrn
Copy link
Contributor

chrisjrn commented Dec 8, 2022

Oh this is interesting.

If I were designing this, I could imagine being able to hook into existing non-destructive goals (like check or lint) where there's no confusion around ordering (since they're side-effect free so can be run concurrently), and potentially having a new goal for running custom destructive/side-effecting goals -- so you could run ./pants experimental-custom-thingy path/to/side_effecting:target, which would avoid most of the questions that you raise.

I agree that hooking into fmt etc would be best in the long run, but creating a custom experimental goal would make this achievable very quickly

@huonw
Copy link
Contributor Author

huonw commented Dec 8, 2022

For the short term, I feel that a good-enough approach to side-effects exists may already via experimental_run_shell_command, run like ./pants run path/to/side_effecting:target. For instance, in some experiments, I have a 'fmt'/'fix'-like target that consists of effectively (this is off the top of my head, so may not be 100% accurate, but I think the concept is about right):

# actually do all the interesting work to change files (cacheable, etc)
experimental_shell_command(
  name="make_the_changes", outputs=["generated.txt"], command="...", ...
)

# write the side-effect back (i.e. replacement for fmt/fix)
experimental_run_shell_command(
  name="write_the_changes", command="cp {chroot}/path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes"], ...
)

# test that things match (i.e. replacement for the lint part of a fmt/fix goal)
experimental_test_shell_command(
  name="check_the_changes", command="diff path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes", "./whatever.txt"], ...
)

That said, this isn't perfect, because every run seems to have happen as its own ./pants invocation? And there doesn't seem to be a way to orchestrate around this from within pants itself:

  • we have a do-it-all alias local = "tailor update-build-files fmt lint check ::" for the quick local checks, and it'd be nice to be able to run the side-effecting shell commands as part of this in an integrated way
  • adding run to it fails, e.g. running ./pants fmt :: run path/to:write_the_changes seems to be equivalent to ./pants fmt run :: and that understandably fails (TooManyTargetsException: The `run` goal only works with one valid target, but was given multiple valid targets:)
  • we thus seem to have to have shell script that does a few consecutive invocations of ./pants

@chrisjrn
Copy link
Contributor

chrisjrn commented Dec 8, 2022

Running multiple targets would definitely be an interesting use case, and you're right, we don't have that right now. 🤔

@chrisjrn
Copy link
Contributor

After some discussion, and given we now have test support through experimental_test_shell_command, it probably makes sense to support check with similar behaviour; this would cover all use cases that would be handled by check or lint (there's not really a good semantic difference between check and lint imho).

Tools with side-effects are still interesting. I'm less happy with the idea of hacking something together with run now, since we lose sandboxing, and with it the ability to discard or roll-back the results of the goal: this is important since it means you can run fmt as a dry-run at CI time and report on whether the commit is fmt-clean entirely from within Pants. That's much harder if you're using run and not sandboxing things.

I think we'd want to implement mutating behaviour with a different goal, because the semantics are different to fmt: if we have a target (extremely strawman name) that defines a mutating command

experimental_mutating_command(
  name="mutator",
  targets=["//src/js/package_to_format:js_sources",],
)

Then the target spec to call mutator would be ./pants fmt path/to/mutator:mutator, however, src/js/package_to_format/*.js would be the files that are formatted. i.e. the goal would mutate targets that are not the ones specified in the target spec. This is significantly different to fmt's behaviour, and I think that warrants a separate goal with separate documentation.

Finally, having a separate goal would make it possible to string together multiple experimental_mutating_command targets at the command line (or have them caught by ::).

Thoughts?

@huonw
Copy link
Contributor Author

huonw commented Dec 14, 2022

Makes sense for check (and lint)!

I'm not sure I entirely understand the proposal for fmt. Would it be something like ./pants goal-name --arg-name=path/to/mutator:mutator1 --arg-name=path/to/mutator:mutator1 :: (placeholder names to avoid getting distracted by bikeshedding 😅 ) to run mutator1 and mutator2 on all targets? If so, that seems pretty reasonable!

Questions (in decreasing order of priority, discussion of the priority in parens):

  1. would that work with an do-it-all command like the alias referenced in Allow using a shell command/adhoc tool as part of any goal (check, lint, fmt, fix, deploy, ...) #17729 (comment)? For example, ./pants fmt goal-name --arg-name=path/to/mutator:mutator1 --arg-name=path/to/mutator:mutator2 :: would be equivalent to ./pants fmt :: then mutator1 then mutator2? (Having the ability to do this would alleviate a lot of the downsides of being a separate goal to fmt, so that we can set up a pants alias for ease of use locally.)
  2. based on your very small example, it kinda looks like targets might require actual target specifications, which means it is effectively an allowlist? Would there be a way to run on any *.yml or *.py file anywhere in the tree? (Without this, I'd be nervous about adopting this feature, as we'd almost certainly forget to keep the allowlist up to date as code is added/moved.)
  3. it's nice that ./pants lint :: currently runs all the linters and formatters (in dry-run mode) in parallel, would the dry run mode of the command from 1 do something similar, or would it be more sequential (e.g. run linters and then these new style)? (Just an optimisation, not critical.)

A probably-wild idea (more along the lines of your "in the long run" reference above, rather than a first step) would be getting these into backend_packages, e.g. something like the following, where pants automatically understands that it needs to resolve the mutator1 "backend" as a target within the repo:

backend_packages = [
  "pants.backend.python",
  "pants.backend.python.lint.black",
  "//path/to/mutator:mutator1", # runs after black, but before isort
  "pants.backend.python.lint.isort",
]

That seems like it may require some careful circular dependency resolution: pants needs the backends in order to resolve the targets, and needs to resolve targets in order to load the backends. 😅

(This potentially generalises how pythonpath = ["%(buildroot)s/pants-plugins"] can be used to load pants-plugins/path/to/plugin as a path.to.plugin backend: it can instead be just listed as a //pants-plugins/path/to/plugin:plugin backend directly.)

@chrisjrn
Copy link
Contributor

Without really re-thinking things, there would be no way to specify which targets the mutator is to act upon, except within the target definition itself.

If you need more info, I'll dump that when I'm back at my desk.

@chrisjrn
Copy link
Contributor

(The crux of it is that :: would supply literally every file type to the custom mutator. The way to make :: behave sanely is to make a plugin with a custom source type, at which point it's just as easy to make a fmt plugin.)

@thejcannon
Copy link
Member

thejcannon commented Dec 21, 2022

Yeah fmt/fix seems a bit more interesting, but I'd really love if we saw the reporting goals (lint/check/etc...) in an experimental target. I have some "lints" I really want to do in-repo but plumbing a full plugin is overkill (e.g. "make sure all test files use the _test.py suffix, and not test_). We can then come back and figure out fix/fmt later.

I suspect this feature would take off like hotcakes. In fact ignoring the complexity of ICs and whatnot, imagine flake8 implemented as a pex_binary and experimental_lint_command (with a dependency on the relevant configs).

🤯

@huonw huonw changed the title Allow using a shell command as part of any goal (check, lint, fmt, fix, deploy, ...) Allow using a shell command/adhoc tool as part of any goal (check, lint, fmt, fix, deploy, ...) May 18, 2023
@gauthamnair
Copy link
Contributor

gauthamnair commented Sep 29, 2023

@huonw I have been playing around with possibilities relating to the scoped problem of:

  • we have some program we normally use for linting like cfn-lint or mdformat
  • we'd like to be able to run it in pants like a linter, with very low set-up overhead.

the idea is there is someone trying to convert to pants and they currently are running, say mdformat from the command line, and they'd like to replicate that behavior on pants without having to understand enough about pants to adapt https://www.pantsbuild.org/docs/plugins-lint-goal to their needs and create a custom plugin.

The demo repo in https://github.com/gauthamnair/example-pants-byo-tool shows an API to "Bring Your Own Tool". The user makes a tiny plugin with this configuration:

confs = [
    ByoLinter(
        options_scope='byo_shellcheck',
        name="Shellcheck",
        help="A shell linter based on your installed shellcheck",
        command="shellcheck",
        file_extensions=[".sh"],
    ),
    ByoLinter(
        options_scope='byo_markdownlint',
        name="MarkdownLint",
        help="A markdown linter based on your installed markdown lint.",
        command="markdownlint",
        file_extensions=[".md"],
    )
]

And the linters become available:

$ pants lint ::
...

✓ byo_markdownlint succeeded.
✓ byo_shellcheck succeeded.
✓ flake8 succeeded.

The underlying library is dynamically creating the different types and rules required for a proper implementation of LintRequest based on the user's configuration. (the lib is at #19954)

It is a crude demo and much would be needed to be done to button it up, too many things to list here even, but the hypotheses driving the approach is:

  • pants, the plugin code and the core configuration (pants.toml) deal with something different than the target declarations in the BUILD files. Keeping the boundary could help.
  • A very knowledgeable user could crank out plugins for the scripts they'd like to use. Its just that it is a bunch of work, even for "simple" cases. For less experienced users, there are also a lot of concepts to grok, of a quite different category than what they previously needed to understand to just run markdownlint docs/ .

If there is a lot of incidental complexity in activating a new tool for a goal, then we can add more tooling, but keep the responsibility on the pants/plugin/pants.toml side of the fence.

The demo adds the capability as an abbreviated plugin, but it could be configured entirely within pants.toml by creating some new bootstrap option and threading them to load_backend during extension loading. I plan to play with that to see what could happen there.

Having it in pants.toml would reduce the barrier to entry to probably around what a new user would find welcoming.

With such a plugin-generation strategy, we should be able to handle fix and fmt without much difficulty. We would implement one for some custom tool, look at the part that looks like boilerplate, and abstract it away.

@gauthamnair
Copy link
Contributor

The demo in example-pants-byo-tool is fleshed out now more and includes the following kind of API:

The user defines an in-repo plugin with a register.py containing:

confs = [
    ByoTool(
        goal="lint",
        options_scope='byo_markdownlint',
        name="MarkdownLint",
        help="A markdown linter based on your installed markdown lint.",
        executable=SystemBinaryExecutable("markdownlint", tools=["node"]),
        file_glob_include=["**/*.md"],
        file_glob_exclude=["README.md"],
    ),
    ByoTool(
        goal="lint",
        options_scope='byo_flake8',
        name="byo_Flake8",
        help="Flake8 linter using the flake8 you have specified in a resolve",
        executable=PythonToolExecutable(
            main=ConsoleScript("flake8"),
            requirements=["flake8>=5.0.4,<7"],
            resolve="byo_flake8",
        ),
        file_glob_include=["**/*.py"],
        file_glob_exclude=["pants-plugins/**"],
    ),
    ByoTool(
        goal="fmt",
        options_scope='byo_black',
        name="byo_Black",
        help="Black formatter using the black you have specified in a resolve",
        executable=PythonToolExecutable(
            main=ConsoleScript("black"),
            requirements=[
                "black>=22.6.0,<24",
                'typing-extensions>=3.10.0.0; python_version < "3.10"',
            ],
            resolve="byo_black",
        ),
        file_glob_include=["**/*.py"],
        file_glob_exclude=["pants-plugins/**"],
    )
]

And markdownlint, flake8 and black become activated for use. Properties:

  • the linters/formatters are file-based rather than target-based. This prevents the need for defining targets, but is still compatible with target addresses being used as specs (as they usually have been).
  • so far there are a couple of ways to define the executable:
    • a pex executable just like PythonTool uses. But requires the user to make the dependencies available in a resolve.
    • a system-provided executable with potentially other accessory tools.
    • ... it is straightforward to extend to other executable tool formats, like Downloaded executable, etc.
  • Formatting works just as usual
  • option-level skip and only work as usual to select which tools you want to run. Help appears.

It will be straightforward to add the following features as well:

  • a mechanism for defining Configs, could be as simply as having a field that takes the same kind of ConfigFilesRequest we usually use.

It is aspirational to be able to define this configuration within pants.toml or some other config file so the user does not even have to know about the involved python types.

Under the hood we take the user-provided configuration and generate all the types and rules needed to wire up a plugin. It uses the regular mechanisms to do so, and involves no change to the core. It is basically code-generation.

The implementation in #19954 is a proof-of-concept.

@gauthamnair
Copy link
Contributor

gauthamnair commented Oct 14, 2023

For reference, there is a slack thread in which we've been discussing possibilities: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1696006330169339

There is also a google slides doc (should be open for read to anyone with the link) incorporating the discussion in #16412 which tackles the same issue of making linters easier to express.

@cburroughs
Copy link
Contributor

As a small but concrete example, we have several pre-commit hooks: https://pypi.org/project/pre-commit-hooks/ Maybe of them are small single file things like check-shebang-scripts-are-executable or trailing-whitespace. We also have some similarly sized custom hooks. I'd like to run them through pants:

  • So pants is the one stop shop
  • Pants is way better at transitive dependencies, caching, changed-since, etc
  • Pants can deal with getting the python executable to run them.

But since they are all so small it would be absurd to make a plugin for each one.

@gauthamnair
Copy link
Contributor

gauthamnair commented Oct 17, 2023

Proposal: Easy linters via conf + runnable targets

We could define runnable targets as usual in BUILD files:

# build-support/BUILD

python_requirement(name="black_req", requirements=["black==23.1.0"])

# a runnable
pex_binary(
  name="black_bin",
  script="black",
  dependencies=[":black_req"],
)

These are runnables in the same sense as those compatible with adhoc_tool.
Then we add configuration to pants.toml that refers back to this runnable and instructs how to use it for lint/fmt/fix.

# pants.toml

[GLOBAL]
backend_packages.add = ["...", "pants.backends.byotool"]

[byotool]
conf = """{
  'byo_black': {
    'goal': "fmt",
    'runnable': 'build-support:black_bin'      #  any runnable supported by adhoc_tool
    'name': "Black",
    'help': "Black formatter …",
    'file_glob_include': ["**/*.py"],
    'file_glob_exclude': ["pants-plugins/**"],
    'config_files': {
        'discover': {
            'globs': "**/pyproject.toml"
        }
    }
  }
}"""

The user would then use byo_black in the same way as they currently use the black backend linter.

pants.toml is still responsible for defining what lint means, repo-wide, and for driving the options system, but at run-time it looks up a target to build the relevant executable.

Related prior art

adhoc_tool established a means to generically run "runnable" targets in a sandbox. Those definitions and the mechanism should be reusable here.

There is precedent for referring to targets from pants.toml. For example to specify in-repo first-party plugins for flake8.

Problems with original ByoTool proposal

One big problem with the ByoTool proposal is that it introduces yet another way to express a runnable in pants. Currently we have two methods:

  1. In plugin code by subclassing PythonToolBase, JvmToolBase, SystemBinary, TemplatedExternalTool etc... These data structures are quite complex and involve concerns like the options system. They are meant to be used by pants contributors and power users

  2. In target definitions. These are things like pex_binary, python_source, system_binary, jvm_artifact. They are intended for use by the developers whose code lives in a pants-enabled repo.

Many of the above are logically similar. For example the pex_binary target for a third party lib can accept much of the same type of data as a PythonToolBase subsystem.

The original ByoTool proposal introduces a 3rd way: a simplified data-only (no options) specification of executables for developers to use but on the plugin side. It would be much better to lean on the existing ability to specify runnables in targets.

@cburroughs
Copy link
Contributor

In the proposal, is there a reason the conf = """{ is a string as opposed to the config expanding out as "regular toml"?

I think my ideal world these definitions would be python-y things like BUILD files, but I'm not sure if there is prior art for things other than targets being in BUILD-like files and I share the hesitation to rope this -- hot cakes! -- proposal to nebulously larger changes.

@gauthamnair
Copy link
Contributor

Here is a working prototype that should be good for testing usability:

User in-repo setup

BUILD

in the BUILD files define runnable targets and then define a new code_quality_tool target that references it and adds other things needed to use it.

file(name="flake8_conf", source=".flake8")

python_requirement(
    name="flake8",
    requirements=["flake8==5.0.4"],
    resolve="byo_flake8"
)

code_quality_tool(
    name="flake8_linter",
    runnable=":flake8",   # a runnable in the sense of adhoc_tool
    runnable_dependencies=[],
    execution_dependencies=[":flake8_conf"],
    args=["--max-line-length", "100"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["pants-plugins/**"],
)

We refer to config files with regular targets in the same style as adhoc_tool

Here's an example of using a system binary that itself depends on node:

system_binary(
    name="node",
    binary_name="node",
    fingerprint=r'.*',
    fingerprint_args=["--version"],
)

system_binary(
    name="markdownlint",
    binary_name="markdownlint",
    fingerprint=r'.*',
    fingerprint_args=['--version'],
    fingerprint_dependencies=[':node']
)

code_quality_tool(
    name="markdownlint_linter",
    runnable=':markdownlint',
    runnable_dependencies=[":node"],
    file_glob_include=["**/*.md"],
    file_glob_exclude=["README.md"],
)

pants.toml

The code quality tools have to be registered with a goal, and have their options scopes named. For fmt/fix, they also need to be inserted in the correct order.

[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')",
    "pants.backend.experimental.code_quality_tool(goal='lint', target='//:markdownlint_linter', scope='markdownlint', name='Markdown Lint')",
    "pants.backend.experimental.code_quality_tool(goal='fmt', target='//:black_formatter', scope='black_tool', name='Black')",
]

The proposal extends the backend loading mechanism to be able to use the same module to define several sets of rules based on inputs.

No further configuration is required.

Usage of the code quality tools

Outwardly the installed code quality tools behave very similarly to regular ones. See https://github.com/gauthamnair/example-pants-byo-tool/tree/main#readme for what the command line input/output look like.

@thejcannon
Copy link
Member

thejcannon commented Oct 27, 2023

I'm absolutely loving it so far. This is honestly better than I'd imagined.

Only nit I have:

The proposal extends the backend loading mechanism to be able to use the same module to define several sets of rules based on inputs.

Changing backend_packages.add certainly restarts the daemon. So we likely would want to put this in an option that wouldn't restart the daemon. If ordering is important (it really only is for fix/fmt, in which case ideally ordering shouldnt matter because that's a red flag) we could probably figure out a different strawman.

So I think it's best to just assume ordering doesn't matter between backend_packages and code_quality_tools.

@cburroughs
Copy link
Contributor

I'll also echo that this is exciting!

file_glob_include=["**/*.py"],

What's the intuition I should have for how this interacts with caching? If one .py file changes, is the tool run over just that file or the whole globs?

@thejcannon
Copy link
Member

What's the intuition I should have for how this interacts with caching? If one .py file changes, is the tool run over just that file or the whole globs?

The correct thing (and I suspect the behavior as implemented in the branch) is the same as if you edited a file with any of the existing plugins: it invalidates the batch that contains the file. So you re-run the batch.

@gauthamnair
Copy link
Contributor

@cburroughs The file_glob_include only describes what files are eligible to be the subject of the code quality tool. The actual runs are done by the usual batching mechanism. You can also subselect what you run on in the same way as with existing linters and formatters. Caching should work on the batches.

@thejcannon I am a bit puzzled about the daemon-restarting bit. Normally, if we add a new linter or formatter we have to restart the daemon because it introduces all kinds of new rules, etc. We add to pants.toml and that restarts the daemon.

In the proposal above, I think we are in no worse of a situation. Adding a new linter is a one-time change to the pants.toml and will result in one restart of the daemon, which seems necessary to install the new underlying types and rules.

Changing things about the configuration of the code quality tool, like which config files it depends on, or which args should always be passed, does not require restarting the daemon since that is all in the target.

@gauthamnair
Copy link
Contributor

@cburroughs about your earlier comment, it did turn out to be possible to move more stuff into the BUILD file, leaving the bare minimum in pants.toml to define the options subsystem and rule-graph rules for each tool. These things have to be done during bootstrapping and to my understanding cannot involve querying for targets and their contents.

Fortunately there was a way to dodge the ugly conf = """{ that you were wary about.

@thejcannon
Copy link
Member

thejcannon commented Oct 27, 2023

@thejcannon I am a bit puzzled about the daemon-restarting bit. Normally, if we add a new linter or formatter we have to restart the daemon because it introduces all kinds of new rules, etc. We add to pants.toml and that restarts the daemon.

Fair enough.

So, the other awkwardness with this approach is that it would be essentially impossible to have some "higher" config remove this backend (e.g. backend_packages.remove = ) because their string is... very involved. So another reason we might want to shuffle those values into its own (like Dict) option.

@gauthamnair
Copy link
Contributor

+1 definitely interested in other ways to do it. There were some mechanical conveniences in doing it in backend_packages only, but its when it also seemed conceptually sound that it became attractive to introduce the unusual syntax:

pants.backend.experimental.code_quality_tool is a kind of backend code-generator. It stamps out what one normally uses as backends as long as it is given some templated arguments. Taking the analogy for what we would do in regular programming, we could name these in a preliminary step, and then load those as backend packages.

Something like

[GLOBAL]

generated_backend_packages = {
    flake8_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')"
    markdown_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:markdownlint_linter', scope='markdownlint', name='Markdown Lint')"
    black_formatter = "pants.backend.experimental.code_quality_tool(goal='fmt', target='//:black_formatter', scope='black_tool', name='Black')"
}

backend_packages.add = [
    "pants.backend.python",
    "flake8_linter",
    "markdown_linter",
    "black_formatter",
]

@huonw
Copy link
Contributor Author

huonw commented Oct 27, 2023

Thank you for all the experimentation! So good!

Just brainstorming: what if the BUILD target incorporated things like which goal and the name, and the pants.toml line was just the address path/to:black? With pants understanding "if it has a colon don't try to import it as Python syntax", and instead does the magic to read it from a BUILD file and deduce the goal etc from that. This would simplify the text in backend packages to allow easier addition/removal.

@gauthamnair
Copy link
Contributor

gauthamnair commented Oct 27, 2023

@huonw I am worried about

magic to read it from a BUILD file

From what I can tell, there is no precedent for reading into BUILD files, interpreting targets, etc. during options bootstrapping. For example, one usually has to load all backends to discover all target_types and build aliases before our usual method to parse BUILD files can work. Similarly, is there precedent for the rule graph's structure to depend on particular target definitions in BUILD files?

There is precedent for subsystems having references to targets that will be resolved at runtime.

Maybe another angle on this matter:

  • if we determine what goal a tool applies to by configuration in a BUILD file, that affects things like UnionMembership and the structure and definition of the rules (are we creating a LintResult or a FmtResult? and what is the return type?).
  • This means that changing this configuration has to change the rule graph.

@kaos
Copy link
Member

kaos commented Oct 28, 2023

yea, as @gauthamnair says, there's no interaction with BUILD files during options bootstrapping and loading backends. I think we'd need to simplify the bootstrapping process before considering going into a two-phase bootstrapping scenario where we can consume BUILD file targets for bootstrapping purposes. Environments does this already (the two-phase thing) but not for bootstrapping options, but, environments :)

@huonw
Copy link
Contributor Author

huonw commented Oct 29, 2023

Makes sense, thanks!

@gauthamnair
Copy link
Contributor

Friends, the positive reception is suggesting we keep pushing. I am going to split the wip PR into two different PRs:

  1. Introduce library-level functionality for CodeQualityTool. After this PR, it will be possible for a user to define code_quality_tool targets and write a very thin in-repo plugin to activate them via the lib. This is the core functionality of the proposal.

  2. A later PR will be about how one could use such functionality without having to write an in-repo plugin but through pants.toml alone. This aspect is where most of the discussion in the last few messages has been because it will involve either new pants.toml syntax or possibly some change in the bootstrapping mechanism.

@kaos
Copy link
Member

kaos commented Oct 31, 2023

@gauthamnair any reason this is "scoped" to code quality, rather than just a generic any kind of tool?

@gauthamnair
Copy link
Contributor

@kaos, code quality was some generic term that could encompass lint, fix, and fmt. I could not think of a better one (actually it was chat GPT's idea, no kidding). The idea is there is a target that defines a tool that can be used for one of these purposes. The tool cannot 'deploy' or 'publish' or anything like that.

@benjyw
Copy link
Contributor

benjyw commented Nov 1, 2023

Thanks for digging in on this @gauthamnair . Having higher level "porcelain" like this to simplify common plugin tasks is a great idea. Starting with fix/fmt/lint makes sense to me for scope. We can later add more porcelain, such as for code generation, or deploy.

@gauthamnair
Copy link
Contributor

#20135 has a CI-passing version of the library part of this functionality. Please take a look.

kaos pushed a commit that referenced this issue Nov 15, 2023
This is a library supporting the behavior proposed in
#17729 (comment)

The usage example is a bit awkward but there is a lot of variation and
discussion about the steps to make the use-case smooth. That work is
separated out to a subsequent PR. The core is here.

## Goal

Allows a user to define custom linters or formatters by:
1. defining an adhoc-tool compatible target
2. wrapping it in a new  `code_quality_tool` target
3. referring to this target from a thin in-repo plugin.

## Example Use-case

in `pants.toml`:
```python
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [

    # needed for python_requirement
    "pants.backend.python",

    # code_quality_tool's target and rules are defined in the adhoc backend
    "pants.backend.experimental.adhoc",

    # thin in-repo plugin activates a particular target as a linter
    "flake8_linter_plugin",
]
```

in `BUILD`:
```python
# a python requirement is an adhoc_tool-compatible runnable
python_requirement( 
    name="flake8",
    requirements=["flake8==5.0.4"]
)

# new target type describes how to use a runnable as a code quality tool
code_quality_tool(
    name="flake8_tool",
    runnable=":flake8",
    execution_dependencies=[":flake8_conf"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["dont_lint_me/**"],
    args=["--indent-size=2"],
)
```

and an in-repo plugin `pants-plugins/flake8_linter_plugin/register.py`
```python
from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder

def rules():
    cfg = CodeQualityToolRuleBuilder(
            goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
    )
    return cfg.rules()
```

The lib supports:
- fmt, lint and fix
- any runnable supported by adhoc_tool
- passing additional runnables that need to be available on the PATH
- passing configuration via exec dependencies
- passing args
- file-based lint/fmt/fix only (as opposed to target based - therefore
no target-level `skip_` fields)
- `skip` and `only` working similarly to regular code quality tools.

See the tests for demos.

## Full demo repo

See this branch:

https://github.com/gauthamnair/example-pants-byo-tool/tree/code-quality-tool-lib-demo


## Follow-ups

- Add the ability to access this functionality without having to write
even a thin in-repo plugin but in pants.toml directly. (as discussed in
#17729 (comment))
- DRY/factor out common bit of machinery with adhoc_tool
@gauthamnair gauthamnair self-assigned this Dec 2, 2023
@gauthamnair
Copy link
Contributor

Possibilities for a code quality tool API for wider use

#20135 Added the core machinery but only exposes it to users by writing a small in-repo plugin. Code quality tool is not yet announced and we are pondering if we want an "easier" way before announcing it.

Here are several proposals summarized from the rest of this thread. As an example we'll add flake8 ourselves rather than via the pants plugin.

Proposal: In-repo thin plugin (current state)

This is already supported and a reference for all other proposals

BUILD:

python_requirement( 
    name="flake8",
    requirements=["flake8==5.0.4"]
)

code_quality_tool(
    name="flake8_tool",
    runnable=":flake8",
    execution_dependencies=[":flake8_conf"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["dont_lint_me/**"],
    args=["--indent-size=2"],
)

pants.toml

[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
    "pants.backend.python",
    # code_quality_tool's target and rules are defined in the adhoc backend
    "pants.backend.experimental.adhoc",
    # thin in-repo plugin activates a particular target as a linter
    "flake8_linter_plugin",
]

and an in-repo plugin pants-plugins/flake8_linter_plugin/register.py

from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder

def rules():
    cfg = CodeQualityToolRuleBuilder(
            goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
    )
    return cfg.rules()

This is already supported and does not require introducing any new concepts, bits of syntax or overloading of syntax

Proposal: target as a pants.toml backend

#17729 (comment)

This is the first of several proposals that involve introducing new semantics to backend_packages. If the target had a bit more info, it seems we could refer to the target directly and the subsystem loader could be adjusted to activate it.

BUILD:

python_requirement( 
    name="flake8",
    requirements=["flake8==5.0.4"]
)

code_quality_tool(
    name="flake8_tool",
    runnable=":flake8",
    execution_dependencies=[":flake8_conf"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["dont_lint_me/**"],
    args=["--indent-size=2"],
    # These next two fields used to be in the in-repo plugin
    scope="flake8_tool",
    subsystem_name="Flake8",
    goal="lint",
)

pants.toml

[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
    "pants.backend.python",
    # needed to activate code_quality_tool
    "pants.backend.experimental.adhoc",
    # reference to the code_quality_tool
    "//:flake8_tool", 
]

The ergonomics or ease seems very clean on this. The complication is reshaping the boundary between the role of targets and the role of subsystems and bootstrapping in a pants project both in concepts and in implementation. As Andreas points out this was done in a way for environments, but loading targets during bootstrap has gotchas

Proposal: "templated" backends

#17729 (comment)

Proposes that a backend can be templated and stamped out by supplying the template arguments all within the pants.toml. A templating backend is like a regular backend except it takes keyword arguments in the entrypoints in register.py.

BUILD (same as base case):

python_requirement( 
    name="flake8",
    requirements=["flake8==5.0.4"]
)

code_quality_tool(
    name="flake8_tool",
    runnable=":flake8",
    execution_dependencies=[":flake8_conf"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["dont_lint_me/**"],
    args=["--indent-size=2"],
)

pants.toml:

[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
    "pants.backend.python",
    # code_quality_tool's target and rules are defined in the adhoc backend
    "pants.backend.experimental.adhoc",
    # stamps out a backend that uses the particular target as a linter
    "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_tool', scope='flake8', name='Flake8')",
]

There could be other forms of syntax. The particular variant above has a POC implementation here . A pro of this approach is it maintains the separation of concerns between options/bootstrapping and target-based configuration, and as such has a straightforward implementation.

A con is the introduction of templating as a concept, and the oddness of introducing something like a function call all within "" in what is otherwise a list of packages. Josh has also raised the concern that these complicated strings may be difficult to remove when a backend_packages.remove is used.

A variant of the proposal is to have separate section where stamping out of such templated backends gets done and then refer to those in backend_packages:

[GLOBAL]

generated_backend_packages = {
    flake8_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')"
}

backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.adhoc",
    "flake8_linter",
]

@kaos
Copy link
Member

kaos commented Dec 6, 2023

Of these I like the templated alternative the most, with a slight adjustment.

Instead of using a single string value for the backend and it's arguments, I'd consider exploading it onto separate option values:

[GLOBAL]

backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.adhoc",
    # using a static prefix to avoid conflict with any other plugin names now and into the future.
    "generated:flake8_linter",
]

[GLOBAL.generated_backend_packages.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"

Although the current options system don't have good support for this deeply nested option structures, it does allow arbitrary values, so the goal, target, scope etc are "free form" and backend specific so no validation done on them by pants itself. They are merely passed on as kwargs to the specific backend. Only the backend_package is required and checked by pants and excluded from the kwargs used for the backend init call.

@kaos
Copy link
Member

kaos commented Dec 6, 2023

Looking at this, trying to put on the user goggles the generated_backend_packages seems a bit awkward and "implementation detail leak-y".. how about just calling it "backend"?:

[GLOBAL]

backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.adhoc",
    # using a static prefix to avoid conflict with any other plugin names now and into the future.
    "backend:flake8_linter",
]

[GLOBAL.backend.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"

@gauthamnair
Copy link
Contributor

This is a nice idea @kaos , riffing on it further. There is a backend_package and there are also backends. Previously they have been 1-1. Every backend_package was a python package satisfying the register.py with no-arg entrypoints protocol.

Now we still have python packages with relevant code but some packages do not automatically generate a backend because they need further parameters. They can also generate several backends.

So we could have something like this:

[GLOBAL]

backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.adhoc",
     # maybe this line below is not needed
    "pants.backend.experimental.code_quality_tool_template",
]

[GLOBAL.backends.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool_template"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"

New linters would not add new lines to backend_packages, and backend_packages would still be packages. There are several options for how the system would know code_quality_tool_template is a templating backend package rather than the usual kind. We could either add new methods to the register.py protocol or we could add a new entry point like register_templated.py.

@kaos
Copy link
Member

kaos commented Dec 6, 2023

Ah, yes. I think we could leave templated backends out of backend_packages altogether, and simply have a enabled = <bool> option instead, in the GLOBAL.backends.<name> section.

@huonw
Copy link
Contributor Author

huonw commented Dec 6, 2023

The separate [GLOBAL.backend] sections seems very nice, nice idea @kaos!

I believe order of backends can matter (e.g. if a formatters/fixers need to be invoked in a particular order), so I wonder if omitting them completely from the backend_packages list isn't going to work?

@gauthamnair
Copy link
Contributor

Huon, you are right. order matters, it has to be referenced. I'll take a crack at seeing what the implementation could look like.

I have to confess that I'm starting to hear the voice saying "When in doubt, leave it out" about trying to do something more ergonomic than the thin in-repo plugin. Questioning whether it is really a blocker before go-to-market for code_quality_tool.

Hope to report back with some kind of POC impl of the [GLOBAL.backend] based approach soon.

@gauthamnair
Copy link
Contributor

@kaos @huonw Here is a POC impl. #20270 . I am really liking it actually. I thought I would be more afraid of it when it is alive, but turns out it is comforting instead. Here is the toml from the example repo that shows it in action:

[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
    "pants.backend.python",
    "pants.backend.experimental.adhoc",
    "flake8",
    "markdownlint",
    "blackfmt",
]

[GLOBAL.templated_backends.flake8]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"

[GLOBAL.templated_backends.markdownlint]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"

[GLOBAL.templated_backends.blackfmt]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "fmt"
target = "//:black_formatter"
name = "Black"

@kaos
Copy link
Member

kaos commented Dec 7, 2023

I believe order of backends can matter (e.g. if a formatters/fixers need to be invoked in a particular order), so I wonder if omitting them completely from the backend_packages list isn't going to work?

I wish us to rid this constraint. That is, aim for having the backends provide the needed information to resolve order instead of relying on the user provided configuration is in proper order. This is a sneaky gotcha.

Isn't most solved by simply applying fixers first then formatters and lastly linters?
If order of fixers/formatters matter that would be awkward but hopefully we could find a solution to that as well if needed.

@kaos
Copy link
Member

kaos commented Dec 7, 2023

@gauthamnair really nice!

For me, I'd prefer not to surface the "templating" part too much, keeping the UX for what I feel would be cleaner, I'll leave some notes for that on your draft PR.

Also, I like the way the POC interacts with backend_packages so the extra sections are merely a way to provide kwargs dynamically to any backend.

huonw pushed a commit that referenced this issue Dec 30, 2023
Before moving to step 2 of the plan described in
#17729 (comment)
, cleaning up a gross duplication of rule code that I introduced in
#20135 between `adhoc_tool` and
the new `code_quality_tool`.

This PR extracts the shared logic into the concept of a ToolRunner and a
rule to hydrate it in `adhoc_process_support`.

Both `adhoc_tool` and `code_quality_tool` have the latent idea of a tool
runner and a considerable machinery to build it. Starting from something
like
```python
@DataClass(frozen=True)
class ToolRunnerRequest:
    runnable_address_str: str
    args: tuple[str, ...]
    execution_dependencies: tuple[str, ...]
    runnable_dependencies: tuple[str, ...]
    target: Target
```
they need to assemble things like locate the actual runnable by str and
figure out what should be its base digest, args, env, etc. and also
co-locate the execution and runnable dependencies. We now capture that
information as a "runner":
```python
@DataClass(frozen=True)
class ToolRunner:
    digest: Digest
    args: tuple[str, ...]
    extra_env: Mapping[str, str]
    append_only_caches: Mapping[str, str]
    immutable_input_digests: Mapping[str, Digest]
```

After this, `adhoc_tool` and `code_quality_tool` diverge in what they do
with it. `adhoc_tool` uses this runner to generate code and
code_quality_tool uses it to run batches of lint/fmt/fix on source
files.

## Food for thought ...

It should not escape our attention that this `ToolRunner` could also be
surfaced as a Target, to be used by `adhoc_tool` and `code_quality_tool`
rather than each specifying all these fields together. It would also
help to reduce confusion when handling all the kinds of 'dependencies'
arguments that `adhoc_tool` takes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants