Allow using a shell command/adhoc tool as part of any goal (check, lint, fmt, fix, deploy, ...) #17729
Comments
cc @chrisjrn who is already waist deep in shell command stuff. |
Oh this is interesting. If I were designing this, I could imagine being able to hook into existing non-destructive goals (like I agree that hooking into |
For the short term, I feel that a good-enough approach to side-effects may already exist via
# actually do all the interesting work to change files (cacheable, etc)
experimental_shell_command(
name="make_the_changes", outputs=["generated.txt"], command="...", ...
)
# write the side-effect back (i.e. replacement for fmt/fix)
experimental_run_shell_command(
name="write_the_changes", command="cp {chroot}/path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes"], ...
)
# test that things match (i.e. replacement for the lint part of a fmt/fix goal)
experimental_test_shell_command(
name="check_the_changes", command="diff path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes", "./whatever.txt"], ...
)
That said, this isn't perfect, because every
|
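The write-back and check steps in the snippet above can be pictured outside of Pants as well. This is a minimal Python sketch, not Pants code: `write_changes` and `check_changes` are hypothetical helper names that mirror the `cp` and `diff` commands, and the file paths are placeholders.

```python
import filecmp
import shutil
from pathlib import Path


def write_changes(generated: Path, committed: Path) -> None:
    """Copy the generated file over the committed one (the fmt/fix side-effect)."""
    shutil.copyfile(generated, committed)


def check_changes(generated: Path, committed: Path) -> bool:
    """Return True when the committed file already matches the generated one (the lint part)."""
    return committed.exists() and filecmp.cmp(generated, committed, shallow=False)
```

Keeping the check separate from the write is what lets the check run as a cacheable test, while the write stays an explicit, uncached side-effect.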
Running multiple targets would definitely be an interesting use case, and you're right, we don't have that right now. 🤔 |
After some discussion, and given we now have Tools with side-effects are still interesting. I'm less happy with the idea of hacking something together with I think we'd want to implement mutating behaviour with a different goal, because the semantics are different to
Then the target spec to call Finally, having a separate goal would make it possible to string together multiple Thoughts? |
Makes sense for check (and lint)! I'm not sure I entirely understand the proposal for Questions (in decreasing order of priority, discussion of the priority in parens):
A probably-wild idea (more along the lines of your "in the long run" reference above, rather than a first step) would be getting these into
backend_packages = [
"pants.backend.python",
"pants.backend.python.lint.black",
"//path/to/mutator:mutator1", # runs after black, but before isort
"pants.backend.python.lint.isort",
]
That seems like it may require some careful circular dependency resolution: pants needs the backends in order to resolve the targets, and needs to resolve targets in order to load the backends. 😅 (This potentially generalises how |
Without really re-thinking things, there would be no way to specify which targets the mutator is to act upon, except within the target definition itself. If you need more info, I'll dump that when I'm back at my desk. |
(The crux of it is that |
Yeah I suspect this feature would take off like hotcakes. In fact ignoring the complexity of ICs and whatnot, imagine 🤯 |
@huonw I have been playing around with possibilities relating to the scoped problem of:
the idea is there is someone trying to convert to pants and they currently are running, say
The demo repo in https://github.com/gauthamnair/example-pants-byo-tool shows an API to "Bring Your Own Tool". The user makes a tiny plugin with this configuration:
confs = [
ByoLinter(
options_scope='byo_shellcheck',
name="Shellcheck",
help="A shell linter based on your installed shellcheck",
command="shellcheck",
file_extensions=[".sh"],
),
ByoLinter(
options_scope='byo_markdownlint',
name="MarkdownLint",
help="A markdown linter based on your installed markdown lint.",
command="markdownlint",
file_extensions=[".md"],
)
]
And the linters become available:
The underlying library is dynamically creating the different types and rules required for a proper implementation of LintRequest based on the user's configuration. (the lib is at #19954) It is a crude demo and much would need to be done to button it up, too many things to list here even, but the hypothesis driving the approach is:
If there is a lot of incidental complexity in activating a new tool for a goal, then we can add more tooling, but keep the responsibility on the pants/plugin/pants.toml side of the fence. The demo adds the capability as an abbreviated plugin, but it could be configured entirely within Having it in pants.toml would reduce the barrier to entry to probably around what a new user would find welcoming. With such a plugin-generation strategy, we should be able to handle |
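To make "dynamically creating the different types" concrete, here is a minimal sketch in plain Python. It is not the code from #19954: `LinterConf` and `LintRequestBase` are hypothetical stand-ins for `ByoLinter` and Pants' real `LintRequest`, and the class-naming scheme is an assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinterConf:
    """Hypothetical stand-in for the ByoLinter configuration object."""

    options_scope: str
    name: str
    command: str
    file_extensions: tuple[str, ...]


class LintRequestBase:
    """Hypothetical stand-in for the request base class a plugin would subclass."""

    tool_name: str = ""
    command: str = ""
    file_extensions: tuple[str, ...] = ()

    @classmethod
    def applies_to(cls, path: str) -> bool:
        # str.endswith accepts a tuple of suffixes
        return path.endswith(cls.file_extensions)


def build_request_type(conf: LinterConf) -> type[LintRequestBase]:
    """Create one distinct subclass per configured linter using type()."""
    class_name = "".join(p.capitalize() for p in conf.options_scope.split("_")) + "Request"
    attrs = {
        "tool_name": conf.name,
        "command": conf.command,
        "file_extensions": conf.file_extensions,
    }
    return type(class_name, (LintRequestBase,), attrs)
```

Each configuration entry yields its own class, which matters in a rule-graph setting because rules are typically keyed by type rather than by instance.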
The demo in example-pants-byo-tool is now more fleshed out and includes the following kind of API:
The user defines an in-repo plugin with a register.py containing:
confs = [
ByoTool(
goal="lint",
options_scope='byo_markdownlint',
name="MarkdownLint",
help="A markdown linter based on your installed markdown lint.",
executable=SystemBinaryExecutable("markdownlint", tools=["node"]),
file_glob_include=["**/*.md"],
file_glob_exclude=["README.md"],
),
ByoTool(
goal="lint",
options_scope='byo_flake8',
name="byo_Flake8",
help="Flake8 linter using the flake8 you have specified in a resolve",
executable=PythonToolExecutable(
main=ConsoleScript("flake8"),
requirements=["flake8>=5.0.4,<7"],
resolve="byo_flake8",
),
file_glob_include=["**/*.py"],
file_glob_exclude=["pants-plugins/**"],
),
ByoTool(
goal="fmt",
options_scope='byo_black',
name="byo_Black",
help="Black formatter using the black you have specified in a resolve",
executable=PythonToolExecutable(
main=ConsoleScript("black"),
requirements=[
"black>=22.6.0,<24",
'typing-extensions>=3.10.0.0; python_version < "3.10"',
],
resolve="byo_black",
),
file_glob_include=["**/*.py"],
file_glob_exclude=["pants-plugins/**"],
)
]
And markdownlint, flake8 and black become activated for use. Properties:
It will be straightforward to add the following features as well:
It is aspirational to be able to define this configuration within Under the hood we take the user-provided configuration and generate all the types and rules needed to wire up a plugin. It uses the regular mechanisms to do so, and involves no change to the core. It is basically code-generation. The implementation in #19954 is a proof-of-concept. |
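The `file_glob_include` and `file_glob_exclude` fields in the configuration above decide which files a tool sees. The sketch below shows one plausible reading of those globs in plain Python. It is an assumption for illustration, not the matching logic Pants or #19954 uses; `glob_to_regex` and `select_files` are hypothetical helper names.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a small glob dialect ("**/", "**", "*") into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "".join(parts)


def select_files(paths: list[str], include: list[str], exclude: list[str]) -> list[str]:
    """Keep paths that match at least one include glob and no exclude glob."""

    def matches(path: str, patterns: list[str]) -> bool:
        return any(re.fullmatch(glob_to_regex(p), path) for p in patterns)

    return [p for p in paths if matches(p, include) and not matches(p, exclude)]
```

Under this reading, `exclude=["README.md"]` only removes the top-level README, while `exclude=["pants-plugins/**"]` removes everything below that directory.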
For reference, there is a slack thread in which we've been discussing possibilities: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1696006330169339 There is also a google slides doc (should be open for read to anyone with the link) incorporating the discussion in #16412 which tackles the same issue of making linters easier to express. |
As a small but concrete example, we have several
But since they are all so small it would be absurd to make a plugin for each one. |
Proposal: Easy linters via conf + runnable targets
We could define runnable targets as usual in BUILD files:
# build-support/BUILD
python_requirement(name="black_req", requirements=["black==23.1.0"])
# a runnable
pex_binary(
name="black_bin",
script="black",
dependencies=[":black_req"],
)
These are runnables in the same sense as those compatible with adhoc_tool.
# pants.toml
[GLOBAL]
backend_packages.add = ["...", "pants.backends.byotool"]
[byotool]
conf = """{
'byo_black': {
'goal': "fmt",
'runnable': 'build-support:black_bin',  # any runnable supported by adhoc_tool
'name': "Black",
'help': "Black formatter …",
'file_glob_include': ["**/*.py"],
'file_glob_exclude': ["pants-plugins/**"],
'config_files': {
'discover': {
'globs': "**/pyproject.toml"
}
}
}
}"""
The user would then use
Related prior art
There is precedent for referring to targets from pants.toml. For example to specify in-repo first-party plugins for flake8.
Problems with original ByoTool proposal
One big problem with the ByoTool proposal is that it introduces yet another way to express a runnable in pants. Currently we have two methods:
Many of the above are logically similar. For example the The original ByoTool proposal introduces a 3rd way: a simplified data-only (no options) specification of executables for developers to use but on the plugin side. It would be much better to lean on the existing ability to specify runnables in targets. |
In the proposal, is there a reason the I think my ideal world these definitions would be python-y things like BUILD files, but I'm not sure if there is prior art for things other than targets being in BUILD-like files and I share the hesitation to rope this -- hot cakes! -- proposal to nebulously larger changes. |
Here is a working prototype that should be good for testing usability:
User in-repo setup
BUILD
In the BUILD files, define runnable targets and then define a new
We refer to config files with regular targets in the same style as Here's an example of using a system binary that itself depends on node:
pants.toml
The code quality tools have to be registered with a goal, and have their options scopes named. For fmt/fix, they also need to be inserted in the correct order.
[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')",
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:markdownlint_linter', scope='markdownlint', name='Markdown Lint')",
"pants.backend.experimental.code_quality_tool(goal='fmt', target='//:black_formatter', scope='black_tool', name='Black')",
]
The proposal extends the backend loading mechanism to be able to use the same module to define several sets of rules based on inputs. No further configuration is required.
Usage of the code quality tools
Outwardly the installed code quality tools behave very similarly to regular ones. See https://github.com/gauthamnair/example-pants-byo-tool/tree/main#readme for what the command line input/output look like. |
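A backend entry written as a function call, as in the `backend_packages` list above, has to be split into a module path and keyword arguments before anything can be loaded. The following is a minimal sketch of one way to do that with Python's `ast` module. It is not the prototype's parser; `parse_backend_spec` is a hypothetical name, and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast


def parse_backend_spec(spec: str) -> tuple[str, dict[str, object]]:
    """Split "pkg.mod(key='value', ...)" into ("pkg.mod", {"key": "value", ...}).

    A plain dotted name such as "pants.backend.python" yields empty kwargs.
    Only literal keyword arguments are accepted; nothing is evaluated as code.
    """
    node = ast.parse(spec, mode="eval").body
    if isinstance(node, ast.Call):
        if node.args:
            raise ValueError("only keyword arguments are supported")
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return ast.unparse(node.func), kwargs
    return ast.unparse(node), {}
```

Using `ast.literal_eval` keeps the configuration declarative: a value such as `target=open('x')` is rejected rather than executed.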
I'm absolutely loving it so far. This is honestly better than I'd imagined. Only nit I have:
Changing So I think it's best to just assume ordering doesn't matter between |
I'll also echo that this is exciting!
What's the intuition I should have for how this interacts with caching? If one |
The correct thing (and I suspect the behavior as implemented in the branch) is the same as if you edited a file with any of the existing plugins: it invalidates the batch that contains the file. So you re-run the batch. |
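As a rough illustration of "it invalidates the batch that contains the file", the sketch below fingerprints each batch from the tool name and the content of its files. This is a simplified model for intuition only and says nothing about how Pants actually computes cache keys; `batch_key` and `stale_batches` are hypothetical names.

```python
import hashlib


def batch_key(tool: str, files: dict[str, bytes]) -> str:
    """Fingerprint a batch from the tool name plus each file's path and content."""
    digest = hashlib.sha256(tool.encode())
    for path in sorted(files):
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def stale_batches(cache: set[str], tool: str, batches: list[dict[str, bytes]]) -> list[int]:
    """Return the indexes of batches whose fingerprint is not in the cache."""
    return [i for i, batch in enumerate(batches) if batch_key(tool, batch) not in cache]
```

In this model, editing one file changes only the fingerprint of the batch holding it, so the other batches remain cache hits.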
@cburroughs The @thejcannon I am a bit puzzled about the daemon-restarting bit. Normally, if we add a new linter or formatter we have to restart the daemon because it introduces all kinds of new rules, etc. We add to pants.toml and that restarts the daemon. In the proposal above, I think we are in no worse of a situation. Adding a new linter is a one-time change to the pants.toml and will result in one restart of the daemon, which seems necessary to install the new underlying types and rules. Changing things about the configuration of the code quality tool, like which config files it depends on, or which args should always be passed, does not require restarting the daemon since that is all in the target. |
@cburroughs about your earlier comment, it did turn out to be possible to move more stuff into the Fortunately there was a way to dodge the ugly |
Fair enough. So, the other awkwardness with this approach is that it would be essentially impossible to have some "higher" config remove this backend (e.g. |
+1 definitely interested in other ways to do it. There were some mechanical conveniences in doing it in
Something like
|
Thank you for all the experimentation! So good! Just brainstorming: what if the |
@huonw I am worried about
From what I can tell, there is no precedent for reading into BUILD files, interpreting targets, etc. during options bootstrapping. For example, one usually has to load all backends to discover all target_types and build aliases before our usual method to parse BUILD files can work. Similarly, is there precedent for the rule graph's structure to depend on particular target definitions in BUILD files? There is precedent for subsystems having references to targets that will be resolved at runtime. Maybe another angle on this matter:
|
yea, as @gauthamnair says, there's no interaction with BUILD files during options bootstrapping and loading backends. I think we'd need to simplify the bootstrapping process before considering going into a two-phase bootstrapping scenario where we can consume BUILD file targets for bootstrapping purposes. Environments does this already (the two-phase thing) but not for bootstrapping options, but, environments :) |
Makes sense, thanks! |
Friends, the positive reception is suggesting we keep pushing. I am going to split the wip PR into two different PRs:
|
@gauthamnair any reason this is "scoped" to code quality, rather than just a generic any kind of tool? |
@kaos, code quality was some generic term that could encompass lint, fix, and fmt. I could not think of a better one (actually it was chat GPT's idea, no kidding). The idea is there is a target that defines a tool that can be used for one of these purposes. The tool cannot 'deploy' or 'publish' or anything like that. |
Thanks for digging in on this @gauthamnair . Having higher level "porcelain" like this to simplify common plugin tasks is a great idea. Starting with fix/fmt/lint makes sense to me for scope. We can later add more porcelain, such as for code generation, or deploy. |
#20135 has a CI-passing version of the library part of this functionality. Please take a look. |
This is a library supporting the behavior proposed in #17729 (comment)

The usage example is a bit awkward but there is a lot of variation and discussion about the steps to make the use-case smooth. That work is separated out to a subsequent PR. The core is here.

## Goal

Allows a user to define custom linters or formatters by:

1. defining an adhoc-tool compatible target
2. wrapping it in a new `code_quality_tool` target
3. referring to this target from a thin in-repo plugin.

## Example Use-case

in `pants.toml`:

```toml
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
    # needed for python_requirement
    "pants.backend.python",
    # code_quality_tool's target and rules are defined in the adhoc backend
    "pants.backend.experimental.adhoc",
    # thin in-repo plugin activates a particular target as a linter
    "flake8_linter_plugin",
]
```

in `BUILD`:

```python
# a python requirement is an adhoc_tool-compatible runnable
python_requirement(
    name="flake8",
    requirements=["flake8==5.0.4"]
)

# new target type describes how to use a runnable as a code quality tool
code_quality_tool(
    name="flake8_tool",
    runnable=":flake8",
    execution_dependencies=[":flake8_conf"],
    file_glob_include=["**/*.py"],
    file_glob_exclude=["dont_lint_me/**"],
    args=["--indent-size=2"],
)
```

and an in-repo plugin `pants-plugins/flake8_linter_plugin/register.py`

```python
from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder


def rules():
    cfg = CodeQualityToolRuleBuilder(
        goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
    )
    return cfg.rules()
```

The lib supports:

- fmt, lint and fix
- any runnable supported by adhoc_tool
- passing additional runnables that need to be available on the PATH
- passing configuration via exec dependencies
- passing args
- file-based lint/fmt/fix only (as opposed to target based - therefore no target-level `skip_` fields)
- `skip` and `only` working similarly to regular code quality tools.

See the tests for demos.

## Full demo repo

See this branch: https://github.com/gauthamnair/example-pants-byo-tool/tree/code-quality-tool-lib-demo

## Follow-ups

- Add the ability to access this functionality without having to write even a thin in-repo plugin but in pants.toml directly. (as discussed in #17729 (comment))
- DRY/factor out common bit of machinery with adhoc_tool
Possibilities for a code quality tool API for wider use
#20135 added the core machinery but only exposes it to users by writing a small in-repo plugin. Code quality tool is not yet announced and we are pondering if we want an "easier" way before announcing it. Here are several proposals summarized from the rest of this thread. As an example we'll add flake8 ourselves rather than via the pants plugin.
Proposal: In-repo thin plugin (current state)
This is already supported and a reference for all other proposals
BUILD:
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
)
pants.toml
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# code_quality_tool's target and rules are defined in the adhoc backend
"pants.backend.experimental.adhoc",
# thin in-repo plugin activates a particular target as a linter
"flake8_linter_plugin",
]
and an in-repo plugin
from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder


def rules():
    cfg = CodeQualityToolRuleBuilder(
        goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
    )
    return cfg.rules()
This is already supported and does not require introducing any new concepts, bits of syntax or overloading of syntax
Proposal: target as a pants.toml backend
This is the first of several proposals that involve introducing new semantics to
BUILD:
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
# These next three fields used to be in the in-repo plugin
scope="flake8_tool",
subsystem_name="Flake8",
goal="lint",
)
pants.toml
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# needed to activate code_quality_tool
"pants.backend.experimental.adhoc",
# reference to the code_quality_tool
"//:flake8_tool",
]
The ergonomics or ease seems very clean on this. The complication is reshaping the boundary between the role of targets and the role of subsystems and bootstrapping in a pants project both in concepts and in implementation. As Andreas points out this was done in a way for environments, but loading targets during bootstrap has gotchas
Proposal: "templated" backends
Proposes that a backend can be templated and stamped out by supplying the template arguments all within the
BUILD (same as base case):
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
)
pants.toml:
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# code_quality_tool's target and rules are defined in the adhoc backend
"pants.backend.experimental.adhoc",
# stamps out a backend that uses the particular target as a linter
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_tool', scope='flake8', name='Flake8')",
]
There could be other forms of syntax. The particular variant above has a POC implementation here. A pro of this approach is it maintains the separation of concerns between options/bootstrapping and target-based configuration, and as such has a straightforward implementation. A con is the introduction of templating as a concept, and the oddness of introducing something like a function call all within
A variant of the proposal is to have a separate section where stamping out of such templated backends gets done and then refer to those in backend_packages:
[GLOBAL]
generated_backend_packages = {
flake8_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')"
}
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
"flake8_linter",
] |
Of these I like the templated alternative the most, with a slight adjustment. Instead of using a single string value for the backend and its arguments, I'd consider exploding it onto separate option values:
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# using a static prefix to avoid conflict with any other plugin names now and into the future.
"generated:flake8_linter",
]
[GLOBAL.generated_backend_packages.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"
Although the current options system doesn't have good support for such deeply nested option structures, it does allow arbitrary values, so the |
Looking at this, trying to put on the user goggles the
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# using a static prefix to avoid conflict with any other plugin names now and into the future.
"backend:flake8_linter",
]
[GLOBAL.backend.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8" |
This is a nice idea @kaos, riffing on it further. There is a Now we still have python packages with relevant code but some packages do not automatically generate a backend because they need further parameters. They can also generate several backends. So we could have something like this:
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# maybe this line below is not needed
"pants.backend.experimental.code_quality_tool_template",
]
[GLOBAL.backends.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool_template"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"
New linters would not add new lines to |
Ah, yes. I think we could leave templated backends out of |
The separate I believe order of backends can matter (e.g. if formatters/fixers need to be invoked in a particular order), so I wonder if omitting them completely from the |
Huon, you are right. order matters, it has to be referenced. I'll take a crack at seeing what the implementation could look like. I have to confess that I'm starting to hear the voice saying "When in doubt, leave it out" about trying to do something more ergonomic than the thin in-repo plugin. Questioning whether it is really a blocker before go-to-market for Hope to report back with some kind of POC impl of the |
@kaos @huonw Here is a POC impl: #20270. I am really liking it actually. I thought I would be more afraid of it when it is alive, but it turns out it is comforting instead. Here is the toml from the example repo that shows it in action:
[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
"flake8",
"markdownlint",
"blackfmt",
]
[GLOBAL.templated_backends.flake8]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"
[GLOBAL.templated_backends.markdownlint]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"
[GLOBAL.templated_backends.blackfmt]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "fmt"
target = "//:black_formatter"
name = "Black" |
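The lookup implied by the toml above can be sketched as follows: each `backend_packages` entry is either a regular backend module or the name of a `templated_backends` section. This is a hedged illustration in plain Python with already-parsed dicts, not the code in #20270; `resolve_backends` is a hypothetical name.

```python
def resolve_backends(
    backend_packages: list[str],
    templated_backends: dict[str, dict[str, str]],
) -> list[tuple[str, dict[str, str]]]:
    """Map each entry to (module, kwargs), preserving the order of backend_packages."""
    resolved = []
    for entry in backend_packages:
        if entry in templated_backends:
            params = dict(templated_backends[entry])  # copy so the input is not mutated
            template = params.pop("template")
            resolved.append((template, params))
        else:
            resolved.append((entry, {}))
    return resolved
```

Because the result follows `backend_packages` order, the user keeps control over where each generated formatter or fixer sits relative to the built-in ones.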
I'd like us to get rid of this constraint. That is, aim for having the backends provide the needed information to resolve order instead of relying on the user-provided configuration being in the proper order. This is a sneaky gotcha. Isn't most of it solved by simply applying fixers first, then formatters and lastly linters? |
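The "fixers first, then formatters, lastly linters" rule can be expressed as a stable sort on the goal. The sketch below only illustrates that suggestion and is not how Pants orders tools; `GOAL_ORDER` and `order_tools` are hypothetical names.

```python
# Assumed priority: lower numbers run earlier.
GOAL_ORDER = {"fix": 0, "fmt": 1, "lint": 2}


def order_tools(tools: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (name, goal) pairs by goal; sorted() is stable, so ties keep user order."""
    return sorted(tools, key=lambda tool: GOAL_ORDER[tool[1]])
```

A goal-based rule alone does not decide the order between two tools with the same goal, for example two formatters touching the same files, so some user-provided order would still be needed for those ties.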
@gauthamnair really nice! For me, I'd prefer not to surface the "templating" part too much, keeping the UX for what I feel would be cleaner, I'll leave some notes for that on your draft PR. Also, I like the way the POC interacts with |
Before moving to step 2 of the plan described in #17729 (comment), cleaning up a gross duplication of rule code that I introduced in #20135 between `adhoc_tool` and the new `code_quality_tool`.

This PR extracts the shared logic into the concept of a ToolRunner and a rule to hydrate it in `adhoc_process_support`.

Both `adhoc_tool` and `code_quality_tool` have the latent idea of a tool runner and considerable machinery to build it. Starting from something like

```python
@dataclass(frozen=True)
class ToolRunnerRequest:
    runnable_address_str: str
    args: tuple[str, ...]
    execution_dependencies: tuple[str, ...]
    runnable_dependencies: tuple[str, ...]
    target: Target
```

they need to do things like locate the actual runnable by str and figure out what should be its base digest, args, env, etc. and also co-locate the execution and runnable dependencies. We now capture that information as a "runner":

```python
@dataclass(frozen=True)
class ToolRunner:
    digest: Digest
    args: tuple[str, ...]
    extra_env: Mapping[str, str]
    append_only_caches: Mapping[str, str]
    immutable_input_digests: Mapping[str, Digest]
```

After this, `adhoc_tool` and `code_quality_tool` diverge in what they do with it. `adhoc_tool` uses this runner to generate code and `code_quality_tool` uses it to run batches of lint/fmt/fix on source files.

## Food for thought ...

It should not escape our attention that this `ToolRunner` could also be surfaced as a Target, to be used by `adhoc_tool` and `code_quality_tool` rather than each specifying all these fields together. It would also help to reduce confusion when handling all the kinds of 'dependencies' arguments that `adhoc_tool` takes.
Is your feature request related to a problem? Please describe.
Shell is a universal glue that allows augmenting pants adhoc with a lower barrier to entry (and, for simple tasks, lower maintenance burden) than writing a custom plugin. It'd be nifty to generalise the current codegen `experimental_shell_command`, run `experimental_run_shell_command` and recently test `experimental_test_shell_command` (#17640) to allow using a shell script for 'anything'.
For example, in a polyglot repo, pants may support some of the languages/tools natively, but not others, but it'd still be nice to have basic commands like `./pants lint ::` and `./pants fmt ::` work across the whole system. Additionally, it would allow faster experiments and migration of an existing repo into pants, if glue scripts (and/or makefiles, or whatever) can be reused, and switch to having `./pants` do more.
(We 'tripped' over both of these in our migration into pants, but I personally have liked pants enough to suffer through split tooling, and convince the rest of the team too 😅 )
Describe the solution you'd like
Some way to have a shell command hook into other goals. For instance:
- `dependencies` would be the input files (for file-based commands).
- […] `outputs` field could be reused for `fmt` and `fix` to write back to the codebase.
Some questions that seem relevant (although some can probably be deferred/considered potential future enhancements?):
- […] `fmt`/`fix` subsystems (e.g. if `my-formatter` touches python files, does it run before or after black)?
- […] `fmt` and `fix`? How does that work?
- […] `a.txt` and another outputs `b.txt`, the final output has both)? What happens if two invocations generate the same path (with different or identical content)?
- […] `**/*.py` (to be able to run a custom linter on all Python files in descendent directories, without writing out the path to their individual targets)?
- […] `external_tool` target #17277 (or similar) for downloadable tools
- […] `experimental_shell_command` requires building a PEX and specifying an appropriate interpreter as a tool #17405 may be relevant)
Describe alternatives you've considered
We are:
Additional context
N/A