Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Quality Tool: toml-based backend templating #20270

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

gauthamnair
Copy link
Contributor

@gauthamnair gauthamnair commented Dec 7, 2023

This demonstrates a version of the suggestion in #17729 (comment)

The PR does two things:

  • introduce a mechanism for backend templating
  • use that mechanism to provide a simple way to configure code_quality_tool backends. Also adds docs and validation for code_quality_tool

Overview/Demo

It introduces the notion within bootstrapping that backends can be templated out from a package:

[GLOBAL]
backend_packages.add = [
    "pants.backend.python",
    "flake8",
    "markdownlint",
]

[GLOBAL.templated_backends.flake8]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"

[GLOBAL.templated_backends.markdownlint]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"

Working example

See
https://github.com/gauthamnair/example-pants-byo-tool/tree/global-backends-option-demo for a repo with a working example of using this functionality to activate code quality tool without needing an in-repo plugin.

Backend templating

We adjust the meaning of backend_packages to include either:

  • a str specifying a python package to be loaded as before, containing a register.py with entrypoints that take no arguments
  • OR a str referring to a templated backend defined as a key in [GLOBAL.templated_backends]. See below how they get loaded.

Backends are loaded in order, as before.

A templated backend is configured with toml following this shape:

[GLOBAL.templated_backends.<backend_package_name>]
template = "path.to.backend.generator.module"
kwarg1 = "val1"
kwarg2 = "val2"

To load it, pants will import the python module specified in the template arg. It will call the generate function on it to produce a register-module-like object:

# pseudocode equivalent to the load_backend impl in this PR
from path.to.backend.generator.module import generate

obj = generate(<backend_package_name>, dict(kwarg1="val1", kwarg2="val2"))

Then pants will look for and call the usual backend registration entry points but on obj: obj.rules(), obj.target_types(), etc.

Code Quality Tool

PR makes these changes:

  • Code Quality Tool backends are now available via the backend template pants.backend.experimental.adhoc.code_quality_tool_backend_template. The backends it stamps out are self-contained and does not require the adhoc backend to be separately added to backend_packages
  • Added several bits of documentation, argument validation and better error messages for code quality tool users.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

src/python/pants/init/extension_loader.py Outdated Show resolved Hide resolved
src/python/pants/init/extension_loader.py Outdated Show resolved Hide resolved
Comment on lines 166 to 169
err_msg = (
f"Entrypoint {name} in {backend_module} backend template "
f"must accept {list(kwargs)} as keyword arguments: {e!r}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be improved by using inspect.signature(entrypoint) to also present the actual keyword args expected.

Would probably also be helpful to include kwargs["backend_package_alias"] in the error message, to help identify which templating section this was for, as the backend module could be in multiple of them.

src/python/pants/init/extension_loader.py Outdated Show resolved Hide resolved
src/python/pants/init/extension_loader.py Outdated Show resolved Hide resolved
@gauthamnair
Copy link
Contributor Author

@kaos I am going to try to begin adding tests here.

@gauthamnair

This comment was marked as outdated.

@gauthamnair

This comment was marked as outdated.

@gauthamnair gauthamnair changed the title [WIP] Code Quality Tool: toml-based backend templating Code Quality Tool: toml-based backend templating Dec 15, 2023
@gauthamnair gauthamnair marked this pull request as ready for review December 15, 2023 16:47
description_of_origin=f"the configured `target` for Code quality tool {request.description_of_origin}",
),
)
target = wrapped_target.target
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are only to give a better error message when a user mistypes the address of a code quality tool target.

scope=backend_package_alias,
)

return CodeQualityToolBackend(rule_builder)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main function of a backend template module. It returns an object with a similar interface as the register module objects normally used to load backends.

template = d.pop("template", None)
if not template:
raise ValueError('"template" is a required key for a templated backend')
return cls(template=cast(str, template), kwargs=FrozenDict(d))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Located in a separate file to avoid circular imports between options and extension loading. This stage of parsing logic is owned by bootstrap options.

)
module = backend_generator_module.generate(backend_package, cfg.kwargs)
module_source = f"{module} generated from {cfg.template}"
return module, module_source
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions return objects with the same an interface: module is something with zero-arg callable attributes like target_types, rules and module_source is a string describing the module for use in error messages.

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool stuff!

Happy with how this has turned out. Only thing still nagging me is the terminology. But maybe that's just me 🤷🏽 I would still prefer to see the templating bit as a generic way to "configure" any backend.. that from a user perspective all backends are equal, and some may take additional configuration.

Just to clarify my thoughts to make it easier to agree/disagree:

[GLOBAL]

backend_packages.add = [
  "some.regular.backend",
  "another-backend-variant",
  "yet.one.more",
]

[GLOBAL.backend_package_config.another-backend-variant]
module = "another.backend"  # or maybe `package = ..` or `register = ..` 🤷🏽 
flavour = "variant"
sauce = "medium"

The benefit of still having also the configured backends in the regular backend_packages list is to make it easier to manage the list as usual, to enable/disable backends--while the configuration for them could always be present.

@gauthamnair
Copy link
Contributor Author

@kaos , the way the PR is currently one does have to refer to the configured backends still in backend_packages.

I think it will be fine to change the template key for module so that the example will look like:

[GLOBAL]
backend_packages.add = [
    "pants.backend.python",
    # the below are configured/templated backends and must still be referred to in the backend_packages list.
    "flake8",
    "markdownlint",
]

[GLOBAL.templated_backends.flake8]
module = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
# not template =  and not package = because the string refers to a module not a python package.
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"

[GLOBAL.templated_backends.markdownlint]
module = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"

What do you think? I can make the s/template/module/ change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why this file is placed here, as it becomes the module name used in the configuration to invoke it. Would however suggest moving the implementation to the regular place, and importing it from there (so we still have the experimental name as the interface to use in the configuration.

from pants.backend.adhoc.code_quality_tool_backend_template import generate as generate

(the as generate part is to tell linters that we import it with the purpose of re-exporting it)

The motivation being that we then have the full history also after it graduates from experimental.

@kaos
Copy link
Member

kaos commented Dec 16, 2023

@kaos , the way the PR is currently one does have to refer to the configured backends still in backend_packages.

Yep, and I like that. 👍🏽

What do you think? I can make the s/template/module/ change.

Yea, I like module =.

[GLOBAL.templated_backends.markdownlint]

Do you have any alternative suggestions for the templated_backends option?

I like the generate() method to return the backend "module" used to get all the rules, target types etc from. But I think we could still have the generate method to be declared in a register.py module (so it would still be closer in structure/semantics to a regular backend).

So:

[GLOBAL.templated_backends.markdownlint]
module = "pants.backend.experimental.adhoc.code_quality_tool"
...

When loading backends, pants would load pants.backend.experimental.adhoc.code_quality_tool.register and call generate("markdownlint", ...) on that to get the module used to load all the rules etc from. Wdyt?

@thejcannon
Copy link
Member

I'll dive into this this week, but damn. Looks absolutely fantastic on the surface.

You should feel proud, this is very well done.

@gauthamnair
Copy link
Contributor Author

gauthamnair commented Dec 18, 2023

@kaos This comment takes your suggestion and see how to tie it together in a self-consistent way of talking/thinking about backends. What you are proposing is an interesting formulation.

Current proposal: two types of backends

What is in this PR at the time of this writing. There are two types of backends: regular and templated. We refer to their definition in code differently. One has a package with a register.py, the other points to a module with a generate inside of it.

Alternate proposal: unification of templated and regular backends

This is following your suggestion. We can propose that all backends are "generated"/"templated". The only difference with regular backends is that generate is implicit and trivial:

  • The entry point for installing any backend functionality is a python package with a register.py.
  • We consider all backends are "generated":
    • If register.py defines generate we use that callable with generate(backend_alias, kwargs)
    • If not, we consider an implicit generate that returns the loaded register.py module object itself and asserts that no kwargs were present in the configuration:
# equivalent to the implicit generate defined in every classic backend package
def generate(backend_package_alias, kwargs) -> RegisterProtocol:
    assert not kwargs, f"Backend package {__name__} does not take configuration args"
    return sys.modules[__name__]

In this view the templated_backends config section is something that applies to all backend instantiations.
I like the unification that this would make happen, now that it seems the pieces can be tied together. And we'd use package in the toml.

I'd like to wait for another reviewer/maintainer to chime in as well and if they are +1 with moving to this direction, I'll go ahead and impl.

@thejcannon
Copy link
Member

We can propose that all backends are "generated"/"templated"

Can you share a snippet of what that would look like in the toml?

@gauthamnair
Copy link
Contributor Author

Sorry, @thejcannon , what I mean by:

We can propose that all backends are "generated"/"templated"

Is just a conceptual change, but does not require any change in toml. Its just saying that internally we can manage regular backends as though they too are "generated" via an implicit generate that takes only empty kwargs.

@tgolsson
Copy link
Contributor

We adjust the meaning of backend_packages to include either:

  • a str specifying a python package to be loaded as before, containing a register.py with entrypoints that take no arguments
  • OR a str referring to a templated backend defined as a key in [GLOBAL.templated_backends]. See below how they get loaded.
[GLOBAL]
backend_packages.add = [
    "pants.backend.python",
    # the below are configured/templated backends and must still be referred to in the backend_packages list.
    "flake8",
    "markdownlint",
]

[GLOBAL.templated_backends.flake8]
module = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
# not template =  and not package = because the string refers to a module not a python package.
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"

...

This looks awesome, but I've got a conceptual worry that nags at me. These are definitely not packages in any sense that overlaps with existing usage. I'm not even sure if they'd count as backends? Conceptually, they're jazzed up code generators. I don't say that to denigrate the work done - it's awesome, and Pants is unforgivingly verbose so I'm all for simplifying the process. But there's certainly a lack of smarts that separates them from regular backends. I'm worrying that by equating them we'll (a) make teaching Pants harder, (b) have a mismatch in expectation of what they'll do, and (c) potentially set us up for a lot of work to meet those expectations?

I'm wondering if there's a "missing" component in Pants here which lives between static configuration (pants.toml) and the BUILD files. For example; in Bazel you've got WORKSPACE which looks like build files, but runs earlier and is where you'd regularly register rules. bazel also has .bazelrc, and contains settings modulated by contexts. I've not kept up enough with the previous work in this series to know why we are where we are now, so maybe this has been discussed before.

As a more general point about this whole series, I'm worried how this would interact e.g. with code generators, or native Python extensions. Can that work? How does it work? What about Rust, which requires both rustup + toolchain before it can run, and the partioning has to happen on "package" or workspace level? On the other hand -- the way Pants handles source files + dependencies is quite inflexible, so this is a good use-case for finding those limits. I'm just wondering whether we can then translate that back into gains for general "non-trivial" file based runs.

@thejcannon
Copy link
Member

I'm wondering if there's a "missing" component in Pants here which lives between static configuration (pants.toml) and the BUILD files. For example; in Bazel you've got WORKSPACE which looks like build files, but runs earlier and is where you'd regularly register rules.

(Just a drive-by thought) The "Environments" support parses the root BUILD file without macro support to find environments. It could be that if we had something akin to WORKSPACE, the environments handling would move there as well.

@kaos
Copy link
Member

kaos commented Dec 18, 2023

I'm wondering if there's a "missing" component in Pants here which lives between static configuration (pants.toml) and the BUILD files. For example; in Bazel you've got WORKSPACE which looks like build files, but runs earlier and is where you'd regularly register rules.

(Just a drive-by thought) The "Environments" support parses the root BUILD file without macro support to find environments. It could be that if we had something akin to WORKSPACE, the environments handling would move there as well.

Not just the root BUILD file.. it parses any BUILD file as required to load all defined environment targets.

@kaos
Copy link
Member

kaos commented Dec 18, 2023

We can propose that all backends are "generated"/"templated".

Yes, we can attempt to always check for generate(..) when loading a backend, but perhaps only if we have kwargs. This would allow a backend to be usable either as a regular backend or as a templated backend.

These [I assume this refers to the aliases in the backend_packages list] are definitely not packages in any sense that overlaps with existing usage. I'm not even sure if they'd count as backends? Conceptually, they're jazzed up code generators.

I agree they're not packages. What if the global option was builtin_plugins rather than backend_packages?
I think that is a better description, as it is a bit unfortunate that the current method of enabling the backends (plugins) are in an option tied to the implementation rather than the effect.

However, not being a package but rather a placeholder for a configured name that in turn points at a module to use to generate the backend to load--they indeed can be seen as jazzed up code generators, but not generic and always produce the same thing; the backend to load. So the new concept being that mixed with the regular backend packages there can now also be "backend aliases". As these aliases are named freely, it'll be up to the user to pick names that makes sense to them so I don't see this as a big problem, outweighed by the intuitive effect of being able to manipulate one single list for all "backends" to use.

@kaos
Copy link
Member

kaos commented Dec 18, 2023

But there's certainly a lack of smarts that separates them from regular backends. I'm worrying that by equating them we'll (a) make teaching Pants harder, (b) have a mismatch in expectation of what they'll do, and (c) potentially set us up for a lot of work to meet those expectations?

What smarts are lacking?

A templated backend can do precisely the same thing as a regular backend, so I fail to see what expectations there may be that we wouldn't meet?

@kaos
Copy link
Member

kaos commented Dec 18, 2023

oh, and thanks for reviewing and raising questions, it's precisely what we need to settle this :) 🙏🏽

@kaos
Copy link
Member

kaos commented Dec 18, 2023

Another shift, to stay closer to packages in backend_packages could be to instead have:

[GLOBAL]
backend_packages.add = [
    "pants.backend.python",
    # the below are configured/templated backends and must still be referred to in the backend_packages list.
    "pants.backend.experimental.adhoc.code_quality_tool",
]

[[GLOBAL.templated_backends]]
package = "pants.backend.experimental.adhoc.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"

[[GLOBAL.templated_backends]]
package = "pants.backend.experimental.adhoc.code_quality_tool"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"

So, when loading backends, we check if there are any configurations for the package in templated_backends and if so load one instance for each configuration. In case there are no configurations we load the backend as usual.

The main difference to previous is that we don't use the names in the backend_packages and thus all variants sharing a package are enabled/disabled as a single unit, using the package option as the connecting value to the backend package.

I kind of like this idea, as it keeps the backend_packages list clean.. and if a backend package is loaded without configuration that is meant to have that we it can raise an error explaining the situation (or if Pants detects this out of the box...)

@gauthamnair gauthamnair requested a review from tgolsson December 19, 2023 11:47
@gauthamnair
Copy link
Contributor Author

gauthamnair commented Dec 19, 2023

These are definitely not packages in any sense that overlaps with existing usage. I'm not even sure if they'd count as backends? Conceptually, they're jazzed up code generators.

Very much +1 that we should be careful to have coherent meanings for the words backend, backend_package or else things can be a headache and a confusion down the road. In the current version of the PR, with the backend aliases in the backend_packages list, I think they qualify as backends, if we define a backend to be:

  • backend: a set of Targets, Rules, build aliases, etc that work together to power some bit of pants end-user functionality.

I agree that they are code generators. In this case a kind of code generator for a pants plugin, creating a bunch of types and rules relating them to goals and known union bases.

As to:

I'm worried how this would interact e.g. with code generators, or native Python extensions.

The only changed introduced here or, for that matter, in the original PR - is a pattern of dynamically generating pants types and rules during the process of backend-loading.

Andreas, the issue with:

[GLOBAL]
backend_packages.add = [
    "pants.backend.python",
    # the below are configured/templated backends and must still be referred to in the backend_packages list.
    "pants.backend.experimental.adhoc.code_quality_tool",
]

is that backend order matters for things like fmt and fix. #17729 (comment) , something I constantly need to be reminded of.

We can also not do this

I'd very much like us to leave on the table the option of not introducing this new bit of .toml meaning and backend loading. As of #20135 it is already fairly straightforward to enable new code quality tools. Yes, the repo admin won't be able to just do stuff in pants.toml and BUILD files. But this is all that is needed:

pants.toml

[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
    "pants.backend.python",
    # code_quality_tool's target and rules are defined in the adhoc backend
    "pants.backend.experimental.adhoc",
    # thin in-repo plugin activates a particular target as a linter
    "flake8_linter_plugin",
]

and an in-repo plugin pants-plugins/flake8_linter_plugin/register.py

from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder

def rules():
    cfg = CodeQualityToolRuleBuilder(
            goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
    )
    return cfg.rules()

This is already supported and does not require introducing any new concepts, bits of syntax or overloading of syntax, and the user retains full control over things like ordering.

On the other hand, users do seem to have apprehension to go into writing their own in-repo plugins, and maybe that will also happen for such a microscopic one.

@kaos
Copy link
Member

kaos commented Dec 19, 2023

backend order matters for things like fmt and fix. #17729 (comment) , something I constantly need to be reminded of.

Ah, thank you for keeping that top of mind. Good point, I thought that would be solved by having the GLOBAL.templated_backends as a list, since that will give as a determined order, but I realize that you still have no control of where the regular backends go if they need to go in between the templated ones for a particular templated backend.

I still think this issue with order being significant is a dumb issue to have and ought to be resolved.. but until then we'll have to deal with it.

@tgolsson
Copy link
Contributor

But there's certainly a lack of smarts that separates them from regular backends. I'm worrying that by equating them we'll (a) make teaching Pants harder, (b) have a mismatch in expectation of what they'll do, and (c) potentially set us up for a lot of work to meet those expectations?

What smarts are lacking?

A templated backend can do precisely the same thing as a regular backend, so I fail to see what expectations there may be that we wouldn't meet?

I mentioned a few cases where I think it'd be hard to use this efficiently. By smarts I mean it cannot use any deeper knowledge about the language to run correctly -- it only runs on a glob of files (and whatever smarts are used to generate that glob). And then there's partitioning better to improve caching, and so on. That's fine on efficient tools, but this would suck horrendously for something like cargo clippy because you'd recompile everything from scratch each time you ran. And I'm not raising this point as an argument this mechanism in general, more so about the semantic mixup.

And while it's true it can do the same things, especially with more configuration options, at some point the configuration to get exactly what's needed (again: see clippy) would likely be less maintainable, definitely less shareable, and still very verbose.

Very much +1 that we should be careful to have coherent meanings for the words backend, backend_package or else things can be a headache and a confusion down the road. In the current version of the PR, with the backend aliases in the backend_packages list, I think they qualify as backends, if we define a backend to be:

backend: a set of Targets, Rules, build aliases, etc that work together to power some bit of pants end-user functionality.

I agree that they are code generators. In this case a kind of code generator for a pants plugin, creating a bunch of types and rules relating them to goals and known union bases.

Sure. I think my concern is mostly that I put a lot of emphasis on the "smarts" side, where this makes tradeoffs that makes it unfit for a bunch of use-cases. I'll concede that maybe that isn't part of the definition, semantically. I do however also maintain that there's a semantic difference between a dedicated plugin with multiple backends compared to this. That's maybe less important the package-vs-non-package distinction.

As to:

I'm worried how this would interact e.g. with code generators, or native Python extensions.

The only changed introduced here or, for that matter, in #20135 - is a pattern of dynamically generating pants types and rules during the process of backend-loading.

I'm aware, it's just my general line of reasoning around whether these count as "backends", as in -- are there footguns here that warrants keeping them more or less separate from current backends.

is that backend order matters for things like fmt and fix. #17729 (comment) , something I constantly need to be reminded of.

I much prefer the separate-but-equal approach. How big is the risk here w.r.t. ordering? Is it something we can solve in different ways? It seems like a footgun in general -- though I guess with the split approach it's unsolvable without other mechanics.

I'd very much like us to leave on the table the option of not introducing this new bit of .toml meaning and backend loading. As of #20135 it is already fairly straightforward to enable new code quality tools. Yes, the repo admin won't be able to just do stuff in pants.toml and BUILD files. But this is all that is needed:

If I had to rank my ideal workflows it'd probably be "WORKSPACE-style"/BUILD, followed by pants.toml, followed by this dummy backend model. That to me seems like sacrificing both shareability1 and simplicity.

Footnotes

  1. This is probably me being me here, as technically these could be shared like any other plugin. It just seems weird someone would go through the process of publishing a plugin with 5 lines of code. Plus that sort of defeats the point of a simple system, if the easiest way to use it is let someone else do it....

@benjyw
Copy link
Contributor

benjyw commented Feb 16, 2024

Hi all, I'm wondering what the status of this is?

@gauthamnair
Copy link
Contributor Author

@benjyw The PR is itself in a good state, but there is debate about whether this is the right move or whether what is offered here will provide value greater than its weight. I can pick it back up if the community can see a path forward. It was otherwise, in my opinion, starting to fall in the "when in doubt, leave it out" category.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants