Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Visibility" support (akin to Bazel's) #13393

Closed
joshua-cannon-techlabs opened this issue Oct 28, 2021 · 29 comments · Fixed by #17401
Closed

"Visibility" support (akin to Bazel's) #13393

joshua-cannon-techlabs opened this issue Oct 28, 2021 · 29 comments · Fixed by #17401
Assignees

Comments

@joshua-cannon-techlabs
Copy link

Is your feature request related to a problem? Please describe.

In a monorepo you will like have N services (likely containing one-or-more underlying packages) as well as M "common" packages. Logically speaking, any piece of code should be able to dip it's hand into at most 2 top-level packages: it's own and the common code. Having code from service X depend on code from service Y would be a mistake and shouldn't be mechanically allowed.

Describe the solution you'd like
Some way of specifying visibility in a BUILD file (with the default as "public" to maintain backwards-compatibility).

Initial implementation can be scoped to only supporting 2 values: "public" and "package".

For simplicity, if visibility was inherited from the nearest-declared-ancestor-dir that would limit the amount of visibility declarations to N BUILD files (one for each service, the M common packages can remain with the public default).

Additionally, this could be hidden behind a plugin to make it truly opt-in.

Describe alternatives you've considered
A plugin which enforces this at test/lint/check-time instead of visibility declarations in BUILD files.

Additional context
N/A

@asherf
Copy link
Member

asherf commented Nov 8, 2021

+1 for this.

@yoav-orca
Copy link
Contributor

This would be a great value for our codebase well! It's something that's easy to introduce by mistake and wastes a lot of time when not handled correctly

@tdimiduk
Copy link
Contributor

One thing that occurs to me would be to use something like a .gitignore model. Might as well re-use the syntax since there are already tools to parse it.

I think it would probably work fairly well to use a similar model as .gitignore that each file describes what a directory and it's subdirectories are allowed to depend on. There could be value to a .pants_allow or a .pants_disallow since in different cases allow or block lists would be more convenient. There might be a good way to allow both and just error if there are inconsistencies.

This would not be as fine grained as something in the build files, but could work pretty well without needing to define much new syntax

@stuhood
Copy link
Member

stuhood commented Nov 29, 2021

This also relates to some of the proposals in #13621.

@yoav-orca
Copy link
Contributor

I think this is a really important feature not only for governance but for stability. An example from today - we had an issue where a developer introduced a python dependency on another microservice in a code path that wasn't covered in CI.
These services are not deployed together, which blew up of course.
Visibility rules would have easily prevented this issue.

@stuhood
Copy link
Member

stuhood commented Dec 13, 2021

Visibility rules would have easily prevented this issue.

So the idea would be that ~every target in every service would declare that it was only visible to that service?

Finding a way to do this without a tremendous amount of boilerplate will be a challenge. Maybe related to #13767.

@benjyw
Copy link
Contributor

benjyw commented Dec 14, 2021

I think this is a really important feature not only for governance but for stability. An example from today - we had an issue where a developer introduced a python dependency on another microservice in a code path that wasn't covered in CI. These services are not deployed together, which blew up of course. Visibility rules would have easily prevented this issue.

Out of curiosity, why didn't dep inference pull in the dependency at the code level?

@yoav-orca
Copy link
Contributor

We can't use pants to package and even if we did the service doesn't have the permissions that the other service has.

@benjyw
Copy link
Contributor

benjyw commented Dec 14, 2021

We can't use pants to package and even if we did the service doesn't have the permissions that the other service has.

So how would this feature have helped? If Pants isn't doing the packaging then it can't enforce deps at packaging time. I guess you'd want Pants to check dependency hygiene when you run "lint" or something like that?

@joshua-cannon-techlabs
Copy link
Author

Probably when tests were run, right? The test process would die on an import error, conceivably.

@benjyw
Copy link
Contributor

benjyw commented Dec 14, 2021

Yeah that makes sense

@joshua-cannon-techlabs
Copy link
Author

On second thought, it might be good to not have to get that far before seeing the error. There might be cases where the code isn't tested, but conceivably some pants operation would run on it where the dep resolution would error. So I presume the error should be raised in dep resolution, then it's on the user to ensure Pants tries to resolve the deps one way or another.

@Eric-Arellano
Copy link
Contributor

So I presume the error should be raised in dep resolution, then it's on the user to ensure Pants tries to resolve the deps one way or another.

Agreed. I think this would happen when calculating dependencies. Which happens when doing things like check (MyPy), test, and package. In CI, we could also tell people to run ./pants dependencies :: I suppose to validate?

I doubt we want to "always no matter what" run the check. ./pants --version or ./pants --changed-since=HEAD fmt should avoid the cost of considering dependencies.

@joshua-cannon-techlabs
Copy link
Author

Another data point, dependency inference is where the "unowned dependency error" is raised.

unowned_dependency_behavior = python_infer_subsystem.unowned_dependency_behavior

@paiforsyth
Copy link
Contributor

Here is a tool that implements something like this for Python projects https://import-linter.readthedocs.io/en/stable/index.html. Could be useful to consider the types of import contracts it supports.

@benjyw
Copy link
Contributor

benjyw commented Dec 20, 2021

The contracts concept is useful!

@thejcannon
Copy link
Member

This makes me think. How would people feel about this being a lint plugin 🤔

(Thinking about it more, "unowned dependency error" could've been a lint plugin as well. It was never discussed)

@stuhood
Copy link
Member

stuhood commented May 11, 2022

Hey folks: #15373 added a ValidateDependenciesRequest @union (which will be in 2.12.x), which can be used on a target-by-target basis to validate dependencies. Someone interested in visibility could likely use that @union and a plugin Field to experiment with an API. Unlike a lint plugin (which would also work, just at a different point in the build), the validation would be applied whenever dependencies are computed.

@Eric-Arellano
Copy link
Contributor

the validation would be applied whenever dependencies are computed.

Meaning ./pants dependencies :: will always trigger it, and often ./pants lint :: and usually ./pants check test package :: will.

@kaos kaos self-assigned this Jun 10, 2022
kaos added a commit to kaos/pants that referenced this issue Jun 10, 2022
Also introduces a new `defaults` target which allows to dynamically provide "scoped" default field values.

Closes pantsbuild#13393.

[ci skip-rust]

[ci skip-build-wheels]
@cognifloyd
Copy link
Member

visibility can be thought of as a dependees assertion.

What would you call a dependencies assertion? (restrict which modules can be imported by a target and its children targets)
Using import-linter as inspiration, maybe that would be called a dependencies_contract?

@Eric-Arellano
Copy link
Contributor

visibility can be thought of as a dependees assertion.

What would you call a dependencies assertion? (restrict which modules can be imported by a target and its children targets)

I think you're right that these are different things.

From what I understand, most have been asking for the visibility / dependees assertion. What is your motivation for the dependencies assertion?

@cognifloyd
Copy link
Member

cognifloyd commented Jun 16, 2022

I have a list of 9 python modules (top-level modules I should say, there are actually quite a few other python modules in the monorepo) that are closely related. One of them is a client module (an api client library) that gets distributed via pypi for 3rd party use, and it can also be used by anything in the code base. But, it must be very lightweight in terms of dependencies, and it must not import any of the other modules (the server-side code). So, I need a way to make sure that that module is a self-contained package (except for a few 3rd party packages).

With visibility I would have to go backwards and make sure that every other module in the monorepo is <private> with exceptions for various other modules, but never the client lib. Essentially, that is spreading the boilerplate all over the place such that someone is very likely to not include it everywhere. So, a dependencies focused assertion would be more useful in this case than a dependees assertion like visibility.

@kaos
Copy link
Member

kaos commented Jun 16, 2022

Actually, I think the __defaults__ PR could be for you here @cognifloyd #15836 (if you have multiple targets that all should have this limited view of your repo, being lightweight)

Example:

# src/client/lightweight/BUILD

# Explicitly exclude any dependencies from these sources..
__defaults__(all=dict(dependencies=["!src/lib::", "!src/common/stuff::", ...]))

# The normal target stuff...

If it is just one target, then that dependencies ejection could simply be added directly to that target.

I think this ought to work.

But then again, we may need to improve any error handling here, when the inference may add a dependency that have been explicitly excluded, we should say so rather than just let things fail later. (If that's the case, I've not tried this out)

@benjyw
Copy link
Contributor

benjyw commented Jun 19, 2022

I think different repos are going to want very different ways of expression allowed/disallowed dependencies. Some will be on the dependency side, some on the dependee side, some will allowlist, some will blocklist, etc.

So I think it's best to implement this at two levels:

  1. A base API that exposes a very simple "is this dep allowed" pluggable rule and enforces it.

  2. A higher-level feature built on top of 1) , in which we make some opinionated choices about how to decide whether a dep is allowed. E.g., this is the level where we say that "visibility" is the mechanism.

That way anyone with custom needs that don't align with the choices and restrictions imposed by 2) can drop down to 1), write their own in-repo rule and plug it in.

@kaos
Copy link
Member

kaos commented Jun 19, 2022

A base API that exposes a very simple "is this dep allowed" pluggable rule and enforces it.

For the base API on the dependencies side there is the ValidateDependenciesRequest:

# Validate dependencies.
_ = await MultiGet(
Get(
ValidatedDependencies,
ValidateDependenciesRequest,
vd_request_type(vd_request_type.field_set_type.create(tgt), result), # type: ignore[misc]
)
for vd_request_type in union_membership.get(ValidateDependenciesRequest)
if vd_request_type.field_set_type.is_applicable(tgt) # type: ignore[misc]
)

which is the enabler for the visibility implementation taking shape in #15803, which then would be bullet 2 above.

For the dependee side I think that using dep excludes !!some:tgt may go a long way, esp if we support any specs and not just literal specs for those, paired with __defaults__ you are pretty close to the inversed direction of the visibility PR. (and possibly some additional checks to see if inferred deps have been explicitly excluded and log accordingly)

# visibility
__defaults__(all=dict(visibility=["src/lib/a::", "src/lib/b::"]))

# vs dep excludes
__defaults__(all=dict(dependencies=["!!src/common::"]))

N.b. I may have mixed up the dependee vs dependencies side above, it's not super clear to me which term refers to which side.

Also, the visibility check is implemented slightly backwards, as it goes from a target, and then checks if it is allowed according to each dependency's visibility rules.

@benjyw
Copy link
Contributor

benjyw commented Jun 20, 2022

Got it. so ValidateDependenciesRequest is pluggable? Seems very sensible to me.

@kaos
Copy link
Member

kaos commented Jun 20, 2022

so ValidateDependenciesRequest is pluggable?

Yah, correct:

@union
@dataclass(frozen=True)
class ValidateDependenciesRequest(Generic[FS], ABC):
"""A request to validate dependencies after they have been computed.
An implementing rule should raise an exception if dependencies are invalid.
"""
field_set_type: ClassVar[Type[FS]] # type: ignore[misc]
field_set: FS
dependencies: Addresses
@dataclass(frozen=True)
class ValidatedDependencies:
pass

@kaos
Copy link
Member

kaos commented Jun 20, 2022

Oh, I got to make a shout out to ./pants help here! :D

$ ./pants help ValidateDependenciesRequest

`pants.engine.target.ValidateDependenciesRequest` api type
----------------------------------------------------------

A request to validate dependencies after they have been computed.

An implementing rule should raise an exception if dependencies are invalid.

activated by : pants.engine.target
union members: ExplorerValidateDependenciesRequest
               PythonValidateDependenciesRequest

Include API types and rules dependency information by running `./pants help-advanced pants.engine.target.ValidateDependenciesRequest`.

@benjyw
Copy link
Contributor

benjyw commented Nov 17, 2022

Absolutely epic work from @kaos here! We've finally closed one of our most requested issues!! Now just need docs and a blog post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done