Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm] Split thirdparty/firstparty packages need finer-grained thirdparty definitions #13661

Closed
stuhood opened this issue Nov 18, 2021 · 10 comments · Fixed by #13698
Closed

[jvm] Split thirdparty/firstparty packages need finer-grained thirdparty definitions #13661

stuhood opened this issue Nov 18, 2021 · 10 comments · Fixed by #13698
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented Nov 18, 2021

It is somewhat common (two repos so far) for a firstparty package to overlap a thirdparty package (usually for a handful of classes). But having any overlap at all disables inference for the symbol, which can force adding explicit deps to lots of BUILD files.

It's possible that to resolve this we should support finer-grained package (symbol?) definitions for thirdparty packages. One option which would be sufficient for the cases I've encountered so far would be to allow for a ! syntax that skips matching for particular symbols. A thirdparty package definition could then do something like:

jvm_artifact(
  ...
  version='2.5.32',
  packages=[
    'akka.actor.testkit.**',
    # New:
    '!akka.actor.testkit.DefinedLocally',
  ],
)
@chrisjrn
Copy link
Contributor

Having given it a tiny amount of thought, I suspect we're not going to want the exclusions to be defined on jvm_artifact, but rather to change our approach to prioritising source files.

I can see in a large-ish repo where a given jvm_artifact is shared amongst projects, there's going to be potential for a lot of overrides, and those overrides aren't going to be localised to the place where the override is actually important. It may also get in the way of shading support – where a first-party/vendored source file could be needed to patch one project, but not another.

I suspect it's going to be easier for maintenance to provide overrides at a source file level. We can probably provide this by inference in most cases (certainly after source file analysis).

Perhaps java_sources/scala_sources could have a provides clause that we can drop in to say that we provide specific types where we don't/can't do inference?

My guess is that it's going to be easy enough for us to infer that a given first-party source provides a given type, and if a first-party source provides the type, then that should override a third-party package, unless a dependency is explicitly provided.

@stuhood
Copy link
Member Author

stuhood commented Nov 19, 2021

I can see in a large-ish repo where a given jvm_artifact is shared amongst projects, there's going to be potential for a lot of overrides, and those overrides aren't going to be localised to the place where the override is actually important.

Agreed in general. But a bit more color: the cases I encountered were not cases of actual collisions between symbols: rather, overlapping packages, with some symbols defined via thirdparty, and some locally. Given that, I think that I lean toward "more accurate" thirdparty definitions being preferable.

I think that if there had been actual collisions between symbols, explicitly declaring which dependency you wanted would be reasonable... or that the proposals on #13621 would allow marking one target as only visible to a particular resolve.

It may also get in the way of shading support – where a first-party/vendored source file could be needed to patch one project, but not another.

I don't think that putting this metadata on the source file helps to resolve that though... you'll still have affected all consumers by adding "provides". So in that regard, this seems like a choice between excluding in one place on the jvm_artifact (where we already an existing facility), vs including/"providing" in one place on a source target (with a new facility).

My guess is that it's going to be easy enough for us to infer that a given first-party source provides a given type, and if a first-party source provides the type, then that should override a third-party package, unless a dependency is explicitly provided.

Maybe... but it would be nice for that facility to fit generically with how we solve #13621. I think that the challenge I see with "provides" forcing something to win inference in all cases is that it doesn't allow for three different resolves having different sources of a package without each declaring them.

@chrisjrn
Copy link
Contributor

Agreed in general. But a bit more color: the cases I encountered were not cases of actual collisions between symbols: rather, overlapping packages, with some symbols defined via thirdparty, and some locally. Given that, I think that I lean toward "more accurate" thirdparty definitions being preferable.

I'm trying to feel out what a "more accurate" definition looks like -- in your akka example, is DefinedLocally something that's always going to be defined outside the artifact, and is it always going to be called DefinedLocally, or could it be something different? If it's always going to be called DefinedLocally, then it probably makes sense to have it as an exclusion, but this seems like an unrealistically narrow approach more generally -- there's nothing stopping people from defining any typename, and defining an exclusion on a jvm_artifact doesn't seem like the right place.

@stuhood
Copy link
Member Author

stuhood commented Nov 19, 2021

in your akka example, is DefinedLocally something that's always going to be defined outside the artifact, and is it always going to be called DefinedLocally, or could it be something different?

That's correct: the class is not defined in the thirdparty artifact: only locally. The thirdparty artifact otherwise defines the vast majority of the package.

there's nothing stopping people from defining any typename, and defining an exclusion on a jvm_artifact doesn't seem like the right place.

In this case, I view it as making the packages (probably a misnomer: could be symbols) defined for a jvm_artifact more precise than our mapping currently allows for. If we were actually directly indexing the jars, we would already know that these don't collide (and who knows, we might be able to do that in the future if we can avoid performance issues with it).

@chrisjrn
Copy link
Contributor

@stuhood Sorry, I musn't have been clear here. Is the class always going to be called DefinedLocally? (i.e. is calling a class DefinedLocally a thing that the library expects of you for some reason?)

Basically, my concern is that defining exclusions at the jvm_artifiact level hides the configuration for a dependency in the wrong place. If it's the case that a library often requires creating carveouts like DefinedLocally then you've got a good solution, but that seems like a strange solution to me absent that understanding.

@chrisjrn
Copy link
Contributor

Having let this stew for a bit longer:

  1. I can't see any requirement on the part of akka.actor.testkit that there be classes called DefinedLocally. I can't quite rationalise defining a specific exclusion on the jvm_artifact based on a definition that doesn't derive from the artifact itself (i.e. we're creating a type called DefinedLocally elsewhere out of choice, rather than the API for the artifact saying that DefinedLocally needs to be created). I think defining exclusions to achieve the goal that you've set out is just not going to be maintainable in practice.
  2. I think a better solution for this specific limitation is to allow given source files the ability to claim priority in cases of dependency inference. I have a few options here, and I'm not sure which to implement:
    i. We add a tie-break facility -- if a source file provides a specific type, but an artifact provides a glob, then accept the option with the more specific type definition
    ii. If a source file defines a type that clashes with a jvm_artifact, allow it to specify priority over the artifact: DefinedLocally.java can specify that it overrides akka.testkit

I personally think option ii makes the most sense, as it lines up with what you're actually doing: defining a type that should be provided instead of a given JAR. i might work, but I'm not totally sure what the tiebreaking rules should be.

@chrisjrn
Copy link
Contributor

After some discussion with @tdyas, we're going down the route of adding a provides field on java_sources/scala_sources.

This follows the principle of keeping the configuration for a piece of code as close as possible to the code itself.

If you define a type called DefinedLocally, then the configuration for that type belongs next to that type, not in a jvm_artifact that is likely to be in a completely different part of a repo. It's more likely that the person writing DefinedLocally is more familiar with the part of the repo where that type is defined than where the jvm_artifact is defined.

For larger repos, defining these overrides at the jvm_artifact level is likely to make the jvm_artifact definition complicated.

The "exclusion" approach may still be something we implement later on, but this is more likely to be useful when defining partially overlapping jvm_artifacts (e.g. guava and guava-testkit have overlapping com.google.common subpackages), but it doesn't make sense in the firstparty/thirdparty split case.

@tdyas
Copy link
Contributor

tdyas commented Nov 22, 2021

For larger repos, defining these overrides at the jvm_artifact level is likely to make the jvm_artifact definition complicated.

Agreed. Will add that, if there are multiple "DefinedLocally" implementations that conflict, then that condition will be detected when Pants builds the trie.

I also recommend calling the field experimental_provides so that we don't make a stability guarantee yet for it. This will allow us to see whether this approach is actually advantageous in actual practice.

chrisjrn pushed a commit that referenced this issue Nov 24, 2021
Adds a new field to `java_source`(`s`) and  `scala_source`(`s`) targets that allow them to strongly declare that they provide a specific JVM type. This excludes these types from being provided by third-party dep inference. This allows first-party sources to provide types that amend a third-party package.

Closes #13661
@Eric-Arellano
Copy link
Contributor

The chosen implementation makes sense to me.

If we were to store all this information on jvm_artifact, I think I would want it to be implemented as an exhaustive allowlist, rather than using an excludelist. I agree with Chris that you would be leaking those excludes from your codebase. And then that's horribly verbose to have to allowlist every symbol.

@stuhood
Copy link
Member Author

stuhood commented Dec 2, 2021

I think I would want it to be implemented as an exhaustive allowlist, rather than using an excludelist

It's already supposed to be an exhaustive allowlist because it has wildcards... the proposal was to add excludes to handle cases where you know that something doesn't exist remotely or should be ignored. I think that we ended up with a good solution for the latter case, but the former case is a bit more tenuous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants