From f01144694f4adb447c45980eaeda39f33c737179 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 13:24:14 -0700 Subject: [PATCH 1/6] WIP --- text/0000-extern-impl.md | 218 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 218 insertions(+) create mode 100644 text/0000-extern-impl.md diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md new file mode 100644 index 00000000000..9c671c5108f --- /dev/null +++ b/text/0000-extern-impl.md @@ -0,0 +1,218 @@ +- Feature Name: (fill me in with a unique ident, `my_awesome_feature`) +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a +separate crate, while still meeting the soundness guarantees currently enforced +by the orphan rule. + +# Motivation +[motivation]: #motivation + +In order to guarantee coherence, Rust currently imposes the fairly strict orphan +rule. That is: +- only a crate defining a type can have inherent implementations for that type, + or for any traits, and +- only a crate defining a trait can implement that trait for any type + +In other words, a crate is not allowed to have "third-party implementations" - +it's not allowed to add new inherent implementations to a type, nor is it +allowed to implement a foreign trait for a foreign type. + +This raises a number of problems, but this RFC's principle concern is that of +build performance and dependency graph scaling. + +For example: + +- It common for a to only use a crate for its type definitions, but not need any + of it's implementations (for example, when mentioning a type in a function + signature or another type definition). However, with the current orphan rule, + this would require waiting for the full crate to compile, including all of its + dependencies. +- When introducing a new trait, it's useful to provide implementations for other + external types. But this means the trait-definition crate takes on + depdendencies on all those downstream dependencies whether they're needed or not. + +~~What this RFC proposes is a mechanism which allows implementations to be +delegated to "trusted" or "friendly" crates (typically defined in the same +package), allowing downstream consumers to take on only the specific +dependencies they need. Furthermore a "witness" mechanism makes sure that the +coherence invariants currently enforced by the orphan rule are maintained.~~ + +Dependency graph dominator based coherence checking. + +Status Quo: +```mermaid +graph LR +MysqlUser --> DBTrait +MongoUser --> DBTrait +DBTrait --> Mysql +DBTrait --> MongoDB +DBTrait --> RocksDB +``` + +Proposed: +```mermaid +graph LR +MysqlUser --> DBMysql +MongoUser --> DBMongo +DBMysql --> Mysql +DBMysql -->|impl| DBTrait +DBMongo --> MongoDB +DBMongo -->|impl| DBTrait +DBRocks --> RocksDB +DBRocks -->|impl| DBTrait +``` +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +- defining crate and impl crate + +`--extern impl:name=name.rlib` + +- impl crate must directly depend on defining crate - re-exports don't count +- defining crate & impl crate are considered the same (eg `pub(crate)` is visible to both) +- impls in defining & impl must be coherent + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +- "Local" for coherence checking extended from "definitions in one crate" to "definitions from set of crates" +- set of crates are impls for a definition which are downstream from dominator + +Dominating crates: +- crate dominates itself (so must be internally coherent) +- impl crate dominates self and defining crate (so must be coherent) +- least dominator for a set of impl crates must guarantee coherence + +Defining crate is its own implementing crates, and must be internally coherent +(ie, status quo) +```mermaid +graph TD +D["`D +also I0`"] +``` + +Each impl crate dominates its defining crate(s) and must be coherent with them: +```mermaid +graph TD +I1 -->|impl| D +I2 -->|impl| D +In["I..n"] -->|impl| D +``` + +Least dominator of all impl crates for a given defining crate must check +coherence of impls: +```mermaid +graph TD +D["`D +Defines types/traits`"] +I1 -->|impl| D +I2 -->|impl| D +I3 -->|impl| D +In["`I..n +impls for types/traits in D`"] -->|impl| D +C0 --> I1 +C1 --> I2 +C2["`C2 +must check for I1, I2 coherence`"] +C2 --> C0 +C2 --> C1 +C3["C3 need not check"] +C3 --> C2 +C4["C4 must check I1, I2, I3 coherence"] +C4 --> C2 +C4 --> I3 +``` + +## Extra cases + +Impl crates depend on each other +```mermaid +graph TD + I1 -->|impl| D + I2 -->|impl| D + I2 --> I1 +``` +Impl crates use other crates (common) +```mermaid +graph TD + I1 -->|impl| D + I1 --> T +``` +Impl crates implement multiple definition crates +```mermaid +graph TD + I1 -->|impl| D0 + I1 -->|impl| D1 +``` +Impl crates define types and have their own impl crates +```mermaid +graph TD + I1 -->|impl| D0 + I1 -->|impl| I2 + I2 -->|impl| D0 +``` + + +This is the technical portion of the RFC. Explain the design in sufficient detail that: + +- Its interaction with other features is clear. +- It is reasonably clear how the feature would be implemented. +- Corner cases are dissected by example. + +The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. + +# Drawbacks +[drawbacks]: #drawbacks + +- Complexity +- Ecosystem split + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- Features? + - only work for small-scale projects + +# Prior art +[prior-art]: #prior-art + +Discuss prior art, both the good and the bad, in relation to this proposal. +A few examples of what this can include are: + +- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? +- For community proposals: Is this done by some other community and what were their experiences with it? +- For other teams: What lessons can we learn from what other communities have done here? +- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. + +This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. + +Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. +Please also take into consideration that rust sometimes intentionally diverges from common language features. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +Intended for close coupled crates from same origin, but *could* be used for +generic third-party impls. + +Corresponding Cargo part + - Multi-crate packages? + - Namespaces + - Some other abstraction? + +# Future possibilities +[future-possibilities]: #future-possibilities + +- Transitiive impls +- Total coherence check + - All impl crates are checked against each other for publication, even if not + used together + - Or deliberately allow for conflicting impls in different impl crates, eg + different algorithms for different use-cases for the same thing \ No newline at end of file From fbd0b59eb3fea3d1e85ea4293a7e8ad52752ff6c Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 19:59:19 -0700 Subject: [PATCH 2/6] Lots more details --- text/0000-extern-impl.md | 339 ++++++++++++++++++++++++++------------- 1 file changed, 224 insertions(+), 115 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 9c671c5108f..fe19dfdd43e 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -1,5 +1,5 @@ -- Feature Name: (fill me in with a unique ident, `my_awesome_feature`) -- Start Date: (fill me in with today's date, YYYY-MM-DD) +- Feature Name: extern_impl +- Start Date: 2023-07-17 - RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) @@ -8,7 +8,8 @@ This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. +by the orphan rule. This allows the code author to use the build system to more +flexibly partition functionality among crates. # Motivation [motivation]: #motivation @@ -20,181 +21,277 @@ rule. That is: - only a crate defining a trait can implement that trait for any type In other words, a crate is not allowed to have "third-party implementations" - -it's not allowed to add new inherent implementations to a type, nor is it -allowed to implement a foreign trait for a foreign type. +it's not allowed to add new inherent implementations to a non-local type, nor is +it allowed to implement a non-local trait for a non-local type. This raises a number of problems, but this RFC's principle concern is that of -build performance and dependency graph scaling. +dependency graph scaling and build performance. Specifically, the current orphan +rule sets a hard limit how finely implementations can be distributed amongst +crates. For example: +- When introducing a new trait, it's useful to provide implementations for other + external types. But this means the trait-definition crate takes on + depdendencies on all those downstream dependencies whether they're ultimately + needed or not. - It common for a to only use a crate for its type definitions, but not need any of it's implementations (for example, when mentioning a type in a function signature or another type definition). However, with the current orphan rule, this would require waiting for the full crate to compile, including all of its dependencies. -- When introducing a new trait, it's useful to provide implementations for other - external types. But this means the trait-definition crate takes on - depdendencies on all those downstream dependencies whether they're needed or not. -~~What this RFC proposes is a mechanism which allows implementations to be -delegated to "trusted" or "friendly" crates (typically defined in the same -package), allowing downstream consumers to take on only the specific -dependencies they need. Furthermore a "witness" mechanism makes sure that the -coherence invariants currently enforced by the orphan rule are maintained.~~ +What this RFC proposes is a new kind of dependency relationship between two +crates, in which a crate can have an "impl" dependency on a crate defining a +type or trait, and coherence is a computed over more than one crate. + +## More concrete example + +Say you've just written a new crate which defines a univeral interface to all +types of databases, called `DBTrait`. You want to make this trait useful from +the outset by implementing it for a number of existing DB crates. -Dependency graph dominator based coherence checking. +Today this means your `DBTrait` crate would need to take dependencies on every +one of those DB crates in order to do `impl DBTrait for X`. It also means that +every downstream user of `DBTrait` would also transitively gain those +dependencies. -Status Quo: +Here, `MysqlUser` ends up depending on all of `Mysql`, `MongoDB` and `RocksDB` +even though it only cares about `Mysql`. ```mermaid -graph LR -MysqlUser --> DBTrait -MongoUser --> DBTrait -DBTrait --> Mysql -DBTrait --> MongoDB -DBTrait --> RocksDB +graph TB +MysqlUser(MysqlUser) --> DBTrait --> Mysql & MongoDB & RocksDB ``` -Proposed: +With this RFC, the implementation of `DBTrait` for each of the database crates +would be distributed among multiple implementation crates. Downstream users +would only need to take a dependency on the specific implemenetations they need. + +In this example, `MysqlUser` depends only on `DBTrait` and `DBMysql`, and is +unaffected by the `DBMongo` and `DBRocks` implementations. ```mermaid -graph LR -MysqlUser --> DBMysql -MongoUser --> DBMongo +graph TB +Bin(MysqlUser) --> DBMysql +DBMysql & DBMongo & DBRocks -->|impl| DBTrait DBMysql --> Mysql -DBMysql -->|impl| DBTrait DBMongo --> MongoDB -DBMongo -->|impl| DBTrait DBRocks --> RocksDB -DBRocks -->|impl| DBTrait ``` # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -- defining crate and impl crate +This RFC defines the notion of an *defining* crate and an *implementing* crate. -`--extern impl:name=name.rlib` +We extend the `rustc` `--extern` flag with an additional `impl` flag: +`--extern impl:dbtrait=dbtrait.rlib`. This means that *this* crate (the +*implementing* crate), that rustc is currently compiling, is allowed to +implement traits and methods for types defined in the `dbtrait` crate (the +*defining* crate). -- impl crate must directly depend on defining crate - re-exports don't count -- defining crate & impl crate are considered the same (eg `pub(crate)` is visible to both) -- impls in defining & impl must be coherent +For example: +```rust +//! Defining crate +trait DBTrait {} +``` +```rust +//! Implementation crate, also depends on the `mysql` crate. +use mysql::MySql; -# Reference-level explanation -[reference-level-explanation]: #reference-level-explanation +// Allowed because we're depending on `dbtrait` with `impl` +impl dbtrait::DBTrait for MySql {} +``` -- "Local" for coherence checking extended from "definitions in one crate" to "definitions from set of crates" -- set of crates are impls for a definition which are downstream from dominator +In other words, the implementing crate is considered the same as the defining +crate as far as the orphan rule goes. Note that you may only implement traits +and types for crates directly defined in the defining crate — you cannot +implement re-exported types. -Dominating crates: -- crate dominates itself (so must be internally coherent) -- impl crate dominates self and defining crate (so must be coherent) -- least dominator for a set of impl crates must guarantee coherence +The implementing crate is also considered the same as the defining crate from a +visibility perspective - it will be able to access `pub(crate)` modules, types, +fields, methods, etc in the defining crate (though in this case including +re-exports). -Defining crate is its own implementing crates, and must be internally coherent -(ie, status quo) -```mermaid -graph TD -D["`D -also I0`"] -``` +The defining and implementing crates are not the same in other respects. They +still have distinct crate names, defining their own crate-level namespaces. For +example, if they both defined a type `Foo`, it would be two definitions +`defining::Foo` and `implementing::Foo`. + +There are no other constraints on the defining and implementing crates - they +can freely define new traits, types and other items, and implementations for +those types and traits. An implementing crate can be a defining crate for +another crate. -Each impl crate dominates its defining crate(s) and must be coherent with them: +Regardless, the definitions within the defining crate, the implementing crate, +and between the implementing crate and defining crate must meet the current +coherence requirements. Also if there are multiple implementation crates in +scope in the dependency graph, they must also be coherent. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +[Coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is +currently defined in terms of "local" definitions - that is, items defined with +in the current crate. + +This RFC does not change the definition of Coherence, just the definition of +"local". Instead of only considering items defined in one specific crate, it +defines a set of crates whose definitions are considered "local". + +This set is defined in terms of immediate dominators in the crate dependency +graph. + +Some definitions & axioms: +- a dependency graph always has a top-level crate (typically a binary) which: + - has no dependencies + - only crates which it transitively depends on are relevant +- a "local set" of crates TODO +- an implementing crate must have a dependency relationship to a defining crate + in order to actually have implementations +- a crate always dominates itself +- a crate is the dominator of other crates if it's on the only path to + those other crates + - it's the immediate dominator if it's the closest on that path + +So given this example compiling binary crate `Binary`: +```mermaid +graph TB + Bin(Binary) --> A & B + A & B -->|impl| C + D -->|impl| C + ``` +- When compiling `C`, because it dominates itself, `rustc` must check that + it is internally coherent (this is identical to today) +- When compiling `A`, which dominates itself and `C`, `rustc` must check `A` is + internally coherent and coherence between `A` and `C`. Likewise `B` with + respect to `C`.` +- `Binary` dominates both `A` and `B`, so when compiling it `rustc` must check + for coherence between them. It doesn't need to worry about coherence between + `A` and `C`, nor `B` and `C`, since that's already been checked. +- `D` is not relevant to any of this, as it is not in `Binary`'s transitive + dependencies, and its coherence is not checked. + +In a more complex example: +```mermaid +graph TB + Bin(Binary) --> A & C + A --> D & E + C & D & E -->|impl| F + ``` + - `F` must be internally coherent, likewise `D` & `E` with respect to `F`. + - Compiling `A` must check for coherence between `D` and `E` with respect to `F` + - Compiling `Binary` must check for coherence between `C`, `D` and `E` (which + may be able to take advantage of the knowledge that `D` and `E` are already + coherent because `A` already checked it.) + +## Changes to coherence checking + +In the simple case, when checking the coherence between an implementing crate +and the defining crate, definitions in both the defining and implementing crate +are considered "local". Aside from that, the algorithm then proceeds as before. +Here, implementing crates `A`, `B` and `C` are each considered local to defining crate `D`, but not to each other. ```mermaid graph TD -I1 -->|impl| D -I2 -->|impl| D -In["I..n"] -->|impl| D + A & B & C --> D ``` -Least dominator of all impl crates for a given defining crate must check -coherence of impls: +When checking the coherence of multiple implementing crates, we must be careful +with the definition of "local". Since the implementing crates don't depend on +each other, and can't have implementations for each other's types, we can safely +consider them local to each other with respect to the defining crate `D` for the +purposes of coherence checking. + ```mermaid -graph TD -D["`D -Defines types/traits`"] -I1 -->|impl| D -I2 -->|impl| D -I3 -->|impl| D -In["`I..n -impls for types/traits in D`"] -->|impl| D -C0 --> I1 -C1 --> I2 -C2["`C2 -must check for I1, I2 coherence`"] -C2 --> C0 -C2 --> C1 -C3["C3 need not check"] -C3 --> C2 -C4["C4 must check I1, I2, I3 coherence"] -C4 --> C2 -C4 --> I3 +graph TB + Bin(Binary) --> A & B & C --> D + subgraph Local + A & B & C + end ``` ## Extra cases Impl crates depend on each other ```mermaid -graph TD - I1 -->|impl| D - I2 -->|impl| D +graph TB + I1 & I2 -->|impl| D I2 --> I1 ``` Impl crates use other crates (common) ```mermaid -graph TD +graph TB I1 -->|impl| D I1 --> T ``` Impl crates implement multiple definition crates ```mermaid -graph TD - I1 -->|impl| D0 - I1 -->|impl| D1 +graph TB + I1 -->|impl| D0 & D1 ``` Impl crates define types and have their own impl crates ```mermaid -graph TD +graph TB I1 -->|impl| D0 I1 -->|impl| I2 I2 -->|impl| D0 ``` +Duplicated checking -- `A` and `B` would both check the coherence of `I1` and +`I2` with respect to each other, since neither knows about the other. +```mermaid +graph TB +I1 & I2 -->|impl| D +Bin([Binary]) --> A & B --> I1 & I2 - -This is the technical portion of the RFC. Explain the design in sufficient detail that: - -- Its interaction with other features is clear. -- It is reasonably clear how the feature would be implemented. -- Corner cases are dissected by example. - -The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. +``` # Drawbacks [drawbacks]: #drawbacks -- Complexity -- Ecosystem split +This adds a fair amount of complexity to a delecate part of the language +semantics and to the compiler's implementation of it. I've tried to keep the +changes as light-touch as possible, but I haven't tried to prototype it yet. + +If implemented as described here, and enabled in Cargo in an unconstrainted way, +it would enabled full third-party implementations. That is, anyone could publish +a crate implementing any trait for any type. This would be fine in the small, +but at large scales it would mean that if there were two crates with conflicting +implementations, then could never appear in the same dependency graph. It means +that non-local changes to dependencies could cause downstream build breakage. +This would lead to very undesireable ecosystem-wide dynamics. As such how this +feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -- Features? - - only work for small-scale projects +In principle you could use features to achieve similar outcomes to the +motivating `DBTrait` example above - you'd define a feature for each DB +implementation, and the consumer would only enable the specific implementations +they want. + +For relatively small Cargo-defined projects, this would probably work fine. You +would lose some of the benefits (like type-only crates for use in declarations), +but it would solve the fundamental "don't want to depend on every DB" problem. + +However, because features are additive, as the dependency graphs get larger it +will tend towards having every feature enabled, so that every subgraph of the +overall dependency graph will end up depending on what everyone else depends on. +As a result, trait-defining crates like `DBTrait` become dependency bottlenecks +which end up single-threading the build. + +(This is particularly acute in cases where Cargo is primarily used as a +dependency management tool, and builds are done with another tool like Buck.) # Prior art [prior-art]: #prior-art -Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are: +...? -- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? -- For community proposals: Is this done by some other community and what were their experiences with it? -- For other teams: What lessons can we learn from what other communities have done here? -- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. +Other languages have a very weak notion of coherence. C++ doesn't enforce it at +all, and simply defines conflicting implementations as UB. Haskell allows for +ecosystem wide conflicts, which may or may not be a problem in practice. -This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. -If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. - -Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. -Please also take into consideration that rust sometimes intentionally diverges from common language features. +Rust is unique(?) in separately languages in taking such a strong stand on this, +and it's been mostly beneficial. But it would be nice to have a mechanism to +loosen the constraints on this without admitting unsoundness. # Unresolved questions [unresolved-questions]: #unresolved-questions @@ -202,17 +299,29 @@ Please also take into consideration that rust sometimes intentionally diverges f Intended for close coupled crates from same origin, but *could* be used for generic third-party impls. -Corresponding Cargo part - - Multi-crate packages? - - Namespaces - - Some other abstraction? +The big question is what this looks like from a Cargo perspective. I can think +of a few options: +- Extend the namespace proposal so that crates in the same namespace can have + `impl` relations +- Multi-crate Cargo packages, which allow internal `impl` dependencies + # Future possibilities [future-possibilities]: #future-possibilities -- Transitiive impls -- Total coherence check - - All impl crates are checked against each other for publication, even if not - used together - - Or deliberately allow for conflicting impls in different impl crates, eg - different algorithms for different use-cases for the same thing \ No newline at end of file +Allow transitive impl relations. For example: +```mermaid +graph LR + A -->|impl| B -->|impl| C +``` +Allowing `A` to implement items in `C` because of the transitive `impl` +relationship through B. I think this potentially complicates the correctness +analysis a lot though, and doesn't really have a lot of benefits over just +making `A` depend on `C` directly. + +There's a question on whether to globally enforce coherence even between crates +which are not in the same dependency graph. This could be particularly useful at +crates.io publication time to make sure the package doesn't have lurking +problems. On the other hand there might be cases where it's desireable to allow +incoherent implementations to exist; for example, two implementations of the +same API with different algorithms tuned for different use-cases. From 4d75622d9d7982bd0104a48f79f77d7d7c3da144 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 20:09:14 -0700 Subject: [PATCH 3/6] More tweaks --- text/0000-extern-impl.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index fe19dfdd43e..11de02921f0 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -212,24 +212,24 @@ graph TB Impl crates depend on each other ```mermaid -graph TB +graph LR I1 & I2 -->|impl| D I2 --> I1 ``` Impl crates use other crates (common) ```mermaid -graph TB +graph LR I1 -->|impl| D I1 --> T ``` Impl crates implement multiple definition crates ```mermaid -graph TB +graph LR I1 -->|impl| D0 & D1 ``` Impl crates define types and have their own impl crates ```mermaid -graph TB +graph LR I1 -->|impl| D0 I1 -->|impl| I2 I2 -->|impl| D0 @@ -262,7 +262,7 @@ feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -In principle you could use features to achieve similar outcomes to the +In principle you could use Cargo features to achieve similar outcomes to the motivating `DBTrait` example above - you'd define a feature for each DB implementation, and the consumer would only enable the specific implementations they want. @@ -325,3 +325,7 @@ crates.io publication time to make sure the package doesn't have lurking problems. On the other hand there might be cases where it's desireable to allow incoherent implementations to exist; for example, two implementations of the same API with different algorithms tuned for different use-cases. + +There are cases where the same coherence properties may be checked multiple +times. Perhaps these checks could be memoized with the incremental machinery to +avoid the redundancy. From 3ebc4e84f9aeaa20189517abb03225511cfbabea Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Mon, 7 Aug 2023 02:56:42 -0700 Subject: [PATCH 4/6] Add WIP alloy spec --- text/0000-extern-impl.md | 168 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 168 insertions(+) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 11de02921f0..ae74c4d6266 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -329,3 +329,171 @@ same API with different algorithms tuned for different use-cases. There are cases where the same coherence properties may be checked multiple times. Perhaps these checks could be memoized with the incremental machinery to avoid the redundancy. + +# Appendix - Alloy spec (WIP) + +--- +title: Alloy spec for Rust implementation coherency +--- + +Note: this is a literate [Alloy](https://alloytools.org/) spec. Download the +most recent version of Alloy from +https://github.com/AlloyTools/org.alloytools.alloy/releases, and see +https://alloy.readthedocs.io for documentation. + +# Alloy spec for Rust implementation coherency + +This is a simplified spec of Rust coherency checking. Rust requires that there, +are no overlapping or conflicting implementations with a complete Rust program, +as that would allow for ambiguity about which one to use. + +(This applies to all implementations, but here we're only going to consider the +subset of simple trait implementations for types, with no generic type +parameters.) + +The current solution to this is to requires that when a crate implements a trait +for a type, at least one of the type or the trait have to be "local" - ie, +defined in the same crate. + +This is simple to describe and implement, but in practice it has several downsides: +- (dependencies, no third-party impls) + +## Alloy Spec + +First we define signatures for types and traits. Both are defined in a crate: + +```alloy +sig Trait { + trait_def: one Crate +} + +sig Type { + type_def: one Crate +} +``` + +And crates themselves. Crates can depend on a set of other crates, but the +overall crate dependency graph must be acyclic. Each crate also has a relation +of which trait implementations for which types it contains. + +```alloy +sig Crate { + deps: set Crate, + impls: Trait -> Type, +} { + no this & this.^@deps -- acyclic +} +``` + +A Binary is the unique "top-level" crate which depends on all the other crates +transitively. Or to put it another way, no other crate depends on Binary. + +```alloy +one sig Binary extends Crate {} { + no @deps.this -- nothing depends on Binary + all c: Crate - Binary | c in this.^@deps -- Binary depends on everything else +} +``` + +Let's define the safety invariant we need to enforce, that every implementation +is unique. Or more precisely, for every trait/type pair, there's at most one +crate implementing it. (It's fine if nothing implements it.) + +```alloy +pred coherent_impls[crates: Crate] { + all tr: Trait, ty: Type | lone crates & impls.ty.tr +} +`````` + +This is the basic orphan rule, with a tight definition of "local": either the +type or the trait must be defined in the crate: + +```alloy +pred local_orphan_rule[crates: Crate] { + all crate: crates | + crate.impls in + (crate[trait_def] -> (crate + crate.deps)[type_def]) + + ((crate + crate.deps)[trait_def] -> crate[type_def]) +} +``` + +We can check that if `local_orphan_rule` is true for all crates, then we have +coherence for all crates. This has no counter-examples. + +```alloy +check local_coherent { + -- ie, checking local_orphan_rule on each crate implies that all crates are globally coherent + local_orphan_rule[Crate] => coherent_impls[Crate] +} +``` + +## impl dependencies + +Let's extend the orphan constraint so that the definition of "local" is extended +to immediate dependencies as well. In a way this is simpler than +`local_orphan_rule` because we no longer have to constrain either the type or +trait to be in `crate`. + +```alloy +pred dep_orphan_rule[crates: Crate] { + all crate: crates | + crate.impls in + (crate + crate.deps)[trait_def] -> (crate + crate.deps)[type_def] +} +``` + +However, this is not sufficient to maintain the invariant. This will quickly +find a counter-example with two crates with a duplicate implementation. + +```alloy +check dep_coherent_bad { + dep_orphan_rule[Crate] => coherent_impls[Crate] +} +``` + +We need to add additional constraints to maintain the invariant. First, let's +define a function which, for a given crate which defines types and/or traits, +all the crates with implementations for those definitions: +```alloy +fun impl_crates[c: Crate]: set Crate { + c[trait_def][impls.univ] + c[type_def][impls].univ +} +``` + +We can then apply the constraint in the most general way: for all dependencies +of Binary, all the crates implementing anything must be coherent. We'll ignore +that this is a tautology for now, as we'll tighten this up later. + +```alloy +check dep_coherent_impl_crates { + { + dep_orphan_rule[Crate] -- redundant + all dep: Binary.*deps | + coherent_impls[impl_crates[dep]] -- tautology + } => coherent_impls[Crate] +} for 10 -- make sure there's enough objects to be interesting +``` + +Unfortunately, this doesn't correspond to how the build is actually performed in +practice. When we're compiling a crate which has definitions, we don't know +which crates will have implementations. And when we're compiling the crates with +implementations, we don't know which other crate to cross-check with for +coherence. + +The key insight is that there must be *some* crate which has both the +implementing crates in its transitive dependencies (even if it's the top-level +Binary), which means it can check for coherence when compiling that crate. + +For example, here `Use` is responsible for checking the coherence of `ImplA` and +`ImplB` in its dependencies, but `ImplC` someone else's problem, and `Other` is +not relevant. +```mermaid +graph TD + Use --> ImplA & ImplB + ImplA & ImplB & ImplC --> Defn + Use --> Other +``` + +```alloy +-- TODO check constraint for common deps +``` From 8dd90d73e5290fefb89a8d9fafb8a329e47be554 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 2 Sep 2023 18:03:10 -0700 Subject: [PATCH 5/6] Revise and clarify --- text/0000-extern-impl.md | 471 +++++++++++++++++++++++++-------------- 1 file changed, 298 insertions(+), 173 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index ae74c4d6266..385580c8f1f 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -8,42 +8,51 @@ This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. This allows the code author to use the build system to more -flexibly partition functionality among crates. +by the orphan rule. With this, the crate structure (and thus build system +dependency graph) can be decoupled from the actual implementations. # Motivation [motivation]: #motivation -In order to guarantee coherence, Rust currently imposes the fairly strict orphan -rule. That is: -- only a crate defining a type can have inherent implementations for that type, - or for any traits, and -- only a crate defining a trait can implement that trait for any type - -In other words, a crate is not allowed to have "third-party implementations" - -it's not allowed to add new inherent implementations to a non-local type, nor is -it allowed to implement a non-local trait for a non-local type. - -This raises a number of problems, but this RFC's principle concern is that of -dependency graph scaling and build performance. Specifically, the current orphan -rule sets a hard limit how finely implementations can be distributed amongst -crates. - -For example: - -- When introducing a new trait, it's useful to provide implementations for other - external types. But this means the trait-definition crate takes on - depdendencies on all those downstream dependencies whether they're ultimately - needed or not. -- It common for a to only use a crate for its type definitions, but not need any - of it's implementations (for example, when mentioning a type in a function - signature or another type definition). However, with the current orphan rule, - this would require waiting for the full crate to compile, including all of its - dependencies. - -What this RFC proposes is a new kind of dependency relationship between two -crates, in which a crate can have an "impl" dependency on a crate defining a -type or trait, and coherence is a computed over more than one crate. +Rust requires implementation coherence - that is, there's at most one +implementation of a trait for a type. This is required to: +- avoid any ambiguity about which implementation to use when compiling, and +- make sure that adding a new implementation does not change the meaning of +existing code. + +To do this, Rust implements the orphan rule, which requires that at least one of +the trait or the type are defined in the same crate as the implementation (ie is +a "local" definition). The "other" definition must be in a crate which is a +dependency, and as the dependency graph is acyclic, that "other" crate can't +possibly have a conflicting implementation because it can't have both +definitions in scope. + +The effect of this rule imposes some very hard constraints on how a developer +can factor their code into crates. For example it prevents: +- introducing a new trait and implementing it for existing types, without adding + dependencies for those types to all trait users + - more generally, refactoring to break dependency chains +- refactoring code to increase build parallelism +- adding trait implementations for types in crates whose source is generated (eg protobuf) +- splitting libstd/libcore +- multiple conflicting implementations to make different implementation + tradeoffs (eg size vs performance), or bindings to incompatible native + libraries (python 2 vs python 3) + +This RFC proposes an extension to the orphan rule which retains the coherence +invariant, while loosening the "locality" requirement for implementation. This +considerably increases implementation flexibility, at the cost of no longer +being able to enforce the invariant purely locally. + +At the rustc and language level, this mechanism is very general, and allows +almost arbitrary implementation relations between crates. However in practice +unconstrained use of this feature (ie implementing traits for types from +different crates from unrelated origins) would have large impacts on the general +Rust / crates.io ecosystem if not carefully considered. + +As a result, this RFC also proposes a fairly small Cargo extension which allows +the feature to be used within a workspace for experimentation, but leaves the +implications of publishing to crates.io to a future RFC. ## More concrete example @@ -77,16 +86,45 @@ DBMysql --> Mysql DBMongo --> MongoDB DBRocks --> RocksDB ``` -# Guide-level explanation -[guide-level-explanation]: #guide-level-explanation +# Guide-level explanation (cargo) +[guide-level-explanation]: #guide-level-explanation + +Within a workspace, you can designate an "impl" relationship between two crates. +For example, in your `my_impls/Cargo.toml` you can put: -This RFC defines the notion of an *defining* crate and an *implementing* crate. +``` +[dependency] +my_definitions = { version = "0.1", path = "../my_definitions", impl = True } +``` -We extend the `rustc` `--extern` flag with an additional `impl` flag: -`--extern impl:dbtrait=dbtrait.rlib`. This means that *this* crate (the -*implementing* crate), that rustc is currently compiling, is allowed to -implement traits and methods for types defined in the `dbtrait` crate (the -*defining* crate). +This means that your `my_impls` crate can add trait implementations using types +or traits defined in `my_definitions`, almost as if they were defined in +`my_definitions`. If you want to use those implementations in other crates, +however, you must also add a dependency to both `my_impls` and `my_definitions` +(or you could re-export the definitions in `my_impls`). + +You may only implement for types/traits actually defined in `my_definitions` +itself. Re-exports don't count. + +The dependency *must* be a path dependency to another crate defined in the same +workspace. Cargo will not allow you specify `impl = True` for other crates. + +Publishing packages with `impl = True` to crates.io is not allowed. + +# Guide-level explanation (rustc) + +In current Rust, when you specify in your build system that a crate depends on +another, this is a "use" dependency - the dependee can use definitions in the +dependant. + +This RFC defines a new dependency relationship: the "impl" dependency. The +dependee can provide implementations for types and traits defined in the +dependant. + +We extend the `rustc` `--extern` flag with an additional `impl` flag: `--extern +impl:dbtrait=dbtrait.rlib`. This means that the crate currently being compiled +with this option, is allowed to implement traits and methods for types defined +in the `dbtrait` crate (the *defining* crate). For example: ```rust @@ -106,11 +144,6 @@ crate as far as the orphan rule goes. Note that you may only implement traits and types for crates directly defined in the defining crate — you cannot implement re-exported types. -The implementing crate is also considered the same as the defining crate from a -visibility perspective - it will be able to access `pub(crate)` modules, types, -fields, methods, etc in the defining crate (though in this case including -re-exports). - The defining and implementing crates are not the same in other respects. They still have distinct crate names, defining their own crate-level namespaces. For example, if they both defined a type `Foo`, it would be two definitions @@ -133,136 +166,162 @@ scope in the dependency graph, they must also be coherent. currently defined in terms of "local" definitions - that is, items defined with in the current crate. -This RFC does not change the definition of Coherence, just the definition of -"local". Instead of only considering items defined in one specific crate, it -defines a set of crates whose definitions are considered "local". - -This set is defined in terms of immediate dominators in the crate dependency -graph. - -Some definitions & axioms: -- a dependency graph always has a top-level crate (typically a binary) which: - - has no dependencies - - only crates which it transitively depends on are relevant -- a "local set" of crates TODO -- an implementing crate must have a dependency relationship to a defining crate - in order to actually have implementations -- a crate always dominates itself -- a crate is the dominator of other crates if it's on the only path to - those other crates - - it's the immediate dominator if it's the closest on that path - -So given this example compiling binary crate `Binary`: -```mermaid -graph TB - Bin(Binary) --> A & B - A & B -->|impl| C - D -->|impl| C - ``` -- When compiling `C`, because it dominates itself, `rustc` must check that - it is internally coherent (this is identical to today) -- When compiling `A`, which dominates itself and `C`, `rustc` must check `A` is - internally coherent and coherence between `A` and `C`. Likewise `B` with - respect to `C`.` -- `Binary` dominates both `A` and `B`, so when compiling it `rustc` must check - for coherence between them. It doesn't need to worry about coherence between - `A` and `C`, nor `B` and `C`, since that's already been checked. -- `D` is not relevant to any of this, as it is not in `Binary`'s transitive - dependencies, and its coherence is not checked. - -In a more complex example: -```mermaid -graph TB - Bin(Binary) --> A & C - A --> D & E - C & D & E -->|impl| F - ``` - - `F` must be internally coherent, likewise `D` & `E` with respect to `F`. - - Compiling `A` must check for coherence between `D` and `E` with respect to `F` - - Compiling `Binary` must check for coherence between `C`, `D` and `E` (which - may be able to take advantage of the knowledge that `D` and `E` are already - coherent because `A` already checked it.) +The details of the coherence checking logic are unchanged; the only change is +the constraints on which crates a definition and an implementation can be in. + +The discussion below talks about coherence in terms of the crate dependency +relations, but the actual coherence check is at the level of individual type and +trait implementations. The crate dependency relations set an upper bound on +this, as any crate which does not appear within the relevent dependency graph +need not be considered for coherence checking. ## Changes to coherence checking -In the simple case, when checking the coherence between an implementing crate -and the defining crate, definitions in both the defining and implementing crate -are considered "local". Aside from that, the algorithm then proceeds as before. -Here, implementing crates `A`, `B` and `C` are each considered local to defining crate `D`, but not to each other. +The key problem we need to solve is how to reconcile the Rust language-level +details (types and traits) with the build system view (dependency relationship +between crates). Coherence is defined in terms of type/trait definitions and +implementations, which forms its own graph. This graph is embedded in the crate +dependency graph — that is there are no edges in the type/trait +implementation graph which are not also present in the crate dependency graph. + ```mermaid -graph TD - A & B & C --> D +graph TB + CrateC -->|dep| CrateA & CrateB + subgraph CrateA + A["Trait A"] + D["Type D"] + E["Impl A for D"] -..->|impl| A & D + end + subgraph CrateB + B["Type B"] + end + subgraph CrateC + C["Impl A for B"] -..->|impl| A & B + end ``` -When checking the coherence of multiple implementing crates, we must be careful -with the definition of "local". Since the implementing crates don't depend on -each other, and can't have implementations for each other's types, we can safely -consider them local to each other with respect to the defining crate `D` for the -purposes of coherence checking. +For the discussion below, crate B is visible to crate A if it falls within its +transitive dependencies; a crate is visible to itself. ```mermaid graph TB - Bin(Binary) --> A & B & C --> D - subgraph Local - A & B & C + subgraph A sees only A + c1["A"] --> c2["..."] + c3["B"] + end + subgraph A sees A&B + b1["A"] --> b2["..."] --> b3["B"] + end + subgraph A sees A&B + a1["A"] --> a2["B"] end ``` -## Extra cases +The general rule is when compiling a crate, `rustc` must check the coherence of +a set of implementations for a type if: +- all those implementations are visible +- no other visible crate has checked them -Impl crates depend on each other +This means the compilation of a crate must check for coherence when: +- all the definitions and implementations are within one crate +- if a crate A has an impl dependency on crate B, it must check coherency + between A and B +- if a crate has two or more crates with implementations for a given definition + within its view via distinct direct dependencies + +This last point is the most complex. For example: ```mermaid -graph LR - I1 & I2 -->|impl| D - I2 --> I1 +graph TB + T -->|view| A & B -->|impl| D + A & B --> E ``` -Impl crates use other crates (common) +Here, `T` must check `A` and `B`'s implementations are coherent with respect to +each other and definitions in `D` and `E`. It can rely that `A` and `B` are each +individually coherent with respect to `D` and `E` because that would have been +checked when `A` and `B` were compiled. + +Note that while `A` and `B` both depend +on `D` and `E`, they need only have a `impl` dependency on one of them; the `impl` +dependency simply means that `D`'s definitions are considered local to `A` and +`B` when seen in terms of the current orphan rule. + +This is OK even if `A` and `B` have different `impl` dependencies: ```mermaid -graph LR - I1 -->|impl| D - I1 --> T +graph TB + T -->|view| A & B + A -->|impl| D + A --> E + B --> D + B -->|impl| E ``` -Impl crates implement multiple definition crates +Since the groups (`A`, `D` `E`) and (`B`, `D` `E`) must be internally coherent, +and `T` still needs to check the coherence of (`A` and `B`). + +In this case: ```mermaid -graph LR - I1 -->|impl| D0 & D1 +graph TB + T --> A -->|view| B & C -->|impl| D ``` -Impl crates define types and have their own impl crates +`A` must check the coherence of `B` & `C`'s implementations with respect to `D`. +`T`, however, can rely on `A` having already checked `B` and `C`'s coherence +because both are visible through it's `A` dependency. + +A more complex example: ```mermaid -graph LR - I1 -->|impl| D0 - I1 -->|impl| I2 - I2 -->|impl| D0 +graph TB + T --> A -->|view| B & C -->|impl| D + B & C --> E + T --> C ``` -Duplicated checking -- `A` and `B` would both check the coherence of `I1` and -`I2` with respect to each other, since neither knows about the other. +`T` can rely on `A` to check the coherence of `B` and `C`, even though `T` has +its own direct dependency on `C`, because `C` is also visible through `A`. + +But in this case: ```mermaid graph TB -I1 & I2 -->|impl| D -Bin([Binary]) --> A & B --> I1 & I2 + T --> A -->|view| B & C -->|impl| E + B & C --> F + T -->|view| D -->|impl| E +``` +`T` can rely on `A` for the coherency of `B` and `C`, but it must check the +coherence of `B`, `C` and `D` with respect to `E`. +Note that this can mean that the same crates' coherency can be checked +redundantly. For example: +```mermaid +graph TB + Binary --> X & Y & Z -->|view| A & B -->|impl| C ``` +`X`, `Y` and `Z` are all the first to see the implementations of `A` and `B` +with respect to `C`, so must each do their own coherency checks, even though +`Binary` only needs for this to be performed once. Fortunately these redundant +checks can be done in parallel, but it's still a waste of CPU. See below for +some discussion about optimizing this. # Drawbacks [drawbacks]: #drawbacks This adds a fair amount of complexity to a delecate part of the language -semantics and to the compiler's implementation of it. I've tried to keep the -changes as light-touch as possible, but I haven't tried to prototype it yet. +semantics and to the compiler's implementation of it. This is opt-in, but one +would need to understand these semantics when using other people's packages +which use this feature. If implemented as described here, and enabled in Cargo in an unconstrainted way, -it would enabled full third-party implementations. That is, anyone could publish -a crate implementing any trait for any type. This would be fine in the small, -but at large scales it would mean that if there were two crates with conflicting -implementations, then could never appear in the same dependency graph. It means -that non-local changes to dependencies could cause downstream build breakage. -This would lead to very undesireable ecosystem-wide dynamics. As such how this -feature is used must be very carefully considered. +it would enable full third-party implementations. That is, anyone could publish +a package implementing any type or trait in any other package. This would be +fine at small scalle, but at large scales it would mean that if there were two +crates with conflicting implementations, then could never appear in the same +dependency graph. That is a change to private dependencies, which wouldn't +usually be considered compatibility breaking, would cause downstream build +breakage. This would lead to very undesireable ecosystem-wide dynamics. As such +how this feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -In principle you could use Cargo features to achieve similar outcomes to the +## Features + +In principle you could use Cargo "features" to achieve similar outcomes to the motivating `DBTrait` example above - you'd define a feature for each DB implementation, and the consumer would only enable the specific implementations they want. @@ -280,18 +339,62 @@ which end up single-threading the build. (This is particularly acute in cases where Cargo is primarily used as a dependency management tool, and builds are done with another tool like Buck.) +Sometimes features are used in a non-additive way, which is technically not +supported, but there's no mechanism to prevent it (or even warn about it). In +contrast, the mechanism described here does allow for separate implementation +crates with conflicting implementations, perhaps to allow for different +implementation tradeoffs, or bindings to different non-Rust implementations. + +Extensive use of features make code navigation difficult. For example, IDE +integration (VS Code + Rust Analyzer) often won't know which features are +enabled, and so will tend to dim feature-controlled code which should be +enabled, or vice versa. + +It's also hard to know which features should be enabled when generating +documentation with `rustdoc`. Not enabling features which are actually used +means that crate functionality goes undocumented, where as enabling too many +unused features can obscure the API the user actually wants to see. + # Prior art [prior-art]: #prior-art -...? +## Generally weak coherence Other languages have a very weak notion of coherence. C++ doesn't enforce it at all, and simply defines conflicting implementations as UB. Haskell allows for ecosystem wide conflicts, which may or may not be a problem in practice. -Rust is unique(?) in separately languages in taking such a strong stand on this, -and it's been mostly beneficial. But it would be nice to have a mechanism to -loosen the constraints on this without admitting unsoundness. +Rust is unique(?) in separately-compiled languages in taking such a strong stand +on this, and it's been mostly beneficial. But it would be nice to have a +mechanism to loosen the constraints on this without admitting unsoundness. + +## Interface/implementation split + +In C and C++, there's a clear separation between "header files" containing a +specification of an interface, and "source files" which contain actual +implementations. Typically a compilation unit which has a dependency on another +compilation unit will include the header files containing the interface +definitions, and can be compiled with that information alone; it does not requre +the implementation of the dependant to be compiled. + +Ideally (and often in practice) this means that the entire program can be +compiled in parallel, with only the final linking step dependent on all the +compilation units. + +The primary disadvantage of this scheme is that the header file definitions must +be manually kept in sync with the implementations, and failing to do so can +cause numerous poor outcomes from linker errors to undefined behaviour. + +Today, Rust approximates this interface/implementation split with pipelined +builds, where `rustc` will generate a metadata object containing a crates +definitions, followed by actual code generation. The dependent compilation can +start as soon as the metadata is available without having to wait for the +generated code. But this still requires metadata for the entire dependency chain +- and for many crates the metadata generation *is* the expensive part of +compilation. + +With this RFC the split can be made more explicit by having actual separate +crates for definitions and implementations. # Unresolved questions [unresolved-questions]: #unresolved-questions @@ -299,37 +402,64 @@ loosen the constraints on this without admitting unsoundness. Intended for close coupled crates from same origin, but *could* be used for generic third-party impls. -The big question is what this looks like from a Cargo perspective. I can think -of a few options: -- Extend the namespace proposal so that crates in the same namespace can have - `impl` relations -- Multi-crate Cargo packages, which allow internal `impl` dependencies +The big question is what this looks like from a crates.io perspective. If we +could guarantee that `cargo publish` published an entire workspace atomically, +then # Future possibilities [future-possibilities]: #future-possibilities -Allow transitive impl relations. For example: -```mermaid -graph LR - A -->|impl| B -->|impl| C -``` -Allowing `A` to implement items in `C` because of the transitive `impl` -relationship through B. I think this potentially complicates the correctness -analysis a lot though, and doesn't really have a lot of benefits over just -making `A` depend on `C` directly. - -There's a question on whether to globally enforce coherence even between crates -which are not in the same dependency graph. This could be particularly useful at -crates.io publication time to make sure the package doesn't have lurking -problems. On the other hand there might be cases where it's desireable to allow -incoherent implementations to exist; for example, two implementations of the -same API with different algorithms tuned for different use-cases. +## Support intrinsic implementations + +This RFC focuses on implementing traits for types, since this is the common pain +point. In principle everything described above will also work for intrinsic +implementations, but there's some awkward secondary questions. For example, +intrinsic implementations tend to access the private fields of structs; does +that mean we need add or redefine visibility rules? + +## "Global" third-party implementations + +If this mechanism were broadly enabled for the crates.io ecosystem, it could +cause undesireably ecosystem splits. For example if you have type-defining +package `some_type` and trait defining package `some_trait`, you could have two +separate crates `A` & `B` implementing `some_trait` for `some_type`. If your +package `my_package` contains just one of `A` or `B` in its dependencies, then +all is OK. But at any point, with only a minor patch update, any package could +add the other to its dependencies, preventing your package from compiling. This +kind of brittleness is highly undesireable. + +So is there some way to enable third-party these kinds of third-party +implementation crates in a more general way without introducing this kind of +ecosystem brittleness? + +The [namespacing RFC](https://github.com/Manishearth/namespacing-rfc) +([PR)(https://github.com/rust-lang/rfcs/pull/3243)]) is also looking at +questions relating the the overall crates.io ecosystem. One could imagine a +proposal, for example, that packages within the same namespace could have this +`impl` relationship. But whether that makes sense really depends on the intended +semantics of namespaces. + +## Cached coherence checks There are cases where the same coherence properties may be checked multiple -times. Perhaps these checks could be memoized with the incremental machinery to +times, if the same implementing crates are dependencies of multiple unrelated +crates. Perhaps these checks could be memoized with the incremental machinery to avoid the redundancy. + +## Auto-split crates + +Modules +within crates may have cyclic dependencies, but the crate dependency graph must +be acyclic. But only in extremely pathological cases will a crate's internal +dependency graph consist of a single strongly connected component. + +This opens the possibility of using the mechanism described in this RFC to +automatically partition a crate into sub-crates in order to expose more +built-time parallelism. This could either be performed inline during the build +or some tooling to perform pre-build processing on the crates. + # Appendix - Alloy spec (WIP) --- @@ -343,7 +473,9 @@ https://alloy.readthedocs.io for documentation. # Alloy spec for Rust implementation coherency -This is a simplified spec of Rust coherency checking. Rust requires that there, +(TODO: make this fully consistent with the description above.) + +This is a simplified spec of Rust coherency checking. Rust requires that there are no overlapping or conflicting implementations with a complete Rust program, as that would allow for ambiguity about which one to use. @@ -351,13 +483,6 @@ as that would allow for ambiguity about which one to use. subset of simple trait implementations for types, with no generic type parameters.) -The current solution to this is to requires that when a crate implements a trait -for a type, at least one of the type or the trait have to be "local" - ie, -defined in the same crate. - -This is simple to describe and implement, but in practice it has several downsides: -- (dependencies, no third-party impls) - ## Alloy Spec First we define signatures for types and traits. Both are defined in a crate: From e0b22b24ee172c1fb648fa679e4b8c124cb53d47 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Wed, 6 Sep 2023 08:32:09 -0700 Subject: [PATCH 6/6] WIP address comments --- text/0000-extern-impl.md | 69 +++++++++++++++++++++++++--------------- 1 file changed, 44 insertions(+), 25 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 385580c8f1f..305bd06ae02 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -6,10 +6,11 @@ # Summary [summary]: #summary -This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a -separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. With this, the crate structure (and thus build system -dependency graph) can be decoupled from the actual implementations. +This RFC proposes an opt-in mechanism to relax some of the restrictions of the +orphan rule, while maintaining the soundness of implementation coherence. It +also proposes a minimal Cargo extension to allow this feature to be exercised. +With this, the crate structure (and thus build system dependency graph) can be +decoupled from the actual implementations. # Motivation [motivation]: #motivation @@ -109,7 +110,8 @@ itself. Re-exports don't count. The dependency *must* be a path dependency to another crate defined in the same workspace. Cargo will not allow you specify `impl = True` for other crates. -Publishing packages with `impl = True` to crates.io is not allowed. +Publishing packages with any `impl = True` dependencies to crates.io is not +allowed. # Guide-level explanation (rustc) @@ -139,20 +141,22 @@ use mysql::MySql; impl dbtrait::DBTrait for MySql {} ``` -In other words, the implementing crate is considered the same as the defining -crate as far as the orphan rule goes. Note that you may only implement traits -and types for crates directly defined in the defining crate — you cannot -implement re-exported types. +In other words, when the overlap check is considering what definitions are +"local", both the implementing crate and the defining crate both count. Note +that you may only implement traits and types for crates directly defined in the +defining crate — you cannot implement re-exported types. The defining and implementing crates are not the same in other respects. They still have distinct crate names, defining their own crate-level namespaces. For example, if they both defined a type `Foo`, it would be two definitions `defining::Foo` and `implementing::Foo`. -There are no other constraints on the defining and implementing crates - they -can freely define new traits, types and other items, and implementations for -those types and traits. An implementing crate can be a defining crate for -another crate. +The terms "implementing" and "defining" describe a property of the dependency +relationship between two crates, not an intrinsic property of the crates +themselves. There are no other constraints on the defining and implementing +crates - they can freely define new traits, types and other items, and +implementations for those types and traits. An implementing crate can be a +defining crate for another crate. Regardless, the definitions within the defining crate, the implementing crate, and between the implementing crate and defining crate must meet the current @@ -162,18 +166,27 @@ scope in the dependency graph, they must also be coherent. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -[Coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is -currently defined in terms of "local" definitions - that is, items defined with -in the current crate. +> The idea of trait +> [coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is +> that, given a trait and some set of types for its type parameters, there +> should be exactly one impl that applies. -The details of the coherence checking logic are unchanged; the only change is -the constraints on which crates a definition and an implementation can be in. +A key part of this are *overlap checks*, which check that not only are there +currently no conflicting implementations, but adding new implementations won't +invalidate or change the meaning of existing code. -The discussion below talks about coherence in terms of the crate dependency -relations, but the actual coherence check is at the level of individual type and -trait implementations. The crate dependency relations set an upper bound on -this, as any crate which does not appear within the relevent dependency graph -need not be considered for coherence checking. +These overlap checks can only be performed if all the relevant implementation +sites are known so they can be checked. This is the role of the *orphan rule*, +which has the consequence that there's only one possible crate in which contain +a given implementation. This is the restriction this RFC proposes to loosen. + +We introduce what could be called *the adoption rule*. This is an extension of +the orphan rule, in which implementations can be in more than one crate, but +some other crate adopts those implementation crates in order to apply the +overlap check across them in order to guarantee coherence. + +The actual details of the overlap check logic are unchanged; the only change is +a wider definition of what's considered a "local definition". ## Changes to coherence checking @@ -184,6 +197,12 @@ implementations, which forms its own graph. This graph is embedded in the crate dependency graph — that is there are no edges in the type/trait implementation graph which are not also present in the crate dependency graph. +Note that a blanket implementations - `impl Foo for T` - are a universal +quantifier over all types `T`, whether they're in scope or not. However if we're +interested in applying it to specific concrete types, we need only consider +those types which are in scope, which are therefore constrained by the +dependency graph. + ```mermaid graph TB CrateC -->|dep| CrateA & CrateB @@ -224,8 +243,8 @@ a set of implementations for a type if: This means the compilation of a crate must check for coherence when: - all the definitions and implementations are within one crate -- if a crate A has an impl dependency on crate B, it must check coherency - between A and B +- if a crate A has an impl dependency (ie, a direct dependency with the impl + option) on crate B, it must check coherency between A and B - if a crate has two or more crates with implementations for a given definition within its view via distinct direct dependencies