Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint to detect semantic versioning failures #374

Open
nikomatsakis opened this issue Aug 14, 2014 · 18 comments
Open

Lint to detect semantic versioning failures #374

nikomatsakis opened this issue Aug 14, 2014 · 18 comments
Labels
A-new-subcommand Area: new subcommand S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@nikomatsakis
Copy link
Contributor

It would be nice if cargo can give you warnings when you try to publish a new version of a package that obviously breaks the API of a previous version.

An important but easily overlooked part of this would be whether a type is sendable etc.

@thehydroimpulse
Copy link

Perhaps when a registry is online it could outright refuse (perhaps with an override) to push a new change when a breaking change happened and the semantic versioning has not been appropriately changed (i.e., they bumped the patch, but the change needs to bump the major).

@steveklabnik
Copy link
Member

Yeah, there has been some small amount of discussion about attempting to determine breakage as part of the metatdata the central server has.

@wycats
Copy link
Contributor

wycats commented Aug 14, 2014

@thehydroimpulse yes, that's the idea 😄

@emberian
Copy link
Member

emberian commented Jul 9, 2015

The tool should be separate from cargo, with a subcommand that shells out to it.

@conradkleinespel
Copy link

I'm very much interested in this issue. I would like to help.

Given that this issue is marked as E-hard, how can I know if I would be able contribute in a meaningful way? Would you happen to have any pointers as to what I should look into? I'm thinking clippy might be a good way to learn about lints.

Even if I turn out not to be skilled enough to actually code the thing, I'd be happy to help with testing, documenting or anything else related to this issue.

Side note @cmr: why should this be separate from cargo?

@alexcrichton
Copy link
Member

Yeah this is indeed unfortunately E-hard because it will involve some significant and independent design and implementation work. There's not much of a skeleton to go off today other than the idea that this would in theory boil down to an "AST diff" tool.

@jonas-schievink
Copy link
Contributor

I've been thinking that we could use the incremental compilation infrastructure (once it's set up): Save incr. comp. data for the old version, and check what has changed in the new version. Of course, this would still need a set of "breakage rules" that dictate which changes are OK for which kind of version bump, but at least finding changes is then done by the compiler.

@conradkleinespel
Copy link

@alexcrichton Alright, thanks.
@jonas-schievink Oh, that sounds like a good idea. I'm off to read the relevant RFC then.

@dwijnand
Copy link
Member

dwijnand commented Sep 2, 2018

Looks (unless I'm confused) like GSoC 2017 led to the creation of https://github.com/rust-lang-nursery/rust-semverver, which seeks to address this. Looks like that needs more work, in anyone here is interested.

@epage
Copy link
Contributor

epage commented Jul 25, 2022

cargo-semverver uses the metadata in an rlib plus some hard coded rules to determine what a breaking change is

cargo-breaking originally was parsing the crate using cargo expand + syn but got bogged down under the weight of re-implementing a lot of semantics. It was in the middle of switching to internal rustc APIs for the parsing.

cargo-crate-api is a PoC for using the json output from rustdoc to detect breaking changes. cargo-public-api is somewhat similar.

One approach that I've not seen used is to build on top of rust-analyzer.

To me, the ideal solution would

  • Detect breaking changes in features exposed and not just the rust api
  • Detect breaking changes in one run and take into account conditional APIs, including platform and features
  • Expose different breaking change rules as lints so people can enable/disable them as they wish, considering there is some wiggle room for breaking changes
  • Allow comparing the working dir to a git rev, path, or registry version
  • Support multiple report outputs
    • Adds, changes, and removes for including in a changelog
    • Lints of what broke
    • Potential version (since behavior changes might exist, we need to be clear to not give people a false sense of security)
  • Works on stable
    • If not, at least has a wide range of nightly versions it is compatible with

Breaking down potential data sources:

  • rust-analyzer: Not having worked with it, I can't really say how well it can meet these goals.
    • Either using its internals or by going through LSP
  • rlib:
    • I have the impression that the internal rustc APIs for introspecting on rlib's is fairly unstable, requiring a narrow range of nightlies
    • Requires recompiling for every set of features and platforms (which requires every feature combination to compile)
  • rustdoc json
    • I have the impression that there isn't a lot of churn, allowing a wide range of versions to be compatible. We can also use a parser that supports a wider range
    • I think this will give us conditional compilation annotations so we can compare them in a single run

@badboy
Copy link
Member

badboy commented Jul 25, 2022

There's also cargo-semver-checks now (cc @obi1kenobi), also based on rustdoc output.

@epage
Copy link
Contributor

epage commented Jul 25, 2022

To help cover the importance / use case for this tool (at least partly obvious but still want to cover it).

The more obvious is to prevent unintentional breaking changes like recently happened with h hit clap recently due to an auto-trait.

The less obvious is the impact on maintainers and their ability to scale up and remove toil in their development. cargo is fine making breaking changes but tracking when to bump the breaking version field is still work. We recently had a breaking change released without bumping the right field (#10803). We don't have a lot of process in place (e.g. conventionalcommits.org/) to help catch these. I feel the overhead of tracking breaking changes is one of several limiting factors for the cargo repo to containing more crates and resolving this would be a big help for the cargo team. Providing a trivial way of checking the status of all crates and helping to write the release notes would make it easier to scale up what maintainers are able to deal with much like the difference I've had in manually publishing crates and switching to cargo-release.

EDIT: It'd also shift documentation to automation, see #8736

@epage
Copy link
Contributor

epage commented Jul 25, 2022

@JohnTitor I'd be interested in hearing the rust-semverver perspective on my earlier comment, including what your thoughts are on

  • Alternative data sources like rustdoc
  • Expanding the scope to include checking the manifest for breaking changes (I don't think this is handled currently)

@ehuss
Copy link
Contributor

ehuss commented Jul 25, 2022

Regarding rust-semverver:

  • I have the impression that the internal rustc APIs for introspecting on rlib's is fairly unstable, requiring a narrow range of nightlies

There was an attempt to bring it into the rust-lang/rust repo so that it would always be in a buildable state (like clippy or rustfmt), and thus usable on stable. I think that is still an option, it just needs active people to maintain it. I think this is still an attractive option, since having the full power of the compiler to run queries (like trait and method resolution) can be useful. I do have concerns about the complexity of rust-semverver and how it is implemented. The issues on the issue tracker look somewhat concerning.

OTOH, using rustdoc JSON could be much lighter weight and easier to handle. However, it looks to have some limitations. For example, I don't see a way to inspect macro definitions with it (which would be needed to see how they change). Also, it is not clear what trajectory it has for stabilization, which would be required here. It is not clear how far away that is.

@epage
Copy link
Contributor

epage commented Jul 25, 2022

I think this is still an attractive option, since having the full power of the compiler to run queries (like trait and method resolution) can be useful.

Agreed. We would need a way though for users to skip checking some parts of the API (if it doesn't already have it) since some items will be exposed only for the sake of macros and, depending on how things are setup, the author can break those as needed.

This is also why I'm curious about using rust-analyzer for a base. It is working towards having the full power of the compiler but I'm assuming it doesn't prefer one set of conditional compilation settings but tracks all states the code can run in. If that is the case, then that would be the most ideal for such a tool.

OTOH, using rustdoc JSON
...
Also, it is not clear what trajectory it has for stabilization, which would be required here. It is not clear how far away that is.

Yes, for both approaches, becoming a rustup component (or included with one) would resolve the nightly issue.

@obi1kenobi
Copy link
Member

OTOH, using rustdoc JSON could be much lighter weight and easier to handle. However, it looks to have some limitations. For example, I don't see a way to inspect macro definitions with it (which would be needed to see how they change). Also, it is not clear what trajectory it has for stabilization, which would be required here. It is not clear how far away that is.

It's true that rustdoc JSON is somewhat limited — another thing it can't be used to check (AFAICT) is re-exports, since the JSON format doesn't seem to mention them at all: obi1kenobi/cargo-semver-checks#2

On the flip side, it seems that docs.rs is considering hosting the rustdoc JSON alongside the HTML docs when the JSON output is stabilized (rust-lang/docs.rs#1285) which would make checking even faster and more convenient in the common cases. For cargo-semver-checks, the process of generating the JSON files is the long pole in the runtime — the checking itself takes milliseconds for most crates that are not egregiously large.


In my view, we don't want an "either-or" but an "and." We won't find a single solution that dominates all others. For example:

  • It's going to be hard to beat rust-semverver for thoroughness and ability to check for the most complex semver breaks, because it has the full power of the compiler.
  • It's going to be hard to beat cargo-semver-checks for speed and extensibility because it's based on a query engine specifically designed for speed and extensibility. Implementing a new rule is just writing a query in a strongly-typed DSL that's syntactically GraphQL but with more expressive semantics. But it probably won't be able to check the most complex semver rules involving generics (e.g. GATs), specifically because it doesn't do all the work that rust-semverver does.
  • It's going to be hard to beat cargo-public-api for generating a diff suitable for inclusion in a changelog, since that's specifically what it's designed for. Etc.

With the above in mind:

  • I'd argue for a "best tool for the job" approach.
  • I'd like to work out a common set of names for the different semver rules + design an attribute to disable checking a particular rule that all tools would understand. This will make using different tools for different jobs practical and ergonomic.
  • I think it's substantially less than day's worth of work to make cargo-semver-checks aware of manifests and their semver-breaking cases. I think cargo-semver-checks is the tool for that, and I'd be happy to build it around / after RustConf.

I'm also excited about the idea of making rust-analyzer-provided information queryable through cargo-semver-checks i.e. through a Trustfall query API. If anyone with knowledge of rust-analyzer wants to pair-program a prototype, you know where to find me and you just need to name a day and time compatible with US/Eastern time.

@epage
Copy link
Contributor

epage commented Jul 26, 2022

@matklad what are your thoughts on using rust-analyzer (internal or LSP API) as the data source for a cargo-semverver-like tool?

In particular

  • Would it generally be sufficient?
  • Could it handle cases that rustdoc json output can't, like macro definitions?
  • Could we superimpose all configurations on top of each other rather having to re-run per configuration like cargo-semverver requires since it relies on rlib

@matklad
Copy link
Member

matklad commented Jul 26, 2022

Using rust-analyzer for semverver would give you a nice prototype, but woun't scale to production readiness, for two reasons:

  • rust-analyzer isn't precise enough for that (it has false errors and such)
  • rust-analyzer doesn't provide any kind of stable API

Could we superimpose all configurations on top of each other rather having to re-run per configuration like cargo-semverver requires since it relies on rlib

So, there are two kinds of things you can do:

  • you can build a heuristic best effort tool with some amount of false positives and false negative, and than, sure, you can try to analyze all the cfgs together
  • you can build an exact tool that follows the language semantics. In the latter case, you have to have a loop over all combinations of cfg flags: that's just how the language works, everything semantically relevant becomes to exist only after expansion.

@epage epage added A-new-subcommand Area: new subcommand S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed E-hard Experience: Hard labels May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-new-subcommand Area: new subcommand S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

No branches or pull requests