Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[doc(hidden)] result in false positives #120

Closed
mqudsi opened this issue Sep 1, 2022 · 20 comments
Closed

#[doc(hidden)] result in false positives #120

mqudsi opened this issue Sep 1, 2022 · 20 comments
Labels
C-bug Category: doesn't meet expectations

Comments

@mqudsi
Copy link

mqudsi commented Sep 1, 2022

Steps to reproduce the bug with the above code

It is possible to use #[doc(hidden)] to implement seemingly-breaking changes without actually breaking semver compatibility.

Since cargo-semver-checks only uses the json rustdoc output to form its analysis, the claim that cargo-semver-checks has no false positives should probably have this shortcoming noted in the readme.

e.g. these two versions of the same code below are technically semver compatible:

pub struct Foo {}
pub struct Bar {}
#[doc(hidden)]
pub struct Foo {}
pub struct Bar {}

as merely hiding a type (or as is more often, a method) from the documentation doesn't break the semver contract.

or, for a more interesting case, the following:

pub struct Fo {}
pub struct Foo {}
// The previous release included a typo in the type name so the following is kept for backwards-compatibility:
#[doc(hidden)]
pub type Fo = Foo;

There is a lot more dark magic that can done that would break docs but preserve backwards compatibility. Relying solely on the docs is a great and efficient way of implementing straightforward semver breakage cases, but I don't think it's fair to claim that any tool using this approach has no false positives.

Actual Behaviour

A breaking change is reported.

Expected Behaviour

No breaking change should be reported (though I realize this is fundamentally not possible for this crate as it is written).
As such, the "no false positives" claim should be removed or at least the caveats thoroughly mentioned.

Additional Context

No response

Debug Output

No response

@mqudsi mqudsi added the C-bug Category: doesn't meet expectations label Sep 1, 2022
@mqudsi mqudsi changed the title doc(hidden) types result in false positives #[doc(hidden)] result in false positives Sep 1, 2022
@obi1kenobi
Copy link
Owner

obi1kenobi commented Sep 1, 2022

Thanks for reporting. I don't think it's fundamentally impossible to fix. We just need more data from rustdoc and then to tweak some queries to act on that data appropriately.

Action items:

  • Suitably update README wording on false positives. For cargo-semver-checks, false positives are bugs — we avoid them as best we can and squash them when we find them, but few things are truly 100% bug-free and we're all doing our best here.
  • We currently run rustdoc with --document-private-items, and we should also run it with --document-hidden-items as well to get the additional data we need (ref: Distinguish between private items and hidden items in rustdoc rust-lang/rust#67875 and Add --document-hidden-items flag to cargo doc rust-lang/cargo#7766 )
  • For the various deleted / renamed / moved lints, we should only check items that were not hidden in the baseline, and match them regardless of whether they are hidden in the current version. Should also add test cases for these behaviors.
  • For all other lints (I believe, let's double-check), we should filter only to items that were not, and continue to not be, hidden. Also need test cases.
  • For future semver-minor lints, we should consider how #[doc(hidden)] interacts with "new functionality added to the public API" kinds of lints. We can afford to punt this for a little while since we still have plenty of semver-major lints to write first.

Related to #58 since users may occasionally want to explicitly include a #[doc(hidden)] item for checking, or exempt non-hidden items from checking.

@epage
Copy link
Collaborator

epage commented Sep 2, 2022

We currently run rustdoc with --document-private-items, and we should also run it with --document-hidden-items as well to get the additional data we need

I don't even remember why we do this. Can we get rid of private items?

@obi1kenobi
Copy link
Owner

I don't even remember why we do this. Can we get rid of private items?

TL;DR: We can, but I'd wait a bit first just to be safe.

It's because of a bug in rustdoc that caused a bug in cargo-semver-checks: #32

The bug has since been fixed, but I don't remember whether the fix is before or after the last rustdoc JSON format was introduced. If it was after the last format change, then removing private items is equivalent to tightening the minimum nightly version we support, which might be a bit more trouble than it's worth right now. The next time we need a newer nightly bound, we should remove it then.

@obi1kenobi
Copy link
Owner

Update: --document-private-items is necessary for one additional reason at the moment. It lets us determine the existence of private fields in structs, which affects whether they are externally-constructible, which is relevant when determining e.g. whether adding #[non_exhaustive] is a breaking change or not. For structs, it's only breaking if the struct was previously externally-constructible.

We could work around this requirement if need be, but that additional effort at the moment is perhaps better spent elsewhere.

@aDotInTheVoid
Copy link

It lets us determine the existence of private fields in structs.

We could work around this requirement if need be

This is reported directly in fields_stripped, so theirs a relativly easy "native" way to do it.

@Bromeon
Copy link

Bromeon commented Jan 17, 2023

#[doc(hidden)] is often used precisely to exclude something from the public API, i.e. exclude it from semantic versioning guarantees. Many times, #[doc(hidden)] is the only option, when there is no technical alternative in Rust (e.g. macros polluting the top-level namespace, internal dependencies between crates, generated code, etc.)

As such, it seems wrong to me that changes in hidden items are detected as "breaking semver".
Shouldn't hidden items be treated as non-existing from a public API perspective?


As such, I disagree with this conclusion:

e.g. these two versions of the same code below are technically semver compatible:

pub struct Foo {}
pub struct Bar {}
#[doc(hidden)]
pub struct Foo {}
pub struct Bar {}

as merely hiding a type (or as is more often, a method) from the documentation doesn't break the semver contract.

Foo was part of the public API and is now removed from it. This is a semver-breaking change in my opinion.

If you want to communicate that a previously existing symbol should no longer be used, #[deprecated] is the better choice.

  • Deprecating is better than hiding, as it clearly communicates the migration path to the user and doesn't let them wonder why a symbol seemingly disappeared but still works.
  • This is btw exactly what should be done in the Fo/Foo typo example.

If on the other hand the presence of the symbol is a deal-breaking bug, the symbol should be removed, and the next semver-incompatible version should be released.

  • Leaving it accessible but hidden does not prevent any harm. A compiler will not inform about usage of hidden symbols (as opposed to deprecated symbols or straight compile errors).

Some popular crates follow this understanding, for example syn.

This view is further backed by the Rust API guidelines:

Rustdoc is supposed to include everything users need to use the crate fully and nothing more. It is fine to explain relevant implementation details in prose but they should not be real entries in the documentation.

Especially be selective about which impls are visible in rustdoc -- all the ones that users would need for using the crate fully, but no others. In the following code the rustdoc of PublicError by default would show the From<PrivateError> impl. We choose to hide it with #[doc(hidden)] because users can never have a PrivateError in their code so this impl would never be relevant to them.

@obi1kenobi
Copy link
Owner

obi1kenobi commented Feb 4, 2023

Another nuance, posted without prejudice: some clap versions in the 3.x family include items and renames that are deprecated and doc-hidden, presumably with the intention of allowing code that depends on the old names to continue working while encouraging users to use the new names: https://github.com/clap-rs/clap/blob/v3.2.0/src/lib.rs#L65-L68

If we stopped using --document-hidden-items then the above would be reported as a removal and a breaking change, and that crate's maintainers would have a reasonable argument that the error is false-positive as no code would actually be broken.

It's clear that this issue is both important and not straightforward to solve. More research is needed to come up with a good solution.

@davidhewitt
Copy link

If there are different usages of #[doc(hidden)], perhaps there may need to be some configuration to help the crate authors indicate what is acceptable in their case?

PyO3, for example, has a lot of #[doc(hidden)] items exported from the pyo3::impl_ module. This module is explicitly documented as implementation details which are semver exempt.

It might be good enough for PyO3's case to have an option --allow-semver-exempt=pyo3::impl_ or something like this, which the crate authors could use to specify semver-exempt namespaces.

@obi1kenobi
Copy link
Owner

Thanks for the suggestion. I'd love to flesh this out a bit more and understand how a few edge cases here should behave:

  • Say a type Foo was defined in pyo3::impl_ and then re-exported by another module which was not marked as semver-exempt. The re-export is not #[doc(hidden)]. Is Foo semver-exempt or not?
  • Say that pyo3::impl_::Foo type is re-exported by another crate, and the re-export is not #[doc(hidden)]. Is Foo semver-exempt or not?
  • Say that pyo3::impl_::Foo type is used in a private field in struct pyo3::Bar, which is not semver-exempt. Both pyo3::impl_::Foo and pyo3::Bar used to be Send + Sync. As of the most recent version, the semver-exempt pyo3::impl_::Foo stopped being Send + Sync. However, this made pyo3::Bar not Send + Sync any more either. Should this be reported as a semver problem?

Edge cases like these are tricky and difficult to reason about. I'd love your help in getting them right!

@adamreichold
Copy link

adamreichold commented Mar 4, 2023

Is Foo semver-exempt or not?

At least for our use case, the re-export itself is most likely an error and we would benefit from it being reported, independent of any changes. As a corollary, changes could be reported as well.

Is Foo semver-exempt or not?

I think just publically exposing these items outside of the crate and allowing to list them could be reported as a potential semver hazard. Thereby, I would say changes can be reported as this is strictly less of a requirement.

Should this be reported as a semver problem?

Definitely because it affects types we expect our uses to access.

@adamreichold
Copy link

adamreichold commented Mar 4, 2023

Maybe as proposal for a mechanistic definition: Allow listing a path prefix means that these items are completely removed before any checks are performed. Anything publically visible referencing them (and not removed itself in the previous step) is an error by itself.

@obi1kenobi
Copy link
Owner

Ultimately this will be some form of query tweak equivalent to a mechanistic definition, since all the lints are written as queries. Two things we have to be careful of, though:

  • Items exempt from semver are allowed to fulfill semver obligations (i.e. prevent semver violations). They just aren't allowed to break semver obligations and cause violations.
  • The item should remain in existence but any import paths in semver-exempt modules should be filtered out subject to the above constraint. This is so that e.g. moving an item into a semver-exempt module but re-exporting it in its original non-semver-exempt location correctly tracks that item as subject to semver due to the re-export.

Probing more edge cases, imagine we have a type defined in a semver-exempt module and not re-exported elsewhere.

  • Are there cases where some of the methods on that type are not semver-exempt?
  • Are the traits that type implements ever not semver-exempt?

I'm worried about situations like macro code producing something the user is expected to use or observe directly in some limited fashion.

@obi1kenobi
Copy link
Owner

An observation I made recently gave me an idea for solving this problem.

#[doc(hidden)]-ness is a property of the import path.

Consider the following situation:

#[doc(hidden)]
pub mod inner {
    pub struct Foo;

    pub struct Bar;
}

pub use inner::Foo;

Here, Bar is #[doc(hidden)] but Foo is not. crate::Foo is a valid import path for Foo that does not go through any #[doc(hidden)] items, whereas Bar is only accessible as crate::inner::Bar which is #[doc(hidden)].

We already have infrastructure in place for computing all possible import paths. This was a hard problem to solve, but we've solved it — and with a slight extension we can reuse that solution to solve #[doc(hidden)] too.

Proposal

I propose we add a new edge to the Importable interface to signify importable paths that are not #[doc(hidden)]. Name subject to bike-shedding (please share your suggestions!), tentatively calling it non_hidden_importable_path.

Then we can adjust queries to use non_hidden_importable_path in the place of importable_path in appropriate places. For example:

  • Removing a function (or its re-export) is major+breaking only if the function previously had a non_hidden_importable_path.
  • Adding a variant to an exhaustive enum is major+breaking only if the enum previously had a non_hidden_importable_path.

We must consider two more factors: deprecations, and transitions from non-hidden to hidden and vice versa.

Handling deprecations

When maintainers deprecate items, they also make them #[doc(hidden)] to discourage their use. This is a minor change, and must not be reported as major+breaking.

However, such hidden + deprecated items may still be in use by crates in the ecosystem, so the APIs should still be considered non-hidden for semver-checking purposes.

Perhaps we should implement this along the lines of "deprecation undoes #[doc(hidden)]" in the importable path computations?

Handling transitions between hidden and non-hidden status

Items that were initially added as #[doc(hidden)] and have remained thus are exempt from semver-checking.

A hidden item becoming non-hidden should be a minor change, as if the item was just newly added.

A non-hidden item becoming hidden without being deprecated must be reported as major+breaking. Here's why: say we don't do this, then we can have a situation where v1.0 adds a non-hidden pub API, v1.1 hides part of that pub API, then v1.2 deletes the hidden portion of the API. This is a breaking change between v1.0 and v1.2 — but where could we have detected this? The only sensible place to detect and report this change is v1.1, since the v1.1 -> v1.2 change is just a deletion of a #[doc(hidden)] type and we already said this isn't major+breaking.

@davidhewitt
Copy link

The above proposal sounds correct to me; I can't think of any uses in PyO3 which would not fit in the above.

@obi1kenobi
Copy link
Owner

@tonowak pointed out another edge case to consider: auto-traits in #[doc(hidden)] items.

Normally it's fine for macro internals to be hidden and not public API even though their types are technically pub.

But consider a use case like the ouroboros crate where those macros generate new types containing their own types. The auto-traits on the macro-internal types affect the auto-traits on the user's generated type. If the user's generated type today is Send / Sync / UnwindSafe, a change in the macro-internal types can break it even though the changed type was not part of the public API.

Are we stuck picking between "always report auto-trait changes on hidden items, and suffer false-positives" and "never report auto-trait changes on hidden items, and suffer false-negatives"? Is there a better solution?

@obi1kenobi
Copy link
Owner

obi1kenobi commented Oct 30, 2023

Another interesting edge case: if a non-sealed trait has a #[doc(hidden)] method without a default implementation, that method is still public API despite the #[doc(hidden)] since downstream implementors have to implement it, and may be broken otherwise.

If the #[doc(hidden)] method has a default impl but isn't itself sealed,[0] it could be argued that it isn't public API ("users don't have to implement it") or that it is ("users may choose to implement it anyway"). I'm currently leaning toward the former, but I need to think about this some more.

Brought up here: https://hachyderm.io/@jsbarretto@social.coop/111323903224194053

[0]: My post "A definitive guide to sealed traits in Rust" shows how to make a trait method not allowed to be called or overridden outside of the defining crate. If that method has a default impl, then the trait itself is not sealed because of it but that method's default impl may not be overridden. In such a case, #[doc(hidden)] on that method definitely makes it not public API, since it arguably already wasn't part of the public API anyway — it couldn't be called or overridden outside of its defining crate.

@epage
Copy link
Collaborator

epage commented Oct 30, 2023

I feel like doc(hidden) should be a tool of last resort for indicating something is private and, where possible, sealing traits or trait methods should be used.

@obi1kenobi
Copy link
Owner

FWIW I agree. The linked thread has more info, but TL;DR there was a concern about the quality of error messages being degraded for some reason if the trait is sealed. It might be worth filing a diagnostics issue on rustc for that as well.

@adamreichold
Copy link

I feel like doc(hidden) should be a tool of last resort for indicating something is private and, where possible, sealing traits or trait methods should be used.

The main use case for PyO3 is types which are supposed to be implemented, i.e. cannot be sealed, but never manually, i.e. only via proc-macro generated code.

@obi1kenobi
Copy link
Owner

Resolved by #576. Headline feature of cargo-semver-checks v0.25, planned for release shortly after Rust 1.74.

Official announcement blog post coming soon on my blog: https://predr.ag/blog

Thanks to everyone for your patience! Thanks to @davidhewitt in particular for bouncing ideas and prototypes back and forth until we got to something viable.

Also thanks to everyone that sponsors my work — the best way to help cargo-semver-checks progress faster is to sponsor its development, which helps me dedicate more time to it instead of doing consulting gigs to pay my rent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: doesn't meet expectations
Projects
None yet
Development

No branches or pull requests

7 participants