-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON backend experimental impl #75114
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Left some comments.
src/librustdoc/json/mod.rs
Outdated
match item.inner.clone() { | ||
StructItem(s) => s.fields.into_iter().for_each(|i| self.insert(i, cache)), | ||
UnionItem(u) => u.fields.into_iter().for_each(|i| self.insert(i, cache)), | ||
VariantItem(clean::Variant { kind: clean::VariantKind::Struct(v) }) => { | ||
v.fields.into_iter().for_each(|i| self.insert(i, cache)); | ||
} | ||
EnumItem(e) => e.variants.into_iter().for_each(|i| self.item(i, cache).unwrap()), | ||
TraitItem(t) => t.items.into_iter().for_each(|i| self.insert(i, cache)), | ||
ImplItem(i) => i.items.into_iter().for_each(|i| self.insert(i, cache)), | ||
_ => {} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to have this knowledge inside a method on ItemEnum
itself, either returning an iterator of inner items or taking a closure.
Also, a problem with using a wildcard here is that a new kind of item with children could be added, and this match could be forgotten. So it's actually safer to list everything out.
src/librustdoc/json/mod.rs
Outdated
fn after_krate(&mut self, krate: &clean::Crate, cache: &Cache) -> Result<(), Error> { | ||
debug!("Done with crate"); | ||
let mut index = (*self.index).clone().into_inner(); | ||
let trait_items = cache.traits.iter().filter_map(|(id, trait_item)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be extracted to a function instead of happening directly in after_krate
#[serde(rename = "trait")] | ||
pub trait_: Option<Type>, | ||
#[serde(rename = "for")] | ||
pub for_: Type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we'd ever want to create an index of Types so we could de-duplicate them. This could be an open question on the RFC to consider before stabilizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some sense they're already partially de-duplicated because they refer to IDs for the structs/enums/traits instead of containing their definitions. It doesn't seem to me that another level of indirection is worth it in general, but it would be nice for the purpose of easily comparing deeply nested types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think experience will tell if this is worth it or not, but I suspect you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This CL is great, essentially self-explanatory, thanks to the prior refactor. My comments are mainly minor docstrings and naming. Looking forward to the tests.
src/librustdoc/json/types.rs
Outdated
|
||
#[derive(Clone, Debug, Serialize, Deserialize)] | ||
pub struct Item { | ||
/// This can be used as a key to the `external_crates` map of [`Crate`][] to see which crate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be? Perhaps just state that the crate_num here corresponds to Crate.id?
Also if not too late, perhaps using crate_id is a better name unless they number and id are different.
|
||
#[serde(rename_all = "snake_case")] | ||
#[derive(Clone, Debug, Serialize, Deserialize)] | ||
pub enum GenericArgs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below it would be helpful with the same level of docs as in the first part of the file. They are very helpful.
src/librustdoc/json/types.rs
Outdated
ResolvedPath { | ||
name: String, | ||
id: Id, | ||
args: Box<Option<GenericArgs>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap wrapping? Also, curious about the boxing, what's the purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's purely to satisfy the compiler because otherwise the Type
enum is self referential due to GenericArgs
containing other types.
src/librustdoc/json/types.rs
Outdated
pub blanket_impl: Option<Type>, | ||
} | ||
|
||
// TODO: this is currently broken because imports have the same ID as the module that contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way? Do imports not show up at all, is there a race? Perhaps imports could be omitted until this is resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep there was a race. Still working on fixing it, but since re-exports aren't going to be part of the first PR it's not blocking for a little bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment has since been removed. Is this still relevant?
src/librustdoc/clean/types.rs
Outdated
@@ -1405,6 +1405,11 @@ impl Path { | |||
pub fn last_name(&self) -> &str { | |||
self.segments.last().expect("segments were empty").name.as_str() | |||
} | |||
|
|||
pub fn whole_name(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this contextually dependent? What's the difference towards a fully qualified path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be contextually dependent, it is just the fully qualified path.
# local IDs have to be in `index`, external ones can sometimes be in `index` but otherwise have | ||
# to be in `paths` | ||
def valid_id(item_id): | ||
return item_id in crate["index"] or item_id[0] != "0" and item_id in crate["paths"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really the bare minimum level of testing for the output. We just check that the blob doesn't contain any invalid IDs (ones that aren't present in index
or paths
. This doesn't handle the case where neither the ID nor the item itself end up in the output, or really anything about the structure of the output itself.
Originally I'd planned to write tests that simply compared the output to a known-good version for some minimal examples, but without writing a fuzzy differ to only look at the fields we care about that turns out to be quite difficult due to the large output size and potentially brittle to small changes in the compiler and stdlib.
IMO I would rather merge this sooner rather than later and do the cleanup in later PRs. That would let people test things out on nightly - I expect any bugs in the implementation not to have too much impact on the design itself. @P1n3appl3 the CI failures are saying to change I'm ok with leaving macros for a follow-up PR. |
@GuillaumeGomez ^ what do you think? |
The code looks good, so I think we're getting close to a merge. However, I agree with @jyn514: please implement https://github.com/rust-lang/rust/pull/75114/files#diff-94e97fcd5e773cae8edd0781d7bd5411R104 before. :) |
Just thought: we need tests too. Probably a new test suite (I can add that if needed) to be sure that every need PR on this feature won't bring regressions alongside. |
☔ The latest upstream changes (presumably #74489) made this pull request unmergeable. Please resolve the merge conflicts. |
To avoid having this perpetually in limbo, can we delay the test suite until stabilization? The best test suite will be people using it ;) and it's going to be at least as much work as this PR if not more. @P1n3appl3 FYI @nmccarty has volunteered to finish up the implementation on this PR if you're busy with work/school/life. |
Sorry I should have updated here: I've been on vacation visiting family for the past 2 weeks and was mostly offline but I've got some time now before school ramps up so I'm planning to finish/fix up the implementation. @robinkrahl has been testing out the backend in its current state and has found several bugs that I'm still working on (doc-hidden wasn't being respected, and there are issues where serde incorrectly deserializes fields due to using untagged enums). There are some smaller things that I can just open issues for once this is merged, but I want to get the big ones fixed so it's usable on release. I also have been trying to write up tests and I agree that it's best left for another PR (especially given the size of this one already). I would love some help from @nmccarty on coming up with those as well as fixing the known bugs. I'm "Pineapple" on the rust discord or just email me and I can show you the remaining TODOs. |
I don't ask for an extensive list of tests, just at least one. Always useful to at least avoid regressions. Then more will be added once new bugs are fixed and so on. |
The code in this PR has several tests and I think it's fine to merge as-is, once conflicts are resolved and failing tests are fixed. @GuillaumeGomez What do you think? @P1n3appl3 I think this is basically done. Are you able to push this over the finish line? I'd also be happy to rebase and fix it up. |
Yup @jyn514 already offered to help clean up some WIP bugfixes and minor changes that I haven't pushed yet so he can do the rebase i think. After that it should be good to merge. |
/// Whether this item is meant to be omitted from the generated documentation due to `#doc(hidden)`, | ||
/// because it is private, or because it was inlined. | ||
pub stripped: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still prefer to omit stripped items altogether. I won't block on it because I really want to see this in nightly 😆 but it should be removed before stabilization; rustdoc should not have a stable way to introspect private items from other crates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--document-private-items
is really useful for generating internal docs, and I think this feature should support it, even though it technically provides a way to do what you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass --document-private-items
, the items won't be stripped (and in fact you'll lose the ability to distinguish between private and public items). This is more like 'unconditionally document private items'.
|
I think you want |
removing them creates dangling references
This way it doesn't have to build rustc twice. Eventually it should probably get its own suite, like rustdoc-js, but that can be fixed in a follow-up.
| PrimitiveItem(_) | ||
| AssocConstItem(_, _) | ||
| AssocTypeItem(_, _) | ||
| StrippedItem(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct? StrippedItem
always has an inner item: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/clean/types/enum.ItemEnum.html#variant.StrippedItem
src/librustdoc/json/conversions.rs
Outdated
fn from(deprecation: clean::Deprecation) -> Self { | ||
Deprecation { since: deprecation.since, note: deprecation.note } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this exhaustive to avoid missing new fields.
fn from(deprecation: clean::Deprecation) -> Self { | |
Deprecation { since: deprecation.since, note: deprecation.note } | |
} | |
fn from(deprecation: clean::Deprecation) -> Self { | |
let clean::Deprecation { since, note } = deprecation; | |
Deprecation { since, note } | |
} |
Inherited => Visibility::Default, | ||
Crate => Visibility::Crate, | ||
Restricted(did, path) => { | ||
Visibility::Restricted { parent: did.into(), path: path.whole_name() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to have both the path and the ID? I think the path can be looked up by the id.
/// For the most part items are private by default. The exceptions are associated items of | ||
/// public traits and variants of public enums. | ||
Default, | ||
Crate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does pub(crate)
need to be treated specially? It's the same as pub(in crate)
, i.e. a restricted path.
src/librustdoc/json/types.rs
Outdated
pub blanket_impl: Option<Type>, | ||
} | ||
|
||
// TODO: this is currently broken because imports have the same ID as the module that contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment has since been removed. Is this still relevant?
This fully halves the size of the emitted JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok if I change some of the output to differ from the eRFC? Some of this info really shouldn't be duplicated.
@@ -2314,7 +2314,7 @@ impl Clean<Vec<Item>> for doctree::Import<'_> { | |||
name: None, | |||
attrs: self.attrs.clean(cx), | |||
source: self.span.clean(cx), | |||
def_id: DefId::local(CRATE_DEF_INDEX), | |||
def_id: cx.tcx.hir().local_def_id(self.id).to_def_id(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change?
impl From<clean::Struct> for Struct { | ||
fn from(struct_: clean::Struct) -> Self { | ||
let clean::Struct { struct_type, generics, fields, fields_stripped } = struct_; | ||
Struct { | ||
struct_type: struct_type.into(), | ||
generics: generics.into(), | ||
fields_stripped, | ||
fields: fields.into_iter().map(|i| i.def_id.into()).collect(), | ||
impls: Vec::new(), // Added in JsonRenderer::item | ||
} | ||
} | ||
} | ||
|
||
impl From<clean::Union> for Struct { | ||
fn from(struct_: clean::Union) -> Self { | ||
let clean::Union { struct_type, generics, fields, fields_stripped } = struct_; | ||
Struct { | ||
struct_type: struct_type.into(), | ||
generics: generics.into(), | ||
fields_stripped, | ||
fields: fields.into_iter().map(|i| i.def_id.into()).collect(), | ||
impls: Vec::new(), // Added in JsonRenderer::item | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These impls are exactly the same, which means that the JSON output doesn't distinguish between structs and unions. Is that intentional?
.iter() | ||
.map(|(k, v)| { | ||
( | ||
k.as_u32(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having numeric keys seems annoying. Would it make more sense to instead have a list that you look up by index?
Concretely, currently the output looks like this:
{
"1": {
"name": "std",
"html_root_url": "https://doc.rust-lang.org/nightly/"
},
"2": {
"name": "core",
"html_root_url": "https://doc.rust-lang.org/nightly/"
},
...
}
and I'd prefer it to look like
[
{
"name": "std",
"html_root_url": "https://doc.rust-lang.org/nightly/"
},
{
"name": "core",
"html_root_url": "https://doc.rust-lang.org/nightly/"
}
...
]
let mut index = (*self.index).clone().into_inner(); | ||
index.extend(self.get_trait_items(cache)); | ||
let output = types::Crate { | ||
root: types::Id(String::from("0:0")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this field?
}, | ||
source: source.into(), | ||
visibility: visibility.into(), | ||
docs: attrs.collapsed_doc_value().unwrap_or_default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this instead be null
if there are no docs?
let output = types::Crate { | ||
root: types::Id(String::from("0:0")), | ||
version: krate.version.clone(), | ||
includes_private: cache.document_private, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be included? The user will know if they passed --document-private-items
.
pub struct JsonRenderer { | ||
/// A mapping of IDs that contains all local items for this crate which gets output as a top | ||
/// level field of the JSON blob. | ||
index: Rc<RefCell<FxHashMap<types::Id, types::Item>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this a map makes it annoying to work with, and introduces duplicate information: now the ID is both a key and a field on the object. Instead, I think it makes sense to make it an array, and you have to look up the id in the object if you want it.
} | ||
|
||
#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)] | ||
pub struct Id(pub String); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this contains both the crate number and the item id. But the crate number is already part of Item
. Do we need both? The Id
is used to globally identify the item, so I think a better idea is to use a { crate_number, id }
object, then remove the crate number from Item
.
I've implemented these changes. |
docs: Default::default(), | ||
links: Default::default(), | ||
attrs: Default::default(), | ||
deprecation: Default::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right: it shows Drop
as having no docs at all.
It turns out that I broke the tests by moving them to |
☔ The latest upstream changes (presumably #79070) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
Closing this in favor of #79539; I still want to address the points I brought up, but we can do that after the initial impl lands on nightly. |
…illaumeGomez Rustdoc: JSON backend experimental impl, with new tests. Based on rust-lang#75114 by `@P1n3appl3` The first commit is all of rust-lang#75114, but squased to 1 commit, as that was much easier to rebase onto master. The git history is a mess, but I think I'll edit it after review, so it's obvious whats new. ## Still to do - [ ] Update docs. - [ ] Add bless option to tests. - [ ] Add test option for multiple files in same crate. - [ ] Decide if the tests should check for json to be equal or subset. - [ ] Go through the rest of the review for the original pr. (This is open because the test system is done(ish), but stuff like [not using a hashmap](rust-lang#75114 (comment)) and [using `CRATE_DEF_INDEX` ](rust-lang#75114 (comment)) hasn't) I'm also sure how many of these we need to do before landing on nightly, as it would be nice to get this in tree, so it isn't effected by churn like rust-lang#79125, rust-lang#79041, rust-lang#79061 r? `@jyn514`
Rustdoc: JSON backend experimental impl, with new tests. Based on rust-lang#75114 by `@P1n3appl3` The first commit is all of rust-lang#75114, but squased to 1 commit, as that was much easier to rebase onto master. The git history is a mess, but I think I'll edit it after review, so it's obvious whats new. ## Still to do - [ ] Update docs. - [ ] Add bless option to tests. - [ ] Add test option for multiple files in same crate. - [ ] Decide if the tests should check for json to be equal or subset. - [ ] Go through the rest of the review for the original pr. (This is open because the test system is done(ish), but stuff like [not using a hashmap](rust-lang#75114 (comment)) and [using `CRATE_DEF_INDEX` ](rust-lang#75114 (comment)) hasn't) I'm also sure how many of these we need to do before landing on nightly, as it would be nice to get this in tree, so it isn't effected by churn like rust-lang#79125, rust-lang#79041, rust-lang#79061 r? `@jyn514`
@rustbot modify labels: +A-rustdoc-json |
This is for preliminary code review of the in progress implementation of RFC 2963.