-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Greatly improve rustdoc xpath checks #89676
Greatly improve rustdoc xpath checks #89676
Conversation
This isn't a simple question to answer, particularly with the fairly scarce information provided here. What specific problems are we aiming to address with this move? Is parsel the recommended/most popular/standard replacement (how was it chosen?), and how is its compatibility with other platforms (e.g., does it have any native code, or just pure python)? Generally I think there is a fairly high cost if "just run rustdoc tests" starts adding in external dependencies that aren't already required. But evaluating that as a tradeoff against more correct parsing and similar is hard, not something I can readily do for you without knowing the pain(?) of the current selectors. |
I looked for packages supporting XPath in python. The initial problem was that the standard python lib we use is very bad (as the test fixes just showed) and way too limited. |
This comment has been minimized.
This comment has been minimized.
|
bf2c581
to
f5336e6
Compare
I meant for the install. |
|
Oh cool, well then |
This comment has been minimized.
This comment has been minimized.
It's surprising to me that 5 tests in @GuillaumeGomez If you have time, could you see which tests passed? Maybe they are not working correctly. |
@camelid I was able to compute the list: $ diff out2 out
1a2
> asm-foreign2.rs
63a65
> doc-cfg-target-feature.rs
95a98
> force-target-feature.rs
268a272
> issue-38219.rs
326a331
> macro-document-private-duplicate.rs
347a353
> no-run-still-checks-lints.rs
390a397
> sanitizer-option.rs
So I think it's normal for them. |
@GuillaumeGomez the tests having broken syntax doesn't seem like a big deal to me; as long as the test is still checking what it's supposed to, it doesn't really matter if the parser is too lenient. The invalid generated HTML seems like a bigger deal. But I'm not sure a different XPATH parser is the best way to check that? Can we change |
Do we have a case with higher traits in the std docs? Because that's what The main argument to switch to another xpath handler here is that the current one is too limited like I showed at the top. Which frustrated me at a few occasions where I wanted to select an item based on its text to then be able to select one of its child. |
👍 I missed that this allows @Mark-Simulacrum it sounds like you were concerned installing things is a pain - but x.py is already installing a lot of things behind the scenes, right? Is it a big deal to have it run @GuillaumeGomez if you could find a way to do this with lol-html or kuchiki that might be simpler as well. |
Neither |
AFAIK, we do not install anything in x.py that is a library for a different language, the only things x.py 'installs' are the rustc bootstrap binaries (and I suppose you could consider the Cargo packages we fetch to be installations too). To some extent rustdoc has increasingly had a larger footprint in terms of "tools it wants", which is not entirely unexpected as the tests for it's HTML and JS generation are fairly different in kind from e.g. compiletest which largely has a dependency on "can run binaries" and some LLVM tools for codegen checking. I think if we were to depend on parsel, I would want to do the same as we do for the JS tools we depend on today -- suggest how to install them, but not actually install them on the users behalf. I think the more tools and other stuff users need to install to run a test suite, the less likely they're to do so -- I would personally be quite wary of adding our first Python dependency for example outside of python's standard library.
It's worth noting that this seems like there is probably alternatives to switching to a different library here. For example, it seems plausible that we can expand ourselves upon the baseline support offered by Python's standard library and add in other XPath selectors (e.g., for text-based matching). It looks like upstream already somewhat supports the particular thing noted since 3.7 with the I think it's worth making sure there's @rust-lang/rustdoc discussion of the addition of new external dependencies, particularly ones not facilitated by Cargo (i.e., trivial for most people in the rust-lang/rust tree to get). I'll also note that both kuchiki and lol-html look like they support CSS-based selectors, which as far as I can tell are often sufficient, though seem to be a little less powerful. We could support both CSS and XPath selectors, in theory, and/or migrate to CSS selectors. My guess is that developers are more likely to quickly understand a CSS selector, fwiw. |
I agree that CSS selectors are easier to understand, but IIUC, they are much more limited in what they can express than XPath. At some point, I would like to try a system where you use XPath or CSS selectors to select a section of the DOM and then have it automatically written to a file, kind of like with UI test stderr. Then you could write tests like this: // @select decl 'foo/fn.bar.html' 'pre.rust'
fn bar() -> i32 { 123 } And then have a <pre class="rust ...">
fn bar() -> <a href="...">i32</a>
</pre> This would allow auto-blessing of tests, making it easier to change rustdoc's UI; and it would enable testing the order of DOM nodes, which would make writing tests for things like #89098 (comment) much easier, even if we used a more advanced XPath library. Also, I think we could do this with the existing Python library, since we'd be less reliant on fancy XPath syntax. |
I wonder if we can use a pure rust library for this and integrate this into compiletest? |
I suggested it and @jyn514 did as well. Unfortunately I don't know of any and developing such a pure rust library would take a lot of time. @Mark-Simulacrum: CSS selectors aren't easier to read (but I guess it's just a matter of point of view...), however XPaths allow more things. |
…ound-html-gen, r=notriddle Fix invalid HTML generation for higher bounds Considering this is a bug, I cherry-picked the commit from rust-lang#89676 so it's merged more quickly. r? `@notriddle`
…ound-html-gen, r=notriddle Fix invalid HTML generation for higher bounds Considering this is a bug, I cherry-picked the commit from rust-lang#89676 so it's merged more quickly. r? ``@notriddle``
The sxd-xpath crate exists. |
Oh nice! I'll take a look at it in the next days then. |
Just tested $ git diff
diff --git a/Cargo.lock b/Cargo.lock
index e45926f832c..6633d667dce 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -2506,6 +2506,12 @@ version = "2.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d4fd5641d01c8f18a23da7b6fe29298ff4b55afcccdf78973b24cf3175fee32e"
+[[package]]
+name = "peresil"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f658886ed52e196e850cfbbfddab9eaa7f6d90dd0929e264c31e5cec07e09e57"
+
[[package]]
name = "perf-event-open-sys"
version = "1.0.1"
@@ -4590,6 +4596,19 @@ dependencies = [
"serde_json",
]
+[[package]]
+name = "rustdoc-test"
+version = "0.1.0"
+dependencies = [
+ "fs-err",
+ "getopts",
+ "once_cell",
+ "regex",
+ "shlex",
+ "sxd-document",
+ "sxd-xpath",
+]
+
[[package]]
name = "rustdoc-themes"
version = "0.1.0"
@@ -5078,6 +5097,27 @@ dependencies = [
"syn",
]
+[[package]]
+name = "sxd-document"
+version = "0.3.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "94d82f37be9faf1b10a82c4bd492b74f698e40082f0f40de38ab275f31d42078"
+dependencies = [
+ "peresil",
+ "typed-arena",
+]
+
+[[package]]
+name = "sxd-xpath"
+version = "0.4.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "36e39da5d30887b5690e29de4c5ebb8ddff64ebd9933f98a01daaa4fd11b36ea"
+dependencies = [
+ "peresil",
+ "quick-error 1.2.3",
+ "sxd-document",
+]
+
[[package]]
name = "syn"
version = "1.0.65"
@@ -5457,6 +5497,12 @@ dependencies = [
"tracing-subscriber",
]
+[[package]]
+name = "typed-arena"
+version = "1.7.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a9b2228007eba4120145f785df0f6c92ea538f5a3635a612ecf4e334c8c1446d"
+
[[package]]
name = "typenum"
version = "1.12.0"
diff --git a/Cargo.toml b/Cargo.toml
index 42dd5d7ef43..6583375600b 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -37,6 +37,7 @@ members = [
"src/tools/html-checker",
"src/tools/bump-stage0",
"src/tools/lld-wrapper",
+ "src/tools/rustdoc-test",
]
exclude = [
diff --git a/library/backtrace b/library/backtrace
--- a/library/backtrace
+++ b/library/backtrace
@@ -1 +1 @@
-Subproject commit cc89bb66f91b2b4a640b0b525ca5d753e3346d7e
+Subproject commit cc89bb66f91b2b4a640b0b525ca5d753e3346d7e-dirty
diff --git a/library/stdarch b/library/stdarch
--- a/library/stdarch
+++ b/library/stdarch
@@ -1 +1 @@
-Subproject commit 5fdbc476afc81a789806697fc4a2d9d19b8c9993
+Subproject commit 5fdbc476afc81a789806697fc4a2d9d19b8c9993-dirty
diff --git a/src/bootstrap/test.rs b/src/bootstrap/test.rs
index 8594fa42266..a646e5dd5f4 100644
--- a/src/bootstrap/test.rs
+++ b/src/bootstrap/test.rs
@@ -1294,6 +1294,12 @@ fn run(self, builder: &Builder<'_>) {
cmd.arg("--jsondocck-path")
.arg(builder.ensure(tool::JsonDocCk { compiler: json_compiler, target }));
}
+ if mode == "rustdoc" {
+ // Use the beta compiler for rustdoc-test
+ let rustdoc_test_compiler = compiler.with_stage(0);
+ cmd.arg("--rustdoc-test-path")
+ .arg(builder.ensure(tool::RustdocTest { compiler: rustdoc_test_compiler, target }));
+ }
if mode == "run-make" && suite.ends_with("fulldeps") {
let rust_demangler = builder
diff --git a/src/bootstrap/tool.rs b/src/bootstrap/tool.rs
index af6f4bb0e5f..97e9dad0b12 100644
--- a/src/bootstrap/tool.rs
+++ b/src/bootstrap/tool.rs
@@ -376,6 +376,7 @@ fn run(self, builder: &Builder<'_>) -> PathBuf {
ExpandYamlAnchors, "src/tools/expand-yaml-anchors", "expand-yaml-anchors";
LintDocs, "src/tools/lint-docs", "lint-docs";
JsonDocCk, "src/tools/jsondocck", "jsondocck";
+ RustdocTest, "src/tools/rustdoc-test", "rustdoc-test";
HtmlChecker, "src/tools/html-checker", "html-checker";
BumpStage0, "src/tools/bump-stage0", "bump-stage0";
);
diff --git a/src/tools/compiletest/src/common.rs b/src/tools/compiletest/src/common.rs
index cd0a56d08d8..029f23f0057 100644
--- a/src/tools/compiletest/src/common.rs
+++ b/src/tools/compiletest/src/common.rs
@@ -207,6 +207,9 @@ pub struct Config {
/// The jsondocck executable.
pub jsondocck_path: Option<String>,
+ /// The rustdoc-test executable.
+ pub rustdoc_test_path: Option<String>,
+
/// The LLVM `FileCheck` binary path.
pub llvm_filecheck: Option<PathBuf>,
diff --git a/src/tools/compiletest/src/main.rs b/src/tools/compiletest/src/main.rs
index 87aba8c5d32..07aa2b4ecb1 100644
--- a/src/tools/compiletest/src/main.rs
+++ b/src/tools/compiletest/src/main.rs
@@ -64,6 +64,7 @@ pub fn parse_config(args: Vec<String>) -> Config {
.reqopt("", "lldb-python", "path to python to use for doc tests", "PATH")
.reqopt("", "docck-python", "path to python to use for doc tests", "PATH")
.optopt("", "jsondocck-path", "path to jsondocck to use for doc tests", "PATH")
+ .optopt("", "rustdoc-test-path", "path to rustdoc-test to use for rustdoc tests", "PATH")
.optopt("", "valgrind-path", "path to Valgrind executable for Valgrind tests", "PROGRAM")
.optflag("", "force-valgrind", "fail if Valgrind tests cannot be run under Valgrind")
.optopt("", "run-clang-based-tests-with", "path to Clang executable", "PATH")
@@ -223,6 +224,7 @@ fn make_absolute(path: PathBuf) -> PathBuf {
lldb_python: matches.opt_str("lldb-python").unwrap(),
docck_python: matches.opt_str("docck-python").unwrap(),
jsondocck_path: matches.opt_str("jsondocck-path"),
+ rustdoc_test_path: matches.opt_str("rustdoc-test-path"),
valgrind_path: matches.opt_str("valgrind-path"),
force_valgrind: matches.opt_present("force-valgrind"),
run_clang_based_tests_with: matches.opt_str("run-clang-based-tests-with"),
diff --git a/src/tools/compiletest/src/runtest.rs b/src/tools/compiletest/src/runtest.rs
index 4470272a9f8..e75850466f1 100644
--- a/src/tools/compiletest/src/runtest.rs
+++ b/src/tools/compiletest/src/runtest.rs
@@ -2220,11 +2220,18 @@ fn run_rustdoc_test(&self) {
self.check_rustdoc_test_option(proc_res);
} else {
let root = self.config.find_rust_src_root().unwrap();
+ // let res = self.cmd2procres(
+ // Command::new(&self.config.docck_python)
+ // .arg(root.join("src/etc/htmldocck.py"))
+ // .arg(&out_dir)
+ // .arg(&self.testpaths.file),
+ // );
let res = self.cmd2procres(
- Command::new(&self.config.docck_python)
- .arg(root.join("src/etc/htmldocck.py"))
- .arg(&out_dir)
- .arg(&self.testpaths.file),
+ Command::new(self.config.rustdoc_test_path.as_ref().unwrap())
+ .arg("--doc-dir")
+ .arg(root.join(&out_dir))
+ .arg("--template")
+ .arg(&self.testpaths.file),
);
if !res.status.success() {
self.fatal_proc_rec_with_ctx("htmldocck failed!", &res, |mut this| { And the error I got from Check failed: XML parsing error at 0: {Expected("<?xml"), ExpectedComment, ExpectedElement, ExpectedProcessingInstruction, ExpectedWhitespace} EDIT: And of course, because I'm very stupid sometimes, I forgot to add the files into git so the diff doesn't display them and I removed them before realizing my mistake... So just in case you want to try your luck from what I did, I copied the use crate::error::CkError;
use sxd_document::Package;
use sxd_document::dom::Document;
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use fs_err as fs;
#[derive(Debug)]
pub struct Cache {
root: PathBuf,
files: HashMap<PathBuf, String>,
values: HashMap<PathBuf, Package>,
last_path: Option<PathBuf>,
}
impl Cache {
/// Create a new cache, used to read files only once and otherwise store their contents.
pub fn new(doc_dir: &str) -> Cache {
Cache {
root: Path::new(doc_dir).to_owned(),
files: HashMap::new(),
values: HashMap::new(),
last_path: None,
}
}
fn resolve_path(&mut self, path: &str) -> PathBuf {
if path != "-" {
let resolve = self.root.join(path);
self.last_path = Some(resolve.clone());
resolve
} else {
self.last_path
.as_ref()
// FIXME: Point to a line number
.expect("No last path set. Make sure to specify a full path before using `-`")
.clone()
}
}
fn read_file(&mut self, path: &PathBuf) -> Result<String, io::Error> {
if let Some(f) = self.files.get(path) {
return Ok(f.clone());
}
let file = fs::read_to_string(path)?;
self.files.insert(path.clone(), file.clone());
Ok(file)
}
/// Get the text from a file. If called multiple times, the file will only be read once
pub fn get_file(&mut self, path: &str) -> Result<String, io::Error> {
let path = self.resolve_path(path);
self.read_file(&path)
}
/// Parse the DOM from a file. If called multiple times, the file will only be read once.
pub fn get_value<T, F: Fn(Document<'_>) -> T>(&mut self, path: &str, f: F) -> Result<T, CkError> {
let rpath = self.resolve_path(path);
if let Some(v) = self.values.get(&rpath) {
return Ok(f(v.as_document()));
}
let content = self.read_file(&rpath)?;
let val = sxd_document::parser::parse(&content)?;
self.values.insert(rpath, val);
self.get_value(path, f)
}
} |
☔ The latest upstream changes (presumably #90050) made this pull request unmergeable. Please resolve the merge conflicts. |
@GuillaumeGomez I think something went wrong the last time you rebased — this PR includes a debuginfo commit, which doesn't seem right :) |
@camelid Behold the power of |
4b18e74
to
e2ec807
Compare
e2ec807
to
44378ac
Compare
Rebased and cleaned up the commit mess. |
The job Click to see the possible cause of the failure (guessed by this bot)
|
☔ The latest upstream changes (presumably #91505) made this pull request unmergeable. Please resolve the merge conflicts. |
Just noting that |
Unfortunately it doesn't. We still can't select items based on their text content. I needed that recently to count items which had a specific text but couldn't. |
I would like to not be responsible for this. r? @camelid |
We recently encountered a new issue with the current tool with #95813 and We looked into it with @Urgau and the tool I picked in this PR doesn't fix this issue either so I'd need to pick a completely different one. I'll try to come back to this as soon as possible. |
Can you explain the issue with |
It doesn't keep the HTML as is, it generates it. So if you have
you can match it against |
…=jsha Add empty impl blocks if they have documentation Fixes rust-lang#90866. The update for the test script is needed to count the number of impl blocks we have with only the struct. To be noted that with rust-lang#89676 merged, it wouldn't be needed (I don't know what is the status of it btw. cc `@Mark-Simulacrum).` It looks like this: ![Screenshot from 2021-11-14 16-51-28](https://user-images.githubusercontent.com/3050060/141689100-e57123c0-bf50-4c42-adf5-d991e169a0e4.png) cc `@jyn514` r? `@camelid`
…=jsha Add empty impl blocks if they have documentation Fixes rust-lang#90866. The update for the test script is needed to count the number of impl blocks we have with only the struct. To be noted that with rust-lang#89676 merged, it wouldn't be needed (I don't know what is the status of it btw. cc ``@Mark-Simulacrum).`` It looks like this: ![Screenshot from 2021-11-14 16-51-28](https://user-images.githubusercontent.com/3050060/141689100-e57123c0-bf50-4c42-adf5-d991e169a0e4.png) cc ``@jyn514`` r? ``@camelid``
…=jsha Add empty impl blocks if they have documentation Fixes rust-lang#90866. The update for the test script is needed to count the number of impl blocks we have with only the struct. To be noted that with rust-lang#89676 merged, it wouldn't be needed (I don't know what is the status of it btw. cc ```@Mark-Simulacrum).``` It looks like this: ![Screenshot from 2021-11-14 16-51-28](https://user-images.githubusercontent.com/3050060/141689100-e57123c0-bf50-4c42-adf5-d991e169a0e4.png) cc ```@jyn514``` r? ```@camelid```
Closing this after a discussion with @GuillaumeGomez |
Before that, we were using the XML parser from the python standard library which was... pretty limited and very nice on invalid XPaths while preventing to use nice features like
//*[text()="whatever"]
. This PR switches toparsel
.While doing this switch, I uncovered quite a lot of small bugs, one being in the HTML generated by rustdoc.
One question remains now: is it ok to ask for a new package to be installed (
parsel
) to runsrc/test/rustdoc
test suite @Mark-Simulacrum ? If so, I can add a check into bootstrap and update the requirements in the CI.r? @jyn514