-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc: simplify JS search routine by not messing with lev distance #105796
rustdoc: simplify JS search routine by not messing with lev distance #105796
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred in HTML/CSS/JS. cc @GuillaumeGomez, @Folyd, @jsha |
d4f91f8
to
5c6a168
Compare
Looks good to me. Considering search is a bit sensitive, let's also check if it's okay for others. cc @rust-lang/rustdoc |
I tested a couple of cases, and none of the changes seemed like an issue to me. So this PR is fine with me. |
@GuillaumeGomez Should this go through a FCP? |
I think it'd be a good idea to prevent it to be stuck on unknown status. Can you write up differences more in details so we can have a better overview of the changes brought by this PR please? |
Also added this to the PR description:
|
I'm in favor of this |
Thanks for the explanation @notriddle. Let's start the FCP then. @rfcbot fcp merge |
Team member @GuillaumeGomez has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Since the sorting function accounts for an `index` field, there's not much reason to also be applying changes to the levenshtein distance. Instead, we can just not treat `lev` as a filter if there's already a non-sentinel value for `index`. This change gives slightly more weight to the index and path part, as search criteria, than it used to. This changes some of the test cases, but not in any obviously-"worse" way, and, in particular, substring matches are a bigger deal than levenshtein distances (we're assuming that a typo is less likely than someone just not typing the entire name). Based on rust-lang#103710 (comment)
5c6a168
to
db558b4
Compare
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
Thanks! @bors r+ |
It vanished from the queue. @bors r=GuillaumeGomez |
…-stop-doing-demerits, r=GuillaumeGomez rustdoc: simplify JS search routine by not messing with lev distance Since the sorting function accounts for an `index` field, there's not much reason to also be applying changes to the levenshtein distance. Instead, we can just not treat `lev` as a filter if there's already a non-sentinel value for `index`. <details> This change gives slightly more weight to the index and path part, as search criteria, than it used to. This changes some of the test cases, but not in any obviously-"worse" way, and, in particular, substring matches are a bigger deal than levenshtein distances (we're assuming that a typo is less likely than someone just not typing the entire name). The biggest change is the addition of a `path_lev` field to result items. It's always zero if the search query has no parent path part and for type queries, making the check in the `sortResults` function a no-op. When it's present, it is used to implement different precedence for the parent path and the tail. Consider the query `hashset::insert`, a test case [that already exists and can be found here](https://github.com/rust-lang/rust/blob/5c6a1681a9a7b815febdd9de2f840da338984e68/src/test/rustdoc-js-std/path-ordering.js). We want the ordering shown in the test case: ``` { 'path': 'std::collections::hash_set::HashSet', 'name': 'insert' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_with' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_owned' }, { 'path': 'std::collections::hash_map::HashMap', 'name': 'insert' }, ``` We do not want this ordering, which is the ordering that would occur if substring position took priority over `path_lev`: ``` { 'path': 'std::collections::hash_set::HashSet', 'name': 'insert' }, { 'path': 'std::collections::hash_map::HashMap', 'name': 'insert' }, // BAD { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_with' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_owned' }, ``` We also do not want `HashSet::iter` to appear before `HashMap::insert`, which is what would happen if `path_lev` took priority over the appearance of any substring match. This is why the `sortResults` function has `path_lev` sandwiched between a `index < 0` check and a `index` comparison check: ``` { 'path': 'std::collections::hash_set::HashSet', 'name': 'insert' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_with' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'get_or_insert_owned' }, { 'path': 'std::collections::hash_set::HashSet', 'name': 'iter' }, // BAD { 'path': 'std::collections::hash_map::HashMap', 'name': 'insert' }, ``` The old code implemented a similar feature by manipulating the `lev` member based on whether a substring match was found and averaging in the path distance (`item.lev = name_lev + path_lev / 10`), so the path lev wound up acting like a tie breaker, but it gives slightly different results for `Vec::new`, [changing the test case](https://github.com/rust-lang/rust/pull/105796/files#diff-b346e2ef72a407915f438063c8c2c04f7a621df98923d441b41c0312211a5b21) because of the slight changes to ordering priority. </details> Based on rust-lang#103710 (comment) Previews: * https://notriddle.com/notriddle-rustdoc-demos/rustdoc-search-stop-doing-demerits/std/index.html * https://notriddle.com/notriddle-rustdoc-demos/rustdoc-search-stop-doing-demerits-compiler/index.html
…mpiler-errors Rollup of 8 pull requests Successful merges: - rust-lang#105796 (rustdoc: simplify JS search routine by not messing with lev distance) - rust-lang#106753 (Make sure that RPITITs are not considered suggestable) - rust-lang#106917 (Encode const mir for closures if they're const) - rust-lang#107004 (Implement some candidates for the new solver (redux)) - rust-lang#107023 (Stop using `BREAK` & `CONTINUE` in compiler) - rust-lang#107030 (Correct typo) - rust-lang#107042 (rustdoc: fix corner cases with "?" JS keyboard command) - rust-lang#107045 (rustdoc: remove redundant CSS rule `#settings .setting-line`) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
…-2023, r=GuillaumeGomez rustdoc: compute maximum Levenshtein distance based on the query Preview: https://notriddle.com/notriddle-rustdoc-demos/search-lev-distance-2023/std/index.html?search=regex The heuristic is pretty close to the name resolver, maxLevDistance = `Math.floor(queryLen / 3)`. Fixes rust-lang#103357 Fixes rust-lang#82131 Similar to rust-lang#103710, but following the suggestion in rust-lang#103710 (comment) to use `floor` instead of `ceil`, and unblocked now that rust-lang#105796 made it so that setting the max lev distance to `0` doesn't cause substring matches to be removed.
…GuillaumeGomez rustdoc: compute maximum Levenshtein distance based on the query Preview: https://notriddle.com/notriddle-rustdoc-demos/search-lev-distance-2023/std/index.html?search=regex The heuristic is pretty close to the name resolver, maxLevDistance = `Math.floor(queryLen / 3)`. Fixes #103357 Fixes #82131 Similar to rust-lang/rust#103710, but following the suggestion in rust-lang/rust#103710 (comment) to use `floor` instead of `ceil`, and unblocked now that rust-lang/rust#105796 made it so that setting the max lev distance to `0` doesn't cause substring matches to be removed.
Pkgsrc changes: * Adjust patches and cargo checksums to new versions. Upstream changes: Version 1.68.0 (2023-03-09) =========================== Language -------- - [Stabilize default_alloc_error_handler] (rust-lang/rust#102318) This allows usage of `alloc` on stable without requiring the definition of a handler for allocation failure. Defining custom handlers is still unstable. - [Stabilize `efiapi` calling convention.] (rust-lang/rust#105795) - [Remove implicit promotion for types with drop glue] (rust-lang/rust#105085) Compiler -------- - [Change `bindings_with_variant_name` to deny-by-default] (rust-lang/rust#104154) - [Allow .. to be parsed as let initializer] (rust-lang/rust#105701) - [Add `armv7-sony-vita-newlibeabihf` as a tier 3 target] (rust-lang/rust#105712) - [Always check alignment during compile-time const evaluation] (rust-lang/rust#104616) - [Disable "split dwarf inlining" by default.] (rust-lang/rust#106709) - [Add vendor to Fuchsia's target triple] (rust-lang/rust#106429) - [Enable sanitizers for s390x-linux] (rust-lang/rust#107127) Libraries --------- - [Loosen the bound on the Debug implementation of Weak.] (rust-lang/rust#90291) - [Make `std::task::Context` !Send and !Sync] (rust-lang/rust#95985) - [PhantomData layout guarantees] (rust-lang/rust#104081) - [Don't derive Debug for `OnceWith` & `RepeatWith`] (rust-lang/rust#104163) - [Implement DerefMut for PathBuf] (rust-lang/rust#105018) - [Add O(1) `Vec -> VecDeque` conversion guarantee] (rust-lang/rust#105128) - [Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met] (rust-lang/rust#105851) Stabilized APIs --------------- - [`{core,std}::pin::pin!`] (https://doc.rust-lang.org/stable/std/pin/macro.pin.html) - [`impl From<bool> for {f32,f64}`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#impl-From%3Cbool%3E-for-f32) - [`std::path::MAIN_SEPARATOR_STR`] (https://doc.rust-lang.org/stable/std/path/constant.MAIN_SEPARATOR_STR.html) - [`impl DerefMut for PathBuf`] (https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#impl-DerefMut-for-PathBuf) These APIs are now stable in const contexts: - [`VecDeque::new`] (https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.new) Cargo ----- - [Stabilize sparse registry support for crates.io] (rust-lang/cargo#11224) - [`cargo build --verbose` tells you more about why it recompiles.] (rust-lang/cargo#11407) - [Show progress of crates.io index update even `net.git-fetch-with-cli` option enabled] (rust-lang/cargo#11579) Misc ---- Compatibility Notes ------------------- - [Add `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` to future-incompat report] (rust-lang/rust#103418) - [Only specify `--target` by default for `-Zgcc-ld=lld` on wasm] (rust-lang/rust#101792) - [Bump `IMPLIED_BOUNDS_ENTAILMENT` to Deny + ReportNow] (rust-lang/rust#106465) - [`std::task::Context` no longer implements Send and Sync] (rust-lang/rust#95985) nternal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Encode spans relative to the enclosing item] (rust-lang/rust#84762) - [Don't normalize in AstConv] (rust-lang/rust#101947) - [Find the right lower bound region in the scenario of partial order relations] (rust-lang/rust#104765) - [Fix impl block in const expr] (rust-lang/rust#104889) - [Check ADT fields for copy implementations considering regions] (rust-lang/rust#105102) - [rustdoc: simplify JS search routine by not messing with lev distance] (rust-lang/rust#105796) - [Enable ThinLTO for rustc on `x86_64-pc-windows-msvc`] (rust-lang/rust#103591) - [Enable ThinLTO for rustc on `x86_64-apple-darwin`] (rust-lang/rust#103647)
Pkgsrc changes: * Adjust patches (add & remove) and cargo checksums to new versions. * It's conceivable that the workaround for LLVM based NetBSD works even less in this version (ref. PKGSRC_HAVE_LIBCPP not having a corresponding patch anymore). Upstream changes: Version 1.68.2 (2023-03-28) =========================== - [Update the GitHub RSA host key bundled within Cargo] (rust-lang/cargo#11883). The key was [rotated by GitHub] (https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/) on 2023-03-24 after the old one leaked. - [Mark the old GitHub RSA host key as revoked] (rust-lang/cargo#11889). This will prevent Cargo from accepting the leaked key even when trusted by the system. - [Add support for `@revoked` and a better error message for `@cert-authority` in Cargo's SSH host key verification] (rust-lang/cargo#11635) Version 1.68.1 (2023-03-23) =========================== - [Fix miscompilation in produced Windows MSVC artifacts] (rust-lang/rust#109094) This was introduced by enabling ThinLTO for the distributed rustc which led to miscompilations in the resulting binary. Currently this is believed to be limited to the -Zdylib-lto flag used for rustc compilation, rather than a general bug in ThinLTO, so only rustc artifacts should be affected. - [Fix --enable-local-rust builds] (rust-lang/rust#109111) - [Treat `$prefix-clang` as `clang` in linker detection code] (rust-lang/rust#109156) - [Fix panic in compiler code] (rust-lang/rust#108162) Version 1.68.0 (2023-03-09) =========================== Language -------- - [Stabilize default_alloc_error_handler] (rust-lang/rust#102318) This allows usage of `alloc` on stable without requiring the definition of a handler for allocation failure. Defining custom handlers is still unstable. - [Stabilize `efiapi` calling convention.] (rust-lang/rust#105795) - [Remove implicit promotion for types with drop glue] (rust-lang/rust#105085) Compiler -------- - [Change `bindings_with_variant_name` to deny-by-default] (rust-lang/rust#104154) - [Allow .. to be parsed as let initializer] (rust-lang/rust#105701) - [Add `armv7-sony-vita-newlibeabihf` as a tier 3 target] (rust-lang/rust#105712) - [Always check alignment during compile-time const evaluation] (rust-lang/rust#104616) - [Disable "split dwarf inlining" by default.] (rust-lang/rust#106709) - [Add vendor to Fuchsia's target triple] (rust-lang/rust#106429) - [Enable sanitizers for s390x-linux] (rust-lang/rust#107127) Libraries --------- - [Loosen the bound on the Debug implementation of Weak.] (rust-lang/rust#90291) - [Make `std::task::Context` !Send and !Sync] (rust-lang/rust#95985) - [PhantomData layout guarantees] (rust-lang/rust#104081) - [Don't derive Debug for `OnceWith` & `RepeatWith`] (rust-lang/rust#104163) - [Implement DerefMut for PathBuf] (rust-lang/rust#105018) - [Add O(1) `Vec -> VecDeque` conversion guarantee] (rust-lang/rust#105128) - [Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met] (rust-lang/rust#105851) Stabilized APIs --------------- - [`{core,std}::pin::pin!`] (https://doc.rust-lang.org/stable/std/pin/macro.pin.html) - [`impl From<bool> for {f32,f64}`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#impl-From%3Cbool%3E-for-f32) - [`std::path::MAIN_SEPARATOR_STR`] (https://doc.rust-lang.org/stable/std/path/constant.MAIN_SEPARATOR_STR.html) - [`impl DerefMut for PathBuf`] (https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#impl-DerefMut-for-PathBuf) These APIs are now stable in const contexts: - [`VecDeque::new`] (https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.new) Cargo ----- - [Stabilize sparse registry support for crates.io] (rust-lang/cargo#11224) - [`cargo build --verbose` tells you more about why it recompiles.] (rust-lang/cargo#11407) - [Show progress of crates.io index update even `net.git-fetch-with-cli` option enabled] (rust-lang/cargo#11579) Misc ---- Compatibility Notes ------------------- - [Add `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` to future-incompat report] (rust-lang/rust#103418) - [Only specify `--target` by default for `-Zgcc-ld=lld` on wasm] (rust-lang/rust#101792) - [Bump `IMPLIED_BOUNDS_ENTAILMENT` to Deny + ReportNow] (rust-lang/rust#106465) - [`std::task::Context` no longer implements Send and Sync] (rust-lang/rust#95985) nternal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Encode spans relative to the enclosing item] (rust-lang/rust#84762) - [Don't normalize in AstConv] (rust-lang/rust#101947) - [Find the right lower bound region in the scenario of partial order relations] (rust-lang/rust#104765) - [Fix impl block in const expr] (rust-lang/rust#104889) - [Check ADT fields for copy implementations considering regions] (rust-lang/rust#105102) - [rustdoc: simplify JS search routine by not messing with lev distance] (rust-lang/rust#105796) - [Enable ThinLTO for rustc on `x86_64-pc-windows-msvc`] (rust-lang/rust#103591) - [Enable ThinLTO for rustc on `x86_64-apple-darwin`] (rust-lang/rust#103647) Version 1.67.0 (2023-01-26) ========================== Language -------- - [Make `Sized` predicates coinductive, allowing cycles.] (rust-lang/rust#100386) - [`#[must_use]` annotations on `async fn` also affect the `Future::Output`.] (rust-lang/rust#100633) - [Elaborate supertrait obligations when deducing closure signatures.] (rust-lang/rust#101834) - [Invalid literals are no longer an error under `cfg(FALSE)`.] (rust-lang/rust#102944) - [Unreserve braced enum variants in value namespace.] (rust-lang/rust#103578) Compiler -------- - [Enable varargs support for calling conventions other than `C` or `cdecl`.] (rust-lang/rust#97971) - [Add new MIR constant propagation based on dataflow analysis.] (rust-lang/rust#101168) - [Optimize field ordering by grouping m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) - [Stabilize native library modifier `verbatim`.] (rust-lang/rust#104360) Added and removed targets: - [Add a tier 3 target for PowerPC on AIX] (rust-lang/rust#102293), `powerpc64-ibm-aix`. - [Add a tier 3 target for the Sony PlayStation 1] (rust-lang/rust#102689), `mipsel-sony-psx`. - [Add tier 3 `no_std` targets for the QNX Neutrino RTOS] (rust-lang/rust#102701), `aarch64-unknown-nto-qnx710` and `x86_64-pc-nto-qnx710`. - [Remove tier 3 `linuxkernel` targets] (rust-lang/rust#104015) (not used by the actual kernel). Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Merge `crossbeam-channel` into `std::sync::mpsc`.] (rust-lang/rust#93563) - [Fix inconsistent rounding of 0.5 when formatted to 0 decimal places.] (rust-lang/rust#102935) - [Derive `Eq` and `Hash` for `ControlFlow`.] (rust-lang/rust#103084) - [Don't build `compiler_builtins` with `-C panic=abort`.] (rust-lang/rust#103786) Stabilized APIs --------------- - [`{integer}::checked_ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog) - [`{integer}::checked_ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog2) - [`{integer}::checked_ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog10) - [`{integer}::ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog) - [`{integer}::ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog2) - [`{integer}::ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog10) - [`NonZeroU*::ilog2`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog2) - [`NonZeroU*::ilog10`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog10) - [`NonZero*::BITS`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#associatedconstant.BITS) These APIs are now stable in const contexts: - [`char::from_u32`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_u32) - [`char::from_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_digit) - [`char::to_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_digit) - [`core::char::from_u32`] (https://doc.rust-lang.org/stable/core/char/fn.from_u32.html) - [`core::char::from_digit`] (https://doc.rust-lang.org/stable/core/char/fn.from_digit.html) Compatibility Notes ------------------- - [The layout of `repr(Rust)` types now groups m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) This is intended to be an optimization, but it is also known to increase type sizes in a few cases for the placement of enum tags. As a reminder, the layout of `repr(Rust)` types is an implementation detail, subject to change. - [0.5 now rounds to 0 when formatted to 0 decimal places.] (rust-lang/rust#102935) This makes it consistent with the rest of floating point formatting that rounds ties toward even digits. - [Chains of `&&` and `||` will now drop temporaries from their sub-expressions in evaluation order, left-to-right.] (rust-lang/rust#103293) Previously, it was "twisted" such that the _first_ expression dropped its temporaries _last_, after all of the other expressions dropped in order. - [Underscore suffixes on string literals are now a hard error.] (rust-lang/rust#103914) This has been a future-compatibility warning since 1.20.0. - [Stop passing `-export-dynamic` to `wasm-ld`.] (rust-lang/rust#105405) - [`main` is now mangled as `__main_void` on `wasm32-wasi`.] (rust-lang/rust#105468) - [Cargo now emits an error if there are multiple registries in the configuration with the same index URL.] (rust-lang/cargo#10592) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Rewrite LLVM's archive writer in Rust.] (rust-lang/rust#97485)
Since the sorting function accounts for an
index
field, there's not much reason to also be applying changes to the levenshtein distance. Instead, we can just not treatlev
as a filter if there's already a non-sentinel value forindex
.This change gives slightly more weight to the index and path part, as search criteria, than it used to. This changes some of the test cases, but not in any obviously-"worse" way, and, in particular, substring matches are a bigger deal than levenshtein distances (we're assuming that a typo is less likely than someone just not typing the entire name).
The biggest change is the addition of a
path_lev
field to result items. It's always zero if the search query has no parent path part and for type queries, making the check in thesortResults
function a no-op. When it's present, it is used to implement different precedence for the parent path and the tail.Consider the query
hashset::insert
, a test case that already exists and can be found here. We want the ordering shown in the test case:We do not want this ordering, which is the ordering that would occur if substring position took priority over
path_lev
:We also do not want
HashSet::iter
to appear beforeHashMap::insert
, which is what would happen ifpath_lev
took priority over the appearance of any substring match. This is why thesortResults
function haspath_lev
sandwiched between aindex < 0
check and aindex
comparison check:The old code implemented a similar feature by manipulating the
lev
member based on whether a substring match was found and averaging in the path distance (item.lev = name_lev + path_lev / 10
), so the path lev wound up acting like a tie breaker, but it gives slightly different results forVec::new
, changing the test case because of the slight changes to ordering priority.Based on #103710 (comment)
Previews: