forked from rust-lang/rust
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit fc7ca70
authored
Rollup merge of rust-lang#133005 - notriddle:notriddle/trie-search, r=GuillaumeGomez
rustdoc: use a trie for name-based search
Potentially rust-lang#131156 — need to try reproducing the problem with `windows`
Preview and profiler results
----------------------------
Here's some quick profiling in Firefox done on the rust compiler docs:
- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb
Here's the results for the node.js profiler:
- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html
Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.
- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html
[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/
The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.
The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.
[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc
User-visible changes
--------------------
I don't expect anybody to notice anything, but it does cause two changes:
- Substring matches, in the middle of a name, only apply if there's three or more characters in the search query.
- Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them.
- It uses more RAM.
- It's faster (assuming you don't swap thrash).
[^1]: technically utf-16 code unitsFile tree
6 files changed
+756
-127
lines changedFilter options
- src/librustdoc/html/static/js
- tests
- rustdoc-js
- rustdoc-js-std
6 files changed
+756
-127
lines changed
0 commit comments