Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort doc search results by their total module path length #16735

Closed
wants to merge 1 commit into from
Closed

Sort doc search results by their total module path length #16735

wants to merge 1 commit into from

Conversation

carols10cents
Copy link
Member

Hi! This is my first attempt at a rust contribution (:fireworks:)! I think this could improve the documentation search results.

Using the assumption that more commonly used items are going to be higher in the module hierarchy and have shorter names, I've added a sort using the total module path length where longer will be ranked later. I think this is a good heuristic for more important and more relevant to the person searching the docs.

I've put the docs with my change up here, if you'd like to play with it to see the difference: https://carols10cents.github.io/rustdoc-playground/std/index.html?search=read

Here are some screenshots of search results before and after this change, where before is the nightly docs as of today on rust-lang.org. I haven't been using rust long enough to be able to say this is always going to be the better ranking, but it doesn't seem to make the results in general any worse, and in some cases makes them dramatically better.

screen_shot_2014-08-24_at_4_41_06_pm

screen_shot_2014-08-24_at_4_41_03_pm

screen_shot_2014-08-24_at_4_39_15_pm

screen_shot_2014-08-24_at_4_39_06_pm

screen_shot_2014-08-24_at_4_38_56_pm

screen_shot_2014-08-24_at_4_38_49_pm

An added bonus is that you can see a bit better how the results are being sorted; it just looks less random when they're ordered (in groups) shortest to longest.

I'd love to hear what more experienced rust documentation users think!

@lilyball
Copy link
Contributor

If I type in "read" and have to look halfway down the page before I find a function literally named read(), then I'm going to think the search sorting is really bad.

Relevance is the most important metric here. Short paths are a rather poor approximation of relevance. I think it's an important signal to use for determining relevance, but it's not at all the most important.

@Gankra
Copy link
Contributor

Gankra commented Aug 24, 2014

I concur with kballard here. At very least, exact matches should always be prioritized first. After that I guess this heuristic can subsort the results?

@carols10cents
Copy link
Member Author

Ah I see! Easy enough to try-- I updated the code on https://carols10cents.github.io/rustdoc-playground/std/index.html?search=read as well. Now the search results for read look like:

screen shot 2014-08-24 at 6 40 42 pm

Wdyt?

@lilyball
Copy link
Contributor

Seems maybe a bit odd for the results to change dramatically going from "rea" to "read". I don't really know though. Maybe it's ok.

What I'd really like to see is simply more intelligent searching. If I search for "ptr::re" I get lots of results that don't even have "ptr" in the name. My personal expectation of how the search should work is basically finding all results that match case-insensitively if you turn the search string into a glob pattern by inserting a * between every character. It would still want to do relevance ordering, e.g. characters that are actually next to each other in the result are a better match than others, e.g. "ptr" matches "ptr" better than "pointer". And it would also still want to do some sort of edit distance fuzzy matching too, to account for typos.

Search is a hard problem.

@Gankra
Copy link
Contributor

Gankra commented Aug 24, 2014

cc @cmr, any thoughts?

@steveklabnik
Copy link
Member

The behavior you observe helps with misspellings.

@emberian
Copy link
Member

I have no particular feelings on search, though the other comments seem good.

With the assumption that more commonly used items are going to be higher
in the module hierarchy and have shorter names, so this is a good
heuristic for more important and more relevant to the person searching
the docs.
@carols10cents
Copy link
Member Author

HAH, oops, my original code had a bug where it wasn't including the name. I've put it back to have the new heuristic before the exact match heuristic again, and now std::ptr::read is first BUT there are still more things that look more relevant to me higher up than, say, std::io::util::ZeroReader::read:

screen shot 2014-08-24 at 7 37 43 pm

@carols10cents
Copy link
Member Author

@steveklabnik do you mean the current behavior of the search helps with misspellings? Do you have an example?

@Gankra
Copy link
Contributor

Gankra commented Aug 24, 2014

@carols10cents The search results incorporate levenshtein distance into their ordering as of this PR: #15385

@carols10cents
Copy link
Member Author

@gankro yup, and it's currently ranking all non-levenshtein matches above all levenshtein. I've based my changes and tests from master; this change does not interfere with the levenshtein distance as it is on master.

@Gankra
Copy link
Contributor

Gankra commented Aug 24, 2014

@carols10cents Ah, yep, you're right. (Also wow, how over-engineered is that sort function?)

@carols10cents
Copy link
Member Author

@gankro hah, yep, it's a doozy :) At least it's commented...?

bors added a commit that referenced this pull request Sep 4, 2014
This has the primary advantage of not interfering with browser default behavior for links like being able to cmd/ctrl+click on a result to open the result in a new tab but leave the current page as-is (previous behavior both opened a new tab and changed the current tab's location to the result's).

I've done my best to keep the rest of the behavior and the appearance the same-- the whole row still highlights, still has a hand cursor, still moves to the result page with a normal click, arrows+enter still work. If the result is on the current page, the search is simply hidden.

The biggest difference in behavior is that people using tab to navigate through the links will have to hit tab twice for each row, since each cell has its own `a` tag.. I could fix this by switching to `div`s and `span`s instead of a table, but that's potentially more CSS finicky?

The biggest difference in appearance is probably that all the text in the search results is Fira Sans now, instead of just the method name with the rest of the text in Source Serif Pro. I can put this appearance back, but it looks like all links anywhere on the page are Fira Sans. Only the name was in an `a` tag before, but the whole row was ACTING like a link, so I think this is actually more consistent.

[I've pushed these changes to a gh-pages repo](https://carols10cents.github.io/rustdoc-playground/std/index.html?search=t) if you'd like to take a look at the effects; note that I also have my changes for PR #16735 there too so the search results will be sorted differently than on master.
@aturon
Copy link
Member

aturon commented Oct 17, 2014

This PR has gotten pretty stale. What's the consensus here? Is this something we want to do?

@Gankra
Copy link
Contributor

Gankra commented Oct 17, 2014

I fiddled around with some searches to see how the results compare. For unambiguous searches it's a total wash. Minor shuffling. For ambiguous searches there's some pretty big shifts, but it's not clear if the results are better or worse as a result.

I think the strongest thing I can say about this functionality is that it visually smooths out the search results, making them easier to parse, in my subjective opinion. I'm overall positive on the feature, it's just that the gains are just very marginal if real at all. That said, it's a small, easily-reversible, addition to what is already an abomination of a function.

Thankfully it's not my call 😅

@aturon
Copy link
Member

aturon commented Nov 15, 2014

@carols10cents First, I want to apologize for the very long wait on hearing anything definitive here. We try not to let PRs fall through the cracks, but this one has. Please feel free to reach out to any of the Rust developers on IRC if you're not hearing back.

Second, I want to thank you for making the PR in the first place.

All that said, I think I'm going to close this PR for now. While I agree that our current sorting is not ideal, I don't think this is a clear enough improvement to warrant the churn. I think @kballard made some good suggestions about a deeper change here that might be worth pursuing.

Finally, don't hesitate to reach out if you'd like guidance on contributions!

@aturon aturon closed this Nov 15, 2014
@carols10cents
Copy link
Member Author

@aturon Thanks for the response!! To be frank, I forgot about this PR too :) Thank you to everyone on this issue for your feedback, I might take another crack at this sometime and I totally understand the decision not to merge this change. 💖

@carols10cents carols10cents deleted the doc-search-sort-by-total-length branch May 8, 2015 01:19
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants