Improve search index generation for PHP.net #154
Improves the search indexes generated by the PHP-Web format by:
- Adding short descriptions to entries that lack them
- Skipping non-chunk entries (page elements)
3c44489
to
e3fb93a
Co-authored-by: Kamil Tekiela <tekiela246@gmail.com>
3acf1b6
to
ba87ca6
Will this break the search if we merge it now?
No. It is safe to merge as it doesn't alter the JSON structure.
Thank you!
Just curious. You said it would not affect the current functionality, right? Does that mean you changed something in the other PR that will avail of this change?
I said that it wouldn't break the current search, but you can already see the improved results on php.net. Unfortunately, the current UI hides them under the last result group ("Other Matches"). You'll probably need to scroll down the menu to see it. You may also need to clear your local storage because the search index is cached for two weeks. Try searching for "syntax", "types" or "operators". You should see some language reference results under "Other Matches". Those pages were missing from the index before because they don't have an
One question: how does removing non-chunk entries address the issue of empty long/short descriptions not using their short/long counterpart? Besides being unrelated to the problem, this change also removes the possibility of searching for useful sections that don't necessarily need an entire page (e.g. the ternary or the null coalescing operators).
Thanks for bringing this up. The main reason for skipping non-chunk entries was to avoid adding an
To properly integrate non-chunk entries into the search, we'd need to:
Your question led me to some interesting discoveries:
@kamil-tekiela would you mind adding the
Done
Note
This is a companion PR to php/web-php#1084, but it is not dependent on it and can be merged independently.
Intro
PHD index entries can have both short (sdesc) and long (ldesc) descriptions. Often, the short description is empty, leading to the current fallback mechanism in getShortDescription and getLongDescription:
Problem
The current search index JSON generation doesn't utilize this fallback
mechanism, resulting in missing entries in PHP.net search results.
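The fallback the Intro refers to can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not the project's actual code: the `INDEXES` dictionary, the entry IDs, and the snake_case function names are assumptions made for this example, and the only behavior taken from the PR text is that an empty description falls back to its counterpart.

```python
from typing import Optional

# Hypothetical in-memory index (entry id -> descriptions); illustrative data only.
INDEXES = {
    "language.types.string": {"sdesc": "", "ldesc": "Strings"},
    "function.strlen": {"sdesc": "strlen", "ldesc": "Get string length"},
}


def get_short_description(entry_id: str) -> Optional[str]:
    """Return sdesc, falling back to ldesc when sdesc is empty."""
    entry = INDEXES.get(entry_id)
    if entry is None:
        return None
    return entry["sdesc"] or entry["ldesc"]


def get_long_description(entry_id: str) -> Optional[str]:
    """Return ldesc, falling back to sdesc when ldesc is empty."""
    entry = INDEXES.get(entry_id)
    if entry is None:
        return None
    return entry["ldesc"] or entry["sdesc"]
```

An index generator that reads `sdesc` directly instead of going through such accessors would see an empty title for the first entry, which matches the missing-results symptom described here.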
Solution
This PR addresses the issue by:
Impact
Examples
Currently, "PHP Manual > Language Reference > Types > String" is missing from search results due to an empty short description. This change will use "Strings" (the long description) as the title and "Language Reference" (the parent book title) as the long description.
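The "String" example can be expressed as a short sketch of how a single search entry might be derived. It is a Python illustration under stated assumptions, not the PR's implementation: the `build_search_entry` function, the `chunk` flag, and the dictionary layout are hypothetical, and the branch for entries that already have a short description is a guess at the pre-existing behavior.

```python
from typing import Optional, Tuple


def build_search_entry(entry: dict, book_title: str) -> Optional[Tuple[str, str]]:
    """Return (title, description) for a chunk entry, or None if it is skipped."""
    # Non-chunk entries (page elements) are left out of the index.
    if not entry["chunk"]:
        return None

    if entry["sdesc"]:
        # Assumed existing behavior: short description is the title.
        return entry["sdesc"], entry["ldesc"]

    if entry["ldesc"]:
        # Fallback from the example: the long description becomes the title
        # and the parent book title becomes the description.
        return entry["ldesc"], book_title

    # Nothing usable as a title.
    return None


strings_page = {"chunk": True, "sdesc": "", "ldesc": "Strings"}
print(build_search_entry(strings_page, "Language Reference"))
```

With the values from the example, the entry's title is "Strings" and its description is "Language Reference", while an entry flagged as a non-chunk is dropped regardless of its descriptions.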
search-index.json
search-description.json
Statistics
Generated Search Index Diff Preview
Only the first 100 lines are shown for brevity.