Background
In #154, we addressed the issue of missing pages in the search index. The root cause was that some index entries lacked a title property (sdesc). We resolved this by applying the same solution used for the manual content: using the description (ldesc) as the title.
To avoid duplicating text in both the title and description fields, we now pull the description from the parent <book>. For example:
In this example, the title "Type" was taken from the page description, while the new description ("Language Reference") comes from the parent <book>. You can see the implementation here:
phd/phpdotnet/phd/Package/PHP/Web.php, lines 244 to 258 in 673b2da:

    if ($index["sdesc"] === "" && $index["ldesc"] !== "") {
        $index["sdesc"] = $index["ldesc"];
        $parentId = $index['parent_id'];
        // isset() to guard against undefined array keys, either for root
        // elements (no parent) or in case the index structure is broken.
        while (isset($this->indexes[$parentId])) {
            $parent = $this->indexes[$parentId];
            if ($parent['element'] === 'book') {
                $index["ldesc"] = Format::getLongDescription($parent['docbook_id']);
                break;
            }
            $parentId = $parent['parent_id'];
        }
    }
Issue
Some entries, like extension main pages (e.g. book.strings, book.zip) and top-level pages (e.g. copyright, getting-started, security), don’t have a parent <book>. In these cases, the description is being reused as the title, resulting in duplicate content:
Proposed fix
While some entries lack a parent <book>, every entry has at least one parent <set>. The root entry itself is a set called "PHP Manual".
The proposed solution is to fall back to the first <set> in the hierarchy when no <book> is found:
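A minimal sketch of this fallback, written as a standalone function so it can be run outside PhD. It is not the code from the pending PR. The function name `findFallbackDescription`, the `$getLongDescription` callable (standing in for `Format::getLongDescription`), and the sample IDs and titles are all assumptions made for illustration; only the array keys `element`, `docbook_id`, and `parent_id` come from the snippet above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: walk up the parent chain and return the description
 * of the nearest <book>, or of the first <set> encountered if there is no
 * <book>. Returns null if neither exists (e.g. for the root entry).
 */
function findFallbackDescription(array $indexes, array $index, callable $getLongDescription): ?string
{
    $firstSetId = null;
    $parentId = $index['parent_id'] ?? null;

    // Same guard as the existing code: stop at the root or at a broken link.
    while ($parentId !== null && isset($indexes[$parentId])) {
        $parent = $indexes[$parentId];
        if ($parent['element'] === 'book') {
            // A <book> always wins, as in the current behaviour.
            return $getLongDescription($parent['docbook_id']);
        }
        if ($parent['element'] === 'set' && $firstSetId === null) {
            // Remember only the nearest <set>; keep looking for a <book>.
            $firstSetId = $parent['docbook_id'];
        }
        $parentId = $parent['parent_id'] ?? null;
    }

    return $firstSetId !== null ? $getLongDescription($firstSetId) : null;
}

// Illustrative index data (IDs and titles are made up for this sketch).
$indexes = [
    'index'        => ['element' => 'set',     'docbook_id' => 'index',        'parent_id' => null],
    'funcref'      => ['element' => 'set',     'docbook_id' => 'funcref',      'parent_id' => 'index'],
    'langref'      => ['element' => 'book',    'docbook_id' => 'langref',      'parent_id' => 'index'],
    'book.strings' => ['element' => 'book',    'docbook_id' => 'book.strings', 'parent_id' => 'funcref'],
    'types.type'   => ['element' => 'sect1',   'docbook_id' => 'types.type',   'parent_id' => 'langref'],
    'copyright'    => ['element' => 'chapter', 'docbook_id' => 'copyright',    'parent_id' => 'index'],
];
$titles = [
    'index'   => 'PHP Manual',
    'funcref' => 'Function Reference',
    'langref' => 'Language Reference',
];
$lookup = fn (string $id): string => $titles[$id] ?? '';

$forPage      = findFallbackDescription($indexes, $indexes['types.type'], $lookup);
$forBook      = findFallbackDescription($indexes, $indexes['book.strings'], $lookup);
$forCopyright = findFallbackDescription($indexes, $indexes['copyright'], $lookup);
$forRoot      = findFallbackDescription($indexes, $indexes['index'], $lookup);
```

With this sample data, a page under a `<book>` still gets the book's description, while `book.strings` and `copyright` pick up the description of their nearest `<set>` instead of repeating their own. Returning `null` for the root leaves it to the caller to decide whether to keep the existing `ldesc`.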
I have a working implementation and will submit a PR soon.






