This commit enhances the search index generation process by providing
more meaningful descriptions for entries that lack a parent <book>
element. Additionally, refactors writeJsonIndex() into smaller methods.
Fixes php#159
lhsazevedo changed the title from "Duplicate titles and descriptions in search index for chunks without parent book" to "Duplicated titles and descriptions in search index for chunks without parent book" on Oct 9, 2024
Background
In #154, we addressed the issue of missing pages in the search index. The root cause was that some index entries lacked a title property (sdesc). We resolved this by applying the same solution used for the manual content: using the description (ldesc) as the title.
To avoid duplicating text in both the title and description fields, we now pull the description from the parent <book>. For example, an entry can end up with the title "Type", taken from the page description, while its new description ("Language Reference") comes from the parent <book>.
You can see the implementation in phd/phpdotnet/phd/Package/PHP/Web.php, lines 244 to 258 in 673b2da.
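The title and description selection described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in Web.php: the function name `title_and_description` and its parameters are hypothetical, and the parent book's title is assumed to have been looked up already.

```python
from typing import Optional, Tuple


def title_and_description(
    sdesc: str, ldesc: str, book_title: Optional[str]
) -> Tuple[str, str]:
    """Pick the (title, description) pair for one search index entry.

    Hypothetical helper: `book_title` is the title of the parent <book>,
    or None when the entry has no parent <book>.
    """
    if sdesc:
        # The entry has its own title, so both fields are used as-is.
        return sdesc, ldesc
    # No title: promote the description to the title, and take the new
    # description from the parent <book> when there is one.
    description = book_title if book_title is not None else ldesc
    return ldesc, description


# Entry with a parent <book>: title and description differ.
print(title_and_description("", "Type", "Language Reference"))
# Entry without a parent <book>: both fields carry the same text.
print(title_and_description("", "Example page", None))
```

The second call shows the case this issue is about: with no parent <book> to draw from, the description ends up in both fields.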
Issue
Some entries, like extension main pages (e.g. book.strings, book.zip) and top-level pages (e.g. copyright, getting-started, security), don’t have a parent <book>. In these cases, the description is reused as the title, so both fields carry the same text.
Proposed fix
While some entries lack a parent <book>, every entry has at least one parent <set>. The root entry itself is a set called "PHP Manual".
The proposed solution is to fall back to the first <set> in the hierarchy when no <book> is found.
I have a working implementation and will submit a PR soon.
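The fallback can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the pending PR: the `INDEX` layout, the entry ids `root`, `some.book`, and `some.page`, the `"page"` kind, and both function names are made up for the example, and "first <set> in the hierarchy" is read here as the nearest enclosing <set>.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory index: id -> (kind, title, parent id).
# Only the titles "PHP Manual", "Language Reference", and "Type" and the
# id "copyright" come from the issue text; the rest is illustrative.
Index = Dict[str, Tuple[str, str, Optional[str]]]

INDEX: Index = {
    "root": ("set", "PHP Manual", None),
    "some.book": ("book", "Language Reference", "root"),
    "some.page": ("page", "Type", "some.book"),
    "copyright": ("page", "Example page", "root"),
}


def find_ancestor_title(index: Index, entry_id: str, kind: str) -> Optional[str]:
    """Return the title of the nearest ancestor of the given kind, or None."""
    parent_id = index[entry_id][2]
    while parent_id is not None:
        parent_kind, title, grandparent_id = index[parent_id]
        if parent_kind == kind:
            return title
        parent_id = grandparent_id
    return None


def description_for(index: Index, entry_id: str) -> Optional[str]:
    """Prefer the parent <book>; fall back to the nearest <set>."""
    book_title = find_ancestor_title(index, entry_id, "book")
    if book_title is not None:
        return book_title
    return find_ancestor_title(index, entry_id, "set")


print(description_for(INDEX, "some.page"))  # has a parent <book>
print(description_for(INDEX, "copyright"))  # falls back to the <set>
```

In this sketch the root entry has no ancestors, so `description_for` returns `None` for it and the caller would need to decide what to show there.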