Line breaks (`<br/>`) in titles/descriptions come back as newline characters (`\n` and `\r\n`) instead of `<br/>` tags in the highlight results from sandpaper. We need the highlighted title/description text to keep the breaks so that the DIG UI can show line breaks.
Here is the `knowledge_graph->description->value`:
"<br/> nav <br/> search <br/> los angeles, ca<br/> <br/> <br/> <br/> free classifieds<br/> <br/> The requested ad could not be found.<br/> <br/> <br/> <br/> <br/> <br/> Recent escorts ads. <br/> Posted: Sun. May. 22, 6:21 AM <br/> Posted: Sun. May. 22, 6:17 AM <br/> Posted: Sun. May. 22, 6:17 AM <br/> Posted: Sun. May. 22, 6:16 AM <br/> Posted: Sun. May. 22, 6:13 AM <br/> Posted: Sun. May. 22, 6:12 AM <br/> Posted: Sun. May. 22, 6:10 AM <br/> Posted: Sun. May. 22, 6:09 AM <br/> Posted: Sun. May. 22, 6:06 AM <br/> Posted: Sun. May. 22, 6:06 AM <br/>"
Here is the `highlight->content_extraction.content_strict.text`:
"\n \n search \n \n \n \n \n \n \r\n <em>los</em> <em>angeles</em>, ca\r\n \r\n \r\n \r\n free classifieds\r\n \n \n \n \n \n \n The requested ad could not be found.\r\n\r\n\r\n \r\n \r\n \r\n \r\n \r\n Recent escorts ads. \n \n Posted: Sun. May. 22, 6:21 AM \n \n \n Posted: Sun. May. 22, 6:17 AM \n \n \n Posted: Sun. May. 22, 6:17 AM \n \n \n Posted: Sun. May. 22, 6:16 AM \n \n \n Posted: Sun. May. 22, 6:13 AM \n \n \n Posted: Sun. May. 22, 6:12 AM \n \n \n Posted: Sun. May. 22, 6:10 AM \n \n \n Posted: Sun. May. 22, 6:09 AM \n \n \n Posted: Sun. May. 22, 6:06 AM \n \n \n Posted: Sun. May. 22, 6:06 AM \n \n \n \n \n"
Here is my sandpaper query on http://10.3.2.82:9876/search/coarse
@jasonslepicka
After talking with @saggu, it sounds like the issue is that the `<br/>` tags only exist in `knowledge_graph`, but the highlights are only computed over `content_extraction` or `indexed`. To get both highlights and line breaks, we need to either:
- clean the newlines and convert them to `<br/>` tags within `content_extraction`, or
- have sandpaper return highlights from `knowledge_graph`.
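For the second option, one possible approach is to ask Elasticsearch to highlight the `knowledge_graph` fields directly, since their values already contain the `<br/>` tags. The sketch below builds such a search body in Python. The field paths `knowledge_graph.title.value` and `knowledge_graph.description.value` are assumptions read off the `knowledge_graph->description->value` notation above, and the sketch assumes those fields are indexed as searchable text in `dig-etk-search`. Whether sandpaper forwards a `highlight` section like this one to Elasticsearch is not stated in this issue, so `build_kg_highlight_query` is a hypothetical helper, not sandpaper's API.

```python
import json

# Hypothetical field paths, inferred from the knowledge_graph->description->value
# notation in this issue; adjust them to the real dig-etk-search mapping.
HIGHLIGHT_FIELDS = ["knowledge_graph.title.value", "knowledge_graph.description.value"]


def build_kg_highlight_query(text, fields=HIGHLIGHT_FIELDS):
    """Build an Elasticsearch search body that matches `text` against the
    knowledge_graph fields and requests highlights on those same fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        },
        "highlight": {
            # The "default" encoder does not HTML-escape the source text,
            # so existing <br/> tags are returned as-is ("html" would escape them).
            "encoder": "default",
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            "fields": {
                # number_of_fragments = 0 returns the entire field value with
                # matches highlighted, instead of short fragments.
                field: {"number_of_fragments": 0}
                for field in fields
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_kg_highlight_query("los angeles"), indent=2))
```

Returning the whole field (`number_of_fragments: 0`) rather than snippets keeps every `<br/>` in the description, at the cost of larger highlight payloads for long ads.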
We don't want to just use the raw text because part of the cleaning process includes removing excess newlines.
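The first option could look roughly like the following sketch: a hypothetical helper `newlines_to_br` that runs on the already-cleaned `content_extraction` text and turns every line break (`\r\n`, `\r` or `\n`) into `<br/>`, while dropping whitespace-only lines and capping consecutive breaks so the excess newlines removed during cleaning are not reintroduced. It assumes the input is trusted, already-sanitized text and does no HTML escaping. Because it leaves `<em>` tags untouched, the same idea could in principle also be applied to a highlight string like the one above.

```python
import re

# One line break in any of the forms seen in the highlight output above.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def newlines_to_br(text, max_consecutive=1):
    """Convert line breaks in cleaned text into <br/> tags.

    Whitespace-only lines are dropped and at most `max_consecutive`
    <br/> tags are emitted between two non-empty lines.
    Assumes `text` is trusted; nothing is HTML-escaped.
    """
    out = []
    pending = 0  # line breaks seen since the last non-empty line
    for line in _LINE_BREAK.split(text):
        line = line.strip()
        if line:
            if out:
                out.append("<br/>" * min(pending, max_consecutive))
            out.append(line)
            pending = 0
        pending += 1  # the break that follows this line
    return "".join(out)
```

For example, `newlines_to_br("\n \n search \n \r\n <em>los</em> ca\r\n")` yields `search<br/><em>los</em> ca`; raising `max_consecutive` keeps more of the original paragraph spacing.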
Here is the link to the ES document: http://10.1.94.103:9201/dig-etk-search/ads/CDFDF087781B7FCEFD7CEA46A739DAB72F26434CF6B7BE5D34865CAE48243B76