Breaks are converted to carriage returns in highlights #4

Open
ThomasSchellenbergNextCentury opened this issue Oct 31, 2017 · 1 comment

Breaks (<br/>) from titles/descriptions are being converted into carriage returns (\r) in the highlight results from sandpaper. We need the highlighted title/description text to have breaks in order to show line breaks in the DIG UI.

Here is the knowledge_graph->description->value:

"<br/> nav <br/> search <br/>            los angeles, ca<br/>           <br/>           <br/>           <br/>            free classifieds<br/>           <br/> The requested ad could not be found.<br/>  <br/>  <br/>    <br/>    <br/>        <br/>           Recent escorts ads. <br/> Posted: Sun. May. 22, 6:21 AM <br/> Posted: Sun. May. 22, 6:17 AM <br/> Posted: Sun. May. 22, 6:17 AM <br/> Posted: Sun. May. 22, 6:16 AM <br/> Posted: Sun. May. 22, 6:13 AM <br/> Posted: Sun. May. 22, 6:12 AM <br/> Posted: Sun. May. 22, 6:10 AM <br/> Posted: Sun. May. 22, 6:09 AM <br/> Posted: Sun. May. 22, 6:06 AM <br/> Posted: Sun. May. 22, 6:06 AM <br/>"

Here is the highlight->content_extraction.content_strict.text:

"\n \n search \n \n \n \n \n \n \r\n            <em>los</em> <em>angeles</em>, ca\r\n           \r\n           \r\n           \r\n            free classifieds\r\n           \n \n \n \n \n \n The requested ad could not be found.\r\n\r\n\r\n  \r\n  \r\n    \r\n    \r\n        \r\n           Recent escorts ads. \n \n Posted: Sun. May. 22, 6:21 AM \n \n \n Posted: Sun. May. 22, 6:17 AM \n \n \n Posted: Sun. May. 22, 6:17 AM \n \n \n Posted: Sun. May. 22, 6:16 AM \n \n \n Posted: Sun. May. 22, 6:13 AM \n \n \n Posted: Sun. May. 22, 6:12 AM \n \n \n Posted: Sun. May. 22, 6:10 AM \n \n \n Posted: Sun. May. 22, 6:09 AM \n \n \n Posted: Sun. May. 22, 6:06 AM \n \n \n Posted: Sun. May. 22, 6:06 AM \n \n \n \n \n"

Here is my sandpaper query on http://10.3.2.82:9876/search/coarse:

{"SPARQL":{"group-by":{"limit":1,"offset":0},"select":{"variables":[{"type":"simple","variable":"?ad"}]},"where":{"clauses":[{"constraint":"los angeles","isOptional":false,"predicate":"city"}],"filters":[],"type":"Ad","variable":"?ad"}},"type":"Point Fact"}

Here is the link to the ES document: http://10.1.94.103:9201/dig-etk-search/ads/CDFDF087781B7FCEFD7CEA46A739DAB72F26434CF6B7BE5D34865CAE48243B76

ThomasSchellenbergNextCentury commented Oct 31, 2017

@jasonslepicka
After talking with @saggu, it sounds like the issue is that the <br/> tags exist only in knowledge_graph, while the highlights only use content_extraction or indexed. To have both highlights and line breaks, we need to either:

  • Clean the newlines and change them to <br/> tags within content_extraction
  • Have sandpaper return highlights from knowledge_graph

We don't want to just use the raw text because part of the cleaning process includes removing excess newlines.
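For illustration, the first option could look roughly like the sketch below. It assumes the cleanup runs over the highlighted content_extraction text after the <em> tags are added. The function name newlines_to_breaks is hypothetical and is not part of sandpaper or DIG. Each run of newlines, carriage returns, and surrounding spaces is collapsed into a single <br/>, so excess newlines are still removed.

```python
import re

# Hypothetical helper for illustration only; not an existing sandpaper/DIG function.
# Matches optional leading spaces/tabs, one \r or \n, then any further
# whitespace, so a whole run of blank lines counts as a single match.
_NEWLINE_RUN = re.compile(r"[ \t]*[\r\n][ \t\r\n]*")


def newlines_to_breaks(text: str) -> str:
    """Collapse every run of newlines in `text` into one ' <br/> ' marker.

    Highlight tags such as <em>...</em> are left untouched.
    """
    return _NEWLINE_RUN.sub(" <br/> ", text)


if __name__ == "__main__":
    sample = "search \n \n \r\n            <em>los</em> <em>angeles</em>, ca\r\n   \r\n   free classifieds"
    print(newlines_to_breaks(sample))
```

Collapsing the run instead of replacing each newline one-for-one keeps the output from filling up with consecutive <br/> tags, which matches the cleaning goal described above. Whether a single <br/> per run is the right granularity for the DIG UI is an open assumption.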

saggu added a commit that referenced this issue Aug 3, 2018