Search: strip HTML tags while indexing #1332

Closed
pdurbin opened this issue Jan 13, 2015 · 6 comments

Comments

@pdurbin
Member

pdurbin commented Jan 13, 2015

#1320 is a high-level ticket about supporting HTML. This ticket is about stripping out those HTML tags while we are indexing fields into Solr.

@pdurbin pdurbin self-assigned this Jan 13, 2015
@pdurbin pdurbin added this to the In Review - Dataverse 4.0 milestone Jan 13, 2015
@scolapasta scolapasta modified the milestones: Beta 13 - Dataverse 4.0, In Review - Dataverse 4.0 Jan 23, 2015
@scolapasta scolapasta modified the milestones: Beta 13 - Dataverse 4.0, In Review - Dataverse 4.0, Beta 14 - Dataverse 4.0 Feb 6, 2015
@scolapasta scolapasta modified the milestones: Beta 14 - Dataverse 4.0, In Review - Dataverse 4.0 Feb 20, 2015
@pdurbin
Member Author

pdurbin commented Mar 5, 2015

As of fa3a94f I'm using JSoup to strip HTML tags before indexing the following:

  • descriptions of dataverses
  • dataset fields whose FieldType is "TEXTBOX"
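The comment doesn't show the indexing code itself. As a rough illustration only, here is a minimal Java sketch of the kind of jsoup call that removes tags from a field value before it is sent to Solr. It assumes the jsoup library (`org.jsoup:jsoup`) is on the classpath. `HtmlStripper` and `stripHtml` are hypothetical names and are not Dataverse's actual code.

```java
import org.jsoup.Jsoup;

/**
 * Hypothetical helper: strips HTML from a field value before indexing.
 * Assumes jsoup is on the classpath; not the actual Dataverse implementation.
 */
class HtmlStripper {

    /** Returns the visible text of an HTML fragment, with tags removed and whitespace collapsed. */
    static String stripHtml(String html) {
        if (html == null) {
            return null; // nothing to index
        }
        // Jsoup.parse wraps the fragment in a full Document (adding <html>/<body> as needed);
        // text() returns the combined visible text, trimmed and whitespace-normalized,
        // with entities such as &amp; decoded.
        return Jsoup.parse(html).text();
    }
}
```

Because `text()` returns only visible text, attribute values such as a link's `href` are dropped while the link text is kept.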

@scolapasta said it's fine that there is no HTML in the cards. For example, if you put a link in a dataverse description, you can click it from the dataverse landing page but not from the card for that dataverse. We do this because we don't want <h1> tags and other weird stuff to be put in the cards.

Passing to QA.

@pdurbin pdurbin removed their assignment Mar 5, 2015
@kcondon kcondon assigned sbarbosadataverse and unassigned kcondon Mar 18, 2015
@sbarbosadataverse

@kcondon I won't be able to see whether this (HTML tags) is showing up until we migrate, correct?

@pdurbin
Member Author

pdurbin commented Mar 30, 2015

@sbarbosadataverse the way you would test this:

  • put some HTML in some places (descriptions, etc.)
  • make sure the HTML is showing up when you look at the page (you can see bold or links or whatever)
  • do a search on words you used in the HTML
  • see if you can see any HTML tags when you search (you shouldn't!)
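The last step of that check can also be done on copied text, for example a search-result snippet taken from the page or from a Solr response. Here is a minimal Java sketch using only `java.util.regex`. `TagCheck` and `containsHtmlTag` are hypothetical names, and the pattern is a heuristic that looks only for things shaped like `<tag ...>` or `</tag>`. It is not a full HTML parser.

```java
import java.util.regex.Pattern;

/** Hypothetical QA helper: flags text that still looks like it contains HTML tags. */
class TagCheck {

    // Heuristic: "<", optional "/", a tag name starting with a letter,
    // optional attributes, optional self-closing "/", then ">".
    // Plain text such as "a < b" does not match, because a tag name must follow "<" directly.
    private static final Pattern TAG =
            Pattern.compile("</?[a-zA-Z][a-zA-Z0-9]*(\\s[^<>]*)?/?>");

    static boolean containsHtmlTag(String text) {
        return text != null && TAG.matcher(text).find();
    }
}
```

A result like `containsHtmlTag("Hello <b>world</b>")` returning `true` would mean tags leaked into the indexed text.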

I hope this helps!

@sbarbosadataverse

@pdurbin
While HTML tags don't show up in the displayed search facets, if I search for "http" I get the following. If this is expected, I will close. Not sure if we can do anything about it.

[Screenshot: search results for "http"]

@pdurbin
Member Author

pdurbin commented Apr 9, 2015

@sbarbosadataverse that's what I expect to see. Thanks for the screenshot. Please feel free to get other opinions before closing if you have any doubt, though.

@sbarbosadataverse

works as designed


4 participants