Faceting breaks when searching for :, /, }, ], ) #218

Closed
eaquigley opened this issue Jul 9, 2014 · 5 comments
Labels: Type: Bug (a defect)

Comments

@eaquigley
Contributor


Author Name: Elda Sotiri (@esotiri)
Original Redmine Issue: 3632, https://redmine.hmdc.harvard.edu/issues/3632
Original Date: 2014-03-04
Original Assignee: Elda Sotiri


Faceting breaks (the app enters "Solr down mode") when searching for :, /, }, ], )

There might be other characters; I will keep testing for more.

Links to review:

http://stackoverflow.com/questions/18277609/search-in-solr-with-special-characters

https://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters


This seems like a nice current list of characters we might want to treat in a special way (escape in some cases):

  public static String escapeQueryChars(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      // These characters are part of the query syntax and must be escaped
      if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':'
        || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
        || c == '*' || c == '?' || c == '|' || c == '&' || c == ';' || c == '/'
        || Character.isWhitespace(c)) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

This is from https://github.com/apache/lucene-solr/blob/lucene_solr_4_6_0/solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java#L231 which is a more recent version of the link provided here: http://lucene.472066.n3.nabble.com/What-is-the-full-list-of-Solr-Special-Characters-td4094053.html
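To make the effect on the characters from this ticket concrete, here is a minimal sketch. The class name `EscapeSketch` and its `main` driver are hypothetical; the `escapeQueryChars` body is copied from the snippet above rather than imported from SolrJ, so it runs without dependencies. It only shows what the escaping does, not that escaping is the right fix for our search box.

```java
// Hypothetical demo class; only escapeQueryChars comes from the quoted SolrJ source.
class EscapeSketch {

  // Copied from the snippet above so the demo has no external dependencies.
  static String escapeQueryChars(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      // These characters are part of the query syntax and must be escaped
      if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':'
        || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
        || c == '*' || c == '?' || c == '|' || c == '&' || c == ';' || c == '/'
        || Character.isWhitespace(c)) {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The characters reported in this ticket.
    String[] inputs = {":", "/", "}", "]", ")"};
    for (String raw : inputs) {
      System.out.println(raw + " -> " + escapeQueryChars(raw));
    }
  }
}
```

Each of the five characters comes back prefixed with a backslash (for example, `:` becomes `\:`). One trade-off to keep in mind: escaping everything also turns a fielded query such as `foo:bar` into a literal term, so blanket escaping may not be what we want for every search.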


Related issue(s): #131
Redmine related issue(s): 3540


@eaquigley
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-07T19:43:21Z


I updated the description of this ticket after doing a little more research and committed some test data:

6d33315 added test data for search on :, -, bug #3632

The problem is not solved. In general, I'm having trouble escaping the characters. See scripts/search/tests/special-characters for some example raw Solr queries.

I'm also not sure what our approach should be. This ticket has eaten up a couple of hours today, so I'm going to work on more pressing matters.

eaquigley added this to the Dataverse 4.0: Beta 1 milestone Jul 9, 2014
@eaquigley
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-08T16:55:05Z


Recent conversation on this: http://lucene.472066.n3.nabble.com/Solr-special-characters-like-and-amp-td4129854.html

@eaquigley
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-05-07T17:07:20Z


I just committed this:

18337ab don't enter solrDownMode on an unparseable query #3632

Here's how it looks right now from the search API:

$ curl 'http://localhost:8080/api/search?q=:'

{
    "q":":",
    "fq_provided":"[]",
    "fq_actual":"[]",
    "total_count":0,
    "start":0,
    "count_in_response":0,
    "items":"[]",
    "error":"Trouble parsing query? org.apache.solr.search.SyntaxError: Cannot parse ':': Encountered \" \":\" \": \"\" at line 1, column 0.\nWas expecting one of:\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <LPARAMS> ...\n    <NUMBER> ...\n    <TERM> ...\n    \"*\" ...\n    "
}
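The behaviour in the response above (report the parse error, but do not flip into Solr down mode) can be sketched roughly as follows. This is not the code from the commit: every type here (`SearchSketch`, `Backend`, `QuerySyntaxException`, `BackendUnavailableException`, `Result`) is hypothetical and stands in for whatever the real search service and SolrJ provide. The only point is that a bad query and an unreachable server are handled on separate paths.

```java
// All type names are hypothetical; none come from the Dataverse code base or SolrJ.
class SearchSketch {

  /** Thrown by the stand-in backend when the query cannot be parsed. */
  static class QuerySyntaxException extends RuntimeException {
    QuerySyntaxException(String message) { super(message); }
  }

  /** Thrown by the stand-in backend when the search server cannot be reached. */
  static class BackendUnavailableException extends RuntimeException {
    BackendUnavailableException(String message) { super(message); }
  }

  /** Stand-in for whatever actually sends the query to Solr. */
  interface Backend {
    int countMatches(String query);
  }

  /** Mirrors a few fields of the JSON response, plus a flag for "Solr down mode". */
  static class Result {
    final int totalCount;
    final String error;     // null when the query parsed
    final boolean solrDown; // true only when the server is unreachable

    Result(int totalCount, String error, boolean solrDown) {
      this.totalCount = totalCount;
      this.error = error;
      this.solrDown = solrDown;
    }
  }

  static Result search(Backend backend, String query) {
    try {
      return new Result(backend.countMatches(query), null, false);
    } catch (QuerySyntaxException e) {
      // Bad user input: surface the message, but the server itself is fine.
      return new Result(0, "Trouble parsing query? " + e.getMessage(), false);
    } catch (BackendUnavailableException e) {
      // Only a connectivity failure should trigger "Solr down mode".
      return new Result(0, e.getMessage(), true);
    }
  }

  public static void main(String[] args) {
    // Fake backend that rejects the query from this ticket.
    Backend fake = query -> {
      if (query.equals(":")) {
        throw new QuerySyntaxException("Cannot parse ':'");
      }
      return 1;
    };
    Result r = search(fake, ":");
    System.out.println("solrDown=" + r.solrDown + " error=" + r.error);
  }
}
```

With the fake backend above, searching for `:` yields a result whose `error` is set and whose `solrDown` flag stays false, which matches the shape of the API response: zero items plus an error string.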

@eaquigley
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-05-07T19:00:11Z


As of this commit...

c447bf2 show solr errors (if any) on debug page #3632

... you should be able to see...

"errorFromSolr: Trouble parsing query? org.apache.solr.search.SyntaxError: Cannot parse ':': Encountered " ":" ": "" at line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... <NUMBER> ... <TERM> ... "*" ..."

... when you turn on debug=true and give it an unparseable query (i.e. "q=:"), like this:

http://dvn-build.hmdc.harvard.edu/dataverse.xhtml?q=%3A&debug=true
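Note that the `:` in that URL is percent-encoded as `%3A`. For anyone scripting tests against the debug page, here is a small sketch of building such a URL. The class and method names (`DebugUrlSketch`, `debugUrl`) are hypothetical; `URLEncoder` is the standard JDK class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building debug URLs like the one above.
class DebugUrlSketch {

  // Percent-encodes the query and appends debug=true.
  static String debugUrl(String base, String query) {
    try {
      return base + "?q=" + URLEncoder.encode(query, "UTF-8") + "&debug=true";
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the JVM, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(debugUrl("http://dvn-build.hmdc.harvard.edu/dataverse.xhtml", ":"));
  }
}
```

This encodes only the URL; it is separate from Solr query escaping, so the query still reaches Solr as a bare `:` and remains unparseable.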

@eaquigley
Contributor Author


Original Redmine Comment
Author Name: Elda Sotiri (@esotiri)
Original Date: 2014-05-09T01:37:01Z


issue resolved

pdurbin added a commit that referenced this issue Dec 3, 2014
In #1042 we want to say Dataverse is down when Solr is down. Now a
message is shown, which needs cleanup.

For #218 we punted and gave no information for searches on :, /, }, ], )

Now we show an error (or unknown field searches like "foo:bar").