better query support for exact matching #60

Open
bnewbold opened this issue Mar 9, 2021 · 0 comments

Partially a query parsing issue, but likely to also be an indexing issue.

Especially in technical fields, or when doing digital humanities-style queries, many valid queries include metacharacters. It is not clear how to represent many of these in the Lucene query syntax, or how to escape out to a simpler syntax. It is also not clear how many of them the query engine can handle at all. Some examples:

  • A* search in computer science (the "A star" algorithm)
  • Identifiers used in biomedicine: users could try to query by prefix, suffix, or sub-pattern. Sometimes dashes, periods, spaces, or other characters have meaning
  • Math: even simple things like searching for exponentiation, or symbols like β (\beta in LaTeX), which appear in titles, abstracts, body text, citations, etc. Do we flatten these down (in a Unicode-aware way) to, e.g., "b" for indexing? Expand to "beta"? Other issues: function syntax, arrows, primes, dots, set inclusion, real numbers ("R"), natural numbers ("N"), dot product, etc.
  • Chemical formulas: arrows and other notation
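
To make the two halves of the problem concrete, here is a rough Python sketch, not a proposal for the actual implementation. The helper names `escape_lucene` and `expand_greek` are hypothetical. The first backslash-escapes the characters the classic Lucene query parser treats as syntax, so that something like `A*` reaches the engine as a literal term. The second expands Greek letters to their spelled-out names using the Unicode character database, as one possible answer to the "expand to beta?" question. Escaping only addresses parsing: if the analyzer drops `*` or `-` at index time, the escaped query still will not match, which is the indexing side of this issue.

```python
import unicodedata

# Characters treated as syntax by the classic Lucene query parser
# (assumed list, each one escaped individually with a backslash).
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Backslash-escape query metacharacters so the term is taken literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


def expand_greek(text: str) -> str:
    """Replace Greek letters with their spelled-out names (β becomes "beta").

    Sketch only: takes the first word after "LETTER" in the Unicode name,
    so accented forms lose their accent and special cases such as final
    sigma are not handled sensibly.
    """
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")):
            out.append(name.split(" LETTER ", 1)[1].split()[0].lower())
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_lucene("A*"))
    print(escape_lucene("IL-2"))
    print(expand_greek("β-catenin"))
```

Whether expansion like this should happen at index time, query time, or both (so that "β", "beta", and "\beta" all match each other) is still an open question.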