Query in Information Retrieval (IR)
A query is composed of keywords, and the system retrieves the documents that contain the keywords searched for.
+Single word query
+Context query: words searched for in a given context, that is, near other words, as a phrase or proximity query
+Boolean query: uses Boolean operators (AND, OR, BUT) with a syntax composed of atoms
+Natural language: an enumeration of words and context queries that are of interest to the user.
The concept of a pattern (a set of syntactic features that must be found in a text segment) allows retrieving pieces of text that have some property.
Types of patterns include words, prefixes, suffixes, substrings, ranges, and regular expressions.
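As a rough illustration of the pattern types listed above, the sketch below matches each kind of pattern against a small sorted vocabulary. The vocabulary and all function names are hypothetical and chosen only for this example; a real system would use index structures rather than linear scans.

```python
import re
from bisect import bisect_left, bisect_right

# Hypothetical toy vocabulary, kept sorted so the range lookup can use bisect.
VOCAB = sorted(["bar", "barrel", "barrell", "bear", "car", "card", "care", "scar"])

def match_word(word):
    """Word pattern: the term must equal the word exactly."""
    return [t for t in VOCAB if t == word]

def match_prefix(prefix):
    """Prefix pattern: the term must start with the given string."""
    return [t for t in VOCAB if t.startswith(prefix)]

def match_suffix(suffix):
    """Suffix pattern: the term must end with the given string."""
    return [t for t in VOCAB if t.endswith(suffix)]

def match_substring(sub):
    """Substring pattern: the given string may appear anywhere in the term."""
    return [t for t in VOCAB if sub in t]

def match_range(low, high):
    """Range pattern: terms between low and high in lexicographic order, inclusive."""
    return VOCAB[bisect_left(VOCAB, low):bisect_right(VOCAB, high)]

def match_regex(pattern):
    """Regular expression pattern: the whole term must match the expression."""
    rx = re.compile(pattern)
    return [t for t in VOCAB if rx.fullmatch(t)]

if __name__ == "__main__":
    print(match_prefix("car"))
    print(match_suffix("ar"))
    print(match_range("bat", "carp"))
    print(match_regex(r"barrell?"))
```

Every function except `match_range` scans the whole vocabulary, which is acceptable for a sketch but hints at why pattern queries can be costly on large vocabularies.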
- Boolean queries:
Exact match
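A minimal sketch of exact-match Boolean retrieval, assuming a small in-memory inverted index that maps each term to a set of document ids. The collection, the function names, and the reading of BUT as "and not" are assumptions made for this example.

```python
# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "information retrieval with boolean queries",
    2: "boolean algebra and logic",
    3: "retrieval of documents by keyword",
}

def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

def postings(term):
    """Return the documents that contain the term exactly (empty set if none)."""
    return INDEX.get(term, set())

def op_and(a, b):
    """Documents present in both operands."""
    return a & b

def op_or(a, b):
    """Documents present in either operand."""
    return a | b

def op_but(a, b):
    """Documents in a that are not in b (BUT read as 'and not', an assumption)."""
    return a - b

if __name__ == "__main__":
    print(op_and(postings("boolean"), postings("retrieval")))
    print(op_but(postings("retrieval"), postings("boolean")))
```

A document either satisfies the expression or it does not, so the result is an unranked set, which is what "exact match" refers to here.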
- Wildcard queries:
Useful when a word has several accepted spellings; the query is written as a pattern such as ar or bar*l.
The pattern is mapped to the matching term(s) in the vocabulary, which can make query execution expensive.
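The mapping from a wildcard pattern to vocabulary terms can be sketched as follows. This version simply scans every term with Python's `fnmatch.fnmatchcase`, which makes the cost visible: one wildcard can expand into many terms, and the postings of each term then have to be merged. The postings data and function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical postings: term -> set of ids of documents containing it.
POSTINGS = {
    "barrel": {1, 4},
    "barrell": {2},
    "bagel": {3},
    "bar": {4, 5},
    "car": {6},
}

def expand_wildcard(pattern, vocabulary):
    """Return every vocabulary term matching the pattern (linear scan)."""
    return sorted(t for t in vocabulary if fnmatchcase(t, pattern))

def wildcard_query(pattern, postings):
    """Expand the pattern, then take the union of the matching terms' postings."""
    terms = expand_wildcard(pattern, postings)
    docs = set()
    for term in terms:
        docs |= postings[term]
    return terms, docs

if __name__ == "__main__":
    print(wildcard_query("bar*l", POSTINGS))
    print(wildcard_query("*ar", POSTINGS))
```

The linear scan stands in for dedicated structures such as permuterm or k-gram indexes, which reduce the lookup cost but do not remove the cost of merging many postings lists.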
- Phrase queries:
Handled in a similar way to wildcard queries. When a document is represented as a vector, the relative order of its terms is lost,
so the vector alone cannot answer a phrase query.
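To make the loss of term order concrete, the sketch below compares a bag-of-words representation with a positional index. Two documents containing the same words in a different order get identical bags, yet only one of them matches the phrase. The positional index is one common way to support phrase queries; the documents and function names are assumptions for this example.

```python
from collections import Counter

# Hypothetical documents with the same words in a different order.
DOCS = {
    1: "fast information retrieval",
    2: "retrieval information fast",
}

def bag_of_words(text):
    """Term-frequency vector; word order is discarded."""
    return Counter(text.split())

def positional_index(docs):
    """Build term -> {doc_id: [positions]} so that word order is preserved."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index

def phrase_query(phrase, index):
    """Return ids of documents where the phrase terms occur consecutively."""
    terms = phrase.split()
    if not terms:
        return set()
    # Candidate documents must contain every term of the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            # The k-th phrase term must sit exactly k positions after the first.
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

if __name__ == "__main__":
    print(bag_of_words(DOCS[1]) == bag_of_words(DOCS[2]))
    print(phrase_query("information retrieval", positional_index(DOCS)))
```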
- Cluster pruning
- Tiered indexes
- Query-term proximity
- Designing parsing and scoring functions
- Vector space scoring and query operator interaction for free text queries
Reference:
Manning, C.D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, England: Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html