-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structure of result for completions #82
Comments
To be honest i'm not completely sure how completion is working right now. I remember that we discussed this topic last year and that we changed some things in comparison to Holumbus, but i don't feel like it does what it should do. I guess we have to define first, what kind of completions we want to support. I think in the end it comes down to this two options:
So this is a combination of the two options, but it really comes down to the same question. If the server does not know which term the user is currently typing, we would need the completions for all terms. So...
I could imagine a function that integrates the completion results into the query term (just from the top of my head - no idea if this is really possible). So that the client puts in the queryterm and the completion-results and gets back a list of new queryterms, each with a different completion applied. He could then just show this list within the auto completion box. |
So concerning the question: I think, if we could make option 2 work, we could ignore option 1 and complex queries in a first release. Then a single list of suggestions would be fine. But long term, i think the current format is what we really need. |
I just discussed this completion topic with Ulf and we came up with an idea of how this could be improved. This is not directly related to the initial proposal of this issue, but shows a way to improve the completion by building on top the proposed solution. So in an ideal case we would support the cases 1 and 2 explained earlier. This way we are able to provide completions for "word" queries and "phrase" queries. The problem really is, that applications using a search-engine like hunt or elasticsearch generally needs more complex queries. By that i mean, queries that are generated out of an UI containing select-boxes and such things. So lets consider a common search application. There is a text field like in our demo server, but there are also select-boxes to filter for "color", "car-type", "horsepower" and other things. So the text field is really just a sub-term of big complex generated query. The completion should only be applied to this sub-term. But the provided suggestions should take the whole complex query into consideration, so that the *auto-completed" query always yields results. How could we achieve this The idea is to introduce a new constructor to the query language, like this data Query
= QWord TextSearchType Text -- ^ Word search.
| QPhrase TextSearchType Text -- ^ Phrase search.
| QContext [Context] Query -- ^ Restrict a query to a list of contexts.
| QBinary BinOp Query Query -- ^ Combine two queries with a binary operation.
| QBoost Weight Query -- ^ Weight for query.
| QRange Text Text -- ^ Range query.
| QCompletionTerm Query
deriving (Eq, Show) So this new CompletionTerm is only used in the generation of completions, but is completely ignored in the general query processing. It can be viewed as an annotation for a sub-term. With this addition the client can pass a complex query to the server and define, which sub-term the completion should be applied to. This annotated sub-term would be restricted to either "Word" or "Phrase" queries. (This could be enforced by the parser or perhaps by the type directly). How would the completions be calculated? After the completions are calculated, each found suggestion needs to be inserted into the original complex query (by replacing the annotated sub-term with the suggestion-term). This should be fairly easy, because we are only dealing with "Word" and "Phrase" terms. Then this query needs to be run through the general query processor - to see if there are results for this particular query. I think this could be implemented quite easily, but would improve the completion significantly. This is a lot of stuff to go through here, but i'd love to hear your opinions on this. |
In the last 2 weeks I've worked a bit on a refactoring of the query evaluation with the goal to separate query search, which computes a set of scored docids, from query completion with a set of (scored) words as result. The key step to reach this is to write two interpreters for these tasks. Then it showes up, there's no need to extend the query language, all info necessary is already there. The new algorithm computes a list of suggestions for the rightmost primitive search node (a word or prefix search). The usual way, a user constructs a query is extending the query to the right side, The algorithm works like this: for an OR query, the left operand is ignored. The same applies Context and boost queries are simple, just the sub-queries are evaluated. For a phrase with a list of words the raw results of all sub-queries are combined to a new raw result by intersecting the occurrence sets (intersection with docidis, position intersection with an offset of 1). This result of a phrase can easily be aggregated to a scored list of words, e.g. by counting the positions and summing up all counts over all docids. First test show up to be effective, but I've not yet done a test with the Hayoo index. I expect both algorithms to be rather efficient (compared to the existing), due to the rather early aggregation of the results of sub-queries. The partial results are aggregagted in both cases as early as possible. At one point I've indeed extended the query language. For a simple word search (not a prefix search) there is a similar case to For a real list of words there is an extra case This separation leads to more modularity in the query interpreters, and by the way, to more flexibility. Now it's easy to add search ops to search not just for a sequence of words but to look into e.g. the next 2,3,4,... words or to look into the a range around a word. |
Okay, just to be sure i got this right. With this approach the completion for phrase queries works like option2 by default. What i defined as option1 could be achievied by transforming the query (by a client library perhaps). For example: Search: rud völl. If the user is currently typing völl the completions work out of the box (for the word völl). If the user is currently typing rud, the client could just switch the words and ask completions for the term völl rud. So the resulting query would be völl AND rud instead of rud AND völl. Now he gets completions for word rud. So basically, what i called annotation of the completion term is done not by annotation here, but by defaulting the completion term to the rightmost term of the query. So for complex queries, it is the same. Send the "phrase" or "word" query for completion as the rightmost term and you get suggestions. What i could not figure out, is, if this approach already takes the whole complex query into consideration? Nevertheless, this sounds really cool and will be way bedder than what we got now. |
I've just pushed the refactored scoring algorithm and the completion algorithm to github. The completion alg. computes suggestions for the very last word or prefix, but it takes into account the other sub-queries to the left. The more general algorithm, that computes completions for all primitive queries must have a more general result structure, e.g. something like the I will test the algorithm with the Hayoo index to check whether the scoring works for larger sets of words than with the simple test cases used during development. A tuning of the scoring parameters may be necessary. Currently there runs an indexing process for complete hackage on my local machine. |
Cool! How does it work for phrase queries? This request ... Exec: Completion {icPrefixCR = QPhrase QNoCase "thomas strunz", icMaxCR = 20}
execCompletion: QPhrase QNoCase "thomas strunz" ... gives me {"msg":[["Strunz",8]],"code":0} but: This request ... Exec: Completion {icPrefixCR = QPhrase QNoCase "thomas strun", icMaxCR = 20}
execCompletion: QPhrase QNoCase "thomas strun" ... gives me an empty suggestion list. {"msg":[],"code":0} I'm trying to integrate the new completions into the user interface. How should the phrase queries look like to produce suggestions? |
In the given query we have to change the exact word search to a prefix search. The phrase search QSeq Phrase [QFullWord QNoCase "thomas", QFullWord QNoCase "strunz"]
QSeq Phrase [QFullWord QNoCase "thomas", QWord QNoCase "strun"] to get all suggestions for QSeq Phrase [QWord QNoCase "thom", QWord QNoCase "strun"]
The easiest way to build these queries is using the qNext $ map (setNoCaseSearch . qWord) ["thomas", "strun"] And we can easily extend that interface by more 'smart' constructors, to make the query generation simpler, e.g. by a |
I've pushed a modified ClientInterface to github to make the query from the last comment (and others) qNext $ map (setNoCaseSearch . qWord) ["thomas", "strun"] a bit simpler qNext $ map qWordNoCase ["thomas", "strun"] |
Cool. I think we should stick to this variant: QSeq Phrase [QFullWord QNoCase "thomas", QWord QNoCase "strun"] This way we get all completions for strun that really lead to hits. If we would use For example: "thoma strun" would result in completion strunz, but the "completed" term "thoma strunz" would not find any document. Furthermore i think, this transformation has to happen in the query processor. This way it works not only for queries generated with Haskell, but also for queries generated from There is a function I tried to implement that in my latest commit. I'm not a 100% sure if it's the correct place to apply the new transformation, but it looks good to me. |
I just adjusted the server frontend. There are still some flaws in the javascript (works best in Firefox), but otherwise it looks really really nice. As long as the parentheses and quotes are closed we even get displayable suggestions for phrase and complex queries like: wer:(Rudi Völ). This is really cool! I think, there is still some room for improvement for the phrase queries. Think of a phrase search like "Wie so oft". This would not return a lot of suggestions. Maybe we could provide completions as phrases as well like oft liegt auch hier die Mitte in der Wahrheit. Just an idea. |
A generalization of the completion algorithm for queries with a phrase as rightmost sub-query (or just a phrase query) should not be a big deal. It's rather easy to get all word sequences, not only the choices for the last word. The only point is to extend the |
In discussions with Sebastian about the completions, we both had the impression the existing structure of the completion is a bit messy to be processed by a client.
A nested list of words
[(Word,[Word])]
is in most cases a bit of an overkill, usually only the last word of a list of words (a phrase) query is of interest, so a[Word]
or a[(Word,Score)]
for completion of the last word would be sufficient (This only would concern the result, when the query is evaluated all the other words would all be taken into account).When working on the scoring algorithms for the completions, I've one rather simple algorithm (processing arbitrary queries), that computes a scored list of words suited for completions. In the local version, I've added a new
CmdResult
variantResSuggestions
that delivers a scored (and sorted by score) list of words. TheResCompletion
is still there, but not used.Is this simplified version o.k.? Is this a version sufficient with the 1. release or do we need the more advanced version too? Any suggestions?
The text was updated successfully, but these errors were encountered: