whitespace at end of search string yields results; lack of it yields no results #3

kshahkshah · 2017-04-02T15:26:38Z

Very simply, when I search for "My Search String" and there is a perfect match in the corpus provided, I get no results.

However, with a single trailing space, "My Search String ", I do get results, though the top result is similar to the exact match rather than the exact match itself. I'm going to dive into this further. Note that the actual string in question is 10 characters long, and I have not been able to create a reduced example.

Elements from the actual corpus of products being searched look like:

My Search String
My Search Attribute1 String
My Search Attribute2 String
My Search Attribute3 String

Again, when searching for "My Search String " I get "My Search Attribute1 String" as my top result.

Also, again, I tried building a reduced example of this behaviour along these contrived lines but failed to do so, so it'll merit additional investigation.

Though perhaps you can suggest ways I can debug?

kshahkshah · 2017-04-02T15:27:02Z

I'm also happy to provide the actual corpus privately.

brianhempel · 2017-04-03T14:24:45Z

This line of code is probably the issue: https://github.com/brianhempel/fuzzy_tools/blob/master/lib/fuzzy_tools/tf_idf_index.rb#L53

The first round of pruning for search only matches on not-low-value tokens. It's not a great heuristic. Not sure how to replace it.

kshahkshah · 2017-04-03T14:29:48Z

Ahh, yes, I suspected it might be because it is such a common phrase that it gets removed as useless, but that tends not to model a master X and variants of X well.

Let me see if I can tune that parameter. I wonder how other implementations accomplish this as well.

Thanks again for getting back so quickly.

brianhempel · 2017-04-03T14:32:18Z

Yeah, it's kind of like "stop words" in other implementations except that there's no hard-coded list of stop words. Instead, the lowest-value 1/16th of the tokens are not used for finding the initial candidate documents.
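To make the effect concrete, here is a minimal Ruby sketch of that kind of heuristic. It is not the actual fuzzy_tools code: the whitespace tokenizer, the helper names (`tokenize`, `build_index`, `low_value_tokens`, `candidate_documents`), the use of document frequency as the measure of "low value", and rounding the 1/16th cutoff up are all assumptions made for illustration.

```ruby
require 'set'

# Hypothetical tokenizer: lowercase, split on whitespace.
# The library's real tokenizer may work differently.
def tokenize(text)
  text.downcase.split
end

# Inverted index: token => set of document positions containing it.
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = Set.new }
  documents.each_with_index do |doc, i|
    tokenize(doc).each { |token| index[token] << i }
  end
  index
end

# Assumption: "low value" means "appears in the most documents".
# The given fraction of distinct tokens (rounded up) is marked low value.
def low_value_tokens(index, fraction = 1.0 / 16)
  cutoff = (index.size * fraction).ceil
  index.keys.sort_by { |token| -index[token].size }.first(cutoff).to_set
end

# First-round pruning: only tokens that are not low value can
# nominate candidate documents.
def candidate_documents(query, index, low_value)
  tokenize(query)
    .reject { |token| low_value.include?(token) }
    .flat_map { |token| index.fetch(token, Set.new).to_a }
    .uniq
end

documents = ["Widget", "Widget Red", "Widget Blue", "Widget Green"]
index     = build_index(documents)
low_value = low_value_tokens(index)

p candidate_documents("Widget", index, low_value)     # => []
p candidate_documents("Widget Red", index, low_value) # => [1]
```

In this toy corpus "widget" appears in every document, so it is the one token marked low value. A query made up only of that token nominates no candidates, even though "Widget" is an exact match, which is the same shape as the master-and-variants problem described above.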
