WIP HTML Reporter: Fuzzy search using fuse #1442

ventuno · 2020-05-19T03:02:47Z

Sample implementation of: #1438 (Fuse.js-based).
Replaces #1441.

trentmwillis · 2020-05-21T00:54:49Z

RE your comment from the other PR. I do agree the results aren't as good as the fuzzysearch implementation. Would you mind tweaking the settings a bit to see if you can tune it to get better results?

I'm more partial to this approach because I like the API a bit more and like that the package seems to still be maintained and developed (unlike fuzzysearch). That said, ultimately, the user experience is more important, so if we can't get good results with this library, we'll go the other route.

ventuno · 2020-05-21T04:26:23Z

Thanks @trentmwillis. I agree that code-wise Fuse.js is the obvious choice. It's definitely a very active project and calling search without having to pass the targets each time is a much cleaner API. Moreover, in order to import fuzzysort we need to rely on an older version rollup-plugin-commonjs (but updating the way qunit.js is built is a different conversation).

I did a few tests, but tweaking the configuration parameters doesn't seem to be very helpful, let me take you briefly through my thought process. There are four parameters:

Key Weight: this is useless in our case as we only search based on one key (module name);
location (default: 0)/distance (default: 100)/threshold (default: 0.6): these are used to bind the search in the string between location and distance*threshold. This means that by default a match will only trigger a result if it's in the range [0, 60] of the search string.
As none of the modules in our example is longer than 60 characters I don't believe these values are affecting at all our search results (and nothing in tweaking these values will improve results).

In general it looks like Fuse.js will favor results which are closer to the beginning of string, that's probably why in my tests "before" or "before (skip)" score better than "Module with Promise-aware beforeEach".

I hope this helps, let me know if you have other ideas, more than happy to brainstorm and try out more things.

ventuno · 2020-05-29T01:20:35Z

@trentmwillis any thoughts on the above?

trentmwillis · 2020-06-01T05:00:55Z

Hey @ventuno, sorry been busy recently. After looking into the problem a bit more, I think what we'd ideally want is the ability to get a separate "fuzziness" score from Fuse.js, which may be possible eventually but not right now: krisk/Fuse#397. If we had that, we could probably provide a better set of results and sorting, since we don't really care about where in the title a match is found (in other words, we don't care about distance really).

Given the current API, the only potential adjustments I think we could try would be to:

Try to filter out low score matches (i.e., the results for <script> that seem way off)
Use the includeMatches option to get the actual matched term and use that to group/sort the results (i.e., make the results for beforeEach seem less randomly ordered)

Let me know if you think either of those are worth pursuing.

ventuno · 2020-06-01T07:02:29Z

Thanks @trentmwillis!

I tried filtering the results before, but unlike fuzzysort the documentation doesn't provide a good reference value to filter out bad results, so it was pretty difficult to identify a value that wouldn't remove relevant results. In general, I don't care how many results I get as long as the most relevant ones are on top (think of "page 2" in a Google search);
is actually an interesting idea. I played around with it a bit, see my thoughts below.

First, let's take a look at the result object, so we're on the same page (I redacted the content of item as it's just the module object):

{
  "item": { ... },
  "refIndex": 40,
  "matches": [
    {
      "indices": [
        [0, 5],
        [7, 16]
      ],
      "value": "before/beforeEach/afterEach/after",
      "key": "name"
}]}

Let's focus on matches. It's an array of objects, each object represents the matches for a given key, in our case as we only have one key (the module name) so we only have one item in the array of matches. The matches object has an array of indices which contains "tuples" indicating the [<start_index>, <end_index>] of a match. In the above example the library matched: "before/beforeEach/afterEach/after".

My first idea was to compare two results by finding the one with longest unique match, let's call this computeScore1:

function computeScore1(result) {
  return Math.max(...result.matches[0].indices.map(m => m[1] - m[0]));
}

This works well if we are searching entire words (e.g.: "beforeEach"), but doesn't work as well if compound words (e.g.: "promiseawarebeforeEach"), because we will still give more weight to the longest continuous match, so I introduced computeScore2 which by adding the overall match length improves searches for compound words:

function computeScore2(result) {
  return result.matches[0].indices.map(m => m[1] - m[0]).reduce((acc, curr) => acc+curr, 0);
}

Of course this approach has its own issues as many 1-character matches could overweight an overall smaller but continuous match. We could potentially "weight" each "tuple" so that longest ones are more relevant, but this gets complicated very quickly.

I used the above functions in filterModules (replace computeScore<N> with computeScore1 or computeScore2):

function filterModules( searchText ) {
  if ( searchText === "" ) {
    return config.modules;
  }
  return fuse.search( searchText )
    .sort( ( m1, m2 ) => {
      return computeScore<N>(m2) - computeScore<N>(m1);
    })
    .map(result => result.item);
}

And collected the results (see screenshots below):

Search "beforeEach" with `computeScore1`

Search "promiseawarebeforeEach" with `computeScore1`

Search "beforeEach" with `computeScore2`

Search "promiseawarebeforeEach" with `computeScore2`

Search "beforeEach" with `fuzzysort`

Search "promiseawarebeforeEach" with `fuzzysort`

Search "promiseawarebeforeEach" with "vanilla" `Fuse.js`

In general, as much as I like the idea and seems to yield promising results, I'm not sure if it's a step in the right direction: ideally I would like to pick a library that works out of the box. I'm also not sure if custom sorting of results will make sense with future versions of Fuse.js.
Thoughts?

trentmwillis · 2020-06-04T00:22:23Z

Thanks for the detailed write-up. I agree that this does seem to yield better results but the added complexity is undesirable.

After thinking about it a bit more, I think we should go with the fuzzysearch. Even though it hasn't had a new release in sometime, it is likely not something we'll need to update (assuming it gives good results). Agree?

ventuno · 2020-06-04T00:39:46Z

Thanks @trentmwillis I agree with you and it's probably better to go with fuzzysort instead (#1440). I use qunit at work so I'll be heavily using this and should know if anything is off.
This doesn't stop us from keeping an eye on Fuse.js (and krisk/Fuse#397) and switch to Fuse.js later as this PR will serve as a template for future testing. Closing for now.

krisk · 2020-06-13T00:30:18Z

Happened to see this in my mentions. So, hope you don't mind my jumping in here 😄

In general it looks like Fuse.js will favor results which are closer to the beginning of string, that's probably why in my tests "before" or "before (skip)" score better than "Module with Promise-aware beforeEach".

Yes. That is correct.

I tried filtering the results before, but unlike fuzzysort the documentation doesn't provide a good reference value to filter out bad results, so it was pretty difficult to identify a value that wouldn't remove relevant results. In general, I don't care how many results I get as long as the most relevant ones are on top (think of "page 2" in a Google search);

Did you try setting includeScore? This would add the relevance score in each search result, which you could then use to further refine the results.

Hey @ventuno, sorry been busy recently. After looking into the problem a bit more, I think what we'd ideally want is the ability to get a separate "fuzziness" score from Fuse.js, which may be possible eventually but not right now: krisk/Fuse#397. If we had that, we could probably provide a better set of results and sorting, since we don't really care about where in the title a match is found (in other words, we don't care about distance really).

A little more complicated, but see here for details on Fuse.js scoring mechanism. Nonetheless, I do actually plan to add an option to Fuse.js to ignore distance. It seems to have become a desired feature.

cc: @ventuno, @trentmwillis

ventuno · 2020-06-13T00:38:57Z

Thanks @krisk for commenting on this.

Did you try setting includeScore? This would add the relevance score in each search result, which you could then use to further refine the results.

I tried, but I had issues identifying a good "threshold" to use to filter out bad results.

Nonetheless, I do actually plan to add an option to Fuse.js to ignore distance. It seems to have become a desired feature.

We already merged the fuzzysort version, but feel free to ping me directly or this conversation and I can give Fuse.js another try.

krisk · 2020-06-13T01:39:42Z

@ventuno you can use fuse.js@6.1.0-alpha.0. Set the option ignoreLocation to true should produce the desired behavior.

const list = [
  'beforeEach',
  'async beforeEach test',
  'assert.async in beforeEach',
  'Module with Promise-aware beforeEach',
  'Promise-aware return values without beforeEach/afterEach',
  'before',
  'before (skip)'
]

When searching "beforeEach". Default behavior (ignoreLocation is false):

[
  { item: 'beforeEach', refIndex: 0, score: 0 },
  {
    item: 'async beforeEach test',
    refIndex: 1,
    score: 0.1972392177586917
  },
  { item: 'before', refIndex: 5, score: 0.4 },
  {
    item: 'assert.async in beforeEach',
    refIndex: 2,
    score: 0.4493775633055149
  },
  { item: 'before (skip)', refIndex: 6, score: 0.5231863610884103 },
  {
    item: 'Module with Promise-aware beforeEach',
    refIndex: 3,
    score: 0.5916079783099616
  },
  {
    item: 'Promise-aware return values without beforeEach/afterEach',
    refIndex: 4,
    score: 0.699819425905295
  }
]

When ignoreLocation is true:

[
  { item: 'beforeEach', refIndex: 0, score: 0 },
  {
    item: 'async beforeEach test',
    refIndex: 1,
    score: 0.01857804455091699
  },
  {
    item: 'assert.async in beforeEach',
    refIndex: 2,
    score: 0.01857804455091699
  },
  {
    item: 'Module with Promise-aware beforeEach',
    refIndex: 3,
    score: 0.03162277660168379
  },
  {
    item: 'Promise-aware return values without beforeEach/afterEach',
    refIndex: 4,
    score: 0.045603691595129614
  },
  { item: 'before', refIndex: 5, score: 0.4 },
  { item: 'before (skip)', refIndex: 6, score: 0.5231863610884103 }
]

Let me how that works out for you. If it's good, I'll accelerate to beta and latest.

[Tracking krisk/Fuse#438]

ventuno · 2020-06-13T22:31:12Z

Thanks @krisk. I replayed these changes with the recommended version and configuration in #1447 (search results screenshots are attached to the description).

The results for "beforeEach" are indeed a lot better. I'm still puzzled by the order for "promiseawarebeforeEach", see how "Promise-aware return values without beforeEach/afterEach" comes after "Module with Promise-aware afterEach" or "async beforeEach test".

Please let's continue this conversation in #1447, I wanted to tag you there, but GH won't let me (cc @trentmwillis).

HTML Reporter: Fuzzy search using fuse.

4e4c39c

This was referenced May 19, 2020

Support fuzzy search in HTML reporter's module filter #1438

Closed

WIP HTML Reporter: Fuzzy search using fuse #1441

Closed

ventuno closed this Jun 4, 2020

ventuno deleted the ftr-1438-fuse-3 branch June 4, 2020 00:43

ventuno mentioned this pull request Jun 13, 2020

WIP: HTML Reporter: Fuzzy search using fuse@6.1.0-alpha.1 #1447

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP HTML Reporter: Fuzzy search using fuse #1442

WIP HTML Reporter: Fuzzy search using fuse #1442

ventuno commented May 19, 2020 •

edited

Loading

trentmwillis commented May 21, 2020

ventuno commented May 21, 2020 •

edited

Loading

ventuno commented May 29, 2020

trentmwillis commented Jun 1, 2020

ventuno commented Jun 1, 2020

trentmwillis commented Jun 4, 2020

ventuno commented Jun 4, 2020

krisk commented Jun 13, 2020 •

edited

Loading

ventuno commented Jun 13, 2020

krisk commented Jun 13, 2020 •

edited

Loading

ventuno commented Jun 13, 2020

WIP HTML Reporter: Fuzzy search using fuse #1442

WIP HTML Reporter: Fuzzy search using fuse #1442

Conversation

ventuno commented May 19, 2020 • edited Loading

trentmwillis commented May 21, 2020

ventuno commented May 21, 2020 • edited Loading

ventuno commented May 29, 2020

trentmwillis commented Jun 1, 2020

ventuno commented Jun 1, 2020

Search "beforeEach" with computeScore1

Search "promiseawarebeforeEach" with computeScore1

Search "beforeEach" with computeScore2

Search "promiseawarebeforeEach" with computeScore2

Search "beforeEach" with fuzzysort

Search "promiseawarebeforeEach" with fuzzysort

Search "promiseawarebeforeEach" with "vanilla" Fuse.js

trentmwillis commented Jun 4, 2020

ventuno commented Jun 4, 2020

krisk commented Jun 13, 2020 • edited Loading

ventuno commented Jun 13, 2020

krisk commented Jun 13, 2020 • edited Loading

ventuno commented Jun 13, 2020

ventuno commented May 19, 2020 •

edited

Loading

ventuno commented May 21, 2020 •

edited

Loading

Search "beforeEach" with `computeScore1`

Search "promiseawarebeforeEach" with `computeScore1`

Search "beforeEach" with `computeScore2`

Search "promiseawarebeforeEach" with `computeScore2`

Search "beforeEach" with `fuzzysort`

Search "promiseawarebeforeEach" with `fuzzysort`

Search "promiseawarebeforeEach" with "vanilla" `Fuse.js`

krisk commented Jun 13, 2020 •

edited

Loading

krisk commented Jun 13, 2020 •

edited

Loading