WIP HTML Reporter: Fuzzy search using fuse #1442
Re: your comment from the other PR. I do agree the results aren't as good as the fuzzysearch implementation. Would you mind tweaking the settings a bit to see if you can tune it to get better results? I'm more partial to this approach because I like the API a bit more, and I like that the package seems to still be maintained and developed (unlike fuzzysearch). That said, the user experience is ultimately more important, so if we can't get good results with this library, we'll go the other route.
Thanks @trentmwillis. I agree that code-wise … I did a few tests, but tweaking the configuration parameters doesn't seem to help much. Let me briefly take you through my thought process. There are four parameters: …
In general it looks like … I hope this helps. Let me know if you have other ideas; I'm more than happy to brainstorm and try out more things.
@trentmwillis, any thoughts on the above?
Hey @ventuno, sorry, I've been busy recently. After looking into the problem a bit more, I think what we'd ideally want is the ability to get a separate "fuzziness" score from Fuse.js, which may be possible eventually but not right now: krisk/Fuse#397. If we had that, we could probably provide a better set of results and sorting, since we don't really care where in the title a match is found (in other words, we don't care about …). Given the current API, the only potential adjustments I think we could try would be to: …
Let me know if you think either of those is worth pursuing.
Thanks @trentmwillis!
First, let's take a look at the result object so we're on the same page (I redacted the content of …
Let's focus on … My first idea was to compare two results by finding the one with the longest unique match; let's call this computeScore1:

function computeScore1(result) {
  // Requires `includeMatches: true`. Fuse.js reports matches as inclusive
  // [start, end] pairs, so m[1] - m[0] is one less than each range's length.
  return Math.max(...result.matches[0].indices.map(m => m[1] - m[0]));
}

This works well when searching for entire words (e.g. "beforeEach"), but not as well for compound words (e.g. "promiseawarebeforeEach"), because it still gives the most weight to the single longest continuous match. So I introduced computeScore2, which sums over all matched ranges instead:

function computeScore2(result) {
  // Same inclusive-index convention as computeScore1.
  return result.matches[0].indices
    .map(m => m[1] - m[0])
    .reduce((acc, curr) => acc + curr, 0);
}

Of course, this approach has its own issue: many 1-character matches could outweigh an overall shorter but continuous match. We could "weight" each tuple so that longer ones count more, but this gets complicated very quickly. I used the above functions in filterModules:

function filterModules( searchText ) {
  if ( searchText === "" ) {
    return config.modules;
  }

  // `fuse` and `config` come from the surrounding reporter code.
  return fuse.search( searchText )
    // Highest score first; swap in computeScore2 to compare the two.
    .sort( ( m1, m2 ) => computeScore1( m2 ) - computeScore1( m1 ) )
    .map( result => result.item );
}

And collected the results (see screenshots below): Search "beforeEach" with …
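As a rough illustration of the "weight each tuple" idea mentioned above, here is a sketch that squares each matched run's length, so one long contiguous run beats many scattered single characters. The names computeScoreSum, computeScoreWeighted and rankItems, and the mock results, are hypothetical. The only Fuse.js assumption is the result shape `{ item, matches: [{ indices }] }` returned with `includeMatches: true`, where indices are inclusive [start, end] pairs; unlike computeScore2 above, this counts full inclusive lengths (end - start + 1).

```javascript
// Length of an inclusive [start, end] index pair.
const runLength = ([start, end]) => end - start + 1;

// Plain sum of run lengths (inclusive), for comparison.
function computeScoreSum(result) {
  return result.matches[0].indices
    .map(runLength)
    .reduce((acc, len) => acc + len, 0);
}

// Hypothetical weighted score: squaring each run length rewards long
// contiguous matches far more than many scattered 1-character matches.
function computeScoreWeighted(result) {
  return result.matches[0].indices
    .map(runLength)
    .reduce((acc, len) => acc + len * len, 0);
}

// Sort Fuse-style results by a score function (highest first) without
// mutating the input, and return just the items.
function rankItems(results, scoreFn) {
  return [...results]
    .sort((r1, r2) => scoreFn(r2) - scoreFn(r1))
    .map((result) => result.item);
}

// Mock results: eight scattered 1-character matches vs. one 6-character run.
const results = [
  {
    item: "scattered",
    matches: [{ indices: [[0, 0], [2, 2], [4, 4], [6, 6], [8, 8], [10, 10], [12, 12], [14, 14]] }],
  },
  { item: "contiguous", matches: [{ indices: [[3, 8]] }] },
];

console.log(rankItems(results, computeScoreSum));      // [ 'scattered', 'contiguous' ]
console.log(rankItems(results, computeScoreWeighted)); // [ 'contiguous', 'scattered' ]
```

The plain sum ranks the scattered result first (8 vs. 6), while squaring flips the order (8 vs. 36). Squaring is just one arbitrary choice of weight; it does not solve the compound-word case on its own.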
Thanks for the detailed write-up. I agree that this does seem to yield better results, but the added complexity is undesirable. After thinking about it a bit more, I think we should go with the …
Thanks @trentmwillis, I agree with you; it's probably better to go with …
I happened to see this in my mentions, so I hope you don't mind me jumping in here 😄
Yes. That is correct.
Did you try setting …
It's a little more complicated, but see here for details on the Fuse.js scoring mechanism. Nonetheless, I do plan to add an option to Fuse.js to ignore distance; it seems to have become a desired feature. cc: @ventuno, @trentmwillis
Thanks @krisk for commenting on this.
I tried, but I had trouble identifying a good "threshold" for filtering out bad results.
We already merged the …
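One way to experiment with a score cutoff is to ask Fuse.js for scores (`includeScore: true`) and post-filter the results. The sketch below hard-codes the seven scores from the second result list krisk posts later in this thread; filterByScore is a hypothetical helper, and 0.1 is an arbitrary example cutoff, not a recommended value. This post-filter is separate from Fuse.js's own `threshold` option, which controls when the matcher gives up on a candidate.

```javascript
// Scores copied from the second result list in this thread
// (lower is better; 0 is an exact match).
const scored = [
  { item: "beforeEach", score: 0 },
  { item: "async beforeEach test", score: 0.01857804455091699 },
  { item: "assert.async in beforeEach", score: 0.01857804455091699 },
  { item: "Module with Promise-aware beforeEach", score: 0.03162277660168379 },
  { item: "Promise-aware return values without beforeEach/afterEach", score: 0.045603691595129614 },
  { item: "before", score: 0.4 },
  { item: "before (skip)", score: 0.5231863610884103 },
];

// Hypothetical post-filter: keep results scoring at or below `maxScore`,
// best first (Array.prototype.sort is stable, so ties keep input order).
function filterByScore(results, maxScore) {
  return results
    .filter((r) => r.score <= maxScore)
    .sort((a, b) => a.score - b.score)
    .map((r) => r.item);
}

console.log(filterByScore(scored, 0.1)); // the five titles containing "beforeEach"
```

For this one query, every title containing "beforeEach" verbatim stays under 0.05, while "before" and "before (skip)" sit at 0.4 and above, so the gap is easy to see; whether any fixed cutoff holds up across other queries is untested here.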
@ventuno you can use …
const list = [
'beforeEach',
'async beforeEach test',
'assert.async in beforeEach',
'Module with Promise-aware beforeEach',
'Promise-aware return values without beforeEach/afterEach',
'before',
'before (skip)'
]
When searching …
[
{ item: 'beforeEach', refIndex: 0, score: 0 },
{
item: 'async beforeEach test',
refIndex: 1,
score: 0.1972392177586917
},
{ item: 'before', refIndex: 5, score: 0.4 },
{
item: 'assert.async in beforeEach',
refIndex: 2,
score: 0.4493775633055149
},
{ item: 'before (skip)', refIndex: 6, score: 0.5231863610884103 },
{
item: 'Module with Promise-aware beforeEach',
refIndex: 3,
score: 0.5916079783099616
},
{
item: 'Promise-aware return values without beforeEach/afterEach',
refIndex: 4,
score: 0.699819425905295
}
]
When …
[
{ item: 'beforeEach', refIndex: 0, score: 0 },
{
item: 'async beforeEach test',
refIndex: 1,
score: 0.01857804455091699
},
{
item: 'assert.async in beforeEach',
refIndex: 2,
score: 0.01857804455091699
},
{
item: 'Module with Promise-aware beforeEach',
refIndex: 3,
score: 0.03162277660168379
},
{
item: 'Promise-aware return values without beforeEach/afterEach',
refIndex: 4,
score: 0.045603691595129614
},
{ item: 'before', refIndex: 5, score: 0.4 },
{ item: 'before (skip)', refIndex: 6, score: 0.5231863610884103 }
]
Let me know how that works out for you. If it's good, I'll accelerate it to beta and latest. [Tracking krisk/Fuse#438]
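To see why the second list is ordered the way it is, here is a small re-creation of the arithmetic, assuming Fuse.js v6 combines scores as follows; it is an illustration, not Fuse.js's public API, and fieldNorm, matchScore and finalScore are hypothetical names. With location ignored, the match score is errors / pattern length (0 for an exact full match, floored at 0.001 otherwise), and the final score raises it to a field-length norm of 1/sqrt(word count), rounded to three decimals. The error count of 4 for "before" is inferred from its 0.4 score in the thread, not computed. When location is not ignored, Fuse.js also adds a proximity penalty (distance from `location` divided by `distance`), which this sketch leaves out.

```javascript
// Field-length norm: 1 / sqrt(number of space-separated tokens),
// rounded to `mantissa` decimal places.
function fieldNorm(text, mantissa = 3) {
  const numTokens = (text.match(/[^ ]+/g) || []).length || 1;
  const m = Math.pow(10, mantissa);
  return Math.round((1 / Math.sqrt(numTokens)) * m) / m;
}

// Location-independent match score: 0 for an exact full match, otherwise
// errors / pattern length, floored at 0.001. `errors` (the number of edits
// the fuzzy matcher needed) is supplied by hand rather than computed.
function matchScore(pattern, text, errors) {
  if (pattern === text) {
    return 0;
  }
  return Math.max(0.001, errors / pattern.length);
}

// Final score (lower is better): match score raised to the field norm.
// The norm is below 1 for multi-word titles, and x^n > x for x in (0, 1)
// and n < 1, so longer titles end up with higher (worse) scores.
function finalScore(pattern, text, errors) {
  return Math.pow(matchScore(pattern, text, errors), fieldNorm(text));
}

console.log(finalScore("beforeEach", "async beforeEach test", 0)); // ≈ 0.018578
console.log(finalScore("beforeEach", "before (skip)", 4));         // ≈ 0.523186
```

Under this model, titles that contain "beforeEach" verbatim all share the same 0.001 match score, so they end up ordered purely by word count (3, 3, 4 and 5 words in the list above) rather than by how close the rest of the title is to the query.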
Thanks @krisk. I replayed these changes with the recommended version and configuration in #1447 (search result screenshots are attached to the description). The results for "beforeEach" are indeed a lot better. I'm still puzzled by the order for "promiseawarebeforeEach": "Promise-aware return values without beforeEach/afterEach" comes after "Module with Promise-aware afterEach" and "async beforeEach test". Let's continue this conversation in #1447; I wanted to tag you there, but GitHub won't let me (cc @trentmwillis).
Sample implementation of #1438 (Fuse.js-based). Replaces #1441.