Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended search behaviour options #2894

Closed

Conversation

Mikk3lRo
Copy link
Contributor

@Mikk3lRo Mikk3lRo commented Oct 1, 2017

Summary

Extended chosen with several options to improve flexibility of searching significantly.

  • All changes were made in CoffeeScript files, not JavaScript files.
  • You used Grunt to build the JavaScript files and tested them locally.
  • You've updated both the jQuery and Prototype versions.
  • You haven't manually updated the version number in package.json.
  • If necessary, you've updated the documentation.

Backstory

A client of mine used his brand new system for about a month and then commented something like:
"Overall I love it! It's faster than the old system and it looks great. It's just a pity the search is so lousy."

When I dug a bit deeper it turned out he was referring to the Chosen selects I had implemented everywhere in an effort to make it easier to find stuff.

One issue for example, is that he has a bunch of Chosens where he can choose a customer based on their name. Now, some people may have the same first and last name, and some of them may have middle names. They are listed with other info that makes it possible to distinguish who's who if only they turn up in the search.

What my client means by "lousy" is that if he enters a persons first and last name, that person doesn't show up in the list if his middle name happens to also be in the database.

At this point I had to agree that "the search is lousy".

So I set out to improve it, and I believe what I have now is worth sharing.

What it is

New options:

search_term_delimiter
What: true, false, string or RegExp
Why: Massive improvement to the flexibility of searches - this was my primary goal
How: When set to true whitespaces are considered query separators, so a query with multiple words is split into terms that may be found in separate positions in the option. It also accepts a string or a RegExp, so if someone fx. wants to interpret spaces as literal, but have both asterix, question mark and comma work as "wildcards" that is possible (together with search_contains).
Defaults to false so it doesn't affect default behaviour or backwards compatibility - the entire query is interpreted as one literal term just the way it always has been unless this is set to something other than false.

search_word_boundary
What: string
Why: The default ^|\\b|\\s is remarkably bad at finding word boundaries in languages that use non-unicode characters. A word boundary is basically detected after every non-ascii character (fx. ü, å, ø and æ to mention just a few - but there are many*). I've looked into possibilities, and unfortunately there doesn't seem to be any way to get decent word-boundary detection for anything except ascii in javascripts RegExp implementation... without either using a third-party library or including some 4k+ characters in the string. Turning it into an option means that people can at least set something appropriate for their language and individual use case if they care. I know I've seen this discussed on GitHub, but can't find it at this moment.
How: Whatever this is set to will be used as the word boundary in the regexes when enable_split_word_search is true, but search_contains is false (which are the defaults!).
Defaults to ^|\\b|\\s so it doesn't affect default behaviour or backwards compatibility (I'm actually not 100% sure here - I think I saw that this was changed quite recently somewhere?).

search_in_order
What: Boolean
Why: More flexibility
How: When true terms will be matched in any order. Fx. Smith John would match John Smith. When false options will only match if the terms appear in exactly the same order they are entered in the query (but other characters may appear in between depending on other settings).
Defaults to true - because when you want to match individual terms, you most likely want to match them even in a different order, as long as the option contains all the entered terms. Does not affect default behaviour or backwards compatibility because the query is interpreted a single literal term.

search_highlight_full_match
What: Boolean
Why: Because in certain situations it just makes more sense.
How: When true the entire match will be highlighted - ie. from the first matched letter to the last.
Defaults to false - because in the vast majority of cases it is much clearer to highlight only the typed letters.

group_search (Not really new!)
What: Boolean
Why: I didn't actually add this, but it wasn't documented anywhere, so I've added it to the documentation and examples. The behaviour should be the same as previously.
How: When set to false chosen will not search inside group labels and show the entire group among the results.
Defaults to true, which means do search in group labels.

I've added a new section to the demo-page that makes it easier to see the result of all the different settings.

Compatibility and testing

I believe this correctly solves the problem that previous "fuzzy search" PR's tried to solve - but without the issues they each had.

As far as my own testing goes:

  1. It is 100% backwards compatible - if you don't set any of the new options it should behave exactly as it has previously in spite of differences in the underlying code.
  2. It correctly and consistently highlights matching terms (which - as far as I can tell - none of the previous "fuzzy search" PR's could)
  3. It respects all combinations of all pre-existing search related options (search_contains, case_sensitive_search etc.)
  4. It passes all tests.
  5. I have also added several new tests as well, to help minimize the risk of future commits inadvertently modifying the behaviour.
  6. It is fully implemented and working in both jquery and prototype version

One known issue that I haven't solved is that with search_in_order = false, search_contains = true and a query containing many very short terms that are present in a lot of options it will be rather slow. For instance typing in s a i n t h e l e n a a s c e n s will become pretty sluggish after the first 5-10 "terms". Add just one two-letter term and it dramatically improves. For now I have decided this is enough of an edge-case to not be worth fixing... I simply don't see it becoming relevant in any real-world query. I'm sure it can be fixed if anyone can come up with an example where it actually would be relevant.

Cleaning up the naming...?

I'll take the opportunity to also propose the following changes to pre-existing search-behaviour related options (which I have not included in this PR):

  • "group_search" should be "search_group_labels" - the prefix for consistency, and the "_labels" to better describe what it does.
  • "enable_split_word_search" should be "search_only_from_beginning" - which will do the opposite of what it used to - ie. it should default to false, and match only at the beginning of the label when set to true
  • "case_sensitive_search" should be "search_case_sensitive" - again for consistency

These changes can easily be implemented without breaking compatibility by preserving the current names as alternatives (maybe with a deprecation warning in the console) for the next x months / years / versions.

References

Previous PR's attempting to push flexibility into chosens master branch

#2599
#2811
#2028
#858
#867
#1037
#2028

A couple of open issues that would be fixed

#2752
#2831
... there are probably several others

New options are:

- search_term_delimiter: splits the query in "terms"
- search_word_boundary: allows setting the word boundary
- search_in_order: allows searching for terms in any order
- search_highlight_full_match: forces highlighting of full match

Apart from adding the above options, this commit contains a fairly major
rewrite of the whole search functionality.

It *should* be 100% backwards compatible, and respect all pre-existing
options.
Added tests mainly related to the new search behaviour related options.
Added descriptions of the new search-behaviour related options to the
options page, as well as a testing area on the examples-page where the
effects of different combinations of settings can be easily previewed.
@tjschuck
Copy link
Member

tjschuck commented Oct 2, 2017

@Mikk3lRo Thank you for the ambitious PR, but unfortunately I'm going to close this one as-is. Frankly, the scope of this PR is just too large, and can't really be reviewed or discussed easily.

Please consider breaking this up into smaller PRs — at the very least, one for each of the options that you added. That way, the maintainers can reasonably discuss and review them. Right now, this is just far too complex to consider without great difficulty.

Hope to see those new PRs soon — thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants