[BUG] Improve search #95

Closed
Brilator opened this issue Jan 27, 2021 · 6 comments
Labels
Type: Bug Something is not working, and it is confirmed by maintainers to be a bug.

Comments

@Brilator
Member

Describe the bug
I'm sometimes a bit confused by the search results.

To Reproduce
a) In "Annotation building block selection"

  1. Search for the treatment "red light"
  2. See the color "light red" as first result.

b) In "Advanced Search"

  1. Search by "Term name keywords": "red light exposure"
  2. The exact match "red light exposure" PECO:0007207 is the 9th hit

c) In "Advanced Search"

  1. Search by "Name must contain": "red light exposure"
  2. The exact match "red light exposure" PECO:0007207 is the 5th hit

Expected behavior
The search results should be displayed in the order:
exact match > same order of keywords (see (a)) > match as many keywords as possible > match any
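The requested order can be sketched as a tiered sort key. This is only an illustration of the expectation above, not the tool's actual ranking: the function names `rank_key` and `rank`, the whitespace tokenization, and the term list are all made up for the example.

```python
def rank_key(query: str, name: str) -> tuple:
    """Sort key implementing: exact > same keyword order > most keywords > any."""
    q = query.lower().split()
    n = name.lower().split()
    exact = q == n
    # True if the query keywords appear contiguously, in the same order, in the name.
    in_order = any(n[i:i + len(q)] == q for i in range(len(n) - len(q) + 1))
    matched = sum(1 for word in set(q) if word in n)
    # False sorts before True, so better tiers come first; ties go to shorter names.
    return (not exact, not in_order, -matched, len(n))


def rank(query: str, names: list[str]) -> list[str]:
    """Keep names sharing at least one keyword with the query, best match first."""
    keywords = set(query.lower().split())
    hits = [name for name in names if keywords & set(name.lower().split())]
    return sorted(hits, key=lambda name: rank_key(query, name))


# Hypothetical term names, for illustration only.
terms = ["light red", "red light exposure", "red light", "blue light", "green"]
print(rank("red light", terms))
# ['red light', 'red light exposure', 'light red', 'blue light']
```

With this key, "light red" still matches both keywords but falls behind any name that contains them in the queried order.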

@Brilator Brilator added the Type: Bug Something is not working, and it is confirmed by maintainers to be a bug. label Jan 27, 2021
@Freymaurer
Collaborator

Freymaurer commented Jan 27, 2021

Maybe I'll start by giving a bit more insight into how we order our search results:

We use a variant of a string-similarity algorithm called "Sørensen-Dice", which compares small sub-elements of the two text strings. The more sub-elements the two strings have in common relative to their combined length, the better the score.

This is why "red light" matches "light red" slightly better than "red light exposure".
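As a rough illustration of that scoring, here is a minimal sketch of a Sørensen-Dice score over character bigrams. The exact variant used in the tool is not described in this thread, so the bigram size, the lowercasing, and the function names `bigrams` and `dice` are assumptions.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count overlapping two-character substrings (assumed sub-element size)."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Sørensen-Dice score: 2 * shared bigrams / total bigrams in both strings."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    shared = sum((x & y).values())  # multiset intersection
    return 2 * shared / total


print(dice("red light", "light red"))           # 0.75
print(dice("red light", "red light exposure"))  # 0.64
```

Under these assumptions, "light red" shares 6 of its 8 bigrams with "red light", while "red light exposure" shares all 8 of the query's bigrams but is penalized for its 17 bigrams in total.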

But, as you suggested, we will consider tweaking this search a bit to increase the score for "exact matches".

Your criticism of the advanced term search is absolutely justified: I noticed that the Sørensen-Dice algorithm was not applied to this search. I added it and tested it with your example, and "red light exposure" is now hit number 1.

We will still discuss tweaking the Sørensen-Dice scoring; the change for the advanced term search will be live in version 0.2.1.

@Brilator
Member Author

Thanks. And my bad - I thought there was also an exact match "red light" (not "red light exposure").

@Freymaurer
Collaborator

No worries! Just to make sure, I checked the database and did not in fact find a term named "red light". Did you expect such a term?

@Brilator
Member Author

No no. I just thought I saw it earlier and was confused about "light red" > "red light".

@Freymaurer
Collaborator

@Brilator By the way, do you know that you can still reply on closed issues? I always close issues once I consider them solved, and I am afraid you'll think I am just shutting you up. You can always reply to a closed issue if you feel it has not been solved satisfactorily.

@Brilator
Member Author

Sure, thanks. I did just that earlier this morning on closed issues (reacting with a thumbs-up rather than commenting).

2 participants