Better searching #833

Open
Taeir opened this issue Aug 14, 2022 · 1 comment
Labels
area: backend Changes to server-side code complexity: unassessed Needs further developer investigation before complexity/feasibility can be determined. priority: medium type: analysis Potential changes that require some design/architecture/code analysis before we start implementing. type: change request New feature or request

Comments

@Taeir
Contributor

Taeir commented Aug 14, 2022

Is your feature request related to a problem? Please describe.
Currently, search is performed using a database match query. While this is a very fast way of searching, it does not provide great results. As it is currently set up, only exact word matches are found. This can be good or bad:

  • Bad: Searching for 'test' does not find a post with the word 'testing'
  • Good: Searching for 'hell' does not find a post with the word 'hello'
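The two cases above can be reproduced with a toy example. This is only a sketch: the suffix-stripping `naive_stem` helper is hypothetical and far cruder than the stemmers that real full-text engines ship with, but it shows how normalizing word forms lets 'test' match 'testing' while 'hell' still does not match 'hello'.

```ruby
# Toy illustration only: real analyzers use proper stemming algorithms.
SUFFIXES = %w[ing ed s].freeze

# Strip one common English suffix, keeping at least three characters.
def naive_stem(word)
  w = word.downcase
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length > s.length + 2 }
  suffix ? w[0...-suffix.length] : w
end

# Split text into lowercase alphanumeric tokens.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Exact word matching: the query must equal a token in the text.
def exact_match?(query, text)
  tokens(text).include?(query.downcase)
end

# Stemmed matching: compare the stems of the query and of each token.
def stemmed_match?(query, text)
  tokens(text).map { |t| naive_stem(t) }.include?(naive_stem(query))
end

post = 'Testing says hello'
puts exact_match?('test', post)   # exact words only: no match
puts stemmed_match?('test', post) # 'testing' stems to 'test': match
puts stemmed_match?('hell', post) # 'hello' is not reduced to 'hell': no match
```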

Additionally, it seems the current solution does not search in titles, even though those usually contain the most important information.

When users cannot find an existing question, they will look elsewhere or ask a new one. This may decrease user interaction with the platform and increase the likelihood of negative interactions (such as questions being closed as duplicates). The importance of proper search should not be underestimated.

Describe the solution you'd like
Searching effectively is a rather complex problem. Rather than reinvent the wheel, I suggest using an existing full-text search engine. These systems have an understanding of languages and can detect word matches even when other tenses or synonyms are used. Additionally, weights can be set so that matches in the title count for more, etc. These systems can also suggest better search terms (spelling corrections) to help the user further.
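As a rough illustration of what such a request could look like, here is a sketch that builds an Elasticsearch query body as a plain Ruby hash. The field names `title` and `body`, the boost factor of 3, the suggester name `spelling`, and the helper `build_search_body` are all assumptions made for this example, not the project's schema. Stemming itself would be configured in the index mapping (for example with a language analyzer) and is not shown here.

```ruby
require 'json'

# Build a query body that weights title matches more heavily than body
# matches, tolerates small typos, and asks for spelling suggestions.
# Field names and the default boost are assumptions for illustration.
def build_search_body(terms, title_boost: 3)
  {
    query: {
      multi_match: {
        query: terms,
        fields: ["title^#{title_boost}", 'body'], # "^3" boosts the title
        fuzziness: 'AUTO'
      }
    },
    suggest: {
      text: terms,
      spelling: { term: { field: 'body' } } # term suggester for corrections
    }
  }
end

puts JSON.pretty_generate(build_search_body('serach ranking'))
```

The resulting hash would be sent to the search endpoint by whichever client library is used; how that is wired up depends on the integration chosen.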

Describe alternatives you've considered
There are a few different alternatives available:

Elasticsearch
Pros:

  • Widely used
  • Free to use
  • Good Ruby and Rails integration
  • Easy to integrate with current logic

Mehs:

  • Partially open source (was fully open source, but seems to be moving away from that model slowly)

Cons:

  • Hard to configure correctly when working at a large scale

Apache Solr/Lucene
Pros:

  • Widely used
  • Free to use
  • Properly open source
  • Good Ruby and Rails integration

Cons:

  • Requires more work to integrate with current filtering logic (tag:..., votes:10, etc.)
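To make the integration concern concrete: whichever engine is used, qualifiers such as `tag:...` and `votes:10` have to be separated from the free-text part of the query before the text is handed to the search engine. The sketch below is a hypothetical parser written for this discussion, not the project's existing filtering code; the `split_query` name and the returned hash layout are assumptions.

```ruby
# Hypothetical parser; not the project's actual implementation.
# A qualifier is a single token of the form key:value.
QUALIFIER = /\A(\w+):(\S+)\z/

# Separate key:value qualifiers from the free-text search terms.
def split_query(input)
  filters = {}
  terms = []
  input.split.each do |token|
    if (m = QUALIFIER.match(token))
      filters[m[1]] = m[2]
    else
      terms << token
    end
  end
  { text: terms.join(' '), filters: filters }
end

p split_query('tag:rails votes:10 slow search')
```

The free text would then go to the full-text engine, while the qualifiers would be translated into whatever structured filters that engine supports.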

There may be other systems out there, but these two seem to be the most widely used ones that have proper Rails integration.

Additional context
While Solr may be a somewhat better choice in the long run, the current code style lends itself to Elasticsearch much better (as far as I can tell). I'm relatively familiar with Elasticsearch, so I will be providing a pull request with an implementation for it. More work will be required to build additional features such as highlighting (indicating the matched words) and searching in associated entities (increasing the search rank of posts whose answers match the search term well). If you want to go in that direction, it is worth reconsidering which of the two systems to use.

For smaller sites (not much data or few searches), running a single Elasticsearch instance (much like running a database locally) is fine. However, deploying something like this at a larger scale will require proper testing and configuration. It requires setting up an Elasticsearch cluster, probably with multiple nodes, to get performant search results. I don't know at what scale Codidact currently operates, but it would be something to consider for the future. I'm pretty sure AWS has configurations/tutorials for setting up both Elasticsearch and Solr clusters.

@cellio cellio added area: backend Changes to server-side code type: change request New feature or request priority: medium complexity: unassessed Needs further developer investigation before complexity/feasibility can be determined. type: analysis Potential changes that require some design/architecture/code analysis before we start implementing. labels Aug 14, 2022
@cellio
Member

cellio commented Sep 19, 2022

Another report about search in titles: https://meta.codidact.com/posts/287032
