
Elasticsearch (quick & advanced search)

Clément Roig edited this page Jan 7, 2022 · 31 revisions

Introduction

Grottocenter uses the Elastic tool suite to perform quick and advanced searches over the database. More precisely, it uses Elasticsearch and Logstash. Both are currently installed automatically inside the Docker containers.

Elasticsearch

Once launched, Elasticsearch listens on port 9200 for RESTful requests to create indexes, populate them, search data, etc. For example, if you open http://localhost:9200/_all/_search?q=cave in your browser (GET request), Elasticsearch searches for the word "cave" across all its indexes and returns the matching documents as JSON.
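As a minimal sketch, the same request URL can be built programmatically. The helper name buildSearchUrl and its default host are assumptions for illustration; only the port, the _all/_search path and the q parameter come from the example above.

```javascript
// Hypothetical helper: builds the URL of an Elasticsearch URI search.
// `_all` targets every index; `q` carries the keyword to search for.
function buildSearchUrl(keyword, host = 'http://localhost:9200') {
  const url = new URL('/_all/_search', host);
  url.searchParams.set('q', keyword); // value is URL-encoded automatically
  return url.toString();
}

console.log(buildSearchUrl('cave'));
// → http://localhost:9200/_all/_search?q=cave
```

Using URL and searchParams rather than string concatenation keeps keywords containing spaces or special characters correctly encoded.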

Logstash

Logstash is used to create and populate the indexes used by Elasticsearch. It takes an input, filters it and outputs it (to Elasticsearch, or somewhere else if needed). In Grottocenter, Logstash runs in Docker and uses the logstash.conf file at the root of the repository to populate the running Elasticsearch instance.

Usage

Quick search

Returns the results matching the keywords provided (the first 10 by default).

ROUTE: /api/v1/search

PARAMS

  • query - string required

Keywords to perform the search on.

  • complete - bool (default = false)

Whether the response includes all information about each result. If set to false, only the id and the name of each result are returned.

  • resourceType - string (one of: 'entries', 'grottos', 'massifs', 'bbs') (default = search on all indexes)

The type of resource on which the search is performed.

  • from - number (default = 0)

Index of the first result to return (offset used for pagination).

  • size - number (default = 10)

Number of results to return.
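The parameters above can be combined as in the following sketch. It assumes they are sent in the query string (the HTTP method is not stated here); the helper name buildQuickSearchUrl and the base URL are hypothetical.

```javascript
// Hypothetical helper: builds a quick search URL from the documented parameters.
// Optional parameters left undefined are omitted so the API applies its defaults.
function buildQuickSearchUrl(baseUrl, { query, complete, resourceType, from, size }) {
  if (typeof query !== 'string' || query.length === 0) {
    throw new Error('query is required');
  }
  const url = new URL('/api/v1/search', baseUrl);
  url.searchParams.set('query', query);
  const optional = { complete, resourceType, from, size };
  for (const [name, value] of Object.entries(optional)) {
    if (value !== undefined) url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Second page of 10 results among entries; the base URL is a placeholder.
console.log(buildQuickSearchUrl('http://localhost:1337', {
  query: 'cave', resourceType: 'entries', from: 10, size: 10,
}));
```

Increasing from by size on each call walks through the result pages.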

Boosted data (= weight)

To improve the quality of the search results (quick search only), boosts are added to some properties of the database. The boost values are defined in the ElasticsearchService.js file. Refer to the boost documentation for more information.

The general rules about boosts are:

  • name ⏫ 5
  • city ⏫ 2
  • description / body (long texts) ⏬ 0.5
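To illustrate how such boosts can be expressed, here is a sketch using Elasticsearch's "field^boost" notation in a query_string query. It is not the actual content of ElasticsearchService.js: the function name, the BOOSTS map and the field names are assumptions; only the boost values come from the rules above.

```javascript
// Boost values taken from the general rules above; field names are illustrative.
const BOOSTS = { name: 5, city: 2, description: 0.5, body: 0.5 };

// Builds a query_string query whose fields carry "field^boost" weights.
// Fields absent from `boosts` are left as-is (default boost of 1).
function buildBoostedQuery(keywords, fields, boosts = BOOSTS) {
  const weightedFields = fields.map((field) => (
    Object.prototype.hasOwnProperty.call(boosts, field)
      ? `${field}^${boosts[field]}`
      : field
  ));
  return { query: { query_string: { query: keywords, fields: weightedFields } } };
}

console.log(JSON.stringify(buildBoostedQuery('cave', ['name', 'city', 'county', 'description'])));
```

With these weights, a match on name counts ten times more than the same match in a long description.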

Advanced search

ROUTE: /api/v1/advancedSearch

PARAMS

  • resourceType - string (one of: 'entries', 'grottos', 'massifs', 'bbs') required

The type of resource on which the search is performed.

  • complete - bool (default = true)

Whether the response includes all information about each result. If set to false, only the id and the name of each result are returned.

  • paramX - string / bool

This parameter must be a property indexed by Elasticsearch. The resource property must be equal to this value.

  • paramY-min - number

This parameter must be a range property indexed by Elasticsearch. The resource range property must be greater than this value.

  • paramZ-max - number

This parameter must be a range property indexed by Elasticsearch. The resource range property must be less than this value.

Example

{
    "resourceType": "grottos",
    "complete": true,
    "number of cavers-min": 2,
    "number of cavers-max": 10,
    "city": "Montpellier"
}

This request returns the groups (grottos) based in Montpellier that have 2 to 10 members.
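One way such parameters could be translated into an Elasticsearch query is sketched below. This is an assumption about the general approach, not the project's actual code: the function name is hypothetical, and treating the bounds as inclusive (gte / lte) is a guess based on the "2 to 10 members" wording.

```javascript
// Hypothetical translation of advanced search parameters into a bool query.
// "-min" / "-max" suffixes become range bounds; other parameters become matches.
function buildAdvancedQuery(params) {
  const must = [];
  const ranges = new Map();
  for (const [key, value] of Object.entries(params)) {
    if (key === 'resourceType' || key === 'complete') continue; // not search criteria
    const bound = key.match(/^(.+)-(min|max)$/);
    if (bound) {
      const [, field, kind] = bound;
      if (!ranges.has(field)) ranges.set(field, {});
      // Assumption: bounds are inclusive.
      ranges.get(field)[kind === 'min' ? 'gte' : 'lte'] = Number(value);
    } else {
      must.push({ match: { [key]: value } });
    }
  }
  for (const [field, range] of ranges) {
    must.push({ range: { [field]: range } });
  }
  return { query: { bool: { must } } };
}

console.log(JSON.stringify(buildAdvancedQuery({
  resourceType: 'grottos',
  complete: true,
  'number of cavers-min': 2,
  'number of cavers-max': 10,
  city: 'Montpellier',
}), null, 2));
```

Grouping the -min and -max bounds of the same property into a single range clause keeps the generated query compact.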

Indexed data

The data is fully re-indexed every Monday at 3:00 AM using Logstash. Between two full re-indexes, Sails updates the indexes using the npm package elasticsearch via the ControllerService function treatAndConvert().

01/2019 note: the update feature has not been tested (no update / delete functions currently).

PostgreSQL Migration

To use a PostgreSQL database, some of the Elasticsearch features must be adapted. First, in logstash.conf, the jdbc_driver_library and jdbc_driver_class attributes must be switched from MySQL to PostgreSQL.

The statements (= SQL requests) also have to be modified and tested according to the new database schema. In particular, the GROUP_CONCAT function doesn't exist in PostgreSQL: its equivalents are string_agg, or array_agg combined with array_to_string.
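As a rough aid for this migration, the sketch below rewrites the simplest GROUP_CONCAT calls into PostgreSQL's string_agg aggregate. It is deliberately naive and the function name is hypothetical: it only handles arguments without nested parentheses, DISTINCT or ORDER BY, so every rewritten statement still has to be reviewed and tested by hand.

```javascript
// Naive rewrite of simple MySQL GROUP_CONCAT calls into PostgreSQL string_agg.
// Limitation: arguments containing parentheses, DISTINCT or ORDER BY are not supported.
function groupConcatToStringAgg(sql) {
  const pattern = /GROUP_CONCAT\(\s*([^()]+?)(?:\s+SEPARATOR\s+'([^']*)')?\s*\)/gi;
  return sql.replace(pattern, (match, expr, separator) => {
    // MySQL uses a comma when no SEPARATOR is given.
    const sep = separator === undefined ? ',' : separator;
    // string_agg expects a text argument, hence the cast.
    return `string_agg((${expr.trim()})::text, '${sep}')`;
  });
}

// Table and column names below are placeholders.
console.log(groupConcatToStringAgg("SELECT GROUP_CONCAT(c.name SEPARATOR ', ') FROM t_caver c"));
```

The array_to_string(array_agg(...), ',') form gives the same result and may be preferable when the intermediate array is also needed.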

Appendices

Detailed boosts (07/01/2022 => outdated, todo: update with latest indexed data)

In quick search only, some data fields are boosted (if none, boost = 1):

ENTRIES (t_entry)

  • name ⏫ 5
  • city ⏫ 2
  • county
  • country
  • region

t_description through j_entry_description table

  • descriptions (title, body) ⏬ 0.5

t_cave through j_cave_entry table

  • caves (name)
  • cave length (length)
  • cave depth (depth)

t_rigging through j_entry_rigging table

  • riggings (title, observation, obstacle)

t_location through t_location table

  • location (body) ⏬ 0.5

t_bibliography through t_bibliography table

  • bibliographies (body) ⏬ 0.5

MASSIFS (t_massif)

  • name ⏫ 5

t_entry through j_massif_cave table

  • entries (name, city, county, country, region)

GROTTOS (t_grotto)

  • name ⏫ 5
  • city ⏫ 2
  • county
  • country
  • custom_message

t_cavers through j_grotto_caver table

  • cavers names

DOCUMENTS (t_document)

  • title ⏫ 2.8
  • authors
  • abstract ⏬ 0.5
  • ref
  • country
  • theme
  • subtheme