forked from rero/rero-ils
The current Elasticsearch query string used by the REST API is powerful, but it should not be used for search boxes: it raises exceptions when the query syntax contains errors. As described in the Elasticsearch documentation, a simple query string should be used instead. A new optional HTTP query parameter has been added and can be specified as follows: `&simple=1`. For example: `https://ils.rero.ch/global/search/?q=potter&simple=1`. When the simple query syntax is chosen, the default boolean operator is `AND`. Apart from this parameter, nothing has been modified.

* Adds a new type of aggregation filter that applies an AND boolean operator between the terms in the same aggregation. All aggregation filters now use the AND boolean operator.
* Updates Elasticsearch mappings to improve search quality.
* Adds a new REST API list records operator to use a `simple_query_string` instead of a `query_string`. The simple query string uses the AND boolean operator by default.
* Adds search query tests.
* Adds missing utils test.
* Renames the `global_lowercase_asciifolding` Elasticsearch analyzer to `default`. This makes the rero-ils custom analyzer the default for all Elasticsearch `text` fields. All Elasticsearch mappings have been simplified.
* Creates a new `custom_keyword` analyzer.
* Creates a custom Elasticsearch image with the ICU analysis plugin (https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html).
* Closes rero#755.

Co-Authored-by: Johnny Mariéthoz <Johnny.Mariethoz@rero.ch>
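As a rough illustration of the two query-building changes described in the message, the sketch below builds the Elasticsearch query clauses as plain dictionaries. It is not the rero-ils implementation, which is not shown here: the helper names `build_text_query` and `and_terms_filter`, their signatures, and the `language` field used in the usage note are assumptions. Only the `query_string`, `simple_query_string`, `default_operator`, `bool`, `filter`, and `term` keys come from the Elasticsearch query DSL.

```python
from typing import Any, Dict, List


def build_text_query(q: str, simple: bool = False) -> Dict[str, Any]:
    """Return an Elasticsearch query clause for the user input `q`.

    Hypothetical helper: the name and signature are assumptions.
    """
    if simple:
        # `simple_query_string` discards invalid syntax instead of raising,
        # which makes it a better fit for search boxes. The default
        # operator is set to AND, as the commit message describes.
        return {
            "simple_query_string": {"query": q, "default_operator": "AND"}
        }
    # `query_string` supports the full syntax but fails on syntax errors.
    return {"query_string": {"query": q}}


def and_terms_filter(field: str, values: List[str]) -> Dict[str, Any]:
    """Require every selected value of one aggregation to match (AND).

    One `term` clause per value inside `bool.filter` means all of them
    must match; a single `terms` clause would instead behave like OR.
    """
    return {"bool": {"filter": [{"term": {field: v}} for v in values]}}
```

With these assumed helpers, `build_text_query("potter", simple=True)` would correspond to a request such as `?q=potter&simple=1`, and `and_terms_filter("language", ["fre", "ger"])` would keep only records that carry both values.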
Showing 23 changed files with 529 additions and 210 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.6.2 | ||
RUN bin/elasticsearch-plugin install analysis-icu |
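
The image above installs the `analysis-icu` plugin so that index settings can refer to ICU components. The sketch below shows what such settings might look like, written as a Python dictionary. The actual rero-ils mappings are not shown in this excerpt, so the tokenizer and filter choices are assumptions; only the analyzer names `default` and `custom_keyword` come from the commit message, and `icu_tokenizer` and `icu_folding` are components provided by the plugin.

```python
from typing import Any, Dict

# Hypothetical index settings: the tokenizer and filter chain are
# assumptions, not the configuration used by rero-ils.
INDEX_SETTINGS: Dict[str, Any] = {
    "analysis": {
        "analyzer": {
            # An analyzer named "default" is applied to every `text`
            # field that does not declare its own analyzer.
            "default": {
                "type": "custom",
                "tokenizer": "icu_tokenizer",  # provided by analysis-icu
                "filter": ["icu_folding"],     # case and accent folding
            },
            # Keeps the whole value as one token, folded the same way.
            "custom_keyword": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["icu_folding"],
            },
        }
    }
}
```

Naming the analyzer `default` is what lets the individual field mappings drop their explicit `analyzer` entries.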
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,277 @@ | ||
# -*- coding: utf-8 -*- | ||
# | ||
# RERO ILS | ||
# Copyright (C) 2019 RERO | ||
# | ||
# This program is free software: you can redistribute it and/or modify | ||
# it under the terms of the GNU Affero General Public License as published by | ||
# the Free Software Foundation, version 3 of the License. | ||
# | ||
# This program is distributed in the hope that it will be useful, | ||
# but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
# GNU Affero General Public License for more details. | ||
# | ||
# You should have received a copy of the GNU Affero General Public License | ||
# along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
"""Search tests.""" | ||
|
||
from flask import url_for | ||
from utils import get_json | ||
|
||
|
||
def test_document_search( | ||
client, | ||
doc_title_travailleurs, | ||
doc_title_travailleuses | ||
): | ||
"""Test document search.""" | ||
# phrase search | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='"Les travailleurs assidus sont de retours"', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# phrase search with punctuation | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='"Les travailleurs assidus sont de retours."', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# word search | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='travailleurs', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 2 | ||
|
||
# travailleurs == travailleur == travailleuses | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='travailleur', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 2 | ||
|
||
# ecole == école | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='ecole', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# Ecole == école | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='Ecole', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# ECOLE == école | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='ECOLE', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# _école_ == école | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=' école ', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# Müller | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='Müller', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# Müller == Muller | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='Muller', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# Müller == Mueller | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='Mueller', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# test AND | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='travailleuse école', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# test OR in two docs | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='retours | école', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 2 | ||
|
||
# test AND in two fields (travailleuses == travailleur) | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='travailleuses bientôt', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='travailleuses + bientôt', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# test OR in two docs (each match only one term) | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='retours | école', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 2 | ||
|
||
# test AND in two docs (each match only one term) => no result | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='retours école', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 0 | ||
|
||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='retours + école', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 0 | ||
|
||
# title + subtitle | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q='Les travailleurs assidus sont de retours : ' | ||
'les jeunes arrivent bientôt ?', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# punctuation | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'école : . ... , ; ? \ ! = == - --', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'école:.,;?\!...=-==--', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# special chars | ||
# œ in title | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'bœuf', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# æ in title | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'ex æquo', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# uppercase Æ in title | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'ÆQUO', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 | ||
|
||
# œ in author | ||
list_url = url_for( | ||
'invenio_records_rest.doc_list', | ||
q=r'Corminbœuf', | ||
simple='1' | ||
) | ||
res = client.get(list_url) | ||
hits = get_json(res)['hits'] | ||
assert hits['total'] == 1 |
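
The test above repeats the same request-and-assert pattern for every query. One possible way to make the cases easier to scan is a table-driven helper, sketched below. This is not part of the commit: `CASES`, `failing_cases`, and the `count_hits` callable are hypothetical names, and the listed cases are a small subset copied from the assertions above.

```python
from typing import Callable, Iterable, List, Tuple

# (query, expected number of hits), copied from a few assertions above.
CASES: List[Tuple[str, int]] = [
    ("travailleurs", 2),
    ("ecole", 1),
    ("retours | école", 2),
    ("retours école", 0),
]


def failing_cases(
    count_hits: Callable[[str], int],
    cases: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int, int]]:
    """Return (query, expected, actual) for every case that does not match."""
    failures = []
    for query, expected in cases:
        actual = count_hits(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures
```

In the real test, `count_hits` would presumably wrap `client.get(url_for('invenio_records_rest.doc_list', q=query, simple='1'))` and return `get_json(res)['hits']['total']`, so a single `assert failing_cases(count_hits, CASES) == []` reports every mismatching query at once.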