Migration to Elasticsearch #2830
* Add any field for more global full text search
* Add recordGroup for collapse mode
* Test routing key
* Add recordLink to target one parent (need more work)
* Link also feature to parent record
…ng features & records (disabled).
…ch based on the bucket id in session, so it can be reused elsewhere if needed. Queries can be written either as an Elasticsearch JSON object or in Lucene query syntax.
…hen there are no failed documents, to avoid an exception
…ore folder to xsl/conversion
CFV (call for vote): https://sourceforge.net/p/geonetwork/mailman/message/36995405/
Closing, this is now in https://github.com/geonetwork/core-geonetwork/tree/4.0.x
The move from the Lucene library to an off-the-shelf search engine such as Solr or Elasticsearch is mainly motivated by the need to:
Moving to Elasticsearch will bring a lot of flexibility in configuring indexing and search in the catalogue. This page highlights the main benefits of this move and the current progress. GeoNetwork developers are looking for funding to continue this task.
Current limitations
" character used in search
Use cases to illustrate some of the benefits
The main benefits this move can bring to the application concern:
This change will probably also improve search and indexing performance, while still supporting spatial searches.
It also aims to simplify the codebase, which will make life easier for newcomers:
The following sections illustrate some of the benefits:
More flexible facets (named aggregations)
The Elasticsearch API allows you to:
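As a minimal sketch of named aggregations, the Python function below builds an Elasticsearch search body in which each facet is a terms aggregation stored under a name chosen by the caller. The field names (anytext, resourceType, tag.default) and the aggregation names are illustrative assumptions, not GeoNetwork's actual index mapping.

```python
import json


def facet_query(full_text, facets, size=0):
    """Build a search body with one named terms aggregation per facet.

    `facets` maps an aggregation name to a (field, max_buckets) pair.
    Field names used in the example are hypothetical, not GeoNetwork's mapping.
    """
    return {
        "size": size,  # 0 returns aggregations only, no hits
        "query": {"match": {"anytext": full_text}},
        "aggs": {
            name: {"terms": {"field": field, "size": max_buckets}}
            for name, (field, max_buckets) in facets.items()
        },
    }


if __name__ == "__main__":
    body = facet_query("water", {
        "resourceTypes": ("resourceType", 10),
        "keywords": ("tag.default", 20),
    })
    print(json.dumps(body, indent=2))
```

Because each aggregation is named, the UI can read its buckets from the response under aggregations.<name>.buckets, independently of which field backs it.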
Better suggestions
The current suggestion service does not always take user privileges into account. Elasticsearch allows suggesters to be used on any indexed field (and combined with searches).
e.g.:
More to analyze:
More like this
"More like this" suggests records similar to the one you are currently viewing. Similarity can be defined by choosing the fields on which it is computed and the frequency of terms (more testing is needed to define the "more like this" parameters).
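One way to express this is Elasticsearch's more_like_this query. The sketch below assumes a hypothetical index name (gn-records) and hypothetical title and abstract fields; it only shows where the fields and term-frequency parameters mentioned above would be set. The default values are starting points for testing, not tuned settings.

```python
def more_like_this_query(index, record_id, fields,
                         min_term_freq=1, max_query_terms=25, size=5):
    """Build a search body returning records similar to `record_id`."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                # Fields on which similarity is computed.
                "fields": fields,
                # Use the already indexed record as the source text; by default
                # Elasticsearch does not return this input document itself.
                "like": [{"_index": index, "_id": record_id}],
                # Metadata records are short, so a low minimum term frequency
                # is assumed here; this needs testing on real catalogues.
                "min_term_freq": min_term_freq,
                # Upper bound on the number of terms selected for the query.
                "max_query_terms": max_query_terms,
            }
        },
    }


# Hypothetical index and field names, for illustration only.
query = more_like_this_query("gn-records", "record-uuid",
                             ["resourceTitle", "resourceAbstract"])
```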
Dynamic dashboards
Once the catalogue is indexed in Elasticsearch, Kibana can be used to analyze its content. Such analyses can help promote the catalogue on a CMS or third-party website, and can also help improve the content of records by finding invalid values.
Dashboards can also focus on geographical extents:
Performance
After basic testing with ab (ApacheBench), it looks like we could expect search, facets and indexing to be about 5 times faster than with the current services.
Migration roadmap
The development branch is available at https://github.com/geonetwork/core-geonetwork/tree/es
This migration is a major piece of work and will require several iterations to cover the full scope of GeoNetwork's search and indexing features. The migration is described as a succession of levels, from Level 0, which provides the minimal set of features, to the last level, which could provide the same features as the current version. The migration is also an opportunity to remove unused features and simplify the codebase.
Level 0 (Proof of concept)
At Level 0, the application starts, creates the index if it does not exist, indexes documents from the database, and makes the main search available. This proof of concept helps identify the main goals of the implementation.
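The indexing step of Level 0 could be sketched as below: records loaded from the database are serialized into the newline-delimited payload expected by the Elasticsearch _bulk API, and the bulk response is only scanned for failures when its errors flag is set. The index name and document fields are hypothetical; index creation and the HTTP call itself are left out.

```python
import json


def bulk_body(index, records):
    """Serialize (record_id, document) pairs into a _bulk NDJSON payload."""
    lines = []
    for record_id, doc in records:
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": record_id}}))
        lines.append(json.dumps(doc))
    # The Bulk API expects every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


def failed_ids(bulk_response):
    """Return the ids of documents that failed in a _bulk response."""
    # Skip the per-item scan entirely when Elasticsearch reports no errors.
    if not bulk_response.get("errors"):
        return []
    failed = []
    for item in bulk_response.get("items", []):
        # Each item has a single key named after its action ("index" here).
        result = next(iter(item.values()))
        if "error" in result:
            failed.append(result.get("_id"))
    return failed
```

Keeping serialization and error handling separate makes it easy to retry or report only the failed records instead of aborting the whole batch.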
The search API is provided by the /api/search/records/_search service.
Tasks for this level are:
Funding: Level 0 features were mainly developed during the 2018 Bolsena code sprint.
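For illustration, a client could call the /api/search/records/_search service with either an Elasticsearch JSON query object or a Lucene-syntax string wrapped in a query_string query. The sketch below only builds the POST request with Python's standard library; the base URL and field names are hypothetical, and authentication or any CSRF token the server may require are not handled.

```python
import json
import urllib.request

# Hypothetical base URL of a local GeoNetwork instance.
BASE_URL = "http://localhost:8080/geonetwork/srv"


def search_request(query, lucene_syntax=False, size=10):
    """Build (but do not send) a POST to /api/search/records/_search."""
    if lucene_syntax:
        # query_string parses Lucene syntax, e.g. 'title:water AND type:dataset'.
        es_query = {"query_string": {"query": query}}
    else:
        # Otherwise `query` is already an Elasticsearch JSON query object.
        es_query = query
    body = json.dumps({"size": size, "query": es_query}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/search/records/_search",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


req = search_request("resourceTitle:water AND resourceType:dataset",
                     lucene_syntax=True)
# urllib.request.urlopen(req) would send it to a running instance.
```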
Level 1
Level 1 targets a fully working user interface based on the new search service. Search is used in many places, from the home page to the associated resources panel in the editor.
Technical challenges:
Tasks for this level are:
Search / wire all the UI (home/edit/admin) to the new search service with proper search results
Search / Return in _source only the fields required by the UI (performance)
Translation of codelists
Facets / Restore full support of single-level facets configured from the settings
Various Elasticsearch queries can be used to configure autocompletion.
By default, a multi_match query on anytext and its associated ngram fields is configured to propose record titles based on partial word matches.
Funding: Level 1 features were mainly developed:
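A sketch of such an autocompletion query is shown below. It assumes, hypothetically, that anytext is mapped with Elasticsearch's search_as_you_type field type, which creates the ._2gram and ._3gram subfields used here; GeoNetwork's actual mapping may name its ngram fields differently. The body also asks for a single, hypothetical title field in _source, in line with the Level 1 goal of returning only what the UI needs.

```python
def autocomplete_query(text, size=10):
    """Build a title-suggestion query from partial user input."""
    return {
        "size": size,
        # Return only the field the suggestion list displays (hypothetical name).
        "_source": ["resourceTitle"],
        "query": {
            "multi_match": {
                "query": text,
                # bool_prefix matches the last term as a prefix,
                # so "wat" can find "water".
                "type": "bool_prefix",
                # Subfields created by a search_as_you_type mapping (assumption).
                "fields": ["anytext", "anytext._2gram", "anytext._3gram"],
            }
        },
    }
```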
Level 2
Level 2 focuses on restoring CSW and improving support for aggregations (aka facets).
Tasks for this level are:
CSW support
CSW / support geometries with And / Or conditions.
Virtual CSW (deprecated and replaced by portals, which include a virtual CSW)
Migrate to the RestHighLevelClient Java client API instead of the JEST library (JEST was used because it has no dependency on Lucene)
Selection / Restore selection manager
Selection / Restore MEF export
Selection / Restore PDF/CSV export
How to integrate ES for running tests and packaging the installer
Improve index status checker by reporting status
Automatically create the index when it does not exist
Multi portal support / Restore portal filter injection
Facet / Allow OR in a category
Hierarchical facets, e.g. GEMET. Facet hierarchies are now supported using two approaches:
At the end of Level 2, GeoNetwork should provide the main functionality needed by users for search (including CSW), editing, maps and the admin console.
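The "Facet / Allow OR in a category" and "CSW / support geometries with And / Or conditions" tasks map naturally onto Elasticsearch bool queries. The sketch below assumes hypothetical field names (anytext, resourceType, geom) and shows OR within a facet through a terms query, AND across facets through bool filter clauses, and OR between bounding boxes through should clauses on geo_shape envelope queries.

```python
def facet_filter_query(full_text, selected):
    """Combine selected facet values: OR within a facet, AND across facets.

    `selected` maps a (hypothetical) field name to the values ticked in that facet.
    """
    filters = [
        # A terms query matches documents having ANY of the listed values (OR).
        {"terms": {field: values}}
        for field, values in selected.items() if values
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"anytext": full_text}}],
                # Every filter clause must match (AND across facets);
                # filter clauses do not affect scoring.
                "filter": filters,
            }
        }
    }


def bbox_clause(west, south, east, north, relation="intersects"):
    """geo_shape clause on a hypothetical 'geom' field using an envelope."""
    return {
        "geo_shape": {
            "geom": {
                # Envelope coordinates are [[minLon, maxLat], [maxLon, minLat]].
                "shape": {"type": "envelope",
                          "coordinates": [[west, north], [east, south]]},
                "relation": relation,
            }
        }
    }


def any_of_boxes(boxes):
    """OR several bounding boxes, as a CSW filter with Or between geometries would."""
    return {"bool": {"should": [bbox_clause(*b) for b in boxes],
                     "minimum_should_match": 1}}
```

An And between geometries would use the same clauses under "filter" (or "must") instead of "should".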
Funding: Level 2 features were mainly developed:
Level 3
Funding:
Level 4
Level 4 focused on producing a first beta release of GeoNetwork on Elasticsearch that can be used in real-world deployments.
Funding:
Reference documents: