2022 Evergreen Online Conference
Bill Erickson
Software Development Engineer, King County Library System
- Improve Evergreen catalog search speed for staff.
- Evergreen Begins
- Rise of Solr and discovery layers
- KCLS adopts EG, soon migrates to 3rd-party catalog
- Jeff G presents(?) on Elasticsearch-driven mobile catalog
- Elasticsearch proof-of-concept implementation for EG
- Blake GH opens LP1844418
- Angular Catalog development proceeds in parallel
- Angular Catalog + Elasticsearch limited staff use at KCLS 2020
- KCLS general use late 2021
Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability...
Source: https://www.elastic.co/what-is/elasticsearch
- Similar to Solr
- Ease of use
- Broad feature set
- Excellent Documentation and Examples
- I liked the API
- Industry use outside the library world
- Clustering / Replication
- Open source w/ vendor support/additions
- Indexing speed
- KCLS 1.1M records; 3.6M items
- 4 parallel: 1 hour 45 mins
- Takes heavy search query load off primary PG Database
- Searches report total result count / no estimates
- Opportunities for new types of searches with minimal backend development.
- Parallel, Interchangeable Datasets
- An Evergreen API for Keyword, Title, Author, etc. searches
- Some Numeric Searches (but not, e.g., Item Barcode)
- MARC search
- Query String support
- Give me everything: *:*
- Give me the new stuff: pubdate:2020
- Ranges
- pubdate:[2001 TO 2010]
- create_date:[2021-01 TO 2021-02]
- pubdate:>=2020
- Boolean Grouping
- (kw:dogs AND (pubdate:2021 OR pubdate:2022)) OR (ti:cats AND NOT pubdate:2022)
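The query strings above follow Elasticsearch's `query_string` syntax. As a minimal sketch, this is how such a string might be wrapped into a request body for the `_search` endpoint. The field names (`kw`, `ti`, `pubdate`) come from the slides; the helper name and the idea of posting the body straight to Elasticsearch (rather than through the Evergreen API) are assumptions for illustration.

```python
import json


def build_query_string_search(query, default_field=None, size=10):
    """Return a request body for Elasticsearch's _search endpoint.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    query_string = {"query": query}
    if default_field is not None:
        # Field searched when the query string names no field explicitly.
        query_string["default_field"] = default_field
    return {"size": size, "query": {"query_string": query_string}}


# Query strings taken from the slides.
EXAMPLES = [
    "*:*",
    "pubdate:2020",
    "pubdate:[2001 TO 2010]",
    "pubdate:>=2020",
    "(kw:dogs AND (pubdate:2021 OR pubdate:2022)) OR (ti:cats AND NOT pubdate:2022)",
]

if __name__ == "__main__":
    for q in EXAMPLES:
        print(json.dumps(build_query_string_search(q)))
```

Each printed body could then be sent with `curl -H 'Content-Type: application/json' -d '...'` to an index's `_search` URL.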
- R.E.M.
- REM
- R E M
- Its A Wonderful Life
- MARC match option selector
- 'contains exact' match option
- MARC regex search
- .{24}[^6]{3}.{13}
- Graphic Novels
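To make the regex above easier to read, here is a small offline sketch using Python's `re` module. The pattern requires a value of exactly 40 characters with no `6` in character positions 24 through 26 (zero-based). Elasticsearch regexp queries must match the whole term, so `fullmatch` is used to mimic that. The sample values are made up, and which MARC field the pattern was run against is not stated in the slides.

```python
import re

# Pattern copied from the slide: 24 arbitrary characters, then three
# characters that are not "6", then 13 more arbitrary characters.
PATTERN = re.compile(r".{24}[^6]{3}.{13}")


def matches_whole_value(value):
    """True if the entire value matches, as an Elasticsearch regexp query requires."""
    return PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    without_six = "x" * 24 + "abc" + "y" * 13   # hypothetical 40-char value
    with_six = "x" * 24 + "a6c" + "y" * 13      # same, but "6" at position 25
    print(matches_whole_value(without_six))
    print(matches_whole_value(with_six))
```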
- "Did You Mean" (Elastic Docs)
- Search Results Highlight (Elastic Docs)
- Autosuggest (Elastic Docs)
- Sort by Popularity
- Copy Location Group filtering
- Org Unit Lasso filtering
- Others?
!sh
$ sudo apt install openjdk-11-jre-headless
$ wget 'https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.11.deb'
$ sudo dpkg -i elasticsearch-6.8.11.deb
$ sudo systemctl start elasticsearch
$ sudo systemctl enable elasticsearch
$ sudo cpan Search::Elasticsearch::Client::6_0
Elasticsearch ICU Analysis Plugin
!sh
$ cd /usr/share/elasticsearch/
$ sudo bin/elasticsearch-plugin install analysis-icu
!sh
cd /home/opensrf/Evergreen/Open-ILS/src/support-scripts/
./elastic-index.pl --index-name kcls-1 --create-index
./elastic-index.pl --index-name kcls-1 --populate
./elastic-index.pl --index-name kcls-1 --activate-index
- Two dedicated VMs with ~100G disk and 24G RAM
- Load-balanced with one write node, one replica node.
- A full index uses about 36G disk
- Apply firewall (iptables) to limit port 9200 access
- 2 Indexer Scripts
!sh
curl -s http://localhost:9200/bib-search/_doc/891066 | jq -C . | less -R
curl -s -XGET 'localhost:9200/bib-search/_search?pretty&q=dogs' | jq -C . | less -R
curl -s -XGET 'localhost:9200/bib-search/_count?pretty'
curl -s -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
curl -s -XGET 'localhost:9200/_cluster/health?pretty'
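The `_cluster/health` call above returns a JSON document whose `status` field is `green`, `yellow`, or `red`. The following sketch shows one way a monitoring script might summarize that response. The function name and the sample response are hypothetical; the field names (`status`, `number_of_nodes`, `unassigned_shards`) are those reported by Elasticsearch.

```python
def summarize_health(health):
    """Summarize a parsed _cluster/health response (a dict).

    green  = all primary and replica shards are allocated
    yellow = all primaries allocated, at least one replica is not
    red    = at least one primary shard is unallocated
    """
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    return {
        "ok": status == "green",
        "message": f"status={status} nodes={nodes} unassigned_shards={unassigned}",
    }


if __name__ == "__main__":
    # Made-up response, e.g. what a two-node cluster might report
    # while its replica node is down.
    sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}
    print(summarize_health(sample)["message"])
```

In a write-node-plus-replica setup, a `yellow` status usually means searches still work but the replica copy is missing, which is worth alerting on.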
!sh
$ curl -s -XGET "localhost:9200/bib-search/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
"analyzer" : "icu_folding",
"text" : "En̲ iruḷ vān̲il oḷi nilavāy nī"
}
' | jq -C . | less -R
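For readers without a running cluster, the effect of the folding analysis above can be approximated offline. The sketch below is not the ICU plugin's algorithm: it is a rough stand-in that decomposes characters, drops combining marks, and case-folds, which is enough to show why diacritic-heavy text and its plain ASCII spelling end up as the same search terms. The function name is hypothetical.

```python
import unicodedata


def approx_fold(text):
    """Rough approximation of diacritic and case folding (not the ICU implementation)."""
    # Split precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (macrons, dots below, underlines, ...).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    # Sample text from the _analyze request above, written with escapes.
    sample = "En\u0332 iru\u1e37 v\u0101n\u0332il o\u1e37i nilav\u0101y n\u012b"
    print(approx_fold(sample).split())
    # prints ['en', 'irul', 'vanil', 'oli', 'nilavay', 'ni']
```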