Elasticsearch is a good fulltext search engine.
- Wikipedia search is powered by Elasticsearch.
- The Guardian joins access log data with social network data using Elasticsearch to give editors an idea of how the public is responding to articles.
- Stack Overflow full-text search is powered by Elasticsearch. They use the More Like This feature to find similar answers.
- GitHub uses Elasticsearch to query 130 billion lines of code.
You need Docker, Python 2.7 with pip or easy_install, and internet access.
- Get code.
git clone git@github.com:josalmi/es-movies.git
- Fire up Elasticsearch.
docker-compose up
- Open shell in client container:
docker-compose run client /bin/bash
- Load data.
./init.sh
- Profit
We are using the UCI Movies dataset of over 10,000 films. The titles range from the late 1800s to 1999.
Find all the Academy Award winners in the database. In the data, AA marks an Academy Award win.
Find the film Elmer Gantry in the raw data. Did it win an Academy Award?
- Find all the Academy Award winners, excluding those that were only nominated (AAN).
- Try to filter for all the movies that contain the word 'Vampire'. How many are there? What's up with the score?
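The two exercises above can be sketched as search request bodies. This is a minimal sketch, not the workshop's reference answer: the field names `awards` and `title` are assumptions (check them against the real index mapping), and the helper names are hypothetical.

```python
import json

# Assumed field names; verify them against the actual index mapping.
AWARDS_FIELD = "awards"
TITLE_FIELD = "title"


def award_winners_query(awards_field=AWARDS_FIELD):
    """Match the token AA. On an analyzed text field, AAN is a different
    token, so documents tagged only AAN are not matched by this clause."""
    return {"query": {"match": {awards_field: "AA"}}}


def title_contains_query(word, title_field=TITLE_FIELD):
    """Full-text match on a single word; hits come back ordered by _score."""
    return {"query": {"match": {title_field: word}}}


if __name__ == "__main__":
    # Print request bodies that could be sent to the index's _search endpoint.
    print(json.dumps(award_winners_query(), indent=2))
    print(json.dumps(title_contains_query("Vampire"), indent=2))
```

Each printed body can be sent as a search request against the movies index. When you look at the Vampire hits, note that `_score` can differ between documents even for a one-word query, because relevance depends on things like term frequency and field length.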
The Best films are not in any particular order. Let's see if we can use a function score to order the results after matches have been made. Perhaps the field_value_factor or the decay functions can help us order our movies.
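As a starting point for the function score experiment, here is a minimal sketch of two request bodies, one using `field_value_factor` and one using a `gauss` decay. The field name `year`, the default origin and scale, and the helper names are assumptions chosen for illustration. Both functions expect the scored field to be numeric in the index.

```python
import json


def by_field_value(base_query, field="year", modifier="none", factor=1):
    """Replace the text score with a value derived from a numeric field."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "field_value_factor": {
                    "field": field,
                    "modifier": modifier,
                    "factor": factor,
                },
                # "replace" ignores the original _score and keeps only
                # the function result, so hits sort by the field value.
                "boost_mode": "replace",
            }
        }
    }


def by_decay(base_query, field="year", origin=1999, scale=10):
    """Score hits higher the closer the field value is to the origin."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "gauss": {field: {"origin": origin, "scale": scale}},
            }
        }
    }


if __name__ == "__main__":
    base = {"match_all": {}}
    print(json.dumps(by_field_value(base), indent=2))
    print(json.dumps(by_decay(base), indent=2))
```

Swap `match_all` for whatever query selects the films you care about. With `boost_mode` left at its default of `multiply`, the function result scales the text score instead of replacing it.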
Something isn't right. Let's look at what our index looks like:
curl http://localhost:9200/movies
What's the problem?
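The JSON returned by that request can be long. A small helper like the one below, a sketch with a hypothetical name and an invented sample response, flattens it into a field-to-type table so the mapped types are easy to scan. It only looks for `properties` blocks, so it should cope with responses that do or do not nest mappings under a type name.

```python
def field_types(mapping, prefix=""):
    """Recursively collect {field_path: type} from a mapping response."""
    found = {}
    for name, value in mapping.items():
        if not isinstance(value, dict):
            continue
        if name == "properties":
            for field, spec in value.items():
                path = prefix + field
                if "type" in spec:
                    found[path] = spec["type"]
                # Object fields carry their own nested "properties".
                found.update(field_types(spec, path + "."))
        else:
            found.update(field_types(value, prefix))
    return found


if __name__ == "__main__":
    # Invented sample, shaped like an index response; not the real mapping.
    sample = {
        "movies": {
            "mappings": {
                "movie": {
                    "properties": {
                        "title": {"type": "string"},
                        "cast": {"properties": {"name": {"type": "string"}}},
                    }
                }
            }
        }
    }
    for path, kind in sorted(field_types(sample).items()):
        print(path + ": " + kind)
```

Compare the types you see with what the scoring functions need: `field_value_factor` and numeric decay functions only work on fields mapped as numbers.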
Tuning relevance in Elasticsearch is a dance between the index and the query. Let's add some mappings! The mapping of an existing field cannot be changed in place, so we will create a new index named 1. There are some ready-made mappings, but is there something we should change to make the function score work?
./et create index 1
./et reindex 0 1
./et index alias movies 1 0
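The last command presumably moves the `movies` alias from index 0 to index 1. What `./et` does internally is not shown here, so the following is only a sketch of the equivalent request body for Elasticsearch's `_aliases` endpoint, with a hypothetical helper name.

```python
import json


def alias_swap_actions(alias, new_index, old_index):
    """Build an _aliases body that repoints an alias in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Body for a POST to the cluster's _aliases endpoint.
    print(json.dumps(alias_swap_actions("movies", "1", "0"), indent=2))
```

Because clients query the alias rather than the index name, the switch to the new mappings needs no client changes.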
Once you start typing into the typeahead field, the experience isn't very satisfying. Let's create a typeahead index.
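One common way to build a typeahead index is an edge n-gram analyzer, which indexes every prefix of each word so that partial input matches. The sketch below is an assumption about how this could be set up, not the workshop's own solution: the names `autocomplete` and `autocomplete_filter` and the gram sizes are made up for illustration. The small `edge_ngrams` function only imitates what the filter emits, to make the idea concrete.

```python
import json


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes an edge n-gram filter would emit for one token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def typeahead_settings(min_gram=1, max_gram=20):
    """Index settings defining a custom analyzer with an edge n-gram filter."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "autocomplete_filter"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(edge_ngrams("vampire", min_gram=2, max_gram=5))
    print(json.dumps(typeahead_settings(), indent=2))
```

The title field would then use `autocomplete` at index time and a plain analyzer at search time, so that the user's partial input is matched against the stored prefixes rather than being split into prefixes itself.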