Welcome to the Beginner's Crash Course to Elastic Stack!
This repo contains all resources shared during Part 6: Troubleshooting Beginner-Level Elasticsearch Errors!
Throughout the series, we have learned about CRUD operations, fine tuning the relevance of your search, full text search, aggregations, and mapping.
As you continue your journey with Elasticsearch, you will inevitably encounter some common errors associated with the topics we have covered in the series.
Learn how to troubleshoot these pesky errors so you can get unstuck!
Table of Contents: Beginner's Crash Course to Elastic Stack: This workshop is part of the Beginner's Crash Course to Elastic Stack series. Check out this table of contents to access all the workshops in the series.
Instructions on how to access Elasticsearch and Kibana on Elastic Cloud
Instructions for downloading Elasticsearch and Kibana
Video recording of the workshop
Mini Beginner's Crash Course to Elasticsearch & Kibana playlist
Do you prefer learning by watching shorter videos? Check out this playlist to watch short clips from the full-length Beginner's Crash Course workshops. Season 2 clips will be uploaded here in the future!
YouTube Playlist of the Beginner's Crash Course to Elastic Stack: Want to watch all the workshops in the series? Check out the YouTube playlist of the Beginner's Crash Course to Elastic Stack!
Season 2 Topics Survey: I want to make the content more digestible for all of you!
Starting with season 2, I will discontinue the live hour-long workshops and instead upload short clips (10 min or less) to the YouTube Playlist of the Beginner's Crash Course to Elastic Stack.
This series is created for YOU! Please let me know what you would like to learn in season 2 by submitting your preference in the survey.
I will create content on the most requested topics, along with other helpful topics for beginners!
Whenever you perform an action with Elasticsearch and Kibana, Elasticsearch responds with an HTTP status and a response body.
The request below asks Elasticsearch to index a document and assign it an id of 1.
The HTTP status of 201-success indicates that the document has been successfully created. The response body indicates that the document with an assigned id of 1 has been created in the `beginners_crash_course` index.
As we work with Elasticsearch, we will inevitably encounter error messages like the one below.
When this happens, the HTTP status and the response body will provide valuable clues about why the request failed!
Here are some common errors that you may encounter as you work with Elasticsearch.
- The cluster may be down, or there may be a network issue. Check the network status and cluster health to identify the problem.
- The node may have died, or there may be a network issue. Retry your request.
Errors with an HTTP status starting with 5 stem from internal server errors in Elasticsearch. When you see one of these, take a look at the Elasticsearch log to identify the problem.
Errors with an HTTP status starting with 4 stem from client errors. When you see one of these, correct the request before retrying.
As beginners, we are still familiarizing ourselves with the rules and syntax required to communicate with Elasticsearch. The majority of the error messages we encounter are likely caused by mistakes we make while writing our requests (4XX errors).
To strengthen our understanding of the requests we have learned throughout the series, we will only focus on 4XX errors during this workshop.
- What number does the HTTP status start with (4XX? 5XX?)
- What does the response say? Always read the full message!
- Use the Elasticsearch documentation as your guide. Compare your request with the example from the documentation. Identify the mistake and make appropriate changes.
At times, you will encounter error messages that are not very helpful. We will go over a couple of these and see how we can troubleshoot these types of errors.
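The first checklist item, the status class, can be sketched as a tiny helper (a hypothetical function for illustration, not part of any Elasticsearch client):

```python
def error_class(status: int) -> str:
    """Classify an HTTP status the way this workshop does."""
    if 200 <= status < 300:
        return "success"       # e.g. 200 OK, 201 Created
    if 400 <= status < 500:
        return "client error"  # fix the request before retrying
    if 500 <= status < 600:
        return "server error"  # check the Elasticsearch log
    return "other"

print(error_class(201))  # success
print(error_class(404))  # client error
```

In other words, the very first digit of the status tells you whether to fix your request (4XX) or go look at the server (5XX).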
Throughout the series, we learned how to send requests related to the following topics:
- CRUD operations
- Queries
- Aggregations
- Mapping
We will revisit each topic and troubleshoot common errors you may encounter as you explore each topic.
In Part 1: Intro to Elasticsearch and Kibana, we learned how to perform CRUD operations. Let's say we have sent the following request to retrieve a document with an id of 1 from the `common_errors` index.
Request sent:
GET common_errors/_doc/1
Expected response from Elasticsearch:
Elasticsearch returns a 404-error along with the cause of the error in the response body. The HTTP status starts with a 4, meaning that there was a client error with the request sent.
When you look at the response body, Elasticsearch lists the reason (line 6) as "no such index [common_errors]".
The two possible explanations for this error are:
- The index `common_errors` truly does not exist or was deleted
- We do not have the correct index name
In our example, the cause of the error is quite clear! We have not created an index called `common_errors`, and we were trying to retrieve a document from an index that does not exist.
Let's create an index called `common_errors`:
Syntax:
PUT Name-of-the-Index
Example:
PUT common_errors
Expected response from Elasticsearch:
Elasticsearch returns a 200-success HTTP status acknowledging that the index `common_errors` has been successfully created.
Now that we have created the index `common_errors`, let's index a document!
Suppose you remember that you can use the HTTP verb PUT to index a document, and you send the following request:
PUT common_errors/_doc
{
"source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}
Expected response from Elasticsearch:
Elasticsearch returns a 405-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.
If you look at the response, Elasticsearch lists the reason as "Incorrect HTTP method for uri... and method: [PUT], allowed:[POST]".
This error message suggests that we used the wrong HTTP verb to index this document.
You can use either PUT or POST HTTP verb to index a document. Each HTTP verb serves a different purpose and requires a different syntax.
We learned about the difference between the two verbs during Part 1: Intro to Elasticsearch and Kibana, under the Index a document section.
When indexing a document, HTTP verb PUT or POST can be used.
The HTTP verb PUT is used when you want to assign a specific id to your document.
Syntax:
PUT name-of-the-Index/_doc/id-you-want-to-assign-to-this-document
{
field_name: "value"
}
Let's compare the syntax to the request we just sent:
PUT common_errors/_doc
{
"source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}
You will see that our request uses the HTTP verb PUT but it does not include the document id we want to assign to this document.
If you add the id of the document to the request as seen below, you will see that the request is carried out without a hitch!
Correct example for PUT indexing request:
PUT common_errors/_doc/1
{
"source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}
Expected response from Elasticsearch:
Elasticsearch returns a 201-success HTTP status acknowledging that document 1 has been successfully created.
The HTTP verb POST is used when you want Elasticsearch to autogenerate an id for the document.
If this is the option you wanted, you can fix the error by replacing the verb PUT with POST and not including a document id after the `_doc` endpoint.
Syntax:
POST Name-of-the-Index/_doc
{
field_name: "value"
}
Correct example for POST indexing request:
POST common_errors/_doc
{
"source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}
Expected response from Elasticsearch:
Elasticsearch returns a 201-success HTTP status and autogenerates an id (line 4) for the document that was indexed.
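The PUT-vs-POST rule comes down to the request path. Here is a small Python sketch (the `index_path` helper is invented for illustration) that builds the verb and path for each case:

```python
def index_path(index, doc_id=None):
    """Return the (HTTP verb, path) pair for an indexing request.

    PUT requires an explicit document id in the path;
    POST omits it so Elasticsearch autogenerates one.
    """
    if doc_id is not None:
        return "PUT", f"{index}/_doc/{doc_id}"
    return "POST", f"{index}/_doc"

print(index_path("common_errors", 1))  # ('PUT', 'common_errors/_doc/1')
print(index_path("common_errors"))     # ('POST', 'common_errors/_doc')
```

If you catch yourself writing PUT with no id, or POST with one, that mismatch is exactly what triggers the 405 above.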
Error 3: 400 Unexpected Character: was expecting a comma to separate Object entries at [Source: ...] line: x
Suppose you wanted to update document 1 by adding the fields "error" and "solution" as seen in the example.
Example:
POST common_errors/_update/1
{
"doc": {
"error": "405 Method Not Allowed"
"solution": "Look up the syntax of PUT and POST indexing requests and use the correct syntax."
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP error starts with a 4, meaning that there was a client error with the request sent.
If you look at the response, Elasticsearch lists the error type (line 12) as "json_parse_exception" and the reason (line 13) as "...was expecting comma to separate Object entries at ... line: 4]".
In Elasticsearch, if you have multiple fields ("error" and "solution") in an object ("doc"), you must separate each field with a comma. The error message tells us that we need to add a comma between the fields "error" and "solution".
Add the comma as shown below and send the following request:
POST common_errors/_update/1
{
"doc": {
"error": "405 Method Not Allowed",
"solution": "Look up the syntax of PUT and POST indexing requests and use the correct syntax."
}
}
Expected response from Elasticsearch:
You will see that the document with an id of 1 has been successfully updated.
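The comma rule here is plain JSON syntax, so you can reproduce this error without a cluster. A quick check with Python's standard `json` module raises the same complaint Elasticsearch's parser did:

```python
import json

# Same request body as above, missing the comma between fields.
malformed = '''
{
  "doc": {
    "error": "405 Method Not Allowed"
    "solution": "Use the correct PUT/POST syntax."
  }
}
'''

try:
    json.loads(malformed)
except json.JSONDecodeError as e:
    print(e.msg)  # Expecting ',' delimiter

# With the comma added, the body parses cleanly.
fixed = malformed.replace('Allowed"', 'Allowed",')
body = json.loads(fixed)
print(sorted(body["doc"]))  # ['error', 'solution']
```

Running your request body through a JSON validator like this is a quick way to rule out pure syntax mistakes before blaming Elasticsearch.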
In Parts 2 and 3, we learned how to send queries about news headlines in our index.
As a prerequisite for these workshops, we added a news headlines dataset to an index we named `news_headlines`.
We sent various queries to retrieve documents that match the criteria. Let's go over common errors you may encounter while working with these queries.
Suppose you want to use the range query to pull up news headlines published within a specific date range.
You have sent the following request:
GET news_headlines/_search
{
"query": {
"range": {
"date":
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.
If you look at the response, Elasticsearch lists the error type (line 5) as "parsing_exception" and the reason (line 6) as "[range] query does not support [date]".
This error message is misleading as the range query should be able to retrieve documents that contain terms within a provided range. It should not matter that you have requested to run a range query against the field "date".
Let's check out the screenshots from the Elastic documentation on the range query to see what is going on.
Pay attention to the syntax of the range query line by line.
Screenshot from the documentation:
Compare this syntax to the request we have sent earlier:
GET news_headlines/_search
{
"query": {
"range": {
"date":
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
}
The culprit of this error is the range query syntax!
Our request is missing curly brackets around the inner fields("gte" and "lte") of the field "date".
Let's add the curly brackets as shown below and send the following request:
GET news_headlines/_search
{
"query": {
"range": {
"date": {
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 200-success status and retrieves news headlines that were published within the specified date range.
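One way to avoid this kind of bracket mistake is to build the request body as a Python dict and serialize it with `json.dumps`; the dict structure forces "date" to map to an object holding the bounds instead of a bare value. A small sketch (the `range_query` helper is invented for illustration):

```python
import json

def range_query(field, gte, lte):
    """Build a range query body. The field name maps to an
    object containing the bounds, never directly to a value."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}

body = range_query("date", "2015-06-20", "2015-09-22")
print(json.dumps(body, indent=2))
```

Because a dict key can only hold one value, the inner `{"gte": ..., "lte": ...}` object (the curly brackets we forgot) appears automatically.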
In Part 2, we learned about the `multi_match` query. This query allows you to search for the same search terms in multiple fields at one time.
Suppose you wanted to search for the phrase "party planning" in the fields `headline` and `short_description` as shown below:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "party planning",
"fields": [
"headline",
"short_description"
]
},
"type": "phrase"
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that something isn't quite right with the request sent.
If you look at the response, Elasticsearch lists the error type (line 5) as "parsing_exception" and the reason (line 6) as "[multi_match] malformed query, expected [END_OBJECT] but found [FIELD_NAME]".
This is a vague error message that does not really tell you what went wrong.
However, we do know that the error is coming from somewhere around line 10, which suggests that the error may have something to do with the "type" parameter (line 11).
When you check the opening and closing brackets from the outside in, you will realize that the "type" parameter is placed outside of the `multi_match` query.
Move the "type" parameter up a line and move the comma from line 10 to line 9 as shown below, then send the request:
GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "party planning",
"fields": [
"headline",
"short_description"
],
"type": "phrase"
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 200-success (red box) response.
All hits contain the phrase "party planning" in the field "headline", the field "short_description", or both!
When we search for something, we often ask a multi-faceted question. For example, you may want to retrieve entertainment news headlines published on "2018-04-12."
This question actually requires sending multiple queries in one request:
1. A query that retrieves documents from the "ENTERTAINMENT" category
2. A query that retrieves documents that were published on "2018-04-12"
Let's say you are most familiar with the `match` query, so you write the following request:
GET news_headlines/_search
{
"query": {
"match": {
"category": "ENTERTAINMENT",
"date":"2018-04-12"
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that something is off with the query syntax.
If you look at the response, Elasticsearch lists the error type (line 5) as "parsing_exception" and the reason (line 6) as "[match] query doesn't support multiple fields, found [category] and [date]".
Elasticsearch throws an error because a `match` query can query only one field. Our request tried to query multiple fields with a single `match` query.
Bool Query
In Part 3, we learned how to combine multiple queries into one request by using the `bool` query.
With the `bool` query, you can combine multiple queries into one request, and you can further specify boolean clauses to narrow down your search results.
This query offers four clauses that you can choose from:
- must
- must_not
- should
- filter
You can mix and match any of these clauses to get the relevant search results you want.
In our use case, we have two queries:
1. A query that retrieves documents from the "ENTERTAINMENT" category
2. A query that retrieves documents that were published on "2018-04-12"
The news headlines we want could be filtered into a yes or no category:
- Is the news headline from the "ENTERTAINMENT" category? Yes or no.
- Was the news headline published on "2018-04-12"? Yes or no.
When documents can be filtered into either a yes or no category, we can use the filter clause and include two `match` queries within it:
GET news_headlines/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"category": "ENTERTAINMENT"
}
},
{
"match": {
"date": "2018-04-12"
}
}
]
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 200-success HTTP status and shows the top 10 hits whose "category" field contains the value "ENTERTAINMENT" and the "date" field contains the value of "2018-04-12".
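The yes/no framing maps directly onto how the request is built: each criterion becomes one `match` clause inside the filter array. A Python sketch of that assembly (the `bool_filter` helper is invented for illustration):

```python
def bool_filter(criteria):
    """Wrap one match clause per field/value pair in a bool filter.

    Filter clauses are yes/no checks, so they narrow results
    without contributing to relevance scoring.
    """
    clauses = [{"match": {field: value}} for field, value in criteria.items()]
    return {"query": {"bool": {"filter": clauses}}}

body = bool_filter({"category": "ENTERTAINMENT", "date": "2018-04-12"})
print(body["query"]["bool"]["filter"])
```

Adding a third criterion later is just one more entry in the dict; the filter array grows with it.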
Suppose you want to get the summary of categories that exist in our dataset. Since this requires summarizing your data, you decide to send the following aggregations request:
GET news_headlines/_search
{
"aggs": {
"by_category": {
"terms": {
"field": "category"
}
}
}
}
Expected response from Elasticsearch:
By default, Elasticsearch returns both the top 10 search hits and the aggregations results. Notice that the top 10 search hits take up lines 16-168.
Let's say you are only interested in the aggregations results.
You remember that you can add a "size" parameter and set it equal to 0 to avoid fetching the hits.
You send the following request to accomplish this task:
GET news_headlines/_search
{
"aggs": {
"size": 0,
"by_category": {
"terms": {
"field": "category"
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.
If you look at the response, Elasticsearch lists the error type (line 5) as "parsing_exception" and the reason (line 6) as "Aggregation definition for [size starts with a [VALUE_NUMBER], expected a [START_OBJECT]".
Something is off with our aggregations request syntax. Let's take a look at the screenshots from the Elastic documentation on aggregations and see what we missed.
Screenshot from the documentation: Pay close attention to the syntax of the aggregations request.
This error is occurring because the "size" parameter was placed in a spot where Elasticsearch is expecting the name of the aggregations("my-agg-name").
If you scroll down to the Return only aggregation results section in the documentation, you will see that the "size" parameter is placed outside of the aggregations request as shown below.
Screenshot from the documentation:
Place the "size" parameter outside of the aggregations request and set it equal to 0 as shown below.
Send the following request:
GET news_headlines/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": {
"field": "category"
}
}
}
}
Expected response from Elasticsearch:
As intended, Elasticsearch does not retrieve the top 10 hits (line 16).
You can see the aggregations results(an array of categories) without having to scroll through the hits.
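The placement rule behind this error can be stated simply: every key directly under "aggs" must introduce an aggregation object, so scalar parameters like "size" belong at the top level of the search body. A small Python check that mimics the parser's expectation (a hypothetical helper, for illustration only):

```python
def misplaced_scalars(aggs):
    """Return keys under "aggs" whose values are not objects.

    Elasticsearch expects every entry in "aggs" to open an
    aggregation definition (a START_OBJECT, in parser terms).
    """
    return [key for key, value in aggs.items() if not isinstance(value, dict)]

bad  = {"size": 0, "by_category": {"terms": {"field": "category"}}}
good = {"by_category": {"terms": {"field": "category"}}}

print(misplaced_scalars(bad))   # ['size']
print(misplaced_scalars(good))  # []
```

The `['size']` result is exactly the key Elasticsearch complained about: a VALUE_NUMBER where it expected an object.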
The next two errors (errors 8 & 9) are related to the requests we learned in the Part 4: Aggregations and Part 5: Mapping workshops.
During these workshops, we worked with an e-commerce dataset.
In Part 4, we added the e-commerce dataset to Elasticsearch and named the index `ecommerce_original_data`.
Then, we had to follow additional steps in the Set up data within Elasticsearch section in the Part 4 repo.
Screenshot from Part 4 Repo:
To set up data within Elasticsearch, we implemented the following steps. We never covered why we had to go through these steps: it was all because of the error message we are about to see next!
From this point on, imagine that you have just added the e-commerce dataset into the `ecommerce_original_data` index. We have not completed steps 1 and 2.
In Part 4, we learned how to group data into buckets based on a time interval. This type of aggregation request is called the `date_histogram` aggregation.
Suppose we wanted to group our data into 8-hour buckets and sent the request below:
GET ecommerce_original_data/_search
{
"size": 0,
"aggs": {
"transactions_by_8_hrs": {
"date_histogram": {
"field": "InvoiceDate",
"fixed_interval": "8h"
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.
If you look at the response, Elasticsearch lists the error type (line 5) as "illegal_argument_exception" and the reason (line 6) as "Field [InvoiceDate] of type [keyword] is not supported for aggregation [date_histogram]".
This error is different from the syntax error messages we have gone over thus far. It says that the field type "keyword" is not supported for the `date_histogram` aggregation, which suggests that this error may have something to do with the mapping.
Let's check the mapping of the `ecommerce_original_data` index:
GET ecommerce_original_data/_mapping
Expected response from Elasticsearch:
You will see that the field "InvoiceDate" is typed as "keyword".
Let's take a look at the screenshots from the Elastic documentation on date_histogram aggregations.
Screenshot from the documentation:
The first sentence gives us a valuable clue on why this error occurred!
The `date_histogram` aggregation cannot be performed on a field typed as "keyword".
To perform a `date_histogram` aggregation on the "InvoiceDate" field, the "InvoiceDate" field must be mapped as field type "date".
But a mapping for the field "InvoiceDate" already exists. What are we going to do?!
Remember, you cannot change the mapping of an existing field!
The only way you can accomplish this is to:
Step 1: Create a new index with the desired mapping
Step 2: Reindex data from the original index to the new one
Step 3: Send the `date_histogram` aggregation request to the new index
In Part 4, this is why we carried out steps 1 and 2!
Step 1: Create a new index (`ecommerce_data`) with the following mapping
PUT ecommerce_data
{
"mappings": {
"properties": {
"Country": {
"type": "keyword"
},
"CustomerID": {
"type": "long"
},
"Description": {
"type": "text"
},
"InvoiceDate": {
"type": "date",
"format": "M/d/yyyy H:m"
},
"InvoiceNo": {
"type": "keyword"
},
"Quantity": {
"type": "long"
},
"StockCode": {
"type": "keyword"
},
"UnitPrice": {
"type": "double"
}
}
}
}
Side note about the error associated with the `_meta` field
If you were following the steps from the Setting up data within Elasticsearch section from Part 4, you probably encountered the following error:
This was due to a typo in the request where I forgot to include an underscore before meta in line 4.
The `_meta` field is a space used to include any notes that you want as a reference. It can be tips about common bug fixes or info about your app that you want to include.
The `_meta` field is completely optional. For our use case, it is not necessary, so I have removed the `_meta` field from the Part 4 repo since this issue came to my attention.
Sincere apologies to anybody who has encountered that error while following along and thank you to @radhakrishnaakamat for catching the error!!
Side note about adding the format of the "InvoiceDate" field
Let's look at the date format of the field "InvoiceDate":
GET ecommerce_original_data/_search
Expected response from Elasticsearch:
The format of the "InvoiceDate" field is "M/d/yyyy H:m".
By default, Elasticsearch is configured to recognize the ISO 8601 date format (e.g. 2021-07-16T17:12:56.123Z).
If the date format in your dataset differs from the ISO 8601 format, Elasticsearch will not recognize it and will throw an error.
In order to prevent this from happening, we specify the date format of the "InvoiceDate" field("format": "M/d/yyyy H:m") within the mapping.
The symbols used in the date format were put together using this documentation.
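If you want to sanity-check a date pattern locally before writing it into a mapping, Python's `datetime.strptime` can play a similar role: "M/d/yyyy H:m" corresponds roughly to "%m/%d/%Y %H:%M" (the sample timestamp below is invented in the dataset's style):

```python
from datetime import datetime

# Java-style date symbols (used by Elasticsearch) vs Python's:
# M -> %m (month), d -> %d (day), yyyy -> %Y, H -> %H, m -> %M
sample = "12/1/2010 8:26"  # invented value in the dataset's style
parsed = datetime.strptime(sample, "%m/%d/%Y %H:%M")
print(parsed.isoformat())  # 2010-12-01T08:26:00
```

If `strptime` rejects a sample value, the mapping's "format" string would likely need adjusting too.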
We have covered a LOT! Let's do a recap on why we are carrying out these steps in the first place.
In Part 4, we added the e-commerce dataset to the `ecommerce_original_data` index, where the field "InvoiceDate" was dynamically typed as "keyword".
When we tried to run a `date_histogram` aggregation on the field "InvoiceDate", Elasticsearch threw an error saying that it can only perform the `date_histogram` aggregation on a field typed as "date".
Since we could not change the mapping of the existing field "InvoiceDate", we had to carry out step 1, where we created a new index called `ecommerce_data` with the desired mapping for the field "InvoiceDate".
Step 2: Reindex the data from the original index ("source") to the one you just created ("dest").
At this point, we have a new index called `ecommerce_data` with the desired mapping. However, there is no data in this index.
To correct that, we will send the following request to reindex the data from the `ecommerce_original_data` index to the `ecommerce_data` index:
POST _reindex
{
"source": {
"index": "ecommerce_original_data"
},
"dest": {
"index": "ecommerce_data"
}
}
Expected response from Elasticsearch:
Elasticsearch successfully reindexes the e-commerce dataset from the `ecommerce_original_data` index to the `ecommerce_data` index.
Step 3: Send the `date_histogram` aggregation request to the new index (`ecommerce_data`).
Now that the data has been reindexed to the new index, let's send the `date_histogram` aggregation request we sent earlier.
The following is almost identical to the original request, except that the index name has been changed to the new index (`ecommerce_data`).
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_by_8_hrs": {
"date_histogram": {
"field": "InvoiceDate",
"fixed_interval": "8h"
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 200-success response. It divides the dataset into 8-hour buckets and returns them in the response.
One of the cool things about Elasticsearch is that you can build any combination of aggregations to answer more complex questions.
For example, let's say we want to get the daily revenue and the number of unique customers per day.
This requires grouping data into daily buckets.
Within each bucket, we calculate the daily revenue and the number of unique customers per day.
Let's say we wrote the following request to accomplish this task:
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_day": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "day"
},
"daily_revenue": {
"sum": {
"script": {
"source": "doc['UnitPrice'].value * doc['Quantity'].value"
}
}
},
"number_of_unique_customers_per_day": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP error starts with a 4, meaning that there was a client error with the request sent.
This error is occurring because the structure of the aggregations request is incorrect.
In order to accomplish our goals, we first group data into daily buckets. Within each bucket, we calculate the daily revenue and the unique number of customers per day.
Therefore, our request contains an aggregation (pink brackets) within an aggregation (blue brackets).
The following demonstrates the correct aggregations request structure. Note the sub-aggregations ("aggs") object that encloses "daily_revenue" and "number_of_unique_customers_per_day":
GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_per_day": {
"date_histogram": {
"field": "InvoiceDate",
"calendar_interval": "day"
},
"aggs": {
"daily_revenue": {
"sum": {
"script": {
"source": "doc['UnitPrice'].value * doc['Quantity'].value"
}
}
},
"number_of_unique_customers_per_day": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
}
}
Expected response from Elasticsearch:
Elasticsearch returns a 200-success HTTP status.
It groups the dataset into daily buckets. Within each bucket, the number of unique customers per day as well as the daily revenue are calculated.
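The structural rule behind this fix: sub-aggregations live under an "aggs" key nested inside the parent aggregation, alongside the parent's own definition. Expressed as a Python dict mirroring the corrected request:

```python
daily = {
    "transactions_per_day": {
        # The parent aggregation's own definition...
        "date_histogram": {"field": "InvoiceDate", "calendar_interval": "day"},
        # ...and its sub-aggregations, run once per daily bucket.
        "aggs": {
            "daily_revenue": {"sum": {"script": {
                "source": "doc['UnitPrice'].value * doc['Quantity'].value"}}},
            "number_of_unique_customers_per_day": {
                "cardinality": {"field": "CustomerID"}},
        },
    }
}

sub_aggs = daily["transactions_per_day"]["aggs"]
print(sorted(sub_aggs))  # ['daily_revenue', 'number_of_unique_customers_per_day']
```

In the broken request, "daily_revenue" and "number_of_unique_customers_per_day" sat next to "date_histogram" without the nested "aggs" wrapper, so Elasticsearch treated them as siblings of the bucket aggregation rather than computations inside each bucket.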