Beginner's Crash Course to Elastic Stack Series

Part 6: Troubleshooting Beginner Level Elasticsearch Errors

Welcome to the Beginner's Crash Course to Elastic Stack!

This repo contains all resources shared during Part 6: Troubleshooting Beginner-Level Elasticsearch Errors!

Throughout the series, we have learned about CRUD operations, fine-tuning the relevance of your search, full-text search, aggregations, and mapping.

As you continue your journey with Elasticsearch, you will inevitably encounter some common errors associated with the topics we have covered in the series.

Learn how to troubleshoot these pesky errors so you can get unstuck!

Resources

Table of Contents: Beginner's Crash Course to Elastic Stack: This workshop is a part of the Beginner's Crash Course to Elastic Stack series. Check out this table of contents to access all the workshops in the series.

Free Elastic Cloud Trial

Instructions on how to access Elasticsearch and Kibana on Elastic Cloud

Instructions for downloading Elasticsearch and Kibana

Presentation slides

Video recording of the workshop

Mini Beginner's Crash Course to Elasticsearch & Kibana playlist

Do you prefer learning by watching shorter videos? Check out this playlist to watch short clips from the full-length Beginner's Crash Course workshops. Season 2 clips will be uploaded here in the future!

YouTube Playlist of the Beginner's Crash Course to Elastic Stack: Want to watch all the workshops in the series? Check out the YouTube playlist of the Beginner's Crash Course to Elastic Stack!

Season 2 Topics Survey: I want to make the content more digestible for all of you!

Starting with season 2, I will discontinue holding live hour-long workshops and start uploading short clips (10 minutes or less) to the YouTube Playlist of the Beginner's Crash Course to Elastic Stack.

This series is created for YOU! Please let me know what you would like to learn in season 2 by submitting your preference in the survey.

I will create content on most requested topics along with other helpful topics for beginners!

Want To Troubleshoot Your Errors? Follow The Clues!

Whenever you perform an action with Elasticsearch and Kibana, Elasticsearch responds with an HTTP status and a response body.

The request below asks Elasticsearch to index a document and assign it an id of 1.

image

The HTTP status of 201-success indicates that the document has been successfully created. The response body indicates that the document with an assigned id of 1 has been created in the beginners_crash_course index.

As we work with Elasticsearch, we will inevitably encounter error messages like the one below.

image

When this happens, the HTTP status and the response body will provide valuable clues about why the request failed!

Common Errors

Here are some common errors that you may encounter as you work with Elasticsearch.

Unable to connect

The cluster may be down or it may be a network issue. Check the network status and cluster health to identify the problem.

Connection unexpectedly closed

The node may have died or it may be a network issue. Retry your request.

5XX Errors

Errors with an HTTP status starting with 5 stem from internal server errors in Elasticsearch. When you see one of these errors, take a look at the Elasticsearch log to identify the problem.

4XX Errors

Errors with an HTTP status starting with 4 stem from client errors. When you see one of these errors, correct the request before retrying.

As beginners, we are still familiarizing ourselves with the rules and syntax required to communicate with Elasticsearch. The majority of the error messages we encounter are likely caused by mistakes we make while writing our requests (4XX errors).

To strengthen our understanding of the requests we have learned throughout the series, we will only focus on 4XX errors during this workshop.

Thought Process For Troubleshooting Errors

  1. What number does the HTTP status start with (4XX? 5XX?)?
  2. What does the response say? Always read the full message!
  3. Use the Elasticsearch documentation as your guide. Compare your request with the example from the documentation. Identify the mistake and make appropriate changes.
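
The first step of this thought process can be sketched in a few lines of Python. This is only an illustration of the status ranges described above; the function name `classify_status` and the hint strings are made up for this example and are not part of Elasticsearch or Kibana.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to the troubleshooting hint used in this workshop."""
    if 200 <= status_code < 300:
        return "success: no troubleshooting needed"
    if 400 <= status_code < 500:
        return "client error: correct the request before retrying"
    if 500 <= status_code < 600:
        return "server error: check the Elasticsearch log"
    return "other: read the full response body"


# Status codes that appear in this workshop, plus one 5XX code for contrast.
for code in (200, 201, 400, 404, 405, 500):
    print(code, "->", classify_status(code))
```

Whatever the first digit is, step 2 still applies: read the full response body before changing the request.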

At times, you will encounter error messages that are not very helpful. We will go over a couple of these and see how we can troubleshoot these types of errors.

Trip Down Memory Lane

Throughout the series, we learned how to send requests related to the following topics:

  1. CRUD operations
  2. Queries
  3. Aggregations
  4. Mapping

We will revisit each topic and troubleshoot common errors you may encounter as you explore each topic.

Errors Associated With CRUD Operations

Error 1: 404 No such index[x]

In Part 1: Intro to Elasticsearch and Kibana, we learned how to perform CRUD operations. Let's say we have sent the following request to retrieve a document with an id of 1 from the common_errors index.

Request sent:

GET common_errors/_doc/1

Expected response from Elasticsearch:

Elasticsearch returns a 404-error along with the cause of the error in the response body. The HTTP status starts with a 4, meaning that there was a client error with the request sent.

image

When you look at the response body, Elasticsearch lists the reason(line 6) as "no such index [common_errors]".

The two possible explanations for this error are:

  1. The index common_errors truly does not exist or was deleted
  2. We do not have the correct index name
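
If you suspect the second explanation (a misspelled index name), one way to check is to compare the name you typed against the names of the indices that do exist. The sketch below uses Python's standard `difflib` module. The list `existing_indices` and the misspelling `comon_errors` are hypothetical; in practice you would fill the list with the index names that exist in your own cluster.

```python
import difflib


def suggest_index_names(requested, existing):
    """Return existing index names that closely resemble the requested one."""
    return difflib.get_close_matches(requested, existing, n=3, cutoff=0.6)


# Hypothetical list of index names that exist in the cluster.
existing_indices = ["common_errors", "news_headlines"]

# A misspelled name returns the closest existing name as a suggestion.
print(suggest_index_names("comon_errors", existing_indices))

# A name that resembles nothing returns an empty list,
# which hints that the index truly does not exist.
print(suggest_index_names("xyz", existing_indices))
```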

Cause of Error 1

In our example, the cause of the error is quite clear! We have not created an index called common_errors and we were trying to retrieve a document from an index that does not exist.

Let's create an index called common_errors:

Syntax:

PUT Name-of-the-Index

Example:

PUT common_errors

Expected response from Elasticsearch:

Elasticsearch returns a 200-success HTTP status acknowledging that the index common_errors has been successfully created.

image

Error 2: 405 Incorrect HTTP method for uri, allowed: [x]

Now that we have created the index common_errors, let's index a document!

Suppose you have remembered that you could use the HTTP verb PUT to index a document and send the following request:

PUT common_errors/_doc
{
  "source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}

Expected response from Elasticsearch:

Elasticsearch returns a 405-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.

If you look at the response, Elasticsearch lists the reason as "Incorrect HTTP method for uri... and method: [PUT], allowed:[POST]".

image

Cause of Error 2

This error message suggests that we used the wrong HTTP verb to index this document.

You can use either PUT or POST HTTP verb to index a document. Each HTTP verb serves a different purpose and requires a different syntax.

We learned about the difference between the two verbs during Part 1: Intro to Elasticsearch and Kibana under the Index a document section.

When indexing a document, HTTP verb PUT or POST can be used.

The HTTP verb PUT is used when you want to assign a specific id to your document.

Syntax:

PUT name-of-the-Index/_doc/id-you-want-to-assign-to-this-document
{
  "field_name": "value"
}

Let's compare the syntax to the request we just sent:

PUT common_errors/_doc
{
  "source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}

You will see that our request uses the HTTP verb PUT but it does not include the document id we want to assign to this document.

If you add the id of the document to the request as seen below, you will see that the request is carried out without a hitch!

Correct example for PUT indexing request:

PUT common_errors/_doc/1
{
  "source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}

Expected response from Elasticsearch:

Elasticsearch returns a 201-success HTTP status acknowledging that document 1 has been successfully created.

image

The HTTP verb POST is used when you want Elasticsearch to autogenerate an id for the document.

If this is the option you wanted, you could fix the error by replacing the verb PUT with POST and leaving out the document id after the _doc endpoint.

Syntax:

POST Name-of-the-Index/_doc
{
  "field_name": "value"
}

Correct example for POST indexing request:

POST common_errors/_doc
{
  "source_of_error": "Using the wrong syntax for PUT or POST indexing request"
}

Expected response from Elasticsearch:

Elasticsearch returns a 201-success HTTP status and autogenerates an id(line 4) for the document that was indexed.

image
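
The rule for choosing between the two verbs can be summarized as a small Python sketch. The helper `indexing_request_line` is hypothetical and only builds the first line of the request in the style used in this workshop; it does not send anything to Elasticsearch.

```python
from typing import Optional


def indexing_request_line(index: str, doc_id: Optional[str] = None) -> str:
    """Build the first line of an indexing request in the style used in this workshop."""
    if doc_id is None:
        # No id supplied: POST lets Elasticsearch autogenerate one.
        return f"POST {index}/_doc"
    # An id is supplied: PUT assigns that id to the document.
    return f"PUT {index}/_doc/{doc_id}"


print(indexing_request_line("common_errors"))
print(indexing_request_line("common_errors", "1"))
```

Notice that the sketch never produces PUT without an id, which is the combination that triggered the 405 error.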

Error 3: 400 Unexpected Character: was expecting a comma to separate Object entries at [Source: ...] line: x

Suppose you wanted to update document 1 by adding the fields "error" and "solution" as seen in the example.

Example:

POST common_errors/_update/1
{
  "doc": {
    "error": "405 Method Not Allowed"
    "solution": "Look up the syntax of PUT and POST indexing requests and use the correct syntax."
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP error starts with a 4, meaning that there was a client error with the request sent.

image

Cause of Error 3

If you look at the response, Elasticsearch lists the error type(line 12) as "json_parse_exception" and the reason(line 13) as "...was expecting comma to separate Object entries at ... line: 4]".

Elasticsearch request bodies are written in JSON. In JSON, if you have multiple fields ("error" and "solution") in an object ("doc"), you must separate each field with a comma. The error message tells us that we need to add a comma between the fields "error" and "solution".

Add the comma as shown below and send the following request:

POST common_errors/_update/1
{
  "doc": {
    "error": "405 Method Not Allowed",
    "solution": "Look up the syntax of PUT and POST indexing requests and use the correct syntax."
  }
}

Expected response from Elasticsearch:

You will see that the document with an id of 1 has been successfully updated.

image
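
Because the request body is plain JSON, you can reproduce this kind of mistake outside of Elasticsearch with any JSON parser. The sketch below uses Python's standard `json` module on a shortened copy of the update body. The wording of Python's message differs from the one Elasticsearch returns, but both point to the same line. The names `BROKEN_BODY`, `FIXED_BODY`, and `first_json_error` are made up for this example.

```python
import json

# The update body from Error 3, with the comma missing after the "error" field.
BROKEN_BODY = """{
  "doc": {
    "error": "405 Method Not Allowed"
    "solution": "Look up the syntax of PUT and POST indexing requests."
  }
}"""

# The same body with the missing comma added.
FIXED_BODY = BROKEN_BODY.replace('Allowed"', 'Allowed",', 1)


def first_json_error(body):
    """Return (message, line) for the first JSON syntax error, or None if the body parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.lineno
    return None


print(first_json_error(BROKEN_BODY))
print(first_json_error(FIXED_BODY))
```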

Errors Associated With Sending Queries

In parts 2 and 3, we learned how to send queries about news headlines in our index.

As a prerequisite for these workshops, we added a news headlines dataset to an index named news_headlines.

We sent various queries to retrieve documents that match the criteria. Let's go over common errors you may encounter while working with these queries.

Error 4: 400 [x] query does not support [y]

Suppose you want to use the range query to pull up news headlines published within a specific date range.

You have sent the following request:

GET news_headlines/_search
{
  "query": {
    "range": {
      "date": 
        "gte": "2015-06-20",
        "lte": "2015-09-22"
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.

image

If you look at the response, Elasticsearch lists the error type(line 5) as "parsing_exception" and the reason(line 6) as "[range] query does not support [date]".

This error message is misleading as the range query should be able to retrieve documents that contain terms within a provided range. It should not matter that you have requested to run a range query against the field "date".

Let's check out the screenshots from the Elastic documentation on the range query to see what is going on.

Pay attention to the syntax of the range query line by line.

Screenshot from the documentation: image

Compare this syntax to the request we have sent earlier:

GET news_headlines/_search
{
  "query": {
    "range": {
      "date": 
        "gte": "2015-06-20",
        "lte": "2015-09-22"
    }
  }
}

Cause of Error 4

The culprit of this error is the range query syntax!

Our request is missing curly brackets around the inner fields("gte" and "lte") of the field "date".

Let's add the curly brackets as shown below and send the following request:

GET news_headlines/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "2015-06-20",
        "lte": "2015-09-22"
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 200-success status and retrieves news headlines that were published within the specified date range.

image
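
One way to avoid dropping a pair of curly brackets is to build the request body as nested data and let a JSON library write the brackets for you. The following is a minimal Python sketch; the helper name `range_query` is hypothetical and the function only builds the body, it does not send the request.

```python
import json


def range_query(field, gte, lte):
    """Build a search body whose range bounds are nested inside the field's own object."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


body = range_query("date", "2015-06-20", "2015-09-22")

# Printing with indentation makes each level of nesting easy to compare with the documentation.
print(json.dumps(body, indent=2))
```

Each level of nesting in the Python dictionary becomes one pair of curly brackets in the request body, so "gte" and "lte" always end up inside the object that belongs to "date".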

Error 5: 400 [x] malformed query, expected [END_OBJECT] but found [FIELD_NAME]

In Part 2, we learned about the multi_match query. This query allows you to search for the same search terms in multiple fields at one time.

Suppose you wanted to search for the phrase "party planning" in the fields headline and short_description as shown below:

GET news_headlines/_search
{
  "query": {
    "multi_match": {
      "query": "party planning",
      "fields": [
        "headline",
        "short_description"
      ]
    },
    "type": "phrase"
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that something isn't quite right with the request sent.

image

If you look at the response, Elasticsearch lists the error type(line 5) as "parsing_exception" and the reason(line 6) as "[multi_match] malformed query, expected [END_OBJECT] but found [FIELD_NAME]".

Cause of Error 5

This is a vague error message that does not really tell you what went wrong.

However, we do know that the error is coming from somewhere around line 10, which suggests that the error may have something to do with the "type" parameter(line 11).

When you check the opening and closing brackets from the outside in, you will realize that the "type" parameter is placed outside of the multi_match query.

Move the "type" parameter up a line and move the comma from line 10 to line 9 as shown below and send the request:

GET news_headlines/_search
{
  "query": {
    "multi_match": {
      "query": "party planning",
      "fields": [
        "headline",
        "short_description"
      ],
      "type": "phrase"
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 200-success(red box) response.

All hits contain the phrase "party planning" in either the field "headline" or "short_description" or both!

image
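
Checking the brackets from the outside in can also be done with a few lines of Python. The body in Error 5 is valid JSON, so a JSON parser accepts it; the problem is where the "type" parameter sits. The sketch below lists the keys directly under "query". The names `MISPLACED_TYPE` and `keys_under_query` are made up for this example.

```python
import json

# The search body from Error 5, with "type" placed outside of multi_match.
MISPLACED_TYPE = """{
  "query": {
    "multi_match": {
      "query": "party planning",
      "fields": ["headline", "short_description"]
    },
    "type": "phrase"
  }
}"""


def keys_under_query(body):
    """List the keys that sit directly under "query" in a search body."""
    return list(json.loads(body)["query"].keys())


# Seeing "type" next to "multi_match" shows that it is a sibling of the
# multi_match query instead of a parameter inside it.
print(keys_under_query(MISPLACED_TYPE))
```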

Error 6: 400 parsing_exception

When we search for something, we often ask a multi-faceted question. For example, you may want to retrieve entertainment news headlines published on "2018-04-12."

This question actually requires sending multiple queries in one request:

  1. A query that retrieves documents from the "ENTERTAINMENT" category
  2. A query that retrieves documents that were published on "2018-04-12"

Let's say you are most familiar with the match query so you write the following request:

GET news_headlines/_search
{
  "query": {
    "match": {
      "category": "ENTERTAINMENT",
      "date":"2018-04-12"
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that something is off with the query syntax.

image

If you look at the response, Elasticsearch lists the error type(line 5) as "parsing_exception" and the reason(line 6) as "[match] query doesn't support multiple fields, found [category] and [date]".

Cause of Error 6

Elasticsearch throws an error because a match query can search only one field. Our request tried to search two fields ("category" and "date") with a single match query.

Bool Query

In Part 3, we learned how to combine multiple queries into one request by using the bool query.

With the bool query, you can combine multiple queries into one request and you can further specify boolean clauses to narrow down your search results.

This query offers four clauses that you can choose from:

  1. must
  2. must_not
  3. should
  4. filter

You can mix and match any of these clauses to get the relevant search results you want.

In our use case, we have two queries:

  1. A query that retrieves documents from the "ENTERTAINMENT" category
  2. A query that retrieves documents that were published on "2018-04-12"

The news headlines we want could be filtered into a yes or no category:

Is the news headline from the "ENTERTAINMENT" category? yes or no

Was the news headline published on "2018-04-12"? yes or no

When documents could be filtered into either a yes or no category, we can use the filter clause and include two match queries within it:

GET news_headlines/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "category": "ENTERTAINMENT"
          }
        },
        {
          "match": {
            "date": "2018-04-12"
          }
        }
      ]
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 200-success HTTP status and shows the top 10 hits whose "category" field contains the value "ENTERTAINMENT" and the "date" field contains the value of "2018-04-12".

image
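
The pattern of one match query per field inside the filter clause can be sketched in Python as follows. The helper name `bool_filter` is hypothetical; it only builds the request body and assumes every criterion is a simple yes or no match on a single field.

```python
import json


def bool_filter(criteria):
    """Build a bool query with one match clause per field inside the filter clause."""
    clauses = [{"match": {field: value}} for field, value in criteria.items()]
    return {"query": {"bool": {"filter": clauses}}}


body = bool_filter({"category": "ENTERTAINMENT", "date": "2018-04-12"})
print(json.dumps(body, indent=2))
```

Each match query holds exactly one field, which is the rule that the request in Error 6 broke.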

Errors Associated With Aggregations and Mapping

Suppose you want to get the summary of categories that exist in our dataset. Since this requires summarizing your data, you decide to send the following aggregations request:

GET news_headlines/_search
{
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category"
      }
    }
  }
}

Expected response from Elasticsearch:

By default, Elasticsearch returns both top 10 search hits and aggregations results. Notice that the top 10 search hits take up lines 16-168.

image

Error 7: 400 Aggregation definition for [x], expected a [y].

Let's say you are only interested in the aggregations results.

You remember that you can add a "size" parameter and set it equal to 0 to avoid fetching the hits.

You send the following request to accomplish this task:

GET news_headlines/_search
{
  "aggs": {
    "size": 0,
    "by_category": {
      "terms": {
        "field": "category"
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.

image

If you look at the response, Elasticsearch lists the error type(line 5) as "parsing_exception" and the reason(line 6) as "Aggregation definition for [size starts with a [VALUE_NUMBER], expected a [START_OBJECT]".

Something is off with our aggregations request syntax. Let's take a look at the screenshots from the Elastic documentation on aggregations and see what we missed.

Pay close attention to the syntax of the aggregations request.

Screenshot from the documentation: image

Cause of Error 7

This error is occurring because the "size" parameter was placed in a spot where Elasticsearch is expecting the name of the aggregations("my-agg-name").

If you scroll down to the Return only aggregation results section in the documentation, you will see that the "size" parameter is placed outside of the aggregations request as shown below.

Screenshot from the documentation: image

Place the "size" parameter outside of the aggregations request and set it equal to 0 as shown below.

Send the following request:

GET news_headlines/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category"
      }
    }
  }
}

Expected response from Elasticsearch:

As intended, Elasticsearch does not retrieve the top 10 hits(line 16).

You can see the aggregations results(an array of categories) without having to scroll through the hits.

image
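
The clue in this error message is that every entry directly under "aggs" must be an object that defines an aggregation. A number such as 0 cannot be one. The Python sketch below looks for entries that break this rule. The function name `misplaced_agg_entries` is made up for this example, and the check is a simplified illustration, not how Elasticsearch itself validates the request.

```python
def misplaced_agg_entries(body):
    """Return the names under "aggs" whose value is not an object."""
    return [
        name
        for name, definition in body.get("aggs", {}).items()
        if not isinstance(definition, dict)
    ]


terms_agg = {"by_category": {"terms": {"field": "category"}}}

# "size" placed inside "aggs", as in the request that caused Error 7.
broken_request = {"aggs": {"size": 0, **terms_agg}}

# "size" placed next to "aggs", as in the corrected request.
fixed_request = {"size": 0, "aggs": terms_agg}

print(misplaced_agg_entries(broken_request))
print(misplaced_agg_entries(fixed_request))
```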

Error 8: 400 Field [x] of type [y] is not supported for z type of aggregation

The next two errors(error 8 & 9) are related to the requests we have learned in Part 4: Aggregations and Part 5: Mapping workshops.

During these workshops, we worked with an e-commerce dataset.

In Part 4, we added the e-commerce dataset to Elasticsearch and named the index ecommerce_original_data.

Then, we had to follow additional steps in the Set up data within Elasticsearch section of the Part 4 repo.

Screenshot from Part 4 Repo:

To set up data within Elasticsearch, we implemented the following steps: image

We never covered why we had to go through these steps. It was all because of the error message we are about to see next!

From this point on, imagine that you had just added the e-commerce dataset into the ecommerce_original_data index. We have not completed steps 1 and 2.

In Part 4, we learned how to group data into buckets based on time interval. This type of aggregation request is called the date_histogram aggregation.

Suppose we wanted to group our data into 8 hour buckets and have sent the request below:

GET ecommerce_original_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_by_8_hrs": {
      "date_histogram": {
        "field": "InvoiceDate",
        "fixed_interval": "8h"
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP status starts with a 4, meaning that there was a client error with the request sent.

image

If you look at the response, Elasticsearch lists the error type(line 5) as "illegal_argument_exception" and the reason(line 6) as "Field [InvoiceDate] of type [keyword] is not supported for aggregation [date_histogram]".

This error is different from syntax error messages we have gone over thus far. It says that the field type "keyword" is not supported for the date_histogram aggregation, which suggests that this error may have something to do with the mapping.

Let's check the mapping of the ecommerce_original_data index:

GET ecommerce_original_data/_mapping

Expected response from Elasticsearch:

You will see that the field "InvoiceDate" is typed as "keyword".

image

Cause of Error 8

Let's take a look at the screenshots from the Elastic documentation on date_histogram aggregations.

Screenshot from the documentation:

image

The first sentence gives us a valuable clue on why this error occurred!

The date_histogram aggregation cannot be performed on a field typed as "keyword".

To perform a date_histogram aggregation on the "InvoiceDate" field, the "InvoiceDate" field must be mapped as field type "date".

But the mapping for the field "InvoiceDate" already exists. What are we going to do?!

Remember, you cannot change the mapping of an existing field!

The only way you can accomplish this is to:

Step 1: Create a new index with the desired mapping

Step 2: Reindex data from the original index to the new one

Step 3: Send the date_histogram aggregation request to the new index

In Part 4, this is why we carried out steps 1 and 2!

Step 1: Create a new index(ecommerce_data) with the following mapping

PUT ecommerce_data
{
  "mappings": {
    "properties": {
      "Country": {
        "type": "keyword"
      },
      "CustomerID": {
        "type": "long"
      },
      "Description": {
        "type": "text"
      },
      "InvoiceDate": {
        "type": "date",
        "format": "M/d/yyyy H:m"
      },
      "InvoiceNo": {
        "type": "keyword"
      },
      "Quantity": {
        "type": "long"
      },
      "StockCode": {
        "type": "keyword"
      },
      "UnitPrice": {
        "type": "double"
      }
    }
  }
}

Side note about error associated with the _meta field

If you were following the steps from the Setting up data within Elasticsearch section of Part 4, you have probably encountered the following error:

image

This was due to a typo in the request where I forgot to include an underscore before meta in line 4.

image

The _meta field is a space used to include any notes that you want as a reference. It can be tips about common bug fixes or info about your app that you want to include.

The _meta field is completely optional. For our use case, it is not necessary, so I removed the _meta field from the Part 4 repo once this issue came to my attention.

Sincere apologies to anybody who has encountered that error while following along and thank you to @radhakrishnaakamat for catching the error!!

Side note about adding the format of the "InvoiceDate" field

Let's look at the date format of the field "InvoiceDate":

GET ecommerce_original_data/_search 

Expected response from Elasticsearch: image

The format of the InvoiceDate is "M/d/yyyy H:m".

By default, Elasticsearch is configured to recognize the ISO 8601 date format (e.g., 2021-07-16T17:12:56.123Z).

If the date format in your dataset differs from the ISO 8601 format, Elasticsearch will not recognize it and will throw an error.

In order to prevent this from happening, we specify the date format of the "InvoiceDate" field("format": "M/d/yyyy H:m") within the mapping.

The symbols used in the date format come from this documentation.
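
The same idea can be tried out in plain Python: a parser that expects ISO 8601 rejects a date written in another style until you tell it which format to use. This is only an analogy. Python's `strptime` directives are a different notation from the pattern letters Elasticsearch uses, and the value in `SAMPLE` is a hypothetical date written in the "M/d/yyyy H:m" style, not a value copied from the dataset.

```python
from datetime import datetime

# Hypothetical sample value written in the "M/d/yyyy H:m" style.
SAMPLE = "12/1/2010 8:26"


def parses_as_iso8601(value):
    """Return True if Python's ISO 8601 parser accepts the value."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


# Without an explicit format, the sample is not recognized as a date.
print(parses_as_iso8601(SAMPLE))

# With an explicit format (Python directives that roughly correspond to
# "M/d/yyyy H:m"), the same value parses without trouble.
parsed = datetime.strptime(SAMPLE, "%m/%d/%Y %H:%M")
print(parsed.isoformat())
```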

We have covered a LOT! Let's do a recap on why we are carrying out these steps in the first place.

In Part 4, we added the e-commerce dataset to the ecommerce_original_data index where the field "InvoiceDate" was dynamically typed as "keyword". image

When we tried to run a date_histogram aggregation on the field "InvoiceDate", Elasticsearch threw an error saying that it can only perform the date_histogram aggregation on a field typed as "date". image

Since we could not change the mapping of an existing field "InvoiceDate", we had to carry out step 1 where we created a new index called ecommerce_data with the desired mapping for the field "InvoiceDate". image

Step 2: Reindex the data from the original index("source") to the one you just created("dest").

At this point, we have a new index called ecommerce_data with the desired mapping. However, there is no data in this index.

To correct that, we will send the following request to reindex the data from the ecommerce_original_data index to the ecommerce_data index:

POST _reindex
{
  "source": {
    "index": "ecommerce_original_data"
  },
  "dest": {
    "index": "ecommerce_data"
  }
}

Expected response from Elasticsearch:

Elasticsearch successfully reindexes the e-commerce dataset from the ecommerce_original_data index to the ecommerce_data index.

image

Step 3: Send the date_histogram aggregations request to the new index(ecommerce_data).

Now that the data has been reindexed to the new index, let’s send the date_histogram aggregation request we sent earlier.

The following is almost identical to the original request except that the index name has been changed to the new index(ecommerce_data).

GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_by_8_hrs": {
      "date_histogram": {
        "field": "InvoiceDate",
        "fixed_interval": "8h"
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 200-success response. It divides the dataset into 8 hour buckets and returns them in the response.

image

Error 9: 400 Found two aggregation type definitions in [x]: y and z

One of the cool things about Elasticsearch is that you can build any combination of aggregations to answer more complex questions.

For example, let's say we want to get the daily revenue and the number of unique customers per day.

This requires grouping data into daily buckets. image

Within each bucket, we calculate the daily revenue and the number of unique customers per day. image

Let's say we wrote the following request to accomplish this task:

GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day"
      },
      "daily_revenue": {
        "sum": {
          "script": {
            "source": "doc['UnitPrice'].value * doc['Quantity'].value"
          }
        }
      },
      "number_of_unique_customers_per_day": {
        "cardinality": {
          "field": "CustomerID"
        }
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 400-error along with the cause of the error in the response body. This HTTP error starts with a 4, meaning that there was a client error with the request sent. image

Cause of Error 9

This error is occurring because the structure of the aggregations request is incorrect.

In order to accomplish our goals, we first group data into daily buckets. Within each bucket, we calculate the daily revenue and the unique number of customers per day.

Therefore, our request contains an aggregation(pink brackets) within an aggregation(blue brackets). image

The following demonstrates the correct aggregations request structure. Note the nested "aggs" object that encloses the "daily_revenue" and the "number_of_unique_customers_per_day" sub-aggregations:

GET ecommerce_data/_search
{
  "size": 0,
  "aggs": {
    "transactions_per_day": {
      "date_histogram": {
        "field": "InvoiceDate",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "script": {
              "source": "doc['UnitPrice'].value * doc['Quantity'].value"
            }
          }
        },
        "number_of_unique_customers_per_day": {
          "cardinality": {
            "field": "CustomerID"
          }
        }
      }
    }
  }
}

Expected response from Elasticsearch:

Elasticsearch returns a 200-success HTTP status.

It groups the dataset into daily buckets. Within each bucket, the number of unique customers per day as well as the daily revenue are calculated.

image
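
To see why the first request failed, it helps to list what sits directly inside the named aggregation "transactions_per_day". In the Python sketch below, every key other than a nested "aggs" is treated as an aggregation type, so more than one such key means the structure is wrong. The function name `aggregation_types` and the set `NON_TYPE_KEYS` are made up for this example, and the check is a simplified approximation, not the actual Elasticsearch implementation.

```python
# Keys that may sit next to an aggregation type without being one themselves.
NON_TYPE_KEYS = {"aggs", "aggregations", "meta"}


def aggregation_types(definition):
    """Return the keys of one named aggregation that would be read as aggregation types."""
    return [key for key in definition if key not in NON_TYPE_KEYS]


date_histogram = {"field": "InvoiceDate", "calendar_interval": "day"}
sub_aggs = {
    "daily_revenue": {
        "sum": {"script": {"source": "doc['UnitPrice'].value * doc['Quantity'].value"}}
    },
    "number_of_unique_customers_per_day": {"cardinality": {"field": "CustomerID"}},
}

# Sub-aggregations placed as siblings of date_histogram (the request that failed).
broken = {"date_histogram": date_histogram, **sub_aggs}

# Sub-aggregations enclosed in a nested "aggs" object (the corrected request).
fixed = {"date_histogram": date_histogram, "aggs": sub_aggs}

print(aggregation_types(broken))
print(aggregation_types(fixed))
```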