API: Add response filtering with filter_path parameter

This change adds a new "filter_path" parameter that can be used to filter and reduce the responses returned by the REST API of elasticsearch.

For example, to return only the number of shards that failed to be optimized:

```
curl -XPOST 'localhost:9200/beer/_optimize?filter_path=_shards.failed'
{"_shards":{"failed":0}}
```

It supports multiple filters (separated by a comma):

```
curl -XGET 'localhost:9200/_mapping?pretty&filter_path=*.mappings.*.properties.name,*.mappings.*.properties.title'
```

It also supports the YAML response format. Here it returns only the `_id` field of a newly indexed document:

```
curl -XPOST 'localhost:9200/library/book?filter_path=_id' -d '---hello:\n world: 1\n'
---
_id: "AU0j64-b-stVfkvus5-A"
```

It also supports wildcards. Here it returns only the host name of every node in the cluster:

```
curl -XGET 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.host*'
{"nodes":{"lvJHed8uQQu4brS-SXKsNA":{"host":"portable"}}}
```

And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:

```
curl 'http://localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
  "indices" : {
    "beer" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            },
            "_1" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    }
  }
}
```

Note that elasticsearch sometimes returns the raw value of a field directly, as with the _source field. To filter _source fields, consider combining the existing _source parameter (see Get API for more details) with the filter_path parameter, like this:

```
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book #2"}
    }, {
      "_source":{"title":"Book #1"}
    }, {
      "_source":{"title":"Book #3"}
    } ]
  }
}
```
tlrx · May 26, 2015 · ce63590
1 parent 543f572

Showing 31 changed files with 1,986 additions and 66 deletions.
diff --git a/docs/reference/api-conventions.asciidoc b/docs/reference/api-conventions.asciidoc
@@ -81,6 +81,113 @@ being consumed by a monitoring tool, rather than intended for human
 consumption.  The default for the `human` flag is
 `false`.
 
+[float]
+=== Response Filtering
+
+All REST APIs accept a `filter_path` parameter that can be used to reduce
+the response returned by elasticsearch. This parameter takes a
+comma-separated list of filters expressed with the dot notation:
+
+[source,sh]
+--------------------------------------------------
+curl -XGET 'localhost:9200/_search?pretty&filter_path=took,hits.hits._id,hits.hits._score'
+{
+  "took" : 3,
+  "hits" : {
+    "hits" : [
+      {
+        "_id" : "3640",
+        "_score" : 1.0
+      },
+      {
+        "_id" : "3642",
+        "_score" : 1.0
+      }
+    ]
+  }
+}
+--------------------------------------------------
+
+It also supports the `*` wildcard character to match any field or part
+of a field's name:
+
+[source,sh]
+--------------------------------------------------
+curl -XGET 'localhost:9200/_nodes/stats?filter_path=nodes.*.ho*'
+{
+  "nodes" : {
+    "lvJHed8uQQu4brS-SXKsNA" : {
+      "host" : "portable"
+    }
+  }
+}
+--------------------------------------------------
+
+And the `**` wildcard can be used to include fields without knowing the
+exact path of the field. For example, we can return the Lucene version
+of every segment with this request:
+
+[source,sh]
+--------------------------------------------------
+curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
+{
+  "indices" : {
+    "movies" : {
+      "shards" : {
+        "0" : [ {
+          "segments" : {
+            "_0" : {
+              "version" : "5.2.0"
+            }
+          }
+        } ],
+        "2" : [ {
+          "segments" : {
+            "_0" : {
+              "version" : "5.2.0"
+            }
+          }
+        } ]
+      }
+    },
+    "books" : {
+      "shards" : {
+        "0" : [ {
+          "segments" : {
+            "_0" : {
+              "version" : "5.2.0"
+            }
+          }
+        } ]
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+Note that elasticsearch sometimes returns the raw value of a field
+directly, as with the `_source` field. To filter `_source` fields,
+consider combining the existing `_source` parameter (see
+<<get-source-filtering,Get API>> for more details) with the `filter_path`
+parameter, like this:
+
+[source,sh]
+--------------------------------------------------
+curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
+{
+  "hits" : {
+    "hits" : [ {
+      "_source":{"title":"Book #2"}
+    }, {
+      "_source":{"title":"Book #1"}
+    }, {
+      "_source":{"title":"Book #3"}
+    } ]
+  }
+}
+--------------------------------------------------
+
+
 [float]
 === Flat Settings
 

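The matching rules described in the Response Filtering documentation can be summarized in a few lines of Java. This is a minimal sketch, not the Elasticsearch implementation, and it rests on stated assumptions: field paths are written in dot notation with array positions omitted (as in the `hits.hits._id` example), a filter that names an object keeps its whole subtree (as the `nodes.*.indices` test expects), and `**` spans one or more segments. `FilterPathSketch`, `parse`, `segmentMatches`, `matches` and `isKept` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of filter_path matching; not an Elasticsearch class.
public class FilterPathSketch {

    // Splits a filter_path value such as "took,hits.hits._id" into filters,
    // and each filter into its dot-separated segments. Blank entries are ignored.
    static List<String[]> parse(String filterPath) {
        List<String[]> filters = new ArrayList<>();
        for (String filter : filterPath.split(",")) {
            String trimmed = filter.trim();
            if (!trimmed.isEmpty()) filters.add(trimmed.split("\\."));
        }
        return filters;
    }

    // `*` inside a segment matches any run of characters within that one segment,
    // so "ho*" becomes the regex \Qho\E.*\Q\E and matches "host".
    static boolean segmentMatches(String pattern, String name) {
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // True when filter[fi..] selects path[pi..] or one of its enclosing objects.
    static boolean matches(String[] filter, int fi, String[] path, int pi) {
        if (fi == filter.length) return true;  // filter fully matched: keep this field and its subtree
        if (pi == path.length) return false;   // path ended before the filter did
        if (filter[fi].equals("**")) {
            // Assumption: `**` consumes path[pi], then either stops (next filter
            // segment) or keeps consuming further segments.
            return matches(filter, fi + 1, path, pi + 1) || matches(filter, fi, path, pi + 1);
        }
        return segmentMatches(filter[fi], path[pi]) && matches(filter, fi + 1, path, pi + 1);
    }

    // A field, given by its dot path, is kept if any of the filters matches it.
    static boolean isKept(String fieldPath, String filterPath) {
        String[] path = fieldPath.split("\\.");
        for (String[] filter : parse(filterPath)) {
            if (matches(filter, 0, path, 0)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isKept("hits.hits._id", "took,hits.hits._id,hits.hits._score")); // true
        System.out.println(isKept("hits.total", "took,hits.hits._id,hits.hits._score"));    // false
        System.out.println(isKept("nodes.lvJHed8uQQu4brS-SXKsNA.host", "nodes.*.ho*"));     // true
        System.out.println(isKept("indices.movies.shards.0.segments._0.version", "indices.**.version")); // true
    }
}
```

The backtracking in `matches` is exponential in the worst case but adequate for the short paths found in responses; field names that themselves contain dots are not handled by this sketch.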
diff --git a/rest-api-spec/api/nodes.stats.json b/rest-api-spec/api/nodes.stats.json
@@ -56,6 +56,10 @@
           "options" : ["node", "indices", "shards"],
           "default" : "node"
         },
+        "filter_path": {
+          "type" : "list",
+          "description" : "A comma-separated list of fields to include in the returned response"
+        },
         "types" : {
           "type" : "list",
           "description" : "A comma-separated list of document types for the `indexing` index metric"

diff --git a/rest-api-spec/api/search.json b/rest-api-spec/api/search.json
@@ -72,6 +72,10 @@
           "type" : "boolean",
           "description" : "Specify whether query terms should be lowercased"
         },
+        "filter_path": {
+          "type" : "list",
+          "description" : "A comma-separated list of fields to include in the returned response"
+        },
         "preference": {
           "type" : "string",
           "description" : "Specify the node or shard the operation should be performed on (default: random)"

diff --git a/rest-api-spec/test/nodes.stats/20_response_filtering.yaml b/rest-api-spec/test/nodes.stats/20_response_filtering.yaml
@@ -0,0 +1,154 @@
+---
+"Nodes Stats with response filtering":
+  - do:
+      cluster.state: {}
+
+  # Get master node id
+  - set: { master_node: master }
+
+  # Nodes Stats with no filtering
+  - do:
+      nodes.stats: {}
+
+  - is_true: cluster_name
+  - is_true: nodes
+  - is_true: nodes.$master.name
+  - is_true: nodes.$master.indices
+  - is_true: nodes.$master.indices.docs
+  - gte:     { nodes.$master.indices.docs.count: 0 }
+  - is_true: nodes.$master.indices.segments
+  - gte:     { nodes.$master.indices.segments.count: 0 }
+  - is_true: nodes.$master.jvm
+  - is_true: nodes.$master.jvm.threads
+  - gte:     { nodes.$master.jvm.threads.count: 0 }
+  - is_true: nodes.$master.jvm.buffer_pools.direct
+  - gte:     { nodes.$master.jvm.buffer_pools.direct.count: 0 }
+  - gte:     { nodes.$master.jvm.buffer_pools.direct.used_in_bytes: 0 }
+
+  # Nodes Stats with only "cluster_name" field
+  - do:
+      nodes.stats:
+        filter_path: cluster_name
+
+  - is_true:  cluster_name
+  - is_false: nodes
+  - is_false: nodes.$master.name
+  - is_false: nodes.$master.indices
+  - is_false: nodes.$master.jvm
+
+  # Nodes Stats with "nodes" field and sub-fields
+  - do:
+      nodes.stats:
+        filter_path: nodes.*
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_true:  nodes.$master.name
+  - is_true:  nodes.$master.indices
+  - is_true:  nodes.$master.indices.docs
+  - gte:      { nodes.$master.indices.docs.count: 0 }
+  - is_true:  nodes.$master.indices.segments
+  - gte:      { nodes.$master.indices.segments.count: 0 }
+  - is_true:  nodes.$master.jvm
+  - is_true:  nodes.$master.jvm.threads
+  - gte:      { nodes.$master.jvm.threads.count: 0 }
+  - is_true:  nodes.$master.jvm.buffer_pools.direct
+  - gte:      { nodes.$master.jvm.buffer_pools.direct.count: 0 }
+  - gte:      { nodes.$master.jvm.buffer_pools.direct.used_in_bytes: 0 }
+
+  # Nodes Stats with "nodes.*.indices" field and sub-fields
+  - do:
+      nodes.stats:
+        filter_path: nodes.*.indices
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_false: nodes.$master.name
+  - is_true:  nodes.$master.indices
+  - is_true:  nodes.$master.indices.docs
+  - gte:      { nodes.$master.indices.docs.count: 0 }
+  - is_true:  nodes.$master.indices.segments
+  - gte:      { nodes.$master.indices.segments.count: 0 }
+  - is_false: nodes.$master.jvm
+
+  # Nodes Stats with "nodes.*.name" and "nodes.*.indices.docs.count" fields
+  - do:
+      nodes.stats:
+        filter_path: [ "nodes.*.name", "nodes.*.indices.docs.count" ]
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_true:  nodes.$master.name
+  - is_true:  nodes.$master.indices
+  - is_true:  nodes.$master.indices.docs
+  - gte:      { nodes.$master.indices.docs.count: 0 }
+  - is_false: nodes.$master.indices.segments
+  - is_false: nodes.$master.jvm
+
+  # Nodes Stats with all "count" fields
+  - do:
+      nodes.stats:
+        filter_path: "nodes.**.count"
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_false: nodes.$master.name
+  - is_true:  nodes.$master.indices
+  - is_true:  nodes.$master.indices.docs
+  - gte:      { nodes.$master.indices.docs.count: 0 }
+  - is_true:  nodes.$master.indices.segments
+  - gte:      { nodes.$master.indices.segments.count: 0 }
+  - is_true:  nodes.$master.jvm
+  - is_true:  nodes.$master.jvm.threads
+  - gte:      { nodes.$master.jvm.threads.count: 0 }
+  - is_true:  nodes.$master.jvm.buffer_pools.direct
+  - gte:      { nodes.$master.jvm.buffer_pools.direct.count: 0 }
+  - is_false: nodes.$master.jvm.buffer_pools.direct.used_in_bytes
+
+  # Nodes Stats with all "count" fields in sub-fields of "jvm" field
+  - do:
+      nodes.stats:
+        filter_path: "nodes.**.jvm.**.count"
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_false: nodes.$master.name
+  - is_false: nodes.$master.indices
+  - is_false: nodes.$master.indices.docs.count
+  - is_false: nodes.$master.indices.segments.count
+  - is_true:  nodes.$master.jvm
+  - is_true:  nodes.$master.jvm.threads
+  - gte:      { nodes.$master.jvm.threads.count: 0 }
+  - is_true:  nodes.$master.jvm.buffer_pools.direct
+  - gte:      { nodes.$master.jvm.buffer_pools.direct.count: 0 }
+  - is_false: nodes.$master.jvm.buffer_pools.direct.used_in_bytes
+
+  # Nodes Stats with "nodes.*.fs.data" fields
+  - do:
+      nodes.stats:
+        filter_path: "nodes.*.fs.data"
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_false: nodes.$master.name
+  - is_false: nodes.$master.indices
+  - is_false: nodes.$master.jvm
+  - is_true:  nodes.$master.fs.data
+  - is_true:  nodes.$master.fs.data.0.path
+  - is_true:  nodes.$master.fs.data.0.type
+  - is_true:  nodes.$master.fs.data.0.total_in_bytes
+
+  # Nodes Stats with "nodes.*.fs.data.t*" fields
+  - do:
+      nodes.stats:
+        filter_path: "nodes.*.fs.data.t*"
+
+  - is_false: cluster_name
+  - is_true:  nodes
+  - is_false: nodes.$master.name
+  - is_false: nodes.$master.indices
+  - is_false: nodes.$master.jvm
+  - is_true:  nodes.$master.fs.data
+  - is_false: nodes.$master.fs.data.0.path
+  - is_true:  nodes.$master.fs.data.0.type
+  - is_true:  nodes.$master.fs.data.0.total_in_bytes
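
The last two cases above show that `fs.data`, a JSON array, does not consume a segment of the filter: `nodes.*.fs.data.t*` reaches `type` and `total_in_bytes` inside each array entry while dropping `path`. The following is a minimal sketch of that behaviour over a response already parsed into `Map`s and `List`s. It assumes each filter segment is an exact name or a `*` glob (no `**`); `TreeFilterSketch`, its methods and its sample data are made up for illustration and are not part of Elasticsearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: filters a parsed JSON tree of Maps and Lists.
public class TreeFilterSketch {

    // `*` inside a segment matches any run of characters within that segment.
    static boolean segmentMatches(String pattern, String name) {
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Returns a filtered copy of `node`, or null when nothing below it is selected.
    // Each element of `remaining` holds the filter segments still to be matched.
    static Object filter(Object node, List<List<String>> remaining) {
        for (List<String> r : remaining) {
            if (r.isEmpty()) return node;  // a filter is fully matched: keep the whole value
        }
        if (node instanceof List) {
            List<Object> kept = new ArrayList<>();
            for (Object element : (List<?>) node) {
                Object f = filter(element, remaining);  // same filters: no segment consumed
                if (f != null) kept.add(f);
            }
            return kept.isEmpty() ? null : kept;
        }
        if (node instanceof Map) {
            Map<String, Object> kept = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                List<List<String>> next = new ArrayList<>();
                for (List<String> r : remaining) {
                    if (segmentMatches(r.get(0), key)) next.add(r.subList(1, r.size()));
                }
                if (next.isEmpty()) continue;
                Object f = filter(e.getValue(), next);
                if (f != null) kept.put(key, f);
            }
            return kept.isEmpty() ? null : kept;
        }
        return null;  // a scalar reached before any filter was fully matched
    }

    // Splits "a.b,c.*" into [[a, b], [c, *]], ignoring blank entries.
    static List<List<String>> parse(String filterPath) {
        List<List<String>> filters = new ArrayList<>();
        for (String f : filterPath.split(",")) {
            if (!f.trim().isEmpty()) filters.add(List.of(f.trim().split("\\.")));
        }
        return filters;
    }

    // Made-up nodes stats excerpt shaped like the fs.data section tested above.
    static Map<String, Object> sampleStats() {
        Map<String, Object> disk = new LinkedHashMap<>();
        disk.put("path", "/tmp/data");
        disk.put("type", "ext4");
        disk.put("total_in_bytes", 1024L);
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("name", "node-1");
        node.put("fs", Map.of("data", List.of(disk)));
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("cluster_name", "test");
        root.put("nodes", Map.of("abc", node));
        return root;
    }

    public static void main(String[] args) {
        // prints {nodes={abc={fs={data=[{type=ext4, total_in_bytes=1024}]}}}}
        System.out.println(filter(sampleStats(), parse("nodes.*.fs.data.t*")));
    }
}
```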
diff --git a/rest-api-spec/test/search/70_response_filtering.yaml b/rest-api-spec/test/search/70_response_filtering.yaml
@@ -0,0 +1,87 @@
+---
+"Search with response filtering":
+  - do:
+      indices.create:
+          index:  test
+  - do:
+      index:
+          index:  test
+          type:   test
+          id:     1
+          body:   { foo: bar }
+
+  - do:
+      index:
+          index:  test
+          type:   test
+          id:     2
+          body:   { foo: bar }
+
+  - do:
+      indices.refresh:
+        index: [test]
+
+  - do:
+      search:
+        index: test
+        filter_path: "*"
+        body: "{ query: { match_all: {} } }"
+
+  - is_true: took
+  - is_true: _shards.total
+  - is_true: hits.total
+  - is_true: hits.hits.0._index
+  - is_true: hits.hits.0._type
+  - is_true: hits.hits.0._id
+  - is_true: hits.hits.1._index
+  - is_true: hits.hits.1._type
+  - is_true: hits.hits.1._id
+
+  - do:
+      search:
+        index: test
+        filter_path: "took"
+        body: "{ query: { match_all: {} } }"
+
+  - is_true:  took
+  - is_false: _shards.total
+  - is_false: hits.total
+  - is_false: hits.hits.0._index
+  - is_false: hits.hits.0._type
+  - is_false: hits.hits.0._id
+  - is_false: hits.hits.1._index
+  - is_false: hits.hits.1._type
+  - is_false: hits.hits.1._id
+
+  - do:
+      search:
+        index: test
+        filter_path: "_shards.*"
+        body: "{ query: { match_all: {} } }"
+
+  - is_false: took
+  - is_true:  _shards.total
+  - is_false: hits.total
+  - is_false: hits.hits.0._index
+  - is_false: hits.hits.0._type
+  - is_false: hits.hits.0._id
+  - is_false: hits.hits.1._index
+  - is_false: hits.hits.1._type
+  - is_false: hits.hits.1._id
+
+  - do:
+      search:
+        index: test
+        filter_path: [ "hits.**._i*", "**.total" ]
+        body: "{ query: { match_all: {} } }"
+
+  - is_false: took
+  - is_true:  _shards.total
+  - is_true:  hits.total
+  - is_true:  hits.hits.0._index
+  - is_false: hits.hits.0._type
+  - is_true:  hits.hits.0._id
+  - is_true:  hits.hits.1._index
+  - is_false: hits.hits.1._type
+  - is_true:  hits.hits.1._id
+
diff --git a/src/main/java/org/elasticsearch/common/xcontent/XContent.java b/src/main/java/org/elasticsearch/common/xcontent/XContent.java
@@ -40,6 +40,11 @@ public interface XContent {
      */
     XContentGenerator createGenerator(OutputStream os) throws IOException;
 
+    /**
+     * Creates a new generator using the provided output stream that only writes the fields matching the given filters.
+     */
+    XContentGenerator createGenerator(OutputStream os, String[] filters) throws IOException;
+
     /**
      * Creates a new generator using the provided writer.
      */
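
The new `createGenerator(OutputStream, String[])` overload hands the filters to the generator itself, which presumably means fields are filtered while the response is being written rather than by parsing and rewriting a finished response. The sketch below illustrates that idea under stated assumptions: `StreamingFilterSketch` and its `startObject`/`fieldName`/`value`/`endObject` methods are hypothetical stand-ins, not the `XContentGenerator` API, and it supports only exact dot-notation filters, nested objects and scalar values. The key point is that an enclosing object is written lazily, only once one of its members is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of filtering while writing: a tiny JSON writer that gets
 * its filters up front, drops values outside them, and writes an enclosing
 * object only once something inside it is kept. No arrays, wildcards or escaping.
 */
public class StreamingFilterSketch {

    private static final class Frame {
        final String name;   // field name that opened this object (null for the root)
        boolean opened;      // whether this object's "{" has been written
        boolean hasMembers;  // whether a member has been written inside it
        Frame(String name) { this.name = name; }
    }

    private final List<List<String>> filters = new ArrayList<>();
    private final List<Frame> stack = new ArrayList<>();  // root first
    private final StringBuilder out = new StringBuilder();
    private String pendingField;

    StreamingFilterSketch(String[] filters) {
        for (String f : filters) this.filters.add(Arrays.asList(f.split("\\.")));
    }

    void startObject() {
        stack.add(new Frame(pendingField));
        pendingField = null;
    }

    void fieldName(String name) { pendingField = name; }

    // Writes `v` (Strings quoted, anything else via String.valueOf) only if its path is selected.
    void value(Object v) {
        if (isKept(currentPath())) {
            openAncestors();
            writeMemberPrefix(stack.get(stack.size() - 1), pendingField);
            out.append(v instanceof String ? "\"" + v + "\"" : String.valueOf(v));
        }
        pendingField = null;
    }

    void endObject() {
        if (stack.remove(stack.size() - 1).opened) out.append('}');
    }

    // Dot path of the pending field, e.g. [_shards, failed].
    private List<String> currentPath() {
        List<String> path = new ArrayList<>();
        for (Frame f : stack) if (f.name != null) path.add(f.name);
        path.add(pendingField);
        return path;
    }

    // A filter naming a field, or one of its enclosing objects, selects that field.
    private boolean isKept(List<String> path) {
        for (List<String> f : filters) {
            if (f.size() <= path.size() && path.subList(0, f.size()).equals(f)) return true;
        }
        return false;
    }

    // Writes the "{" (and field name) of every enclosing object not written yet.
    private void openAncestors() {
        Frame parent = null;
        for (Frame f : stack) {
            if (!f.opened) {
                if (parent != null) writeMemberPrefix(parent, f.name);
                out.append('{');
                f.opened = true;
            }
            parent = f;
        }
    }

    private void writeMemberPrefix(Frame parent, String name) {
        if (parent.hasMembers) out.append(',');
        parent.hasMembers = true;
        out.append('"').append(name).append("\":");
    }

    @Override
    public String toString() {
        return out.length() == 0 ? "{}" : out.toString();
    }

    public static void main(String[] args) {
        StreamingFilterSketch g = new StreamingFilterSketch(new String[] {"_shards.failed"});
        g.startObject();
        g.fieldName("_shards"); g.startObject();
        g.fieldName("total");   g.value(10);
        g.fieldName("failed");  g.value(0);
        g.endObject();
        g.endObject();
        System.out.println(g);  // {"_shards":{"failed":0}}
    }
}
```

Writing parents lazily is what lets a generator drop whole branches without buffering the complete response; the trade-off is that a selected but empty object is never written in this sketch.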