[DOCS] Reformats bulk API. (elastic#47479)

* Reformats bulk API.
* Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
debadair · Oct 7, 2019 · de5a817
1 parent ffacfc6
commit de5a817
Showing 1 changed file with 180 additions and 118 deletions.
diff --git a/docs/reference/docs/bulk.asciidoc b/docs/reference/docs/bulk.asciidoc
@@ -1,28 +1,37 @@
 [[docs-bulk]]
 === Bulk API
+++++
+<titleabbrev>Bulk</titleabbrev>
+++++
 
-The bulk API makes it possible to perform many index/delete operations
-in a single API call. This can greatly increase the indexing speed.
+Performs multiple indexing or delete operations in a single API call. 
+This reduces overhead and can greatly increase indexing speed.
 
-.Client support for bulk requests
-*********************************************
-
-Some of the officially supported clients provide helpers to assist with
-bulk requests and reindexing of documents from one index to another:
+[source,console]
+--------------------------------------------------
+POST _bulk
+{ "index" : { "_index" : "test", "_id" : "1" } }
+{ "field1" : "value1" }
+{ "delete" : { "_index" : "test", "_id" : "2" } }
+{ "create" : { "_index" : "test", "_id" : "3" } }
+{ "field1" : "value3" }
+{ "update" : {"_id" : "1", "_index" : "test"} }
+{ "doc" : {"field2" : "value2"} }
+--------------------------------------------------
 
-Perl::
+[[docs-bulk-api-request]]
+==== {api-request-title}
 
-    See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
-    and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
+`POST /_bulk`
 
-Python::
+`POST /<index>/_bulk`
 
-    See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
+[[docs-bulk-api-desc]]
+==== {api-description-title}
 
-*********************************************
+Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
 
-The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON
-(NDJSON) structure:
+The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
 
 [source,js]
 --------------------------------------------------
@@ -36,19 +45,70 @@ optional_source\n
 --------------------------------------------------
 // NOTCONSOLE
 
-*NOTE*: The final line of data must end with a newline character `\n`. Each newline character
-may be preceded by a carriage return `\r`. When sending requests to this endpoint the
-`Content-Type` header should be set to `application/x-ndjson`.
+The `index` and `create` actions expect a source on the next line,
+and have the same semantics as the `op_type` parameter in the standard index API:
+`create` fails if a document with the same ID already exists in the index,
+`index` adds or replaces a document as necessary.
+
+`update` expects that the partial doc, upsert, 
+and script and its options are specified on the next line.
+
+`delete` does not expect a source on the next line and 
+has the same semantics as the standard delete API.
+
+[NOTE]
+====
+The final line of data must end with a newline character `\n`. 
+Each newline character may be preceded by a carriage return `\r`. 
+When sending requests to the `_bulk` endpoint,
+the `Content-Type` header should be set to `application/x-ndjson`.
+====
+
+Because this format uses literal `\n` characters as delimiters,
+make sure that the JSON actions and sources are not pretty printed.
+
+If you specify an index in the request URI, 
+it is used for any actions that don't explicitly specify an index.
+
+The format is designed to make processing as fast as possible.
+Because some of the actions are redirected to other
+shards on other nodes, only `action_meta_data` is parsed on the
+receiving node side.
+
+Client libraries using this protocol should try to do
+something similar on the client side and reduce buffering as much as
+possible.
+
+The response to a bulk action is a large JSON structure with 
+the individual results of each action performed, 
+in the same order as the actions that appeared in the request. 
+The failure of a single action does not affect the remaining actions.
+
+There is no "correct" number of actions to perform in a single bulk request. 
+Experiment with different settings to find the optimal size for your particular workload.
+
+When using the HTTP API, make sure that the client does not send HTTP chunks, 
+as this will slow things down.
+
+[float]
+[[bulk-clients]]
+===== Client support for bulk requests
+
+Some of the officially supported clients provide helpers to assist with
+bulk requests and reindexing of documents from one index to another:
+
+Perl::
+
+    See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
+    and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
+
+Python::
+
+    See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
 
-The possible actions are `index`, `create`, `delete`, and `update`.
-`index` and `create` expect a source on the next
-line, and have the same semantics as the `op_type` parameter to the
-standard index API (i.e. create will fail if a document with the same
-index exists already, whereas index will add or replace a
-document as necessary). `delete` does not expect a source on the
-following line, and has the same semantics as the standard delete API.
-`update` expects that the partial doc, upsert and script and its options
-are specified on the next line.
+[float]
+[[bulk-curl]]
+===== Submitting bulk requests with cURL
 
 If you're providing text file input to `curl`, you *must* use the
 `--data-binary` flag instead of plain `-d`. The latter doesn't preserve
@@ -65,9 +125,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --
 // NOTCONSOLE
 // Not converting to console because this shows how curl works
 
-Because this format uses literal `\n`'s as delimiters, please be sure
-that the JSON actions and sources are not pretty printed. Here is an
-example of a correct sequence of bulk commands:
+[float]
+[[bulk-optimistic-concurrency-control]]
+===== Optimistic Concurrency Control
+
+Each `index` and `delete` action within a bulk API call may include the
+`if_seq_no` and `if_primary_term` parameters in their respective action
+and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
+how operations are executed, based on the last modification to existing
+documents. See <<optimistic-concurrency-control>> for more details.
+
+
+[float]
+[[bulk-versioning]]
+===== Versioning
+
+Each bulk item can include the version value using the
+`version` field. It automatically follows the behavior of the
+index / delete operation based on the `_version` mapping. It also
+supports the `version_type` (see <<index-versioning, versioning>>).
+
+[float]
+[[bulk-routing]]
+===== Routing
+
+Each bulk item can include the routing value using the
+`routing` field. It automatically follows the behavior of the
+index / delete operation based on the `_routing` mapping.
+
+[float]
+[[bulk-wait-for-active-shards]]
+===== Wait For Active Shards
+
+When making bulk calls, you can set the `wait_for_active_shards`
+parameter to require a minimum number of shard copies to be active
+before starting to process the bulk request. See
+<<index-wait-for-active-shards,here>> for further details and a usage
+example.
+
+[float]
+[[bulk-refresh]]
+===== Refresh
+
+Control when the changes made by this request are visible to search. See
+<<docs-refresh,refresh>>.
+
+NOTE: Only the shards that receive the bulk request will be affected by
+`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
+documents in it that happen to be routed to different shards in an index
+with five shards. The request will only wait for those three shards to
+refresh. The other two shards that make up the index do not
+participate in the `_bulk` request at all.
+
+[float]
+[[bulk-security]]
+===== Security
+
+See <<url-access-control>>.
+
+[float]
+[[bulk-partial-responses]]
+===== Partial responses
+To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. 
+See <<shard-failures, Shard failures>> for more information.
+
+[[docs-bulk-api-path-params]]
+==== {api-path-parms-title}
+
+`<index>`::
+(Optional, string) Name of the index to perform the bulk actions against.
+
+[[docs-bulk-api-query-params]]
+==== {api-query-parms-title}
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
+
+include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
+
+[[docs-bulk-api-example]]
+==== {api-examples-title}
 
 [source,console]
 --------------------------------------------------
@@ -81,7 +229,7 @@ POST _bulk
 { "doc" : {"field2" : "value2"} }
 --------------------------------------------------
 
-The result of this bulk operation is:
+The API returns the following result:
 
 [source,console-result]
 --------------------------------------------------
@@ -171,85 +319,9 @@ The result of this bulk operation is:
 // TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
 // TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
 
-The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it
-will be used by default on bulk items that don't provide it explicitly.
-
-A note on the format. The idea here is to make processing of this as
-fast as possible. As some of the actions will be redirected to other
-shards on other nodes, only `action_meta_data` is parsed on the
-receiving node side.
-
-Client libraries using this protocol should try and strive to do
-something similar on the client side, and reduce buffering as much as
-possible.
-
-The response to a bulk action is a large JSON structure with the individual
-results of each action that was performed in the same order as the actions that
-appeared in the request. The failure of a single action does not affect the
-remaining actions.
-
-There is no "correct" number of actions to perform in a single bulk
-call. You should experiment with different settings to find the optimum
-size for your particular workload.
-
-If using the HTTP API, make sure that the client does not send HTTP
-chunks, as this will slow things down.
-
-[float]
-[[bulk-optimistic-concurrency-control]]
-==== Optimistic Concurrency Control
-
-Each `index` and `delete` action within a bulk API call may include the
-`if_seq_no` and `if_primary_term` parameters in their respective action
-and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
-how operations are executed, based on the last modification to existing
-documents. See <<optimistic-concurrency-control>> for more details.
-
-
-[float]
-[[bulk-versioning]]
-==== Versioning
-
-Each bulk item can include the version value using the
-`version` field. It automatically follows the behavior of the
-index / delete operation based on the `_version` mapping. It also
-support the `version_type` (see <<index-versioning, versioning>>).
-
-[float]
-[[bulk-routing]]
-==== Routing
-
-Each bulk item can include the routing value using the
-`routing` field. It automatically follows the behavior of the
-index / delete operation based on the `_routing` mapping.
-
-[float]
-[[bulk-wait-for-active-shards]]
-==== Wait For Active Shards
-
-When making bulk calls, you can set the `wait_for_active_shards`
-parameter to require a minimum number of shard copies to be active
-before starting to process the bulk request. See
-<<index-wait-for-active-shards,here>> for further details and a usage
-example.
-
-[float]
-[[bulk-refresh]]
-==== Refresh
-
-Control when the changes made by this request are visible to search. See
-<<docs-refresh,refresh>>.
-
-NOTE: Only the shards that receive the bulk request will be affected by
-`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
-documents in it that happen to be routed to different shards in an index
-with five shards. The request will only wait for those three shards to
-refresh. The other two shards that make up the index do not
-participate in the `_bulk` request at all.
-
 [float]
 [[bulk-update]]
-==== Update
+===== Bulk update example
 
 When using the `update` action, `retry_on_conflict` can be used as a field in
 the action itself (not in the extra payload line), to specify how many
@@ -276,13 +348,3 @@ POST _bulk
 --------------------------------------------------
 // TEST[continued]
 
-[float]
-[[bulk-security]]
-==== Security
-
-See <<url-access-control>>.
-
-[float]
-[[bulk-partial-responses]]
-==== Partial responses
-To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.
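
The NDJSON rules in the reformatted description (one action line, an optional source line, no pretty printing, a trailing `\n`) can be illustrated with a short Python sketch. This is not part of the commit or of any official client: `build_bulk_body` and `failed_items` are hypothetical helper names, and the `items` key and per-item `error` field are assumed from the response example that the diff elides.

```python
import json


def build_bulk_body(actions):
    """Serialize (action_meta, optional_source) pairs into an NDJSON body.

    Hypothetical helper for illustration; `actions` is a list of tuples
    where the second element is None for actions without a source line.
    """
    lines = []
    for meta, source in actions:
        # json.dumps without `indent` emits a single line (no pretty printing).
        lines.append(json.dumps(meta))
        if source is not None:  # `delete` does not expect a source line
            lines.append(json.dumps(source))
    # Every line, including the final one, must end with "\n".
    return "".join(line + "\n" for line in lines)


def failed_items(response):
    """Return (position, action, result) for each item carrying an error.

    Assumes the response holds an "items" list in request order, with one
    single-key object per action, as in the elided response example.
    """
    failures = []
    for position, item in enumerate(response.get("items", [])):
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result))
    return failures


# The same four actions as the console example in the diff.
actions = [
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_id": "1", "_index": "test"}}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(actions)
```

The resulting `body` string would be sent with the `Content-Type: application/x-ndjson` header. Because the failure of a single action does not affect the remaining actions, a caller would typically inspect each item rather than rely on the HTTP status alone.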