From f781547908892d4baa595437c52be7ef61ec7c28 Mon Sep 17 00:00:00 2001 From: Medcl Date: Sat, 22 Oct 2016 23:17:25 +0800 Subject: [PATCH] Revert "chapter05_part2:/050_Search/10_Multi_index_multi_type.asciidoc" --- 050_Search/05_Empty_search.asciidoc | 54 +++++++++--- 050_Search/10_Multi_index_multi_type.asciidoc | 40 +++++---- 050_Search/15_Pagination.asciidoc | 41 +++++++--- 050_Search/20_Query_string.asciidoc | 82 +++++++++++++------ 4 files changed, 155 insertions(+), 62 deletions(-) diff --git a/050_Search/05_Empty_search.asciidoc b/050_Search/05_Empty_search.asciidoc index a12f507d1..25cb69a86 100644 --- a/050_Search/05_Empty_search.asciidoc +++ b/050_Search/05_Empty_search.asciidoc @@ -1,14 +1,17 @@ [[empty-search]] === The Empty Search -搜索API的最基础的形式是没有指定任何查询的空搜索,它简单地返回集群中所有目录中的所有文档: +The most basic form of the((("searching", "empty search")))((("empty search"))) search API is the _empty search_, which doesn't +specify any query but simply returns all documents in all indices in the +cluster: [source,js] -------------------------------------------------- GET /_search -------------------------------------------------- +// SENSE: 050_Search/05_Empty_search.json -返回的结果(为了解决编辑过的)像这种这样子: +The response (edited for brevity) looks something like this: [source,js] -------------------------------------------------- @@ -45,39 +48,66 @@ GET /_search ==== hits -返回结果中最重的部分是 `hits` ,它包含与我们查询相匹配的文档总数 `total` ,并且一个 `hits` 数组包含所查询结果的前十个文档。 +The most important section of the response is `hits`, which((("searching", "empty search", "hits")))((("hits"))) contains the +`total` number of documents that matched our query, and a `hits` array +containing the first 10 of those matching documents--the results. -在 `hits` 数组中每个结果包含文档的 `_index` 、 `_type` 、 `_id` ,加上 `_source` 字段。这意味着我们可以直接从返回的搜索结果中使用整个文档。这不像其他的搜索引擎,仅仅返回文档的ID,获取对应的文档需要在单独的步骤。 +Each result in the `hits` array contains the `_index`, `_type`, and `_id` of +the document, plus the `_source` field. This means that the whole document is +immediately available to us directly from the search results. This is unlike +other search engines, which return just the document ID, requiring you to fetch +the document itself in a separate step. -每个结果还有一个 `_score` ,这是衡量文档与查询匹配度的关联性分数。默认情况下,首先返回最相关的文档结果,就是说,返回的文档是按照 `_score` 降序排列的。在这个例子中,我们没有指定任何查询,故所有的文档具有相同的相关性,因此对所有的结果而言 `1` 是中性的 `_score` 。 +Each element also ((("score", "for empty search")))((("relevance scores")))has a `_score`. This is the _relevance score_, which is a +measure of how well the document matches the query. By default, results are +returned with the most relevant documents first; that is, in descending order +of `_score`. In this case, we didn't specify any query, so all documents are +equally relevant, hence the neutral `_score` of `1` for all results. -`max_score` 值是与查询所匹配文档的最高 `_score` 。 +The `max_score` value is the highest `_score` of any document that matches our +query.((("max_score value"))) ==== took -`took` 值告诉我们执行整个搜索请求耗费了多少毫秒。 +The `took` value((("took value (empty search)"))) tells us how many milliseconds the entire search request took +to execute. 
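The full response body is unchanged context and is not reproduced in this hunk. As a rough sketch only, with placeholder values rather than values copied from the patch, the fields discussed here (`hits`, `total`, `max_score`, `_score`, `took`) and in the sections that follow (`_shards`, `timed_out`) sit in the response roughly like this:

[source,js]
--------------------------------------------------
{
   "hits" : {
      "total" :      14,
      "max_score" :  1,
      "hits" : [
         {
            "_index":  "us",
            "_type":   "tweet",
            "_id":     "7",
            "_score":  1,
            "_source": { ... }
         },
         ... nine more results ...
      ]
   },
   "took" :       4,
   "_shards" : {
      "total" :      10,
      "successful" : 10,
      "failed" :     0
   },
   "timed_out" :  false
}
--------------------------------------------------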
==== shards -`_shards` 部分告诉我们在查询中参与分片的总数,以及这些分片成功了多少个失败了多少个。正常情况下我们不希望分片失败,但是分片失败是可能发生的。如果我们遭遇到一种较常见的灾难,在这个灾难中丢失了相同分片的原始数据和副本,那么对这个分片将没有可用副本来对搜索请求作出响应。假若这样,Elasticsearch 将报告这个分片是失败的,但是会继续返回剩余分片的结果。 +The `_shards` element((("shards", "number involved in an empty search"))) tells us the `total` number of shards that were involved +in the query and,((("failed shards (in a search)")))((("successful shards (in a search)"))) of them, how many were `successful` and how many `failed`. +We wouldn't normally expect shards to fail, but it can happen. If we were to +suffer a major disaster in which we lost both the primary and the replica copy +of the same shard, there would be no copies of that shard available to respond +to search requests. In this case, Elasticsearch would report the shard as +`failed`, but continue to return results from the remaining shards. ==== timeout -`timed_out` 值告诉我们查询是否超时。默认情况下,搜索请求不会超时。如果低响应时间比完成结果更重要,你可以指定 `timeout` 为10或者10ms(10毫秒),或者1s(1秒): +The `timed_out` value tells((("timed_out value in search results"))) us whether the query timed out. By +default, search requests do not time out.((("timeout parameter", "specifying in a request"))) If low response times are more +important to you than complete results, you can specify a `timeout` as `10` +or `10ms` (10 milliseconds), or `1s` (1 second): [source,js] -------------------------------------------------- GET /_search?timeout=10ms -------------------------------------------------- -在请求超时之前,Elasticsearch 将返回从每个分片聚集来的结果。 + +Elasticsearch will return any results that it has managed to gather from +each shard before the requests timed out. [WARNING] ================================================ -应当注意的是 `timeout` 不是停止执行查询,它仅仅是告知正在协调的节点返回到目前为止收集的结果并且关闭连接。在后台,其他的分片可能仍在执行查询即使是结果已经被发送了。 +It should be noted that this `timeout` does not((("timeout parameter", "not halting query execution"))) halt the execution of the +query; it merely tells the coordinating node to return the results collected +_so far_ and to close the connection. In the background, other shards may +still be processing the query even though results have been sent. -使用超时是因为对你的SLA是重要的,不是因为想去中止长时间运行的查询。 +Use the time-out because it is important to your SLA, not because you want +to abort the execution of long-running queries. ================================================ diff --git a/050_Search/10_Multi_index_multi_type.asciidoc b/050_Search/10_Multi_index_multi_type.asciidoc index 77fff07c4..d865bff0d 100644 --- a/050_Search/10_Multi_index_multi_type.asciidoc +++ b/050_Search/10_Multi_index_multi_type.asciidoc @@ -1,42 +1,54 @@ [[multi-index-multi-type]] -=== 多索引,多类型 +=== Multi-index, Multitype -你有没有注意到之前的 <> 的结果,不同类型的文档((("searching", "multi-index, multi-type search")))— `user` 和 `tweet` 来自不同的索引— `us` 和 `gb` ? +Did you notice that the results from the preceding <> +contained documents ((("searching", "multi-index, multi-type search")))of different types—`user` and `tweet`—from two +different indices—`us` and `gb`? -如果不对某一特殊的索引或者类型做限制,就会搜索集群中的所有文档。Elasticsearch 转发搜索请求到每一个主分片或者副本分片,汇集查询出的前10个结果,并且返回给我们。 +By not limiting our search to a particular index or type, we have searched +across _all_ documents in the cluster. Elasticsearch forwarded the search +request in parallel to a primary or replica of every shard in the cluster, +gathered the results to select the overall top 10, and returned them to us. 
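As an illustration (the values here are representative, not hits copied from the patch), two hits returned by such a cluster-wide search reveal where they came from through their `_index` and `_type` fields:

[source,js]
--------------------------------------------------
{
   "hits" : {
      "hits" : [
         { "_index": "us", "_type": "tweet", "_id": "7", "_score": 1, "_source": { ... } },
         { "_index": "gb", "_type": "user",  "_id": "1", "_score": 1, "_source": { ... } }
      ]
   }
}
--------------------------------------------------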
-然而,经常的情况下,你((("types", "specifying in search requests")))(((" indices", "specifying in search requests")))想在一个或多个特殊的索引并且在一个或者多个特殊的类型中进行搜索。我们可以通过在URL中指定特殊的索引和类型达到这种效果,如下所示: +Usually, however, you will((("types", "specifying in search requests")))((("indices", "specifying in search requests"))) want to search within one or more specific indices, +and probably one or more specific types. We can do this by specifying the +index and type in the URL, as follows: `/_search`:: - 在所有的索引中搜索所有的类型 + Search all types in all indices `/gb/_search`:: - 在 `gb` 索引中搜索所有的类型 + Search all types in the `gb` index `/gb,us/_search`:: - 在 `gb` 和 `us` 索引中搜索所有的文档 + Search all types in the `gb` and `us` indices `/g*,u*/_search`:: - 在任何以 `g` 或者 `u` 开头的索引中搜索所有的类型 + Search all types in any indices beginning with `g` or beginning with `u` `/gb/user/_search`:: - 在 `gb` 索引中搜索 `user` 类型 + Search type `user` in the `gb` index `/gb,us/user,tweet/_search`:: - 在 `gb` 和 `us` 索引中搜索 `user` 和 `tweet` 类型 + Search types `user` and `tweet` in the `gb` and `us` indices `/_all/user,tweet/_search`:: - 在所有的索引中搜索 `user` 和 `tweet` 类型 + Search types `user` and `tweet` in all indices -当在单一的索引下进行搜索的时候,Elasticsearch 转发请求到索引的每个分片中,可以是主分片也可以是副本分片,然后从每个分片中收集结果。多索引搜索恰好也是用相同的方式工作的--只是会涉及到更多的分片。 +When you search within a single index, Elasticsearch forwards the search +request to a primary or replica of every shard in that index, and then gathers the +results from each shard. Searching within multiple indices works in exactly +the same way--there are just more shards involved. [TIP] ================================================ -搜索一个索引有五个主分片和搜索五个索引各有一个分片准确来所说是等价的。 +Searching one index that has five primary shards is _exactly equivalent_ to +searching five indices that have one primary shard each. ================================================ -接下来,你将明白这种简单的方式如何弹性的把请求的变化变得简单化。 +Later, you will see how this simple fact makes it easy to scale flexibly +as your requirements change. diff --git a/050_Search/15_Pagination.asciidoc b/050_Search/15_Pagination.asciidoc index 8a8511bae..6123cf73b 100644 --- a/050_Search/15_Pagination.asciidoc +++ b/050_Search/15_Pagination.asciidoc @@ -1,17 +1,21 @@ [[pagination]] -=== 分页 +=== Pagination -在之前的 <> 中知道集群中有14个文档匹配了我们(empty)query。但是在 `hits` 数组中只有10个文档,怎么样我们才能看到其他的文档呢? +Our preceding <> told us that 14 documents in the((("pagination"))) +cluster match our (empty) query. But there were only 10 documents in +the `hits` array. How can we see the other documents? -像SQL使用 `LIMIT` 关键字返回单页的结果一样,Elasticsearch 有 `from` 和 `size` 参数: +In the same way as SQL uses the `LIMIT` keyword to return a single ``page'' of +results, Elasticsearch accepts ((("from parameter")))((("size parameter")))the `from` and `size` parameters: `size`:: - 显示应该返回的结果数量,默认是 `10` + Indicates the number of results that should be returned, defaults to `10` `from`:: - 显示应该跳过的初始结果数量,默认是 `0` + Indicates the number of initial results that should be skipped, defaults to `0` -如果想每页展示五条结果,可以用下面三种方式请求: +If you wanted to show five results per page, then pages 1 to 3 +could be requested as follows: [source,js] -------------------------------------------------- @@ -22,17 +26,30 @@ GET /_search?size=5&from=10 // SENSE: 050_Search/15_Pagination.json -考虑到分页太深或者请求太多结果的情况,在返回结果之前可以对结果排序。但是请记住一个请求经常跨越多个分片,每个分片都产生自己的排序结果,这些结果需要进行集中排序以保证全部的次序是正确的。 +Beware of paging too deep or requesting too many results at once. Results are +sorted before being returned. But remember that a search request usually spans +multiple shards. 
Each shard generates its own sorted results, which then need +to be sorted centrally to ensure that the overall order is correct. -.在分布式系统中深度分页 +.Deep Paging in Distributed Systems **** -理解问什么深度分页是有问题的,我们可以想象搜索有五个主分片的单一索引。当我们请求结果的第一页(结果从1到10),每一个分片产生前10的结果,并且返回给起协调作用的节点,起协调作用的节点在对50个结果排序得到全部结果的前10个。 +To understand why ((("deep paging, problems with")))deep paging is problematic, let's imagine that we are +searching within a single index with five primary shards. When we request the +first page of results (results 1 to 10), each shard produces its own top 10 +results and returns them to the _coordinating node_, which then sorts all 50 +results in order to select the overall top 10. -现在想象我们请求第1000页--结果从10001到10010。所有都以相同的方式工作除了每个分片不得不产生前10010个结果以外。然后起协调作用的节点对全部50050个结果排序最后丢弃掉这些结果中的50040个结果。 +Now imagine that we ask for page 1,000--results 10,001 to 10,010. Everything +works in the same way except that each shard has to produce its top 10,010 +results. The coordinating node then sorts through all 50,050 results and +discards 50,040 of them! -看得出来,在分布式系统中,对结果排序的成本随分页的深度成指数上升。这就是为什么每次查询不要返回超过1000个结果的一个好理由。 +You can see that, in a distributed system, the cost of sorting results +grows exponentially the deeper we page. There is a good reason +that web search engines don't return more than 1,000 results for any query. **** -TIP: 在 <> 中我们解释了如何有效的获取大量的文档。 +TIP: In <> we explain how you _can_ retrieve large numbers of +documents efficiently. diff --git a/050_Search/20_Query_string.asciidoc b/050_Search/20_Query_string.asciidoc index 813de7a1d..f4340dab8 100644 --- a/050_Search/20_Query_string.asciidoc +++ b/050_Search/20_Query_string.asciidoc @@ -1,9 +1,14 @@ [[search-lite]] === Search _Lite_ -有两种搜索API的形式:一种精简查询-字符串版本在查询字符串中传递所有的参数,另一种功能全面的_request body_版本使用JSON格式并且使用一种名叫查询DSL的丰富搜索语言。 +There are two forms of the `search` API: a ``lite'' _query-string_ version +that expects all its((("searching", "query string searches")))((("query strings", "searching with"))) parameters to be passed in the query string, and the full +_request body_ version that expects a JSON request body and uses a +rich search language called the query DSL. -在命令行中查询-字符串搜索对运行特殊的查询是有益的。例如,查询在 `tweet` 类型中 `tweet` 字段包含 `elasticsearch` 单词的所有文档: +The query-string search is useful for running ad hoc queries from the +command line. For instance, this query finds all documents of type `tweet` that +contain the word `elasticsearch` in the `tweet` field: [source,js] -------------------------------------------------- @@ -11,11 +16,13 @@ GET /_all/tweet/_search?q=tweet:elasticsearch -------------------------------------------------- // SENSE: 050_Search/20_Query_string.json -下一个查询在 `name` 字段中包含 `john` 并且在 `tweet` 字段中包含 `mary` 的文档。实际的查询就是这样 +The next query looks for `john` in the `name` field and `mary` in the +`tweet` field. The actual query is just +name:john +tweet:mary -但是查询-字符串参数所需要的百分比编码让它比实际上的更含义模糊: +but the _percent encoding_ needed for query-string parameters makes it appear +more cryptic than it really is: [source,js] -------------------------------------------------- @@ -24,12 +31,15 @@ GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary // SENSE: 050_Search/20_Query_string.json -`+` 前缀表示必须与查询条件匹配。类似地, `-` 前缀表示一定不与查询条件匹配。没有 `+` 或者 `-` 的所有条件是可选的--匹配的越多,文档就越相关。 +The `+` prefix indicates conditions that _must_ be satisfied for our query to +match. Similarly a `-` prefix would indicate conditions that _must not_ +match. All conditions without a `+` or `-` are optional--the more that match, +the more relevant the document. 
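For example (a hypothetical variation on the previous query, not one that appears in this patch), requiring `john` in the `name` field while excluding any tweet that mentions `mary` would be written as `+name:john -tweet:mary`, which percent-encodes to:

[source,js]
--------------------------------------------------
GET /_search?q=%2Bname%3Ajohn+-tweet%3Amary
--------------------------------------------------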
[[all-field-intro]] ==== The _all Field -这个简单搜索返回包含 `mary` 的所有文档: +This simple search returns all documents that contain the word `mary`: [source,js] -------------------------------------------------- @@ -38,15 +48,19 @@ GET /_search?q=mary // SENSE: 050_Search/20_All_field.json -之前的例子中,我们在 `tweet` 和 `name` 字段中搜索内容。然而,这个查询的结果在三个地方提到了 `mary` : +In the previous examples, we searched for words in the `tweet` or +`name` fields. However, the results from this query mention `mary` in +three fields: * A user whose name is Mary * Six tweets by Mary * One tweet directed at @mary -Elasticsearch 是如何在三个不同的区域中查找到结果的呢? +How has Elasticsearch managed to find results in three different fields? -当你索引一个文档的时候,Elasticsearch 取出所有字段的值拼接成一个大的字符串,作为 `_all` 字段进行索引。例如,当我们索引这个文档时: +When you index a document, Elasticsearch takes the string values of all of +its fields and concatenates them into one big string, which it indexes as +the special `_all` field.((("_all field", sortas="all field"))) For example, when we index this document: [source,js] -------------------------------------------------- @@ -59,7 +73,7 @@ Elasticsearch 是如何在三个不同的区域中查找到结果的呢? -------------------------------------------------- -这就好似增加了一个名叫 `_all` 的额外字段: +it's as if we had added an extra field called `_all` with this value: [source,js] -------------------------------------------------- @@ -67,19 +81,24 @@ Elasticsearch 是如何在三个不同的区域中查找到结果的呢? -------------------------------------------------- -除非字段已经被指定,否则就使用 `_all` 字段进行搜索。 +The query-string search uses the `_all` field unless another +field name has been specified. -TIP: 在你刚开始使用 Elasticsearch 的时候, `_all` 字段是一个很实用的特征。之后,你会发现如果你在搜索的时候用指定字段来代替 `_all` 字段,对搜索出来的结果将有更好的控制。当 `_all` 字段对你不再有用的时候,你可以将它置为失效,向在 <> 中解释的。 +TIP: The `_all` field is a useful feature while you are getting started with +a new application. Later, you will find that you have more control over +your search results if you query specific fields instead of the `_all` +field. When the `_all` field is no longer useful to you, you can +disable it, as explained in <>. [[query-string-query]] [role="pagebreak-before"] -==== 更复杂的查询 +==== More Complicated Queries -下面对tweents的查询,使用以下的条件: +The next query searches for tweets, using the following criteria: -* `name` 字段中包含 `mary` 或者 `john` -* `date` 值大于 `2014-09-10` -* +_all_+ 字段包含 `aggregations` 或者 `geo` +* The `name` field contains `mary` or `john` +* The `date` is greater than `2014-09-10` +* The +_all+ field contains either of the words `aggregations` or `geo` [source,js] -------------------------------------------------- @@ -87,24 +106,39 @@ TIP: 在你刚开始使用 Elasticsearch 的时候, `_all` 字段是一个很 -------------------------------------------------- // SENSE: 050_Search/20_All_field.json -适当编码过的查询字符串看起来有点晦涩难读: +As a properly encoded query string, this looks like the slightly less +readable result: [source,js] -------------------------------------------------- ?q=%2Bname%3A(mary+john)+%2Bdate%3A%3E2014-09-10+%2B(aggregations+geo) -------------------------------------------------- -从之前的例子中可以看出,这种简化的查询-字符串的效果是非常惊人的。在相关参考文档中做出了详细解释的查询语法,让我们可以简洁的表达很复杂的查询。这对于命令行随机查询和在开发阶段都是很好的。 +As you can see from the preceding examples, this _lite_ query-string search is +surprisingly powerful.((("query strings", "syntax, reference for"))) Its query syntax, which is explained in detail in the +{ref}/query-dsl-query-string-query.html#query-string-syntax[Query String Syntax] +reference docs, allows us to express quite complex queries succinctly. 
This +makes it great for throwaway queries from the command line or during +development. -然而,这种简洁的方式可能让排错变得模糊和困难。像 `-` , `:` , `/` 或者 `"` 不匹配这种易错的小语法问题将返回一个错误。 +However, you can also see that its terseness can make it cryptic and +difficult to debug. And it's fragile--a slight syntax error in the query +string, such as a misplaced `-`, `:`, `/`, or `"`, and it will return an error +instead of results. -字符串查询允许任何用户在索引的任意字段上运行既慢又重的查询,这些查询可能会暴露隐私信息或者将你的集群拖垮。 +Finally, the query-string search allows any user to run potentially slow, heavy +queries on any field in your index, possibly exposing private information or +even bringing your cluster to its knees! [TIP] ================================================== -因为这些原因,我们不推荐直接向用户暴露查询-字符串,除非这些用户对于集群和数据是可以被信任的。 - +For these reasons, we don't recommend exposing query-string searches directly to +your users, unless they are power users who can be trusted with your data and +with your cluster. ================================================== -相反,我们经常在产品中更多的使用功能全面的 _request body_ 查询API。然而,在我们达到那种程度之前,我们首先需要了解数据在 Elasticsearch 中是如何索引的。 +Instead, in production we usually rely on the full-featured _request body_ +search API, which does all of this, plus a lot more. Before we get there, +though, we first need to take a look at how our data is indexed in +Elasticsearch.
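To give a sense of where that is heading, the earlier `q=tweet:elasticsearch` search could be expressed with a request body instead. This is only a sketch using the standard `match` query, not an example taken from this patch:

[source,js]
--------------------------------------------------
GET /_all/tweet/_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}
--------------------------------------------------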