
Revert "chapter05_part2:/050_Search/10_Multi_index_multi_type.asciidoc" #326


Merged
54 changes: 42 additions & 12 deletions 050_Search/05_Empty_search.asciidoc
@@ -1,14 +1,17 @@
[[empty-search]]
=== The Empty Search

搜索 API 的最基础的形式是没有指定任何查询条件的 _空搜索_ ,它简单地返回集群中所有索引下的所有文档:
The most basic form of the((("searching", "empty search")))((("empty search"))) search API is the _empty search_, which doesn't
specify any query but simply returns all documents in all indices in the
cluster:

[source,js]
--------------------------------------------------
GET /_search
--------------------------------------------------
// SENSE: 050_Search/05_Empty_search.json

返回的结果(为了简洁起见,进行过编辑)像这样:
The response (edited for brevity) looks something like this:

[source,js]
--------------------------------------------------
@@ -45,39 +48,66 @@ GET /_search

==== hits

返回结果中最重要的部分是 `hits` ,它包含了与我们查询相匹配的文档总数 `total` ,以及一个 `hits` 数组,其中包含所查询结果的前十个文档。
The most important section of the response is `hits`, which((("searching", "empty search", "hits")))((("hits"))) contains the
`total` number of documents that matched our query, and a `hits` array
containing the first 10 of those matching documents--the results.

在 `hits` 数组中每个结果包含文档的 `_index` 、 `_type` 、 `_id` ,加上 `_source` 字段。这意味着我们可以直接从搜索结果中使用整个文档。这不像其他的搜索引擎,它们仅仅返回文档的 ID,需要你在单独的步骤中获取文档本身。
Each result in the `hits` array contains the `_index`, `_type`, and `_id` of
the document, plus the `_source` field. This means that the whole document is
immediately available to us directly from the search results. This is unlike
other search engines, which return just the document ID, requiring you to fetch
the document itself in a separate step.

每个结果还有一个 `_score` ,这是衡量文档与查询匹配程度的 _相关性得分_ 。默认情况下,最相关的文档结果最先返回,也就是说,返回的文档是按照 `_score` 降序排列的。在这个例子中,我们没有指定任何查询,所以所有的文档具有相同的相关性,因此所有结果的 `_score` 都是中性的 `1` 。
Each element also ((("score", "for empty search")))((("relevance scores")))has a `_score`. This is the _relevance score_, which is a
measure of how well the document matches the query. By default, results are
returned with the most relevant documents first; that is, in descending order
of `_score`. In this case, we didn't specify any query, so all documents are
equally relevant, hence the neutral `_score` of `1` for all results.
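The shape of a single hit described above can be sketched as a plain object (the field names come from the response structure discussed here; the values are hypothetical placeholders, not real response data):

```javascript
// Sketch of one entry in the `hits` array. All values are hypothetical.
const hit = {
  _index: "us",     // the index the document lives in
  _type: "tweet",   // the document type
  _id: "14",        // the document ID
  _score: 1,        // relevance score; a neutral 1 for an empty search
  _source: {        // the whole original document, immediately available
    date: "2014-09-14",
    name: "John Smith"
  }
};

// Hits come back sorted in descending order of _score:
const hits = [{ _score: 3 }, { _score: 1 }, { _score: 2 }]
  .sort((a, b) => b._score - a._score);

console.log(hits.map(h => h._score)); // [ 3, 2, 1 ]
```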

`max_score` 值是与查询所匹配文档的最高 `_score` 。
The `max_score` value is the highest `_score` of any document that matches our
query.((("max_score value")))

==== took

`took` 值告诉我们执行整个搜索请求耗费了多少毫秒。
The `took` value((("took value (empty search)"))) tells us how many milliseconds the entire search request took
to execute.

==== shards

`_shards` 部分告诉我们参与查询的分片总数,以及这些分片中有多少是成功的,有多少是失败的。正常情况下我们不希望分片失败,但是分片失败是可能发生的。如果我们遭遇了重大的灾难,导致同一个分片的原始数据和副本都丢失了,那么这个分片将没有可用的副本来对搜索请求作出响应。假若这样,Elasticsearch 将报告这个分片是 `failed` (失败的),但是会继续返回剩余分片上的结果。
The `_shards` element((("shards", "number involved in an empty search"))) tells us the `total` number of shards that were involved
in the query and,((("failed shards (in a search)")))((("successful shards (in a search)"))) of them, how many were `successful` and how many `failed`.
We wouldn't normally expect shards to fail, but it can happen. If we were to
suffer a major disaster in which we lost both the primary and the replica copy
of the same shard, there would be no copies of that shard available to respond
to search requests. In this case, Elasticsearch would report the shard as
`failed`, but continue to return results from the remaining shards.

==== timeout

`timed_out` 值告诉我们查询是否超时。默认情况下,搜索请求不会超时。如果低响应时间比完整的结果更重要,你可以指定 `timeout` 为 `10` 或者 `10ms` (10毫秒),或者 `1s` (1秒):
The `timed_out` value tells((("timed_out value in search results"))) us whether the query timed out. By
default, search requests do not time out.((("timeout parameter", "specifying in a request"))) If low response times are more
important to you than complete results, you can specify a `timeout` as `10`
or `10ms` (10 milliseconds), or `1s` (1 second):

[source,js]
--------------------------------------------------
GET /_search?timeout=10ms
--------------------------------------------------

Elasticsearch 将返回在请求超时之前,已经成功从每个分片收集到的任何结果。

Elasticsearch will return any results that it has managed to gather from
each shard before the requests timed out.
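As a rough illustration of this behavior (this is our own model for exposition, not Elasticsearch source code), a coordinating node under a timeout keeps whatever shard responses arrived within the deadline and flags the response as timed out if any shard was too slow:

```javascript
// Illustrative model only: shards that respond within the deadline
// contribute their hits; slower shards are simply left out and the
// response is marked as timed out.
function gatherWithTimeout(shardResponses, timeoutMs) {
  const collected = [];
  let timedOut = false;
  for (const shard of shardResponses) {
    if (shard.tookMs > timeoutMs) {
      timedOut = true;      // this shard missed the deadline
      continue;
    }
    collected.push(...shard.hits);
  }
  return { timed_out: timedOut, hits: collected };
}

// Two shards answer within a 10ms deadline, one does not:
const result = gatherWithTimeout([
  { tookMs: 3,  hits: ["doc1", "doc2"] },
  { tookMs: 5,  hits: ["doc3"] },
  { tookMs: 50, hits: ["doc4"] },
], 10);

console.log(result); // { timed_out: true, hits: [ 'doc1', 'doc2', 'doc3' ] }
```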

[WARNING]
================================================

应当注意的是 `timeout` 不会停止查询的执行,它仅仅是告知协调节点返回到 _目前为止_ 收集的结果并且关闭连接。在后台,即使结果已经被发送出去了,其他的分片可能仍在执行查询。
It should be noted that this `timeout` does not((("timeout parameter", "not halting query execution"))) halt the execution of the
query; it merely tells the coordinating node to return the results collected
_so far_ and to close the connection. In the background, other shards may
still be processing the query even though results have been sent.

使用超时是因为低响应时间对你的 SLA 很重要,而不是因为想要中止长时间运行的查询。
Use the time-out because it is important to your SLA, not because you want
to abort the execution of long-running queries.

================================================

40 changes: 26 additions & 14 deletions 050_Search/10_Multi_index_multi_type.asciidoc
@@ -1,42 +1,54 @@
[[multi-index-multi-type]]
=== 多索引,多类型
=== Multi-index, Multitype

你有没有注意到之前的 <<empty-search,empty search>> 的结果包含了不同类型的文档((("searching", "multi-index, multi-type search")))&#x2014; `user` 和 `tweet` &#x2014;它们来自两个不同的索引&#x2014; `us` 和 `gb` ?
Did you notice that the results from the preceding <<empty-search,empty search>>
contained documents ((("searching", "multi-index, multi-type search")))of different types&#x2014;`user` and `tweet`&#x2014;from two
different indices&#x2014;`us` and `gb`?

通过不限制搜索的索引或者类型,我们搜索了集群中的 _所有_ 文档。Elasticsearch 将搜索请求并行地转发到集群中每个分片的一个主分片或者副本分片上,汇集结果以选出总体的前 10 个,并且将它们返回给我们。
By not limiting our search to a particular index or type, we have searched
across _all_ documents in the cluster. Elasticsearch forwarded the search
request in parallel to a primary or replica of every shard in the cluster,
gathered the results to select the overall top 10, and returned them to us.

然而,通常情况下,你((("types", "specifying in search requests")))((("indices", "specifying in search requests")))想在一个或多个特定的索引,以及一个或者多个特定的类型中进行搜索。我们可以通过在 URL 中指定索引和类型来做到这一点,如下所示:
Usually, however, you will((("types", "specifying in search requests")))((("indices", "specifying in search requests"))) want to search within one or more specific indices,
and probably one or more specific types. We can do this by specifying the
index and type in the URL, as follows:


`/_search`::
在所有的索引中搜索所有的类型
Search all types in all indices

`/gb/_search`::
在 `gb` 索引中搜索所有的类型
Search all types in the `gb` index

`/gb,us/_search`::
在 `gb` 和 `us` 索引中搜索所有的类型
Search all types in the `gb` and `us` indices

`/g*,u*/_search`::
在任何以 `g` 或者 `u` 开头的索引中搜索所有的类型
Search all types in any indices beginning with `g` or beginning with `u`

`/gb/user/_search`::
在 `gb` 索引中搜索 `user` 类型
Search type `user` in the `gb` index

`/gb,us/user,tweet/_search`::
在 `gb` 和 `us` 索引中搜索 `user` 和 `tweet` 类型
Search types `user` and `tweet` in the `gb` and `us` indices

`/_all/user,tweet/_search`::
在所有的索引中搜索 `user` 和 `tweet` 类型
Search types `user` and `tweet` in all indices
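
The URL patterns above follow a simple scheme that can be sketched as a tiny helper (`buildSearchPath` is our own illustrative name, not an Elasticsearch client API):

```javascript
// Sketch: build a search URL path from optional index and type lists,
// following the patterns listed above. Illustration only.
function buildSearchPath(indices = [], types = []) {
  let path = "";
  if (indices.length > 0) {
    path += "/" + indices.join(",");
  } else if (types.length > 0) {
    path += "/_all";   // types without indices need the /_all placeholder
  }
  if (types.length > 0) {
    path += "/" + types.join(",");
  }
  return path + "/_search";
}

console.log(buildSearchPath());                      // /_search
console.log(buildSearchPath(["gb", "us"]));          // /gb,us/_search
console.log(buildSearchPath(["gb"], ["user"]));      // /gb/user/_search
console.log(buildSearchPath([], ["user", "tweet"])); // /_all/user,tweet/_search
```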


当在单一的索引中搜索的时候,Elasticsearch 将搜索请求转发到这个索引的每个分片的一个主分片或者副本分片上,然后从每个分片中收集结果。在多个索引中搜索恰好以完全相同的方式工作--只是会涉及更多的分片。
When you search within a single index, Elasticsearch forwards the search
request to a primary or replica of every shard in that index, and then gathers the
results from each shard. Searching within multiple indices works in exactly
the same way--there are just more shards involved.

[TIP]
================================================

搜索一个有五个主分片的索引与搜索五个各有一个主分片的索引是 _完全等价_ 的。
Searching one index that has five primary shards is _exactly equivalent_ to
searching five indices that have one primary shard each.

================================================

稍后,你将会看到这个简单的事实如何使你在需求变化时可以灵活地扩容。
Later, you will see how this simple fact makes it easy to scale flexibly
as your requirements change.
41 changes: 29 additions & 12 deletions 050_Search/15_Pagination.asciidoc
@@ -1,17 +1,21 @@
[[pagination]]
=== 分页
=== Pagination

之前的 <<empty-search,empty search>> 告诉我们集群中有 14 个文档匹配了我们的(空)查询。但是在 `hits` 数组中只有 10 个文档。我们怎样才能看到其他的文档?
Our preceding <<empty-search,empty search>> told us that 14 documents in the((("pagination")))
cluster match our (empty) query. But there were only 10 documents in
the `hits` array. How can we see the other documents?

和 SQL 使用 `LIMIT` 关键字返回单个``页面''的结果的方式相同,Elasticsearch 接受 `from` 和 `size` 参数:
In the same way as SQL uses the `LIMIT` keyword to return a single ``page'' of
results, Elasticsearch accepts ((("from parameter")))((("size parameter")))the `from` and `size` parameters:

`size`::
表示应该返回的结果数量,默认是 `10`
Indicates the number of results that should be returned, defaults to `10`

`from`::
表示应该跳过的初始结果数量,默认是 `0`
Indicates the number of initial results that should be skipped, defaults to `0`

如果每页展示 5 条结果,可以用下面方式请求第 1 页到第 3 页的结果:
If you wanted to show five results per page, then pages 1 to 3
could be requested as follows:

[source,js]
--------------------------------------------------
@@ -22,17 +26,30 @@ GET /_search?size=5&from=10
// SENSE: 050_Search/15_Pagination.json


注意不要分页太深,或者一次请求太多的结果。结果在返回之前会先排序。但是请记住,一个搜索请求通常跨越多个分片,每个分片都产生自己的排序结果,这些结果需要进行集中排序以保证整体的顺序是正确的。
Beware of paging too deep or requesting too many results at once. Results are
sorted before being returned. But remember that a search request usually spans
multiple shards. Each shard generates its own sorted results, which then need
to be sorted centrally to ensure that the overall order is correct.
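The central sort just described can be sketched as follows (an illustration under the assumption that each shard has already returned its own hits sorted by score, highest first; `pageOfResults` is our own name):

```javascript
// Sketch of the coordinating node's job: merge the per-shard sorted
// results, re-sort centrally, then apply from/size to pick one page.
function pageOfResults(shardHits, from, size) {
  const merged = shardHits
    .flat()                              // combine all shards' hits
    .sort((a, b) => b.score - a.score);  // re-sort centrally by score
  return merged.slice(from, from + size);
}

const shard1 = [{ id: "a", score: 9 }, { id: "b", score: 4 }];
const shard2 = [{ id: "c", score: 7 }, { id: "d", score: 2 }];

console.log(pageOfResults([shard1, shard2], 0, 2).map(h => h.id)); // [ 'a', 'c' ]
console.log(pageOfResults([shard1, shard2], 2, 2).map(h => h.id)); // [ 'b', 'd' ]
```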

.在分布式系统中深度分页
.Deep Paging in Distributed Systems
****

为了理解为什么深度分页是有问题的,我们可以假设在一个有五个主分片的索引中搜索。当我们请求结果的第一页(结果从 1 到 10),每一个分片产生自己的前 10 个结果,并且返回给 _协调节点_ ,协调节点再对这 50 个结果排序以得到总体上的前 10 个。
To understand why ((("deep paging, problems with")))deep paging is problematic, let's imagine that we are
searching within a single index with five primary shards. When we request the
first page of results (results 1 to 10), each shard produces its own top 10
results and returns them to the _coordinating node_, which then sorts all 50
results in order to select the overall top 10.

现在假设我们请求第 1000 页--结果从 10001 到 10010 。所有都以相同的方式工作,只是每个分片不得不产生它的前 10010 个结果。然后协调节点对全部 50050 个结果排序,最后丢弃掉这些结果中的 50040 个!
Now imagine that we ask for page 1,000--results 10,001 to 10,010. Everything
works in the same way except that each shard has to produce its top 10,010
results. The coordinating node then sorts through all 50,050 results and
discards 50,040 of them!
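
The numbers in the example above fall straight out of the arithmetic:

```javascript
// Deep-paging cost from the example above: 5 shards, page 1,000
// (from=10000, size=10).
const shards = 5;
const from = 10000;
const size = 10;

const perShard = from + size;       // hits each shard must produce
const fetched = shards * perShard;  // hits the coordinating node must sort
const discarded = fetched - size;   // hits thrown away after sorting

console.log(perShard);  // 10010
console.log(fetched);   // 50050
console.log(discarded); // 50040
```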

可以看到,在分布式系统中,对结果排序的成本随分页的深度成指数上升。这就是 web 搜索引擎对任何查询都不返回超过 1000 个结果的一个好理由。
You can see that, in a distributed system, the cost of sorting results
grows exponentially the deeper we page. There is a good reason
that web search engines don't return more than 1,000 results for any query.

****

TIP: 在 <<reindex>> 中我们解释了如何 _能够_ 有效地获取大量的文档。
TIP: In <<reindex>> we explain how you _can_ retrieve large numbers of
documents efficiently.