Commit f9997fe

debadair and jrodewig committed
[DOCS] Streamlined GS aggs section. (elastic#45951)
* [DOCS] Streamlined GS aggs section.
* Update docs/reference/getting-started.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
1 parent d50d700 commit f9997fe

1 file changed

+44
-157
lines changed

docs/reference/getting-started.asciidoc

Lines changed: 44 additions & 157 deletions
@@ -17,7 +17,7 @@ Step through this getting started tutorial to:
1717
Need more context?
1818

1919
Check out the <<elasticsearch-intro,
20-
Elasticsearch Introduction>> to learn the lingo and understand the basics of
20+
{es} Introduction>> to learn the lingo and understand the basics of
2121
how {es} works. If you're already familiar with {es} and want to see how it works
2222
with the rest of the stack, you might want to jump to the
2323
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
@@ -26,29 +26,30 @@ Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
2626

2727
TIP: The fastest way to get started with {es} is to
2828
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
29-
trial of Elasticsearch Service] in the cloud.
29+
trial of {ess}] in the cloud.
3030
--
3131

3232
[[getting-started-install]]
3333
== Get {es} up and running
3434

35-
To take {es} for a test drive, you can create a one-click cloud deployment
36-
on the https://www.elastic.co/cloud/elasticsearch-service/signup[Elasticsearch Service],
37-
or <<run-elasticsearch-local, set up a multi-node {es} cluster>> on your own
35+
To take {es} for a test drive, you can create a
36+
https://www.elastic.co/cloud/elasticsearch-service/signup[hosted deployment] on
37+
the {ess} or set up a multi-node {es} cluster on your own
3838
Linux, macOS, or Windows machine.
3939

4040

4141
[float]
4242
[[run-elasticsearch-local]]
4343
=== Run {es} locally on Linux, macOS, or Windows
4444

45-
When you create a cluster on the Elasticsearch Service, you automatically
46-
get a three-node cluster. By installing from the tar or zip archive, you can
47-
start multiple instances of {es} locally to see how a multi-node cluster behaves.
45+
When you create a deployment on the {ess}, a master node and
46+
two data nodes are provisioned automatically. By installing from the tar or zip
47+
archive, you can start multiple instances of {es} locally to see how a multi-node
48+
cluster behaves.
4849

4950
To run a three-node {es} cluster locally:
5051

51-
. Download the Elasticsearch archive for your OS:
52+
. Download the {es} archive for your OS:
5253
+
5354
Linux: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz[elasticsearch-{version}-linux-x86_64.tar.gz]
5455
+
@@ -92,7 +93,7 @@ Windows PowerShell:
9293
Expand-Archive elasticsearch-{version}-windows-x86_64.zip
9394
--------------------------------------------------
9495

95-
. Start elasticsearch from the `bin` directory:
96+
. Start {es} from the `bin` directory:
9697
+
9798
Linux and macOS:
9899
+
@@ -386,28 +387,8 @@ And the response (partially shown):
386387
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
387388
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
388389

389-
As for the response, we see the following parts:
390-
391-
* `took` – time in milliseconds for Elasticsearch to execute the search
392-
* `timed_out` – tells us if the search timed out or not
393-
* `_shards` – tells us how many shards were searched, as well as a count of the successful/failed searched shards
394-
* `hits` – search results
395-
* `hits.total` – an object that contains information about the total number of documents matching our search criteria
396-
** `hits.total.value` - the value of the total hit count (must be interpreted in the context of `hits.total.relation`).
397-
** `hits.total.relation` - whether `hits.total.value` is the exact hit count, in which case it is equal to `"eq"` or a
398-
lower bound of the total hit count (greater than or equals), in which case it is equal to `gte`.
399-
* `hits.hits` – actual array of search results (defaults to first 10 documents)
400-
* `hits.sort` - sort value of the sort key for each result (missing if sorting by score)
401-
* `hits._score` and `max_score` - ignore these fields for now
402-
403-
The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`, when set to true
404-
the request will track the total hits accurately (`"relation": "eq"`). It defaults to `10,000`
405-
which means that the total hit count is accurately tracked up to `10,000` documents.
406-
You can force an accurate count by setting `track_total_hits` to true explicitly.
407-
See the <<request-body-search-track-total-hits, request body>> documentation
408-
for more details.
409-
410-
Here is the same exact search above using the alternative request body method:
390+
For example, the following request retrieves all documents in the `bank`
391+
index sorted by account number:
411392

412393
[source,js]
413394
--------------------------------------------------
@@ -506,7 +487,9 @@ GET /bank/_search
506487
// CONSOLE
507488
// TEST[continued]
508489

509-
Note that if `size` is not specified, it defaults to 10.
490+
Each search request is self-contained: {es} does not maintain any
491+
state information across requests. To page through the search hits, specify
492+
the `from` and `size` parameters in your request.
510493

511494
This example does a `match_all` and returns documents 10 through 19:
512495

@@ -524,65 +507,9 @@ GET /bank/_search
524507

525508
The `from` parameter (0-based) specifies which document index to start from and the `size` parameter specifies how many documents to return starting at the from parameter. This feature is useful when implementing paging of search results. Note that if `from` is not specified, it defaults to 0.
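
The paging arithmetic described above can be sketched in Python. This is an illustration only: the helper name `page_request` and its zero-based `page` argument are assumptions, not part of the tutorial, and the function only builds the request body rather than sending it to `GET /bank/_search`.

```python
def page_request(page, page_size=10):
    """Build a match_all search body for a zero-based page number.

    `page` and `page_size` are hypothetical helper arguments; only the
    `from` and `size` keys in the returned body are Elasticsearch parameters.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": page * page_size,  # 0-based offset of the first hit to return
        "size": page_size,         # number of hits to return from that offset
    }


second_page = page_request(1)
```

Here `page_request(1)` sets `from` to 10 and `size` to 10, which corresponds to documents 10 through 19.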
526509

527-
This example does a `match_all` and sorts the results by account balance in descending order and returns the top 10 (default size) documents.
528-
529-
[source,js]
530-
--------------------------------------------------
531-
GET /bank/_search
532-
{
533-
"query": { "match_all": {} },
534-
"sort": { "balance": { "order": "desc" } }
535-
}
536-
--------------------------------------------------
537-
// CONSOLE
538-
// TEST[continued]
539-
540-
Now that we have seen a few of the basic search parameters, let's dig in some more into the Query DSL. Let's first take a look at the returned document fields. By default, the full JSON document is returned as part of all searches. This is referred to as the source (`_source` field in the search hits). If we don't want the entire source document returned, we have the ability to request only a few fields from within source to be returned.
541-
542-
This example shows how to return two fields, `account_number` and `balance` (inside of `_source`), from the search:
543-
544-
[source,js]
545-
--------------------------------------------------
546-
GET /bank/_search
547-
{
548-
"query": { "match_all": {} },
549-
"_source": ["account_number", "balance"]
550-
}
551-
--------------------------------------------------
552-
// CONSOLE
553-
// TEST[continued]
554-
555-
Note that the above example simply reduces the `_source` field. It will still only return one field named `_source` but within it, only the fields `account_number` and `balance` are included.
556-
557-
If you come from a SQL background, the above is somewhat similar in concept to the `SQL SELECT FROM` field list.
558-
559-
Now let's move on to the query part. Previously, we've seen how the `match_all` query is used to match all documents. Let's now introduce a new query called the {ref}/query-dsl-match-query.html[`match` query], which can be thought of as a basic fielded search query (i.e. a search done against a specific field or set of fields).
560-
561-
This example returns the account numbered 20:
562-
563-
[source,js]
564-
--------------------------------------------------
565-
GET /bank/_search
566-
{
567-
"query": { "match": { "account_number": 20 } }
568-
}
569-
--------------------------------------------------
570-
// CONSOLE
571-
// TEST[continued]
572-
573-
This example returns all accounts containing the term "mill" in the address:
574-
575-
[source,js]
576-
--------------------------------------------------
577-
GET /bank/_search
578-
{
579-
"query": { "match": { "address": "mill" } }
580-
}
581-
--------------------------------------------------
582-
// CONSOLE
583-
// TEST[continued]
584-
585-
This example returns all accounts containing the term "mill" or "lane" in the address:
510+
To search for specific terms within a field, you can use a `match` query.
511+
For example, the following request searches the `address` field to find
512+
customers whose addresses contain `mill` or `lane`:
586513

587514
[source,js]
588515
--------------------------------------------------
@@ -735,9 +662,15 @@ In addition to the `match_all`, `match`, `bool`, and `range` queries, there are
735662
[[getting-started-aggregations]]
736663
== Analyze results with aggregations
737664

738-
Aggregations provide the ability to group and extract statistics from your data. The easiest way to think about aggregations is by roughly equating it to the SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, you have the ability to execute searches returning hits and at the same time return aggregated results separate from the hits all in one response. This is very powerful and efficient in the sense that you can run queries and multiple aggregations and get the results back of both (or either) operations in one shot avoiding network roundtrips using a concise and simplified API.
665+
{es} aggregations enable you to get meta-information about your search results
666+
and answer questions like, "How many account holders are in Texas?" or
667+
"What's the average balance of accounts in Tennessee?" You can search
668+
documents, filter hits, and use aggregations to analyze the results all in one
669+
request.
739670

740-
To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default):
671+
For example, the following request uses a `terms` aggregation to group
672+
all of the accounts in the `bank` index by state, and returns the ten states
673+
with the most accounts in descending order:
741674

742675
[source,js]
743676
--------------------------------------------------
@@ -756,14 +689,10 @@ GET /bank/_search
756689
// CONSOLE
757690
// TEST[continued]
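
The request body for this aggregation is elided from the hunk above. A minimal Python sketch of what such a body might look like follows. The aggregation name `group_by_state` comes from the text of this diff; the field name `state.keyword` is an assumption made by analogy with the `gender.keyword` field used in the removed example, and the helper name is hypothetical.

```python
def terms_by_state_request(field="state.keyword"):
    """Return a search body that groups documents by `field` and omits hits.

    The default field name is an assumption; adjust it to match the mapping
    of your own index.
    """
    return {
        "size": 0,  # return no search hits, only aggregation results
        "aggs": {
            "group_by_state": {
                # By default a terms aggregation returns the top 10 buckets,
                # ordered by document count in descending order.
                "terms": {"field": field}
            }
        },
    }


body = terms_by_state_request()
```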
758691

759-
In SQL, the above aggregation is similar in concept to:
760-
761-
[source,sh]
762-
--------------------------------------------------
763-
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
764-
--------------------------------------------------
765-
766-
And the response (partially shown):
692+
The `buckets` in the response are the values of the `state` field. The
693+
`doc_count` shows the number of accounts in each state. For example, you
694+
can see that there are 27 accounts in `ID` (Idaho). Because the request
695+
set `size=0`, the response only contains the aggregation results.
767696

768697
[source,js]
769698
--------------------------------------------------
@@ -825,12 +754,11 @@ And the response (partially shown):
825754
--------------------------------------------------
826755
// TESTRESPONSE[s/"took": 29/"took": $body.took/]
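
As a rough illustration of how the buckets in such a response can be read, the following Python sketch maps each bucket key to its `doc_count`. The helper name `bucket_counts` is hypothetical, and `sample_response` is a hand-trimmed fragment that reuses only the three counts quoted in this diff (ID, TX, AL); it is not the full response.

```python
def bucket_counts(response, agg_name="group_by_state"):
    """Map each bucket key (a state code) to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed, illustrative fragment of a terms aggregation response.
sample_response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "ID", "doc_count": 27},
                {"key": "TX", "doc_count": 27},
                {"key": "AL", "doc_count": 25},
            ]
        }
    }
}

counts = bucket_counts(sample_response)
```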
827756

828-
We can see that there are 27 accounts in `ID` (Idaho), followed by 27 accounts
829-
in `TX` (Texas), followed by 25 accounts in `AL` (Alabama), and so forth.
830-
831-
Note that we set `size=0` to not show search hits because we only want to see the aggregation results in the response.
832757

833-
Building on the previous aggregation, this example calculates the average account balance by state (again only for the top 10 states sorted by count in descending order):
758+
You can combine aggregations to build more complex summaries of your data. For
759+
example, the following request nests an `avg` aggregation within the previous
760+
`group_by_state` aggregation to calculate the average account balances for
761+
each state.
834762

835763
[source,js]
836764
--------------------------------------------------
@@ -856,9 +784,8 @@ GET /bank/_search
856784
// CONSOLE
857785
// TEST[continued]
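
The nested request itself falls outside this hunk. A hedged Python sketch of the nesting pattern is shown here: the names `group_by_state`, `average_balance`, and `balance` appear elsewhere in this diff, while `state.keyword` and the helper name are assumptions.

```python
def average_balance_by_state(field="state.keyword"):
    """Return a search body that nests an avg aggregation inside a terms aggregation."""
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "group_by_state": {
                "terms": {"field": field},
                # Sub-aggregations live in an "aggs" key that is a sibling
                # of "terms", so they are computed once per bucket.
                "aggs": {
                    "average_balance": {"avg": {"field": "balance"}}
                },
            }
        },
    }


nested_body = average_balance_by_state()
```

Each state bucket in the response would then carry an `average_balance` object alongside its `doc_count`.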
858786

859-
Notice how we nested the `average_balance` aggregation inside the `group_by_state` aggregation. This is a common pattern for all the aggregations. You can nest aggregations inside aggregations arbitrarily to extract pivoted summarizations that you require from your data.
860-
861-
Building on the previous aggregation, let's now sort on the average balance in descending order:
787+
Instead of sorting the results by count, you could sort using the result of
788+
the nested aggregation by specifying the order within the `terms` aggregation:
862789

863790
[source,js]
864791
--------------------------------------------------
@@ -887,54 +814,14 @@ GET /bank/_search
887814
// CONSOLE
888815
// TEST[continued]
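
The ordered request is also elided from this hunk. As a sketch under the same assumptions as before (the `state.keyword` field and the helper name are hypothetical), ordering by the nested aggregation amounts to adding an `order` entry to the `terms` aggregation that refers to the sub-aggregation by name:

```python
def states_by_average_balance(direction="desc", field="state.keyword"):
    """Return a search body whose state buckets are ordered by average balance."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {
        "size": 0,
        "aggs": {
            "group_by_state": {
                "terms": {
                    "field": field,
                    # Order buckets by the value of the nested metric
                    # instead of the default document count.
                    "order": {"average_balance": direction},
                },
                "aggs": {
                    "average_balance": {"avg": {"field": "balance"}}
                },
            }
        },
    }


ordered_body = states_by_average_balance()
```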
889816

890-
This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and then finally get the average account balance, per age bracket, per gender:
891-
892-
[source,js]
893-
--------------------------------------------------
894-
GET /bank/_search
895-
{
896-
"size": 0,
897-
"aggs": {
898-
"group_by_age": {
899-
"range": {
900-
"field": "age",
901-
"ranges": [
902-
{
903-
"from": 20,
904-
"to": 30
905-
},
906-
{
907-
"from": 30,
908-
"to": 40
909-
},
910-
{
911-
"from": 40,
912-
"to": 50
913-
}
914-
]
915-
},
916-
"aggs": {
917-
"group_by_gender": {
918-
"terms": {
919-
"field": "gender.keyword"
920-
},
921-
"aggs": {
922-
"average_balance": {
923-
"avg": {
924-
"field": "balance"
925-
}
926-
}
927-
}
928-
}
929-
}
930-
}
931-
}
932-
}
933-
--------------------------------------------------
934-
// CONSOLE
935-
// TEST[continued]
817+
In addition to basic bucketing and metrics aggregations like these, {es}
818+
provides specialized aggregations for operating on multiple fields and
819+
analyzing particular types of data such as dates, IP addresses, and geo
820+
data. You can also feed the results of individual aggregations into pipeline
821+
aggregations for further analysis.
936822

937-
There are many other aggregations capabilities that we won't go into detail here. The {ref}/search-aggregations.html[aggregations reference guide] is a great starting point if you want to do further experimentation.
823+
The core analysis capabilities provided by aggregations enable advanced
824+
features such as using machine learning to detect anomalies.
938825

939826
[[getting-started-next-steps]]
940827
== Where to go from here
