The `matrix_stats` aggregation generates advanced stats for multiple fields in a matrix form.
13
-
The following example returns advanced stats in a matrix form for the `taxful_total_price` and `products.base_price` fields:
12
+
The `matrix_stats` aggregation is a multi-value metric aggregation that generates covariance statistics for two or more fields in matrix form.
13
+
14
+
The `matrix_stats` aggregation does not support scripting.
15
+
{: .note}
16
+
17
+
## Parameters
18
+
19
+
The `matrix_stats` aggregation takes the following parameters.
20
+
21
+
| Parameter | Required/Optional | Data type | Description |
22
+
| :-- | :-- | :-- | :-- |
23
+
|`fields`| Required | Array of strings | The fields for which the matrix stats are computed. |
24
+
|`missing`| Optional | Object | The value to use in place of missing values. By default, missing values are ignored. See [Missing values](#missing-values). |
25
+
|`mode`| Optional | String | The value to use as a sample from a multi-valued or array field. Allowed values are `avg`, `min`, `max`, `sum`, and `median`. Default is `avg`. |
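
To illustrate what `mode` means for a multi-valued field, the following Python sketch reduces one document's array to a single sample for each allowed value. It is only a model of the documented behavior, not OpenSearch's implementation; the helper name `reduce_multi_valued` and the sample values are assumptions made for this example.

```python
from statistics import mean, median

# Hypothetical mapping from each documented `mode` value to a reducer.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}


def reduce_multi_valued(values, mode="avg"):
    """Collapse one document's array of numbers into a single sample."""
    if mode not in REDUCERS:
        raise ValueError(f"unsupported mode: {mode}")
    return REDUCERS[mode](values)


# Illustrative values, not taken from the sample data.
grades = [2.0, 3.0, 4.0]
samples = {mode: reduce_multi_valued(grades, mode) for mode in REDUCERS}
print(samples)
```

With the default `avg`, the array `[2.0, 3.0, 4.0]` contributes the single value `3.0` to the statistics.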
26
+
27
+
## Example
28
+
29
+
The following example returns statistics for the `taxful_total_price` and `products.base_price` fields in the OpenSearch Dashboards e-commerce sample data:
14
30
15
31
```json
16
32
GET opensearch_dashboards_sample_data_ecommerce/_search
@@ -27,60 +43,262 @@ GET opensearch_dashboards_sample_data_ecommerce/_search
27
43
```
28
44
{% include copy-curl.html %}
29
45
30
-
#### Example response
46
+
The response contains the aggregated results:
31
47
32
48
```json
33
-
...
34
-
"aggregations" : {
35
-
"matrix_stats_taxful_total_price" : {
36
-
"doc_count" : 4675,
37
-
"fields" : [
38
-
{
39
-
"name" : "products.base_price",
40
-
"count" : 4675,
41
-
"mean" : 34.994239430147196,
42
-
"variance" : 360.5035285833703,
43
-
"skewness" : 5.530161335032702,
44
-
"kurtosis" : 131.16306324042148,
45
-
"covariance" : {
46
-
"products.base_price" : 360.5035285833703,
47
-
"taxful_total_price" : 846.6489362233166
49
+
{
50
+
"took": 250,
51
+
"timed_out": false,
52
+
"_shards": {
53
+
"total": 1,
54
+
"successful": 1,
55
+
"skipped": 0,
56
+
"failed": 0
57
+
},
58
+
"hits": {
59
+
"total": {
60
+
"value": 4675,
61
+
"relation": "eq"
62
+
},
63
+
"max_score": null,
64
+
"hits": []
65
+
},
66
+
"aggregations": {
67
+
"matrix_stats_taxful_total_price": {
68
+
"doc_count": 4675,
69
+
"fields": [
70
+
{
71
+
"name": "products.base_price",
72
+
"count": 4675,
73
+
"mean": 34.99423943014724,
74
+
"variance": 360.5035285833702,
75
+
"skewness": 5.530161335032689,
76
+
"kurtosis": 131.1630632404217,
77
+
"covariance": {
78
+
"products.base_price": 360.5035285833702,
79
+
"taxful_total_price": 846.6489362233169
80
+
},
81
+
"correlation": {
82
+
"products.base_price": 1,
83
+
"taxful_total_price": 0.8444765264325269
84
+
}
48
85
},
49
-
"correlation" : {
50
-
"products.base_price" : 1.0,
51
-
"taxful_total_price" : 0.8444765264325268
86
+
{
87
+
"name": "taxful_total_price",
88
+
"count": 4675,
89
+
"mean": 75.05542864304839,
90
+
"variance": 2788.1879749835425,
91
+
"skewness": 15.812149139923994,
92
+
"kurtosis": 619.1235507385886,
93
+
"covariance": {
94
+
"products.base_price": 846.6489362233169,
95
+
"taxful_total_price": 2788.1879749835425
96
+
},
97
+
"correlation": {
98
+
"products.base_price": 0.8444765264325269,
99
+
"taxful_total_price": 1
100
+
}
52
101
}
53
-
},
54
-
{
55
-
"name" : "taxful_total_price",
56
-
"count" : 4675,
57
-
"mean" : 75.05542864304839,
58
-
"variance" : 2788.1879749835402,
59
-
"skewness" : 15.812149139924037,
60
-
"kurtosis" : 619.1235507385902,
61
-
"covariance" : {
62
-
"products.base_price" : 846.6489362233166,
63
-
"taxful_total_price" : 2788.1879749835402
102
+
]
103
+
}
104
+
}
105
+
}
106
+
```
107
+
108
+
The following table describes the response fields.
109
+
110
+
| Statistic | Description |
111
+
| :--- | :--- |
112
+
|`count`| The number of documents sampled for the aggregation. |
113
+
|`mean`| The average value of the field computed from the sample. |
114
+
|`variance`| The average of the squared deviations from the mean, a measure of data spread. |
115
+
|`skewness`| A measure of the distribution's asymmetry relative to the mean. See [Skewness](https://en.wikipedia.org/wiki/Skewness). |
116
+
|`kurtosis`| A measure of the tail-heaviness of a distribution. As the tails become lighter, kurtosis decreases. Kurtosis and skewness are evaluated to determine whether a population is likely to be [normally distributed](https://en.wikipedia.org/wiki/Normal_distribution). See [Kurtosis](https://en.wikipedia.org/wiki/Kurtosis).|
117
+
|`covariance`| A measure of the joint variability between two fields. A positive value means their values move in the same direction; a negative value means they move in opposite directions. |
118
+
|`correlation`| The normalized covariance, a measure of the strength of the relationship between two fields. Possible values are from -1 to 1, inclusive, indicating perfect negative to perfect positive linear correlation. A value of 0 indicates no discernible relationship between the variables. |
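
As a quick check of how `correlation` relates to `covariance` and `variance`, the following Python sketch divides the covariance by the product of the two standard deviations, using the values from the e-commerce example response. The function name `correlation` is chosen for this example, and the formula is the standard Pearson definition rather than a confirmed description of OpenSearch internals.

```python
import math


def correlation(covariance, variance_x, variance_y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance / math.sqrt(variance_x * variance_y)


# Values copied from the example response for the two e-commerce fields.
cov_xy = 846.6489362233169      # covariance of products.base_price and taxful_total_price
var_base = 360.5035285833702    # variance of products.base_price
var_total = 2788.1879749835425  # variance of taxful_total_price

r = correlation(cov_xy, var_base, var_total)
print(round(r, 4))
```

The result agrees with the `correlation` value of roughly 0.8445 reported in the response. A field's covariance with itself equals its variance, which is why each field's correlation with itself is `1`.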
119
+
120
+
## Missing values
121
+
122
+
To define how missing values are treated, use the `missing` parameter. By default, missing values are ignored.
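
The difference between ignoring a missing value and substituting one can be sketched in a few lines of Python. This is a simplified single-field model with made-up values, not the aggregation's actual code; `field_mean` is a hypothetical helper, and `None` stands in for a document that lacks the field.

```python
from statistics import mean


def field_mean(values, missing=None):
    """Mean of one field; None marks a document that lacks the field."""
    if missing is None:
        # Default behavior: skip documents without a value.
        present = [v for v in values if v is not None]
    else:
        # `missing` supplied: use it in place of every absent value.
        present = [missing if v is None else v for v in values]
    return mean(present)


# Illustrative values: the first document has no value for the field.
scores = [None, 3.0, 4.0]

ignored = field_mean(scores)
substituted = field_mean(scores, missing=0)
print(ignored, substituted)
```

Ignoring the missing value averages only the two present values, while substituting `0` averages all three documents and pulls the mean down.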
123
+
124
+
For example, create an index in which document 1 is missing the `gpa` and `class_grades` fields:
First, run a `matrix_stats` aggregation without providing a `missing` parameter:
138
+
139
+
```json
140
+
GET students/_search
141
+
{
142
+
"size": 0,
143
+
"aggs": {
144
+
"matrix_stats_taxful_total_price": {
145
+
"matrix_stats": {
146
+
"fields": [
147
+
"gpa",
148
+
"class_grades"
149
+
],
150
+
"mode": "avg"
151
+
}
152
+
}
153
+
}
154
+
}
155
+
```
156
+
{% include copy-curl.html %}
157
+
158
+
OpenSearch ignores missing values when calculating the matrix statistics:
159
+
160
+
```json
161
+
{
162
+
"took": 5,
163
+
"timed_out": false,
164
+
"terminated_early": true,
165
+
"_shards": {
166
+
"total": 1,
167
+
"successful": 1,
168
+
"skipped": 0,
169
+
"failed": 0
170
+
},
171
+
"hits": {
172
+
"total": {
173
+
"value": 3,
174
+
"relation": "eq"
175
+
},
176
+
"max_score": null,
177
+
"hits": []
178
+
},
179
+
"aggregations": {
180
+
"matrix_stats_taxful_total_price": {
181
+
"doc_count": 2,
182
+
"fields": [
183
+
{
184
+
"name": "gpa",
185
+
"count": 2,
186
+
"mean": 3.684999942779541,
187
+
"variance": 0.05444997482300096,
188
+
"skewness": 0,
189
+
"kurtosis": 1,
190
+
"covariance": {
191
+
"gpa": 0.05444997482300096,
192
+
"class_grades": 0.09899998760223136
193
+
},
194
+
"correlation": {
195
+
"gpa": 1,
196
+
"class_grades": 0.9999999999999991
197
+
}
64
198
},
65
-
"correlation" : {
66
-
"products.base_price" : 0.8444765264325268,
67
-
"taxful_total_price" : 1.0
199
+
{
200
+
"name": "class_grades",
201
+
"count": 2,
202
+
"mean": 3.333333333333333,
203
+
"variance": 0.1800000381469746,
204
+
"skewness": 0,
205
+
"kurtosis": 1,
206
+
"covariance": {
207
+
"gpa": 0.09899998760223136,
208
+
"class_grades": 0.1800000381469746
209
+
},
210
+
"correlation": {
211
+
"gpa": 0.9999999999999991,
212
+
"class_grades": 1
213
+
}
214
+
}
215
+
]
216
+
}
217
+
}
218
+
}
219
+
```
220
+
221
+
To substitute `0` for missing values, provide the `missing` parameter as a map of field names to replacement values. Even though `class_grades` is an array field, the `matrix_stats` aggregation reduces each multi-valued numeric field to a single value per document (here the average, because `mode` is `avg`), so you must supply a single number as the missing value:
222
+
223
+
```json
224
+
GET students/_search
225
+
{
226
+
"size": 0,
227
+
"aggs": {
228
+
"matrix_stats_taxful_total_price": {
229
+
"matrix_stats": {
230
+
"fields": ["gpa", "class_grades"],
231
+
"mode": "avg",
232
+
"missing": {
233
+
"gpa": 0,
234
+
"class_grades": 0
68
235
}
69
236
}
70
-
]
237
+
}
71
238
}
72
-
}
73
239
}
74
240
```
241
+
{% include copy-curl.html %}
75
242
76
-
The following table lists all response fields.
77
-
78
-
Statistic | Description
79
-
:--- | :---
80
-
`count` | The number of samples measured.
81
-
`mean` | The average value of the field measured from the sample.
82
-
`variance` | How far the values of the field measured are spread out from its mean value. The larger the variance, the more it's spread from its mean value.
83
-
`skewness` | An asymmetric measure of the distribution of the field's values around the mean.
84
-
`kurtosis` | A measure of the tail heaviness of a distribution. As the tail becomes lighter, kurtosis decreases. As the tail becomes heavier, kurtosis increases. To learn about kurtosis, see [Wikipedia](https://en.wikipedia.org/wiki/Kurtosis).
85
-
`covariance` | A measure of the joint variability between two fields. A positive value means their values move in the same direction and the other way around.
86
-
`correlation` | A measure of the strength of the relationship between two fields. The valid values are between [-1, 1]. A value of -1 means that the value is negatively correlated and a value of 1 means that it's positively correlated. A value of 0 means that there's no identifiable relationship between them.
243
+
OpenSearch substitutes `0` for any missing `gpa` or `class_grades` values when calculating the matrix statistics: