Add stats agg api (github#524)

* Beginning of the stats_agg docs. Still WIP for now, but pushing to begin the process.
* Continuing adding stats_agg API elements
* Apply suggestions from code review
* Add missing links
* Fix tables
* Apply suggestions from code review

Co-authored-by: Lana Brindley <github@lanabrindley.com>
jnidzwetzki · Oct 27, 2021 · 9c91498
1 parent 84ca585

Showing 27 changed files with 1,084 additions and 103 deletions.
diff --git a/api/average-stats.md b/api/average-stats.md
@@ -0,0 +1,46 @@
+# average(), average_y(), and average_x() <tag type="toolkit">Toolkit</tag>
+
+```SQL
+average(summary StatsSummary1D) RETURNS DOUBLE PRECISION
+```
+```SQL
+average_y(summary StatsSummary2D) RETURNS DOUBLE PRECISION
+```
+```SQL
+average_x(summary StatsSummary2D) RETURNS DOUBLE PRECISION
+```
+
+Get the average of the values contained in a statistical aggregate.
+In a two-dimensional [`stats_agg`][stats-agg], use the `_y` and `_x` forms to access the
+averages of the dependent and independent variables.
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary1D`/`StatsSummary2D`|The already constructed data structure from a previous [`stats_agg`][stats-agg] call|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|`average`/`average_y`/`average_x`|`DOUBLE PRECISION`|The average of the values in the statistical aggregate|
+
+## Sample usage
+
+```SQL
+SELECT average(stats_agg(data))
+FROM generate_series(0, 100) data;
+```
+```output
+ average
+-----------
+       50
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
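As a rough cross-check of what `average` reports, here is a plain Python sketch (not the Toolkit implementation) that computes the arithmetic mean of the same `generate_series(0, 100)` input. The helper names `average` and `average_y_x` are hypothetical, and the sketch assumes the two-dimensional case is built from `(y, x)` pairs in the same argument order as `stats_agg(y, x)`.

```python
def average(values):
    """Arithmetic mean of a sequence, the quantity `average` reports."""
    values = list(values)
    return sum(values) / len(values)


def average_y_x(pairs):
    """Means of the dependent (y) and independent (x) values in (y, x) pairs."""
    pairs = list(pairs)
    return average(y for y, _ in pairs), average(x for _, x in pairs)


# generate_series(0, 100) is inclusive, so the Python equivalent is range(0, 101).
print(average(range(0, 101)))  # 50.0
print(average_y_x([(10.0, 1.0), (20.0, 2.0), (30.0, 3.0)]))  # (20.0, 2.0)
```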
diff --git a/api/average.md → api/average-time-weight.md b/api/average.md → api/average-time-weight.md
diff --git a/api/corr.md → api/corr-counter.md b/api/corr.md → api/corr-counter.md
@@ -1,6 +1,6 @@
 # corr() <tag type="toolkit" content="toolkit" />
 The correlation coefficient of the least squares fit line of the adjusted
-counter value. Given that the slope of a line for any counter value must be
+counter value and epoch value of the time column. Given that the slope of a line for any counter value must be
 non-negative, this must also always be non-negative and in the range from 0.0 to
 1.0. It measures how well the least squares fit the available data, where a
 value of 1.0 represents the strongest correlation between time and the counter

diff --git a/api/corr-stats.md b/api/corr-stats.md
@@ -0,0 +1,40 @@
+# corr() <tag type="toolkit" content="toolkit" />
+
+```sql
+corr(
+    summary StatsSummary2D
+) RETURNS DOUBLE PRECISION
+```
+The correlation coefficient of the [least squares fit][least-squares] line 
+computed from a two-dimensional statistical aggregate. 
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|`corr`|`DOUBLE PRECISION`|The correlation coefficient of the least squares fit line.|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    time_bucket('15 min'::interval, ts) AS bucket,
+    corr(stats_agg(y, x)) AS corr
+FROM foo
+GROUP BY id, time_bucket('15 min'::interval, ts);
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
+[least-squares]:https://en.wikipedia.org/wiki/Least_squares
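To make the returned value concrete, here is a plain Python sketch of the Pearson correlation coefficient for `(y, x)` pairs. It assumes that `corr` on a `StatsSummary2D` reports Pearson's r; the function name and the sample data are hypothetical and only illustrate the arithmetic. In this sketch the result ranges from -1.0 to 1.0, so, unlike the counter version of `corr`, it can be negative.

```python
import math


def corr(pairs):
    """Pearson correlation coefficient of (y, x) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    # Sum of cross products and sums of squared deviations from the means.
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    syy = sum((y - mean_y) ** 2 for y, _ in pairs)
    return sxy / math.sqrt(sxx * syy)


rising = [(2 * x + 1, x) for x in range(10)]    # y grows with x
falling = [(-3 * x + 5, x) for x in range(10)]  # y shrinks as x grows
print(round(corr(rising), 6))   # 1.0
print(round(corr(falling), 6))  # -1.0
```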
diff --git a/api/counter_agg_point.md b/api/counter_agg_point.md
@@ -20,8 +20,9 @@ function.
 
 <highlight type="note">
 Note that both `ts` and `value` can be NULL, but the aggregate is not evaluated
-on NULL values. This means that if the aggregate receives a NULL value, it will
-return NULL, it will not return an error.
+on NULL values. This means that if the aggregate receives only NULL values, it
+returns NULL rather than an error. If it also receives non-NULL values, the NULL
+values are ignored. A row is included only if both `ts` and `value` are non-NULL.
 </highlight>
 
 ### Optional arguments
@@ -44,7 +45,8 @@ extrapolation, but not for other accessor functions.
 <!---Any special notes about the returns-->
 
 ## Sample usage
-This example produces a CounterSummary from timestamps and associated values.
+This example produces a CounterSummary from timestamps and associated values,
+then computes the [`irate_right` accessor](/hyperfunctions/counter_aggs/irate/):
 
 ``` sql
 WITH t as (

diff --git a/api/counter_aggs.md b/api/counter_aggs.md
@@ -12,21 +12,21 @@ additional hyperfunctions, you need to install the
 |-|-|-|-|-|
 |Counter aggregation|Counter aggregates|[`counter_agg`](/hyperfunctions/counter_aggs/counter_agg_point/)|❌|✅|
 |||[`rollup`](/hyperfunctions/counter_aggs/rollup-counter/)|❌|✅|
-|||[`corr`](/hyperfunctions/counter_aggs/corr/)|✅|❌|
-|||[`counter_zero_time`](/hyperfunctions/counter_aggs/counter_zero_time/)|✅|❌|
-|||[`delta`](/hyperfunctions/counter_aggs/delta/)|✅|❌|
-|||[`extrapolated_delta`](/hyperfunctions/counter_aggs/extrapolated_delta/)|✅|❌|
-|||[`extrapolated_rate`](/hyperfunctions/counter_aggs/extrapolated_rate/)|✅|❌|
-|||[`idelta`](/hyperfunctions/counter_aggs/idelta/)|✅|❌|
-|||[`intercept`](/hyperfunctions/counter_aggs/intercept/)|✅|❌|
-|||[`irate`](/hyperfunctions/counter_aggs/irate/)|✅|❌|
-|||[`num_changes`](/hyperfunctions/counter_aggs/num_changes/)|✅|❌|
-|||[`num_elements`](/hyperfunctions/counter_aggs/num_elements/)|✅|❌|
-|||[`num_resets`](/hyperfunctions/counter_aggs/num_resets/)|✅|❌|
-|||[`rate`](/hyperfunctions/counter_aggs/rate/)|✅|❌|
-|||[`slope`](/hyperfunctions/counter_aggs/slope/)|✅|❌|
-|||[`time_delta`](/hyperfunctions/counter_aggs/time_delta/)|✅|❌|
-|||[`with_bounds`](/hyperfunctions/counter_aggs/with_bounds/)|❌|✅|
+|Counter aggregation|Counter aggregate accessors|[`corr`](/hyperfunctions/counter_aggs/corr/)|❌|✅|
+|||[`counter_zero_time`](/hyperfunctions/counter_aggs/counter_zero_time/)|❌|✅|
+|||[`delta`](/hyperfunctions/counter_aggs/delta/)|❌|✅|
+|||[`extrapolated_delta`](/hyperfunctions/counter_aggs/extrapolated_delta/)|❌|✅|
+|||[`extrapolated_rate`](/hyperfunctions/counter_aggs/extrapolated_rate/)|❌|✅|
+|||[`idelta_left`/`idelta_right`](/hyperfunctions/counter_aggs/idelta/)|❌|✅|
+|||[`intercept`](/hyperfunctions/counter_aggs/intercept/)|❌|✅|
+|||[`irate_left`/`irate_right`](/hyperfunctions/counter_aggs/irate/)|❌|✅|
+|||[`num_changes`](/hyperfunctions/counter_aggs/num_changes/)|❌|✅|
+|||[`num_elements`](/hyperfunctions/counter_aggs/num_elements/)|❌|✅|
+|||[`num_resets`](/hyperfunctions/counter_aggs/num_resets/)|❌|✅|
+|||[`rate`](/hyperfunctions/counter_aggs/rate/)|❌|✅|
+|||[`slope`](/hyperfunctions/counter_aggs/slope/)|❌|✅|
+|||[`time_delta`](/hyperfunctions/counter_aggs/time_delta/)|❌|✅|
+|Counter aggregation|Counter aggregate mutators|[`with_bounds`](/hyperfunctions/counter_aggs/with_bounds/)|❌|✅|
 
 
 [hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/

diff --git a/api/covariance.md b/api/covariance.md
@@ -0,0 +1,51 @@
+# covariance() <tag type="toolkit" content="toolkit" />
+
+```sql
+covariance(
+    summary StatsSummary2D,
+    method TEXT 
+) RETURNS DOUBLE PRECISION
+```
+The covariance of the dependent (y) and independent (x) variables in a
+two-dimensional statistical aggregate.
+
+The `method` argument determines whether you calculate a population or a sample covariance.
+You can provide the full names, `population` or `sample`, or the abbreviations `pop` or `samp`.
+These are the only four accepted values for the `method` argument. The default is `sample`.
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|
+
+### Optional arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`method`|`TEXT`|The calculation method, either `population` or `sample` (default)|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|`covariance`|`DOUBLE PRECISION`|The covariance of the values in the statistical aggregate.|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    time_bucket('15 min'::interval, ts) AS bucket,
+    covariance(stats_agg(y, x)) AS covariance
+FROM foo
+GROUP BY id, time_bucket('15 min'::interval, ts);
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
+[least-squares]:https://en.wikipedia.org/wiki/Least_squares
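The two `method` values differ only in the divisor. Here is a plain Python sketch, assuming the textbook definitions: population covariance divides the sum of cross products by n, and sample covariance divides it by n - 1. The function name and the `DDOF` lookup table are hypothetical; the table simply accepts the four spellings listed above and rejects anything else.

```python
# Hypothetical lookup: how much to subtract from n for each accepted spelling.
DDOF = {"population": 0, "pop": 0, "sample": 1, "samp": 1}


def covariance(pairs, method="sample"):
    """Covariance of (y, x) pairs, using the population or sample divisor."""
    if method not in DDOF:
        raise ValueError(f"unsupported method: {method!r}")
    pairs = list(pairs)
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    return sxy / (n - DDOF[method])


pairs = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
print(covariance(pairs))         # 2.0 (sample: 4 / 2)
print(covariance(pairs, "pop"))  # 1.333... (population: 4 / 3)
```

Python 3.10 and later also ship `statistics.covariance(x, y)`, which uses the sample (n - 1) definition and can serve as an independent check of the sample variant.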
diff --git a/api/determination_coeff.md b/api/determination_coeff.md
@@ -0,0 +1,39 @@
+# determination_coeff() <tag type="toolkit" content="toolkit" />
+
+```sql
+determination_coeff(
+    summary StatsSummary2D
+) RETURNS DOUBLE PRECISION
+```
+The coefficient of determination (also known as R squared) of the least squares fit line
+computed from a two-dimensional statistical aggregate.
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|`determination_coeff`|`DOUBLE PRECISION`|The determination coefficient of the least squares fit line.|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    time_bucket('15 min'::interval, ts) AS bucket,
+    determination_coeff(stats_agg(y, x)) AS determination_coeff
+FROM foo
+GROUP BY id, time_bucket('15 min'::interval, ts);
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
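For a straight-line least squares fit of y on x, the coefficient of determination is 1 - SS_res / SS_tot, which for this kind of fit equals the square of the correlation coefficient. Here is a plain Python sketch of that textbook calculation. It is not the Toolkit implementation, and the function names and sample data are hypothetical.

```python
def fit_line(pairs):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def determination_coeff(pairs):
    """R squared: the share of the variance in y explained by the fitted line."""
    slope, intercept = fit_line(pairs)
    mean_y = sum(y for y, _ in pairs) / len(pairs)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for y, x in pairs)
    ss_tot = sum((y - mean_y) ** 2 for y, _ in pairs)
    return 1 - ss_res / ss_tot


pairs = [(1.0, 0.0), (3.0, 1.0), (2.0, 2.0), (5.0, 3.0)]
print(round(determination_coeff(pairs), 4))  # 0.6914
```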
diff --git a/api/intercept.md → api/intercept-counter.md b/api/intercept.md → api/intercept-counter.md
@@ -1,5 +1,5 @@
 # intercept() <tag type="toolkit" content="toolkit" />
-The intercept of the least squares fit line computed from the adjusted counter
+The intercept of the [least squares fit][least-squares] line computed from the adjusted counter
 values and times input in the CounterSummary. This corresponds to the projected
 value at the PostgreSQL epoch (2000-01-01 00:00:00+00). This is useful for
 drawing the best fit line on a graph, using the slope and the intercept.
@@ -44,3 +44,4 @@ FROM (
 
 
 [hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
+[least-squares]:https://en.wikipedia.org/wiki/Least_squares
diff --git a/api/intercept-stats.md b/api/intercept-stats.md
@@ -0,0 +1,41 @@
+# intercept() <tag type="toolkit" content="toolkit" />
+
+```sql
+intercept(
+    summary StatsSummary2D
+) RETURNS DOUBLE PRECISION
+```
+
+The y intercept of the [least squares fit][least-squares] line computed 
+from a two-dimensional statistical aggregate. 
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|
+
+## Returns
+
+|Name|Type|Description|
+|-|-|-|
+|`intercept`|`DOUBLE PRECISION`|The y intercept of the least squares fit line.|
+
+## Sample usage
+
+```sql
+SELECT
+    id,
+    time_bucket('15 min'::interval, ts) AS bucket,
+    intercept(stats_agg(y, x)) AS intercept
+FROM foo
+GROUP BY id, time_bucket('15 min'::interval, ts);
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
+[least-squares]:https://en.wikipedia.org/wiki/Least_squares
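The y intercept follows from the means and the slope of the least squares fit: intercept = mean(y) - slope * mean(x), because the fitted line passes through the point of means. Here is a plain Python sketch of that relationship, assuming a two-dimensional summary built from `(y, x)` pairs as in `stats_agg(y, x)`. The function name and sample data are hypothetical.

```python
def intercept(pairs):
    """y intercept of the ordinary least squares line fitted to (y, x) pairs."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    slope = sxy / sxx
    # The fitted line passes through (mean_x, mean_y); extend it back to x = 0.
    return mean_y - slope * mean_x


exact = [(3.0 * x + 2.0, float(x)) for x in range(5)]  # y = 3x + 2
noisy = [(1.0, 0.0), (3.0, 1.0), (2.0, 2.0), (5.0, 3.0)]
print(intercept(exact))            # 2.0
print(round(intercept(noisy), 6))  # 1.1
```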
diff --git a/api/kurtosis.md b/api/kurtosis.md
@@ -0,0 +1,57 @@
+# kurtosis(), kurtosis_y(), and kurtosis_x() <tag type="toolkit">Toolkit</tag>
+
+```SQL
+kurtosis(summary StatsSummary1D, method TEXT) RETURNS DOUBLE PRECISION
+```
+```SQL
+kurtosis_y(summary StatsSummary2D, method TEXT) RETURNS DOUBLE PRECISION
+```
+```SQL
+kurtosis_x(summary StatsSummary2D, method TEXT) RETURNS DOUBLE PRECISION
+```
+
+Calculate the [kurtosis][kurtosis], or the fourth standardized moment, of the values contained
+in a statistical aggregate. In a two-dimensional [`stats_agg`][stats-agg], use
+the `_y` and `_x` forms to access the kurtosis of the dependent and independent variables.
+
+The `method` argument determines whether you calculate a population or a sample kurtosis.
+You can provide the full names, `population` or `sample`, or the abbreviations `pop` or `samp`.
+These are the only four accepted values for the `method` argument. The default is `sample`.
+
+For more information about statistical aggregate functions, see the
+[hyperfunctions documentation][hyperfunctions-stats-agg].
+
+## Required arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`summary`|`StatsSummary1D`/`StatsSummary2D`|The already constructed data structure from a previous [`stats_agg`][stats-agg] call|
+
+### Optional arguments
+
+|Name|Type|Description|
+|-|-|-|
+|`method`|`TEXT`|The calculation method, either `population` or `sample` (default)|
+
+## Returns
+
+|Column|Type|Description|
+|-|-|-|
+|`kurtosis`/`kurtosis_y`/`kurtosis_x`|`DOUBLE PRECISION`|The kurtosis of the values in the statistical aggregate|
+
+## Sample usage
+
+```SQL
+SELECT kurtosis_y(stats_agg(data, data))
+FROM generate_series(0, 100) data;
+```
+```output
+  kurtosis_y 
+------------
+    1.78195
+```
+
+
+[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
+[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
+[kurtosis]: https://en.wikipedia.org/wiki/Kurtosis
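The following plain Python sketch shows how the `method` argument can change the result. It computes the (non-excess) kurtosis as the fourth central moment divided by the squared second central moment, and it assumes that the sample variant divides both moment sums by n - 1 instead of n. That divisor convention is an assumption, not taken from the Toolkit source; it is used here because it reproduces the `1.78195` shown in the sample output for `generate_series(0, 100)`. The function name is hypothetical.

```python
def kurtosis(values, method="sample"):
    """Non-excess kurtosis: fourth central moment over the squared variance."""
    # Assumption: sample divides the moment sums by n - 1, population by n.
    # Unknown method names raise KeyError.
    ddof = {"population": 0, "pop": 0, "sample": 1, "samp": 1}[method]
    values = list(values)
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / (n - ddof)
    m4 = sum((v - mean) ** 4 for v in values) / (n - ddof)
    return m4 / m2 ** 2


data = range(0, 101)  # same values as generate_series(0, 100)
print(round(kurtosis(data), 5))         # 1.78195
print(round(kurtosis(data, "pop"), 5))  # 1.79976
```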
diff --git a/api/num_vals.md → api/num_vals-pct.md b/api/num_vals.md → api/num_vals-pct.md
@@ -1,7 +1,7 @@
 # num_vals()  <tag type="toolkit">Toolkit</tag>
 
 ```SQL
-num_vals(sketch UddSketch) RETURNS DOUBLE PRECISION
+num_vals(sketch UddSketch) RETURNS DOUBLE PRECISION
 ```
 ```SQL
 num_vals(digest tdigest) RETURNS DOUBLE PRECISION
@@ -13,8 +13,7 @@ aggregate. You can compute a single percentile estimator by extracting the
 `num_vals` from the percentile estimator. You do not need to specify a separate
 `count` aggregate.
 
-*   For more information about statistical aggregate functions, see the
-    [hyperfunctions documentation][hyperfunctions-stats-agg].
+
 *   For more information about percentile approximation functions, see the
     [hyperfunctions documentation][hyperfunctions-percentile-approx].