Add stats agg api (github#524)
* Beginning of the stats_agg docs

Still WIP for now, but pushing to begin the process.

* Continuing adding stats_agg API elements

* Apply suggestions from code review

Co-authored-by: Lana Brindley <github@lanabrindley.com>

* Add missing links

* Fix tables

* Apply suggestions from code review

Co-authored-by: Lana Brindley <github@lanabrindley.com>

Co-authored-by: Lana Brindley <github@lanabrindley.com>
davidkohn88 and Loquacity authored Oct 27, 2021
1 parent 84ca585 commit 9c91498
Showing 27 changed files with 1,084 additions and 103 deletions.
46 changes: 46 additions & 0 deletions api/average-stats.md
@@ -0,0 +1,46 @@
# average(), average_y(), and average_x() <tag type="toolkit">Toolkit</tag>

```SQL
average(summary StatsSummary1D) RETURNS DOUBLE PRECISION
```
```SQL
average_y(summary StatsSummary2D) RETURNS DOUBLE PRECISION
```
```SQL
average_x(summary StatsSummary2D) RETURNS DOUBLE PRECISION
```

Get the average of the values contained in a statistical aggregate.
In a two-dimensional [`stats_agg`][stats-agg], use the `_y`/`_x` forms to access the
average of the dependent and independent variables, respectively.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary1D`/`StatsSummary2D`|The already constructed data structure from a previous [`stats_agg`][stats-agg] call|

## Returns

|Column|Type|Description|
|-|-|-|
|`average`/`average_y`/`average_x`|`DOUBLE PRECISION`|The average of the values in the statistical aggregate|

## Sample usage

```SQL
SELECT average(stats_agg(data))
FROM generate_series(0, 100) data;
```
```output
average
-----------
50
```
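
The two-dimensional accessors can be applied to a single, already constructed aggregate. The following is a minimal sketch, not output from a real system: the derived columns `x` and `y` and the names `s` and `points` are made up for illustration, and it assumes the two-argument form `stats_agg(y, x)` with the dependent variable first.

```SQL
-- Build the two-dimensional aggregate once, then read both averages from it.
WITH s AS (
    SELECT stats_agg(y, x) AS summary
    FROM (
        -- Hypothetical data: x runs from 0 to 100 and y is twice x.
        SELECT (data * 2)::DOUBLE PRECISION AS y,
               data::DOUBLE PRECISION AS x
        FROM generate_series(0, 100) data
    ) AS points
)
SELECT
    average_y(summary) AS avg_y,  -- average of the dependent variable
    average_x(summary) AS avg_x   -- average of the independent variable
FROM s;
```

Building the summary in a CTE avoids recomputing the aggregate for each accessor.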


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
File renamed without changes.
2 changes: 1 addition & 1 deletion api/corr.md → api/corr-counter.md
@@ -1,6 +1,6 @@
# corr() <tag type="toolkit" content="toolkit" />
The correlation coefficient of the least squares fit line of the adjusted
-counter value. Given that the slope of a line for any counter value must be
+counter value and epoch value of the time column. Given that the slope of a line for any counter value must be
non-negative, this must also always be non-negative and in the range from 0.0 to
1.0. It measures how well the least squares fit the available data, where a
value of 1.0 represents the strongest correlation between time and the counter
40 changes: 40 additions & 0 deletions api/corr-stats.md
@@ -0,0 +1,40 @@
# corr() <tag type="toolkit" content="toolkit" />

```sql
corr(
summary StatsSummary2D
) RETURNS DOUBLE PRECISION
```
The correlation coefficient of the [least squares fit][least-squares] line
computed from a two-dimensional statistical aggregate.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|

## Returns

|Name|Type|Description|
|-|-|-|
|`corr`|`DOUBLE PRECISION`|The correlation coefficient of the least squares fit line.|

## Sample usage

```sql
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    corr(stats_agg(y, x)) AS correlation
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
[least-squares]:https://en.wikipedia.org/wiki/Least_squares
8 changes: 5 additions & 3 deletions api/counter_agg_point.md
@@ -20,8 +20,9 @@ function.

<highlight type="note">
Note that both `ts` and `value` can be NULL, but the aggregate is not evaluated
-on NULL values. This means that if the aggregate receives a NULL value, it will
-return NULL, it will not return an error.
+on NULL values. This means that if the aggregate receives only NULL values, it
+returns NULL rather than raising an error. If non-NULL values are also received, the NULL
+values are ignored. Both `ts` and `value` must be non-NULL for the row to be included.
</highlight>

### Optional arguments
@@ -44,7 +45,8 @@ extrapolation, but not for other accessor functions.
<!---Any special notes about the returns-->

## Sample usage
-This example produces a CounterSummary from timestamps and associated values.
+This example produces a CounterSummary from timestamps and associated values,
+then computes the [`irate_right` accessor](/hyperfunctions/counter_aggs/irate/):

``` sql
WITH t as (
30 changes: 15 additions & 15 deletions api/counter_aggs.md
@@ -12,21 +12,21 @@ additional hyperfunctions, you need to install the
|-|-|-|-|-|
|Counter aggregation|Counter aggregates|[`counter_agg`](/hyperfunctions/counter_aggs/counter_agg_point/)|||
|||[`rollup`](/hyperfunctions/counter_aggs/rollup-counter/)|||
-|||[`corr`](/hyperfunctions/counter_aggs/corr/)|||
-|||[`counter_zero_time`](/hyperfunctions/counter_aggs/counter_zero_time/)|||
-|||[`delta`](/hyperfunctions/counter_aggs/delta/)|||
-|||[`extrapolated_delta`](/hyperfunctions/counter_aggs/extrapolated_delta/)|||
-|||[`extrapolated_rate`](/hyperfunctions/counter_aggs/extrapolated_rate/)|||
-|||[`idelta`](/hyperfunctions/counter_aggs/idelta/)|||
-|||[`intercept`](/hyperfunctions/counter_aggs/intercept/)|||
-|||[`irate`](/hyperfunctions/counter_aggs/irate/)|||
-|||[`num_changes`](/hyperfunctions/counter_aggs/num_changes/)|||
-|||[`num_elements`](/hyperfunctions/counter_aggs/num_elements/)|||
-|||[`num_resets`](/hyperfunctions/counter_aggs/num_resets/)|||
-|||[`rate`](/hyperfunctions/counter_aggs/rate/)|||
-|||[`slope`](/hyperfunctions/counter_aggs/slope/)|||
-|||[`time_delta`](/hyperfunctions/counter_aggs/time_delta/)|||
-|||[`with_bounds`](/hyperfunctions/counter_aggs/with_bounds/)|||
+|Counter aggregation|Counter aggregate accessors|[`corr`](/hyperfunctions/counter_aggs/corr/)|||
+|||[`counter_zero_time`](/hyperfunctions/counter_aggs/counter_zero_time/)|||
+|||[`delta`](/hyperfunctions/counter_aggs/delta/)|||
+|||[`extrapolated_delta`](/hyperfunctions/counter_aggs/extrapolated_delta/)|||
+|||[`extrapolated_rate`](/hyperfunctions/counter_aggs/extrapolated_rate/)|||
+|||[`idelta_left`/`idelta_right`](/hyperfunctions/counter_aggs/idelta/)|||
+|||[`intercept`](/hyperfunctions/counter_aggs/intercept/)|||
+|||[`irate_left`/`irate_right`](/hyperfunctions/counter_aggs/irate/)|||
+|||[`num_changes`](/hyperfunctions/counter_aggs/num_changes/)|||
+|||[`num_elements`](/hyperfunctions/counter_aggs/num_elements/)|||
+|||[`num_resets`](/hyperfunctions/counter_aggs/num_resets/)|||
+|||[`rate`](/hyperfunctions/counter_aggs/rate/)|||
+|||[`slope`](/hyperfunctions/counter_aggs/slope/)|||
+|||[`time_delta`](/hyperfunctions/counter_aggs/time_delta/)|||
+|Counter aggregation|Counter aggregate mutators|[`with_bounds`](/hyperfunctions/counter_aggs/with_bounds/)|||


[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
51 changes: 51 additions & 0 deletions api/covariance.md
@@ -0,0 +1,51 @@
# covariance() <tag type="toolkit" content="toolkit" />

```sql
covariance(
summary StatsSummary2D,
method TEXT
) RETURNS DOUBLE PRECISION
```
The covariance between the dependent (`y`) and independent (`x`) variables of a
two-dimensional statistical aggregate, as used when computing the
[least squares fit][least-squares] line.

The `method` argument determines whether the population or the sample covariance is calculated.
Accepted values are `population`, `sample`, and their abbreviations `pop` and `samp`.
No other values are accepted. The default is `sample`.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|

### Optional arguments

|Name|Type|Description|
|-|-|-|
|`method`|`TEXT`|The method for the calculation: `population` or `sample` (default)|

## Returns

|Name|Type|Description|
|-|-|-|
|`covariance`|`DOUBLE PRECISION`|The covariance of the values in the two-dimensional statistical aggregate.|

## Sample usage

```sql
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    covariance(stats_agg(y, x)) AS covar
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```
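
The optional `method` argument can also be passed explicitly. The following sketch reuses the table `foo` from the example above (a placeholder, not a real schema) and passes `method` positionally, as the signature suggests. The aliases `covar_pop` and `covar_samp` are illustrative only.

```sql
-- Compare the population and sample covariance for the same buckets.
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    covariance(stats_agg(y, x), 'population') AS covar_pop,
    covariance(stats_agg(y, x), 'sample') AS covar_samp
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```

For buckets with many rows the two values should be close, because the two definitions differ only in dividing by `n` or by `n - 1`.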


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
[least-squares]:https://en.wikipedia.org/wiki/Least_squares
39 changes: 39 additions & 0 deletions api/determination_coeff.md
@@ -0,0 +1,39 @@
# determination_coeff() <tag type="toolkit" content="toolkit" />

```sql
determination_coeff(
summary StatsSummary2D
) RETURNS DOUBLE PRECISION
```
The coefficient of determination (or the R squared) of the least squares fit line
computed from a two-dimensional statistical aggregate.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|

## Returns

|Name|Type|Description|
|-|-|-|
|`determination_coeff`|`DOUBLE PRECISION`|The determination coefficient of the least squares fit line.|

## Sample usage

```sql
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    determination_coeff(stats_agg(y, x)) AS r_squared
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```
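
For a simple least squares fit with one independent variable, the coefficient of determination equals the square of the correlation coefficient. The following sketch illustrates that relationship using the placeholder table `foo` from the example above. It assumes that both accessors follow the standard definitions, in which case the two columns should agree up to floating-point rounding.

```sql
-- Compare R squared with the squared correlation coefficient per bucket.
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    determination_coeff(stats_agg(y, x)) AS r_squared,
    power(corr(stats_agg(y, x)), 2) AS corr_squared
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```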


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
3 changes: 2 additions & 1 deletion api/intercept.md → api/intercept-counter.md
@@ -1,5 +1,5 @@
# intercept() <tag type="toolkit" content="toolkit" />
-The intercept of the least squares fit line computed from the adjusted counter
+The intercept of the [least squares fit][least-squares] line computed from the adjusted counter
values and times input in the CounterSummary. This corresponds to the projected
value at the PostgreSQL epoch (2000-01-01 00:00:00+00). This is useful for
drawing the best fit line on a graph, using the slope and the intercept.
@@ -44,3 +44,4 @@ FROM (


[hyperfunctions-counter-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/counter-aggregation/
+[least-squares]:https://en.wikipedia.org/wiki/Least_squares
41 changes: 41 additions & 0 deletions api/intercept-stats.md
@@ -0,0 +1,41 @@
# intercept() <tag type="toolkit" content="toolkit" />

```sql
intercept(
summary StatsSummary2D
) RETURNS DOUBLE PRECISION
```

The y intercept of the [least squares fit][least-squares] line computed
from a two-dimensional statistical aggregate.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary2D`|The input StatsSummary from a [`stats_agg` call][stats-agg]|

## Returns

|Name|Type|Description|
|-|-|-|
|`intercept`|`DOUBLE PRECISION`|The y intercept of the least squares fit line.|

## Sample usage

```sql
SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    intercept(stats_agg(y, x)) AS y_intercept
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts);
```
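
Because the accessors take an already constructed `StatsSummary2D`, several of them can read from one aggregate. The following sketch builds the summary once in a CTE and then applies three accessors from this API. The table `foo`, the CTE name `buckets`, and the column aliases are placeholders for illustration.

```sql
-- Build one two-dimensional summary per bucket.
WITH buckets AS (
    SELECT
        id,
        time_bucket('15 min'::interval, ts) AS bucket,
        stats_agg(y, x) AS summary
    FROM foo
    GROUP BY id, time_bucket('15 min'::interval, ts)
)
-- Read several statistics from the same summary.
SELECT
    id,
    bucket,
    intercept(summary) AS y_intercept,
    corr(summary) AS correlation,
    determination_coeff(summary) AS r_squared
FROM buckets;
```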


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
[least-squares]:https://en.wikipedia.org/wiki/Least_squares
57 changes: 57 additions & 0 deletions api/kurtosis.md
@@ -0,0 +1,57 @@
# kurtosis() / kurtosis_y() / kurtosis_x() <tag type="toolkit">Toolkit</tag>

```SQL
kurtosis(summary StatsSummary1D, method TEXT) RETURNS DOUBLE PRECISION
```
```SQL
kurtosis_y(summary StatsSummary2D, method TEXT) RETURNS DOUBLE PRECISION
```
```SQL
kurtosis_x(summary StatsSummary2D, method TEXT) RETURNS DOUBLE PRECISION
```

Calculate the [kurtosis][kurtosis], or the 4th statistical moment, of the values contained
in a statistical aggregate. In a two-dimensional [`stats_agg`][stats-agg], use
the `_y`/`_x` forms to access the `kurtosis` of the dependent and independent variables, respectively.

The `method` argument determines whether the population or the sample kurtosis is calculated.
Accepted values are `population`, `sample`, and their abbreviations `pop` and `samp`.
No other values are accepted. The default is `sample`.

For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

## Required arguments

|Name|Type|Description|
|-|-|-|
|`summary`|`StatsSummary1D`/`StatsSummary2D`|The already constructed data structure from a previous [`stats_agg`][stats-agg] call|

### Optional arguments

|Name|Type|Description|
|-|-|-|
|`method`|`TEXT`|The method for the calculation: `population` or `sample` (default)|

## Returns

|Column|Type|Description|
|-|-|-|
|`kurtosis`/`kurtosis_y`/`kurtosis_x`|`DOUBLE PRECISION`|The kurtosis of the values in the statistical aggregate|

## Sample usage

```SQL
SELECT kurtosis_y(stats_agg(data, data))
FROM generate_series(0, 100) data;
```
```output
kurtosis_y
------------
1.78195
```
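
To make the choice of `method` explicit, both variants can be computed side by side. This is a sketch under the assumption that `method` is passed positionally, as in the signatures above. The column aliases are illustrative, and no expected output is shown.

```SQL
-- Population versus sample kurtosis of the same one-dimensional aggregate.
SELECT
    kurtosis(stats_agg(data), 'population') AS kurtosis_pop,
    kurtosis(stats_agg(data), 'sample') AS kurtosis_samp
FROM generate_series(0, 100) data;
```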


[hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/
[stats-agg]:/hyperfunctions/stats_aggs/stats_agg/
[kurtosis]: https://en.wikipedia.org/wiki/Kurtosis
5 changes: 2 additions & 3 deletions api/num_vals.md → api/num_vals-pct.md
@@ -1,7 +1,7 @@
# num_vals() <tag type="toolkit">Toolkit</tag>

```SQL
num_vals(sketch UddSketch) RETURNS DOUBLE PRECISION
num_vals(StatsSummary1D) RETURNS DOUBLE PRECISION
```
```SQL
num_vals(digest tdigest) RETURNS DOUBLE PRECISION
@@ -13,8 +13,7 @@ aggregate. You can compute a single percentile estimator by extracting the
`num_vals` from the percentile estimator. You do not need to specify a separate
`count` aggregate.

* For more information about statistical aggregate functions, see the
[hyperfunctions documentation][hyperfunctions-stats-agg].

* For more information about percentile approximation functions, see the
[hyperfunctions documentation][hyperfunctions-percentile-approx].
