Merge pull request #11 from andreimatei/dist-sql-rfc-types-of-aggregators

list the types of aggregators
andreimatei committed Apr 13, 2016
2 parents fe8d2f1 + 1468c41 commit 4b780dd
Showing 1 changed file with 47 additions and 0 deletions: docs/RFCS/distributed_sql.md
@@ -17,6 +17,7 @@
* [Example 1](#example-1)
* [Example 2](#example-2)
* [Example 3](#example-3)
* [Types of aggregators](#types-of-aggregators)
* [From logical to physical](#from-logical-to-physical)
* [Processors](#processors)
* [Joins](#joins)
@@ -431,6 +432,52 @@ AGGREGATOR final:
Composition: src -> countdistinctmin -> final
```

### Types of aggregators

- `TABLE READER` is a special aggregator, with no input stream. It's configured
with spans of a table or index and the schema that it needs to read.
Like every other aggregator, it can be configured with a programmable output
filter.
- `PROGRAM` is a fully programmable no-grouping aggregator. It runs a "program"
on each individual row. The program can drop the row, or modify it
arbitrarily.
- `JOIN` performs a join on two streams, with equality constraints between
certain columns. The aggregator is grouped on the columns that are
constrained to be equal. See [Stream joins](#stream-joins).
- `JOIN READER` performs point-lookups for rows with the keys indicated by the
input stream. It can do so by performing (potentially remote) KV reads, or by
setting up remote flows. See [Join-by-lookup](#join-by-lookup) and
[On-the-fly flows setup](#on-the-fly-flows-setup).
- `MUTATE` performs insertions/deletions/updates to KV. See section TODO.
- `SET OPERATION` takes several inputs and performs set arithmetic on them
(union, difference).
- `AGGREGATOR` is the one that does "aggregation" in the SQL sense. It groups
rows and computes an aggregate for each group. The group is configured using
the group key. `AGGREGATOR` can be configured with one or more aggregation
functions:
- `SUM`
- `COUNT`
- `COUNT DISTINCT`
- `DISTINCT`
`AGGREGATOR`'s output schema consists of the group key, plus a configurable
subset of the generated aggregated values. The optional output filter has
access to the group key and all the aggregated values (i.e. it can use even
values that are not ultimately output).
- `SORT` sorts the input according to a configurable set of columns. Note that
this is a no-grouping aggregator, hence it can be distributed arbitrarily to
the data producers. This means, of course, that it doesn't produce a global
ordering; instead, it just guarantees an intra-stream ordering on each
physical output stream. The global ordering, when needed, is achieved by an
input synchronizer of a grouped processor (such as `LIMIT` or `FINAL`).
- `LIMIT` is a single-group aggregator that stops after reading a configured
number of input rows.
- `INTENT-COLLECTOR` is a single-group aggregator, scheduled on the gateway,
that receives all the intents generated by a `MUTATE` and keeps track of them
in memory until the transaction is committed.
- `FINAL` is a single-group aggregator, scheduled on the gateway, that collects
the results of the query. This aggregator will be hooked up to the pgwire
connection to the client.
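
To make the common shape of these aggregators concrete, here is a hedged Go
sketch (all names are hypothetical illustrations, not the actual CockroachDB
code): a minimal `Aggregator` interface, plus a toy `limit` implementation of
the single-group `LIMIT` aggregator described above.

```go
package main

import "fmt"

// Row is a simplified datum tuple (hypothetical; the RFC leaves the
// concrete row representation to the implementation).
type Row []interface{}

// Aggregator sketches the shape shared by the processors above:
// consume an input stream, emit an output stream.
type Aggregator interface {
	// Push feeds one input row and returns any rows ready for output.
	Push(r Row) []Row
	// Close flushes any buffered state (e.g. per-group results).
	Close() []Row
}

// limit models the LIMIT aggregator: a single group that stops
// passing rows through after n input rows.
type limit struct {
	n, seen int
}

func (l *limit) Push(r Row) []Row {
	if l.seen >= l.n {
		return nil // past the limit: drop the row
	}
	l.seen++
	return []Row{r}
}

func (l *limit) Close() []Row { return nil } // no buffered state

func main() {
	var a Aggregator = &limit{n: 2}
	var out []Row
	for _, r := range []Row{{1}, {2}, {3}} {
		out = append(out, a.Push(r)...)
	}
	out = append(out, a.Close()...)
	fmt.Println(len(out)) // prints 2
}
```

A grouped aggregator (e.g. `AGGREGATOR` or `SORT` followed by a synchronizer)
would instead buffer rows in `Push` and emit results from `Close`.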

## From logical to physical

To distribute the computation that was described in terms of aggregators and
