Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Aggregation overhaul — return to Profunctor and semi-aggregations
This PR makes a number of changes to how aggregation works in Rel8. The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like: ```haskell fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do foo <- each fooSchema pure (sum foo.x, count foo.x) ``` The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case. The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`. The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly. Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s. This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`. It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former). Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`. Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions. `filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s. `orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
- Loading branch information