Skip to content

Commit

Permalink
Aggregation overhaul — return to Profunctor and semi-aggregations
Browse files Browse the repository at this point in the history
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum x, count x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
  • Loading branch information
shane-circuithub committed May 8, 2023
1 parent e2a77b7 commit 55f076f
Show file tree
Hide file tree
Showing 25 changed files with 844 additions and 494 deletions.
1 change: 1 addition & 0 deletions rel8.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ library

other-modules:
Rel8.Aggregate
Rel8.Aggregate.Fold

Rel8.Column
Rel8.Column.ADT
Expand Down
54 changes: 30 additions & 24 deletions src/Rel8.hs
Original file line number Diff line number Diff line change
Expand Up @@ -135,9 +135,6 @@ module Rel8
, BuildADT, buildADT
, ConstructADT, constructADT

-- *** Other ADT operations
, AggregateADT, aggregateADT

-- *** Miscellaneous notes
-- $misc-notes

Expand All @@ -147,7 +144,6 @@ module Rel8
, ConstructHKD, constructHKD
, DeconstructHKD, deconstructHKD
, NameHKD, nameHKD
, AggregateHKD, aggregateHKD

-- ** Table schemas
, TableSchema(..)
Expand Down Expand Up @@ -253,24 +249,34 @@ module Rel8
, loop

-- ** Aggregation
, Aggregate
, Aggregates
, Aggregator
, Aggregator1
, Aggregator'
, Fold (Semi, Full)
, semi
, full
, aggregate
, aggregate1
, filterWhere
, filterWhereOptional
, distinctAggregate
, orderAggregateBy
, optionalAggregate
, countRows
, groupBy
, listAgg, listAggExpr
, groupBy, groupByOn
, listAgg, listAggOn, listAggExpr, listAggExprOn
, mode
, nonEmptyAgg, nonEmptyAggExpr
, DBMax, max
, DBMin, min
, DBSum, sum, sumWhere, avg
, nonEmptyAgg, nonEmptyAggOn, nonEmptyAggExpr, nonEmptyAggExprOn
, DBMax, max, maxOn
, DBMin, min, minOn
, DBSum, sum, sumOn, sumWhere, avg, avgOn
, DBString, stringAgg
, count
, count, countOn
, countStar
, countDistinct
, countWhere
, and
, or
, countDistinct, countDistinctOn
, countWhere, countWhereOn
, and, andOn
, or, orOn

-- ** Ordering
, orderBy
Expand All @@ -288,19 +294,18 @@ module Rel8
, partitionBy
, orderPartitionBy
, cumulative
, cumulative_
, currentRow
, rowNumber
, rank
, denseRank
, percentRank
, cumeDist
, ntile
, lag
, lead
, firstValue
, lastValue
, nthValue
, lag, lagOn
, lead, leadOn
, firstValue, firstValueOn
, lastValue, lastValueOn
, nthValue, nthValueOn
, indexed

-- ** Bindings
Expand Down Expand Up @@ -352,6 +357,7 @@ import Prelude ()

-- rel8
import Rel8.Aggregate
import Rel8.Aggregate.Fold
import Rel8.Column
import Rel8.Column.ADT
import Rel8.Column.Either
Expand All @@ -374,7 +380,7 @@ import Rel8.Expr.Order
import Rel8.Expr.Serialize
import Rel8.Expr.Sequence
import Rel8.Expr.Text ( like, ilike )
import Rel8.Expr.Window hiding ( cumulative )
import Rel8.Expr.Window
import Rel8.Generic.Rel8able ( KRel8able, Rel8able )
import Rel8.Order
import Rel8.Query
Expand Down
248 changes: 181 additions & 67 deletions src/Rel8/Aggregate.hs
Original file line number Diff line number Diff line change
@@ -1,83 +1,197 @@
{-# language DataKinds #-}
{-# language FlexibleContexts #-}
{-# language FlexibleInstances #-}
{-# language MultiParamTypeClasses #-}
{-# language RankNTypes #-}
{-# language GADTs #-}
{-# language KindSignatures #-}
{-# language ScopedTypeVariables #-}
{-# language StandaloneKindSignatures #-}
{-# language TypeFamilies #-}
{-# language UndecidableInstances #-}

module Rel8.Aggregate
( Aggregate(..), zipOutputs
, unsafeMakeAggregate
, Aggregates
( Aggregator' (Aggregator)
, Aggregator
, Aggregator1
, semi
, full
, filterWhereExplicit
, unsafeMakeAggregator
)
where

-- base
import Control.Applicative ( liftA2 )
import Data.Functor.Identity ( Identity( Identity ) )
import Data.Kind ( Constraint, Type )
import Control.Applicative (liftA2)
import Data.Kind (Type)
import Prelude

-- profunctors
import Data.Profunctor ( dimap )

-- opaleye
import qualified Opaleye.Aggregate as Opaleye
import qualified Opaleye.Internal.HaskellDB.PrimQuery as Opaleye
import qualified Opaleye.Internal.Column as Opaleye
import qualified Opaleye.Internal.MaybeFields as Opaleye
import qualified Opaleye.Internal.Operators as Opaleye

-- rel8
import Rel8.Expr ( Expr )
import Rel8.Schema.HTable.Identity ( HIdentity(..) )
import qualified Rel8.Schema.Kind as K
import Rel8.Schema.Null ( Sql )
import Rel8.Table
( Table, Columns, Context, fromColumns, toColumns
, FromExprs, fromResult, toResult
, Transpose
-- product-profunctor
import Data.Profunctor.Product
( ProductProfunctor, purePP, (****)
, SumProfunctor, (+++!)
)
import Rel8.Table.Transpose ( Transposes )
import Rel8.Type ( DBType )


-- | 'Aggregate' is a special context used by 'Rel8.aggregate'.
type Aggregate :: K.Context
newtype Aggregate a = Aggregate (Opaleye.Aggregator () (Expr a))


instance Sql DBType a => Table Aggregate (Aggregate a) where
type Columns (Aggregate a) = HIdentity a
type Context (Aggregate a) = Aggregate
type FromExprs (Aggregate a) = a
type Transpose to (Aggregate a) = to a

toColumns = HIdentity
fromColumns (HIdentity a) = a
toResult = HIdentity . Identity
fromResult (HIdentity (Identity a)) = a


-- | @Aggregates a b@ means that the columns in @a@ are all 'Aggregate's
-- for the 'Expr' columns in @b@.
type Aggregates :: Type -> Type -> Constraint
class Transposes Aggregate Expr aggregates exprs => Aggregates aggregates exprs
instance Transposes Aggregate Expr aggregates exprs => Aggregates aggregates exprs


zipOutputs :: ()
=> (Expr a -> Expr b -> Expr c) -> Aggregate a -> Aggregate b -> Aggregate c
zipOutputs f (Aggregate a) (Aggregate b) = Aggregate (liftA2 f a b)

-- profunctors
import Data.Profunctor (Profunctor, dimap)

unsafeMakeAggregate :: forall (input :: Type) (output :: Type) n n' a a'. ()
=> (Expr input -> Opaleye.PrimExpr)
-> (Opaleye.PrimExpr -> Expr output)
-> Opaleye.Aggregator (Opaleye.Field_ n a) (Opaleye.Field_ n' a')
-> Expr input
-> Aggregate output
unsafeMakeAggregate input output aggregator expr =
Aggregate $ dimap in_ out aggregator
where out = output . Opaleye.unColumn
in_ = Opaleye.Column . input . const expr
-- rel8
import Rel8.Expr (Expr)
import Rel8.Expr.Opaleye (toPrimExpr, toColumn)
import Rel8.Aggregate.Fold (Fallback (Empty, Fallback), Fold (Full, Semi))

-- semigroupoids
import Data.Functor.Apply (Apply, liftF2)


-- | 'Aggregator'' is the most general form of \"aggregator\", of which
-- 'Aggregator' and 'Aggregator1' are special cases. 'Aggregator''s are
-- comprised of aggregation functions and/or @GROUP BY@ clauses.
--
-- Aggregation functions operating on individual 'Rel8.Expr's such as
-- 'Rel8.sum' can be combined into 'Aggregator's operating on larger types
-- using the 'Applicative', 'Profunctor' and 'ProductProfunctor' interfaces.
-- Working with 'Profunctor's can sometimes be awkward so for every 'Rel8.sum'
-- we also provide a 'Rel8.sumOn' which bundles an 'Data.Profunctor.lmap'. For
-- complex aggregations, we recommend using these functions along with
-- @ApplicativeDo@, @BlockArguments@, @OverloadedRecordDot@ and
-- @RecordWildCards@:
--
-- @
--
-- data Input f = Input
-- { orderId :: Column f OrderId
-- , customerId :: Column f CustomerId
-- , productId :: Column f ProductId
-- , quantity :: Column f Int64
-- , price :: Column f Scientific
-- }
-- deriving (Generic, Rel8able)
--
--
-- totalPrice :: Input Expr -> Expr Scientific
-- totalPrice input = fromIntegral input.quantity * input.price
--
--
-- data Result f = Result
-- { customerId :: Column f CustomerId
-- , totalOrders :: Column f Int64
-- , productsOrdered :: Column f Int64
-- , totalPrice :: Column Scientific
-- }
-- deriving (Generic, Rel8able)
--
--
-- allResults :: Query (Result Expr)
-- allResults =
-- aggregate
-- do
-- customerId <- groupByOn (.customerId)
-- totalOrders <- countDistinctOn (.orderId)
-- productsOrdered <- countDistinctOn (.productId)
-- totalPrice <- sumOn totalPrice
-- pure Result {..}
-- do
-- order <- each orderSchema
-- orderLine <- each orderLineSchema
-- where_ $ order.id ==. orderLine.orderId
-- pure
-- Input
-- { orderId = order.id
-- , customerId = order.customerId
-- , productId = orderLine.productId
-- , quantity = orderLine.quantity
-- , price = orderLine.price
-- }
-- @
type Aggregator' :: Fold -> Type -> Type -> Type
data Aggregator' fold i a = Aggregator !(Fallback fold a) !(Opaleye.Aggregator i a)


instance Profunctor (Aggregator' fold) where
dimap f g (Aggregator fallback a) =
Aggregator (fmap g fallback) (dimap f g a)


instance ProductProfunctor (Aggregator' fold) where
purePP = pure
(****) = (<*>)


instance SumProfunctor (Aggregator' fold) where
Aggregator fallback a +++! Aggregator fallback' b =
flip Aggregator (a +++! b) $ case fallback of
Empty -> case fallback' of
Empty -> Empty
Fallback x -> Fallback (Right x)
Fallback x -> Fallback (Left x)


instance Functor (Aggregator' fold i) where
fmap = dimap id


instance Apply (Aggregator' fold i) where
liftF2 f (Aggregator fallback a) (Aggregator fallback' b) =
Aggregator (liftF2 f fallback fallback') (liftA2 f a b)


instance Applicative (Aggregator' fold i) where
pure a = Aggregator (pure a) (pure a)
liftA2 = liftF2


-- | An 'Aggregator' takes a 'Rel8.Query' producing a collection of rows of
-- type @a@ and transforms it into a 'Rel8.Query' producing a single row of
-- type @b@. If the given 'Rel8.Query' produces an empty collection of rows,
-- then the single row in the resulting 'Rel8.Query' contains the identity
-- values of the aggregation functions comprising the 'Aggregator' (i.e.,
-- @0@ for 'Rel8.sum', 'Rel8.false' for 'Rel8.or', etc.).
--
-- 'Aggregator' is a special form of 'Aggregator'' parameterised by 'Full'.
type Aggregator :: Type -> Type -> Type
type Aggregator = Aggregator' 'Full


-- | An 'Aggregator1' takes a collection of rows of type @a@, groups them, and
-- transforms each group into a single row of type @b@. This corresponds to
-- aggregators using @GROUP BY@ in SQL. If given an empty collection of rows,
-- 'Aggregator1' will have no groups and will therefore also return an empty
-- collection of rows.
--
-- 'Aggregator1' is a special form of 'Aggregator'' parameterised by 'Semi'.
type Aggregator1 :: Type -> Type -> Type
type Aggregator1 = Aggregator' 'Semi


-- | 'semi' turns an 'Aggregator' into an 'Aggregator1'.
semi :: Aggregator' fold i a -> Aggregator1 i a
semi (Aggregator _ a) = Aggregator Empty a


-- | Given a value to fall back on if given an empty collection of rows, 'full'
-- turns an 'Aggregator1' into an 'Aggregator'.
full :: a -> Aggregator' fold i a -> Aggregator' fold' i a
full fallback (Aggregator _ a) = Aggregator (Fallback fallback) a


filterWhereExplicit :: ()
=> Opaleye.IfPP a a
-> (i -> Expr Bool)
-> Aggregator i a
-> Aggregator' fold i a
filterWhereExplicit ifPP f (Aggregator (Fallback fallback) aggregator) =
Aggregator (Fallback fallback) aggregator'
where
aggregator' =
Opaleye.fromMaybeFieldsExplicit ifPP fallback
<$> Opaleye.filterWhere (toColumn . toPrimExpr . f) aggregator


unsafeMakeAggregator :: forall (i :: Type) (o :: Type) (fold :: Fold) i' o'. ()
=> (i -> i')
-> (o' -> o)
-> Fallback fold o
-> Opaleye.Aggregator i' o'
-> Aggregator' fold i o
unsafeMakeAggregator input output fallback =
Aggregator fallback . dimap input output
Loading

0 comments on commit 55f076f

Please sign in to comment.