Aggregation overhaul — return to Profunctor and semi-aggregations

This PR makes a number of changes to how aggregation works in Rel8. The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like: ```haskell fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do foo <- each fooSchema pure (sum foo.x, count foo.x) ``` The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case. The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`. The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly. Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s. This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`. It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former). Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`. Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions. `filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s. `orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
circuithub · Jun 18, 2023 · e28cc31 · e28cc31
1 parent baaabe7
commit e28cc31
Show file tree

Hide file tree

Showing 27 changed files with 855 additions and 497 deletions.
diff --git a/cabal.project.haskell-nix b/cabal.project.haskell-nix
@@ -2,3 +2,9 @@
 -- will interpret them as local packages, and try to build them when we cabal
 -- build. The only reason we have to specify these is for Haskell.nix to know to
 -- override these packages by fetching them rather than using Hackage.
+
+source-repository-package
+  type: git
+  location: https://github.com/tomjaguarpaw/haskell-opaleye
+  tag: version_0.9.7.0
+  --sha256: sha256-jOsDmVzHgvHwy3vBH+Bef/fvTK7J2YoC5LnEgecqWY8=
diff --git a/rel8.cabal b/rel8.cabal
@@ -59,6 +59,7 @@ library
 
   other-modules:
     Rel8.Aggregate
+    Rel8.Aggregate.Fold
 
     Rel8.Column
     Rel8.Column.ADT

diff --git a/src/Rel8.hs b/src/Rel8.hs
@@ -135,9 +135,6 @@ module Rel8
   , BuildADT, buildADT
   , ConstructADT, constructADT
 
-    -- *** Other ADT operations
-  , AggregateADT, aggregateADT
-
     -- *** Miscellaneous notes
     -- $misc-notes
 
@@ -147,7 +144,6 @@ module Rel8
   , ConstructHKD, constructHKD
   , DeconstructHKD, deconstructHKD
   , NameHKD, nameHKD
-  , AggregateHKD, aggregateHKD
 
     -- ** Table schemas
   , TableSchema(..)
@@ -253,24 +249,34 @@ module Rel8
   , loop
 
     -- ** Aggregation
-  , Aggregate
-  , Aggregates
+  , Aggregator
+  , Aggregator1
+  , Aggregator'
+  , Fold (Semi, Full)
+  , toAggregator
+  , toAggregator1
   , aggregate
+  , aggregate1
+  , filterWhere
+  , filterWhereOptional
+  , distinctAggregate
+  , orderAggregateBy
+  , optionalAggregate
   , countRows
-  , groupBy
-  , listAgg, listAggExpr
+  , groupBy, groupByOn
+  , listAgg, listAggOn, listAggExpr, listAggExprOn
   , mode
-  , nonEmptyAgg, nonEmptyAggExpr
-  , DBMax, max
-  , DBMin, min
-  , DBSum, sum, sumWhere, avg
+  , nonEmptyAgg, nonEmptyAggOn, nonEmptyAggExpr, nonEmptyAggExprOn
+  , DBMax, max, maxOn
+  , DBMin, min, minOn
+  , DBSum, sum, sumOn, sumWhere, avg, avgOn
   , DBString, stringAgg
-  , count
+  , count, countOn
   , countStar
-  , countDistinct
-  , countWhere
-  , and
-  , or
+  , countDistinct, countDistinctOn
+  , countWhere, countWhereOn
+  , and, andOn
+  , or, orOn
 
     -- ** Ordering
   , orderBy
@@ -288,19 +294,18 @@ module Rel8
   , partitionBy
   , orderPartitionBy
   , cumulative
-  , cumulative_
   , currentRow
   , rowNumber
   , rank
   , denseRank
   , percentRank
   , cumeDist
   , ntile
-  , lag
-  , lead
-  , firstValue
-  , lastValue
-  , nthValue
+  , lag, lagOn
+  , lead, leadOn
+  , firstValue, firstValueOn
+  , lastValue, lastValueOn
+  , nthValue, nthValueOn
   , indexed
 
     -- ** Bindings
@@ -352,6 +357,7 @@ import Prelude ()
 
 -- rel8
 import Rel8.Aggregate
+import Rel8.Aggregate.Fold
 import Rel8.Column
 import Rel8.Column.ADT
 import Rel8.Column.Either
@@ -374,7 +380,7 @@ import Rel8.Expr.Order
 import Rel8.Expr.Serialize
 import Rel8.Expr.Sequence
 import Rel8.Expr.Text ( like, ilike )
-import Rel8.Expr.Window hiding ( cumulative )
+import Rel8.Expr.Window
 import Rel8.Generic.Rel8able ( KRel8able, Rel8able )
 import Rel8.Order
 import Rel8.Query

diff --git a/src/Rel8/Aggregate.hs b/src/Rel8/Aggregate.hs
@@ -1,83 +1,197 @@
 {-# language DataKinds #-}
-{-# language FlexibleContexts #-}
-{-# language FlexibleInstances #-}
-{-# language MultiParamTypeClasses #-}
-{-# language RankNTypes #-}
+{-# language GADTs #-}
+{-# language KindSignatures #-}
+{-# language ScopedTypeVariables #-}
 {-# language StandaloneKindSignatures #-}
-{-# language TypeFamilies #-}
-{-# language UndecidableInstances #-}
 
 module Rel8.Aggregate
-  ( Aggregate(..), zipOutputs
-  , unsafeMakeAggregate
-  , Aggregates
+  ( Aggregator' (Aggregator)
+  , Aggregator
+  , Aggregator1
+  , toAggregator
+  , toAggregator1
+  , filterWhereExplicit
+  , unsafeMakeAggregator
   )
 where
 
 -- base
-import Control.Applicative ( liftA2 )
-import Data.Functor.Identity ( Identity( Identity ) )
-import Data.Kind ( Constraint, Type )
+import Control.Applicative (liftA2)
+import Data.Kind (Type)
 import Prelude
 
--- profunctors
-import Data.Profunctor ( dimap )
-
 -- opaleye
 import qualified Opaleye.Aggregate as Opaleye
-import qualified Opaleye.Internal.HaskellDB.PrimQuery as Opaleye
-import qualified Opaleye.Internal.Column as Opaleye
+import qualified Opaleye.Internal.MaybeFields as Opaleye
+import qualified Opaleye.Internal.Operators as Opaleye
 
--- rel8
-import Rel8.Expr ( Expr )
-import Rel8.Schema.HTable.Identity ( HIdentity(..) )
-import qualified Rel8.Schema.Kind as K
-import Rel8.Schema.Null ( Sql )
-import Rel8.Table
-  ( Table, Columns, Context, fromColumns, toColumns
-  , FromExprs, fromResult, toResult
-  , Transpose
+-- product-profunctor
+import Data.Profunctor.Product
+  ( ProductProfunctor, purePP, (****)
+  , SumProfunctor, (+++!)
   )
-import Rel8.Table.Transpose ( Transposes )
-import Rel8.Type ( DBType )
-
-
--- | 'Aggregate' is a special context used by 'Rel8.aggregate'.
-type Aggregate :: K.Context
-newtype Aggregate a = Aggregate (Opaleye.Aggregator () (Expr a))
-
-
-instance Sql DBType a => Table Aggregate (Aggregate a) where
-  type Columns (Aggregate a) = HIdentity a
-  type Context (Aggregate a) = Aggregate
-  type FromExprs (Aggregate a) = a
-  type Transpose to (Aggregate a) = to a
-
-  toColumns = HIdentity
-  fromColumns (HIdentity a) = a
-  toResult = HIdentity . Identity
-  fromResult (HIdentity (Identity a)) = a
-
-
--- | @Aggregates a b@ means that the columns in @a@ are all 'Aggregate's
--- for the 'Expr' columns in @b@.
-type Aggregates :: Type -> Type -> Constraint
-class Transposes Aggregate Expr aggregates exprs => Aggregates aggregates exprs
-instance Transposes Aggregate Expr aggregates exprs => Aggregates aggregates exprs
-
-
-zipOutputs :: ()
-  => (Expr a -> Expr b -> Expr c) -> Aggregate a -> Aggregate b -> Aggregate c
-zipOutputs f (Aggregate a) (Aggregate b) = Aggregate (liftA2 f a b)
 
+-- profunctors
+import Data.Profunctor (Profunctor, dimap)
 
-unsafeMakeAggregate :: forall (input :: Type) (output :: Type) n n' a a'. ()
-  => (Expr input -> Opaleye.PrimExpr)
-  -> (Opaleye.PrimExpr -> Expr output)
-  -> Opaleye.Aggregator (Opaleye.Field_ n a) (Opaleye.Field_ n' a')
-  -> Expr input
-  -> Aggregate output
-unsafeMakeAggregate input output aggregator expr =
-  Aggregate $ dimap in_ out aggregator
-  where out = output . Opaleye.unColumn
-        in_ = Opaleye.Column . input . const expr
+-- rel8
+import Rel8.Expr (Expr)
+import Rel8.Expr.Opaleye (toPrimExpr, toColumn)
+import Rel8.Aggregate.Fold (Fallback (Empty, Fallback), Fold (Full, Semi))
+
+-- semigroupoids
+import Data.Functor.Apply (Apply, liftF2)
+
+
+-- | 'Aggregator'' is the most general form of \"aggregator\", of which
+-- 'Aggregator' and 'Aggregator1' are special cases. 'Aggregator''s are
+-- comprised of aggregation functions and/or @GROUP BY@ clauses.
+--
+-- Aggregation functions operating on individual 'Rel8.Expr's such as
+-- 'Rel8.sum' can be combined into 'Aggregator's operating on larger types
+-- using the 'Applicative', 'Profunctor' and 'ProductProfunctor' interfaces.
+-- Working with 'Profunctor's can sometimes be awkward so for every 'Rel8.sum'
+-- we also provide a 'Rel8.sumOn' which bundles an 'Data.Profunctor.lmap'. For
+-- complex aggregations, we recommend using these functions along with
+-- @ApplicativeDo@, @BlockArguments@, @OverloadedRecordDot@ and
+-- @RecordWildCards@:
+--
+-- @
+--
+-- data Input f = Input
+--   { orderId :: Column f OrderId
+--   , customerId :: Column f CustomerId
+--   , productId :: Column f ProductId
+--   , quantity :: Column f Int64
+--   , price :: Column f Scientific
+--   }
+--   deriving (Generic, Rel8able)
+--
+--
+-- totalPrice :: Input Expr -> Expr Scientific
+-- totalPrice input = fromIntegral input.quantity * input.price
+--
+--
+-- data Result f = Result
+--   { customerId :: Column f CustomerId
+--   , totalOrders :: Column f Int64
+--   , productsOrdered :: Column f Int64
+--   , totalPrice :: Column Scientific
+--   }
+--   deriving (Generic, Rel8able)
+--
+--
+-- allResults :: Query (Result Expr)
+-- allResults =
+--   aggregate
+--     do
+--       customerId <- groupByOn (.customerId)
+--       totalOrders <- countDistinctOn (.orderId)
+--       productsOrdered <- countDistinctOn (.productId)
+--       totalPrice <- sumOn totalPrice
+--       pure Result {..}
+--     do
+--       order <- each orderSchema
+--       orderLine <- each orderLineSchema
+--       where_ $ order.id ==. orderLine.orderId
+--       pure
+--         Input
+--           { orderId = order.id
+--           , customerId = order.customerId
+--           , productId = orderLine.productId
+--           , quantity = orderLine.quantity
+--           , price = orderLine.price
+--           }
+-- @
+type Aggregator' :: Fold -> Type -> Type -> Type
+data Aggregator' fold i a = Aggregator !(Fallback fold a) !(Opaleye.Aggregator i a)
+
+
+instance Profunctor (Aggregator' fold) where
+  dimap f g (Aggregator fallback a) =
+    Aggregator (fmap g fallback) (dimap f g a)
+
+
+instance ProductProfunctor (Aggregator' fold) where
+  purePP = pure
+  (****) = (<*>)
+
+
+instance SumProfunctor (Aggregator' fold) where
+  Aggregator fallback a +++! Aggregator fallback' b =
+    flip Aggregator (a +++! b) $ case fallback of
+      Empty -> case fallback' of
+        Empty -> Empty
+        Fallback x -> Fallback (Right x)
+      Fallback x -> Fallback (Left x)
+
+
+instance Functor (Aggregator' fold i) where
+  fmap = dimap id
+
+
+instance Apply (Aggregator' fold i) where
+  liftF2 f (Aggregator fallback a) (Aggregator fallback' b) =
+    Aggregator (liftF2 f fallback fallback') (liftA2 f a b)
+
+
+instance Applicative (Aggregator' fold i) where
+  pure a = Aggregator (pure a) (pure a)
+  liftA2 = liftF2
+
+
+-- | An 'Aggregator' takes a 'Rel8.Query' producing a collection of rows of
+-- type @a@ and transforms it into a 'Rel8.Query' producing a single row of
+-- type @b@. If the given 'Rel8.Query' produces an empty collection of rows,
+-- then the single row in the resulting 'Rel8.Query' contains the identity
+-- values of the aggregation functions comprising the 'Aggregator' (i.e.,
+-- @0@ for 'Rel8.sum', 'Rel8.false' for 'Rel8.or', etc.).
+--
+-- 'Aggregator' is a special form of 'Aggregator'' parameterised by 'Full'.
+type Aggregator :: Type -> Type -> Type
+type Aggregator = Aggregator' 'Full
+
+
+-- | An 'Aggregator1' takes a collection of rows of type @a@, groups them, and
+-- transforms each group into a single row of type @b@. This corresponds to
+-- aggregators using @GROUP BY@ in SQL. If given an empty collection of rows,
+-- 'Aggregator1' will have no groups and will therefore also return an empty
+-- collection of rows.
+--
+-- 'Aggregator1' is a special form of 'Aggregator'' parameterised by 'Semi'.
+type Aggregator1 :: Type -> Type -> Type
+type Aggregator1 = Aggregator' 'Semi
+
+
+-- | 'toAggregator1' turns an 'Aggregator' into an 'Aggregator1'.
+toAggregator1 :: Aggregator' fold i a -> Aggregator1 i a
+toAggregator1 (Aggregator _ a) = Aggregator Empty a
+
+
+-- | Given a value to fall back on if given an empty collection of rows,
+-- 'toAggregator' turns an 'Aggregator1' into an 'Aggregator'.
+toAggregator :: a -> Aggregator' fold i a -> Aggregator' fold' i a
+toAggregator fallback (Aggregator _ a) = Aggregator (Fallback fallback) a
+
+
+filterWhereExplicit :: ()
+  => Opaleye.IfPP a a
+  -> (i -> Expr Bool)
+  -> Aggregator i a
+  -> Aggregator' fold i a
+filterWhereExplicit ifPP f (Aggregator (Fallback fallback) aggregator) =
+  Aggregator (Fallback fallback) aggregator'
+  where
+    aggregator' =
+      Opaleye.fromMaybeFieldsExplicit ifPP fallback
+        <$> Opaleye.filterWhere (toColumn . toPrimExpr . f) aggregator
+
+
+unsafeMakeAggregator :: forall (i :: Type) (o :: Type) (fold :: Fold) i' o'.  ()
+  => (i -> i')
+  -> (o' -> o)
+  -> Fallback fold o
+  -> Opaleye.Aggregator i' o'
+  -> Aggregator' fold i o
+unsafeMakeAggregator input output fallback =
+  Aggregator fallback . dimap input output