Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return to the profunctor-based Aggregator #37

Closed
wants to merge 4 commits into from

Conversation

ocharles
Copy link
Contributor

@ocharles ocharles commented Apr 11, 2021

After having a deep think about aggregations again over the weekend, I think the best API is to return to the same API that Opaleye uses. This means:

aggregate :: Query (Aggregate a) -> Query a

becomes

aggregate :: Aggregator i o -> Query i -> Query o

This Aggregator type is a Functor, Applicative and Profunctor. Being applicative of course means that we recover ApplicativeDo, and we get the nice feature of being able to combine aggregations (e.g., liftA2 (/) (sum x) (count x).

This API:

  • Prevents "double aggregation" (sum(sum x)), because there's no way to "feed" sum the result of another sum.
  • Prevents non-aggregated columns. The only thing you can pure is columns outside the aggregation - which is fine. For example, liftA2 (+) (sum x) (pure y) is fine, because the aggregation is laterally joined.

Fixes #36

@@ -46,9 +47,9 @@ import Rel8.Type.Sum ( DBSum )


-- | Count the occurances of a single column. Corresponds to @COUNT(a)@
count :: Expr a -> Aggregate (Expr Int64)
count :: (i -> Expr a) -> Aggregator i (Expr Int64)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opted to have all Aggregator producing functions also bundle the lmap function at the same time. 99% of the time you're going to want lmap, and I think:

aggregate f g
  where 
    f = do
      n <- sum x
      m <- count y
      pure (n, m)

is clearer than

aggregate f g
  where 
    f = do
      n <- lmap x sum
      m <- lmap y count
      pure (n, m)

It also saves users a dependency on profunctors.

Comment on lines +71 to +72
countStar :: Aggregator i (Expr Int64)
countStar = count \_ -> litExpr True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this function doesn't bundle lmap because it's polymorphic in its input.

@@ -90,7 +89,6 @@ type Column context a =
type HEither :: K.Context -> Type -> Type -> Type
type family HEither context where
HEither Structure = Shape2 'Either
HEither Aggregate = EitherTable
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to get this working, but I haven't explored it. I'm still not very interested in this API, but if you want to get it working I'll accept whatever is needed to make this work.

@@ -81,28 +76,6 @@ class Interpretation context => Nullifiable context where
-> Col context ('Spec labels necessity x)


instance Nullifiable Aggregate where
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also didn't think much about this, but nothing seems to break with this class removed (probably because I removed other stuff at the same time). Same point, if you want to keep this, I'm OK with that if you show me how.

(\_ (Identity (DB a)) -> Aggregation $ listAggExpr a)
(pure exprs)
listAgg :: Table Expr exprs => (i -> exprs) -> Aggregator i (ListTable exprs)
listAgg f = lmap (toColumns . f) $ fromColumns $ HVectorize $ htabulate $ \(HMapTableField field) ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth doing this in HVectorize.hs itself? Basically to use listAggExpr, we need to know that a ~ [x], which we get from pattern matching on SSpec from hfield hspecs field, but only when field comes from HMapTableField.

Comment on lines +191 to +200
-- aggregateEitherTable :: ()
-- => (a -> Aggregate c)
-- -> (b -> Aggregate d)
-- -> EitherTable a b
-- -> Aggregate (EitherTable c d)
-- aggregateEitherTable f g EitherTable {tag, left, right} =
-- liftF3 EitherTable (tag <$ aggregate) (f left) (g right)
-- where
-- Tag {aggregator, expr} = tag
-- aggregate = unsafeMakeAggregate toPrimExpr fromPrimExpr aggregator expr
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't merge without getting this working again.

@ocharles ocharles force-pushed the profunctor-aggregator branch from 1d64f24 to 7bb4246 Compare April 11, 2021 17:33
@ocharles
Copy link
Contributor Author

ocharles commented Apr 11, 2021

Actually trying to use this API is giving me pause. We used to have

panelStateCountByOrderId :: Tabulation (Expr OrderId) (PanelStateCount Expr)
panelStateCountByOrderId = aggregateTabulation id do
  boardOrder <- boardOrdersByOrderId `prebind` queryTable
  liftQuery do
    panel <- panelsFromBoardOrder boardOrder
    M.PanelStateLog
      { panelStateLogState = state
      , panelStateLogSide = side
      } <- panelStateLogsFromPanel panel >>= latestPanelStateLogs

    let top = side ==. lit Top
        bottom = side ==. lit Bot
        placed = state ==. lit Placed
        baked = state ==. lit Baked
        complete = state ==. lit Complete

    pure PanelStateCount
      { countBottom = countWhere (bottom &&. (placed ||. baked ||. complete))
      , countTop = countWhere (top &&. (placed ||. baked ||. complete))
      , countBottomBaked = countWhere (bottom &&. baked)
      , countTopBaked = countWhere (top &&. baked)
      , countBottomComplete = countWhere (bottom &&. complete)
      , countTopComplete = countWhere (top &&. complete)
      }

which is elegant and succinct. I can't find an equally nice way to write this using this proposed API :/

On this, I'm more inclined to say that we promote Rel8.Aggregate.Legacy to Rel8.Aggregate.

@ocharles
Copy link
Contributor Author

Closing for now, needs more thought.

@ocharles ocharles closed this Apr 19, 2021
shane-circuithub added a commit that referenced this pull request May 8, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum x, count x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request May 8, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request May 8, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request May 9, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request Jun 17, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request Jun 18, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
shane-circuithub added a commit that referenced this pull request Jun 18, 2023
This PR makes a number of changes to how aggregation works in Rel8.

The biggest change is that we drop the `Aggregate` context and we return to the `Profunctor`-based `Aggregator` that Opaleye uses (as in #37). While working with `Profunctor`s is more awkward for many common use-cases, it's ultimately more powerful. The big thing it gives you that we don't currently have is the ability to "post-map" on the result of an aggregation function. Pretend for a moment that Postgres does not have the `avg` function built-in. With the previous Rel8, there is no way to directly write `sum(x) / count(x)`. The best you could do would something like:

```haskell
fmap (\(total, size) -> total / fromIntegral size) $ aggregate $ do
  foo <- each fooSchema
  pure (sum foo.x, count foo.x)
```

The key thing is that the mapping can only happen after `aggregate` is called. Whereas with the `Profunctor`-based `Aggregator` this is just `(/) <$> sum <*> fmap fromIntegral count`. This isn't too bad if the only thing you want to do is computing the average, but if you're doing a complicated aggregation with several things happening at once then you might need to do several unrelated post-processings after the `aggregate`. We really want a way to bundle up the postmapping with the aggregation itself and have that as a singular composable unit. Another example is the `listAggExpr` function. The only reason Rel8 exports this is because it can't be directly expressed in terms of `listAgg`. With the `Profunctor`-based `Aggregator` it can be, it's just `(id $*) <$> listAgg`, it no longer needs to be a special case.

The original attempt in #37 recognised that it can be awkward to have to write `lmap (.x) sum`, so instead of sum having the type signature `Aggregator (Expr a) (Expr a)`, it had the type signature `(i -> Expr a) -> Aggregator i (Expr a)`, so that you wouldn't have to use `lmap`, you could just type `sum (.x)`. However, there are many ways to compose `Aggregator`s — for example, if you wanted to use combinators from `product-profunctor` to combine aggregators, then you'd rather type `sum ***! count` than `sum id ***! count id`. So in this PR we keep the type of `sum` as `Aggregator (Expr a) (Expr a)`, but we also export `sumOn`, which has the bundled `lmap`.

The other major change is that this PR introduces two forms of aggregation — "semi"-aggregation and "full"-aggregation. Up until now, all aggregation in Rel8 was "semi"-aggregation, but "full"-aggregation feels a bit more natural and Haskelly.

Up until now, the `aggrgegate` combinator in Rel8 would return zero rows if given a query that itself returned zero rows, even if the aggregation functions that comprised it had identity values. So it was very common to see code like `fmap (fromMaybeTable 0) $ optional $ aggregate $ sum <$> _`. Again, we "know" that `0` is the identity value for `sum` and we really want some way to bundle those together and to say "return the identity value if there are zero rows". Rel8 now has this ability — it has both `Aggregator` and `Aggregator1`, with the former having identity values and the latter not. The `aggregate` function now takes an `Aggregator` and returns the identity value when encountering zero rows, whereas the `aggregate1` function takes an `Aggregator1` and behaves as before. `count`, `sum`, `and`, `or`, `listAgg` are `Aggregator`s (with the identity values `0`, `0`, `true`, `false` and `listTable []` respectively) and `groupBy`, `max` and `min` are `Aggregator1`s.

This also means that `many` is now just `aggregate listAgg` instead of `fmap (fromMaybeTable (listTable [])) . optional . aggregate . fmap listAgg`.

It should also be noted that these functions are actually polymorphic — `sum` will actually give you an `Aggregator'` that can be used as either `Aggregator` or `Aggregator1` without needing to explicitly convert between them. Similarly `aggregate1` can take either an `Aggegator` or an `Aggregator1` (though it won't use the identity value of the former).

Aggregation in Rel8 now supports more of the features of PostgresSQL supports. Three new combinators are introduced — `distinctAggregate`, `filterWhere` and `orderAggregateBy`.

Opaleye itself already supported `distinctAggregate` and indeed we used this to implement `countDistinct` as a special case, but we now support using `DISTINCT` on arbitrary aggregation functions.

`filterWhere` is new to both Rel8 and Opaleye. It corresponds to PostgreSQL's `FILTER (WHERE ...)` syntax in aggregations. It also uses the identity value of an `Aggregator` in the case where the given predicate returns zero rows. There is also `filterWhereOptional` which can be used with `Aggregator1`s.

`orderAggregateBy` allows the values within an aggregation to be ordered using a given ordering, mainly non-commutative aggregation functions like `listAgg`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apply Aggregate is broken
1 participant