Aggregate by count is broken when used with distinct clause and select with struct value #4568

aparkerdavid · 2025-01-10T21:56:22Z

Elixir version

1.17.1

Database and Version

PostgreSQL 16.4

Ecto Versions

main

Database Adapter and Versions (postgrex, myxql, etc)

postgrex

Current behavior

Aggregating by count fails if the query includes a distinct clause, and a select clause that would not be allowed in a subquery.

      TestRepo.insert!(%MySchema{})

      query =
        from entity in MySchema,
          distinct: entity.id,
          select: %{entity: entity}

      TestRepo.aggregate(query, :count)

     ** (Ecto.SubQueryError) the following exception happened when compiling a subquery.

         ** (Ecto.QueryError) atoms, structs, maps, lists, tuples and sources are not allowed as map values in subquery, got: `%{entity: &0}` in query:
         
         from m0 in Ecto.RepoTest.MySchema,
           distinct: [asc: m0.id],
           select: %{entity: m0}
         

     The subquery originated from the following query:

     from m0 in subquery(from m0 in Ecto.RepoTest.MySchema,
       distinct: [asc: m0.id],
       select: %{entity: m0}),
       select: count()

PR with failing test, for reference: #4567

Expected behavior

1

The text was updated successfully, but these errors were encountered:

greg-rychlewski · 2025-01-10T23:50:46Z

Thanks for the report. I believe this is working as expected but it's probably not obvious enough why.

If you use distinct with Repo.aggregate then your query needs to be wrapped inside of a subquery to ensure the result is correct. And subqueries have certain restrictions. One of them is when you're selecting a map its values can't be one of the things listed in the error message: atoms, structs, maps, lists, tuples and sources.

The reason for this restriction is because when you select a map in a subquery, you turn it into something like this:

SELECT .... FROM (SELECT value1 as key1, value2 as key2, ... FROM ...)

Where value1, key1, value2, key2, ... are the keys and values from the selected map. The reason why sources are not allowed in this scenario is because selecting a source in Ecto will translate into a query that selects all of the fields. For example, the query from p in Post, select: p will get translated into

SELECT p0.schema_field1, p0.schema_field2, ... FROM "posts" as p0

rather than

SELECT p0 from "posts" as p0

If we didn't do it this way then the database would return all of the fields in the underlying table even if they weren't in the schema.

TLDR:

I believe this is expected and can't be changed. But we can probably update the docs to make this clearer.

josevalim · 2025-01-11T06:34:30Z

@greg-rychlewski in this case though, since we are not aggregating over a particular column, we could nuke the select? Or have it as select: 1?

greg-rychlewski · 2025-01-11T12:56:53Z

I think something like select: %{key: 1} would work. I believe we need to select something otherwise the planner will complain. And I don't think we can select integers from subqueries without putting them in a map so they can be aliased.

The only hesitation I have is there are databases out there like clickhouse and others that are supporting more non-standard functions. I don't know for sure if there is any aggregate function out there that can operate on 0 args and might be affected. A couple of ways I can see this theoretically breaking something, but can't find an example for:

If there are 0 arg aggregate functions that are affected by nulls
If there are 0 arg aggregate functions "do something" to all the columns or all the int columns or something like that

The chance of breaking seems low to me based on what I know. But I'm not sure how much I don't know :). So it's probably a choice between being extra cautious and improving the docs or taking a tiny chance it breaks an unusual case and we have to revert.

I slightly prefer the status quo just because it's less manipulation of user input and I don't see a strong reason to allow aggregating over a query that Ecto would not let you run anyways. But I am ok changing it too if you think that will help more people.

josevalim · 2025-01-11T15:07:20Z

I think the confusion is exactly that the query they pass actually run, it doesn't run because we move it to a subquery behind the scenes. I think we could:

If you are aggregating a giving field, we select that field
If you are not aggregating any field, we do select: 1?

greg-rychlewski · 2025-01-11T16:34:02Z

It works well for the standard use cases I am aware of. For number 2 in particular I am not sure if there exist some non standard use cases where the properties of the row are important without needing any particular value for the aggregation.

As a contrived example if there was a count_not_all_null convenience function in some database that checks if all columns are null or not. Then us changing the value will not be good.

I don’t use this function though and the use case for it is not super clear to me. If it’s meant as a convenience for the user to give it a query they are using somewhere else in their app and leave it to Ecto to make it work with aggregations then what you said is probably best and we just don’t support uncommon cases that don’t fit our assumptions

josevalim · 2025-01-11T16:35:59Z

You are right. I believe our best option then is to improve the docs. I will push something soon.

aparkerdavid added the Kind:Bug label Jan 10, 2025

josevalim closed this as completed in aacf5ff Jan 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregate by count is broken when used with distinct clause and select with struct value #4568

Aggregate by count is broken when used with distinct clause and select with struct value #4568

aparkerdavid commented Jan 10, 2025

greg-rychlewski commented Jan 10, 2025 •

edited

Loading

josevalim commented Jan 11, 2025

greg-rychlewski commented Jan 11, 2025 •

edited

Loading

josevalim commented Jan 11, 2025

greg-rychlewski commented Jan 11, 2025

josevalim commented Jan 11, 2025

Aggregate by count is broken when used with distinct clause and select with struct value #4568

Aggregate by count is broken when used with distinct clause and select with struct value #4568

Comments

aparkerdavid commented Jan 10, 2025

Elixir version

Database and Version

Ecto Versions

Database Adapter and Versions (postgrex, myxql, etc)

Current behavior

Expected behavior

greg-rychlewski commented Jan 10, 2025 • edited Loading

josevalim commented Jan 11, 2025

greg-rychlewski commented Jan 11, 2025 • edited Loading

josevalim commented Jan 11, 2025

greg-rychlewski commented Jan 11, 2025

josevalim commented Jan 11, 2025

greg-rychlewski commented Jan 10, 2025 •

edited

Loading

greg-rychlewski commented Jan 11, 2025 •

edited

Loading