Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROUP BY clause will not properly rendered if there is a function applied column expression #879

Closed
sgryjp opened this issue Jul 11, 2023 · 6 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.

Comments

@sgryjp
Copy link

sgryjp commented Jul 11, 2023

Environment details

  • OS type and version: Windows 10 Pro, macOS Ventura 13.4
  • Python version: 3.8.10
  • pip version: 23.1.2
  • sqlalchemy-bigquery version: 1.6.1

Steps to reproduce

  1. Compose an SQL statement using SQLAlchemy Core or ORM which includes a GROUP BY clause with an expression timestamp_trunc(`some_column`, hour)
  2. Render it (or execute it and confirm how it was rendered by examining SQLAlchemy's log)

Code example

Below is an example script to reproduce the symptom. Please refer to the first comment block in the file for detail.

"""An example script to reproduce the bug.

Executing this script outputs (partially folded for ease of reading):

    sqlalchemy:          1.4.49
    sqlalchemy-bigquery: 1.6.1
    SQLite----------------------------------------
    SELECT timestamp_trunc(sensor.time, hour) AS time,
           max(sensor.value) AS max_1
    FROM sensor
    GROUP BY timestamp_trunc(sensor.time, hour)
    BigQuery----------------------------------------
    SELECT timestamp_trunc(`sensor`.`time`, hour) AS `time`,
           max(`sensor`.`value`) AS `max_1`
    FROM `sensor`
    GROUP BY `timestamp_trunc_1`

In the final line, there is a reference to a non-existing colunn
of which name is `timestamp_trunc_1`. The line should be rendered as:

    GROUP BY timestamp_trunc(`sensor`.`time`, hour)

...like the rendering result for SQLite, written above.
"""
import sqlalchemy as sa
import sqlalchemy.dialects.sqlite as sqlite
import sqlalchemy_bigquery as bigquery

print("sqlalchemy:         ", sa.__version__)
print("sqlalchemy-bigquery:", bigquery.__version__)

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()

sensor_table = sa.Table(
    "sensor",
    metadata,
    sa.Column("time", sa.TIMESTAMP),
    sa.Column("value", sa.String),
)

stmt = sa.select(
    sa.func.timestamp_trunc(sensor_table.c.time, sa.text("hour")).label("time"),
    sa.func.max(sensor_table.c.value),
).group_by(
    sa.func.timestamp_trunc(sensor_table.c.time, sa.text("hour")),
)

print("SQLite" + "-" * 40)
print(
    stmt.compile(
        dialect=sqlite.dialect(),
        compile_kwargs={"literal_binds": True},
    ),
)

print("BigQuery" + "-" * 40)
print(
    stmt.compile(
        dialect=bigquery.dialect(),
        compile_kwargs={"literal_binds": True},
    ),
)

Stack trace

N/A (no exception will be raised unless you send it to BigQuery)

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Jul 11, 2023
@liutmelody
Copy link

liutmelody commented Jan 30, 2024

Would love to know the cause of this! I'm getting the same error when using ROLLUP inside the groupby clause:
Example code:

import sqlalchemy as sa
from sqlalchemy_bigquery import BigQueryDialect

        query = (
            sa.select(sql_projections)
            .select_from(model_expression)
            .where(sa.and_(*filter_expressions))
            .group_by(func.rollup(*groupby_expressions)) // <-- relevant rollup expression 
            .having(sa.and_(*having_expressions))
        )

        query_str = str(
            query.compile(dialect=BigQueryDialect())
        )

Output Query:

SELECT timestamp_trunc(`created_date`, YEAR) AS `created_date`,
       `agency` AS `agency`,
       `city` AS `city`,
       count(*) AS `row_count`,
FROM `quickstart.nyc_311`
GROUP BY `rollup_1` // <-- error here, this should be ROLLUP(1, 2, ..) as it is in other dialects 

I've been working around it by using sa.text but would appreciate some insight into why this is happening.

@kiraksi kiraksi self-assigned this Jan 31, 2024
@kiraksi
Copy link
Contributor

kiraksi commented Jan 31, 2024

Working on implementing group by, rollup and cube clauses as part of another customer issue, which should include a fix for this

@kiraksi
Copy link
Contributor

kiraksi commented Feb 13, 2024

@sgryjp @liutmelody This fix is now a part of branch development-build-1.11.0.dev3. I will be setting up a prerelease soon on PyPI as well for this development release, the reason being that this branch is based off our SQLAlchemy 1.4-2.0+ migration work. Please try it out and let me know if it solves your problem, if it doesn't feel free to reopen this issue.

@kiraksi kiraksi closed this as completed Feb 13, 2024
@sgryjp
Copy link
Author

sgryjp commented Feb 17, 2024

@kiraksi Hi, thank you for fixing this issue! I confirmed that the problem was fixed by your commit 5cfc280.

I also confirmed that the branch development-build-v1.11.0.dev3 does reproduce the problem. It seems that the commit bd38a5e somehow gets the bug back again. Could you check if there is anything suspicious? Thank you in advance.

(.venv) PS> git log --oneline development-build-v1.11.0.dev3                                                                           87a75dc (HEAD, origin/development-build-v1.11.0.dev3, development-build-v1.11.0.dev3) create development build 1.11.0.dev3
0c882b9 test commit
033d329 reformat logic
5154814 added test case
bd38a5e fixed render label as label assignment
68afc39 test basic implementation of group_by_clause and visit_label
ece7f1f removed unnecessary clause function changes, edited tests
e82f5dd test commit to run kokooro tests
310cfa7 Merge branch 'development-build-v1.11.0.dev2' into grouping-rollup-cube
0c4cf07 (tag: v1.11.0.dev2) create development release 1.11.0.dev2
5cfc280 feat: grouping sets, rollup and cube compatibility
...(log continues)...

(.venv) PS> pip install --force-reinstall git+https://github.com/googleapis/python-bigquery-sqlalchemy.git@bd38a5e14145ca77ec462e7cf4e9a989f2eb1fb3 >$null; python .\issue879.py
  Running command git clone --filter=blob:none --quiet https://github.com/googleapis/python-bigquery-sqlalchemy.git 'C:\Users\sgryjp\AppData\Local\Temp\pip-req-build-d9h_kbnv'
  Running command git rev-parse -q --verify 'sha^bd38a5e14145ca77ec462e7cf4e9a989f2eb1fb3'
  Running command git fetch -q https://github.com/googleapis/python-bigquery-sqlalchemy.git bd38a5e14145ca77ec462e7cf4e9a989f2eb1fb3
  Running command git checkout -q bd38a5e14145ca77ec462e7cf4e9a989f2eb1fb3
sqlalchemy:          2.0.27
sqlalchemy-bigquery: 1.11.0.dev2
SQLite----------------------------------------
SELECT timestamp_trunc(sensor.time, hour) AS time, max(sensor.value) AS max_1
FROM sensor GROUP BY timestamp_trunc(sensor.time, hour)
BigQuery----------------------------------------
SELECT timestamp_trunc(`sensor`.`time`, hour) AS `time`, max(`sensor`.`value`) AS `max_1`
FROM `sensor` GROUP BY `timestamp_trunc_1`

(.venv) PS> pip install --force-reinstall git+https://github.com/googleapis/python-bigquery-sqlalchemy.git@68afc3959d988b77a244699e7d5e4f39bd233917 >$null; python .\issue879.py
  Running command git clone --filter=blob:none --quiet https://github.com/googleapis/python-bigquery-sqlalchemy.git 'C:\Users\sgryjp\AppData\Local\Temp\pip-req-build-hywz32py'
  Running command git rev-parse -q --verify 'sha^68afc3959d988b77a244699e7d5e4f39bd233917'
  Running command git fetch -q https://github.com/googleapis/python-bigquery-sqlalchemy.git 68afc3959d988b77a244699e7d5e4f39bd233917
  Running command git checkout -q 68afc3959d988b77a244699e7d5e4f39bd233917
sqlalchemy:          2.0.27
sqlalchemy-bigquery: 1.11.0.dev2
SQLite----------------------------------------
SELECT timestamp_trunc(sensor.time, hour) AS time, max(sensor.value) AS max_1
FROM sensor GROUP BY timestamp_trunc(sensor.time, hour)
BigQuery----------------------------------------
SELECT timestamp_trunc(`sensor`.`time`, hour) AS `time`, max(`sensor`.`value`) AS `max_1`
FROM `sensor` GROUP BY timestamp_trunc(`sensor`.`time`, hour)

@kiraksi
Copy link
Contributor

kiraksi commented Feb 20, 2024

@sgryjp This is odd, the previous commit you mentioned was a fix that introduced other issues, I created a new fix that is now on development-build-v1.11.0.dev3, I even confirmed the fix by adding various test cases. Can you confirm again whether development-build-v1.11.0.dev3 does not fix your issue with the new commits there? This may actually be an issue with the labeling from the within_group_by=True value passed in to group by for timestamp_trunc. Please let me know, otherwise its a quick fix to add TIMESTAMP_TRUNC to the list of exclusions for that label.

@sgryjp
Copy link
Author

sgryjp commented Feb 23, 2024

@kiraksi Thank you for response!

I confirmed that neither of tag v1.11.0.dev3 nor HEAD of the branch development-build-v1.11.0.dev3 does not fix my issue. Running the "code example" in the first comment results the same as below:

$ pip install --force-reinstall git+https://github.com/googleapis/python-bigquery-sqlalchemy.git@v1.11.0.dev3; python issue879.py
...(omitted)...
sqlalchemy:          2.0.27
sqlalchemy-bigquery: 1.11.0.dev3
SQLite----------------------------------------
SELECT timestamp_trunc(sensor.time, hour) AS time, max(sensor.value) AS max_1
FROM sensor GROUP BY timestamp_trunc(sensor.time, hour)
BigQuery----------------------------------------
SELECT timestamp_trunc(`sensor`.`time`, hour) AS `time`, max(`sensor`.`value`) AS `max_1`
FROM `sensor` GROUP BY `timestamp_trunc_1`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants