Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

value_counts: sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) #4940

Closed
1 task done
MarcSkovMadsen opened this issue Dec 1, 2022 · 7 comments · Fixed by #5367
Closed
1 task done

value_counts: sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) #4940

MarcSkovMadsen opened this issue Dec 1, 2022 · 7 comments · Fixed by #5367
Labels
bug Incorrect behavior inside of ibis duckdb The DuckDB backend
Milestone

Comments

@MarcSkovMadsen
Copy link

MarcSkovMadsen commented Dec 1, 2022

What happened?

I'm trying to see if I can get the support for Ibis working in hvPlot and HoloViews. Right now for histograms.

Running some code witht the duckDB backend I get a sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) Binder Error.

import duckdb
import ibis
import pandas as pd

from pathlib import Path

DUCKDB_PATH = "DuckDB1.db"

if not Path(DUCKDB_PATH).exists():
    pandas_df = pd.DataFrame(
        {
            "forecast": [90, 160, 125, 150, 141, 141, 120],
        },
    )
    duckdb_con = duckdb.connect(DUCKDB_PATH)
    duckdb_con.execute("CREATE TABLE df AS SELECT * FROM pandas_df")

ibis.options.sql.default_limit = None

db = ibis.duckdb.connect(DUCKDB_PATH)

df = db.table("df")
buckets = [90.0, 113.33333333333333, 136.66666666666666, 160.0]
df["forecast"].bucket(buckets).value_counts().execute()

What version of ibis are you using?

3.2.0

What backend(s) are you using, if any?

DuckDB==0.6.0

Relevant log output

$ python script3.py
C:\repos\private\hvplot\.venv\lib\site-packages\ibis\backends\postgres\registry.py:164: UserWarning: locale specific date formats (%c, %x, %X) are not yet implemented for Windows
  warnings.warn(
C:\repos\private\hvplot\.venv\lib\site-packages\duckdb_engine\__init__.py:229: DuckDBEngineWarning: duckdb-engine doesn't yet support reflection on indices
  warnings.warn(
Traceback (most recent call last):
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "C:\repos\private\hvplot\.venv\lib\site-packages\duckdb_engine\__init__.py", line 120, in execute
    self.c.execute(statement, parameters)
duckdb.BinderException: Binder Error: column "forecast" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(forecast)" if the exact value of "forecast" is not important.
LINE 1: ...ECT CASE WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? AS...
                                                  ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\repos\private\hvplot\script3.py", line 28, in <module>
    df["forecast"].bucket(buckets).value_counts().execute()
  File "C:\repos\private\hvplot\.venv\lib\site-packages\ibis\expr\types\core.py", line 291, in execute
    return self._find_backend(use_default=True).execute(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\ibis\backends\base\sql\__init__.py", line 191, in execute
    with self._safe_raw_sql(sql, **kwargs) as cursor:
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "C:\repos\private\hvplot\.venv\lib\site-packages\ibis\backends\base\sql\alchemy\__init__.py", line 124, in _safe_raw_sql
    yield con.execute(*args, **kwargs)
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\sql\elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\util\compat.py", line 210, in raise_
    raise exception
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "C:\repos\private\hvplot\.venv\lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "C:\repos\private\hvplot\.venv\lib\site-packages\duckdb_engine\__init__.py", line 120, in execute
    self.c.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) Binder Error: column "forecast" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(forecast)" if the exact value of "forecast" is not important.
LINE 1: ...ECT CASE WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? AS...
                                                  ^
[SQL: SELECT CASE WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? AS FLOAT(53))) THEN CAST(? AS SMALLINT) WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? 
AS FLOAT(53))) THEN CAST(? AS SMALLINT) WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast <= CAST(? AS FLOAT(53))) THEN CAST(? AS SMALLINT) ELSE CAST(NULL AS SMALLINT) END AS unnamed, 
count(?) AS count
FROM df AS t0 GROUP BY CASE WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? AS FLOAT(53))) THEN CAST(? AS SMALLINT) WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast < CAST(? AS FLOAT(53))) THEN CAST(? AS SMALLINT) WHEN (CAST(? AS FLOAT(53)) <= t0.forecast AND t0.forecast <= CAST(? AS FLOAT(53))) THEN CAST(? AS SMALLINT) ELSE CAST(NULL AS SMALLINT) END]  
[parameters: (90.0, 113.33333333333333, 0, 113.33333333333333, 136.66666666666666, 1, 136.66666666666666, 160.0, 2, '*', 90.0, 113.33333333333333, 0, 113.33333333333333, 136.66666666666666, 1, 136.66666666666666, 160.0, 2)]
(Background on this error at: https://sqlalche.me/e/14/f405)

Code of Conduct

  • I agree to follow this project's Code of Conduct
@MarcSkovMadsen MarcSkovMadsen added the bug Incorrect behavior inside of ibis label Dec 1, 2022
@MarcSkovMadsen MarcSkovMadsen changed the title value_counts bug: sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) value_counts: sqlalchemy.exc.ProgrammingError: (duckdb.BinderException) Dec 1, 2022
@cpcloud
Copy link
Member

cpcloud commented Dec 1, 2022

Hey @MarcSkovMadsen thanks for the report.

It looks like this might an issue with duckdb.

Repeating the grouping key expression in the GROUP BY clause should be valid, but it looks like DucKDB is conservative in its determination of whether columns in the grouping key expression are in the select set.

We'll report this upstream and for now you can do this instead:

df = df.mutate(forceast=df.forecast.bucket(buckets)).forecast.value_counts()

@MarcSkovMadsen
Copy link
Author

Thanks a lot. Also, for providing a work around. Then I can move forward bringing Ibis and hvPlot, HoloViews, Datashader etc. closer to each.

@earlev4
Copy link

earlev4 commented Dec 2, 2022

Hi. First off, thank you very much to all the contributors to the Ibis Project! Your work is greatly appreciated. I am fairly new to the Ibis Project and was working through the Ibis Tutorial (replicated with the DuckDB backend and a DuckDB database file) and I am also experiencing the DuckDB backend BinderException error.

Not sure if it is helpful, but I have recreated a portion of the Ibis Tutorial in Google Colab for easy reproduction of the error.

https://colab.research.google.com/drive/1m0n4NSpWWq2EB-xw1MPH1gvdAhgUHXTQ?usp=sharing

Thanks again!

@cpcloud
Copy link
Member

cpcloud commented Jan 30, 2023

@MarcSkovMadsen @earlev4 Thanks for the colab notebook, it definitely helped me figure out a solution to this issue.

Can y'all give a try against master and confirm whether the fix addresses your specific issue(s)?

@earlev4
Copy link

earlev4 commented Jan 30, 2023

Hi @cpcloud. Glad the colab notebook was helpful. Thanks so much for looking into the issue! I sincerely appreciate it.

Perhaps, I am doing something wrong, but on master I am still encountering an issue on the following code:

countries.group_by(continent_name).aggregate(
    countries['population'].sum().name('total_population')
)

Error:

BinderException: Binder Error: column "continent" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(continent)" if the exact value of "continent" is not important.
LINE 1: SELECT CASE WHEN (t0.continent = CAST(? AS TEXT)) THEN CA...

I confirmed the code in the Ibis tutorial documentation - Aggregating data.

Looking forward to feedback from @MarcSkovMadsen.

The colab notebook is updated too.
https://colab.research.google.com/drive/1m0n4NSpWWq2EB-xw1MPH1gvdAhgUHXTQ?usp=sharing#scrollTo=iRSmYxAnfAY6

Thanks again!!! I am very grateful for your time and help.

@cpcloud
Copy link
Member

cpcloud commented Jan 30, 2023

@earlev4 When I add these lines to the first cell it successfully completes:

!pip install 'git+https://github.com/ibis-project/ibis.git@master#egg=ibis-framework[duckdb]'
!pip install -U pandas

It seems like we may have uncovered a bug in our dispatch due to accessing a pandas API that is relatively recent.

@earlev4
Copy link

earlev4 commented Jan 30, 2023

Hi @cpcloud. Thanks for the clarification on the the install. I appreciate it. The results look good! Thanks so much!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis duckdb The DuckDB backend
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants