Does not find data frame with duckdb #1033

charlax · 2024-09-17T20:17:45Z

What happens?

For some reason, jupysql can't find my dataframe, even though I used exactly the example in the docs...

To Reproduce

As of writing, I have the latest jupyterlab, jupysql, and duckedb. I'm following the docs and getting "Catalog Error: Table with name df does not exist!", both with native and sqlalchemy connections.

OS:

macOS

JupySQL Version:

0.10.13

Full Name:

Charles-Axel Dein

Affiliation:

Stealth Startup

Alex-Monahan · 2024-09-18T13:10:32Z

I just encountered the same issue using the Google Collab here: https://colab.research.google.com/drive/1eOA2FYHqEfZWLYssbUxdIpSL3PFxWVjk?usp=sharing#scrollTo=uNxSRUVu4YvY

Alex-Monahan · 2024-09-18T13:20:07Z

I just encountered the same issue using the Google Collab here: https://colab.research.google.com/drive/1eOA2FYHqEfZWLYssbUxdIpSL3PFxWVjk?usp=sharing#scrollTo=uNxSRUVu4YvY

The DuckDB configuration setting to allow DataFrame access is set to true, so that isn't the issue.

I also tried with a native DuckDB connection within JupySQL instead of SQLAlchemy and received a similar error.

However, running DuckDB directly on the dataframe works without an issue.

conn = duckdb.connect()
conn.execute('''SELECT sum(i) as total_i FROM input_df''').df()

Alex-Monahan · 2024-09-18T13:36:23Z

This may be related to duckdb/duckdb#13896. This fix is coming in DuckDB 1.1.1, tentatively planned for about a week from now.

Alex-Monahan · 2024-09-18T15:13:56Z

My current hypothesis is that it is related to the DuckDB issue above (duckdb/duckdb#13896). Maybe JupySQL is trying to access a dataframe that is no longer in the same frame? I think there are 2 paths: either adjust how the code is executed so that the dataframes are in the same frame, or once 1.1.1 arrives, automatically set the upcoming setting python_scan_all_frames to true automatically when using JupySQL.

edublancas · 2024-09-18T20:28:51Z

I tried this with duckdb 1.0.0 and it works:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"x":range(10)})

In [3]: df
Out[3]:
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

In [5]: %load_ext sql
The 'toml' package isn't installed. To load settings from pyproject.toml or ~/.jupysql/config, install with: pip install toml

In [6]: %sql duckdb://
Connecting to 'duckdb://'

In [7]: %sql select * from df
Running query in 'duckdb://'
Out[7]:
+---+
| x |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Truncated to displaylimit of 10.

my recommendation for now is to downgrade:

pip install 'duckdb<1.1'

once the duckdb fix is in, we'll update jupysql

eitsupi · 2024-10-07T13:22:10Z

Any updates?

I needed something like this with the latest versions.

%%sql
set python_scan_all_frames=true

edublancas · 2024-10-07T17:39:17Z

@eitsupi since DuckDB change its default behavior, I think we should follow (as opposed to arbitrarily running set python_scan_all_frames=true on the user's behalf.

I'll update the docs so people know they need to execute such code at the top of the notebook if they want DuckDB to scan the data frames.

edublancas · 2024-10-08T04:10:22Z

I spent some time looking into this: when using sqlalchemy, set python_scan_all_frames=true fixes the issue:

In [1]: %load_ext sql

In [2]: import pandas as pd; df = pd.DataFrame({"x": range(10)}); conn.execute("select * from df").df()
Out[2]:
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

In [3]: %sql duckdb://
Connecting to 'duckdb://'

In [4]: %sql select * from df
Running query in 'duckdb://'
RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(duckdb.duckdb.CatalogException) Catalog Error: Table with name df does not exist!
Did you mean "pg_depend"?
LINE 1: select * from df
                      ^
[SQL: select * from df]
(Background on this error at: https://sqlalche.me/e/20/f405)

If you need help solving this issue, send us a message: https://ploomber.io/community

In [5]: %sql set python_scan_all_frames = true;
Running query in 'duckdb://'
Out[5]:
+---------+
| Success |
+---------+
+---------+

In [6]: %sql select * from df
Running query in 'duckdb://'
Out[6]:
+---+
| x |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Truncated to displaylimit of 10.

In [7]: duckdb.__version__
Out[7]: '1.1.1'

but this doesn't work when passing a native duckdb connection, I found a workaround but it requires some extra work. will try to find time soon to patch it

mirorac · 2024-10-16T17:59:13Z

Another temporary workaround is to manually pass desired dataframes using:

In [5]: %sql --persist df --no-index
Success! Persisted df to the database.

In [6]: %sql select * from df
Running query in 'duckdb://'
Out[6]:
+---+
| x |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Truncated to displaylimit of 10.

However, I do not know if there are any side effects, such as duplicate memory allocation.

charlax changed the title ~~Does not find data frame with ducked~~ Does not find data frame with duckdb Sep 17, 2024

eitsupi mentioned this issue Oct 8, 2024

Add note about SET python_scan_all_frames=true for DuckDB PRQL/pyprql#427

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does not find data frame with duckdb #1033

Does not find data frame with duckdb #1033

charlax commented Sep 17, 2024

Alex-Monahan commented Sep 18, 2024

Alex-Monahan commented Sep 18, 2024 •

edited

Loading

Alex-Monahan commented Sep 18, 2024 •

edited

Loading

Alex-Monahan commented Sep 18, 2024

edublancas commented Sep 18, 2024

eitsupi commented Oct 7, 2024 •

edited

Loading

edublancas commented Oct 7, 2024

edublancas commented Oct 8, 2024 •

edited

Loading

mirorac commented Oct 16, 2024

Does not find data frame with duckdb #1033

Does not find data frame with duckdb #1033

Comments

charlax commented Sep 17, 2024

What happens?

To Reproduce

OS:

JupySQL Version:

Full Name:

Affiliation:

Alex-Monahan commented Sep 18, 2024

Alex-Monahan commented Sep 18, 2024 • edited Loading

Alex-Monahan commented Sep 18, 2024 • edited Loading

Alex-Monahan commented Sep 18, 2024

edublancas commented Sep 18, 2024

eitsupi commented Oct 7, 2024 • edited Loading

edublancas commented Oct 7, 2024

edublancas commented Oct 8, 2024 • edited Loading

mirorac commented Oct 16, 2024

Alex-Monahan commented Sep 18, 2024 •

edited

Loading

Alex-Monahan commented Sep 18, 2024 •

edited

Loading

eitsupi commented Oct 7, 2024 •

edited

Loading

edublancas commented Oct 8, 2024 •

edited

Loading