Experimental

This is experimental and unstable.

Pyodide + DuckDB

This is a proof of concept for running duckdb_wasm from a Pyodide kernel. This unlocks a few ways to use DuckDB, such as PyScript and JupyterLite.

Note: The project should probably be called Pyoduckwasm or something like that; it started with JupyterLite as the end goal.

Demonstration:

Static PyScript Example
PyScript REPL

pyodide console

import micropip
# Install the dependencies into the Pyodide environment
await micropip.install('pandas')
await micropip.install('jupylite-duckdb')
import jupylite_duckdb as jd
# Open a connection to duckdb_wasm
conn = await jd.connect()
r1 = await jd.query("pragma version", conn)
# Load a remote Parquet file into a table, then aggregate it
r2 = await jd.query("create or replace table xyz as select * from 'https://raw.githubusercontent.com/Teradata/kylo/master/samples/sample-data/parquet/userdata2.parquet'", conn)
r3 = await jd.query("select gender, count(*) as c from xyz group by gender", conn)
print(r1)
print(r2)
print(r3)

JupyterLite
JupyterLite Code Console REPL

Note: Reloading seems somewhat unreliable with Pyodide. A hard refresh (Ctrl+F5) works more reliably.

Limitations:

API: only duckdb.connect() and duckdb.query() are covered.
DataFrames are not (yet) registered in the DuckDB database.
Data is copied from the duckdb_wasm Arrow result to a Python list[dict], and then to a DataFrame. PyArrow is not (yet) available in Pyodide.
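The last limitation can be illustrated with a small sketch of the list[dict] to DataFrame step. The rows below are made up for illustration and `rows_to_dataframe` is a hypothetical helper, not a function from this package; the real conversion inside jupylite_duckdb may differ.

```python
import pandas as pd

# Hypothetical rows, shaped like the list[dict] described above:
# one dict per result row, keyed by column name.
rows = [
    {"gender": "Female", "c": 3},
    {"gender": "Male", "c": 2},
]


def rows_to_dataframe(rows):
    """Build a DataFrame from a list of row dicts."""
    return pd.DataFrame.from_records(rows)


df = rows_to_dataframe(rows)
print(df)
```

Every value is copied twice on this path (Arrow to Python objects, then Python objects to pandas), which is why a zero-copy exchange would help for larger results.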

Observations:

It takes about a minute to run the JupyterLite examples, and most of that time is spent before any DuckDB code runs. A custom Pyodide build could shave off some of this time; PyScript is much faster.
JupyterLite was unreliable with page reloads; I ended up having to clear the cache a lot.
Not thrilled with PyScript removing top-level await; will probably just auto-wrap it (like IPython's %autoawait).
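One way the auto-wrapping idea could look is sketched below: the snippet is indented into the body of an async function, which is then run. `wrap_top_level_await` and `run_wrapped` are hypothetical names for illustration only, and this is not how PyScript or this package currently handles it.

```python
import asyncio
import textwrap


def wrap_top_level_await(source: str, func_name: str = "_wrapped_main") -> str:
    """Return source rewritten as the body of an async function.

    The generated function returns its local variables so the caller can
    read back anything the snippet assigned.
    """
    body = textwrap.indent(textwrap.dedent(source).strip("\n"), "    ")
    return f"async def {func_name}():\n{body}\n    return locals()\n"


def run_wrapped(source: str) -> dict:
    """Define the wrapper function and run it to completion."""
    namespace: dict = {}
    exec(wrap_top_level_await(source), namespace)
    # asyncio.run() requires that no event loop is already running.
    # Inside Pyodide or Jupyter, await the coroutine directly instead.
    return asyncio.run(namespace["_wrapped_main"]())


snippet = """
import asyncio
await asyncio.sleep(0)
answer = 6 * 7
"""
result = run_wrapped(snippet)
print(result["answer"])
```

The trade-off of this approach is that names assigned in the snippet become locals of the wrapper rather than globals, so they have to be copied back out explicitly.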

Demonstration

Code Console REPL Example

jupyterlite_duckdb_wasm

Python wrapper to run DuckDB_WASM within JupyterLite with a Pyodide kernel. See notebooks for an example of running this within JupyterLite.

Cell Magic %%dql

Following the example of magic_duckdb, there's an initial proof of concept of a DuckDB cell magic for JupyterLite. See Magic Example.
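As a rough sketch of the idea, a cell transformation could rewrite a %%dql cell into Python source that awaits the query. This is not the package's actual implementation: `transform_dql_cell` is a hypothetical helper, and the `jd` and `conn` names are assumptions that mirror the console example in this README.

```python
MAGIC_PREFIX = "%%dql"


def transform_dql_cell(cell: str, conn_name: str = "conn", module_alias: str = "jd") -> str:
    """Rewrite a %%dql cell into Python source that awaits the query.

    Cells that do not start with the magic are returned unchanged.
    """
    first_line, _, rest = cell.partition("\n")
    if first_line.strip() != MAGIC_PREFIX:
        return cell
    sql = rest.strip()
    # repr() produces a correctly quoted Python string literal for the SQL text
    return f"await {module_alias}.query({sql!r}, {conn_name})"


cell = "%%dql\nselect gender, count(*) as c from xyz group by gender"
print(transform_dql_cell(cell))
```

The rewritten source still contains a top-level await, so it only runs in an environment that supports one, such as a notebook with %autoawait enabled.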

Various Issues, Todos and Ideas

Implement a proof of concept version of DataFrame registration.
Evaluate startup time reduction, perhaps with a custom Pyodide build.
Error handling: detect and display errors in Jupyter. Too much stuff is buried in the console, such as CORS errors.
Invalidate the pip browser cache (as/if needed); it is annoying for development purposes.
Think through the async/await/transform_cell approach and whether there's a better solution.
Zero-copy data exchange (js/duckdb arrow -> python/dataframe and python/df -> js/duckdb): blocked by PyArrow support.
If you're adding local .py files, use importlib.invalidate_caches(). Even then, importing was flaky.
Careful with caching: %pip install will pull from the browser cache. I had to clear it frequently within dev tools.
To clear local storage, which is annoyingly persistent, see https://superuser.com/questions/519628/clear-html5-local-storage-on-a-specific-page
%autoawait, which is enabled by default, is part of why this works in notebooks. The %%dql cell magic patches transform_cell to push an await into the cell transformation: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
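For the note above about local .py files, here is a minimal sketch of the importlib.invalidate_caches() pattern in plain Python. The module name `local_helper` and its contents are made up for illustration; behavior inside JupyterLite may still be flaky, as noted.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a module file at runtime, as when adding a local .py file
module_dir = Path(tempfile.mkdtemp())
(module_dir / "local_helper.py").write_text("def greet():\n    return 'hello'\n")

# Make the directory importable
sys.path.insert(0, str(module_dir))

# Import finders cache directory listings; clear them so the new file is seen
importlib.invalidate_caches()

local_helper = importlib.import_module("local_helper")
print(local_helper.greet())
```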
