Spark SQL magic command for Jupyter notebooks.
- Python >= 3.6
- PySpark >= 2.3.0
- IPython >= 7.4.0
- Jinja2 >= 3.0.0
pip install sparksql-jupyter
%load_ext sparksql_jupyter
%config SparkSql.limit=<INT>
Option | Default | Description |
---|---|---|
SparkSql.limit |
20 | The maximum number of rows to display |
%%sparksql [-c|--cache] [-e|--eager] [-v|--view VIEW] [-l|--limit LIMIT] [-j|--jinja] [variable]
<QUERY>
Parameter | Description |
---|---|
-c --cache |
Cache dataframe |
-e --eager |
Cache dataframe with eager load |
-v VIEW --view VIEW |
Create or replace temporary view |
-l LIMIT --limit LIMIT |
The maximum number of rows to display (Default: SparkSql.limit ) |
variable |
Capture dataframe in a local variable |
-j --jinja |
Capture dataframe in a local variable in jinja2 template |
twine upload --repository-url https://path/to/repo/ dist/sparksql_jupyter-{version}-py36-none-any.whl
- 2024/05/09
- stringify timestamp with timezone columns before seriealization