[DOCS] - Fix sample code and python api docs (#71)
* fix: fix sample code and python api docs

* fix: readme code sample

* fix: python lint

* fix: repo name in docs & url link

* fix: repo name in docs & url link

* fix: remove useless dependency

* fix: remove .DS_Store
francis-du authored Nov 14, 2022
1 parent 72f0600 commit f0d5659
Showing 29 changed files with 590 additions and 219 deletions.
Binary file added .DS_Store
Binary file not shown.
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,6 +2,9 @@ target
Cargo.lock
/venv
.idea
/docs/temp
/docs/build
.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
23 changes: 21 additions & 2 deletions README.md
@@ -40,7 +40,6 @@ Simple usage:

```python
import datafusion
from datafusion import functions as f
from datafusion import col
import pyarrow

@@ -70,16 +69,27 @@ assert result.column(1) == pyarrow.array([-3, -3, -3])
### UDFs

```python
import pyarrow
import datafusion
from datafusion import udf, col

def is_null(array: pyarrow.Array) -> pyarrow.Array:
    return array.is_null()

is_null_arr = udf(is_null, [pyarrow.int64()], pyarrow.bool_(), 'stable')

# create a context
ctx = datafusion.SessionContext()

# create a RecordBatch and a new DataFrame from it
batch = pyarrow.RecordBatch.from_arrays(
    [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
    names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])

df = df.select(is_null_arr(col("a")))

# execute and collect the first (and only) batch
result = df.collect()[0]

assert result.column(0) == pyarrow.array([False] * 3)
```
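The function passed to `udf` receives a whole column of a record batch (a `pyarrow.Array`) rather than one value at a time, and it must return an array with one result per input row. The sketch below models that contract in plain Python, with lists standing in for `pyarrow` arrays; `apply_scalar_udf` is a hypothetical helper for illustration, not part of the `datafusion` package.

```python
from typing import Callable, List, Optional

# Plain lists stand in for pyarrow arrays in this sketch.
Column = List[Optional[int]]


def is_null(values: Column) -> List[bool]:
    # Operates on a whole batch at once, like `array.is_null()` on a pyarrow.Array.
    return [v is None for v in values]


def apply_scalar_udf(func: Callable[[Column], List[bool]], batches: List[Column]) -> List[bool]:
    """Call `func` once per batch and concatenate the per-batch results (hypothetical helper)."""
    out: List[bool] = []
    for batch in batches:
        result = func(batch)
        # A scalar UDF must return exactly one value per input row.
        if len(result) != len(batch):
            raise ValueError("UDF returned the wrong number of rows")
        out.extend(result)
    return out


print(apply_scalar_udf(is_null, [[1, 2], [3, None]]))  # [False, False, False, True]
```

Working on whole batches is what lets the real UDF call a vectorized `pyarrow` kernel instead of looping over rows in Python.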
@@ -89,7 +99,9 @@ assert result.column(0) == pyarrow.array([False] * 3)
```python
import pyarrow
import pyarrow.compute
import datafusion
from datafusion import udaf, Accumulator
from datafusion import col


class MyAccumulator(Accumulator):
@@ -113,7 +125,14 @@ class MyAccumulator(Accumulator):
    def evaluate(self) -> pyarrow.Scalar:
        return self._sum

# create a context
ctx = datafusion.SessionContext()

# create a RecordBatch and a new DataFrame from it
batch = pyarrow.RecordBatch.from_arrays(
[pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])

my_udaf = udaf(MyAccumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()], 'stable')
38 changes: 38 additions & 0 deletions docs/Makefile
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
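Because the `%: Makefile` rule matches any target, `make html` (or any other builder name) is simply forwarded to Sphinx's "make mode" (`sphinx-build -M`). The hypothetical helper below spells out the resulting command line using the Makefile's default variable values; it only builds the argument list and does not run anything.

```python
import shlex
from typing import List


def sphinx_make_mode_cmd(
    target: str,
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = "source",
    builddir: str = "build",
    sphinxopts: str = "",
    o: str = "",
) -> List[str]:
    """Return the argv that `make <target>` runs via the catch-all `%` rule."""
    # Mirrors: $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
    return [
        sphinxbuild, "-M", target, sourcedir, builddir,
        *shlex.split(sphinxopts), *shlex.split(o),
    ]


print(sphinx_make_mode_cmd("html"))
# ['sphinx-build', '-M', 'html', 'source', 'build']

# build.sh overrides SOURCEDIR with an absolute path to its staged copy:
print(shlex.join(sphinx_make_mode_cmd("html", sourcedir="/path/to/docs/temp")))
# sphinx-build -M html /path/to/docs/temp build
```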
64 changes: 64 additions & 0 deletions docs/README.md
@@ -0,0 +1,64 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# DataFusion Documentation

This folder contains the source content of the [Python API](./source/api) documentation.
It is published to https://arrow.apache.org/datafusion/
as part of the release process.

## Dependencies

It's recommended to install build dependencies and build the documentation
inside a Python virtualenv.

- Python
- `pip install -r requirements.txt`

## Build & Preview

Run the provided script to build the HTML pages.

```bash
./build.sh
```

The HTML will be generated into a `build` directory.

Preview the site on Linux by running this command.

```bash
firefox build/html/index.html
```
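The `firefox` command assumes a Linux machine with Firefox installed. As a hedged, cross-platform alternative, Python's standard `webbrowser` module can open the generated page in the system's default browser; `index_uri` and `preview` are hypothetical helper names, not scripts shipped in this folder.

```python
import webbrowser
from pathlib import Path


def index_uri(build_dir: str = "build") -> str:
    """Return a file:// URI for the generated index page."""
    index = Path(build_dir) / "html" / "index.html"
    # Path.as_uri() requires an absolute path, so resolve it first.
    return index.resolve().as_uri()


def preview(build_dir: str = "build") -> bool:
    """Open the generated docs in the system's default browser."""
    # webbrowser.open returns False if no browser could be launched.
    return webbrowser.open(index_uri(build_dir))
```

Calling `preview()` from the `docs` folder opens `build/html/index.html`.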

## Release Process

The documentation is served through the
[arrow-site](https://github.com/apache/arrow-site/) repo. To release a new
version of the docs, follow these steps:

1. Run `./build.sh` inside the `docs` folder to generate the docs website in the `build/html` folder.
2. Clone the arrow-site repo.
3. Check out the `asf-site` branch (NOT `master`).
4. Copy the build artifacts into the `arrow-site` repo's `datafusion` folder with a command such as

   - `cp -rT ./build/html/ ../../arrow-site/datafusion/` (GNU `cp` only; does not work on macOS)
   - `rsync -avzr ./build/html/ ../../arrow-site/datafusion/`

5. Commit changes in `arrow-site` and send a PR.
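Since `-T` is a GNU coreutils option, step 4 needs a different command on macOS. A portable alternative, sketched with Python's standard library under the assumption that Python 3.8+ is available (the version that added `dirs_exist_ok`), copies the *contents* of `build/html` into the target folder. `copy_docs` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_docs(build_html: str, site_dir: str) -> None:
    """Copy the contents of build_html into site_dir, like `cp -rT` or `rsync -a src/ dst/`."""
    # dirs_exist_ok=True (Python 3.8+) overwrites files inside an existing
    # datafusion/ folder instead of raising FileExistsError.
    # Like the rsync command above, files that exist only in site_dir are kept.
    shutil.copytree(Path(build_html), Path(site_dir), dirs_exist_ok=True)
```

For example, run `copy_docs("./build/html", "../../arrow-site/datafusion")` from the `docs` folder, adjusting the second path to wherever `arrow-site` is checked out.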
28 changes: 28 additions & 0 deletions docs/build.sh
@@ -0,0 +1,28 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

set -e
rm -rf build 2> /dev/null
rm -rf temp 2> /dev/null
mkdir temp
cp -rf source/* temp/
# replace relative URLs with absolute URLs
#sed -i 's/\.\.\/\.\.\/\.\.\//https:\/\/github.com\/apache\/arrow-datafusion\/blob\/master\//g' temp/contributor-guide/index.md
make SOURCEDIR=`pwd`/temp html
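For readers less familiar with shell, the staging part of `build.sh` can be restated in Python. This is a hedged sketch, not a replacement shipped with the repository: `stage_sources` and `build_html` are hypothetical names, the commented-out `sed` step is not reproduced, and `build_html` assumes `make` is on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path


def stage_sources(docs_dir: Path) -> Path:
    """Recreate docs/temp as a fresh copy of docs/source, as build.sh does."""
    # rm -rf build temp  (missing directories are ignored, like `2> /dev/null`)
    shutil.rmtree(docs_dir / "build", ignore_errors=True)
    shutil.rmtree(docs_dir / "temp", ignore_errors=True)
    # mkdir temp && cp -rf source/* temp/
    # (unlike the shell glob, copytree also copies hidden files)
    temp = docs_dir / "temp"
    shutil.copytree(docs_dir / "source", temp)
    return temp


def build_html(docs_dir: Path) -> None:
    """Stage the sources, then run `make SOURCEDIR=<absolute temp path> html`."""
    temp = stage_sources(docs_dir)
    # check=True plays the role of `set -e`: a failing build raises CalledProcessError.
    subprocess.run(
        ["make", f"SOURCEDIR={temp.resolve()}", "html"], cwd=docs_dir, check=True
    )
```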
52 changes: 52 additions & 0 deletions docs/make.bat
@@ -0,0 +1,52 @@
@rem Licensed to the Apache Software Foundation (ASF) under one
@rem or more contributor license agreements. See the NOTICE file
@rem distributed with this work for additional information
@rem regarding copyright ownership. The ASF licenses this file
@rem to you under the Apache License, Version 2.0 (the
@rem "License"); you may not use this file except in compliance
@rem with the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing,
@rem software distributed under the License is distributed on an
@rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@rem KIND, either express or implied. See the License for the
@rem specific language governing permissions and limitations
@rem under the License.

@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
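`make.bat` relies on `errorlevel 9009`, which `cmd.exe` reports when a command is not found, to detect a missing Sphinx installation. The same pre-flight check can be sketched portably with `shutil.which`, honoring an optional `SPHINXBUILD` override the way the batch file does; `resolve_sphinx_build` and `find_sphinx_build` are hypothetical helper names.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_sphinx_build(env: Mapping[str, str]) -> str:
    """Use SPHINXBUILD when set and non-empty, otherwise 'sphinx-build'."""
    return env.get("SPHINXBUILD") or "sphinx-build"


def find_sphinx_build(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the executable's full path, or None when it cannot be found.

    make.bat detects the same situation through errorlevel 9009.
    """
    return shutil.which(resolve_sphinx_build(env))
```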
22 changes: 22 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

sphinx
pydata-sphinx-theme==0.8.0
myst-parser
maturin
jinja2
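Only `pydata-sphinx-theme` carries an exact pin in this file. A small, hedged sketch for checking that an existing virtualenv matches such pins before building uses `importlib.metadata` (Python 3.8+); `pinned_versions` and `mismatched_pins` are hypothetical helper names.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def pinned_versions(lines: Iterable[str]) -> Dict[str, str]:
    """Collect exact `name==version` pins, ignoring comments and unpinned names."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = (part.strip() for part in line.split("==", 1))
            pins[name.lower()] = version
    return pins


def mismatched_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each pinned package whose installed version differs to that version (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

For example, `mismatched_pins(pinned_versions(open("requirements.txt")))` returns an empty dict when every pin is satisfied.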
1 change: 1 addition & 0 deletions docs/source/_static/images/DataFusion-Logo-Dark.svg
1 change: 1 addition & 0 deletions docs/source/_static/images/DataFusion-Logo-Light.svg
93 changes: 93 additions & 0 deletions docs/source/_static/theme_overrides.css
@@ -0,0 +1,93 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/


/* Customizing with theme CSS variables */

:root {
--pst-color-active-navigation: 215, 70, 51;
--pst-color-link-hover: 215, 70, 51;
--pst-color-headerlink: 215, 70, 51;
/* Use normal text color (like h3, ..) instead of primary color */
--pst-color-h1: var(--color-text-base);
--pst-color-h2: var(--color-text-base);
/* Use softer blue from bootstrap's default info color */
--pst-color-info: 23, 162, 184;
--pst-header-height: 0px;
}

code {
color: rgb(215, 70, 51);
}

.footer {
text-align: center;
}

/* Ensure the logo is properly displayed */

.navbar-brand {
height: auto;
width: auto;
}

a.navbar-brand img {
height: auto;
width: auto;
max-height: 15vh;
max-width: 100%;
}


/* This is the bootstrap CSS style for "table-striped". Since the theme does
not yet provide an easy way to configure this globally, it is easier to simply
include this snippet here than to update each table in all rst files to
add ":class: table-striped" */

.table tbody tr:nth-of-type(odd) {
background-color: rgba(0, 0, 0, 0.05);
}


/* Limit the max height of the sidebar navigation section. Our customized
template places more content above the navigation (i.e. a larger logo), so
if we don't decrease the max-height, the navigation will overlap with
the footer.
Details: min(15vh, 110px) for the logo size, 8rem for the search box etc. */

@media (min-width:720px) {
@supports (position:-webkit-sticky) or (position:sticky) {
.bd-links {
max-height: calc(100vh - min(15vh, 110px) - 8rem)
}
}
}


/* Fix table text wrapping in RTD theme,
* see https://rackerlabs.github.io/docs-rackspace/tools/rtd-tables.html
*/

@media screen {
table.docutils td {
/* !important prevents the common CSS stylesheets from overriding
this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
}
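To see how the `.bd-links` limit behaves, the `calc()` expression can be evaluated by hand. The hypothetical helper below does that arithmetic, assuming the browser's default root font size of 16px (so `8rem` is 128px). On a 900px-tall viewport the logo term is capped at 110px, giving 662px; on a 600px-tall viewport the 15vh term (90px) is smaller, giving 382px. The rule only applies at widths of 720px and above.

```python
def sidebar_max_height_px(viewport_height_px: float, rem_px: float = 16.0) -> float:
    """Evaluate calc(100vh - min(15vh, 110px) - 8rem) in pixels."""
    vh = viewport_height_px / 100.0      # 1vh is 1% of the viewport height
    logo = min(15 * vh, 110.0)           # logo area: 15vh, capped at 110px
    search_etc = 8 * rem_px              # search box and other content above the nav
    return viewport_height_px - logo - search_etc


print(sidebar_max_height_px(900))  # 662.0
print(sidebar_max_height_px(600))  # 382.0
```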
19 changes: 19 additions & 0 deletions docs/source/_templates/docs-sidebar.html
@@ -0,0 +1,19 @@

<a class="navbar-brand" href="{{ pathto(master_doc) }}">
<img src="{{ pathto('_static/images/' + logo, 1) }}" class="logo" alt="logo">
</a>

<form class="bd-search d-flex align-items-center" action="{{ pathto('search') }}" method="get">
<i class="icon fas fa-search"></i>
<input type="search" class="form-control" name="q" id="search-input" placeholder="{{ theme_search_bar_text }}" aria-label="{{ theme_search_bar_text }}" autocomplete="off" >
</form>

<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
<div class="bd-toc-item active">
{% if "python/api" in pagename or "python/generated" in pagename %}
{{ generate_nav_html("sidebar", startdepth=0, maxdepth=3, collapse=False, includehidden=True, titles_only=True) }}
{% else %}
{{ generate_nav_html("sidebar", startdepth=0, maxdepth=4, collapse=False, includehidden=True, titles_only=True) }}
{% endif %}
</div>
</nav>
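The `{% if %}` in `docs-sidebar.html` only changes how deep the navigation tree is expanded. Restated as a plain Python function (a hypothetical helper for illustration), it also makes one consequence of this commit easy to check: with `api.rst` moved from `docs/source/python/` to `docs/source/`, Sphinx page names such as `api/dataframe` no longer contain `python/api` or `python/generated`, so, if the template is left as-is, those pages appear to fall through to the depth-4 branch.

```python
def sidebar_maxdepth(pagename: str) -> int:
    """Mirror the template's branch: depth 3 for API/generated pages, else 4."""
    if "python/api" in pagename or "python/generated" in pagename:
        return 3
    return 4


print(sidebar_maxdepth("python/api/dataframe"))  # 3
print(sidebar_maxdepth("api/dataframe"))         # 4
```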
5 changes: 5 additions & 0 deletions docs/source/_templates/layout.html
@@ -0,0 +1,5 @@
{% extends "pydata_sphinx_theme/layout.html" %}

{# Silence the navbar #}
{% block docs_navbar %}
{% endblock %}
2 changes: 2 additions & 0 deletions docs/source/python/api.rst → docs/source/api.rst
@@ -24,7 +24,9 @@ API Reference
.. toctree::
   :maxdepth: 2

   api/config
   api/dataframe
   api/execution_context
   api/expression
   api/functions
   api/object_store
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
@@ -15,26 +15,13 @@
.. specific language governing permissions and limitations
.. under the License.
datafusion.functions
====================

.. automodule:: datafusion.functions


.. _api.config:
.. currentmodule:: datafusion

Config
=========

.. autosummary::
   :toctree: ../generated/

   Config
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -24,4 +24,4 @@ Functions
.. autosummary::
:toctree: ../generated/

functions
functions.functions
Original file line number Diff line number Diff line change
@@ -15,31 +15,13 @@
.. specific language governing permissions and limitations
.. under the License.
datafusion.Expression
=====================
.. _api.object_store:
.. currentmodule:: datafusion.object_store

.. currentmodule:: datafusion
ObjectStore
===========

.. autoclass:: Expression
.. autosummary::
:toctree: ../generated/


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::

~Expression.__init__
~Expression.alias
~Expression.cast
~Expression.column
~Expression.is_null
~Expression.literal
~Expression.sort






object_store
115 changes: 115 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,115 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

# -- Project information -----------------------------------------------------

project = "Arrow DataFusion"
copyright = "2022, Apache Software Foundation"
author = "Arrow DataFusion Authors"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.doctest",
"sphinx.ext.ifconfig",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx.ext.napoleon",
"myst_parser",
]

source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# Show members for classes in .. autosummary
autodoc_default_options = {
"members": None,
"undoc-members": None,
"show-inheritance": None,
"inherited-members": None,
}

autosummary_generate = True

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "pydata_sphinx_theme"

html_theme_options = {
"use_edit_page_button": True,
}

html_context = {
"github_user": "apache",
"github_repo": "arrow-datafusion-python",
"github_version": "master",
"doc_path": "docs/source",
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

html_logo = "_static/images/DataFusion-Logo-Background-White.png"

html_css_files = ["theme_overrides.css"]

html_sidebars = {
"**": ["docs-sidebar.html"],
}

# tell myst_parser to auto-generate anchor links for headers h1, h2, h3
myst_heading_anchors = 3

# enable nice rendering of checkboxes for the task lists
myst_enable_extensions = ["tasklist"]
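With `use_edit_page_button` enabled, the theme builds an "edit this page" link from the `html_context` values, which is why `github_repo` must name the `arrow-datafusion-python` repository. The function below is a hedged approximation of that URL for sanity-checking the values; the exact format is determined by `pydata-sphinx-theme`, and `edit_page_url` is a hypothetical helper.

```python
from typing import Mapping


def edit_page_url(context: Mapping[str, str], docname: str, suffix: str = ".rst") -> str:
    """Approximate the GitHub 'edit this page' link built from html_context."""
    return (
        "https://github.com/{github_user}/{github_repo}/edit/"
        "{github_version}/{doc_path}/".format(**context)
        + docname
        + suffix
    )


html_context = {
    "github_user": "apache",
    "github_repo": "arrow-datafusion-python",
    "github_version": "master",
    "doc_path": "docs/source",
}

print(edit_page_url(html_context, "index"))
# https://github.com/apache/arrow-datafusion-python/edit/master/docs/source/index.rst
```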
172 changes: 106 additions & 66 deletions docs/source/python/index.rst → docs/source/index.rst
@@ -38,32 +38,31 @@ Simple usage:

.. code-block:: python

    import datafusion
    from datafusion import col
    import pyarrow

    # create a context
    ctx = datafusion.SessionContext()

    # create a RecordBatch and a new DataFrame from it
    batch = pyarrow.RecordBatch.from_arrays(
        [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
        names=["a", "b"],
    )
    df = ctx.create_dataframe([[batch]])

    # create a new statement
    df = df.select(
        col("a") + col("b"),
        col("a") - col("b"),
    )

    # execute and collect the first (and only) batch
    result = df.collect()[0]

    assert result.column(0) == pyarrow.array([5, 7, 9])
    assert result.column(1) == pyarrow.array([-3, -3, -3])

We can also execute a query against data stored in CSV
@@ -76,7 +75,6 @@ We can also execute a query against data stored in CSV
.. code-block:: python
import datafusion
from datafusion import functions as f
from datafusion import col
import pyarrow
@@ -105,7 +103,6 @@ And how to execute a query against a CSV using SQL:
.. code-block:: python
import datafusion
from datafusion import functions as f
from datafusion import col
import pyarrow
@@ -131,54 +128,84 @@ UDFs

.. code-block:: python

    import pyarrow
    import datafusion
    from datafusion import udf, col

    def is_null(array: pyarrow.Array) -> pyarrow.Array:
        return array.is_null()

    is_null_arr = udf(is_null, [pyarrow.int64()], pyarrow.bool_(), 'stable')

    # create a context
    ctx = datafusion.SessionContext()

    # create a RecordBatch and a new DataFrame from it
    batch = pyarrow.RecordBatch.from_arrays(
        [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
        names=["a", "b"],
    )
    df = ctx.create_dataframe([[batch]])

    df = df.select(is_null_arr(col("a")))

    result = df.collect()[0]

    assert result.column(0) == pyarrow.array([False] * 3)

UDAF
----

.. code-block:: python

    import pyarrow
    import pyarrow.compute
    import datafusion
    from datafusion import udaf, Accumulator
    from datafusion import col


    class MyAccumulator(Accumulator):
        """
        Interface of a user-defined accumulation.
        """
        def __init__(self):
            self._sum = pyarrow.scalar(0.0)

        def update(self, values: pyarrow.Array) -> None:
            # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
            self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py())

        def state(self) -> pyarrow.Array:
            return pyarrow.array([self._sum.as_py()])

        def merge(self, states: pyarrow.Array) -> None:
            # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
            self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py())

        def evaluate(self) -> pyarrow.Scalar:
            return self._sum

    # create a context
    ctx = datafusion.SessionContext()

    # create a RecordBatch and a new DataFrame from it
    batch = pyarrow.RecordBatch.from_arrays(
        [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
        names=["a", "b"],
    )
    df = ctx.create_dataframe([[batch]])

    my_udaf = udaf(MyAccumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()], 'stable')

    df = df.aggregate(
        [],
        [my_udaf(col("a"))]
    )

    result = df.collect()[0]

    assert result.column(0) == pyarrow.array([6.0])

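The accumulator methods in the `MyAccumulator` sample exist because an aggregate can be computed in two phases: each partition feeds its batches into its own accumulator through `update`, and the partial results are combined through `merge` before `evaluate` produces the final value. The sketch below models that lifecycle in plain Python, with lists standing in for `pyarrow` arrays. `SumAccumulator` and `two_phase_sum` are hypothetical names; this illustrates the general pattern and is not DataFusion's execution code. Unlike the sample, it skips `None` values instead of breaking on them.

```python
from typing import Iterable, List, Optional


class SumAccumulator:
    """Pure-Python stand-in for the Accumulator interface (no pyarrow)."""

    def __init__(self) -> None:
        self._sum = 0.0

    def update(self, values: Iterable[Optional[float]]) -> None:
        # Called with one batch of input values; nulls are skipped.
        self._sum += sum(v for v in values if v is not None)

    def state(self) -> List[float]:
        # Partial state handed to another accumulator for merging.
        return [self._sum]

    def merge(self, states: Iterable[float]) -> None:
        # Combine partial sums produced by other accumulators.
        self._sum += sum(states)

    def evaluate(self) -> float:
        return self._sum


def two_phase_sum(partitions: List[List[List[Optional[float]]]]) -> float:
    """Simulate per-partition (partial) aggregation followed by a final merge."""
    partials = []
    for batches in partitions:
        acc = SumAccumulator()
        for batch in batches:
            acc.update(batch)
        partials.append(acc)

    final = SumAccumulator()
    for acc in partials:
        final.merge(acc.state())
    return final.evaluate()


# Column "a" = [1, 2, 3] split across two partitions, plus one null value.
print(two_phase_sum([[[1.0, 2.0]], [[3.0, None]]]))  # 6.0
```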
How to install (from pip)
=========================
@@ -187,6 +214,14 @@ How to install (from pip)
pip install datafusion
You can verify the installation by running:

.. code-block:: python

    >>> import datafusion
    >>> datafusion.__version__
    '0.6.0'

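Checking `datafusion.__version__` requires a successful import. An alternative, sketched here with the standard-library `importlib.metadata` (Python 3.8+), reads the version recorded by pip without importing the package, which can help tell "not installed" apart from "installed but failing to import". `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist: str = "datafusion") -> Optional[str]:
    """Return the installed distribution's version, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when datafusion is not installed.
print(installed_version())
```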
How to develop
==============
@@ -197,16 +232,23 @@ Bootstrap:

.. code-block:: shell

    # fetch this repo
    git clone git@github.com:apache/arrow-datafusion-python.git
    cd arrow-datafusion-python
    # prepare development environment (used to build wheel / install in development)
    python3 -m venv venv
    # activate the venv
    source venv/bin/activate
    # update pip itself if necessary
    python -m pip install -U pip
    # install dependencies (for Python 3.8+)
    python -m pip install -r requirements-310.txt

The tests rely on test data in git submodules.

.. code-block:: shell

    git submodule init
    git submodule update

Whenever Rust code changes (your changes or via ``git pull``):
@@ -225,18 +267,16 @@ To change test dependencies, change the `requirements.in` and run

.. code-block:: shell

    # install pip-tools (this can be done only once), also consider running in venv
    python -m pip install pip-tools
    python -m piptools compile --generate-hashes -o requirements-310.txt

To update dependencies, run the same command with ``-U``:

.. code-block:: shell

    python -m piptools compile -U --generate-hashes -o requirements-310.txt

More details about pip-tools `here <https://github.com/jazzband/pip-tools>`_
50 changes: 0 additions & 50 deletions docs/source/python/generated/datafusion.DataFrame.rst

This file was deleted.

52 changes: 0 additions & 52 deletions docs/source/python/generated/datafusion.SessionContext.rst

This file was deleted.

5 changes: 3 additions & 2 deletions pyproject.toml
@@ -47,8 +47,9 @@ dependencies = [
]

[project.urls]
documentation = "https://arrow.apache.org/datafusion/python"
repository = "https://github.com/apache/arrow-datafusion"
homepage = "https://arrow.apache.org/datafusion"
documentation = "https://arrow.apache.org/datafusion"
repository = "https://github.com/apache/arrow-datafusion-python"

[tool.isort]
profile = "black"
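Values under `[project.urls]` are meant to be full URLs; without an explicit `https://`, tools that render them as links (such as a package index's project page) may treat them as relative paths. A small, hypothetical check for this, sketched with `urllib.parse`:

```python
from typing import Dict, List
from urllib.parse import urlparse


def urls_missing_scheme(urls: Dict[str, str]) -> List[str]:
    """Return the [project.urls] keys whose values are not absolute http(s) URLs."""
    flagged = []
    for key, value in urls.items():
        parsed = urlparse(value)
        # "arrow.apache.org/datafusion" parses with an empty scheme and netloc.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            flagged.append(key)
    return flagged


project_urls = {
    "homepage": "arrow.apache.org/datafusion",
    "documentation": "arrow.apache.org/datafusion",
    "repository": "github.com/apache/arrow-datafusion-python",
}
print(urls_missing_scheme(project_urls))  # ['homepage', 'documentation', 'repository']
```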
