
Commit

Merge branch 'branch-22.06' into branch-22.08-merge-22.06
AjayThorve committed Jun 1, 2022
2 parents 816c519 + 4be4708 commit 5fcc228
Showing 34 changed files with 2,598 additions and 295 deletions.
45 changes: 23 additions & 22 deletions README.md
Expand Up @@ -2,15 +2,15 @@

[![Build Status](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuxfilter/job/branches/job/cuxfilter-branch-pipeline/badge/icon)](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cuxfilter/job/branches/job/cuxfilter-branch-pipeline/)

cuxfilter (ku-cross-filter) is a [RAPIDS](https://github.com/rapidsai) framework to connect web visualizations to GPU-accelerated crossfiltering. Inspired by the JavaScript version of the [original](https://github.com/crossfilter/crossfilter), it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via [cuDF](https://github.com/rapidsai/cudf).

## RAPIDS Viz
cuxfilter is one of the core projects of the “RAPIDS viz” team. Taking the axiom that “a slider is worth a thousand queries” from @lmeyerov to heart, we want to enable fast exploratory data analytics through an easier-to-use pythonic notebook interface.


As there are many fantastic visualization libraries available for the web, our general principle is not to create our own viz library, but to enhance others with faster acceleration, larger datasets, and better dev UX. **Basically, we want to take the headache out of interconnecting multiple charts to a GPU backend, so you can get to visually exploring data faster.**

By the way, cuxfilter is best used to interact with large (1 million+) tabular datasets. GPUs are fast, but accessing that speedup requires some architecture overhead that isn’t worthwhile for small datasets.

For more detailed requirements, see below.

Expand All @@ -22,7 +22,7 @@ The current version of cuxfilter leverages jupyter notebook and bokeh server to

### What is cuDataTiles?

cuxfilter implements cuDataTiles, a GPU accelerated version of data tiles based on the work of [Falcon](https://github.com/uwdata/falcon). When starting to interact with specific charts in a cuxfilter dashboard, values for the other charts are precomputed to allow for fast slider scrubbing without having to recalculate values.
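
The precomputation idea can be illustrated with a small CPU-only sketch in plain Python. This is not cuxfilter's implementation: `build_tile` and `query` are hypothetical names, and the layout (one cumulative count row per bin of the active chart) is an assumption based on the general data-tile approach described by Falcon. Once the tile is built, any range selection on the active chart is answered by subtracting two precomputed rows, without rescanning the data.

```python
def build_tile(active_bins, passive_bins, n_active, n_passive):
    """Precompute cumulative counts for one (active chart, passive chart) pair.

    tile[i][j] = number of rows whose active bin is < i and whose passive bin is j.
    Hypothetical helper for illustration only; not part of the cuxfilter API.
    """
    # Plain 2D histogram: counts[a][p] = rows falling in active bin a, passive bin p
    counts = [[0] * n_passive for _ in range(n_active)]
    for a, p in zip(active_bins, passive_bins):
        counts[a][p] += 1

    # Running sum along the active axis, with a leading row of zeros
    tile = [[0] * n_passive]
    for row in counts:
        tile.append([prev + c for prev, c in zip(tile[-1], row)])
    return tile


def query(tile, lo, hi):
    """Passive-chart histogram for rows whose active bin lies in [lo, hi)."""
    return [upper - lower for upper, lower in zip(tile[hi], tile[lo])]


# Five rows, already binned: 4 bins on the active chart, 2 on the passive chart
active = [0, 1, 1, 2, 3]
passive = [0, 1, 0, 1, 1]
tile = build_tile(active, passive, n_active=4, n_passive=2)

# Scrubbing the slider over active bins 1..2 only touches two tile rows
print(query(tile, 1, 3))
```

Each slider movement then costs time proportional to the number of bins in the passive chart rather than the number of rows, which is what makes scrubbing feel instantaneous.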

### Open Source Projects

Expand All @@ -34,17 +34,16 @@ cuxfilter wouldn’t be possible without using these great open source projects:
- [Falcon](https://github.com/uwdata/falcon)
- [Jupyter](https://jupyter.org/about)


### Where is the original cuxfilter and Mortgage Viz Demo?

The original version (0.2) of cuxfilter, best known as the backend powering the Mortgage Viz Demo, has been moved to the [`GTC-2018-mortgage-visualization`](https://github.com/rapidsai/cuxfilter/tree/GTC-2018-mortgage-visualization) branch. As it has a much more complicated backend and JavaScript API, we’ve decided to focus on the streamlined, notebook-focused version here.


## Usage

### Example 1

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/rapidsai/cuxfilter/blob/branch-22.02/notebooks/auto_accidents_example.ipynb) [<img src="https://img.shields.io/badge/-Setup Studio Lab Environment-gray.svg">](./notebooks/README.md#amazon-sagemaker-studio-lab)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai/cuxfilter/blob/branch-22.02/notebooks/auto_accidents_example.ipynb) [<img src="https://img.shields.io/badge/-Setup Colab Environment-gray.svg">](./notebooks/README.md#google-colab)

```python
Expand All @@ -63,7 +62,7 @@ gtc_demo_red_blue_palette = [ "#3182bd", "#6baed6", "#7b8ed8", "#e26798", "#ff00

#declare charts
chart1 = cuxfilter.charts.scatter(x='dropoff_x', y='dropoff_y', aggregate_col='DAY_WEEK', aggregate_fn='mean',
color_palette=gtc_demo_red_blue_palette, tile_provider='CARTODBPOSITRON',
color_palette=gtc_demo_red_blue_palette, tile_provider='CartoLight', unselected_alpha=0.2,
pixel_shade_type='linear')
chart2 = cuxfilter.charts.multi_select('YEAR')
chart3 = cuxfilter.charts.bar('DAY_WEEK', x_label_map=label_map)
Expand All @@ -79,11 +78,13 @@ d = cux_df.dashboard([chart1, chart3, chart4], sidebar=[chart2], layout=cuxfilte
d.app()

```

![output dashboard](./docs/_images/demo.gif)

### Example 2

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/rapidsai/cuxfilter/blob/branch-22.02/notebooks/Mortgage_example.ipynb) [<img src="https://img.shields.io/badge/-Setup Studio Lab Environment-gray.svg">](./notebooks/README.md#amazon-sagemaker-studio-lab)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rapidsai/cuxfilter/blob/branch-22.02/notebooks/Mortgage_example.ipynb) [<img src="https://img.shields.io/badge/-Setup Colab Environment-gray.svg">](./notebooks/README.md#google-colab)

```python
Expand Down Expand Up @@ -117,16 +118,15 @@ d = cux_df.dashboard([chart0, chart2],sidebar=[chart3, chart1], layout=cuxfilter
d.show('jupyter-notebook/lab-url')

```

![output dashboard](./docs/_images/demo2.gif)

## Documentation

Full documentation can be found [on the RAPIDS docs page](https://docs.rapids.ai/api/cuxfilter/stable/).

Troubleshooting help can be found [on our troubleshooting page](https://docs.rapids.ai/api/cuxfilter/stable/installation.html#troubleshooting).


## General Dependencies

- python
Expand All @@ -146,18 +146,18 @@ Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapids

## Installation


### CUDA/GPU requirements

* CUDA 10.1+
* NVIDIA driver 418.39+
* Pascal architecture or better (Compute Capability >=6.0)
- CUDA 10.1+
- NVIDIA driver 418.39+
- Pascal architecture or better (Compute Capability >=6.0)

### Conda

cuxfilter can be installed with conda ([miniconda](https://conda.io/miniconda.html), or the full [Anaconda distribution](https://www.anaconda.com/download)) from the `rapidsai` channel:

For `cuxfilter version == 22.08` :

```bash
# for CUDA 11.5
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
Expand All @@ -166,6 +166,7 @@ conda install -c rapidsai -c nvidia -c numba -c conda-forge \
```

For the nightly version of `cuxfilter` :

```bash
# for CUDA 11.5
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
Expand All @@ -174,13 +175,11 @@ conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \

Note: cuxfilter is supported only on Linux, and with Python versions 3.7 and later.

See the [Get RAPIDS version picker](https://rapids.ai/start.html) for more OS and version info.


### Build/Install from Source


See [build instructions](CONTRIBUTING.md#setting-up-your-build-environment).

## Troubleshooting

Expand Down Expand Up @@ -232,6 +231,7 @@ python -c "from cuxfilter.sampledata import datasets_check; datasets_check(base_
Currently supported layout templates and example code can be found on the [layouts page](https://rapidsai.github.io/cuxfilter/layouts/Layouts.html).

### Currently Supported Charts

| Library | Chart type |
| ------------- | ------------- |
| bokeh | bar, line |
Expand All @@ -249,4 +249,5 @@ You can see the examples to implement viz libraries in the bokeh and cudatashade
For more details, check out the [contributing guide](./CONTRIBUTING.md).

## Future Work

cuxfilter development is in its early stages and ongoing. See what we are planning next on the [projects page](https://github.com/rapidsai/cuxfilter/projects).
Binary file added docs/_images/9-2Bpoints.png
Binary file modified docs/_images/demo.gif
36 changes: 36 additions & 0 deletions docs/source/Dask-cudf-support.rst
@@ -0,0 +1,36 @@
cuxfilter with multi-GPU using dask_cudf
========================================

`Dask-cuDF <https://github.com/rapidsai/cudf/tree/main/python/dask_cudf>`_ extends Dask where necessary to allow its DataFrame partitions to be processed by cuDF GPU DataFrames as opposed to Pandas DataFrames. For instance, when you call dask_cudf.read_csv(…), your cluster’s GPUs do the work of parsing the CSV file(s) with the underlying cudf.read_csv().

When to use cuDF and Dask-cuDF
------------------------------

If your workflow is fast enough on a single GPU or your data comfortably fits in memory on a single GPU, you would want to use cuDF. If you want to distribute your workflow across multiple GPUs, have more data than you can fit in memory on a single GPU, or want to analyze data spread across many files at once, you would want to use Dask-cuDF.
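
The rule of thumb above can be written down as a tiny helper. This is only a sketch: `choose_backend` is a hypothetical function, not part of cuDF, Dask-cuDF, or cuxfilter, and the ``headroom`` fraction used to decide what "comfortably fits" is an assumed value you would tune for your own workload.

```python
def choose_backend(data_bytes, gpu_memory_bytes, n_files=1, headroom=0.5):
    """Suggest 'cudf' or 'dask_cudf' following the guidance above.

    Hypothetical helper for illustration. `headroom` is an assumed safety
    margin: the data must fit in this fraction of a single GPU's memory.
    """
    fits_comfortably = data_bytes <= gpu_memory_bytes * headroom
    if fits_comfortably and n_files == 1:
        return "cudf"
    # Too large for one GPU, or spread across many files
    return "dask_cudf"


GiB = 2**30
print(choose_backend(4 * GiB, 16 * GiB))              # small, single file
print(choose_backend(40 * GiB, 16 * GiB))             # larger than one GPU
print(choose_backend(1 * GiB, 16 * GiB, n_files=20))  # many files at once
```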

A very useful guide to using Dask-cudf can be found `here <https://docs.rapids.ai/api/cudf/stable/user_guide/10min.html>`_.

Cuxfilter with Dask-cudf
------------------------

Using cuxfilter with Dask-cudf is a seamless experience: passing in a `dask_cudf.DataFrame` object instead of a `cudf.DataFrame` object should just work, without any other modifications. The `dask_cudf.DataFrame` should, however, be initialized with its partitions set before passing it to the `cuxfilter.DataFrame.from_dataframe` function.

For more information and examples, please visit the cuxfilter repository with `dask_cudf notebooks <https://github.com/rapidsai/cuxfilter/tree/branch-22.06/notebooks/notebooks/cuxfilter%20with%20dask_cudf>`_
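
As a rough illustration of the hand-off described above, the sketch below partitions an existing cuDF DataFrame and passes the result to cuxfilter. The wrapper name `to_cuxfilter_dataframe` and the default of two partitions are assumptions made for this example. The imports are placed inside the function so that the definition itself loads on a machine without a GPU; actually calling it requires a working RAPIDS installation.

```python
def to_cuxfilter_dataframe(gdf, npartitions=2):
    """Wrap a cudf.DataFrame as a partitioned cuxfilter DataFrame.

    Hypothetical convenience wrapper for illustration. `npartitions=2` is an
    arbitrary default; a common choice is one partition per available GPU.
    """
    # Imported lazily: both packages need a CUDA-capable environment
    import dask_cudf
    import cuxfilter

    # Set the partitions first, as the text above requires
    ddf = dask_cudf.from_cudf(gdf, npartitions=npartitions)

    # Same entry point as for a plain cudf.DataFrame
    return cuxfilter.DataFrame.from_dataframe(ddf)
```

The returned object is used exactly like one built from a `cudf.DataFrame`, for example by calling `.dashboard([...])` on it with the charts you want to link.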


.. list-table:: Currently Supported Charts
   :widths: 50 50
   :header-rows: 1

   * - Library
     - Chart type
   * - bokeh
     - bar, line
   * - datashader
     - scatter, scatter_geo, line, stacked_lines, heatmap, graph (note: edge rendering support is limited for now)
   * - panel_widgets
     - range_slider, date_range_slider, float_slider, int_slider, drop_down, multi_select, card, number
   * - custom
     - view_dataframe
   * - deckgl
     - choropleth (3d and 2d)
11 changes: 6 additions & 5 deletions docs/source/charts/datashader_charts.rst
Expand Up @@ -32,7 +32,7 @@ Example
end = start + 60 * 60 * 24

cux_df = DataFrame.from_dataframe(cudf.DataFrame({'x': np.linspace(start, end, n), 'y':np.random.normal(0, 0.3, size=n).cumsum() + 50}))
line_chart_1 = line(x='x', y='y')
line_chart_1 = line(x='x', y='y', unselected_alpha=0.2)

d = cux_df.dashboard([line_chart_1])
line_chart_1.view()
Expand All @@ -54,7 +54,7 @@ Example

cux_df = DataFrame.from_dataframe(cudf.DataFrame({'x': [float(random.randrange(-8239000,-8229000)) for i in range(10000)], 'y':[float(random.randrange(4960000, 4980000)) for i in range(10000)]}))
# setting pixel_shade_type='linear' to display legend (currently supports only log/linear)
scatter_chart = scatter(x='x',y='y', pixel_shade_type="linear")
scatter_chart = scatter(x='x',y='y', pixel_shade_type="linear", unselected_alpha=0.2)

d = cux_df.dashboard([scatter_chart])
scatter_chart.view()
Expand All @@ -78,7 +78,8 @@ Example

stacked_lines_chart = stacked_lines(x='Time', y=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'x', 'y', 'z'],
colors = ["red", "grey", "black", "purple", "pink",
"yellow", "brown", "green", "orange", "blue"]
"yellow", "brown", "green", "orange", "blue"],
unselected_alpha=0.2
)

d = cux_df.dashboard([stacked_lines_chart])
Expand All @@ -105,7 +106,7 @@ Example
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]

chart1 = heatmap(x='Year', y='Month', aggregate_col='rate',
color_palette=colors, point_size=20)
color_palette=colors, point_size=20, unselected_alpha=0.2)


d = cux_df.dashboard([chart1], layout=layouts.single_feature, theme=themes.dark)
Expand Down Expand Up @@ -144,7 +145,7 @@ Example

cux_df = cuxfilter.DataFrame.load_graph((nodes, edges))

chart0 = cuxfilter.charts.datashader.graph(node_pixel_shade_type='linear')
chart0 = cuxfilter.charts.datashader.graph(node_pixel_shade_type='linear', unselected_alpha=0.2)

d = cux_df.dashboard([chart0], layout=cuxfilter.layouts.double_feature)
chart0.view()
4 changes: 2 additions & 2 deletions docs/source/examples/NYC_taxi_example.ipynb
Expand Up @@ -163,7 +163,7 @@
"chart1 = charts.scatter(x='dropoff_x',\n",
" y='dropoff_y',\n",
" aggregate_fn='mean',aggregate_col='payment_type', pixel_shade_type='log', legend_position='top_right',\n",
" tile_provider=\"CartoLight\", x_range=(-8239910.23,-8229529.24), y_range=(4968481.34,4983152.92))\n",
" tile_provider=\"CartoLight\", x_range=(-8239910.23,-8229529.24), y_range=(4968481.34,4983152.92), unselected_alpha=0.2)\n",
"\n",
"chart2 = charts.bar('passenger_count', data_points=9)\n",
"chart3 = cuxfilter.charts.bar('tpep_pickup_datetime')\n",
Expand Down Expand Up @@ -270,7 +270,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.8.13"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions docs/source/examples/auto_accidents_example.ipynb
Expand Up @@ -115,7 +115,7 @@
"source": [
"chart1 = charts.scatter(x='dropoff_x', y='dropoff_y', aggregate_col='DAY_WEEK', aggregate_fn='mean',\n",
" tile_provider=\"CartoLight\",\n",
" color_palette=gtc_demo_red_blue_palette,pixel_shade_type='linear')\n",
" color_palette=gtc_demo_red_blue_palette,pixel_shade_type='linear', unselected_alpha=0.2)\n",
"\n",
"chart2 = charts.bar('YEAR')\n",
"\n",
Expand Down Expand Up @@ -221,7 +221,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.8.13"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions docs/source/examples/graphs.ipynb
Expand Up @@ -135,7 +135,7 @@
" node_id='SYMBOL', timeout=200, edge_aggregate_col='color',\n",
" node_aggregate_col='Color', node_aggregate_fn='mean', node_pixel_shade_type='linear',\n",
" edge_render_type='direct',#other option available -> 'curved'\n",
" edge_transparency=0.5\n",
" edge_transparency=0.5, unselected_alpha=0.2\n",
" )\n",
"\n",
"chart1 = cuxfilter.charts.number('Color', aggregate_fn=\"mean\", widget=True, title=\"Mean Color\")"
Expand Down Expand Up @@ -240,7 +240,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.8.13"
}
},
"nbformat": 4,
1 change: 1 addition & 0 deletions docs/source/index.rst
Expand Up @@ -12,6 +12,7 @@ cuxfilter acts as a connector library, which provides the connections between di
./dataframe.rst
./deployment.rst
./10_minutes_to_cuxfilter.ipynb
./Dask-cudf-support.rst
./charts/charts.rst
./layouts/Layouts.ipynb
./themes/Themes.ipynb
3 changes: 2 additions & 1 deletion notebooks/NYC_taxi_example.ipynb
Expand Up @@ -131,6 +131,7 @@
"chart1 = cuxfilter.charts.scatter(x='dropoff_x',\n",
" y='dropoff_y',\n",
" aggregate_fn='mean',aggregate_col='payment_type', pixel_shade_type='log', legend_position='top_right',\n",
" unselected_alpha=0.2,\n",
" tile_provider=\"CartoDark\", x_range=(-8239910.23,-8229529.24), y_range=(4968481.34,4983152.92))\n",
"\n",
"chart2 = cuxfilter.charts.bar('passenger_count', data_points=9)\n",
Expand Down Expand Up @@ -238,7 +239,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
"version": "3.8.13"
}
},
"nbformat": 4,