Jupyter DataTables

Jupyter Notebook extension to leverage pandas DataFrames by integrating DataTables JS.

About

Data scientists and in fact many developers work with pd.DataFrame on daily basis to interpret data to process them. In my typical workflow. The common workflow is to display the dataframe, take a look at the data schema and then produce multiple plots to check the distribution of the data to have a clearer picture, perhaps search some data in the table, etc...

What if those distribution plots were part of the standard DataFrame and we had the ability to quickly search through the table with minimal effort? What if it was the default representation?

The jupyter-datatables uses jupyter-require to draw the table.

Installation

pip install jupyter-datatables

Usage

import numpy as np
import pandas as pd

from jupyter_datatables import init_datatables_mode

init_datatables_mode()

That's it, your default pandas representation will now use Jupyter DataTables!

df = pd.DataFrame(np.abs(np.random.randn(50, 5)), columns=list(string.ascii_uppercase[:5]))

In most cases, you don't need to worry too much about the size of your data. Jupyter DataTables calculates required sample size based on a confidence interval (by default this would be 0.95) and margin of error and ceils it to the highest 'smart' value.

For example, for a data containing 100,000 samples, given 0.975 confidence interval and 0.02 margin of error, the Jupyter DataTables would calculate that 3044 samples are required and it would round it up to 4000.

With additional note:

Sample size: 4,000 out of 100,000

We can also handle wide tables with ease.

df = pd.DataFrame(np.abs(np.random.randn(50, 20)), columns=list(string.ascii_uppercase[:20]))

As per 0.3.0, there is a support for interactive tooltips:

And also support for custom indices including Date type:

dft = pd.DataFrame({'A': np.random.rand(5),
                    'B': [1, 1, 3, 2, 1],
                    'C': 'This is a very long sentence that should automatically be trimmed',
                    'D': [pd.Timestamp('20010101'), pd.Timestamp('20010102'), pd.Timestamp('20010103'), pd.Timestamp('20010104'), pd.Timestamp('20010105')],
                    'E': pd.Series([1.0] * 5).astype('float32'),
                    'F': [False, True, False, False, True],
                   })

dft.D = dft.D.apply(pd.to_datetime)
dft.set_index('D', inplace=True)

Current status and future plans:

Check out the Project Board where we track issues and TODOs for our Jupyter tooling!

Author: Marek Cermak macermak@redhat.com, @AICoE

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
assets/images		assets/images
examples		examples
jupyter_datatables		jupyter_datatables
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Pipfile		Pipfile
README.md		README.md
requirements.in		requirements.in
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Jupyter DataTables

About

Installation

Usage

Current status and future plans:

About

Releases

Packages

Contributors 2

Languages

License

CermakM/jupyter-datatables

Folders and files

Latest commit

History

Repository files navigation

Jupyter DataTables

About

Installation

Usage

Current status and future plans:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages