Github pages for graphs #379

Merged
merged 10 commits into from
Aug 6, 2021
7 changes: 7 additions & 0 deletions docs/source/dataprofiler.reports.graphs.rst
@@ -0,0 +1,7 @@
Graphs
========================================

.. automodule:: dataprofiler.reports.graphs
   :members:
   :undoc-members:
   :show-inheritance:
16 changes: 16 additions & 0 deletions docs/source/dataprofiler.reports.rst
@@ -0,0 +1,16 @@
Reports
=========


Modules
-------

.. toctree::
   :maxdepth: 4

   dataprofiler.reports.graphs

.. automodule:: dataprofiler.reports
   :members:
   :undoc-members:
   :show-inheritance:
83 changes: 83 additions & 0 deletions docs/source/graphs.rst
@@ -0,0 +1,83 @@
.. _reports:

Graphs
********

Graph Your Data
=================

Some of the profiled data can be plotted as seaborn histograms. The sections below show how, with examples.

What we need to import
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    from dataprofiler.reports import graphs

The main functions for plotting histograms are in ``graphs``. **You will also need the** ``dataprofiler[reports]`` **requirements installed.**
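If you are unsure whether the optional plotting packages are present, a quick check like the one below can help before calling ``graphs``. This is a minimal sketch, assuming the ``dataprofiler[reports]`` extra provides ``matplotlib`` and ``seaborn``; the helper ``missing_report_dependencies`` is hypothetical and not part of DataProfiler.

.. code-block:: python

    import importlib.util

    # Assumption: the plotting functions rely on matplotlib and seaborn.
    # The authoritative list is whatever the dataprofiler[reports] extra
    # installs; adjust this tuple to match it.
    ASSUMED_REPORT_PACKAGES = ("matplotlib", "seaborn")


    def missing_report_dependencies(packages=ASSUMED_REPORT_PACKAGES):
        """Hypothetical helper: return the packages this interpreter cannot find."""
        return [name for name in packages if importlib.util.find_spec(name) is None]


    if __name__ == "__main__":
        missing = missing_report_dependencies()
        if missing:
            print("Missing plotting packages:", ", ".join(missing))
        else:
            print("All assumed plotting packages are importable.")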

Plotting from a StructuredProfiler class
~~~~~~~~~~~

Given a StructuredProfiler instance, we can choose which columns to plot and draw each one as a histogram.

.. code-block:: python

    graphs.plot_histograms(profiler, columns)

The parameters are:

* **profiler** - The StructuredProfiler instance holding the profiled data.
* **columns** - (Optional) A list of column names selecting which IntColumn or FloatColumn profiles to plot. If omitted, every IntColumn and FloatColumn is plotted.
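The selection rule described above can be sketched in plain Python. This is a hypothetical illustration, not DataProfiler's implementation: ``select_histogram_columns`` and the dictionary of type labels are stand-ins for a profiler's columns, and skipping unknown or non-numeric names is an assumption.

.. code-block:: python

    # Hypothetical stand-in for a profiler's columns: column name -> type label.
    # This is NOT the DataProfiler API; it only mirrors the documented behaviour.
    NUMERIC_TYPES = {"int", "float"}


    def select_histogram_columns(column_types, columns=None):
        """Return the column names that would be plotted as histograms.

        column_types: dict mapping column name to a type label ("int", "float", ...).
        columns: optional list of column names; None means "all numeric columns".
        """
        candidates = list(column_types) if columns is None else list(columns)
        # Assumption: unknown or non-numeric names are skipped rather than raising.
        return [name for name in candidates if column_types.get(name) in NUMERIC_TYPES]


    # Mirrors the example data: an int column, a string column and a float column.
    example_types = {0: "int", 1: "str", 2: "float"}
    print(select_histogram_columns(example_types))        # [0, 2]
    print(select_histogram_columns(example_types, [0]))   # [0]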

Plotting an individual IntColumn or FloatColumn
~~~~~~~~~~~~~~

A single IntColumn or FloatColumn profile can also be plotted on its own. The underlying data may come from CSV, JSON, Avro or Parquet files; all of these should work.

.. code-block:: python

    graphs.plot_col_histogram(column, axes, title)

The parameters are:

* **column** - The IntColumn or FloatColumn profile to plot.
* **axes** - (Optional) The matplotlib axes to draw the plot on.
* **title** - (Optional) The title of the plot.
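Each bar in the plotted histogram counts how many values fall into one bin. The minimal sketch below shows that counting with the standard library, assuming half-open bins where the last bin also includes its right edge (the NumPy convention). It is illustrative only: ``histogram_counts`` is hypothetical, and the column profiles are assumed to compute their own bins, which may differ from these.

.. code-block:: python

    from bisect import bisect_right


    def histogram_counts(values, bin_edges):
        """Hypothetical helper: count values per bin.

        Bins are [e_i, e_(i+1)), except the last bin, which also includes its
        right edge. None values and values outside the edges are ignored.
        """
        counts = [0] * (len(bin_edges) - 1)
        for value in values:
            if value is None or value < bin_edges[0] or value > bin_edges[-1]:
                continue
            if value == bin_edges[-1]:
                index = len(counts) - 1  # right edge belongs to the last bin
            else:
                index = bisect_right(bin_edges, value) - 1
            counts[index] += 1
        return counts


    # The float column from the examples, split into two bins.
    print(histogram_counts([1.0, 2.2, 3.5, 10.0], [0, 5, 10]))  # [3, 1]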

Examples
~~~~~~~~~~~~~~~~~

1. This example profiles a small dataset with StructuredProfiler and plots histograms.

.. code-block:: python

    import dataprofiler as dp
    from dataprofiler.reports import graphs

    data = [[1, 'a', 1.0],
            [2, 'b', 2.2],
            [3, 'c', 3.5],
            [None, 'd', 10.0]]
    profiler = dp.StructuredProfiler(data)

    # Plot every IntColumn and FloatColumn as a histogram (the first and last columns).
    graphs.plot_histograms(profiler)

    # Plot only the specified column, column 0, as a histogram.
    columns = [0]
    graphs.plot_histograms(profiler, columns)

* To plot a specific column, put its name in ``columns``. Here the columns are unnamed, so ``0`` is the name of the first column.

2. This example plots the histogram of a single IntColumn profile.

.. code-block:: python

    import pandas as pd
    from dataprofiler.profilers import IntColumn
    from dataprofiler.reports import graphs

    data = pd.Series([1, 2, 3], dtype=str)
    profiler = IntColumn('example')
    profiler.update(data)

    # Plot the IntColumn profile as a histogram.
    graphs.plot_col_histogram(profiler)