[DONT AUTOMERGE]: Generate Docs for 0.7.7 (#451)

* Update: updated the documentation for 0.7.6
* Fix: made a small fix in the README
* Update: updated index.html with latest version and README.md with more instruction details
* Resolved README issues
* updated readme
* Empty Commit
* Generated the 0.7.7 docs
capitalone · Oct 4, 2022 · dc4c415
1 parent 9599de1
commit dc4c415
Showing 254 changed files with 63,780 additions and 1 deletion.
diff --git a/docs/0.7.7/doctrees/API.doctree b/docs/0.7.7/doctrees/API.doctree
diff --git a/docs/0.7.7/doctrees/add_new_model_to_data_labeler.doctree b/docs/0.7.7/doctrees/add_new_model_to_data_labeler.doctree
diff --git a/docs/0.7.7/doctrees/data_labeling.doctree b/docs/0.7.7/doctrees/data_labeling.doctree
diff --git a/docs/0.7.7/doctrees/data_reader.doctree b/docs/0.7.7/doctrees/data_reader.doctree
diff --git a/docs/0.7.7/doctrees/data_readers.doctree b/docs/0.7.7/doctrees/data_readers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.avro_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.avro_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.base_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.base_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.csv_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.csv_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.data_utils.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.data_utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.filepath_or_buffer.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.filepath_or_buffer.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.json_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.json_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.parquet_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.parquet_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.structured_mixins.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.structured_mixins.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.data_readers.text_data.doctree b/docs/0.7.7/doctrees/dataprofiler.data_readers.text_data.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.doctree b/docs/0.7.7/doctrees/dataprofiler.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.dp_logging.doctree b/docs/0.7.7/doctrees/dataprofiler.dp_logging.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.base_data_labeler.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.base_data_labeler.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.base_model.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.base_model.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.character_level_cnn_model.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.character_level_cnn_model.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.classification_report_utils.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.classification_report_utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.data_labelers.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.data_labelers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.data_processing.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.data_processing.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.labeler_utils.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.labeler_utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.labelers.regex_model.doctree b/docs/0.7.7/doctrees/dataprofiler.labelers.regex_model.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.base_column_profilers.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.base_column_profilers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.categorical_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.categorical_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.column_profile_compilers.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.column_profile_compilers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.data_labeler_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.data_labeler_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.datetime_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.datetime_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.float_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.float_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.helpers.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.helpers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.helpers.report_helpers.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.helpers.report_helpers.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.histogram_utils.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.histogram_utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.int_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.int_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.numerical_column_stats.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.numerical_column_stats.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.order_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.order_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.profile_builder.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.profile_builder.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.profiler_options.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.profiler_options.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.text_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.text_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_data_labeler_column_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_data_labeler_column_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_labeler_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_labeler_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_text_profile.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.unstructured_text_profile.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.profilers.utils.doctree b/docs/0.7.7/doctrees/dataprofiler.profilers.utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.reports.doctree b/docs/0.7.7/doctrees/dataprofiler.reports.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.reports.graphs.doctree b/docs/0.7.7/doctrees/dataprofiler.reports.graphs.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.reports.utils.doctree b/docs/0.7.7/doctrees/dataprofiler.reports.utils.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.settings.doctree b/docs/0.7.7/doctrees/dataprofiler.settings.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.validators.base_validators.doctree b/docs/0.7.7/doctrees/dataprofiler.validators.base_validators.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.validators.doctree b/docs/0.7.7/doctrees/dataprofiler.validators.doctree
diff --git a/docs/0.7.7/doctrees/dataprofiler.version.doctree b/docs/0.7.7/doctrees/dataprofiler.version.doctree
diff --git a/docs/0.7.7/doctrees/environment.pickle b/docs/0.7.7/doctrees/environment.pickle
diff --git a/docs/0.7.7/doctrees/examples.doctree b/docs/0.7.7/doctrees/examples.doctree
diff --git a/docs/0.7.7/doctrees/graphs.doctree b/docs/0.7.7/doctrees/graphs.doctree
diff --git a/docs/0.7.7/doctrees/index.doctree b/docs/0.7.7/doctrees/index.doctree
diff --git a/docs/0.7.7/doctrees/install.doctree b/docs/0.7.7/doctrees/install.doctree
diff --git a/docs/0.7.7/doctrees/labeler.doctree b/docs/0.7.7/doctrees/labeler.doctree
diff --git a/docs/0.7.7/doctrees/modules.doctree b/docs/0.7.7/doctrees/modules.doctree
diff --git a/docs/0.7.7/doctrees/nbsphinx/add_new_model_to_data_labeler.ipynb b/docs/0.7.7/doctrees/nbsphinx/add_new_model_to_data_labeler.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/data_reader.ipynb b/docs/0.7.7/doctrees/nbsphinx/data_reader.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/labeler.ipynb b/docs/0.7.7/doctrees/nbsphinx/labeler.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/overview.ipynb b/docs/0.7.7/doctrees/nbsphinx/overview.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/profiler_example.ipynb b/docs/0.7.7/doctrees/nbsphinx/profiler_example.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/regex_labeler_from_scratch.ipynb b/docs/0.7.7/doctrees/nbsphinx/regex_labeler_from_scratch.ipynb
diff --git a/docs/0.7.7/doctrees/nbsphinx/unstructured_profiler_example.ipynb b/docs/0.7.7/doctrees/nbsphinx/unstructured_profiler_example.ipynb
diff --git a/docs/0.7.7/doctrees/overview.doctree b/docs/0.7.7/doctrees/overview.doctree
diff --git a/docs/0.7.7/doctrees/profiler.doctree b/docs/0.7.7/doctrees/profiler.doctree
diff --git a/docs/0.7.7/doctrees/profiler_example.doctree b/docs/0.7.7/doctrees/profiler_example.doctree
diff --git a/docs/0.7.7/doctrees/regex_labeler_from_scratch.doctree b/docs/0.7.7/doctrees/regex_labeler_from_scratch.doctree
diff --git a/docs/0.7.7/doctrees/unstructured_profiler_example.doctree b/docs/0.7.7/doctrees/unstructured_profiler_example.doctree
diff --git a/docs/0.7.7/html/.buildinfo b/docs/0.7.7/html/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 069dd74c21791bbcf1aac658337c995c
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/0.7.7/html/API.html b/docs/0.7.7/html/API.html
diff --git a/docs/0.7.7/html/_images/DL-Flowchart.png b/docs/0.7.7/html/_images/DL-Flowchart.png
diff --git a/docs/0.7.7/html/_images/histogram_example_0.png b/docs/0.7.7/html/_images/histogram_example_0.png
diff --git a/docs/0.7.7/html/_images/histogram_example_1.png b/docs/0.7.7/html/_images/histogram_example_1.png
diff --git a/docs/0.7.7/html/_images/histogram_example_2.png b/docs/0.7.7/html/_images/histogram_example_2.png
diff --git a/docs/0.7.7/html/_images/missing_value_barchart_example_0.png b/docs/0.7.7/html/_images/missing_value_barchart_example_0.png
diff --git a/docs/0.7.7/html/_images/missing_value_matrix_example_0.png b/docs/0.7.7/html/_images/missing_value_matrix_example_0.png
diff --git a/docs/0.7.7/html/_sources/API.rst.txt b/docs/0.7.7/html/_sources/API.rst.txt
@@ -0,0 +1,16 @@
+.. _API:
+
+API
+***
+
+The API is split into 4 main components: Profilers, Labelers, Data Readers, and
+Validators.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   dataprofiler.data_readers
+   dataprofiler.profilers
+   dataprofiler.labelers
+   dataprofiler.validators
diff --git a/docs/0.7.7/html/_sources/add_new_model_to_data_labeler.nblink.txt b/docs/0.7.7/html/_sources/add_new_model_to_data_labeler.nblink.txt
@@ -0,0 +1,3 @@
+{
+    "path": "../../feature_branch/examples/add_new_model_to_data_labeler.ipynb"
+}
diff --git a/docs/0.7.7/html/_sources/data_labeling.rst.txt b/docs/0.7.7/html/_sources/data_labeling.rst.txt
@@ -0,0 +1,365 @@
+.. _data_labeling:
+
+Labeler (Sensitive Data)
+************************
+
+In this library, the term *data labeling* refers to entity recognition.
+
+Built into the data profiler is a classifier which evaluates the complex data types of the dataset.
+For structured data, it determines the complex data type of each column. When
+running the data profile, it uses the default data labeling model built into the
+library. However, the data labeler also allows users to train their own data
+labeler.
+
+*Data Labels* are determined per cell for structured data (column/row when
+the *profiler* is used) or at the character level for unstructured data. The
+default labels are:
+
+* UNKNOWN
+* ADDRESS
+* BAN (bank account number, 10-18 digits)
+* CREDIT_CARD
+* EMAIL_ADDRESS
+* UUID 
+* HASH_OR_KEY (md5, sha1, sha256, random hash, etc.)
+* IPV4
+* IPV6
+* MAC_ADDRESS
+* PERSON
+* PHONE_NUMBER
+* SSN
+* URL
+* US_STATE
+* DRIVERS_LICENSE
+* DATE
+* TIME
+* DATETIME
+* INTEGER
+* FLOAT
+* QUANTITY
+* ORDINAL
+
+
+Identify Entities in Structured Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Make predictions and identify labels:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    # load data and data labeler
+    data = dp.Data("your_data.csv")
+    data_labeler = dp.DataLabeler(labeler_type='structured')
+
+    # make predictions and get labels per cell
+    predictions = data_labeler.predict(data)
+
+Identify Entities in Unstructured Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Predict which class characters belong to in unstructured text:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    data_labeler = dp.DataLabeler(labeler_type='unstructured')
+
+    # Example sample string, must be in an array (multiple arrays can be passed)
+    sample = ["Help\tJohn Macklemore\tneeds\tfood.\tPlease\tCall\t555-301-1234."
+              "\tHis\tssn\tis\tnot\t334-97-1234. I'm a BAN: 000043219499392912.\n"]
+
+    # Predict what class each character belongs to
+    model_predictions = data_labeler.predict(
+        sample, predict_options=dict(show_confidences=True))
+
+    # Predictions / confidences are at the character level
+    final_results = model_predictions["pred"]
+    final_confidences = model_predictions["conf"]
+
+It's also possible to change the output format, for example to one similar to the **SpaCy** format:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    data_labeler = dp.DataLabeler(labeler_type='unstructured', trainable=True)
+
+    # Example sample string, must be in an array (multiple arrays can be passed)
+    sample = ["Help\tJohn Macklemore\tneeds\tfood.\tPlease\tCall\t555-301-1234."
+              "\tHis\tssn\tis\tnot\t334-97-1234. I'm a BAN: 000043219499392912.\n"]
+
+    # Set the output to the NER format (start position, end position, label)
+    data_labeler.set_params(
+        { 'postprocessor': { 'output_format':'ner', 'use_word_level_argmax':True } } 
+    )
+
+    results = data_labeler.predict(sample)
+
+    print(results)
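The NER output format above reports (start position, end position, label) spans instead of one label per character. As a rough standalone sketch of that grouping in plain Python (this is not DataProfiler's postprocessor; the helper name `char_labels_to_spans`, the exclusive end index, and the choice to skip `UNKNOWN` as background are all assumptions made for illustration):

```python
from itertools import groupby


def char_labels_to_spans(labels, background="UNKNOWN"):
    """Group runs of identical character labels into (start, end, label) spans.

    Hypothetical helper: `end` is exclusive, and runs of the background
    label are dropped.
    """
    spans = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != background:
            spans.append((pos, pos + length, label))
        pos += length
    return spans


text = "Call 555-301-1234."
# One label per character: 5 background, 12 phone-number, 1 background.
char_labels = ["UNKNOWN"] * 5 + ["PHONE_NUMBER"] * 12 + ["UNKNOWN"]
spans = char_labels_to_spans(char_labels)
print(spans)
```

Slicing the text with a span's start and end recovers the labeled substring, which is the main convenience of a span format over per-character output.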
+
+Train a New Data Labeler
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+To train your own data labeler on your own set of structured (tabular)
+data:
+
+.. code-block:: python
+    
+    import dataprofiler as dp
+
+    # Will need one column with a default label of UNKNOWN
+    data = dp.Data("your_file.csv")
+
+    data_labeler = dp.train_structured_labeler(
+        data=data,
+        save_dirpath="/path/to/save/labeler",
+        epochs=2
+    )
+
+    data_labeler.save_to_disk("my/save/path") # Saves the data labeler for reuse
+
+Load an Existing Data Labeler
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To load an existing data labeler:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    data_labeler = dp.DataLabeler(
+        labeler_type='structured', dirpath="/path/to/my/labeler")
+
+    # get information about the parameters/inputs/output formats for the DataLabeler
+    data_labeler.help()
+
+Extending a Data Labeler with Transfer Learning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The labels of a data labeler can be extended or changed with transfer learning.
+Note: By default, **a loaded labeler will not be trainable**. To load a
+trainable DataLabeler, the user must set `trainable=True` or load a labeler
+using the `TrainableDataLabeler` class.
+
+The following illustrates how to change the labels:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    labels = ['label1', 'label2', ...]  # new label set can also be an encoding dict
+    data = dp.Data("your_file.csv")  # contains data with new labels
+
+    # load default structured Data Labeler w/ trainable set to True
+    data_labeler = dp.DataLabeler(labeler_type='structured', trainable=True)
+
+    # this will use transfer learning to retrain the data labeler on your new 
+    # dataset and labels.
+    # NOTE: data must be in an acceptable format for the preprocessor to interpret.
+    #       please refer to the preprocessor/model for the expected data format.
+    #       Currently, the DataLabeler cannot take in Tabular data, but requires 
+    #       data to be ingested with two columns [X, y] where X is the samples and 
+    #       y is the labels.
+    model_results = data_labeler.fit(x=data['samples'], y=data['labels'], 
+                                     validation_split=0.2, epochs=2, labels=labels)
+
+    # final_results, final_confidences are a list of results for each epoch
+    epoch_id = 0
+    final_results = model_results[epoch_id]["pred"]
+    final_confidences = model_results[epoch_id]["conf"]
+
+The following illustrates how to extend the labels:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    new_labels = ['label1', 'label2', ...]
+    data = dp.Data("your_file.csv")  # contains data with new labels
+
+    # load default structured Data Labeler w/ trainable set to True
+    data_labeler = dp.DataLabeler(labeler_type='structured', trainable=True)
+
+    # this will maintain current labels and model weights, but extend the model's 
+    # labels
+    for label in new_labels:
+        data_labeler.add_label(label)
+    
+    # NOTE: a user can also add a label which maps to the same index as an existing 
+    # label
+    # data_labeler.add_label(label, same_as='<label_name>')
+
+    # For a trainable model, the user must then train the model to be able to 
+    # continue using the labeler since the model's graph has likely changed
+    # NOTE: data must be in an acceptable format for the preprocessor to interpret.
+    #       please refer to the preprocessor/model for the expected data format.
+    #       Currently, the DataLabeler cannot take in Tabular data, but requires 
+    #       data to be ingested with two columns [X, y] where X is the samples and 
+    #       y is the labels.
+    model_results = data_labeler.fit(x=data['samples'], y=data['labels'], 
+                                     validation_split=0.2, epochs=2)
+
+    # final_results, final_confidences are a list of results for each epoch
+    epoch_id = 0
+    final_results = model_results[epoch_id]["pred"]
+    final_confidences = model_results[epoch_id]["conf"]
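The `same_as` option mentioned above maps a new label to the same index as an existing label. A minimal standalone sketch of that idea with a plain dict, assuming labels are kept as a name-to-index mapping; `add_label_to_mapping` is a hypothetical helper for illustration and not part of DataProfiler:

```python
def add_label_to_mapping(label_mapping, label, same_as=None):
    """Return a copy of `label_mapping` with `label` added (hypothetical helper)."""
    if label in label_mapping:
        raise ValueError(f"label {label!r} already exists")
    updated = dict(label_mapping)
    if same_as is not None:
        # Share the index of an existing label.
        updated[label] = updated[same_as]
    else:
        # Otherwise take the next unused index.
        updated[label] = max(updated.values(), default=-1) + 1
    return updated


mapping = {"UNKNOWN": 0, "ADDRESS": 1}
mapping = add_label_to_mapping(mapping, "CITY", same_as="ADDRESS")
mapping = add_label_to_mapping(mapping, "ZIP_CODE")
print(mapping)
```

In this sketch, a label added with `same_as` does not grow the number of distinct indices, while a label added without it does.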
+
+
+Changing pipeline parameters:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+
+    # load default Data Labeler
+    data_labeler = dp.DataLabeler(labeler_type='structured')
+
+    # change parameters of specific component
+    data_labeler.preprocessor.set_params({'param1': 'value1'})
+
+    # change multiple simultaneously.
+    data_labeler.set_params({
+        'preprocessor':  {'param1': 'value1'},
+        'model':         {'param2': 'value2'},
+        'postprocessor': {'param3': 'value3'}
+    })
+
+
+Build Your Own Data Labeler
+===========================
+
+The DataLabeler has 3 main components: preprocessor, model, and postprocessor.
+To create your own DataLabeler, each component must either be created or
+reused from an existing labeler.
+
+Given a set of the 3 components, you can construct your own DataLabeler:
+
+.. code-block:: python
+
+    from dataprofiler.labelers.base_data_labeler import BaseDataLabeler, TrainableDataLabeler
+    from dataprofiler.labelers.character_level_cnn_model import CharacterLevelCnnModel
+    from dataprofiler.labelers.data_processing import \
+         StructCharPreprocessor, StructCharPostprocessor
+
+    # load a non-trainable data labeler
+    model = CharacterLevelCnnModel(...)
+    preprocessor = StructCharPreprocessor(...)
+    postprocessor = StructCharPostprocessor(...)
+
+    data_labeler = BaseDataLabeler.load_with_components(
+        preprocessor=preprocessor, model=model, postprocessor=postprocessor)
+
+    # check for basic compatibility between the processors and the model
+    data_labeler.check_pipeline()
+
+
+    # load trainable data labeler
+    data_labeler = TrainableDataLabeler.load_with_components(
+        preprocessor=preprocessor, model=model, postprocessor=postprocessor)
+
+    # check for basic compatibility between the processors and the model
+    data_labeler.check_pipeline()
+
+Specific components of an existing labeler can also be swapped out:
+
+.. code-block:: python
+
+    import dataprofiler as dp
+    from dataprofiler.labelers.character_level_cnn_model import \
+        CharacterLevelCnnModel
+    from dataprofiler.labelers.data_processing import \
+        StructCharPreprocessor, StructCharPostprocessor
+
+    model = CharacterLevelCnnModel(...)
+    preprocessor = StructCharPreprocessor(...)
+    postprocessor = StructCharPostprocessor(...)
+    
+    data_labeler = dp.DataLabeler(labeler_type='structured')
+    data_labeler.set_preprocessor(preprocessor)
+    data_labeler.set_model(model)
+    data_labeler.set_postprocessor(postprocessor)
+    
+    # check for basic compatibility between the processors and the model
+    data_labeler.check_pipeline()
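The three-component structure described above (preprocessor, then model, then postprocessor) can be pictured with a toy pipeline. This is only a sketch of the data flow in plain Python: the `Toy*` classes are hypothetical stand-ins, not DataProfiler classes, and the real components take parameters and handle batching that is omitted here.

```python
class ToyPreprocessor:
    def process(self, samples):
        # Convert each input string into a list of characters.
        return [list(sample) for sample in samples]


class ToyModel:
    def predict(self, batches):
        # Label every character: digits as DIGIT, everything else as UNKNOWN.
        return [
            ["DIGIT" if ch.isdigit() else "UNKNOWN" for ch in chars]
            for chars in batches
        ]


class ToyPostprocessor:
    def process(self, predictions):
        # Wrap the raw predictions in the output structure returned to the user.
        return {"pred": predictions}


class ToyLabeler:
    def __init__(self, preprocessor, model, postprocessor):
        self.preprocessor = preprocessor
        self.model = model
        self.postprocessor = postprocessor

    def predict(self, samples):
        prepared = self.preprocessor.process(samples)
        raw = self.model.predict(prepared)
        return self.postprocessor.process(raw)


labeler = ToyLabeler(ToyPreprocessor(), ToyModel(), ToyPostprocessor())
result = labeler.predict(["a1"])
print(result)
```

Because each stage only depends on the output shape of the previous one, any single component can be replaced as long as the shapes still line up, which is the kind of compatibility a pipeline check is meant to catch.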
+
+
+Model Component
+~~~~~~~~~~~~~~~
+
+In order to create your own model component for data labeling, you can utilize
+the `BaseModel` class from `dataprofiler.labelers.base_model` and
+override the abstract class methods.
+
+Reviewing `CharacterLevelCnnModel` from 
+`dataprofiler.labelers.character_level_cnn_model` illustrates the functions 
+which need an override. 
+
+#. `__init__`: specifying default parameters and calling base `__init__`
+#. `_validate_parameters`: validating parameters given by user during setting
+#. `_need_to_reconstruct_model`: flag for when to reconstruct a model (e.g. 
+   when a change in parameters or labels requires a model reconstruction)
+#. `_construct_model`: initial construction of the model given the parameters
+#. `_reconstruct_model`: updates model architecture for new label set while 
+   maintaining current model weights
+#. `fit`: mechanism for the model to learn given training data
+#. `predict`: mechanism for model to make predictions on data
+#. `details`: prints a summary of the model construction
+#. `save_to_disk`: saves model and model parameters to disk
+#. `load_from_disk`: loads model given a path on disk
+
+
+Preprocessor Component
+~~~~~~~~~~~~~~~~~~~~~~
+
+In order to create your own preprocessor component for data labeling, you can 
+utilize the `BaseDataPreprocessor` class 
+from `dataprofiler.labelers.data_processing` and override the abstract class 
+methods.
+
+Reviewing `StructCharPreprocessor` from 
+`dataprofiler.labelers.data_processing` illustrates the functions which 
+need an override.
+
+#. `__init__`: passing parameters to the base class and executing any 
+   extraneous calculations to be saved as parameters
+#. `_validate_parameters`: validating parameters given by user during
+   setting
+#. `process`: takes in the user data and converts it into a digestible,
+   iterable format for the model
+#. `set_params` (optional): if a parameter requires processing before setting,
+   a user can override this function to assist with setting the parameter
+#. `_save_processor` (optional): if a parameter is not JSON serializable, a 
+   user can override this function to assist in saving the processor and its 
+   parameters
+#. `load_from_disk` (optional): if one or more parameters are not JSON
+   serializable, a user can override this function to assist in loading the processor
+
+Postprocessor Component
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The postprocessor is nearly identical to the preprocessor except it handles 
+the output of the model for processing. In order to create your own 
+postprocessor component for data labeling, you can utilize the
+`BaseDataPostprocessor` class from `dataprofiler.labelers.data_processing`
+and override the abstract class methods.
+
+Reviewing `StructCharPostprocessor` from 
+`dataprofiler.labelers.data_processing` illustrates the functions which 
+need an override.
+
+#. `__init__`: passing parameters to the base class and executing any 
+   extraneous calculations to be saved as parameters
+#. `_validate_parameters`: validating parameters given by user during
+   setting
+#. `process`: takes in the output of the model and processes for output to 
+   the user
+#. `set_params` (optional): if a parameter requires processing before setting,
+   a user can override this function to assist with setting the parameter 
+#. `_save_processor` (optional): if a parameter is not JSON serializable, a 
+   user can override this function to assist in saving the processor and its 
+   parameters
+#. `load_from_disk` (optional): if one or more parameters are not JSON
+   serializable, a user can override this function to assist in loading the processor
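The optional `_save_processor` and `load_from_disk` overrides listed above exist for parameters that are not JSON serializable. A minimal standalone sketch of why such overrides are needed, using a `set` parameter: `ToyProcessor`, the file name `toy_parameters.json`, and the method signatures are assumptions chosen for illustration and may not match the signatures on the DataProfiler base classes.

```python
import json
import os
import tempfile


class ToyProcessor:
    """Hypothetical processor with one parameter (a set) that JSON cannot store."""

    def __init__(self, **parameters):
        self._parameters = parameters

    def _save_processor(self, dirpath):
        params = dict(self._parameters)
        # A set is not JSON serializable, so store it as a sorted list.
        params["stop_chars"] = sorted(params["stop_chars"])
        with open(os.path.join(dirpath, "toy_parameters.json"), "w") as fp:
            json.dump(params, fp)

    @classmethod
    def load_from_disk(cls, dirpath):
        with open(os.path.join(dirpath, "toy_parameters.json")) as fp:
            params = json.load(fp)
        # Undo the conversion applied when saving.
        params["stop_chars"] = set(params["stop_chars"])
        return cls(**params)


with tempfile.TemporaryDirectory() as tmp:
    original = ToyProcessor(stop_chars={",", ";"}, max_length=10)
    original._save_processor(tmp)
    restored = ToyProcessor.load_from_disk(tmp)

print(restored._parameters)
```

The save and load overrides come as a pair: whatever conversion one applies, the other has to reverse, otherwise the restored processor would carry a list where the original had a set.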
diff --git a/docs/0.7.7/html/_sources/data_reader.nblink.txt b/docs/0.7.7/html/_sources/data_reader.nblink.txt
@@ -0,0 +1,3 @@
+{
+    "path": "../../feature_branch/examples/data_readers.ipynb"
+}