DataBook demo

The repository is made of the following files and folders:

data_book.py provides a class for data preparation: it converts an input JSON representation of an Excel file into features.
Model-Training.ipynb provides the code for training a model. Note that training requires a file to support labeling.
Data-Inference.ipynb provides the code for running inference over a single cell or a whole sheet. At the moment it reuses the training data to demonstrate this.
model.pkl is a trained model that can be used for inference.
unit_test_databook_class.ipynb provides unit tests for the DataBook class.
Data-v1 holds some sample data.
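
To make the data preparation step more concrete, here is a minimal sketch of turning a JSON description of a sheet into per-cell features. It is not the code in data_book.py: the JSON layout (a "cells" list with "address", "value" and "formula" keys), the function names and the chosen features are all assumptions made for illustration.

```python
import json
import re

# Hypothetical input layout (an assumption, not the repository's format):
# a sheet is a list of cells, each with an A1-style address, a value,
# and an optional formula string.
SHEET_JSON = """
{
  "sheet": "Sheet1",
  "cells": [
    {"address": "B2", "value": 10, "formula": null},
    {"address": "B3", "value": 20, "formula": null},
    {"address": "B4", "value": 30, "formula": "=B2+B3"}
  ]
}
"""

ADDRESS_RE = re.compile(r"^([A-Z]+)([0-9]+)$")


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def cell_features(cell):
    """Build a flat feature dict for one cell (illustrative features only)."""
    match = ADDRESS_RE.match(cell["address"])
    if match is None:
        raise ValueError(f"unsupported address: {cell['address']!r}")
    letters, row = match.groups()
    value = cell.get("value")
    formula = cell.get("formula")
    return {
        "row": int(row),
        "col": column_index(letters),
        "has_formula": formula is not None,
        # bool is a subclass of int, so exclude it explicitly
        "is_numeric": isinstance(value, (int, float)) and not isinstance(value, bool),
        "is_empty": value is None,
    }


def sheet_to_features(sheet_json):
    """Parse the JSON text and return one feature dict per cell."""
    sheet = json.loads(sheet_json)
    return [cell_features(cell) for cell in sheet["cells"]]


features = sheet_to_features(SHEET_JSON)
```

A list of flat dicts like this can be passed directly to pandas.DataFrame to obtain a feature table for training or inference.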

Limitations

We do not provide the code to deploy these artifacts to the cloud.
Only vertical consistency is supported.
The model was trained on a very limited set of data.
Training covered very few cell ranges, each containing a single inconsistent cell.
Formatting information is not used to produce features.
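
As an illustration of what a vertical consistency check might look like, the sketch below flags cells in a single column whose kind (formula, number or text) differs from the most common kind in that column. This is a simple rule-based stand-in written under our own assumptions; it is not the trained model, and the names cell_kind and vertical_outliers are hypothetical.

```python
from collections import Counter


def cell_kind(cell):
    """Classify a raw cell value into a coarse kind."""
    if cell is None or cell == "":
        return "empty"
    if isinstance(cell, str) and cell.startswith("="):
        return "formula"
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        return "number"
    return "text"


def vertical_outliers(column):
    """Return indices of cells whose kind differs from the column's most common kind.

    Empty cells are ignored both when picking the most common kind
    and when reporting outliers.
    """
    kinds = [cell_kind(c) for c in column]
    non_empty = [k for k in kinds if k != "empty"]
    if not non_empty:
        return []
    majority, _ = Counter(non_empty).most_common(1)[0]
    return [i for i, k in enumerate(kinds) if k not in ("empty", majority)]


# A column of formulas with one hard-coded number in the middle.
column = ["=A1*2", "=A2*2", 14, "=A4*2", "=A5*2"]
outliers = vertical_outliers(column)
print(outliers)
```

The example column mirrors the training setup described above: a range of cells with a single inconsistent cell.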

Running the code

The code should be run on a local machine or on any cloud notebook platform. It requires an environment with Pandas and Scikit-Learn installed.
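
Before opening the notebooks, a quick check such as the following can confirm that both packages are importable. It is only a convenience sketch: the helper name missing_packages is our own, and the pip command in the message assumes a pip-based environment.

```python
import importlib.util

# Maps the importable module name to the name used when installing it.
REQUIRED = {"pandas": "pandas", "sklearn": "scikit-learn"}


def missing_packages(required=REQUIRED):
    """Return the install names of required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("Environment looks ready.")
```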
