The rising use case of LLM: Structuring unstructured data

NOTE: This is the old README.md of the repo and no longer reflects its current state. When it was written, Baker was only the parsing script that transformed the recipes from the publicdomainrecipes.com website into a structured format. It is now a full-fledged API that can be used to find recipes based on a list of ingredients and a serving size. The code is available here.

This repo contains the code for the blog post "The rising use case of LLM: Structuring unstructured data". The blog post discusses the use of LLMs for structuring unstructured data and shows an example by structuring the recipes available at publicdomainrecipes.com.

Installation

To reuse the code or reproduce the results, install the required libraries by running the following command:

pip install -r requirements.txt

(Assuming you cloned the repo)

Usage

The code is available in the form of a Jupyter notebook. You can run the notebook demo.ipynb and follow along with the blog post.

Some of the logic lives outside of the notebook. In particular, the target schema for the recipes is defined in schemas.py, the prompt for the LLM is defined in prompt.py, and the communication channel with the LLM is defined in core.py.
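To give a feel for how a target schema, a prompt, and a parsing step fit together, here is a minimal sketch using only the standard library. It is not the code from schemas.py, prompt.py, or core.py: the `Recipe` and `Ingredient` fields, the prompt wording, and the hard-coded model reply are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


# Hypothetical target schema; the real one lives in schemas.py and may differ.
@dataclass
class Ingredient:
    name: str
    quantity: float
    unit: str


@dataclass
class Recipe:
    title: str
    servings: int
    ingredients: list[Ingredient] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)


# Hypothetical prompt template; the real one lives in prompt.py.
PROMPT_TEMPLATE = (
    "Extract the recipe below as JSON with the keys "
    "title, servings, ingredients (name, quantity, unit) and steps.\n\n"
    "{raw_recipe}"
)


def build_prompt(raw_recipe: str) -> str:
    """Insert the unstructured recipe text into the prompt template."""
    return PROMPT_TEMPLATE.format(raw_recipe=raw_recipe)


def parse_recipe(llm_output: str) -> Recipe:
    """Turn the JSON text returned by a model into a Recipe instance."""
    data = json.loads(llm_output)
    return Recipe(
        title=data["title"],
        servings=int(data["servings"]),
        ingredients=[Ingredient(**item) for item in data["ingredients"]],
        steps=list(data["steps"]),
    )


# Hard-coded stand-in for a model reply, so the sketch runs offline.
FAKE_LLM_OUTPUT = (
    '{"title": "Pancakes", "servings": 4, '
    '"ingredients": [{"name": "flour", "quantity": 200.0, "unit": "g"}], '
    '"steps": ["Mix the batter.", "Fry in a hot pan."]}'
)

recipe = parse_recipe(FAKE_LLM_OUTPUT)
print(recipe.title, recipe.servings, recipe.ingredients[0].name)
```

In a real run, the text produced by `build_prompt` would be sent to the LLM and its reply passed to `parse_recipe` in place of `FAKE_LLM_OUTPUT`.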

In the article, I used Mistral AI models to structure the recipes. You can use any other LLM, such as GPT or Llama, by importing the ChatModel of your choice from langchain. You will likely need to provide an API key to use the LLM, which implies that you have an account on the LLM provider's platform.
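Swapping models is easy because langchain chat models share a common interface: they expose an `invoke` method and return a message with a `content` attribute. The sketch below mimics that shape with an offline stand-in so it runs without an API key. The `structure` function, the `EchoModel` class, and the prompt text are hypothetical and are not taken from core.py.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    """Minimal stand-in for a chat message carrying the reply text."""
    content: str


class ChatModel(Protocol):
    """Anything with an invoke() method that returns a message-like object."""

    def invoke(self, prompt: str) -> Message: ...


class EchoModel:
    """Offline placeholder model (hypothetical) that echoes its prompt."""

    def invoke(self, prompt: str) -> Message:
        return Message(content=f"echo: {prompt}")


def structure(raw_text: str, llm: ChatModel) -> str:
    """Send the raw text to whichever model was passed in; return its reply."""
    reply = llm.invoke(f"Structure this recipe as JSON:\n{raw_text}")
    return reply.content


# With langchain installed and an API key configured, a real chat model such as
# ChatMistralAI (package langchain_mistralai) or ChatOpenAI (package
# langchain_openai) could be passed in place of EchoModel.
print(structure("2 eggs, 100 g flour", EchoModel()))
```

Because `structure` depends only on the `invoke` method, changing provider means changing the object passed as `llm`, not the surrounding code.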

Data

The original dataset is available here in this repo. It comes from Sebastian Bahr's repo.

The structured dataset is available here.

Contributing

If you have any suggestions or improvements, you can raise issues or open pull requests on the GitHub repo. You can also comment on the article on Towards Data Science.

License

This project is licensed under the MIT License; see the LICENSE file.