NER-dataset-with-Instructor

Create a structured dataset using Instructor

Structured data remains crucial for ensuring high data quality, easy management, and meaningful analysis. Using LLMs to extract structured data automates this process, making it faster and scalable across different fields.

Pydantic, the backbone of Instructor, is highly customizable and uses Python type hints for schema validation. It also integrates with LanceDB, so validated records can be inserted directly into tables.
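To illustrate the schema validation described above, here is a minimal sketch of a Pydantic model for extracted entities. The `ArticleEntities` class, its fields, and the `parse_entities` helper are hypothetical names chosen for this illustration; the schemas actually used by this example are defined in main.py and may differ.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class ArticleEntities(BaseModel):
    """Hypothetical schema: entities extracted from one article."""

    title: str
    people: List[str]
    organizations: List[str]


def parse_entities(raw: dict) -> Optional[ArticleEntities]:
    """Validate a raw dict against the schema; return None if it does not fit."""
    try:
        return ArticleEntities(**raw)
    except ValidationError:
        # A required field is missing or a value has the wrong type.
        return None


# A record that matches the schema is accepted.
good = parse_entities(
    {"title": "Vector search basics", "people": ["Ada Lovelace"], "organizations": ["LanceDB"]}
)

# A record with missing required fields is rejected.
bad = parse_entities({"title": "Incomplete record"})
```

With Instructor, a model of this kind is typically passed as the target shape for the LLM response, so malformed output is caught by the same validation step before it reaches a table.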


Run Code

Set environment variable

Add OPENAI_API_KEY as an environment variable:

export OPENAI_API_KEY=sk-...
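If you want to fail early when the key is missing, a small check like the following can help. This is only a sketch: the `get_openai_key` helper is a hypothetical name and is not part of main.py.

```python
import os
from typing import Mapping, Optional


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from the environment, or raise if it is unset."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running main.py")
    return key
```

Calling `get_openai_key()` at the start of a script surfaces a missing key immediately, rather than at the first API request.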

Once you have added the key as an environment variable, you are ready to run the following command.

python3 main.py

NOTE: This example uses a Medium article dataset. See main.py to change the dataset or the entities to be extracted in the schemas.

Once you have created this dataset, you are ready to do NER-powered semantic search.