
Llama2_Idioms_4Bit - Idiom Suggestor Using LLaMA2

A LLaMA2 7B model fine-tuned on idiom datasets to suggest contextually relevant idioms, making generated text sound more natural and engaging. The project uses 4-bit quantization and parameter-efficient fine-tuning (PEFT) to keep training efficient on limited hardware.

Overview

The Idiom Suggestor project involves three main stages:

  1. LLM_Synth_Data_Gen_Langchain.ipynb:
    • Extracts and prepares idiom data from a comprehensive English idiom dictionary.
    • Idioms are pulled from a PDF of the dictionary using PyPDF2.
    • The text is split into manageable chunks, which are then embedded and indexed with OpenAI's embeddings and FAISS for efficient retrieval during training (see the first sketch after this list).
  2. Llama2_Data_Tamplating.ipynb:
    • Formats the data to match LLaMA's instruction-tuning template:
    • "<s>[INST] {context} [/INST] {response}</s>"
    • Data is transformed into this LLaMA-compatible format using pandas and sklearn, ensuring that idioms and their contexts are paired correctly (see the second sketch after this list).
    • This step is crucial for effective learning during the model training phase.
  3. Fine_tune_Llama_2.ipynb:
    • Trains the model using quantization and PEFT to learn idiomatic expressions effectively.
    • Fine-tuning starts from a pre-trained LLaMA model and uses LoRA/QLoRA for efficient training (a full configuration sketch appears under Fine-Tuning Configurations below).
    • Specific attention is paid to managing GPU resources, especially in constrained environments such as Google Colab.
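
A minimal sketch of stage 1, assuming the classic langchain import layout, an OPENAI_API_KEY in the environment, and a hypothetical PDF file name:

from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Extract raw text from the idiom dictionary PDF (file name is hypothetical)
reader = PdfReader("idiom_dictionary.pdf")
raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Split the text into manageable chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

# Embed and index the chunks for efficient retrieval
index = FAISS.from_texts(chunks, OpenAIEmbeddings())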
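
And a sketch of the stage 2 templating step; the input file and column names are hypothetical assumptions, but the template string matches the one shown above:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV with 'context' and 'response' columns
df = pd.read_csv("idioms.csv")

# Wrap each pair in the LLaMA2 instruction template
df["text"] = df.apply(
    lambda row: f"<s>[INST] {row['context']} [/INST] {row['response']}</s>", axis=1
)

# Hold out a small split for evaluation
train_df, eval_df = train_test_split(df[["text"]], test_size=0.1, random_state=42)
train_df.to_csv("train.csv", index=False)
eval_df.to_csv("eval.csv", index=False)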


Installation

Before you start, ensure you have Python 3.x installed. Then, clone this repository and install the required packages.

git clone https://github.com/yourusername/idiom-suggestor.git
cd idiom-suggestor
# Open the notebooks and execute each cell in order
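
The repository does not ship a requirements file, so the dependency list below is an assumption inferred from the tooling named in this README:

# Assumed dependencies, inferred from the notebooks described above
pip install transformers peft bitsandbytes accelerate trl datasets langchain openai faiss-cpu PyPDF2 pandas scikit-learn rouge-score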

Fine-Tuning Configurations

  • Model: NousResearch/Llama-2-7b-chat-hf
  • Quantization: 4-bit precision using BitsAndBytes
  • PEFT: LoRA with an attention dimension of 64 and an alpha scaling parameter of 16
  • Training Epochs: 5
  • Optimizer: AdamW with a cosine learning rate scheduler
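
Put together, the configuration above corresponds roughly to the following sketch, assuming an older trl API in which SFTTrainer accepts dataset_text_field and tokenizer directly; the batch size and LoRA dropout are illustrative assumptions:

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

model_name = "NousResearch/Llama-2-7b-chat-hf"

# 4-bit quantization via BitsAndBytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA with attention dimension (r) 64 and alpha 16, as listed above
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)

# AdamW (paged 32-bit variant) with a cosine schedule, 5 epochs
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,  # assumption; tune for available VRAM
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
)

# Training data produced by the templating step
dataset = load_dataset("csv", data_files="train.csv", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()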

Model Evaluation Using ROUGE Scores

To assess the quality of the generated idiomatic expressions, we use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics. Ensure the rouge-score library is installed:

pip install rouge-score
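
Once installed, scoring a generated suggestion against a reference takes only a few lines; the example strings here are purely illustrative:

from rouge_score import rouge_scorer

# Compare a generated suggestion against a reference completion
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "He decided to bite the bullet and finish the report."
prediction = "He chose to bite the bullet and complete the report."
scores = scorer.score(reference, prediction)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} recall={s.recall:.3f} f1={s.fmeasure:.3f}")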
