This repository is dedicated to fine-tuning Microsoft's Phi-2 small language model, aiming to enhance its capabilities and adapt it to specific tasks or domains.
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased nearly state-of-the-art performance among models with fewer than 13 billion parameters.
The model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. Source: Microsoft/Phi-2
├── LICENSE
├── README.md
├── adapter_utils.py
├── app.py
├── config.py
├── data_utils.py
├── inference.py
├── model
│   └── checkpoint-700
│       ├── README.md
│       ├── adapter_config.json
│       ├── adapter_model.safetensors
│       ├── added_tokens.json
│       ├── merges.txt
│       ├── optimizer.pt
│       ├── rng_state.pth
│       ├── scheduler.pt
│       ├── special_tokens_map.json
│       ├── tokenizer.json
│       ├── tokenizer_config.json
│       ├── trainer_state.json
│       ├── training_args.bin
│       └── vocab.json
├── model_utils.py
├── quantization_utils.py
├── requirements.txt
pip install -r requirements.txt --quiet
The dataset known as OpenAssistant/oasst1 serves as the fine-tuning source for the model. It includes a collection of human-generated, human-annotated assistant-style conversations, totaling 161,443 messages across 35 diverse languages. This corpus is enriched with 461,292 quality ratings, leading to the creation of over 10,000 fully annotated conversation trees.
pip install datasets
Refer to data_utils.py for converting the training dataset into the specific instruction format used for fine-tuning.
Instruction Template:
### Human: <YOUR QUERY> ### Assistant: <YOUR ANSWER>
Example:
### Human: What is the impact of cryptocurrency in the world? ### Assistant: Cryptocurrency has had a profound impact by revolutionizing traditional financial systems and fostering decentralization in global transactions.
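A minimal sketch of how such a conversion might look. The repo's actual logic lives in data_utils.py; the field names below (`message_id`, `parent_id`, `text`, `role`) follow the published oasst1 schema, and pairing each prompter message with its direct assistant reply is an assumption about the exact strategy used.

```python
from datasets import load_dataset

def build_instruction_pairs(split="train"):
    """Pair each prompter message with its direct assistant reply and
    render it in the '### Human: ... ### Assistant: ...' template."""
    ds = load_dataset("OpenAssistant/oasst1", split=split)

    # Map parent message_id -> first assistant reply text.
    reply_by_parent = {}
    for row in ds:
        if row["role"] == "assistant" and row["parent_id"]:
            reply_by_parent.setdefault(row["parent_id"], row["text"])

    samples = []
    for row in ds:
        if row["role"] == "prompter" and row["message_id"] in reply_by_parent:
            samples.append(
                f"### Human: {row['text']} ### Assistant: {reply_by_parent[row['message_id']]}"
            )
    return samples
```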
Source: Parameter-Efficient Transfer Learning for NLP
- The `Adapter` module is incorporated twice in each Transformer block: first after the projection layer that follows the multi-headed attention, and second after the two `FeedForward` layers.
- The `Adapter` contains few parameters relative to the attention and feed-forward layers of the original pre-trained model. The green layers shown in the figure are the ones trained on the domain-specific dataset.
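For intuition, an adapter is just a small bottleneck with a residual connection. A minimal PyTorch sketch (the hidden and bottleneck sizes are illustrative, not taken from this repo):

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        # Only these few parameters are trained; the surrounding
        # attention and feed-forward weights stay frozen.
        return x + self.up(self.act(self.down(x)))
```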
🔍 Quantize the 32-bit language model to a 4-bit model. This technique reduces the memory and computation requirements of the neural network layers by representing the weights and activations with only 4 bits. Refer to quantization_utils.py
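A sketch of loading Phi-2 in 4-bit with `bitsandbytes`; the specific quantization settings below are assumptions, and the repo's own configuration lives in quantization_utils.py.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```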
🧠 Identify the layers that require weight updates and freeze the rest during fine-tuning. Managing the layers this way allows the crucial layers to adapt to the new domain-specific data while preserving the rest of the parameters of the pre-trained model.
The layer names can be identified by printing the architecture of the model, as shown below.
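A minimal sketch (the module names you see, e.g. `q_proj`, `k_proj`, `v_proj`, `dense`, depend on the exact Phi-2 implementation that gets loaded, so verify against your own printout):

```python
# Inspect the architecture to find the linear layers worth adapting.
print(model)

# Freeze every pre-trained parameter; only the adapter weights added
# in the next step will receive gradient updates.
for param in model.parameters():
    param.requires_grad = False
```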
💡 LoRA adds an adapter module that holds its own smaller set of parameters, which are learned during fine-tuning, enhancing the model's flexibility and adaptability to domain-specific nuances. Refer to adapter_utils.py
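A minimal sketch with the `peft` library; the rank, alpha, and `target_modules` names are assumptions based on the Hugging Face Phi-2 implementation, and adapter_utils.py holds the repo's actual configuration.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit model for training (casts norms, enables gradient checkpointing).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the update
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed Phi-2 layer names
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction is trainable
```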
📚 A dataset tailored to the domain is constructed as instructions and used as the training dataset for the fine-tuning process. Refer to data_utils.py
🔍 Extract the `Adapter` from the fine-tuned 4-bit quantized model. This `Adapter` encapsulates the refined parameters tailored to the domain-specific data.
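With `peft`, extracting the adapter amounts to saving only the LoRA weights from the fine-tuned model. The path below mirrors the checkpoint-700 folder in the directory tree above, and the `tokenizer` is assumed to have been loaded alongside the model.

```python
# Save only the LoRA adapter weights (a few MB), not the full model.
model.save_pretrained("model/checkpoint-700")      # writes adapter_config.json + adapter_model.safetensors
tokenizer.save_pretrained("model/checkpoint-700")  # keep the tokenizer files next to the adapter
```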
🧩 Integrate the `Adapter` with the original pre-trained 32-bit model. This fusion equips the language model with the domain knowledge acquired during the fine-tuning process.
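A sketch of this fusion step, assuming the adapter was saved under model/checkpoint-700 as in the directory tree above:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the original full-precision (32-bit) base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

# Attach the fine-tuned adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "model/checkpoint-700")
model = model.merge_and_unload()
```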
💬 The user provides a prompt to the Language Model for interaction.
🚀 The Language Model generates the response for the provided Prompt.
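A minimal interaction sketch, assuming the merged model from the previous step and the instruction template shown earlier; the repo's actual implementation is in inference.py, and the generation settings here are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

prompt = "### Human: What is the impact of cryptocurrency in the world? ### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```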