This repository is dedicated to fine-tuning Microsoft's Phi-2 small language model, aiming to enhance its capabilities and adapt it to specific tasks or domains.
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased nearly state-of-the-art performance among models with fewer than 13 billion parameters.
The model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. Source: Microsoft/Phi-2
├── LICENSE
├── README.md
├── adapter_utils.py
├── app.py
├── config.py
├── data_utils.py
├── inference.py
├── model
│   └── checkpoint-700
│       ├── README.md
│       ├── adapter_config.json
│       ├── adapter_model.safetensors
│       ├── added_tokens.json
│       ├── merges.txt
│       ├── optimizer.pt
│       ├── rng_state.pth
│       ├── scheduler.pt
│       ├── special_tokens_map.json
│       ├── tokenizer.json
│       ├── tokenizer_config.json
│       ├── trainer_state.json
│       ├── training_args.bin
│       └── vocab.json
├── model_utils.py
├── quantization_utils.py
├── requirements.txt
pip install -r requirements.txt --quiet
The dataset known as OpenAssistant/oasst1 serves as the fine-tuning source for the model. It includes a collection of human-generated, human-annotated assistant-style conversations, totaling 161,443 messages across 35 diverse languages. This corpus is enriched with 461,292 quality ratings, leading to the creation of over 10,000 fully annotated conversation trees.
pip install datasets
Refer to data_utils.py for converting the training dataset into the specific instruction format used for fine-tuning.
Instruction Template:
### Human: <YOUR QUERY> ### Assistant: <YOUR ANSWER>
Example:
### Human: What is the impact of cryptocurrency in the world? ### Assistant: Cryptocurrency has had a profound impact by revolutionizing traditional financial systems and fostering decentralization in global transactions.
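A minimal sketch of how such a conversion might look. The repo's actual logic lives in data_utils.py; the field names below (`message_id`, `parent_id`, `text`, `role`) follow the published oasst1 schema, and pairing each prompter message with its direct assistant reply is an assumption about the exact strategy used.

```python
from datasets import load_dataset

def build_instruction_pairs(split="train"):
    """Pair each prompter message with its direct assistant reply and
    render it in the '### Human: ... ### Assistant: ...' template."""
    ds = load_dataset("OpenAssistant/oasst1", split=split)

    # Map parent message_id -> first assistant reply text.
    reply_by_parent = {}
    for row in ds:
        if row["role"] == "assistant" and row["parent_id"]:
            reply_by_parent.setdefault(row["parent_id"], row["text"])

    samples = []
    for row in ds:
        if row["role"] == "prompter" and row["message_id"] in reply_by_parent:
            samples.append(
                f"### Human: {row['text']} ### Assistant: {reply_by_parent[row['message_id']]}"
            )
    return samples
```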
Source: Parameter-Efficient Transfer Learning for NLP
- The `Adapter` module is incorporated twice in each Transformer block: first after the projection layer that follows the multi-headed attention, and second after the two `FeedForward` layers.
- The `Adapter` contains few parameters relative to the attention and feed-forward layers of the original pre-trained model. The green layers shown in the figure are the ones trained on the domain-specific dataset.
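For intuition, an adapter is just a small bottleneck with a residual connection. A minimal PyTorch sketch (the hidden and bottleneck sizes are illustrative, not taken from this repo):

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        # Only these few parameters are trained; the surrounding
        # attention and feed-forward weights stay frozen.
        return x + self.up(self.act(self.down(x)))
```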
🔍 Quantize the 32-bit language model to a 4-bit model. This technique reduces the memory and computation requirements of the neural network layers by representing the weights and activations with only 4 bits. Refer to quantization_utils.py
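A sketch of loading Phi-2 in 4-bit with `bitsandbytes`; the specific quantization settings below are assumptions, and the repo's own configuration lives in quantization_utils.py.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```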
🧠 Identify the layers that require weight updates and freeze the rest during fine-tuning. Managing the layers this way allows the crucial layers to adapt to the new domain-specific data while preserving the rest of the parameters of the pre-trained model.
The layer names can be identified by printing the architecture of the model, as shown below.
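A minimal sketch (the module names you see, e.g. `q_proj`, `k_proj`, `v_proj`, `dense`, depend on the exact Phi-2 implementation that gets loaded, so verify against your own printout):

```python
# Inspect the architecture to find the linear layers worth adapting.
print(model)

# Freeze every pre-trained parameter; only the adapter weights added
# in the next step will receive gradient updates.
for param in model.parameters():
    param.requires_grad = False
```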
💡 LoRA adds an adapter module that holds its own smaller set of parameters, which are learned during fine-tuning, enhancing the model's flexibility and adaptability to domain-specific nuances. Refer to adapter_utils.py
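A minimal sketch with the `peft` library; the rank, alpha, and `target_modules` names are assumptions based on the Hugging Face Phi-2 implementation, and adapter_utils.py holds the repo's actual configuration.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit model for training (casts norms, enables gradient checkpointing).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the update
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed Phi-2 layer names
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction is trainable
```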
📚 A dataset tailored to the domain is constructed as instructions and used as the training dataset for the fine-tuning process. Refer to data_utils.py
🔍 Extract the `Adapter` from the fine-tuned 4-bit quantized model. This `Adapter` encapsulates the refined parameters tailored to the domain-specific data.
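With `peft`, extracting the adapter amounts to saving only the LoRA weights from the fine-tuned model. The path below mirrors the checkpoint-700 folder in the directory tree above, and the `tokenizer` is assumed to have been loaded alongside the model.

```python
# Save only the LoRA adapter weights (a few MB), not the full model.
model.save_pretrained("model/checkpoint-700")      # writes adapter_config.json + adapter_model.safetensors
tokenizer.save_pretrained("model/checkpoint-700")  # keep the tokenizer files next to the adapter
```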
🧩 Integrate the `Adapter` with the original pre-trained 32-bit model. This fusion equips the language model with the domain knowledge acquired during the fine-tuning process.
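A sketch of this fusion step, assuming the adapter was saved under model/checkpoint-700 as in the directory tree above:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the original full-precision (32-bit) base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

# Attach the fine-tuned adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(base_model, "model/checkpoint-700")
model = model.merge_and_unload()
```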
💬 The user provides a prompt to the Language Model for interaction.
🚀 The Language Model generates the response for the provided Prompt.
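A minimal interaction sketch, assuming the merged model from the previous step and the instruction template shown earlier; the repo's actual implementation is in inference.py, and the generation settings here are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

prompt = "### Human: What is the impact of cryptocurrency in the world? ### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```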