LLM Instruction-following pipeline with OpenVINO

LLM stands for “Large Language Model”, a type of artificial intelligence model designed to understand and generate human-like text based on the input it receives. LLMs are trained on large text datasets to learn patterns, grammar, and semantic relationships, which lets them generate coherent and contextually relevant responses. One core capability of LLMs is following natural language instructions. Instruction-following models generate text in response to prompts and are often used for tasks such as writing assistance, chatbots, and content generation.

In this tutorial, we consider how to run an instruction-following text generation pipeline using popular LLMs and OpenVINO. We will use pre-trained models from the Hugging Face Transformers library and convert them to OpenVINO™ IR format with the Hugging Face Optimum Intel library. To simplify the user experience, we will use the OpenVINO Generate API to build the instruction-following inference pipeline.
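As a rough sketch of the conversion step, the helper below assembles an `optimum-cli export openvino` command for a Hugging Face model ID. The `--weight-format` values listed (`fp16`, `int8`, `int4`) are ones we believe the Optimum Intel exporter accepts; check `optimum-cli export openvino --help` for your installed version. The function name `build_export_command` and the output directory are hypothetical, chosen for illustration.

```python
import shlex
import subprocess

# Weight formats assumed to be accepted by the Optimum Intel exporter; verify with --help.
SUPPORTED_WEIGHT_FORMATS = ("fp16", "int8", "int4")


def build_export_command(model_id: str, output_dir: str, weight_format: str = "fp16") -> list:
    """Return the argument list that converts a Hugging Face model to OpenVINO IR.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if weight_format not in SUPPORTED_WEIGHT_FORMATS:
        raise ValueError(f"unsupported weight format: {weight_format!r}")
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", weight_format,
        output_dir,
    ]


if __name__ == "__main__":
    # "phi-2-ov-int4" is an illustrative output directory name.
    cmd = build_export_command("microsoft/phi-2", "phi-2-ov-int4", "int4")
    print(shlex.join(cmd))
    # Uncomment to actually export (requires optimum[openvino] to be installed):
    # subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single shell string keeps the command safe to pass to `subprocess.run` without shell quoting issues.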

The tutorial supports several models; you can select one of the provided options to compare the quality of open-source LLM solutions.

The available options are:

  • tiny-llama-1b-chat - This is a chat model fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens, adopting the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. TinyLlama is also compact, with only 1.1B parameters, which suits applications with restricted computation and memory budgets. More details about the model can be found in the model card.
  • phi-2 - Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various synthetic NLP texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters. More details about the model can be found in the model card.
  • dolly-v2-3b - Dolly 2.0 is an instruction-following large language model, licensed for commercial use, that was trained on the Databricks machine-learning platform. It is based on Pythia and was fine-tuned on ~15k instruction/response records generated by Databricks employees in various capability domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. Dolly 2.0 processes natural language instructions and generates responses that follow them, so it can be used for a wide range of applications, including closed question answering, summarization, and generation. More details about the model can be found in the model card.
  • red-pajama-3b-instruct - A 2.8B-parameter pre-trained language model based on the GPT-NeoX architecture. The model was fine-tuned for few-shot applications on the GPT-JT data, excluding tasks that overlap with the HELM core scenarios. More details about the model can be found in the model card.
  • mistral-7b - The Mistral-7B-v0.2 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. You can find more details about the model in the model card, paper, and release blog post.
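The parameter counts above give a quick feel for why compressing weights to INT8 or INT4 matters. The sketch below estimates raw weight storage as parameters * bits / 8. It ignores quantization scales, layers kept at higher precision, and runtime memory such as the KV cache, so treat the results as rough lower bounds, not measured file sizes. dolly-v2-3b is left out because its exact parameter count is not stated above.

```python
# Approximate parameter counts taken from the model descriptions above.
MODEL_PARAMS = {
    "tiny-llama-1b-chat": 1.1e9,
    "phi-2": 2.7e9,
    "red-pajama-3b-instruct": 2.8e9,
    "mistral-7b": 7.0e9,
}

# Bits per weight for each precision considered in the tutorial.
PRECISION_BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_gb(num_params: float, bits: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), ignoring overheads."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    header = "model".ljust(24) + "".join(p.rjust(8) for p in PRECISION_BITS)
    print(header)
    for name, params in MODEL_PARAMS.items():
        sizes = "".join(f"{weight_gb(params, b):8.2f}" for b in PRECISION_BITS.values())
        print(name.ljust(24) + sizes)
```

For example, a 7-billion-parameter model needs about 14 GB for FP16 weights, 7 GB for INT8, and 3.5 GB for INT4 by this estimate.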

Notebook Contents

The tutorial consists of the following steps:

  • Install prerequisites
  • Download the model from Hugging Face and convert it to OpenVINO IR via the Optimum Intel CLI
  • Compress model weights to INT8 and INT4 with OpenVINO NNCF
  • Create an instruction-following inference pipeline with the OpenVINO Generate API
  • Run the instruction-following pipeline
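The last two steps can be sketched with the `openvino_genai` package roughly as follows. We assume the model has already been exported to a local OpenVINO IR directory. `LLMPipeline` and its `generate` method belong to the OpenVINO Generate API; `StopStringStreamer` and `run_instruction` are hypothetical helpers for illustration. The streamer follows the callback convention in which returning `True` asks generation to stop. Check the Generate API documentation for the exact streamer contract in your installed version.

```python
from typing import Optional


class StopStringStreamer:
    """Hypothetical streamer: collects decoded subwords and can stop at a stop string.

    The Generate API calls the streamer with each newly decoded piece of text;
    returning True requests that generation stop, returning False continues.
    """

    def __init__(self, stop_string: Optional[str] = None, echo: bool = True):
        self.stop_string = stop_string
        self.echo = echo
        self.text = ""

    def __call__(self, subword: str) -> bool:
        self.text += subword
        if self.echo:
            # Print tokens as they arrive so the answer appears incrementally.
            print(subword, end="", flush=True)
        return self.stop_string is not None and self.stop_string in self.text


def run_instruction(model_dir: str, instruction: str, device: str = "CPU",
                    max_new_tokens: int = 256) -> str:
    """Generate an answer for one instruction with an exported OpenVINO model."""
    # Imported lazily so the streamer above can be used without OpenVINO installed.
    import openvino_genai as ov_genai

    pipe = ov_genai.LLMPipeline(model_dir, device)
    streamer = StopStringStreamer()
    pipe.generate(instruction, max_new_tokens=max_new_tokens, streamer=streamer)
    return streamer.text
```

A call such as `run_instruction("path/to/exported-model", "Explain what OpenVINO IR is in one sentence.")` (path hypothetical) would stream the answer to the console and also return it as a string. Keeping the streamer as a small class makes the accumulated text available after generation without relying on globals.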

The image below shows an example of a user instruction and the model's answer.

phi2-example.png

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.