LangChain Example

The examples in this folder show how to use LangChain with IPEX-LLM on Intel GPUs.

Tip

For more information, refer to the upstream LangChain documentation for the IPEX-LLM LLM integration and for the IPEX-LLM embedding model integration.

0. Requirements

To run these examples with IPEX-LLM on Intel GPUs, your machine should meet the recommended requirements. See the IPEX-LLM documentation for details.

1. Install

1.1 Installation on Linux

We suggest using conda to manage the environment:

conda create -n llm python=3.11
conda activate llm

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

1.2 Installation on Windows

We suggest using conda to manage the environment:

conda create -n llm python=3.11 libuv
conda activate llm

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
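After installing on either platform, it can help to confirm that the packages are visible from the active conda environment before running the examples. The sketch below is one possible check, not part of the examples themselves. The helper name `missing_packages` is hypothetical, and the import names `ipex_llm` and `torch` mentioned in the usage note are assumptions about what the `ipex-llm[xpu]` install provides.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages(["ipex_llm", "torch"])` inside the `llm` environment should return an empty list when the install succeeded. Any names it returns point at a package that pip did not install into this environment.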

2. Configure oneAPI environment variables for Linux

Note

Skip this step if you are running on Windows.

This step is required on Linux if oneAPI was installed via APT or the offline installer. Skip it if oneAPI was installed via pip.

source /opt/intel/oneapi/setvars.sh

3. Runtime Configurations

For optimal performance, set the environment variables recommended below for your device.

3.1 Configurations for Linux

For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series:

export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1

For Intel Data Center GPU Max Series:

export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1

Note: libtcmalloc.so can be installed with conda install -c conda-forge -y gperftools=2.10.

For Intel iGPU:

export SYCL_CACHE_PERSISTENT=1
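If you launch the examples from a Python script rather than an interactive shell, the Linux settings above can be collected into an environment dictionary and passed to a child process. This is only a sketch: the device keys (`arc_or_flex`, `max`, `igpu`) and the function name `build_env` are hypothetical names introduced here, while the variable names and values are copied from the lists above.

```python
import os

# Recommended variables per device family, copied from the Linux settings above.
# The dictionary keys are hypothetical labels chosen for this sketch.
LINUX_RUNTIME_ENV = {
    "arc_or_flex": {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "max": {
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "ENABLE_SDP_FUSION": "1",
    },
    "igpu": {"SYCL_CACHE_PERSISTENT": "1"},
}


def build_env(device, base_env=None):
    """Return a copy of base_env with the recommended variables for `device` applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(LINUX_RUNTIME_ENV[device])
    if device == "max":
        # Mirrors: export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
        conda_prefix = env.get("CONDA_PREFIX", "")
        previous = env.get("LD_PRELOAD", "")
        env["LD_PRELOAD"] = f"{previous}:{conda_prefix}/lib/libtcmalloc.so"
    return env
```

The result can be passed as the `env` argument of `subprocess.run`. Building a new environment for a child process, rather than mutating `os.environ` in place, matters for `LD_PRELOAD`, which only takes effect when it is set before the process starts.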

3.2 Configurations for Windows

For Intel iGPU and Intel Arc™ A-Series Graphics:

set SYCL_CACHE_PERSISTENT=1

Note

The first time each model runs on an Intel iGPU, Intel Arc™ A300-Series, or Pro A60, compilation may take several minutes.

4. Run examples with LangChain

4.1. Example: Streaming Chat

Install LangChain dependencies:

pip install -U langchain langchain-community

In the current directory, run the example with the following command:

python chat.py -m MODEL_PATH -q QUESTION

Additional Parameters for Configuration:

  • -m MODEL_PATH: required, path to the model.
  • -q QUESTION: optional, question to ask. Default is "What is AI?".
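To make the command line above concrete, here is a minimal sketch of how a script with this interface could be structured. It is not the contents of chat.py. The long option names (`--model-path`, `--question`), the `model_kwargs` values, and the function names are assumptions chosen for illustration. `IpexLLM.from_model_id` comes from the `langchain_community` integration and is imported lazily, so the argument parsing can be tried without the GPU stack installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the -m/-q options described above (long names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Streaming chat with IPEX-LLM and LangChain")
    parser.add_argument("-m", "--model-path", required=True, help="path to the model")
    parser.add_argument("-q", "--question", default="What is AI?", help="question to ask")
    return parser.parse_args(argv)


def load_llm(model_path):
    """Load the model on an Intel GPU through the LangChain IPEX-LLM wrapper."""
    # Lazy import: only needed when a model is actually loaded.
    from langchain_community.llms import IpexLLM

    return IpexLLM.from_model_id(
        model_id=model_path,
        # Illustrative values; tune them for your model.
        model_kwargs={
            "temperature": 0,
            "max_length": 512,
            "trust_remote_code": True,
            "device": "xpu",
        },
    )


def main(argv=None):
    args = parse_args(argv)
    llm = load_llm(args.model_path)
    # stream() yields output chunks; how finely the text is chunked
    # depends on the integration's streaming support.
    for chunk in llm.stream(args.question):
        print(chunk, end="", flush=True)
    print()
```

Calling `main(["-m", MODEL_PATH])` asks the default question, and adding `"-q", "..."` overrides it.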

4.2. Example: Retrieval Augmented Generation (RAG)

The RAG example (rag.py) shows how to load the input text into a vector database and then use LangChain to build a retrieval pipeline.

Install LangChain dependencies:

pip install -U langchain langchain-community langchain-chroma sentence-transformers==3.0.1

In the current directory, run the example with the following command:

python rag.py -m <path_to_llm_model> -e <path_to_embedding_model> [-q QUESTION] [-i INPUT_PATH]

Additional Parameters for Configuration:

  • -m LLM_MODEL_PATH: required, path to the LLM.
  • -e EMBEDDING_MODEL_PATH: required, path to the embedding model.
  • -q QUESTION: optional, question to ask. Default is "What is IPEX-LLM?".
  • -i INPUT_PATH: optional, path to the input document.
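The retrieval pipeline that rag.py builds follows the usual RAG steps: split the input text, index the pieces, retrieve the pieces most similar to the question, and pass them to the LLM as context. The dependency-free toy below illustrates those steps only. It is not rag.py's implementation: a word-count vector stands in for the embedding model, a sorted list stands in for the vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text):
    """Split text into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text):
    """Toy embedding: lowercase word counts (a real pipeline uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into a single prompt for the LLM."""
    context = "\n".join(context_chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In a real pipeline the prompt returned by `build_prompt` would be sent to the LLM loaded from `-m`, and `embed` would be replaced by the embedding model loaded from `-e`.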

4.3. Example: Low Bit

The low_bit example (low_bit.py) showcases how to use LangChain with a low_bit optimized model. With save_low_bit, the weights of the low_bit model are saved into the target folder.

Note

save_low_bit only saves the weights of the model. You can either copy the tokenizer model into the target folder or specify tokenizer_id during initialization.

Install LangChain dependencies:

pip install -U langchain langchain-community

In the current directory, run the example with the following command:

python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]

Additional Parameters for Configuration:

  • -m MODEL_PATH: required, path to the model.
  • -t TARGET_PATH: required, path where the low_bit model is saved.
  • -q QUESTION: optional, question to ask. Default is "What is AI?".
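The save-then-reload flow, together with the tokenizer note above, can be sketched as follows. This is an illustration under stated assumptions rather than the contents of low_bit.py. The helper `pick_tokenizer_id`, the `TOKENIZER_FILES` list (common Hugging Face tokenizer file names), and the `model_kwargs` values are hypothetical. `IpexLLM.from_model_id`, `llm.model.save_low_bit`, and `IpexLLM.from_model_id_low_bit` are taken from the LangChain IPEX-LLM integration and imported lazily.

```python
import os

# Common Hugging Face tokenizer file names; an assumption for this sketch.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer.model", "tokenizer_config.json")


def pick_tokenizer_id(target_path, original_model_path):
    """Use the target folder if tokenizer files were copied there, else the original model."""
    has_tokenizer = any(
        os.path.exists(os.path.join(target_path, name)) for name in TOKENIZER_FILES
    )
    return target_path if has_tokenizer else original_model_path


def save_and_reload(model_path, target_path):
    """Save low_bit weights to target_path, then load the model back from there."""
    # Lazy import: only needed when a model is actually loaded.
    from langchain_community.llms import IpexLLM

    model_kwargs = {"temperature": 0, "max_length": 64, "trust_remote_code": True}
    llm = IpexLLM.from_model_id(model_id=model_path, model_kwargs=model_kwargs)
    # save_low_bit writes only the model weights, not the tokenizer.
    llm.model.save_low_bit(target_path)
    del llm
    return IpexLLM.from_model_id_low_bit(
        model_id=target_path,
        tokenizer_id=pick_tokenizer_id(target_path, model_path),
        model_kwargs=model_kwargs,
    )
```

Falling back to the original model path for `tokenizer_id` keeps the reload working even when the tokenizer was never copied into the target folder.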