DiweiSun committed May 30, 2024
2 parents 70c1561 + 592aabf commit 5e91ea9
Showing 77 changed files with 3,599 additions and 781 deletions.
31 changes: 31 additions & 0 deletions .lintrunner.toml
@@ -0,0 +1,31 @@
[[linter]]
code = 'FLAKE8'
include_patterns = ['*.py']
exclude_patterns = [
'.git/**',
]
command = [
'python3',
'scripts/tools/setup/flake8.py',
'--',
'@{{PATHSFILE}}'
]

init_command = [
'python',
'-m',
'lintrunner_adapters',
'run',
'pip_init',
'--dry-run={{DRYRUN}}',
'flake8==3.8.2',
'flake8-bugbear==20.1.4',
'flake8-comprehensions==3.3.0',
'flake8-executable==2.0.4',
# 'git+https://github.com/malfet/flake8-coding.git',
'flake8-pyi==20.5.0',
'mccabe==0.6.1',
'pycodestyle==2.6.0',
'pyflakes==2.2.0',
'black==24.3.0',
]
4 changes: 3 additions & 1 deletion CONTRIBUTING.md
@@ -180,7 +180,9 @@ For example, if you wanted to run the test `MayContainAlias`, which is part of t
### Python Code
Python code style utilities are in the `scripts/tools/setup` folder. Please install the related Python module dependencies:
```bash
pip install -r scripts/tools/setup/requirements-flake8.txt
pip install lintrunner
pip install lintrunner-adapters
lintrunner init
```
Please run flake8.py to auto-format Python code and check the Python code style. The script will report issues; please modify the code manually according to the output until it passes.
31 changes: 21 additions & 10 deletions README.md
@@ -5,7 +5,7 @@ Intel® Extension for PyTorch\*

</div>

**CPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.2.0%2Bcpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm) <br>
**CPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.3.0%2Bcpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm) <br>
**GPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main/examples/gpu/inference/python/llm)<br>

Intel® Extension for PyTorch\* extends PyTorch\* with up-to-date features and optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel X<sup>e</sup> Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
@@ -19,28 +19,35 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode
| MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 | 🟨 | 🟩 | |
|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟨 | 🟨 | 🟩 | 🟨 |
|FALCON| tiiuae/falcon-7b | 🟩 | 🟩 | 🟩 | 🟩 | |
|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |
|Bloom| bigscience/bloom-1b7 | 🟩 | 🟨 | 🟩 | 🟩 | 🟨 |
|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | |
|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟨 | 🟩 | |
|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟨 | 🟩 | 🟩 | |
|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟨 | 🟩 | |
|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟨 | 🟩 | |
|T5| google/flan-t5-xl | 🟩 | 🟩 | | 🟩 | |
|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟨 | 🟩 | 🟩 |
|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 | | 🟩 | |
|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | | 🟨 | |
|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | | 🟩 | |
|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 | | 🟩 | 🟨 |
|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | 🟨 | 🟩 | 🟨 |
|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | 🟨 | 🟩 | |
|LLaVA| liuhaotian/llava-v1.5-7b | 🟩 | 🟩 | | 🟩 | |
|GIT| microsoft/git-base | 🟩 | 🟩 | | 🟩 | |
|Yuan| IEITYuan/Yuan2-102B-hf | 🟩 | 🟩 | | 🟨 | |
|Phi| microsoft/phi-2 | 🟩 | 🟩 | 🟩 | 🟩 | 🟨 |

- 🟩 signifies that the model performs well with good accuracy (<1% difference compared with FP32).

@@ -49,6 +56,10 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode
*Note*: The verified models above (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16).
We are actively working to better support the models in the table with various data types. In addition, more models will be optimized in the future.
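
For illustration, enabling one of the verified models typically looks like the following minimal sketch, assuming the `ipex.llm.optimize` API used by the LLM examples linked above (model choice, prompt, and generation arguments are illustrative only):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# A 🟩 model from the table above, loaded in bf16.
model_id = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Apply the LLM-specific optimizations (indirect access KV cache,
# fused ROPE, prepacked TPP Linear) for the chosen dtype.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    inputs = tokenizer("What is AI?", return_tensors="pt")
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```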

In addition, since release 2.3.0 Intel® Extension for PyTorch* provides module-level optimization APIs (prototype feature).
The feature provides optimized alternatives for several commonly used LLM modules and functionalities, so that niche or customized LLMs can be optimized as well; a sketch of the idea follows below.
Please read [**LLM module level optimization practice**](./examples/cpu/inference/python/llm-modeling) to better understand how to optimize your own LLM and achieve better performance.
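
As a rough sketch of that idea, a hand-written norm in a custom decoder could be swapped for an optimized module along these lines (`ipex.llm.modules.RMSNorm` and its signature are assumptions based on the practice guide above; the decoder fragment and shapes are hypothetical):

```python
import torch
import intel_extension_for_pytorch as ipex

class MyDecoderNorm(torch.nn.Module):
    """Hypothetical fragment of a custom LLM decoder layer."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        # Optimized drop-in alternative to a hand-written RMSNorm
        # (module name and signature assumed from the practice guide).
        self.norm = ipex.llm.modules.RMSNorm(hidden_size, eps=eps)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.norm(hidden_states)

layer = MyDecoderNorm(hidden_size=4096)
out = layer(torch.randn(1, 32, 4096))
```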

## Support

The team tracks bugs and enhancement requests using [GitHub issues](https://github.com/intel/intel-extension-for-pytorch/issues/). Before submitting a suggestion or bug report, search the existing GitHub issues to see if your issue has already been reported.
8 changes: 0 additions & 8 deletions csrc/cpu/CMakeLists.txt
@@ -251,14 +251,6 @@ if(BUILD_STRIPPED_BIN)
set_target_properties(${PLUGIN_NAME_CPU} PROPERTIES LINK_FLAGS_RELEASE -s)
endif()

find_package(PythonLibs)
if(${PYTHONLIBS_FOUND})
target_link_libraries(${PLUGIN_NAME_CPU} PUBLIC ${PYTHON_LIBRARIES})
endif()

find_library(TORCH_PYTHON_LIBRARY torch_python PATH "${TORCH_INSTALL_PREFIX}/lib")
target_link_libraries(${PLUGIN_NAME_CPU} PRIVATE ${TORCH_LIBRARIES} ${TORCH_PYTHON_LIBRARY})

install(TARGETS ${PLUGIN_NAME_CPU}
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
29 changes: 28 additions & 1 deletion csrc/cpu/aten/AddLayerNorm.cpp
@@ -4,7 +4,7 @@
// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/layer_norm.cpp

#include "AddLayerNorm.h"

#include <torch/all.h>
#include <torch/csrc/autograd/function.h>

namespace torch_ipex {
@@ -57,5 +57,32 @@ at::Tensor dil_add_layernorm(
return at::layer_norm(add_res, normalized_shape, weight_opt, bias_opt, eps);
}
}

// register as a python op
at::Tensor add_layernorm(
const at::Tensor& a,
const at::Tensor& b,
int64_t alpha,
at::IntArrayRef normalized_shape,
const c10::optional<at::Tensor>& weight_opt,
const c10::optional<at::Tensor>& bias_opt,
double eps) {
RECORD_FUNCTION("add_layernorm", c10::ArrayRef<c10::IValue>({}));
return dil_add_layernorm(
a, b, alpha, normalized_shape, weight_opt, bias_opt, eps, false);
}

} // namespace cpu
} // namespace torch_ipex

namespace {

TORCH_LIBRARY_FRAGMENT(torch_ipex, m) {
m.def(
"add_layernorm(Tensor a, Tensor b, int alpha, int[] normalized_shape, Tensor ? weight_opt, \
Tensor ? bias_opt, float eps) -> Tensor");
m.impl(
"add_layernorm", c10::DispatchKey::CPU, torch_ipex::cpu::add_layernorm);
}

} // namespace
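
With this fragment registered, the fused op becomes callable from Python through `torch.ops` once the extension is imported. A minimal sketch, assuming the fused op matches the eager composition `layer_norm(a + alpha * b)` implied by the fallback path in `dil_add_layernorm` (shapes are illustrative):

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (loads the torch_ipex ops)

a = torch.randn(4, 64)
b = torch.randn(4, 64)
weight = torch.ones(64)
bias = torch.zeros(64)

# Call the fused op through the schema registered above.
out = torch.ops.torch_ipex.add_layernorm(a, b, 1, [64], weight, bias, 1e-5)

# Reference eager computation: layer_norm(a + alpha * b) with alpha = 1.
ref = torch.nn.functional.layer_norm(a + b, [64], weight, bias, 1e-5)
print(torch.allclose(out, ref, atol=1e-4))
```
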
10 changes: 10 additions & 0 deletions csrc/cpu/aten/AddLayerNorm.h
@@ -81,6 +81,16 @@ at::Tensor dil_add_layernorm(
float eps,
bool cuda_enable);

// register as a python op
at::Tensor add_layernorm(
const at::Tensor& a,
const at::Tensor& b,
int64_t alpha,
at::IntArrayRef normalized_shape,
const c10::optional<at::Tensor>& weight_opt,
const c10::optional<at::Tensor>& bias_opt,
double eps);

namespace {

at::Tensor add_layer_norm_kernel_impl(