Merge pull request #27 from premAI-io/main
Merge from main.
Anindyadeep authored Jan 31, 2024
2 parents 747b6ca + d862e1c commit 51d05cb
Showing 22 changed files with 672 additions and 193 deletions.
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@ Take a first glance of Llama-2-7B Model Performance Metrics Across Different Pre

| Engine | float32 | float16 | int8 | int4 |
|------------------------------|--------------|----------------|---------------|---------------|
| burn | 13.12 ± 0.85 | - | - | - |
| burn | 10.04 ± 0.64 | - | - | - |
| candle | - | 36.78 ± 2.17 | - | - |
| llama.cpp | - | - | 79.15 ± 1.20 | 100.90 ± 1.46 |
| ctranslate | 35.23 ± 4.01 | 55.72 ± 16.66 | 35.73 ± 10.87 | - |
@@ -46,9 +46,9 @@ Take a first glance of Llama-2-7B Model Performance Metrics Across Different Pre
| DeepSpeed | - | 81.44 ± 8.13 | - | |
| PyTorch Lightning | 24.85 ± 0.07 | 44.56 ± 2.89 | 10.50 ± 0.12 | 24.83 ± 0.05 |
| Optimum Nvidia | 110.36 ± 0.52| 109.09 ± 4.26 | - | - |
| Nvidia TensorRT-LLM | 60.39 ± 0.62 | 101.94 ± 8.34 | - | - |
| Nvidia TensorRT-LLM | 55.19 ± 1.03 | 85.03 ± 0.62 | 167.66 ± 2.05 | 235.18 ± 3.20 |

*(Data updated: `29th January 2024`)
*(Data updated: `31st January 2024`)



35 changes: 35 additions & 0 deletions bench_burn/README.md
@@ -0,0 +1,35 @@
# Burn

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Gadersd/llama2-burn) &nbsp;

[Burn](https://github.com/tracel-ai/burn) is a comprehensive, dynamic deep-learning framework built in Rust, with flexibility, compute efficiency, and portability as its primary goals. For this benchmark implementation, we used a [forked version](https://github.com/premAI-io/llama2-burn) of the [Llama2-Burn project](https://github.com/Gadersd/llama2-burn).


### 🚀 Running the Burn Benchmark.

For running this benchmark, make sure you have [Rust installed](https://www.rust-lang.org/tools/install). You can run the Burn benchmark using the following command:

```bash
./bench_burn/bench.sh \
  --prompt <value> \            # Prompt string for the benchmark
  --max_tokens <value> \        # Maximum number of tokens to generate
  --repetitions <value> \       # Number of repetitions for the prompt
  --log_file <file_path> \      # .log file to which results are written
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory containing the model weights
```

To get started quickly you can simply run:

```bash
./bench_burn/bench.sh -d cuda
```
This will take all the default values (see the [bench.sh](/bench_burn/bench.sh) file) and run the benchmarks. You can find all the benchmark results for Burn [here](/docs/llama2.md). The HuggingFace Llama 2 weights go through a conversion process before benchmarking. See [setup.sh](/bench_burn/setup.sh) to learn more.


### 👀 Some points to note:

1. Running this benchmark requires [HuggingFace Llama2-7B weights](https://huggingface.co/meta-llama/Llama-2-7b). It therefore assumes that you have already agreed to the required [terms and conditions](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and been verified to download the weights. These weights are then dumped and converted to binaries compatible with Burn. You can find the conversion process in [setup.sh](/bench_burn/setup.sh).
2. The current implementation of Llama2-Burn supports only Float32 precision on CUDA and CPU.
3. The current implementation of Llama2-Burn does not support Metal.
4. The current implementation of Llama2-Burn does not support INT-4/8 quantized models.
40 changes: 28 additions & 12 deletions bench_burn/bench.sh
@@ -2,7 +2,7 @@

########################################################################################################
# Script: bench.sh
# Description: This script runs benchmarks burn llama benchmark.
# Description: This script runs the Burn Llama-2 benchmark.
#
# Usage: ./bench.sh [OPTIONS]
# OPTIONS:
@@ -17,6 +17,7 @@

set -euo pipefail

CURRENT_DIR="$(pwd)"
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

print_usage() {
@@ -66,16 +67,28 @@ check_platform() {
}

check_python() {
if command -v python &> /dev/null
then
echo -e "\nUsing $(python --version)."
if command -v python &> /dev/null || command -v python3 &> /dev/null; then
echo "Python is installed."
else
echo -e "\nPython does not exist."
echo "Python is not installed."
exit 1
fi
}


setup() {

# Check if Logs folder exists else Make the logs folder
LOGS_FOLDER="$CURRENT_DIR/Logs"

if [ -d "$LOGS_FOLDER" ]; then
echo "Folder '$LOGS_FOLDER' already exists. Skipping."
else
# Create the folder
mkdir "$LOGS_FOLDER"
echo "'$LOGS_FOLDER' created."
fi

echo -e "\nSetting up with $SCRIPT_DIR/setup.sh..."
bash "$SCRIPT_DIR/setup.sh" "$1"
}
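As an aside, the existence check in `setup()` above can collapse into a single `mkdir -p`, which is idempotent. A minimal sketch (the path below is a temporary stand-in, not the script's actual variable):

```shell
# mkdir -p creates the folder if missing and is a no-op if it exists.
LOGS_FOLDER="$(mktemp -d)/Logs"
mkdir -p "$LOGS_FOLDER"   # creates the folder
mkdir -p "$LOGS_FOLDER"   # second call is harmless
echo "Logs folder ready at $LOGS_FOLDER"
```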
@@ -110,7 +123,8 @@ run_benchmarks() {
)
mean=$(echo "$benchmark_output" | grep -oP '\d+\.\d+ ± \d+\.\d+' | awk -F ' ± ' '{print $1}')
std=$(echo "$benchmark_output" | grep -oP '\d+\.\d+ ± \d+\.\d+' | awk -F ' ± ' '{print $2}')
echo "burn, float16 : $(printf "%.2f" "$mean") ± $(printf "%.2f" "$std")" >> "$LOG_FILENAME"
echo "burn, float32 : $(printf "%.2f" "$mean") ± $(printf "%.2f" "$std")"
echo "burn, float32 : $(printf "%.2f" "$mean") ± $(printf "%.2f" "$std")" >> "$LOG_FILENAME"
}
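The `grep`/`awk` pipeline in `run_benchmarks()` splits a `mean ± std` figure out of the benchmark output. A standalone sketch on a hypothetical output line (assumes GNU grep, which provides `-P` for Perl-style regexes):

```shell
# Extract mean and std from a "mean ± std" figure in a sample line.
benchmark_output='burn, float32 : 10.04 ± 0.64 tokens/sec'
mean=$(echo "$benchmark_output" | grep -oP '\d+\.\d+ ± \d+\.\d+' | awk -F ' ± ' '{print $1}')
std=$(echo "$benchmark_output" | grep -oP '\d+\.\d+ ± \d+\.\d+' | awk -F ' ± ' '{print $2}')
printf "burn, float32 : %.2f ± %.2f\n" "$mean" "$std"
```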
# Parse command-line arguments
while [ "$#" -gt 0 ]; do
@@ -164,16 +178,18 @@
esac
done

# Set default values if not provided
PROMPT="${PROMPT:-"Explain what is a transformer"}"
REPETITIONS="${REPETITIONS:-10}"
MAX_TOKENS="${MAX_TOKENS:-100}"
DEVICE="${DEVICE:-'cpu'}"
LOG_FILENAME="${LOG_FILENAME:-"benchmark_$(date +'%Y%m%d%H%M%S').log"}"
MODELS_DIR="${MODELS_DIR:-"./models"}"

check_platform
check_rust
check_python
setup "$MODELS_DIR"

# Set default values if not provided
PROMPT="${PROMPT:-"Write an essay about the transformer model architecture"}"
REPETITIONS="${REPETITIONS:-10}"
MAX_TOKENS="${MAX_TOKENS:-512}"
DEVICE="${DEVICE:-cuda}"
LOG_FILENAME="${LOG_FILENAME:-"$LOGS_FOLDER/benchmark_burn_$(date +'%Y%m%d%H%M%S').log"}"
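The defaults above rely on `${VAR:-default}` expansion, which keeps an existing value and falls back otherwise. One pitfall worth noting: quotes written inside the default become literal characters of the value. A small sketch:

```shell
# ${VAR:-default}: existing value wins; fallback applies when unset/empty.
unset DEVICE
DEVICE="${DEVICE:-cuda}"      # unset -> falls back to cuda
FIRST="$DEVICE"
DEVICE="cpu"
DEVICE="${DEVICE:-cuda}"      # already set -> keeps cpu
SECOND="$DEVICE"
# Pitfall: single quotes inside the default are kept literally.
QUOTED="${UNSET_VAR:-'cuda'}" # value is 'cuda' INCLUDING the quotes
echo "$FIRST $SECOND $QUOTED"
```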

run_benchmarks "$PROMPT" "$REPETITIONS" "$MAX_TOKENS" "$DEVICE" "$LOG_FILENAME" "$MODELS_DIR"
17 changes: 15 additions & 2 deletions bench_burn/setup.sh
@@ -8,6 +8,17 @@

set -euo pipefail

check_python() {
if command -v python &> /dev/null; then
PYTHON_CMD="python"
elif command -v python3 &> /dev/null; then
PYTHON_CMD="python3"
else
echo "Python is not installed."
exit 1
fi
}

if [ "$#" -ne 1 ]; then
echo "Usage: $0 <models_folder>"
exit 1
@@ -28,8 +39,10 @@ check_and_create_directory() {
fi
}

check_python

if [ ! -d "$VENV_DIR" ]; then
python -m venv "$VENV_DIR"
"$PYTHON_CMD" -m venv "$VENV_DIR"
echo "Virtual environment '$VENV_DIR' created."
# shellcheck disable=SC1091
source "$VENV_DIR/bin/activate"
@@ -47,7 +60,7 @@ if [ ! -e "$BURN_MODEL_FOLDER/$BURN_MODEL_NAME.cfg" ]; then

if [ ! -d "$BURN_MODEL_FOLDER/params" ]; then
echo "Dumping model from $BURN_MODEL_INPUT_DIR to $BURN_MODEL_FOLDER"
python "$BURN_FOLDER/llama-py/dump_model.py" "$BURN_MODEL_INPUT_DIR" "$BURN_MODEL_INPUT_DIR/tokenizer.model"
"$PYTHON_CMD" "$BURN_FOLDER/llama-py/dump_model.py" "$BURN_MODEL_INPUT_DIR" "$BURN_MODEL_INPUT_DIR/tokenizer.model"
mv "$(pwd)/params" "$BURN_MODEL_FOLDER"
cp "$BURN_MODEL_INPUT_DIR/tokenizer.model" "$BURN_MODEL_FOLDER"
else
34 changes: 34 additions & 0 deletions bench_candle/README.md
@@ -0,0 +1,34 @@
# Candle

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/huggingface/candle) &nbsp;

[Candle](https://github.com/huggingface/candle) is a minimalistic machine/deep learning framework written in Rust by [huggingface](https://github.com/huggingface). It aims to provide a simpler interface for implementing models, along with GPU support. This is a modified implementation of the [Llama2-Candle example](https://github.com/huggingface/candle/blob/main/candle-examples/examples/llama/main.rs) used to analyse benchmark performance across different devices and precisions.


### 🚀 Running the Candle Benchmark.

For running this benchmark, make sure you have [Rust installed](https://www.rust-lang.org/tools/install). You can run the Candle benchmark using the following command:

```bash
./bench_candle/bench.sh \
  --prompt <value> \            # Prompt string for the benchmark
  --max_tokens <value> \        # Maximum number of tokens to generate
  --repetitions <value> \       # Number of repetitions for the prompt
  --log_file <file_path> \      # .log file to which results are written
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory containing the model weights
```

To get started quickly you can simply run:

```bash
./bench_candle/bench.sh -d cuda
```
This will take all the default values (see the [bench.sh](/bench_candle/bench.sh) file) and perform the benchmarks. You can find all the benchmark results for Candle [here](/docs/llama2.md).


### 👀 Some points to note:

1. Running this benchmark requires [HuggingFace Llama2-7B weights](https://huggingface.co/meta-llama/Llama-2-7b). It therefore assumes that you have already agreed to the required [terms and conditions](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and been verified to download the weights.
2. Candle does not support Metal devices.
3. Candle does support [quantized models](https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized/main.rs). Benchmarks for quantized Candle models will be available in upcoming versions.
35 changes: 35 additions & 0 deletions bench_onnxruntime/README.md
@@ -0,0 +1,35 @@
# ONNX Runtime

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/microsoft/onnxruntime) &nbsp;


[ONNX (Open Neural Network Exchange) Runtime](https://github.com/microsoft/onnxruntime) is an open-source, cross-platform runtime that enables efficient execution of neural network models trained in various frameworks, promoting interoperability and flexibility in deploying machine learning models. This benchmark implementation uses [HuggingFace Optimum](https://github.com/huggingface/optimum), which supports models running under ONNX Runtime.

### 🚀 Running the ONNX Runtime Benchmark.

You can run the ONNX Runtime benchmark using the following command:

```bash
./bench_onnxruntime/bench.sh \
  --prompt <value> \            # Prompt string for the benchmark
  --max_tokens <value> \        # Maximum number of tokens to generate
  --repetitions <value> \       # Number of repetitions for the prompt
  --log_file <file_path> \      # .log file to which results are written
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory containing the model weights
```

To get started quickly you can simply run:

```bash
./bench_onnxruntime/bench.sh -d cuda
```
This will take all the default values (see the [bench.sh](/bench_onnxruntime/bench.sh) file) and perform the benchmarks. You can find all the benchmark results for ONNX Runtime [here](/docs/llama2.md).


### 👀 Some points to note:

1. ONNX Runtime requires HuggingFace Llama2-7B weights, which are converted into ONNX format by the [setup.sh](/bench_onnxruntime/setup.sh) script. Running this benchmark therefore assumes that you have already agreed to the required terms and conditions and been verified to download the weights.
2. ONNX Runtime on GPU supports only the Float16 precision format.
3. Running Llama 2 with ONNX Runtime on CPU/Metal is too memory-intensive, so benchmarking is skipped for those devices.
4. You might not be able to run the current implementation of the ONNX benchmark, since it requires some specific dependencies and Anaconda support. A fix will land in an upcoming version.
4 changes: 3 additions & 1 deletion bench_onnxruntime/bench.py
@@ -54,7 +54,9 @@ def benchmark(self, prompt, max_tokens, repetitions):


if __name__ == "__main__":
parser = argparse.ArgumentParser(description="llama.cpp Benchmark Llama model.")
parser = argparse.ArgumentParser(
description="ONNX Runtime Benchmark for Llama model."
)
parser.add_argument(
"--prompt",
type=str,
58 changes: 37 additions & 21 deletions bench_onnxruntime/bench.sh
@@ -2,36 +2,36 @@

########################################################################################################
# Script: bench.sh
# Description: This script runs benchmarks onnxruntime llama benchmark.
# Description: This script runs the ONNX Runtime Llama-2 benchmark.
#
# Usage: ./bench.sh [OPTIONS]
# OPTIONS:
# -p, --prompt Prompt for benchmarks (default: 'Explain what is a transformer')
# -r, --repetitions Number of repetitions for benchmarks (default: 2)
# -m, --max_tokens Maximum number of tokens for benchmarks (default: 100)
# -d, --device Device for benchmarks (possible values: 'metal', 'gpu', and 'cpu', default: 'cpu')
# -p, --prompt Prompt for benchmarks (default: 'Write an essay about the transformer model architecture')
# -r, --repetitions Number of repetitions for benchmarks (default: 10)
# -m, --max_tokens Maximum number of tokens for benchmarks (default: 512)
# -d, --device Device for benchmarks (possible values: 'metal', 'cuda', and 'cpu', default: 'cuda')
# -lf, --log_file Logging file name.
# -md, --models_dir Models directory.
# -h, --help Show this help message
########################################################################################################

set -euo pipefail

CURRENT_DIR="$(pwd)"
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

print_usage() {
echo "Usage: $0 [OPTIONS]"
echo "OPTIONS:"
echo " -p, --prompt Prompt for benchmarks (default: 'Explain what is a transformer')"
echo " -r, --repetitions Number of repetitions for benchmarks (default: 2)"
echo " -m, --max_tokens Maximum number of tokens for benchmarks (default: 100)"
echo " -d, --device Device for benchmarks (possible values: 'metal', 'gpu', and 'cpu', default: 'cpu')"
echo " -p, --prompt Prompt for benchmarks (default: 'Write an essay about the transformer model architecture')"
echo " -r, --repetitions Number of repetitions for benchmarks (default: 10)"
echo " -m, --max_tokens Maximum number of tokens for benchmarks (default: 512)"
echo " -d, --device Device for benchmarks (possible values: 'metal', 'cuda', and 'cpu', default: 'cuda')"
echo " -lf, --log_file Logging file name."
echo " -md, --models_dir Models directory."
echo " -h, --help Show this help message"
exit 1
}

check_cuda() {
if command -v nvcc &> /dev/null
then
@@ -57,16 +57,29 @@ check_platform() {
}

check_python() {
if command -v python &> /dev/null
then
echo -e "\nUsing $(python --version)."
if command -v python &> /dev/null; then
PYTHON_CMD="python"
elif command -v python3 &> /dev/null; then
PYTHON_CMD="python3"
else
echo -e "\nPython does not exist."
echo "Python is not installed."
exit 1
fi
}
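The `check_python` fallback above can be exercised on its own. A minimal sketch, assuming at least one of `python`/`python3` is on `PATH` (otherwise it exits):

```shell
# Pick whichever interpreter exists; fail loudly if neither does.
if command -v python >/dev/null 2>&1; then
    PYTHON_CMD="python"
elif command -v python3 >/dev/null 2>&1; then
    PYTHON_CMD="python3"
else
    echo "Python is not installed." >&2
    exit 1
fi
"$PYTHON_CMD" --version
```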

setup() {

# Check if Logs folder exists else Make the logs folder
LOGS_FOLDER="$CURRENT_DIR/Logs"

if [ -d "$LOGS_FOLDER" ]; then
echo "Folder '$LOGS_FOLDER' already exists. Skipping."
else
# Create the folder
mkdir "$LOGS_FOLDER"
echo "'$LOGS_FOLDER' created."
fi

echo -e "\nSetting up with $SCRIPT_DIR/setup.sh..."
bash "$SCRIPT_DIR"/setup.sh "$1"
}
@@ -82,7 +95,7 @@ run_benchmarks() {
# shellcheck disable=SC1091
source "$SCRIPT_DIR/venv/bin/activate"

python "$SCRIPT_DIR"/bench.py \
"$PYTHON_CMD" "$SCRIPT_DIR"/bench.py \
--prompt "$PROMPT" \
--repetitions "$REPETITIONS" \
--max_tokens "$MAX_TOKENS" \
@@ -147,15 +160,18 @@
;;
esac
done
# Set default values if not provided
PROMPT="${PROMPT:-"Explain what is a transformer"}"
REPETITIONS="${REPETITIONS:-10}"
MAX_TOKENS="${MAX_TOKENS:-100}"
DEVICE="${DEVICE:-'cpu'}"
LOG_FILENAME="${LOG_FILENAME:-"benchmark_$(date +'%Y%m%d%H%M%S').log"}"

MODELS_DIR="${MODELS_DIR:-"./models"}"

check_platform
check_python
setup "$MODELS_DIR"

# Set default values if not provided
PROMPT="${PROMPT:-"Write an essay about the transformer model architecture"}"
REPETITIONS="${REPETITIONS:-10}"
MAX_TOKENS="${MAX_TOKENS:-512}"
DEVICE="${DEVICE:-cuda}"
LOG_FILENAME="${LOG_FILENAME:-"$LOGS_FOLDER/benchmark_onnx_$(date +'%Y%m%d%H%M%S').log"}"

run_benchmarks "$PROMPT" "$REPETITIONS" "$MAX_TOKENS" "$DEVICE" "$LOG_FILENAME" "$MODELS_DIR"