Merge pull request #27 from premAI-io/main

Merge from main. Showing 22 changed files with 672 additions and 193 deletions.
# Burn

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Gadersd/llama2-burn)

[Burn](https://github.com/tracel-ai/burn) is a comprehensive dynamic Deep Learning framework built in Rust, with flexibility, compute efficiency, and portability as its primary goals. For this benchmark implementation, we used a [forked version](https://github.com/premAI-io/llama2-burn) of the [Llama2-Burn project](https://github.com/Gadersd/llama2-burn).
### 🚀 Running the Burn Benchmark

For running this benchmark, make sure you have [Rust installed](https://www.rust-lang.org/tools/install). You can run the Burn benchmark using the following command:

```bash
./bench_burn/bench.sh \
  --prompt <value> \            # Prompt string to benchmark with
  --max_tokens <value> \        # Maximum number of tokens to output
  --repetitions <value> \       # Number of repetitions to be made for the prompt
  --log_file <file_path> \      # .log file in which to write the results
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory in which the model weights are present
```
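As a rough sketch of how a launcher like this typically consumes those long flags, here is an illustrative shell snippet. It is not the repository's actual bench.sh; the flag names match the README, but the variable names are assumptions:

```shell
# Illustrative only: a minimal parser for the flags documented above.
# Variable names are assumptions; the real bench.sh may differ.
parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --prompt)      PROMPT="$2";      shift 2 ;;
      --max_tokens)  MAX_TOKENS="$2";  shift 2 ;;
      --repetitions) REPETITIONS="$2"; shift 2 ;;
      --log_file)    LOG_FILE="$2";    shift 2 ;;
      --device|-d)   DEVICE="$2";      shift 2 ;;
      --models_dir)  MODELS_DIR="$2";  shift 2 ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
}

parse_args --prompt "Hello world" --max_tokens 256 --repetitions 5 -d cuda
echo "device=$DEVICE, max_tokens=$MAX_TOKENS, repetitions=$REPETITIONS"
```

Running the snippet prints the parsed values; the real script additionally falls back to defaults for any flag left unset.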

To get started quickly you can simply run:

```bash
./bench_burn/bench.sh -d cuda
```

This will take all the default values (see the [bench.sh](/bench_burn/bench.sh) file) and perform the benchmarks. You can find all the benchmark results for Burn [here](/docs/llama2.md). The HuggingFace Llama 2 weights go through a conversion process before benchmarking; see [setup.sh](/bench_burn/setup.sh) to learn more.
### 👀 Some points to note:

1. Running this benchmark requires [HuggingFace Llama2-7B weights](https://huggingface.co/meta-llama/Llama-2-7b). It assumes that you have already agreed to the required [terms and conditions](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and been verified to download the weights. These weights are dumped and converted to binaries compatible with Burn. You can find the conversion process in [setup.sh](/bench_burn/setup.sh).
2. The current implementation of Llama2-Burn only supports Float32 precision for CUDA and CPU.
3. The current implementation of Llama2-Burn does not support Metal.
4. The current implementation of Llama2-Burn does not support INT-4/8 precision quantized models.
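Since the benchmark only works once the converted Burn binaries exist, a pre-flight check can save a failed run. A minimal sketch, assuming a models directory; the path below is a placeholder, not necessarily the one setup.sh actually uses:

```shell
# Hypothetical pre-flight check; the default directory path is an assumed example.
check_weights() {
  if [ -d "$1" ]; then
    echo "found converted weights in $1"
  else
    echo "no weights found; run ./bench_burn/setup.sh first"
  fi
}

check_weights "${MODELS_DIR:-./models/llama-2-7b-burn}"
```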
# Candle

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/huggingface/candle)

[Candle](https://github.com/huggingface/candle) is a minimalistic Machine/Deep Learning framework written in Rust by [huggingface](https://github.com/huggingface). It aims to provide a simpler interface for implementing models, along with GPU support. This is a modified implementation of the [Llama2-Candle example](https://github.com/huggingface/candle/blob/main/candle-examples/examples/llama/main.rs), used to analyse benchmark performance across different devices and precisions.
### 🚀 Running the Candle Benchmark

For running this benchmark, make sure you have [Rust installed](https://www.rust-lang.org/tools/install). You can run the Candle benchmark using the following command:

```bash
./bench_candle/bench.sh \
  --prompt <value> \            # Prompt string to benchmark with
  --max_tokens <value> \        # Maximum number of tokens to output
  --repetitions <value> \       # Number of repetitions to be made for the prompt
  --log_file <file_path> \      # .log file in which to write the results
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory in which the model weights are present
```

To get started quickly you can simply run:

```bash
./bench_candle/bench.sh -d cuda
```

This will take all the default values (see the [bench.sh](/bench_candle/bench.sh) file) and perform the benchmarks. You can find all the benchmark results for Candle [here](/docs/llama2.md).
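The fallback behaviour when only `-d cuda` is passed can be pictured with environment-style defaults. The values below are placeholders for illustration, not the real defaults found in bench.sh:

```shell
# Placeholder defaults for illustration; the actual values live in bench.sh.
DEVICE="${DEVICE:-cuda}"
MAX_TOKENS="${MAX_TOKENS:-512}"
REPETITIONS="${REPETITIONS:-10}"
LOG_FILE="${LOG_FILE:-benchmark.log}"
echo "device=$DEVICE, max_tokens=$MAX_TOKENS, repetitions=$REPETITIONS, log_file=$LOG_FILE"
```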

### 👀 Some points to note:

1. Running this benchmark requires [HuggingFace Llama2-7B weights](https://huggingface.co/meta-llama/Llama-2-7b). It assumes that you have already agreed to the required [terms and conditions](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and been verified to download the weights.
2. Candle does not have support for Metal devices.
3. Candle does support [quantized models](https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized/main.rs). The benchmarks for quantized Candle models will be available in upcoming versions.
# ONNX Runtime

[![GitHub Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/microsoft/onnxruntime)

[ONNX (Open Neural Network Exchange) Runtime](https://github.com/microsoft/onnxruntime) is an open-source, cross-platform runtime that enables efficient execution of neural network models trained in various frameworks, promoting interoperability and flexibility in deploying machine learning models. This benchmark implementation uses [HuggingFace Optimum](https://github.com/huggingface/optimum), which supports models running under ONNX Runtime.
### 🚀 Running the ONNX Runtime Benchmark

You can run the ONNX Runtime benchmark using the following command:

```bash
./bench_onnxruntime/bench.sh \
  --prompt <value> \            # Prompt string to benchmark with
  --max_tokens <value> \        # Maximum number of tokens to output
  --repetitions <value> \       # Number of repetitions to be made for the prompt
  --log_file <file_path> \      # .log file in which to write the results
  --device <cpu/cuda/metal> \   # Device on which to run the benchmark
  --models_dir <path_to_models> # Directory in which the model weights are present
```

To get started quickly you can simply run:

```bash
./bench_onnxruntime/bench.sh -d cuda
```

This will take all the default values (see the [bench.sh](/bench_onnxruntime/bench.sh) file) and perform the benchmarks. You can find all the benchmark results for ONNX Runtime [here](/docs/llama2.md).
### 👀 Some points to note:

1. ONNX Runtime requires the HuggingFace Llama2-7B weights, which it converts into ONNX format using the [setup.sh](/bench_onnxruntime/setup.sh) script. Running this benchmark assumes that you have already agreed to the required terms and conditions and been verified to download the weights.
2. ONNX Runtime on GPU only supports the Float16 precision format.
3. Running Llama 2 using ONNX Runtime on CPU/Metal is too memory intensive, so benchmarking is skipped for those devices.
4. Please note that you might not be able to run the current implementation of the ONNX benchmark, since it requires some specific dependencies and Anaconda support. A fix will be reflected in upcoming versions.
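Given that GPU runs are Float16-only and CPU/Metal are skipped for memory reasons, a wrapper can reject unsupported devices before any model loads. A minimal sketch; this guard is not part of bench_onnxruntime, and the messages are invented:

```shell
# Hypothetical device guard reflecting the notes above; not from the repo.
check_device() {
  case "$1" in
    cuda)      echo "ok: cuda (float16)" ;;
    cpu|metal) echo "skip: too memory intensive for ONNX Runtime" ;;
    *)         echo "error: unknown device '$1'" ;;
  esac
}

check_device cuda
check_device metal
```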