Enable BNB multi-backend support #31098

Merged: 66 commits, Sep 24, 2024
Changes from 53 commits

Commits (66)
846f853
enable cpu bnb path
jiqing-feng May 29, 2024
6c56703
fix style
jiqing-feng May 29, 2024
3f02c9b
fix code style
jiqing-feng May 29, 2024
9ccbf10
fix 4 bit path
jiqing-feng May 29, 2024
89fa5ef
Update src/transformers/utils/import_utils.py
jiqing-feng Jul 17, 2024
a52d7af
add multi backend refactor tests
jiqing-feng Jul 17, 2024
6f67862
fix style
jiqing-feng Jul 17, 2024
ee23eb0
tweak 4bit quantizer + fix corresponding tests
Titus-von-Koeller Jul 30, 2024
678e673
tweak 8bit quantizer + *try* fixing corresponding tests
Titus-von-Koeller Jul 30, 2024
0858b3e
fix dequant bnb 8bit
jiqing-feng Aug 1, 2024
c76d243
account for Intel CPU in variability of expected outputs
Titus-von-Koeller Aug 1, 2024
5843f28
enable cpu and xpu device map
jiqing-feng Aug 7, 2024
1a864a8
further tweaks to account for Intel CPU
Titus-von-Koeller Aug 2, 2024
f3753fc
fix autocast to work with both cpu + cuda
Titus-von-Koeller Aug 13, 2024
0cc1b7e
fix comments
Titus-von-Koeller Aug 14, 2024
b611812
fix comments
Titus-von-Koeller Aug 14, 2024
ab4836e
switch to testing_utils.torch_device
Titus-von-Koeller Aug 14, 2024
7399500
allow for xpu in multi-gpu tests
Titus-von-Koeller Aug 18, 2024
b41059c
fix tests 4bit for CPU NF4
jiqing-feng Aug 20, 2024
1a7a6fe
fix bug with is_torch_xpu_available needing to be called as func
Titus-von-Koeller Aug 20, 2024
87983df
avoid issue where test reports attr err due to other failure
Titus-von-Koeller Aug 20, 2024
7f17188
fix formatting
Titus-von-Koeller Aug 21, 2024
bb3ba4a
fix typo from resolving of merge conflict
Titus-von-Koeller Aug 21, 2024
463c211
polish based on last PR review
Titus-von-Koeller Aug 22, 2024
6d89ee4
fix CI
jiqing-feng Aug 28, 2024
7e01cfb
Update src/transformers/integrations/integration_utils.py
jiqing-feng Aug 29, 2024
9bffc93
Update src/transformers/integrations/integration_utils.py
jiqing-feng Aug 29, 2024
01b7587
fix error log
jiqing-feng Aug 29, 2024
171b130
fix error msg
jiqing-feng Aug 29, 2024
5e9bf9a
add \n in error log
jiqing-feng Aug 29, 2024
496c046
make quality
jiqing-feng Aug 29, 2024
86d0016
rm bnb cuda restriction in doc
jiqing-feng Aug 30, 2024
1c96ae9
cpu model don't need dispatch
jiqing-feng Sep 3, 2024
495354e
Merge branch 'main' into bnb_cpu
jiqing-feng Sep 3, 2024
3aec626
fix doc
jiqing-feng Sep 3, 2024
daa1e27
fix style
jiqing-feng Sep 3, 2024
d55db0e
check cuda available in testing
jiqing-feng Sep 5, 2024
a21a916
fix tests
jiqing-feng Sep 5, 2024
8ad17e8
Update docs/source/en/model_doc/chameleon.md
jiqing-feng Sep 11, 2024
107e02b
Update docs/source/en/model_doc/llava_next.md
jiqing-feng Sep 11, 2024
20f6b5e
Update tests/quantization/bnb/test_4bit.py
jiqing-feng Sep 11, 2024
9ac038e
Update tests/quantization/bnb/test_4bit.py
jiqing-feng Sep 11, 2024
3bab7d7
fix doc
jiqing-feng Sep 11, 2024
968d9c5
Merge branch 'huggingface:main' into bnb_cpu
jiqing-feng Sep 11, 2024
08f31f8
fix check multibackends
jiqing-feng Sep 11, 2024
9eb0970
fix import sort
jiqing-feng Sep 11, 2024
b506b98
remove check torch in bnb
jiqing-feng Sep 11, 2024
2be4169
docs: update bitsandbytes references with multi-backend info
Titus-von-Koeller Sep 11, 2024
e607b7c
docs: fix small mistakes in bnb paragraph
Titus-von-Koeller Sep 11, 2024
ac108c6
run formatting
Titus-von-Koeller Sep 11, 2024
82dcb0d
Merge remote-tracking branch 'origin/main' into bnb_cpu
Titus-von-Koeller Sep 11, 2024
c66e7e7
revert bnb check
jiqing-feng Sep 12, 2024
8f25ee2
move bnb multi-backend check to import_utils
jiqing-feng Sep 13, 2024
a4333cb
Update src/transformers/utils/import_utils.py
jiqing-feng Sep 14, 2024
32cbb8d
fix bnb check
jiqing-feng Sep 14, 2024
4ce4b55
minor fix for bnb
jiqing-feng Sep 14, 2024
937ed3b
check lib first
jiqing-feng Sep 14, 2024
e40f284
fix code style
jiqing-feng Sep 14, 2024
03dd03b
Merge branch 'huggingface:main' into bnb_cpu
jiqing-feng Sep 14, 2024
b8093ce
Revert "run formatting"
jiqing-feng Sep 14, 2024
0551d23
fix format
jiqing-feng Sep 14, 2024
e33e43b
give warning when bnb version is low and no cuda found
jiqing-feng Sep 18, 2024
ced3c28
Merge branch 'huggingface:main' into bnb_cpu
jiqing-feng Sep 18, 2024
170dd58
fix device assignment check to be multi-device capable
Titus-von-Koeller Sep 22, 2024
9ba4a5e
address akx feedback on get_avlbl_dev fn
Titus-von-Koeller Sep 23, 2024
594f6f8
we don't want the function that public, as docs would be too much
Titus-von-Koeller Sep 24, 2024
184 changes: 118 additions & 66 deletions .circleci/create_circleci_config.py

Large diffs are not rendered by default.

28 changes: 16 additions & 12 deletions .circleci/parse_test_outputs.py
@@ -1,53 +1,57 @@
-import re
 import argparse
+import re


 def parse_pytest_output(file_path):
     skipped_tests = {}
     skipped_count = 0
-    with open(file_path, 'r') as file:
+    with open(file_path, "r") as file:
         for line in file:
-            match = re.match(r'^SKIPPED \[(\d+)\] (tests/.*): (.*)$', line)
+            match = re.match(r"^SKIPPED \[(\d+)\] (tests/.*): (.*)$", line)
             if match:
                 skipped_count += 1
                 test_file, test_line, reason = match.groups()
                 skipped_tests[reason] = skipped_tests.get(reason, []) + [(test_file, test_line)]
-    for k,v in sorted(skipped_tests.items(), key=lambda x:len(x[1])):
+    for k, v in sorted(skipped_tests.items(), key=lambda x: len(x[1])):
         print(f"{len(v):4} skipped because: {k}")
     print("Number of skipped tests:", skipped_count)


 def parse_pytest_failure_output(file_path):
     failed_tests = {}
     failed_count = 0
-    with open(file_path, 'r') as file:
+    with open(file_path, "r") as file:
         for line in file:
-            match = re.match(r'^FAILED (tests/.*) - (.*): (.*)$', line)
+            match = re.match(r"^FAILED (tests/.*) - (.*): (.*)$", line)
             if match:
                 failed_count += 1
                 _, error, reason = match.groups()
                 failed_tests[reason] = failed_tests.get(reason, []) + [error]
-    for k,v in sorted(failed_tests.items(), key=lambda x:len(x[1])):
+    for k, v in sorted(failed_tests.items(), key=lambda x: len(x[1])):
         print(f"{len(v):4} failed because `{v[0]}` -> {k}")
     print("Number of failed tests:", failed_count)
-    if failed_count>0:
+    if failed_count > 0:
         exit(1)


 def parse_pytest_errors_output(file_path):
     print(file_path)
     error_tests = {}
     error_count = 0
-    with open(file_path, 'r') as file:
+    with open(file_path, "r") as file:
         for line in file:
-            match = re.match(r'^ERROR (tests/.*) - (.*): (.*)$', line)
+            match = re.match(r"^ERROR (tests/.*) - (.*): (.*)$", line)
             if match:
                 error_count += 1
                 _, test_error, reason = match.groups()
                 error_tests[reason] = error_tests.get(reason, []) + [test_error]
-    for k,v in sorted(error_tests.items(), key=lambda x:len(x[1])):
+    for k, v in sorted(error_tests.items(), key=lambda x: len(x[1])):
         print(f"{len(v):4} errored out because of `{v[0]}` -> {k}")
     print("Number of errors:", error_count)
-    if error_count>0:
+    if error_count > 0:
         exit(1)


 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument("--file", help="file to parse")
Expand Down
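Not part of the diff: a small, hedged usage sketch of the helpers above, exercised on a synthetic pytest log (the module import path, file name, and log line are invented for illustration):

```python
# Illustrative only; assumes the script above is importable as a module.
from parse_test_outputs import parse_pytest_output

# Write one fake pytest report line and summarize it.
sample = "SKIPPED [1] tests/quantization/bnb/test_4bit.py:42: requires a CUDA or multi-backend bitsandbytes setup\n"
with open("pytest_output.txt", "w") as f:
    f.write(sample)

parse_pytest_output("pytest_output.txt")
# Prints roughly:
#    1 skipped because: requires a CUDA or multi-backend bitsandbytes setup
# Number of skipped tests: 1
```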
2 changes: 0 additions & 2 deletions benchmark/benchmark.py
@@ -31,9 +31,7 @@
from pathlib import Path

from git import Repo

from huggingface_hub import HfApi

from optimum_benchmark import Benchmark
from optimum_benchmark_wrapper import main

Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/llm_tutorial_optimization.md
@@ -181,7 +181,7 @@ for every matrix multiplication. Dequantization and re-quantization is performed

Therefore, inference time is often **not** reduced when using quantized weights, but rather increases.
Enough theory, let's give it a try! To quantize the weights with Transformers, you need to make sure that
the [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) library is installed.
the [`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes) library is installed.

```bash
!pip install bitsandbytes
Expand Down
12 changes: 11 additions & 1 deletion docs/source/en/model_doc/chameleon.md
@@ -128,7 +128,17 @@ processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokeniza

### Quantization using Bitsandbytes

The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes` and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with:
The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes` and to have access to a GPU/accelerator that is supported by the library.

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Simply change the snippet above with:

```python
from transformers import ChameleonForConditionalGeneration, BitsAndBytesConfig
Expand Down
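# --- Illustrative sketch, not taken from this PR's diff: a typical continuation of the
# --- truncated snippet above. The checkpoint name, dtype, and device_map are assumptions.
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b",                  # assumed checkpoint
    quantization_config=quantization_config,
    device_map="auto",                        # dispatch to whichever supported accelerator (or CPU) is available
)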
12 changes: 11 additions & 1 deletion docs/source/en/model_doc/llava_next.md
@@ -233,7 +233,17 @@ processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokeniza

### Quantization using Bitsandbytes

The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes` and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with:
The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes`, and to have access to a GPU/accelerator that is supported by the library.

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Simply change the snippet above with:

```python
from transformers import LlavaNextForConditionalGeneration, BitsAndBytesConfig
Expand Down
12 changes: 11 additions & 1 deletion docs/source/en/model_doc/llava_next_video.md
@@ -205,7 +205,17 @@ processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokeniza

The model can be loaded in lower bits, significantly reducing memory burden while maintaining the performance of the original model. This allows for efficient deployment on resource-constrained cases.

First make sure to install bitsandbytes by running `pip install bitsandbytes` and to have access to a CUDA compatible GPU device. Load the quantized model by simply adding [`BitsAndBytesConfig`](../main_classes/quantization#transformers.BitsAndBytesConfig) as shown below:
First, make sure to install bitsandbytes by running `pip install bitsandbytes` and to have access to a GPU/accelerator that is supported by the library.

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Then simply load the quantized model by adding [`BitsAndBytesConfig`](../main_classes/quantization#transformers.BitsAndBytesConfig) as shown below:


```python
Expand Down
14 changes: 12 additions & 2 deletions docs/source/en/model_doc/llava_onevision.md
@@ -264,9 +264,19 @@ processor.batch_decode(out, skip_special_tokens=True, clean_up_tokenization_spac

## Model optimization

### Quantization using Bitsandbytes
### Quantization using bitsandbytes

The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes` and make sure to have access to a CUDA compatible GPU device. Simply change the snippet above with:
The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes` and make sure to have access to a GPU/accelerator that is supported by the library.

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Simply change the snippet above with:

```python
from transformers import LlavaOnevisionForConditionalGeneration, BitsAndBytesConfig
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/model_doc/mixtral.md
@@ -141,7 +141,7 @@ The Flash Attention-2 model uses also a more memory efficient cache slicing mech

As the Mixtral model has 45 billion parameters, that would require about 90GB of GPU RAM in half precision (float16), since each parameter is stored in 2 bytes. However, one can shrink down the size of the model using [quantization](../quantization.md). If the model is quantized to 4 bits (or half a byte per parameter), a single A100 with 40GB of RAM is enough to fit the entire model, as in that case only about 27 GB of RAM is required.

Quantizing a model is as simple as passing a `quantization_config` to the model. Below, we'll leverage the BitsAndyBytes quantization (but refer to [this page](../quantization.md) for other quantization methods):
Quantizing a model is as simple as passing a `quantization_config` to the model. Below, we'll leverage the bitsandbytes quantization library (but refer to [this page](../quantization.md) for alternative quantization methods):

```python
>>> import torch
Expand Down
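# --- Illustrative sketch, not taken from this PR's diff: a back-of-the-envelope check of
# --- the memory figures quoted above (weights only; the ~27 GB figure in the text also
# --- accounts for runtime overhead).
>>> n_params = 45e9                                           # ~45B parameters
>>> print(f"fp16 weights: ~{n_params * 2 / 1e9:.1f} GB")      # 2 bytes per parameter
fp16 weights: ~90.0 GB
>>> print(f"4-bit weights: ~{n_params * 0.5 / 1e9:.1f} GB")   # ~0.5 byte per parameter
4-bit weights: ~22.5 GB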
12 changes: 11 additions & 1 deletion docs/source/en/model_doc/video_llava.md
@@ -139,7 +139,17 @@ processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokeniza

The model can be loaded in lower bits, significantly reducing memory burden while maintaining the performance of the original model. This allows for efficient deployment on resource-constrained cases.

First make sure to install bitsandbytes by running `pip install bitsandbytes` and to have access to a CUDA compatible GPU device. Load the quantized model by simply adding [`BitsAndBytesConfig`](../main_classes/quantization#transformers.BitsAndBytesConfig) as shown below:
First make sure to install bitsandbytes by running `pip install bitsandbytes` and to have access to a GPU/accelerator that is supported by the library.

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Load the quantized model by simply adding [`BitsAndBytesConfig`](../main_classes/quantization#transformers.BitsAndBytesConfig) as shown below:


```python
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/model_memory_anatomy.md
@@ -233,7 +233,7 @@ Let's look at the details.
**Optimizer States:**

- 8 bytes * number of parameters for normal AdamW (maintains 2 states)
- 2 bytes * number of parameters for 8-bit AdamW optimizers like [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
- 2 bytes * number of parameters for 8-bit AdamW optimizers like [bitsandbytes](https://github.com/bitsandbytes-foundation/bitsandbytes)
- 4 bytes * number of parameters for optimizers like SGD with momentum (maintains only 1 state)
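Not part of the diff, but as a quick illustration of what these per-parameter byte counts mean in practice, here is a small sketch for a hypothetical 7-billion-parameter model:

```python
# Illustrative only: approximate optimizer-state memory for a 7B-parameter model.
n_params = 7e9

for name, bytes_per_param in [
    ("AdamW (two fp32 states)", 8),
    ("8-bit AdamW (bitsandbytes)", 2),
    ("SGD with momentum (one state)", 4),
]:
    print(f"{name}: ~{n_params * bytes_per_param / 1e9:.0f} GB")
```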

**Gradients**
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/perf_train_gpu_one.md
@@ -284,7 +284,7 @@ training_args = TrainingArguments(per_device_train_batch_size=4, optim="adamw_bn

However, we can also use a third-party implementation of the 8-bit optimizer for demonstration purposes to see how that can be integrated.

First, follow the installation guide in the GitHub [repo](https://github.com/TimDettmers/bitsandbytes) to install the `bitsandbytes` library
First, follow the installation guide in the GitHub [repo](https://github.com/bitsandbytes-foundation/bitsandbytes) to install the `bitsandbytes` library
that implements the 8-bit Adam optimizer.

Next you need to initialize the optimizer. This involves two steps:
Expand Down
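The two steps themselves are truncated above; as a rough, hedged sketch of what initializing bitsandbytes' 8-bit Adam typically looks like (the stand-in model, hyperparameters, and absence of parameter grouping are assumptions, not the guide's exact snippet):

```python
import bitsandbytes as bnb
import torch

# Stand-in model for illustration; in the guide this is the Transformers model being trained.
model = torch.nn.Linear(768, 768)

# Step 1 (sketch): collect the parameters to optimize. The real guide groups them,
# e.g. to exclude biases/LayerNorm weights from weight decay; that grouping is omitted here.
params = [p for p in model.parameters() if p.requires_grad]

# Step 2 (sketch): instantiate the 8-bit Adam optimizer with ordinary Adam hyperparameters.
adam_bnb_optim = bnb.optim.Adam8bit(params, lr=2e-5, betas=(0.9, 0.999), eps=1e-8)
```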
8 changes: 8 additions & 0 deletions docs/source/en/quantization/bitsandbytes.md
@@ -38,6 +38,14 @@ pip install --upgrade accelerate transformers
</hfoption>
</hfoptions>

<Tip>

bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

Now you can quantize a model by passing a `BitsAndBytesConfig` to [`~PreTrainedModel.from_pretrained`] method. This works for any model in any modality, as long as it supports loading with Accelerate and contains `torch.nn.Linear` layers.

<hfoptions id="bnb">
Expand Down
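As a minimal illustration of the paragraph above (the checkpoint name is an assumption; any model that loads with Accelerate and contains `torch.nn.Linear` layers works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                      # assumed example checkpoint
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```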
16 changes: 15 additions & 1 deletion docs/source/en/quantization/overview.md
@@ -49,11 +49,25 @@ Use the table below to help you decide which quantization method to use.
|-------------------------------------|-------------------------|-----|----------|----------------|-----------------------|-------------------------|----------------|-------------------------------------|--------------|------------------------|---------------------------------------------|
| [AQLM](./aqlm) | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 / 2 | 🟢 | 🟢 | 🟢 | https://github.com/Vahe1994/AQLM |
| [AWQ](./awq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | ? | 4 | 🟢 | 🟢 | 🟢 | https://github.com/casper-hansen/AutoAWQ |
| [bitsandbytes](./bitsandbytes) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/TimDettmers/bitsandbytes |
| [bitsandbytes](./bitsandbytes) | 🟢 | 🟡 * | 🟢 | 🟡 * | 🔴 ** | 🔴 (soon!) | 4 / 8 | 🟢 | 🟢 | 🟢 | https://github.com/bitsandbytes-foundation/bitsandbytes |
| [EETQ](./eetq) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | ? | 8 | 🟢 | 🟢 | 🟢 | https://github.com/NetEase-FuXi/EETQ |
| GGUF / GGML (llama.cpp) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 1 - 8 | 🔴 | [See GGUF section](../gguf) | [See GGUF section](../gguf) | https://github.com/ggerganov/llama.cpp |
| [GPTQ](./gptq) | 🔴 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴 | 2 - 3 - 4 - 8 | 🟢 | 🟢 | 🟢 | https://github.com/AutoGPTQ/AutoGPTQ |
| [HQQ](./hqq) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 - 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
| [Quanto](./quanto) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/quanto |
| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 8 | 🔴 | 🟢 | 🟢 | https://github.com/pytorch/FBGEMM |
| [torchao](./torchao.md) | 🟢 | | 🟢 | 🔴 | partial support (int4 weight only) | | 4 / 8 | | 🟢🔴 | 🟢 | https://github.com/pytorch/ao |

<Tip>

\* bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).

We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.

</Tip>

<Tip>

\** bitsandbytes is seeking contributors to help develop and lead the Apple Silicon backend. Interested? Contact them directly via their repo. Stipends may be available through sponsorships.

</Tip>
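Not part of the diff, but a small hedged sketch of how one might check which accelerator multi-backend bitsandbytes could target before quantizing (the `torch.xpu` probe is guarded because that attribute only exists in newer PyTorch builds):

```python
import torch


def available_accelerator() -> str:
    """Best-effort guess at the device a quantized model would land on (illustrative only)."""
    if torch.cuda.is_available():  # covers both CUDA and ROCm builds of PyTorch
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():  # Intel XPU, newer PyTorch only
        return "xpu"
    return "cpu"  # falls back to the Intel CPU backend


print(f"Quantized modules would be placed on: {available_accelerator()}")
```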
7 changes: 2 additions & 5 deletions scripts/benchmark/trainer-benchmark.py
@@ -181,23 +181,21 @@ def get_original_command(max_width=80, full_python_path=False):


 def get_base_command(args, output_dir):
-
     # unwrap multi-line input
     args.base_cmd = re.sub(r"[\\\n]+", " ", args.base_cmd)

     # remove --output_dir if any and set our own
-    args.base_cmd = re.sub("--output_dir\s+[^\s]+", "", args.base_cmd)
+    args.base_cmd = re.sub(r"--output_dir\s+[^\s]+", "", args.base_cmd)
     args.base_cmd += f" --output_dir {output_dir}"

     # ensure we have --overwrite_output_dir
-    args.base_cmd = re.sub("--overwrite_output_dir\s+", "", args.base_cmd)
+    args.base_cmd = re.sub(r"--overwrite_output_dir\s+", "", args.base_cmd)
     args.base_cmd += " --overwrite_output_dir"

     return [sys.executable] + shlex.split(args.base_cmd)


 def process_run_single(id, cmd, variation, output_dir, target_metric_key, metric_keys, verbose):
-
     # Enable to debug everything but the run itself, to do it fast and see the progress.
     # This is useful for debugging the output formatting quickly - we can remove it later once
     # everybody is happy with the output
@@ -296,7 +294,6 @@ def get_versions():


 def process_results(results, target_metric_key, report_metric_keys, base_variation, output_dir):
-
     df = pd.DataFrame(results)
     variation_key = "variation"
     diff_key = "diff_%"
Expand Down