Enable rocm build for CI #38

Merged: Jun 21, 2024 (31 commits)

Changes from all commits:
7772fa3  doc-builder image had been changed, need to revert to old one due to … (Titus-von-Koeller, May 6, 2024)
b659c70  Update CONTRIBUTING.md (Titus-von-Koeller, May 7, 2024)
b97ea77  Update README.md (Titus-von-Koeller, May 7, 2024)
b891f80  Update README.md (Titus-von-Koeller, May 7, 2024)
09cc153  Support NF4 on CPU backend (Xia-Weiwen, May 8, 2024)
177bd39  Minor improvements (Xia-Weiwen, May 10, 2024)
881b5fc  Add fp4 support; add UT; fix lint issues (Xia-Weiwen, May 11, 2024)
dd15734  Reduce memory usage (Xia-Weiwen, May 11, 2024)
85a01b0  Fix UT (Xia-Weiwen, May 11, 2024)
2c489f8  reduce memory usage for nf4 (Xia-Weiwen, May 11, 2024)
13c70d3  clarify (stevhliu, Apr 29, 2024)
2b7daed  clarify (stevhliu, May 14, 2024)
d7a5a24  feedback (stevhliu, May 16, 2024)
25abf8d  Merge pull request #1211 from stevhliu/fix (Titus-von-Koeller, May 19, 2024)
c51437b  Update matplotlib requirement from ~=3.8.4 to ~=3.9.0 in the major group (dependabot[bot], May 20, 2024)
fa65a9d  Bump pytest from 8.2.0 to 8.2.1 in the minor-patch group (dependabot[bot], May 20, 2024)
be6700b  Merge pull request #1215 from TimDettmers/dependabot/pip/major-2d933c… (Titus-von-Koeller, May 23, 2024)
328b5a9  Merge pull request #1216 from TimDettmers/dependabot/pip/minor-patch-… (Titus-von-Koeller, May 23, 2024)
701c5aa  Merge pull request #1206 from Xia-Weiwen/multi-backend-refactor-cpu-4bit (Titus-von-Koeller, May 24, 2024)
eb3b816  Merge pull request #1207 from ROCm/device_abstraction (Titus-von-Koeller, May 24, 2024)
79815ad  README: ask for help from volunteer alpha testers (Titus-von-Koeller, May 24, 2024)
ccee5d8  Add empty stubs for Ascend NPU (ji-huazhong, May 27, 2024)
a8644b7  Bump scipy from 1.13.0 to 1.13.1 in the minor-patch group (dependabot[bot], May 27, 2024)
09c314a  Merge pull request #1223 from statelesshz/backend-npu (Titus-von-Koeller, May 28, 2024)
c08653b  Merge pull request #1224 from TimDettmers/dependabot/pip/minor-patch-… (Titus-von-Koeller, May 28, 2024)
2dbf876  Merge branch 'main' into multi-backend-refactor (Titus-von-Koeller, May 28, 2024)
36fe1a0  fix blocksize (jiqing-feng, May 29, 2024)
dba8376  Merge pull request #1228 from jiqing-feng/4bit (Titus-von-Koeller, May 30, 2024)
517eaf2  CPU: add torch.compile for F.double_quant and F.quantize_4bit (#1238) (Xia-Weiwen, Jun 6, 2024)
5891465  Add build job for rocm (pnunna93, Jun 19, 2024)
d03a680  Add rocm build script (pnunna93, Jun 19, 2024)
19 changes: 19 additions & 0 deletions .github/scripts/build-rocm.sh
@@ -0,0 +1,19 @@
#!/bin/bash
declare build_arch
declare build_os

set -xeuo pipefail
if [ "${build_os:0:6}" == ubuntu ]; then
image=rocm/dev-ubuntu-22.04:6.1-complete
echo "Using image $image"
docker run --rm --platform "linux/$build_arch" -i \
-w /src -v "$PWD:/src" "$image" sh -c \
"apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends cmake \
&& cmake -DCOMPUTE_BACKEND=hip . \
&& cmake --build ."
fi

#output_dir="output/${build_os}/${build_arch}"
#mkdir -p "${output_dir}"
#(shopt -s nullglob && cp bitsandbytes/*.{so,dylib,dll} "${output_dir}")
1 change: 1 addition & 0 deletions .github/workflows/build_documentation.yml
@@ -14,5 +14,6 @@ jobs:
commit_sha: ${{ github.sha }}
package: bitsandbytes
repo_owner: TimDettmers
custom_container: huggingface/transformers-doc-builder
secrets:
hf_token: ${{ secrets.HUGGINGFACE_PUSH }}
1 change: 1 addition & 0 deletions .github/workflows/build_pr_documentation.yml
@@ -16,3 +16,4 @@ jobs:
pr_number: ${{ github.event.number }}
package: bitsandbytes
repo_owner: TimDettmers
custom_container: huggingface/transformers-doc-builder
22 changes: 22 additions & 0 deletions .github/workflows/python-package.yml
@@ -103,6 +103,28 @@ jobs:
name: shared_library_cuda_${{ matrix.os }}_${{ matrix.arch }}_${{ matrix.cuda_version }}
path: output/*
retention-days: 7
build-shared-libs-rocm:
strategy:
matrix:
os: [ubuntu-latest]
arch: [x86_64]
runs-on: ${{ matrix.os }} # One day, we could run them on native agents. Azure supports this now but it's planned only for Q3 2023 for hosted agents
steps:
- uses: actions/checkout@v4
- name: Set up Docker multiarch
if: startsWith(matrix.os, 'ubuntu')
uses: docker/setup-qemu-action@v2
- name: Clean up disk space
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Build C++
run: bash .github/scripts/build-rocm.sh
env:
build_os: ${{ matrix.os }}
build_arch: ${{ matrix.arch }}
build-wheels:
needs:
- build-shared-libs
…
13 changes: 1 addition & 12 deletions CONTRIBUTING.md
@@ -9,23 +9,12 @@ We actively welcome your pull requests.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>
5. Make sure your code lints; install the [pre-commit hooks as documented here](https://huggingface.co/docs/bitsandbytes/main/en/contributing#setup-pre-commit-hooks).

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## License
By contributing to bitsandbytes, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
12 changes: 11 additions & 1 deletion README.md
@@ -12,8 +12,18 @@ There are ongoing efforts to support further hardware backends, i.e. Intel CPU +

**[https://huggingface.co/docs/bitsandbytes/main](https://huggingface.co/docs/bitsandbytes/main)**

## ALPHA TESTERS WANTED: `multi-backend-refactor` AMD GPU + Intel CPU/GPU specific BNB backend implementations

We're in the process of a complex refactor in order to allow the support of additional hardware backends, other than CUDA, in BNB. The efforts around this are already quite far along, and plenty of functionality is already in place that needs users to take a hands-on approach and test it! Mac support will likely soon also see progress. However, I recommend waiting 2 weeks until the device abstraction has further consolidated (**breaking changes upcoming**).

Currently, you still need to compile from source, after checking out the `multi-backend-refactor` branch (instructions WIP, but [the current docs on compilation from source](https://huggingface.co/docs/bitsandbytes/main/en/installation#compile-from-source) are a good starting point; [feel free to share tips / input in this GitHub discussion](https://github.com/TimDettmers/bitsandbytes/discussions/1219)). We'll soon enable nightly releases to make this much easier for you!

Please give us feedback in [this dedicated GitHub Discussion space](https://github.com/TimDettmers/bitsandbytes/discussions/categories/catch-all-alpha-testing-the-multi-backend-refactor)!

We're super excited about these recent developments and grateful for any constructive input or support that you can give to help us make this a reality. BNB is a community project and we're excited for your collaboration 🤗

## License

The majority of bitsandbytes is licensed under MIT, however small portions of the project are available under separate license terms, as the parts adapted from Pytorch are licensed under the BSD license.
`bitsandbytes` is MIT licensed.

We thank Fabio Cannizzo for his work on [FastBinarySearch](https://github.com/fabiocannizzo/FastBinarySearch) which we use for CPU quantization.
6 changes: 5 additions & 1 deletion bitsandbytes/__init__.py
@@ -16,6 +16,7 @@
)
from .backends import register_backend
from .backends.cpu import CPUBackend
from .backends.npu import NPUBackend
from .cextension import lib
from .nn import modules

@@ -49,11 +50,14 @@

register_backend("xpu", XPUBackend())

# Register Ascend NPU backend, if available.
if hasattr(torch, "npu") and torch.npu.is_available():
register_backend("npu", NPUBackend())

# TODO: Other potential backends:
# XLA - Google TPU / PJRT runtime
# HPU - Habana / Intel Gaudi
# IPU - Graphcore
# NPU - Ascend
# Note that we may not map 1:1 with a device type, e.g. SYCL, XLA
# In this case, it will be up to each backend to dispatch as needed

…
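The NPU registration above follows the same guard pattern as the other backends: probe for the device module on `torch`, then register only when hardware is actually available. A minimal sketch of that pattern, using a hypothetical `MyBackend` and device name (both illustrative assumptions, not part of this PR):

```python
# Minimal sketch of the conditional-registration pattern used above.
# `MyBackend` and "mydevice" are hypothetical placeholders.
import torch

from bitsandbytes.backends import register_backend


class MyBackend:
    """Stand-in for a Backend subclass (see bitsandbytes.backends.base)."""


# Register only when the device module exists and reports usable hardware,
# mirroring the `torch.npu` check in the diff above.
if hasattr(torch, "mydevice") and torch.mydevice.is_available():
    register_backend("mydevice", MyBackend())
```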
3 changes: 2 additions & 1 deletion bitsandbytes/autograd/_functions.py
@@ -575,7 +575,8 @@ def matmul_4bit(
bias=None,
):
assert quant_state is not None
if A.numel() == A.shape[-1] and A.requires_grad == False:
if (A.numel() == A.shape[-1] or A.device.type == "cpu") and A.requires_grad == False:
# CPU backend does not require A to be a vector
if A.shape[-1] % quant_state.blocksize != 0:
warn(
f"Some matrices hidden dimension is not a multiple of {quant_state.blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}",
…
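The relaxed condition routes any CPU input, not just single rows, through the 4-bit inference path. A rough sketch of what that enables (untested; it assumes the CPU 4-bit kernels merged from #1206 and the usual `functional` API on this branch):

```python
# Hedged sketch: batched 4-bit matmul on a CPU tensor. On CUDA, the fast
# path above is taken only when A is a single row; on CPU, any shape goes
# through. Shapes and dtypes here are illustrative assumptions.
import torch

import bitsandbytes.functional as F
from bitsandbytes.autograd._functions import matmul_4bit

A = torch.randn(8, 64)        # batch of rows; requires_grad is False
W = torch.randn(128, 64)      # weight to quantize
qW, state = F.quantize_4bit(W, blocksize=64, quant_type="nf4")

out = matmul_4bit(A, qW.t(), quant_state=state)  # expected shape (8, 128)
```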
20 changes: 17 additions & 3 deletions bitsandbytes/backends/cpu.py
@@ -6,9 +6,12 @@

from .base import Backend
from .cpu_xpu_common import (
dequantize_4bit_impl,
double_quant_impl,
gemm_4bit_impl,
igemmlt_impl,
mm_dequant_impl,
quantize_4bit_impl,
)

Tensor = torch.Tensor
@@ -132,7 +135,11 @@ def quantize_4bit(
quant_type: Literal["fp4", "nf4"] = "fp4",
quant_storage=torch.uint8,
) -> Tuple[torch.Tensor, QuantState]:
raise NotImplementedError("Not yet implemented for CPU backend")
if blocksize is None:
blocksize = 64
assert_on_cpu([A, absmax, out])
assert quant_storage == torch.uint8, "CPU backend only supports uint8 quant_storage"
return quantize_4bit_impl(A, absmax, out, blocksize, compress_statistics, quant_type)

def dequantize_4bit(
self,
@@ -143,7 +150,10 @@ def dequantize_4bit(
blocksize: int = 64,
quant_type: Literal["fp4", "nf4"] = "fp4",
) -> torch.Tensor:
raise NotImplementedError("Not yet implemented for CPU backend")
if blocksize is None:
blocksize = 64
assert_on_cpu([A, absmax, out])
return dequantize_4bit_impl(A, quant_state, absmax, out, blocksize, quant_type)

def gemv_4bit(
self,
@@ -154,7 +164,11 @@ def gemv_4bit(
transposed_B=False,
state: QuantState = None,
) -> torch.Tensor:
raise NotImplementedError("Not yet implemented for CPU backend")
assert_on_cpu([A, B, out])
if state is None:
raise ValueError("state cannot be None. gemv_4bit() requires the state from quantize_4bit()")

return gemm_4bit_impl(A, B, out, transposed_A, transposed_B, state)

def dequantize_blockwise(
self,
…
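As a sanity check for the newly implemented entry points above, here is a hedged round-trip sketch through the functional API (assuming `F.quantize_4bit`/`F.dequantize_4bit` dispatch to this CPU backend on the `multi-backend-refactor` branch; not an example taken from the PR):

```python
# Hedged sketch: NF4 quantize/dequantize round trip on a CPU tensor.
# The printed mean absolute error is expected quantization noise.
import torch

import bitsandbytes.functional as F

W = torch.randn(256, 256)     # plain CPU tensor; blocksize divides numel
qW, state = F.quantize_4bit(W, blocksize=64, quant_type="nf4")
W_hat = F.dequantize_4bit(qW, quant_state=state, blocksize=64, quant_type="nf4")

print((W - W_hat).abs().mean())  # small but nonzero reconstruction error
```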