Add support for quantization with bitsandbytes #490

mryab · 2022-06-27T16:27:57Z

This PR integrates blockwise quantization from bitsandbytes as a new compression mechanism of Hivemind. The important part is that it is an optional compression protocol: users should only have to install the external library if they are actually going to use it, hence the "conditional import" and "extra dependency" parts.
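For readers unfamiliar with the pattern, here is a minimal sketch of a conditional import for an optional dependency. The helper name `import_optional` and the error wording are hypothetical and are not taken from the Hivemind codebase; the sketch only illustrates deferring the import until the feature is actually requested.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import a module on demand and explain how to get it if it is missing.

    Hypothetical helper for illustration only: `extra` is the name of the
    optional dependency group that the error message points the user to.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this compression type but is not installed. "
            f"Install the optional dependency group '{extra}' to enable it."
        ) from e
```

With this shape, users who never select the optional compression type never trigger the import, so the base installation works without the external library.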

The code on the Hivemind side is pretty simple, but it'd be cool to have a way to include a CPU-only build of bitsandbytes as a dependency, so that we can both include it without checking for a CUDA version and test the integration in GHA. @TimDettmers has granted me access to the bitsandbytes repo, so I'm going to work on that first before marking this PR as ready to merge.

justheuristic · 2022-07-14T15:01:46Z

How's the PR going? Need any help?

mryab · 2022-07-14T15:03:42Z

Not sure if I need any help, since we're mostly waiting for the new bitsandbytes release

codecov · 2022-08-22T23:21:25Z

Codecov Report

Merging #490 (f311943) into master (6395e89) will decrease coverage by 0.04%.
The diff coverage is 89.74%.

@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
- Coverage   86.31%   86.27%   -0.05%     
==========================================
  Files          81       81              
  Lines        7887     7919      +32     
==========================================
+ Hits         6808     6832      +24     
- Misses       1079     1087       +8

| Impacted Files | Coverage Δ | |
|---|---|---|
| hivemind/compression/quantization.py | `94.59% <87.50%> (-2.88%)` | ⬇️ |
| hivemind/compression/__init__.py | `100.00% <100.00%> (ø)` | |
| hivemind/compression/serialization.py | `100.00% <100.00%> (ø)` | |
| hivemind/averaging/matchmaking.py | `88.35% <0.00%> (-0.90%)` | ⬇️ |
| hivemind/averaging/averager.py | `88.27% <0.00%> (-0.24%)` | ⬇️ |

borzunov · 2022-08-23T12:43:53Z

README.md

@@ -53,6 +53,11 @@ If your versions of Python and PyTorch match the requirements, you can install h
 pip install hivemind
 ```

+Also, if you want to use blockwise 8-bit compression from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 
+during data transfer, you can [build it from source](https://github.com/TimDettmers/bitsandbytes#compile-from-source) 


Why suggest building it from source?

Instead, we can just suggest running `pip install bitsandbytes==0.32.1` (if you have a GPU) or `pip install git+https://github.com/TimDettmers/bitsandbytes.git@4cd7ea6` (for CPU-only builds).

borzunov · 2022-08-23T12:46:42Z

tests/test_start_server.py

@@ -32,10 +31,15 @@ def test_cli_run_server_identity_path():
            encoding="utf-8",
        )

+        # Skip line "UserWarning: The installed version of bitsandbytes was compiled without GPU support. <...>"


Can we assert that the skipped line contains this?

Then we'll be safe if something changes, and the code will be self-descriptive (no need for the comment anymore).
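A minimal sketch of what such an assertion could look like, assuming the test reads the server's output line by line. The helper name `skip_expected_line` and the use of `io.StringIO` as a stand-in for the process output are assumptions for illustration; only the warning text comes from the comment in the diff above.

```python
import io

# Fragment of the warning quoted in the test comment under review
EXPECTED_WARNING = "The installed version of bitsandbytes was compiled without GPU support"


def skip_expected_line(stream, expected_fragment: str) -> str:
    """Consume one line and fail loudly unless it contains the expected text.

    Hypothetical helper: it replaces a silent `readline()` plus a comment
    with a check that documents itself.
    """
    line = stream.readline()
    assert expected_fragment in line, f"Unexpected line skipped: {line!r}"
    return line


# Stand-in for the server's output stream (an assumption for this sketch)
fake_output = io.StringIO(
    "UserWarning: The installed version of bitsandbytes was compiled without GPU support. <...>\n"
    "next line of output\n"
)
skip_expected_line(fake_output, EXPECTED_WARNING)
remaining_line = fake_output.readline().strip()
```

If the warning text changes or disappears, the assertion fails at the skipped line instead of letting a later check consume the wrong output.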

justheuristic · 2022-09-09T10:37:30Z

LGTM, please merge at will

dbaranchuk

* Add support for quantization with bitsandbytes
* Extend the compression benchmark
* Add a test for blockwise compression
* Add a note to README about bitsandbytes
* Install bitsandbytes in tests as well
* Verify outputs consistently in test_moe.py (to make the test less flaky)
* Pass device="cpu" in test_background_server_identity_path. This ensures that the server can actually launch in a GPU-enabled environment: otherwise initializing the CUDA context in a parent process prevents it
* Filter bitsandbytes warnings

(cherry picked from commit 131f82c)
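As a rough illustration of what blockwise 8-bit compression means and how a round-trip test for it can be phrased, the sketch below quantizes each block with its own absmax scale. This is a simplified linear scheme in pure Python, not the bitsandbytes algorithm or its API; the function names and the block size are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map each block to signed 8-bit codes using that block's own absmax scale."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets a dummy scale so that division stays defined
        scale = absmax / 127 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Invert quantize_blockwise up to rounding error."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]


# The first block holds small values, the second holds large ones
values = [0.1, -0.2, 0.05, 0.0, 100.0, -50.0, 25.0, 1.0]
codes, scales = quantize_blockwise(values)
restored = dequantize_blockwise(codes, scales)
```

Because every block carries its own scale, the large values in the second block do not degrade the precision of the small values in the first one; each restored value differs from the original by at most half of its block's scale.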

mryab requested a review from justheuristic June 27, 2022 16:27

mryab force-pushed the bnb_integration branch from b92444a to 623d957 Compare July 26, 2022 07:27

mryab force-pushed the bnb_integration branch from 665428b to ba15780 Compare August 22, 2022 21:54

mryab requested a review from dbaranchuk August 22, 2022 22:32

mryab changed the title ~~[WIP] Add support for quantization with bitsandbytes~~ Add support for quantization with bitsandbytes Aug 22, 2022

mryab marked this pull request as ready for review August 22, 2022 23:23

borzunov reviewed Aug 23, 2022

borzunov reviewed Aug 23, 2022

mryab force-pushed the bnb_integration branch from 01d41ca to 363874e Compare August 24, 2022 10:36

mryab and others added 17 commits September 9, 2022 07:00

Add support for quantization with bitsandbytes

fc657d5

Extend the compression benchmark

794b966

Fix formatting and imports

540aa08

Build a cpuonly version of bitsandbytes

d68a559

Build a cpuonly version of bitsandbytes

6a857f7

Build a cpuonly version of bitsandbytes

8bb5f3b

Revert changes

e730cff

Add a test for blockwise compression

6b5cf2a

Replace building bitsandbytes from source with pip installation

442b932

Add a note to README about bitsandbytes

0d63242

Revert to cpuonly build

0f7bb72

Revert to cpuonly build

98538a0

Revert to cpuonly build

ebc73ad

Revert to cpuonly build

44060b5

Update the docs

2b82251

Install bitsandbytes in tests as well

2aef651

Install bitsandbytes in tests as well

93c3ea5

mryab and others added 12 commits September 9, 2022 07:00

Skip bitsandbytes warnings about cpu-only versions

300a153

Skip bitsandbytes warnings about cpu-only versions

3a22e4d

Replace bitsandbytes with a newer pypi version

552e1d2

Update README.md

4bd9ca1

Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>

Use hivemind[bitsandbytes] for README

939e6f8

Use hivemind[bitsandbytes] for README

8a7d89e

Use hivemind[bitsandbytes] for README

9735432

Make bitsandbytes error parsing more explicit

f534fdc

Verify outputs consistently in test_moe.py

d5b9265

(to make the test less flaky)

Pass device="cpu" in test_background_server_identity_path

8120564

This ensures that the server can actually launch in a GPU-enabled environment: otherwise initializing the CUDA context in a parent process prevents it

Filter bitsandbytes warnings

48f4d0b

Bump the version of bitsandbytes

4c0fef6

mryab force-pushed the bnb_integration branch from c661d82 to 4c0fef6 Compare September 9, 2022 04:00

justheuristic approved these changes Sep 9, 2022

dbaranchuk approved these changes Sep 9, 2022

Reduce diff

f311943

mryab merged commit 131f82c into master Sep 10, 2022

mryab deleted the bnb_integration branch September 10, 2022 14:39

mryab mentioned this pull request Sep 10, 2022

[BUG] Tests for compression fail on GPU servers with bitsandbytes installed #507

Open
