
Add support for quantization with bitsandbytes #490

Merged: 30 commits, Sep 10, 2022

Conversation


@mryab mryab commented Jun 27, 2022

This PR integrates blockwise quantization from bitsandbytes as a new compression mechanism in Hivemind. The important part is that this is an optional compression protocol: the user only needs to install the external library if they are actually going to use it, hence the "conditional import" and "extra dependency" parts.
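A minimal sketch of the "conditional import" pattern described above (the function and variable names here are illustrative, not Hivemind's actual API): the optional dependency is resolved lazily, and a missing install only fails when the feature is used.

```python
import importlib


def optional_import(name):
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Users who never enable blockwise quantization do not need
# bitsandbytes installed at all.
bnb = optional_import("bitsandbytes")


def compress_blockwise(tensor):
    # Fail with an actionable message only when the feature is actually used.
    if bnb is None:
        raise ImportError("Please install bitsandbytes to use blockwise 8-bit compression")
    return bnb.functional.quantize_blockwise(tensor)
```

Deferring the `ImportError` to the call site keeps `import hivemind` working on machines without the extra library.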

The code on the Hivemind side is pretty simple, but it'd be cool to have a way to include a CPU-only build of bitsandbytes as a dependency, so that we'll be able both to include it without checking for a CUDA version and to test the integration in GHA. @TimDettmers has granted me access to the bitsandbytes repo, so I'm going to work on that first before marking this PR as ready to merge.
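The "extra dependency" idea could be expressed as a packaging extra (a hypothetical sketch; the extra's name and the version pin are illustrative, not necessarily what Hivemind ships):

```python
# Hypothetical setup.py fragment: bitsandbytes becomes an optional extra,
# installed via `pip install hivemind[bitsandbytes]` rather than being a
# hard requirement of the base package.
EXTRAS_REQUIRE = {
    "bitsandbytes": ["bitsandbytes==0.32.1"],
}

# In setup.py this dict would be passed as:
#   setup(..., extras_require=EXTRAS_REQUIRE)
```

This way CI can install the extra explicitly while regular users get a dependency-free default install.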

@mryab mryab requested a review from justheuristic June 27, 2022 16:27
@justheuristic (Member)

How's the PR going? Need any help?


mryab commented Jul 14, 2022

Not sure if I need any help, since we're mostly waiting for the new bitsandbytes release

@mryab mryab force-pushed the bnb_integration branch from b92444a to 623d957 on July 26, 2022 07:27
@mryab mryab requested a review from dbaranchuk August 22, 2022 22:32

codecov bot commented Aug 22, 2022

Codecov Report

Merging #490 (f311943) into master (6395e89) will decrease coverage by 0.04%.
The diff coverage is 89.74%.

@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
- Coverage   86.31%   86.27%   -0.05%     
==========================================
  Files          81       81              
  Lines        7887     7919      +32     
==========================================
+ Hits         6808     6832      +24     
- Misses       1079     1087       +8     
| Impacted Files | Coverage | Δ |
| --- | --- | --- |
| hivemind/compression/quantization.py | 94.59% <87.50%> | (-2.88%) ⬇️ |
| hivemind/compression/__init__.py | 100.00% <100.00%> | (ø) |
| hivemind/compression/serialization.py | 100.00% <100.00%> | (ø) |
| hivemind/averaging/matchmaking.py | 88.35% <0.00%> | (-0.90%) ⬇️ |
| hivemind/averaging/averager.py | 88.27% <0.00%> | (-0.24%) ⬇️ |

@mryab mryab changed the title [WIP] Add support for quantization with bitsandbytes Add support for quantization with bitsandbytes Aug 22, 2022
@mryab mryab marked this pull request as ready for review August 22, 2022 23:23
README.md Outdated
@@ -53,6 +53,11 @@ If your versions of Python and PyTorch match the requirements, you can install h
pip install hivemind
```

Also, if you want to use blockwise 8-bit compression from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
during data transfer, you can [build it from source](https://github.com/TimDettmers/bitsandbytes#compile-from-source)
Why suggest building it from source?

Instead, we can just suggest running `pip install bitsandbytes==0.32.1` (if you have a GPU) or `pip install git+https://github.com/TimDettmers/bitsandbytes.git@4cd7ea6` (for CPU-only builds).

README.md Outdated (resolved)
@@ -32,10 +31,15 @@ def test_cli_run_server_identity_path():
encoding="utf-8",
)

# Skip line "UserWarning: The installed version of bitsandbytes was compiled without GPU support. <...>"

Can we assert that the skipped line actually contains this text?

Then we'll be safe if something changes, and the code will be self-descriptive (no need for the comment anymore).
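The suggestion could look something like this (a hypothetical sketch, not the code actually merged): skip the leading bitsandbytes warning in the captured test output, but fail loudly if the first line is not that warning.

```python
# Substring of the warning emitted by CPU-only builds of bitsandbytes.
BNB_WARNING = "The installed version of bitsandbytes was compiled without GPU support"


def skip_bnb_warning(output_lines):
    """Drop the leading bitsandbytes warning, asserting it is really there.

    The assertion documents the skip: if the warning text ever changes or
    disappears, the test fails instead of silently eating a meaningful line.
    """
    first = output_lines[0]
    assert BNB_WARNING in first, f"unexpected first line: {first!r}"
    return output_lines[1:]
```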

@justheuristic (Member)

LGTM, please merge at will

@dbaranchuk (Collaborator) left a comment


@mryab mryab merged commit 131f82c into master Sep 10, 2022
@mryab mryab deleted the bnb_integration branch September 10, 2022 14:39
mryab added a commit that referenced this pull request Sep 13, 2022
* Add support for quantization with bitsandbytes

* Extend the compression benchmark

* Add a test for blockwise compression

* Add a note to README about bitsandbytes

* Install bitsandbytes in tests as well

* Verify outputs consistently in test_moe.py
(to make the test less flaky)

* Pass device="cpu" in test_background_server_identity_path
This ensures that the server can actually launch in a GPU-enabled environment: otherwise, initializing the CUDA context in the parent process prevents the server from starting

* Filter bitsandbytes warnings

(cherry picked from commit 131f82c)
4 participants