Add support for quantization with bitsandbytes #490
How's the PR going? Need any help?
Not sure if I need any help, since we're mostly waiting for the new bitsandbytes release
Codecov Report
@@ Coverage Diff @@
## master #490 +/- ##
==========================================
- Coverage 86.31% 86.27% -0.05%
==========================================
Files 81 81
Lines 7887 7919 +32
==========================================
+ Hits 6808 6832 +24
- Misses 1079 1087 +8
README.md
@@ -53,6 +53,11 @@ If your versions of Python and PyTorch match the requirements, you can install h
pip install hivemind
```

Also, if you want to use blockwise 8-bit compression from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
during data transfer, you can [build it from source](https://github.com/TimDettmers/bitsandbytes#compile-from-source)
Why suggest building it from source?
Instead, we can just suggest running `pip install bitsandbytes==0.32.1` (if you have a GPU) or `pip install git+https://github.com/TimDettmers/bitsandbytes.git@4cd7ea6` (for CPU-only builds).
tests/test_start_server.py
@@ -32,10 +31,15 @@ def test_cli_run_server_identity_path():
encoding="utf-8",
)

# Skip line "UserWarning: The installed version of bitsandbytes was compiled without GPU support. <...>"
Can we assert that the skipped line contains this?
Then we'll be safe if something changes, and the code will be self-descriptive (no need for the comment anymore).
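The suggestion above could look roughly like the following. This is only a sketch: the helper name `skip_bitsandbytes_warning` and the in-memory stream are hypothetical stand-ins, not code from this PR, and the real test reads from the server process's output instead.

```python
import io

# Text quoted in the original comment; the rest of the warning is elided there.
EXPECTED_WARNING = "The installed version of bitsandbytes was compiled without GPU support"


def skip_bitsandbytes_warning(stream):
    """Consume one line and check that it really is the bitsandbytes warning."""
    line = stream.readline()
    assert EXPECTED_WARNING in line, f"Unexpected first line: {line!r}"
    return line


# In-memory stream standing in for the server's output (hypothetical data).
fake_output = io.StringIO(
    "UserWarning: The installed version of bitsandbytes was compiled without GPU support. <...>\n"
    "next line\n"
)
skip_bitsandbytes_warning(fake_output)
remaining = fake_output.readline()  # the test would continue from this line
```

With the assertion in place, the test fails loudly if the first line ever stops being the warning, instead of silently skipping a line that matters.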
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Force-pushed from c661d82 to 4c0fef6
LGTM, please merge at will
* Add support for quantization with bitsandbytes
* Extend the compression benchmark
* Add a test for blockwise compression
* Add a note to README about bitsandbytes
* Install bitsandbytes in tests as well
* Verify outputs consistently in test_moe.py (to make the test less flaky)
* Pass device="cpu" in test_background_server_identity_path

  This ensures that the server can actually launch in a GPU-enabled environment: otherwise initializing the CUDA context in a parent process prevents it
* Filter bitsandbytes warnings

(cherry picked from commit 131f82c)
This PR integrates blockwise quantization from bitsandbytes as a new compression mechanism for Hivemind. Importantly, it is an optional compression protocol: users should only need to install the external library if they are going to use it, hence the "conditional import" and "extra dependency" parts.
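For readers unfamiliar with the "conditional import" pattern mentioned above, here is a minimal, generic sketch. It is not the code from this PR: the helper names `is_available` and `require_optional` are hypothetical, and the error message is only an example.

```python
import importlib
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str, feature: str):
    """Import an optional dependency lazily, with a helpful error if it is missing."""
    if not is_available(module_name):
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        )
    return importlib.import_module(module_name)
```

The point of the pattern is that importing the main package never fails because of the optional library; the error appears only when a user actually selects the feature that needs it.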
The code on the Hivemind side is pretty simple, but it would be nice to be able to include a CPU-only build of bitsandbytes as a dependency, so that we can both include it without checking for a CUDA version and test the integration in GHA. @TimDettmers has granted me access to the bitsandbytes repo, so I'm going to work on that first before marking this PR as ready to merge.
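To illustrate the general idea behind blockwise 8-bit compression, here is a toy pure-Python sketch. It assumes a simple linear absmax scheme with one scale per block; it is not the quantization scheme that bitsandbytes implements, and the function names and block size are made up for the example.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map each block of floats to int8-range codes plus one float scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets a dummy scale to avoid division by zero.
        scale = absmax / 127 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Invert quantize_blockwise up to rounding error."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]


values = [0.1, -0.5, 0.25, 1.0, 100.0, -50.0, 0.0, 25.0]
codes, scales = quantize_blockwise(values)
restored = dequantize_blockwise(codes, scales)
```

Because each block carries its own scale, the large values in the second block do not destroy the precision of the small values in the first one, which is the motivation for quantizing per block rather than per tensor.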