[ Examples ] E2E Examples #5

Merged: 30 commits (Jul 2, 2024)

Commits
- a8c3ad8 added examples (robertgshaw2-neuralmagic, Jun 24, 2024)
- 539d31a updated examples (robertgshaw2-neuralmagic, Jun 24, 2024)
- 6f298a7 set to 32 samples for testing (robertgshaw2-neuralmagic, Jun 24, 2024)
- cfc1ec0 fix (robertgshaw2-neuralmagic, Jun 25, 2024)
- 82e8910 Update llama7b_quantize_sparse_cnn.py (robertgshaw2-neuralmagic, Jun 25, 2024)
- 62f8011 Merge branch 'main' into rs/examples (robertgshaw2-neuralmagic, Jun 25, 2024)
- af0be23 tweak W8A8 (robertgshaw2-neuralmagic, Jun 25, 2024)
- 931c504 firx w4a16 (robertgshaw2-neuralmagic, Jun 26, 2024)
- e12b65e added example (robertgshaw2-neuralmagic, Jun 27, 2024)
- 982e3ee tweak fp8 example (Jun 27, 2024)
- 5971dce remove changes (Jun 27, 2024)
- 438b01e fix (Jun 27, 2024)
- 8822f3c update examples to use tokenized data (Jun 27, 2024)
- a6bcb90 save (Jun 27, 2024)
- 466cdb6 Merge branch 'main' into rs/examples (robertgshaw2-neuralmagic, Jul 2, 2024)
- f430e43 fp8 example end to end (robertgshaw2-neuralmagic, Jul 2, 2024)
- b0eaf12 tweak README (robertgshaw2-neuralmagic, Jul 2, 2024)
- a020ebe rename title (robertgshaw2-neuralmagic, Jul 2, 2024)
- 7c58ff4 update title (robertgshaw2-neuralmagic, Jul 2, 2024)
- 556eca2 finished example (robertgshaw2-neuralmagic, Jul 2, 2024)
- 39f2ef0 refactored directory structure (robertgshaw2-neuralmagic, Jul 2, 2024)
- 284a0f0 nits (robertgshaw2-neuralmagic, Jul 2, 2024)
- 2da06f9 restructure w4a16 (robertgshaw2-neuralmagic, Jul 2, 2024)
- 367fb0f fixed w4a16 (robertgshaw2-neuralmagic, Jul 2, 2024)
- 956e1a4 added w8a8-int8 example (robertgshaw2-neuralmagic, Jul 2, 2024)
- 5911c45 finalized example (robertgshaw2-neuralmagic, Jul 2, 2024)
- 3d4d03b added back example (robertgshaw2-neuralmagic, Jul 2, 2024)
- d600009 stash (robertgshaw2-neuralmagic, Jul 2, 2024)
- 708f288 format (robertgshaw2-neuralmagic, Jul 2, 2024)
- 59ea79e Update examples/quantization_w4a16/README.md (robertgshaw2-neuralmagic, Jul 2, 2024)
The diff below shows changes from 2 of the 30 commits.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -15,7 +15,7 @@ A clear and concise description of what you expected to happen.
Include all relevant environment information:
1. OS [e.g. Ubuntu 18.04]:
2. Python version [e.g. 3.7]:
-3. SparseML version or commit hash [e.g. 0.1.0, `f7245c8`]:
+3. LLM Compressor version or commit hash [e.g. 0.1.0, `f7245c8`]:
4. ML framework version(s) [e.g. torch 1.7.1]:
5. Other Python package versions [e.g. SparseZoo, DeepSparse, numpy, ONNX]:
6. Other relevant environment information [e.g. hardware, CUDA version]:
6 changes: 6 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,6 @@
SUMMARY:
"please provide a brief summary"


TEST PLAN:
"please outline how the changes were tested"
1 change: 1 addition & 0 deletions .gitignore
@@ -801,3 +801,4 @@ nm_temp_test_logs/*
sparse_logs/*
wandb/
output_finetune/
+env_log.json
103 changes: 35 additions & 68 deletions CONTRIBUTING.md
@@ -1,86 +1,53 @@
-<!--
-Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-TODO: update for upstream push
-
-# Contributing to SparseML
-
-If you’re reading this, hopefully we have piqued your interest to take the next step. Join us and help make SparseML even better! As a contributor, here are some community guidelines we would like you to follow:
-
-- [Code of Conduct](#code-of-conduct)
-- [Ways to Contribute](#ways-to-contribute)
-- [Bugs and Feature Requests](#bugs-and-feature-requests)
-- [Question or Problem](#question-or-problem)
-- [Developing SparseML](DEVELOPING.md)
-
-## Code of Conduct
-
-Help us keep the software inclusive. Please read and follow our [Code of Conduct](https://github.com/neuralmagic/sparseml/blob/main/CODE_OF_CONDUCT.md) in order to promote an environment that is friendly, fair, respectful, and safe. We want to inspire collaboration, innovation, and fun!
-
-## Ways to Contribute
-
-Whether you’re a newbie, dabbler, or expert, we appreciate you jumping in.
-
-### Contributing Code
-
-- Make pull requests for addressing bugs, open issues, and documentation
-- Neural Magic as the maintainer will do reviews and final merge
-
-### Reporting In
-
-- See something, say something: bugs, documentation
-- Propose new feature requests to Neural Magic
-
-### Helping Others
-
-- Answer open discussion topics
-- Spread the word about SparseML
-- Teach and empower others. This is the way!
-
-## Bugs and Feature Requests
-
-Please search through existing issues and requests first to avoid duplicates. Neural Magic will work with you further to take next steps.
-
-- Go to: [GitHub Issues](https://github.com/vllm-project/llm-compressor/issues)
-
-For bugs, include:
-
-- brief summary
-- OS/Environment details
-- steps to reproduce (s.t.r.)
-- code snippets, screenshots/casts, log content, sample models
-- add the GitHub label "bug" to your post
-
-For feature requests, include:
-
-- problem you’re trying to solve
-- community benefits
-- other relevant details to support your proposal
-- add the GitHub label "enhancement" to your post
-
-For documentation edits, include:
-
-- current state, proposed state
-- if applicable, screenshots/casts
-- add the GitHub label "documentation" to your post
-
-## Question or Problem
-
-Sign up or log in to our [**Neural Magic Community Slack**](https://neuralmagic.com/community/). We are growing the community member by member and happy to see you there. Don’t forget to search through existing discussions to avoid duplication! Thanks!
-
-## Developing SparseML
-
-Made it this far? Review [Developing SparseML](DEVELOPING.md) to get started.
+# Contributing to LLM Compressor
+
+Thank you for your interest in contributing to LLM Compressor!
+Our community is open to everyone and welcomes all kinds of contributions, no matter how small or large.
+There are several ways you can contribute to the project:
+
+- Identify and report any issues or bugs.
+- Request or add new compression methods or research.
+- Suggest or implement new features.
+
+However, remember that contributions aren't just about code.
+We believe in the power of community support; thus, answering queries, assisting others, and enhancing the documentation are highly regarded and beneficial contributions.
+
+Finally, one of the most impactful ways to support us is by raising awareness about LLM Compressor and the vLLM community.
+Talk about it in your blog posts, highlighting how it's driving your incredible projects.
+Express your support on Twitter if vLLM aids you, or simply offer your appreciation by starring our repository.
+
+## Setup for development
+
+### Install from source
+
+```bash
+pip install -e ./[dev]
+```
+
+### Code Styling and Formatting checks
+
+```bash
+make style
+make quality
+```
+
+### Testing
+
+```bash
+make test
+```
+
+## Contributing Guidelines
+
+### Issue Reporting
+
+If you encounter a bug or have a feature request, please check our issues page first to see if someone else has already reported it.
+If not, please file a new issue, providing as much relevant information as possible.
+
+### Pull Requests & Code Reviews
+
+Please check the PR checklist in the [PR template](.github/PULL_REQUEST_TEMPLATE.md) for a detailed contribution guide.
+
+### Thank You
+
+Finally, thank you for taking the time to read these guidelines and for your interest in contributing to LLM Compressor.
+Your contributions make LLM Compressor a great tool for everyone!
50 changes: 8 additions & 42 deletions DEVELOPING.md
@@ -1,25 +1,7 @@
-<!--
-Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-TODO: update for upstream push
-
-# Developing SparseML
-
-SparseML is developed and tested using Python 3.8-3.11.
-To develop SparseML, you will also need the development dependencies and to follow the styling guidelines.
+# Developing LLM Compressor
+
+LLM Compressor is developed and tested using Python 3.8-3.11.
+To develop LLM Compressor, you will also need the development dependencies and to follow the styling guidelines.

Here are some details to get started.

@@ -33,17 +15,7 @@ cd llm-compressor
python3 -m pip install -e "./[dev]"
```

-This will clone the SparseML repo, install it, and install the development dependencies.
-
-To develop framework specific features, you will also need the relevant framework packages.
-Those can be installed by adding the framework name to the install extras. Frameworks include
-`torch`, `keras`, and `tensorflow_v1`. For example:
-```bash
-python3 -m pip install -e "./[dev,torch]"
-```
-
-Note: Running all pytorch tests using `make test TARGETS=torch` also requires `torchvision`
-and `onnxruntime`; install all these dependencies using `python3 -m pip install -e "./[dev, torch, torchvision, onnxruntime]"`
+This will clone the LLM Compressor repo, install it, and install the development dependencies.

**Code Styling and Formatting checks**

@@ -52,22 +24,16 @@ make style
make quality
```

-This will run automatic code styling using `black` and `isort` and test that the
+This will run automatic code styling using `ruff`, `flake8`, `black`, and `isort` to test that the
repository's code matches its standards.

**EXAMPLE: test changes locally**

```bash
-make test TARGETS=<CSV of frameworks to run>
+make test
```

-This will run the targeted SparseML unit tests for the frameworks specified.
-The targets should be specified, because not all framework dependencies can be installed to run all tests.
-
-To run just PyTorch tests, run
-```bash
-make test TARGETS=pytorch
-```
+This will run the targeted LLM Compressor unit tests.

File any error found before changes as an Issue and fix any errors found after making changes before submitting a Pull Request.

@@ -92,7 +58,7 @@ File any error found before changes as an Issue and fix any errors found after m
3. Add a remote to keep up with upstream changes.

```bash
-git remote add upstream https://github.com/neuralmagic/sparseml.git
+git remote add upstream https://github.com/vllm-project/llm-compressor.git
```

If you already have a copy, fetch upstream changes.
22 changes: 3 additions & 19 deletions examples/finetuning/configure_fsdp.md
@@ -1,31 +1,15 @@
-<!--
-Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
# Configuring FSDP for Sparse Finetuning

An example FSDP configuration file, `example_fsdp_config.yaml`, is provided in this
-folder. It can be used out of the box by editting the `num_processes` parameter to
+folder. It can be used out of the box by editing the `num_processes` parameter to
fit the number of GPUs on your machine.

You can also customize your own config file by running the following prompt
```
accelerate config
```

-An FSDP config file can be passed to the SparseML finetuning script like this:
+An FSDP config file can be passed to the LLM Compressor finetuning script like this:
```
-accelerate launch --config_file example_fsdp_config.yaml --no_python sparseml.transformers.text_generation.finetune
+accelerate launch --config_file example_fsdp_config.yaml --no_python llmcompressor.transformers.text_generation.finetune
```
38 changes: 38 additions & 0 deletions examples/quantization/llama7b_fp8_quantization.py
@@ -0,0 +1,38 @@
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

model_stub = "meta-llama/Meta-Llama-3-8B-Instruct"
output_dir = "Meta-Llama-3-8B-Instruct-FP8-Compressed"
num_calibration_samples = 512

tokenizer = AutoTokenizer.from_pretrained(model_stub, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token


def preprocess(batch):
    # apply the model's chat template, then tokenize to at most 2048 tokens
    text = tokenizer.apply_chat_template(batch["messages"], tokenize=False)
    tokenized = tokenizer(text, padding=True, truncation=True, max_length=2048)
    return tokenized


ds = load_dataset("mgoin/ultrachat_2k", split="train_sft")
examples = ds.map(preprocess, remove_columns=ds.column_names)

# recipe: one-shot FP8 quantization of all Linear layers
recipe = QuantizationModifier(targets="Linear", scheme="FP8")

model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype=torch.bfloat16, device_map="auto"
)

oneshot(
    model=model,
    dataset=examples,
    recipe=recipe,
    output_dir=output_dir,
    num_calibration_samples=num_calibration_samples,
    save_compressed=True,
)
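
A checkpoint saved with `save_compressed=True` is written in the compressed-tensors format that vLLM consumes. As a rough usage sketch (not part of this PR, and assuming a vLLM install with FP8 compressed-tensors support), serving the directory produced above might look like:

```python
# Minimal sketch: load the compressed checkpoint with vLLM and generate.
from vllm import LLM, SamplingParams

llm = LLM(model="Meta-Llama-3-8B-Instruct-FP8-Compressed")  # output_dir from the script above
sampling = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What does FP8 quantization change at inference time?"], sampling)
print(outputs[0].outputs[0].text)
```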
2 changes: 1 addition & 1 deletion examples/quantization/llama7b_one_shot_quantization.md
@@ -1,7 +1,7 @@
# Creating a Quantized Llama Model in One Shot

Quantizing a model to a lower precision can save on both memory and speed at inference time.
-This example demonstrates how to use the SparseML API to quantize a Llama model from 16 bits
+This example demonstrates how to use the LLM Compressor API to quantize a Llama model from 16 bits
to 4 bits and save it to a compressed-tensors format for inference with vLLM.

## Step 1: Select a model and dataset
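
The remainder of this document is collapsed in the diff view. For orientation only, here is a minimal sketch of the 16-bit-to-4-bit flow it describes, mirroring the FP8 script above and assuming a `"W4A16"` preset scheme exists; the file's actual steps may differ:

```python
# Hypothetical W4A16 (4-bit weight, 16-bit activation) one-shot sketch; not the file's contents.
import torch

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

model = SparseAutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

# assumed preset: 4-bit weights, activations left in 16 bits; lm_head kept unquantized
recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="ultrachat-200k",  # built-in dataset alias used elsewhere in this PR
    recipe=recipe,
    output_dir="Meta-Llama-3-8B-Instruct-W4A16",
    num_calibration_samples=512,
    save_compressed=True,
)
```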
@@ -11,7 +11,7 @@
    model_stub, torch_dtype=torch.bfloat16, device_map="auto"
)

-# uses SparseML's built-in preprocessing for ultra chat
+# uses LLM Compressor's built-in preprocessing for ultra chat
dataset = "ultrachat-200k"

# save location of quantized model
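
The rest of this script is also collapsed. To illustrate what the built-in preprocessing replaces, here is a hedged comparison of the two ways calibration data is supplied in this PR's examples; `model`, `recipe`, and `output_dir` stand in for the script's collapsed variables:

```python
# Option 1: pass the built-in alias; LLM Compressor applies its ultrachat preprocessing.
oneshot(model=model, dataset="ultrachat-200k", recipe=recipe, output_dir=output_dir)

# Option 2: pass a pre-tokenized dataset, as in the FP8 example earlier in this PR.
oneshot(model=model, dataset=examples, recipe=recipe, output_dir=output_dir)
```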