Add uintx quant to generate and eval #811

jerryzh168 · 2024-09-05T00:02:01Z

Summary:
As the title says: add uintx quantization to the llama generate and eval scripts.

Also reran the benchmarks/eval for llama2/llama3 to get the most recent perf/accuracy data.

Test Plan:
torchao/_models/llama/generate.py
torchao/_models/llama/eval.py

llama2:

# torch.uint4, group_size = 64
python generate.py --compile --precision bfloat16 --quantization uintx-4-64
Average tokens/sec: 48.25
Average Bandwidth: 189.32 GB/s
Peak Memory Usage: 6.29 GB
Model Size: 3.92 GB

wikitext: {'word_perplexity,none': 12.890544846479484, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.612969956510788, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6897195668279897, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
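The `--quantization` values used in this test plan appear to follow a `uintx-<bits>-<group_size>` pattern (`uintx-4-64` is annotated as `torch.uint4, group_size = 64`). A minimal sketch of parsing such a string is below. `parse_uintx` and `UintxSpec` are hypothetical names for illustration, not functions from torchao, and the 1 to 7 bit range is an assumption about which sub-byte widths are meaningful.

```python
from typing import NamedTuple


class UintxSpec(NamedTuple):
    """Hypothetical container for the two fields encoded in the flag."""
    bits: int
    group_size: int


def parse_uintx(flag: str) -> UintxSpec:
    """Parse a string such as 'uintx-4-64' into (bits, group_size)."""
    parts = flag.split("-")
    if len(parts) != 3 or parts[0] != "uintx":
        raise ValueError(f"expected 'uintx-<bits>-<group_size>', got {flag!r}")
    bits, group_size = int(parts[1]), int(parts[2])
    # Assumption: only sub-byte widths (1 to 7 bits) are accepted here.
    if not 1 <= bits <= 7:
        raise ValueError(f"unsupported bit width: {bits}")
    if group_size <= 0:
        raise ValueError(f"group size must be positive, got {group_size}")
    return UintxSpec(bits, group_size)


print(parse_uintx("uintx-4-64"))
print(parse_uintx("uintx-2-8"))
```

Validating the string up front gives a clear error for a malformed flag before any model loading starts.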

# torch.uint2, group_size = 8
python generate.py --compile --precision bfloat16 --quantization uintx-2-8
Average tokens/sec: 36.11
Average Bandwidth: 238.58 GB/s
Peak Memory Usage: 9.26 GB
Model Size: 6.61 GB

python eval.py --compile --precision bfloat16 --quantization uintx-2-8
wikitext: {'word_perplexity,none': 28.766343716897, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.8742120465648264, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.9062841873734042, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
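As a sanity check on the eval output, the reported `bits_per_byte` values are consistent with the base-2 logarithm of `byte_perplexity`. The sketch below checks this for the two llama2 results above; it is only a consistency check on the numbers in this PR, not the evaluation harness's own code, and `bits_per_byte` is a helper name chosen here.

```python
import math


def bits_per_byte(byte_perplexity: float) -> float:
    """Convert byte-level perplexity to bits per byte."""
    return math.log2(byte_perplexity)


# (byte_perplexity, reported bits_per_byte) copied from the wikitext results above.
llama2_results = {
    "uintx-4-64": (1.612969956510788, 0.6897195668279897),
    "uintx-2-8": (1.8742120465648264, 0.9062841873734042),
}

for name, (byte_ppl, reported) in llama2_results.items():
    computed = bits_per_byte(byte_ppl)
    matches = math.isclose(computed, reported, rel_tol=1e-6)
    print(f"{name}: computed={computed:.6f} reported={reported:.6f} match={matches}")
```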

llama3:

# torch.uint4, group_size = 64
python generate.py --compile --precision bfloat16 --checkpoint_path=../../../checkpoints/meta-llama/Meta-Llama-3-8B/model.pth --quantization uintx-4-64
Average tokens/sec: 47.77
Average Bandwidth: 212.90 GB/s
Peak Memory Usage: 11.85 GB
Model Size: 4.46 GB

wikitext: {'word_perplexity,none': 8.112931736704462, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.479179221121259, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5647968636325521, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}


# torch.uint2, group_size = 8
python generate.py --compile --precision bfloat16 --checkpoint_path=../../../checkpoints/meta-llama/Meta-Llama-3-8B/model.pth --quantization uintx-2-8
Average tokens/sec: 33.21
Average Bandwidth: 249.22 GB/s
Peak Memory Usage: 15.04 GB
Model Size: 7.51 GB

wikitext: {'word_perplexity,none': 39.36764348732592, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.98746296691363, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.9909279784106695, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
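The four generate runs above look consistent with bandwidth being roughly model size times tokens per second (each generated token reads the weights once). This relationship is inferred from the reported numbers rather than stated in the PR, so treat the sketch below as a cross-check under that assumption; the small residual is expected because the printed model sizes are rounded.

```python
# (model, quantization, tokens/sec, model size GB, reported bandwidth GB/s)
runs = [
    ("llama2", "uintx-4-64", 48.25, 3.92, 189.32),
    ("llama2", "uintx-2-8", 36.11, 6.61, 238.58),
    ("llama3", "uintx-4-64", 47.77, 4.46, 212.90),
    ("llama3", "uintx-2-8", 33.21, 7.51, 249.22),
]


def relative_error(estimate: float, reported: float) -> float:
    """Relative difference between an estimate and the reported value."""
    return abs(estimate - reported) / reported


for model, quant, tok_s, size_gb, reported_bw in runs:
    # Assumed relationship: bandwidth ~= model size * tokens/sec.
    estimate = tok_s * size_gb
    err = relative_error(estimate, reported_bw)
    print(f"{model} {quant}: estimate={estimate:.2f} GB/s "
          f"reported={reported_bw:.2f} GB/s rel_err={err:.4%}")
```

Under this reading, the 2-bit runs achieve higher bandwidth but lower tokens/sec because their models are larger than the 4-bit ones at the chosen group sizes.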

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-09-05T00:02:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/811

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d5ebc0e with merge base 317392d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

HDCharles · 2024-09-05T02:04:22Z

I would put the generate/eval results in a table somewhere. If you want to add them to the standard benchmarks, you can add them to benchmarks.sh.

Also, I would rebase on my branch, or you will have merge issues.
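One way to act on the table suggestion is to render the numbers from the test plan as a Markdown table. The sketch below is illustrative only: the column names and the `to_markdown_table` helper are choices made here, and the values are copied from the PR description (word perplexity is rounded to two decimals for display).

```python
HEADERS = [
    "model", "quantization", "tok/s", "bandwidth (GB/s)",
    "peak mem (GB)", "model size (GB)", "wikitext word ppl",
]

# Values copied from the generate/eval output in the PR description.
ROWS = [
    ("llama2", "uintx-4-64", 48.25, 189.32, 6.29, 3.92, 12.890544846479484),
    ("llama2", "uintx-2-8", 36.11, 238.58, 9.26, 6.61, 28.766343716897),
    ("llama3", "uintx-4-64", 47.77, 212.90, 11.85, 4.46, 8.112931736704462),
    ("llama3", "uintx-2-8", 33.21, 249.22, 15.04, 7.51, 39.36764348732592),
]


def to_markdown_table(headers, rows):
    """Render headers and rows as a Markdown table; floats get two decimals."""
    def fmt(value):
        return f"{value:.2f}" if isinstance(value, float) else str(value)

    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(fmt(v) for v in row) + " |")
    return "\n".join(lines)


table = to_markdown_table(HEADERS, ROWS)
print(table)
```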

HDCharles · 2024-09-05T02:05:05Z

if eval is broken for you, can you send me the error?

torchao/_models/llama/generate.py

jerryzh168 · 2024-09-05T03:33:59Z

> if eval is broken for you, can you send me the error?

It seems to be fine. The int8wo and bfloat16 results are just very close; I thought they were exactly the same before, but there is actually a slight difference.

jerryzh168 · 2024-09-05T05:47:09Z

Right now these are slow. I think we can add them to benchmarks.sh later, when the perf is better.

facebook-github-bot added the CLA Signed label Sep 5, 2024

jerryzh168 requested review from HDCharles and msaroufim September 5, 2024 00:02

HDCharles approved these changes Sep 5, 2024

HDCharles reviewed Sep 5, 2024

torchao/_models/llama/generate.py (resolved)

Add uintx quant to generate and eval

d5ebc0e

Summary: att Also rerun the benchmarks/eval for llama2/llama3 to get most recent perf/acc data Test Plan: torchao/_models/llama/generate.py torchao/_models/llama/eval.py Reviewers: Subscribers: Tasks: Tags:

jerryzh168 force-pushed the benchmarks branch from 5a4a915 to d5ebc0e September 5, 2024 05:46

jerryzh168 merged commit e05635e into pytorch:main Sep 5, 2024

jerryzh168 deleted the benchmarks branch September 5, 2024 16:46

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

clean up unused files (pytorch#811)

9e52152
