# VPTQ can count 'r' NOW!

We've just released the latest VPTQ-community model, Llama-3.1-Nemotron-70B-Instruct-HF, compressed to bitwidths ranging from 4 bits down to 1.5 bits. Everyone is welcome to try the new model, and suggestions of all kinds are appreciated. Now we can count the letter "r" on our local GPUs!

```shell
CUDA_VISIBLE_DEVICES=0 python -m vptq --model VPTQ-community/Llama-3.1-Nemotron-70B-Instruct-HF-v8-k65536-65536-woft --chat
# how many r in strrrraberry
```

*(screenshot: count_r — the model counting "r" in "strrrraberry")*
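As a sanity check on what answer the chat session above should produce, the ground-truth count can be verified in plain Python (a trivial reference snippet, not part of VPTQ):

```python
# Reference check: how many times does 'r' appear in "strrrraberry"?
word = "strrrraberry"
r_count = word.count("r")
print(r_count)  # → 6
```

The quantized model's reply can be compared against this value to judge whether the compressed weights still handle character-level counting.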