Commit
Summary: The peak memory improvement is extremely small. I tried a few things to fix this but didn't have any luck. Accuracy is also very poor (the generated text is unintelligible). I tried leaving the most recent token unquantized, since we have full-fidelity information for whatever the current token is. That didn't solve the issue and resulted in a significant memory increase. We may need to try affine quantization, but I'm currently more concerned with the lack of memory improvement. See benchmark_results.txt for the results (compare kv_quant: True vs. kv_quant: False).

I also took a memory trace, which you can download (if you're a Meta employee) with:

jf download GCqU9BqGNUybzv8CABWUzUtOiPZ5bsIXAAAz --file "mem_trace_kvq.html"

Test Plan: sh benchmarks.sh

Reviewers:
Subscribers:
Tasks:
Tags:
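For readers unfamiliar with the affine quantization mentioned in the summary, here is a minimal sketch of how it differs from symmetric (scale-only) quantization. It is not the code from this commit. The function names are hypothetical, it works on plain Python lists rather than tensors, and it assumes 8-bit codes with one scale (and one offset) per group of values.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Scale-only quantization: codes are signed ints centered on zero."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_symmetric(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


def quantize_affine(values: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Affine quantization: a scale plus an offset (here, the group minimum),
    so the full code range covers [min, max] instead of [-max_abs, max_abs]."""
    qmax = 2 ** bits - 1  # 255 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [max(0, min(qmax, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_affine(codes: List[int], scale: float, offset: float) -> List[float]:
    return [c * scale + offset for c in codes]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    # A group of values that does not straddle zero: symmetric quantization
    # spends half of its code range on negative values that never occur.
    group = [0.5, 0.6, 0.7, 1.5]
    sym_codes, sym_scale = quantize_symmetric(group)
    aff_codes, aff_scale, aff_offset = quantize_affine(group)
    print("symmetric step size:", sym_scale)
    print("affine step size:   ", aff_scale)
    print("symmetric max error:", max_abs_error(group, dequantize_symmetric(sym_codes, sym_scale)))
    print("affine max error:   ", max_abs_error(group, dequantize_affine(aff_codes, aff_scale, aff_offset)))
```

The affine variant has a smaller step size whenever a group's values are skewed to one side of zero, at the cost of storing one extra offset per group. Whether that would address the accuracy problem described above is untested here.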
Showing 4 changed files with 90 additions and 27 deletions.