support to export gguf q4_0 and q4_1 format #393

Merged 30 commits on Jan 8, 2025
Commits
8355347
export gguf
n1ck-guo Dec 24, 2024
dd55003
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 24, 2024
f67219b
q4_0/1 port c++ to python
n1ck-guo Dec 24, 2024
611c4c1
Merge branch 'hengguo/gguf' of https://github.com/intel/auto-round in…
n1ck-guo Dec 24, 2024
ce1c48e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 24, 2024
7ab730b
change to llama.cpp stype and add uint8 store
n1ck-guo Dec 25, 2024
287b5af
abstract
n1ck-guo Dec 25, 2024
49d95a8
merge
n1ck-guo Dec 25, 2024
113532a
update
n1ck-guo Dec 26, 2024
ee66c47
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 26, 2024
d395c6b
fix
n1ck-guo Dec 26, 2024
8b13f1f
Merge branch 'hengguo/gguf' of https://github.com/intel/auto-round in…
n1ck-guo Dec 26, 2024
ce2c346
update
n1ck-guo Dec 30, 2024
8bceb3f
default sequence eval
n1ck-guo Dec 30, 2024
722a1d8
modify by comments
n1ck-guo Dec 30, 2024
8712170
update
n1ck-guo Dec 30, 2024
1aa979a
pylint
n1ck-guo Dec 30, 2024
515160d
clean
n1ck-guo Dec 30, 2024
a064c44
pylint
n1ck-guo Dec 30, 2024
fa2328d
fix
n1ck-guo Dec 30, 2024
7906284
update
n1ck-guo Dec 31, 2024
4261191
Merge branch 'main' into hengguo/gguf
n1ck-guo Dec 31, 2024
e525f97
add ut
n1ck-guo Dec 31, 2024
b0f96a0
add cuda ut
n1ck-guo Dec 31, 2024
c7ec3a5
add requirements
n1ck-guo Dec 31, 2024
79c5c5a
format
n1ck-guo Dec 31, 2024
2720287
code scane
n1ck-guo Dec 31, 2024
db15354
update
n1ck-guo Jan 7, 2025
24a68a9
merge main
n1ck-guo Jan 7, 2025
cb67c1a
update
n1ck-guo Jan 7, 2025
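
Commit f67219b ports the q4_0/q4_1 routines from C++ to Python. For orientation, here is a minimal pure-Python sketch of how a single q4_0 block can be packed and unpacked. It follows the llama.cpp reference layout as I understand it (32 values per block, one fp16 scale followed by 16 bytes of packed 4-bit codes). It is not the code added by this PR, and the function names `quantize_block_q4_0` and `dequantize_block_q4_0` are hypothetical.

```python
import struct

QK4_0 = 32  # values per q4_0 block (assumed from the llama.cpp reference layout)


def quantize_block_q4_0(values):
    """Pack 32 floats into 18 bytes: fp16 scale + 16 bytes of 4-bit codes."""
    assert len(values) == QK4_0
    # Signed value with the largest magnitude; it maps to code 0 (i.e. -8).
    vmax = max(values, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0 else 0.0
    half = QK4_0 // 2
    qs = bytearray(half)
    for j in range(half):
        # Shift by 8 so codes land in [0, 15]; +0.5 rounds, min() clips the top end.
        lo = min(15, int(values[j] * inv_d + 8.5))
        hi = min(15, int(values[j + half] * inv_d + 8.5))
        # The low nibble holds element j, the high nibble holds element j + 16.
        qs[j] = lo | (hi << 4)
    return struct.pack("<e", d) + bytes(qs)


def dequantize_block_q4_0(block):
    """Inverse of quantize_block_q4_0 (lossy)."""
    (d,) = struct.unpack("<e", block[:2])
    qs = block[2:]
    lo = [((b & 0x0F) - 8) * d for b in qs]
    hi = [((b >> 4) - 8) * d for b in qs]
    return lo + hi
```

q4_1 differs in that it stores a per-block minimum alongside the scale, which is why the PR treats q4_0 as the symmetric format and q4_1 as the asymmetric one.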
@@ -0,0 +1 @@
+endianess
12 changes: 9 additions & 3 deletions auto_round/__main__.py
@@ -14,9 +14,15 @@
 import sys

 def run_eval():
-    from auto_round.script.llm import setup_eval_parser, eval
-    args = setup_eval_parser()
-    eval(args)
+    if "--native" in sys.argv:
+        sys.argv.remove("--native")
+        from auto_round.script.llm import setup_eval_parser, eval
+        args = setup_eval_parser()
+        eval(args)
+    else:
+        from auto_round.script.llm import setup_eval_parser, eval_sequence
+        args = setup_eval_parser()
+        eval_sequence(args)

 def run():
     if "--eval" in sys.argv:
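
The hunk above makes sequence evaluation the default and keeps the previous path behind a `--native` flag that is removed before argument parsing. A minimal standalone sketch of that dispatch pattern follows; `eval_native`, `eval_sequence` (as defined here), and `dispatch_eval` are hypothetical stand-ins, not the functions in `auto_round.script.llm`.

```python
def eval_native(args):
    # Hypothetical stand-in for auto_round.script.llm.eval
    return ("native", args)


def eval_sequence(args):
    # Hypothetical stand-in for auto_round.script.llm.eval_sequence
    return ("sequence", args)


def dispatch_eval(argv):
    """Choose the evaluation path the way run_eval() in the diff does."""
    argv = list(argv)  # work on a copy; the diff mutates sys.argv in place
    if "--native" in argv:
        # Strip the flag so the argument parser never sees it.
        argv.remove("--native")
        return eval_native(argv)
    return eval_sequence(argv)
```

Removing the flag by hand keeps `--native` out of the shared parser, at the cost of it not appearing in `--help`.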
8 changes: 8 additions & 0 deletions auto_round/autoround.py
@@ -1267,6 +1267,14 @@ def save_quantized(self, output_dir=None, format="auto_round", inplace=True, **k
             if processor is not None:
                 processor.save_pretrained(output_dir)
             return
+        if format in ["gguf:q4_0", "gguf:q4_1"]:
+            if self.group_size != 32:

Review comment (Contributor): "also better check bits"

+                logger.error(f"{format} need group_size=32, but it is {self.group_size}, cannot export.")
+                return
+            if format == "gguf:q4_0" and not self.sym:
+                logger.warning(f"incorrect format choose, will reset to gguf:q4_1")
+            if format == "gguf:q4_1" and self.sym:
+                logger.warning(f"incorrect format choose, will reset to gguf:q4_0")

         from auto_round.export import EXPORT_FORMAT
         backend = format
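
The checks in this hunk log warnings that the format "will reset", but the lines shown never reassign `format`, and the review comment asks for a `bits` check as well. The sketch below shows one way those checks could be gathered into a single helper that returns the format to export. `resolve_gguf_format` is a hypothetical name, and the `bits == 4` requirement is an assumption based on the q4 format names. This is not the code that was merged.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_gguf_format(fmt, group_size, sym, bits):
    """Return the format to export, or None when GGUF export is not possible.

    Hypothetical helper for illustration; not part of auto_round.
    """
    if fmt not in ("gguf:q4_0", "gguf:q4_1"):
        return fmt  # not a GGUF q4 export, nothing to validate here
    if group_size != 32:
        logger.error("%s needs group_size=32, but it is %s; cannot export.", fmt, group_size)
        return None
    if bits != 4:  # assumption: both q4 formats require 4-bit weights
        logger.error("%s needs bits=4, but it is %s; cannot export.", fmt, bits)
        return None
    # As in the diff: symmetric quantization pairs with q4_0, asymmetric with q4_1.
    expected = "gguf:q4_0" if sym else "gguf:q4_1"
    if fmt != expected:
        logger.warning("%s does not match sym=%s; resetting to %s.", fmt, sym, expected)
    return expected
```

A caller would replace `format` with the returned value and return early when it is `None`.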
5 changes: 5 additions & 0 deletions auto_round/export/__init__.py
@@ -48,3 +48,8 @@ def _save_quantized_as_autoawq(*args, **kwargs):
     from auto_round.export.export_to_awq.export import save_quantized_as_autoawq

     return save_quantized_as_autoawq(*args, **kwargs)
+
+@register_format("gguf")
+def _save_quantized_as_autoawq(*args, **kwargs):
+    from auto_round.export.export_to_gguf.export import save_quantized_as_gguf
+    return save_quantized_as_gguf(*args, **kwargs)
13 changes: 13 additions & 0 deletions auto_round/export/export_to_gguf/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) 2024 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.