Issues: mit-han-lab/llm-awq
- [BUG] GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models. (#247, opened Dec 18, 2024 by GodHforever)
- AWQ quantization doesn't work in many opensource LLM in terms of inference efficiency (#243, opened Dec 10, 2024 by loulianzhang)
- Inquiry about GPU memory usage of VILA 1.5-3b AWQ model for 12 frames video. (#240, opened Nov 18, 2024 by gj-raza)
- RuntimeError: CUDA error: no kernel image is available for execution on the device (#238, opened Nov 15, 2024 by new-Sunset-shimmer)
- Could you explain me how can I change the percentage of kept salient weights in FP16? (#237, opened Nov 15, 2024 by akylbekmaxutov)
- Cannot clone from Efficient-Large-Model/VILA.git, Dependency Issues with alternative (#236, opened Nov 14, 2024 by rossgreer)
- [QST] Why does awq write its own int3/int4 GEMM kernels instead of using CUTLASS (#235, opened Nov 11, 2024 by SimpleTheoryOfTypes)
- Unable to run Gradio demo: VILA with TinyChat on a local GPU server (#234, opened Nov 4, 2024 by mitraavi)
- How to convert the AWQ model after the quantization into safetensors (#232, opened Oct 31, 2024 by vladimiralbrekhtccr)
- Regarding the issues encountered with w_bit 3 quantification (#231, opened Oct 30, 2024 by langxinspieder)
- AttributeError: 'LlamaConfig' object has no attribute 'rope_theta' (#222, opened Sep 30, 2024 by lvtao65535)
- Unsupported NVHPC compiler found. nvc++ is the only NVHPC compiler (#220, opened Sep 17, 2024 by SimWangArizona)
- "Expected all tensors to be on the same device" when running "Perform AWQ search" on Llama3 (#219, opened Sep 10, 2024 by charlesyju)
- Batch Processing not implemented for LlavaStreamGenerator (#216, opened Aug 12, 2024 by rahulthakur319)
- NotImplementedError: <class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'> (#214, opened Aug 8, 2024 by lihaofd)