
Commit

add vllm awq quantization
董晓龙 committed Sep 22, 2023
1 parent a040cdc commit c147858
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions fastchat/serve/vllm_worker.py
@@ -210,6 +210,8 @@ async def api_model_details(request: Request):
args.model = args.model_path
if args.num_gpus > 1:
    args.tensor_parallel_size = args.num_gpus
if args.quantization:
    args.quantization = args.quantization

engine_args = AsyncEngineArgs.from_cli_args(args)
engine = AsyncLLMEngine.from_engine_args(engine_args)
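The diff maps FastChat's CLI flags onto the attribute names that vLLM's `AsyncEngineArgs.from_cli_args(args)` expects, so a `--quantization awq` flag passes through to the engine. A minimal runnable sketch of that pass-through logic using plain `argparse` (flag names mirror the diff; this is not the full worker, and the model path is a placeholder):

```python
import argparse

# Sketch of the worker's flag handling around quantization.
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, default=None)
parser.add_argument("--num-gpus", type=int, default=1)
parser.add_argument("--quantization", type=str, default=None)

args = parser.parse_args(
    ["--model-path", "some/model", "--num-gpus", "2", "--quantization", "awq"]
)

# Mirror the commit's logic: translate FastChat flags to engine-arg names.
args.model = args.model_path
if args.num_gpus > 1:
    args.tensor_parallel_size = args.num_gpus
if args.quantization:
    # With the `quantizaiton` typo fixed, the flag is simply forwarded
    # unchanged to AsyncEngineArgs.from_cli_args(args).
    args.quantization = args.quantization

print(args.model, args.tensor_parallel_size, args.quantization)
```

Note that the guarded self-assignment is effectively a no-op: `from_cli_args` already reads `args.quantization` directly, so defining the CLI flag is what actually enables AWQ.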
