Performance regression in v0.2.0 (cuda12) compared with v0.1.13 (cuda11) #74

@invisifire

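Environment
Environment A: cuda12.1, v0.2.0
Environment B: cuda11.8, v0.1.13
Hardware
Tested on a single A800 GPU

Model: qwen14B
Loaded on a single GPU with int8 inference; environment variables are configured as follows: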
export CUDA_VISIBLE_DEVICES=1
export MODEL_TYPE=qwen_2
export ACT_TYPE=BF16
export WEIGHT_TYPE=INT8
export INT8_KV_CACHE=1
export MAX_SEQ_LEN=32000
export CONCURRENCY_LIMIT=50
export TOKENIZER_PATH="/data/models/Qwen1.5-14B-Chat"
export CHECKPOINT_PATH="/data/models/Qwen1.5-14B-Chat"
export START_PORT=8020
export KV_CACHE_MEM_MB=8000
export PP_SIZE=1
export TP_SIZE=1

python -m maga_transformer.start_server

Test data: input length 10, output length 50 (ultra-short scenario)
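
For anyone trying to reproduce the comparison, here is a minimal timing sketch for the short-request setup above (input of 10, output of 50, presumably counted in tokens). It is not the script used for this report. `generate` is a hypothetical callable that sends one request to the server under test and returns when the reply is complete; it is not part of maga_transformer, and the throughput figure assumes each call really produces `max_new_tokens` tokens.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(generate: Callable[[str, int], str],
              prompt: str,
              max_new_tokens: int = 50,
              runs: int = 20,
              warmup: int = 3) -> Dict[str, float]:
    """Time repeated calls to `generate` and summarize the latency.

    `generate` is a hypothetical stand-in for whatever client sends one
    request to the server; swap in your own implementation.
    """
    # Warm-up calls are discarded so one-time costs do not skew the result.
    for _ in range(warmup):
        generate(prompt, max_new_tokens)

    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt, max_new_tokens)
        latencies.append(time.perf_counter() - start)

    mean = statistics.mean(latencies)
    return {
        "runs": float(runs),
        "mean_s": mean,
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        # Assumes every call generated exactly max_new_tokens tokens.
        "tokens_per_s": max_new_tokens / mean,
    }
```

Running the same function with the same prompt against environment A and environment B, then comparing `median_s` and `tokens_per_s`, gives a like-for-like number. This measures sequential single-request latency only; it does not exercise concurrent load.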

image

In testing, other models such as deepseek also show some slowdown.
