gptManagerBenchmark std::bad_alloc error #66
Hi @clockfly, can you share the command used to build the model, please? We'd like to see if we can reproduce the problem. Thanks,
@jdemouth-nvidia

Build:

```bash
python build.py --model_dir /models/Llama-2-7b-chat-hf \
    --dtype float16 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --output_dir /models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
    --use_inflight_batching \
    --paged_kv_cache \
    --remove_input_padding
```

Run:

```bash
CUDA_VISIBLE_DEVICES=7 ${proj_dir}/cpp/bbb/benchmarks/gptManagerBenchmark \
    --model=llama \
    --engine_dir=/models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
    --dataset=${proj_dir}/benchmarks/cpp/preprocessed_dataset.json \
    --log_level=verbose
```

Log:

Model: llama2-7b
Thanks for sharing the concrete steps; we will try to reproduce it and hopefully provide some feedback tomorrow. June
Hi June, how is the issue going?
Hi @kaiyux, I was also able to reproduce this issue with both V1 and IFB:

Build Engine

Run

Out
It is a Docker container with base image nvcr.io/nvidia/pytorch:23.08-py3 running on the host.
I hit the same issue on the base image.

Run:
I resolved the issue. The root cause is CXX11_ABI related.

Root cause:

Solution:

This is the kind of issue that could easily confuse users.
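For readers hitting the same symptom: the details of the root cause and solution above did not survive in this thread, but here is a minimal sketch (added for illustration, not from the original comment) of how to check which std::string ABI a given toolchain or build configuration defaults to with GCC/libstdc++.

```cpp
#include <iostream>

int main()
{
    // With GCC/libstdc++, _GLIBCXX_USE_CXX11_ABI selects the std::string/std::list ABI:
    // 1 is the new (C++11) ABI, 0 is the pre-C++11 ABI. Object files and libraries
    // built with different values of this macro are not ABI-compatible with each other.
#ifdef _GLIBCXX_USE_CXX11_ABI
    std::cout << "_GLIBCXX_USE_CXX11_ABI = " << _GLIBCXX_USE_CXX11_ABI << "\n";
#else
    std::cout << "_GLIBCXX_USE_CXX11_ABI is not defined (not using libstdc++)\n";
#endif
    return 0;
}
```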
@ljayx thanks for finding the issue and sharing the workaround.
Also sharing this, unsure if it's relevant: the tensorrtllm_backend Dockerfile sets the PyTorch installation arg as

vs. it being skipped in this package's Dockerfile:

Using the same base image:
Installing PyTorch with src_non_cxx11_abi seems to fix the issues encountered when using the batch manager, but maybe it breaks some of the other scripts?
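To make the mismatch concrete, here is a minimal sketch (an illustration added here, not part of the original comment): the mangled name of std::string differs between the two ABIs, so a PyTorch build using one setting and a batch-manager library built with the other end up exporting and expecting different symbols.

```cpp
#include <iostream>
#include <string>
#include <typeinfo>

int main()
{
    // Under _GLIBCXX_USE_CXX11_ABI=1 the mangled name of std::string contains
    // "__cxx11" (e.g. NSt7__cxx1112basic_string...); under =0 it is the old
    // short form ("Ss"). Comparing this output across the components you link
    // together is a cheap way to see which side was built with which ABI.
    std::cout << typeid(std::string).name() << "\n";
    return 0;
}
```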
@ryxli I wrote a simple demo, FYI:

```cpp
#include <cstdint>
#include <filesystem>
#include <memory>

// Note: the TensorRT-LLM batch-manager/runtime headers and the two callbacks
// used below (requests_callback, response_callback) are not shown here and
// must be provided by the surrounding project.

int main(void)
{
    auto logger = std::make_shared<TllmLogger>();
    using severity = nvinfer1::ILogger::Severity;
    logger->setLevel(severity::kWARNING);
    initTrtLlmPlugins(logger.get());

    std::filesystem::path engine_path{"/ljay/model/llama2-7b-chat-hf/trt_engines/fp16/1-gpu/"};
    auto model_type = TrtGptModelType::InflightFusedBatching;
    int32_t max_seq_len = 512;
    int32_t max_num_req = 8;
    int32_t max_beam_width = 1;
    int32_t max_tokens_in_paged_kvcache = -1;
    float kv_cache_free_gpu_mem_fraction = -1;
    bool enable_trt_overlap = false;
    uint64_t terminate_reqId = 10000;
    batch_scheduler::SchedulerPolicy scheduler_policy{batch_scheduler::SchedulerPolicy::GUARANTEED_NO_EVICT};

    auto const worldConfig = WorldConfig::mpi(*logger);
    auto const optional_params = TrtGptModelOptionalParams(
        max_num_req, max_tokens_in_paged_kvcache, kv_cache_free_gpu_mem_fraction, enable_trt_overlap);

    // Constructing the GptManager is where the reported std::bad_alloc occurs.
    auto m = std::make_shared<GptManager>(engine_path, model_type, max_beam_width, scheduler_policy,
        requests_callback, response_callback, nullptr, nullptr, optional_params, terminate_reqId);

    // Keep the process alive so the manager keeps serving.
    for (;;)
        ;
}
```
I hit the same issue. Could the NVIDIA experts please take a look? @jdemouth-nvidia @kaiyux
I tried uninstalling transformer-engine, and after that build.py and gptManagerBenchmark both work. Not sure whether this has any negative side effects.
We are still trying to reproduce and investigate; we will get back to you when we have a conclusion. Thanks.
Same error here. Staying tuned.
Thanks for your patience. We have found the root cause and are now working on the fix. We will push the fix (along with other enhancements) in the coming days, and when it is pushed, a new "announcement" will also be posted. June
Machine: NVIDIA 4090 24GB
Model: llama13B-gptq (the GPU memory should be enough)
Problem: std::bad_alloc error when starting GptManager.
Expected: runs successfully.