Triton server failed and exited with a core dump #4010

Closed
jackzhou121 opened this issue Mar 3, 2022 · 9 comments

Comments

jackzhou121 commented Mar 3, 2022

Description
Triton server exited with a core dump.

Triton Information
Triton version: 2.12

Are you using the Triton container or did you build it yourself?
container: nvcr.io/nvidia/tritonserver:21.07-py3
Hardware: A30, NVIDIA-SMI 470.57.02, Driver Version: 470.57.02, CUDA Version: 11.4

To Reproduce
Steps to reproduce the behavior.
Here is the code:

#3808
#include <string>
#include <chrono>
#include <cstring>
#include <future>
#include <iostream>
#include <memory>         // needed for std::shared_ptr
#include <sstream>
#include <thread>
#include <unordered_map>
#include <vector>
#include <exception>
#include <stdexcept>
#include <torch/torch.h>

#include "triton/core/tritonserver.h"
#include "common/common.h"

#define TRITON_ENABLE_GPU 1

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
#endif

static std::shared_ptr<TRITONSERVER_Server> g_server;

int create_server(std::shared_ptr<TRITONSERVER_Server> &g_server)
{
    std::string model_repository_path = "/workspace/triton_tts/triton_tts/build/tts_model_repo_separate";
    int verbose_level = 1;
    TRITONSERVER_MemoryType requested_memory_type = TRITONSERVER_MEMORY_CPU_PINNED;

    if(model_repository_path.empty()){
            std::cout << "model repo path not set" << '\n';
            return 0;
    }

    uint32_t api_version_major, api_version_minor;
    FAIL_IF_ERR(
            TRITONSERVER_ApiVersion(&api_version_major, &api_version_minor),
            "getting triton versino");
    std::cout << "triton version: "<<api_version_major << "-" << api_version_minor << '\n';

    if((TRITONSERVER_API_VERSION_MAJOR != api_version_major) || (TRITONSERVER_API_VERSION_MINOR > api_version_minor)){
            FAIL("triton server API version mismatch");
    }

    //Create triton server
    TRITONSERVER_ServerOptions *server_options = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsNew(&server_options),
            "creating server options");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path.c_str()),
            "setting model repository path");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetLogVerbose(server_options, verbose_level),
            "setting verbose logging level");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetBackendDirectory(server_options, "/opt/tritonserver/backends"),
            "setting backend directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetRepoAgentDirectory(server_options, "/opt/tritonserver/repoagents"),
            "setting repository agent directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false),
            "settign strict model configuration");

#ifdef TRITON_ENABLE_GPU
    double min_compute_capability = TRITON_MIN_COMPUTE_CAPABILITY;
#else
    double min_compute_capability = 0;
#endif

    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetMinSupportedComputeCapability(server_options, min_compute_capability),
            "setting minimum support cuda compute capability");

    TRITONSERVER_Server *server_ptr = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerNew(&server_ptr, server_options),
            "creating server");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsDelete(server_options),
            "deleting server options");

    std::shared_ptr<TRITONSERVER_Server> server(server_ptr, TRITONSERVER_ServerDelete);

    g_server = server;

    size_t health_iters = 0;
    while(true){
            bool live, ready;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsLive(server.get(), &live),
                    "unable to get server liveness");
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsReady(server.get(), &ready),
                    "unable to get server readiness");
            std::cout << "Server Health is live: " << live << ", ready: " << ready << '\n';
            if(live && ready) {
                    break;
            }


            if(++health_iters >= 10) {
                    FAIL("failed to find healthy inference server");
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }

    //Print status of the server
    {
            TRITONSERVER_Message *server_metadata_message;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerMetadata(server.get(), &server_metadata_message),
                    "unable to get server metadata message");
            const char *buffer;
            size_t byte_size;
            FAIL_IF_ERR(
                    TRITONSERVER_MessageSerializeToJson(server_metadata_message, &buffer, &byte_size),
                    "unable to serilize server metadata message");

            std::cout << "Server Status: "<<std::endl;
            std::cout << std::string(buffer, byte_size) <<'\n';

            FAIL_IF_ERR(
                    TRITONSERVER_MessageDelete(server_metadata_message),
                    "deleting status metadata");
    }
    return 0;

}

int run(void) {
    create_server(g_server);
    return 0;
}

int main(){
    run();
    return 0;
}

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
TorchScript model

Expected behavior
The code ran fine in the old environment, but now it breaks with the following error:

terminate called after throwing an instance of 'c10::Error'
what(): invalid device pointer: 0x7f5c9b400000
Exception raised from free at ../c10/cuda/CUDACachingAllocator.cpp:1223 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f5ea8ffb24c in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f5ea8fc6a66 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x24f (0x7f5ea8f8b9af in /opt/tritonserver/backends/pytorch/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x9c (0x7f5ea8fe51ec in /opt/tritonserver/backends/pytorch/libc10.so)
frame #4: <unknown function> + 0x11b5595 (0x7f5e6f61e595 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: <unknown function> + 0x23993 (0x7f5ea98e5993 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #6: <unknown function> + 0x1779b (0x7f5ea98d979b in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #7: TRITONBACKEND_ModelInstanceFinalize + 0x1e4 (0x7f5ea98d9c64 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #8: <unknown function> + 0x309d21 (0x7f5ee7d40d21 in /opt/tritonserver/lib/libtritonserver.so)
frame #9: <unknown function> + 0x305531 (0x7f5ee7d3c531 in /opt/tritonserver/lib/libtritonserver.so)
frame #10: <unknown function> + 0x305bdd (0x7f5ee7d3cbdd in /opt/tritonserver/lib/libtritonserver.so)
frame #11: <unknown function> + 0x1857a7 (0x7f5ee7bbc7a7 in /opt/tritonserver/lib/libtritonserver.so)
frame #12: <unknown function> + 0xd6de4 (0x7f5ee7922de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #13: <unknown function> + 0x9609 (0x7f5eccdef609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x43 (0x7f5ee7761293 in /lib/x86_64-linux-gnu/libc.so.6)

@jackzhou121 (Author)

The code formatting in the post above got mangled.

jackzhou121 reopened this Mar 3, 2022
jackzhou121 (Author) commented Mar 3, 2022

I put the code here
#include <string>
#include <chrono>
#include <cstring>
#include <future>
#include <iostream>
#include <memory>         // needed for std::shared_ptr
#include <sstream>
#include <thread>
#include <unordered_map>
#include <vector>
#include <exception>
#include <stdexcept>
#include <torch/torch.h>
#include "triton/core/tritonserver.h"
#include "common/common.h"
#define TRITON_ENABLE_GPU 1
#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
#endif
static std::shared_ptr<TRITONSERVER_Server> g_server;
int create_server(std::shared_ptr<TRITONSERVER_Server> &g_server)
{
    std::string model_repository_path = "/workspace/triton_tts/triton_tts/build/tts_model_repo_separate";
    int verbose_level = 1;
    TRITONSERVER_MemoryType requested_memory_type = TRITONSERVER_MEMORY_CPU_PINNED;

    if(model_repository_path.empty()){
            std::cout << "model repo path not set" << '\n';
            return 0;
    }

    uint32_t api_version_major, api_version_minor;
    FAIL_IF_ERR(
            TRITONSERVER_ApiVersion(&api_version_major, &api_version_minor),
            "getting triton version");
    std::cout << "triton version: " << api_version_major << "-" << api_version_minor << '\n';

    if((TRITONSERVER_API_VERSION_MAJOR != api_version_major) || (TRITONSERVER_API_VERSION_MINOR > api_version_minor)){
            FAIL("triton server API version mismatch");
    }

    //Create triton server
    TRITONSERVER_ServerOptions *server_options = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsNew(&server_options),
            "creating server options");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path.c_str()),
            "setting model repository path");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetLogVerbose(server_options, verbose_level),
            "setting verbose logging level");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetBackendDirectory(server_options, "/opt/tritonserver/backends"),
            "setting backend directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetRepoAgentDirectory(server_options, "/opt/tritonserver/repoagents"),
            "setting repository agent directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false),
            "setting strict model configuration");

#ifdef TRITON_ENABLE_GPU
    double min_compute_capability = TRITON_MIN_COMPUTE_CAPABILITY;
#else
    double min_compute_capability = 0;
#endif

    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetMinSupportedComputeCapability(server_options, min_compute_capability),
            "setting minimum supported cuda compute capability");

    TRITONSERVER_Server *server_ptr = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerNew(&server_ptr, server_options),
            "creating server");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsDelete(server_options),
            "deleting server options");
    std::shared_ptr<TRITONSERVER_Server> server(server_ptr, TRITONSERVER_ServerDelete);
    g_server = server;
    size_t health_iters = 0;
    while(true){
            bool live, ready;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsLive(server.get(), &live),
                    "unable to get server liveness");
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsReady(server.get(), &ready),
                    "unable to get server readiness");
            std::cout << "Server Health is live: " << live << ", ready: " << ready << '\n';
            if(live && ready) {
                    break;
            }
            if(++health_iters >= 10) {
                    FAIL("failed to find healthy inference server");
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    //Print status of the server
    {
            TRITONSERVER_Message *server_metadata_message;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerMetadata(server.get(), &server_metadata_message),
                    "unable to get server metadata message");
            const char *buffer;
            size_t byte_size;
            FAIL_IF_ERR(
                    TRITONSERVER_MessageSerializeToJson(server_metadata_message, &buffer, &byte_size),
                    "unable to serilize server metadata message");
            std::cout << "Server Status: "<<std::endl;
            std::cout << std::string(buffer, byte_size) <<'\n';
            FAIL_IF_ERR(
                    TRITONSERVER_MessageDelete(server_metadata_message),
                    "deleting status metadata");
    }
    return 0;

}
int run(void) {
    create_server(g_server);
    return 0;
}

int main(){
    run();
    return 0;
}

GuanLuo (Contributor) commented Mar 8, 2022

The segfault seems to be coming from the Torch library. Do you still encounter the segfault with a different framework? If not, the issue may not be within Triton.

CoderHam (Contributor) commented Mar 8, 2022

@jackzhou121 was the segfault during inference? If so, you should attempt to run the model outside Triton using the PyTorch C++ API directly, to confirm whether the issue is specific to PyTorch or actually lies in Triton.
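
For reference, a standalone check along those lines could look like the sketch below (the model path and input shape are placeholders and must be replaced with the real TorchScript file and its expected input):

// Minimal libtorch sanity check (sketch): load the TorchScript model and run
// one forward pass on the GPU, entirely outside of Triton.
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    try {
        // Hypothetical path; point this at the .pt file from the model repository.
        torch::jit::script::Module module = torch::jit::load("/path/to/model.pt");
        module.to(torch::kCUDA);

        std::vector<torch::jit::IValue> inputs;
        // Placeholder input shape; adjust to what the model actually expects.
        inputs.push_back(torch::rand({1, 80}, torch::kCUDA));

        torch::Tensor out = module.forward(inputs).toTensor();
        std::cout << "output sizes: " << out.sizes() << std::endl;
    } catch (const c10::Error &e) {
        std::cerr << "error running model: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}

If this standalone run also crashes (especially at exit), the problem likely lies in the model or libtorch rather than in Triton.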

jackzhou121 (Author) commented Mar 8, 2022

static std::shared_ptr<TRITONSERVER_Server> g_server;

If "g_server" is static, whether global or local to my function, I have to call TRITONSERVER_ServerDelete(g_server) before my program exits.

If "g_server" becomes a local, non-static variable, the program can exit successfully.


@jackzhou121 (Author)

@jackzhou121 was the segfault during inference? If so, you should attempt to run the model outside Triton using the PyTorch C++ API directly, to confirm whether the issue is specific to PyTorch or actually lies in Triton.

When the program exited, I guess some resource was double-freed; the last free happened after tritonserver had already released all of its resources.

GuanLuo (Contributor) commented Mar 11, 2022

if "g_server" is static wether gloable or local in my function, i have to call TRITONSERVER_ServerDelete(g_server) before my program exited.

Do you mean that in this case you need to call TRITONSERVER_ServerDelete explicitly in order to exit normally? I would avoid the static keyword: Triton internally creates other static objects, and this can lead to an unintended destruction order, which may be the cause of the segfault. So I think maintaining the server object locally will solve the issue.
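
A minimal sketch of that suggestion, reusing the create_server helper from the code above: keep the server in a local (non-static) shared_ptr, so TRITONSERVER_ServerDelete runs through the deleter before main() returns and well ahead of any static destructors.

int main(){
    std::shared_ptr<TRITONSERVER_Server> server;  // local, not static
    create_server(server);

    // ... load models, run inference, etc. ...

    return 0;  // the shared_ptr deleter (TRITONSERVER_ServerDelete) fires here
}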

@dyastremsky (Contributor)

Closing this issue due to lack of activity. Please re-open it if you would like to follow up.
