Triton server failed and exited with a core dump #4010

Closed
jackzhou121 opened this issue Mar 3, 2022 · 9 comments

Comments

jackzhou121 commented Mar 3, 2022

Description
Triton server exited with a core dump.

Triton Information
Triton version: 2.12

Are you using the Triton container or did you build it yourself?
container: nvcr.io/nvidia/tritonserver:21.07-py3
Hardware: A30, NVIDIA-SMI 470.57.02, Driver Version: 470.57.02, CUDA Version: 11.4

To Reproduce
Steps to reproduce the behavior.
Here is the code:

#3808
#include <string>
#include <chrono>
#include <cstring>
#include <future>
#include <iostream>
#include <memory>         // needed for std::shared_ptr
#include <sstream>
#include <thread>
#include <unordered_map>
#include <vector>
#include <exception>
#include <stdexcept>
#include <torch/torch.h>

#include "triton/core/tritonserver.h"
#include "common/common.h"

#define TRITON_ENABLE_GPU 1

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
#endif

static std::shared_ptr<TRITONSERVER_Server> g_server;

int create_server(std::shared_ptr<TRITONSERVER_Server> &g_server)
{
    std::string model_repository_path = "/workspace/triton_tts/triton_tts/build/tts_model_repo_separate";
    int verbose_level = 1;
    TRITONSERVER_MemoryType requested_memory_type = TRITONSERVER_MEMORY_CPU_PINNED;

    if(model_repository_path.empty()){
            std::cout << "model repo path not set" << '\n';
            return 0;
    }

    uint32_t api_version_major, api_version_minor;
    FAIL_IF_ERR(
            TRITONSERVER_ApiVersion(&api_version_major, &api_version_minor),
            "getting triton versino");
    std::cout << "triton version: "<<api_version_major << "-" << api_version_minor << '\n';

    if((TRITONSERVER_API_VERSION_MAJOR != api_version_major) || (TRITONSERVER_API_VERSION_MINOR > api_version_minor)){
            FAIL("triton server API version mismatch");
    }

    //Create triton server
    TRITONSERVER_ServerOptions *server_options = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsNew(&server_options),
            "creating server options");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path.c_str()),
            "setting model repository path");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetLogVerbose(server_options, verbose_level),
            "setting verbose logging level");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetBackendDirectory(server_options, "/opt/tritonserver/backends"),
            "setting backend directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetRepoAgentDirectory(server_options, "/opt/tritonserver/repoagents"),
            "setting repository agent directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false),
            "settign strict model configuration");

#ifdef TRITON_ENABLE_GPU
    double min_compute_capability = TRITON_MIN_COMPUTE_CAPABILITY;
#else
    double min_compute_capability = 0;
#endif

    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetMinSupportedComputeCapability(server_options, min_compute_capability),
            "setting minimum support cuda compute capability");

    TRITONSERVER_Server *server_ptr = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerNew(&server_ptr, server_options),
            "creating server");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsDelete(server_options),
            "deleting server options");

    std::shared_ptr<TRITONSERVER_Server> server(server_ptr, TRITONSERVER_ServerDelete);

    g_server = server;

    size_t health_iters = 0;
    while(true){
            bool live, ready;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsLive(server.get(), &live),
                    "unable to get server liveness");
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsReady(server.get(), &ready),
                    "unable to get server readiness");
            std::cout << "Server Health is live: " << live << ", ready: " << ready << '\n';
            if(live && ready) {
                    break;
            }


            if(++health_iters >= 10) {
                    FAIL("failed to find healthy inference server");
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }

    //Print status of the server
    {
            TRITONSERVER_Message *server_metadata_message;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerMetadata(server.get(), &server_metadata_message),
                    "unable to get server metadata message");
            const char *buffer;
            size_t byte_size;
            FAIL_IF_ERR(
                    TRITONSERVER_MessageSerializeToJson(server_metadata_message, &buffer, &byte_size),
                    "unable to serilize server metadata message");

            std::cout << "Server Status: "<<std::endl;
            std::cout << std::string(buffer, byte_size) <<'\n';

            FAIL_IF_ERR(
                    TRITONSERVER_MessageDelete(server_metadata_message),
                    "deleting status metadata");
    }
    return 0;

}

int run(void) {
    create_server(g_server);
    return 0;
}

int main(){
    run();
    return 0;
}

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
TorchScript model

Expected behavior
The code ran fine in the old environment, but now it breaks with the following error:

terminate called after throwing an instance of 'c10::Error'
what(): invalid device pointer: 0x7f5c9b400000
Exception raised from free at ../c10/cuda/CUDACachingAllocator.cpp:1223 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f5ea8ffb24c in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f5ea8fc6a66 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x24f (0x7f5ea8f8b9af in /opt/tritonserver/backends/pytorch/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x9c (0x7f5ea8fe51ec in /opt/tritonserver/backends/pytorch/libc10.so)
frame #4: <unknown function> + 0x11b5595 (0x7f5e6f61e595 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: <unknown function> + 0x23993 (0x7f5ea98e5993 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #6: <unknown function> + 0x1779b (0x7f5ea98d979b in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #7: TRITONBACKEND_ModelInstanceFinalize + 0x1e4 (0x7f5ea98d9c64 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #8: <unknown function> + 0x309d21 (0x7f5ee7d40d21 in /opt/tritonserver/lib/libtritonserver.so)
frame #9: <unknown function> + 0x305531 (0x7f5ee7d3c531 in /opt/tritonserver/lib/libtritonserver.so)
frame #10: <unknown function> + 0x305bdd (0x7f5ee7d3cbdd in /opt/tritonserver/lib/libtritonserver.so)
frame #11: <unknown function> + 0x1857a7 (0x7f5ee7bbc7a7 in /opt/tritonserver/lib/libtritonserver.so)
frame #12: <unknown function> + 0xd6de4 (0x7f5ee7922de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #13: <unknown function> + 0x9609 (0x7f5eccdef609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x43 (0x7f5ee7761293 in /lib/x86_64-linux-gnu/libc.so.6)

@jackzhou121 (Author)

The code formatting in the post above got mangled.

jackzhou121 reopened this Mar 3, 2022
jackzhou121 (Author) commented Mar 3, 2022

I put the code here
#include <string>
#include <chrono>
#include <cstring>
#include <future>
#include <iostream>
#include <memory>         // needed for std::shared_ptr
#include <sstream>
#include <thread>
#include <unordered_map>
#include <vector>
#include <exception>
#include <stdexcept>
#include <torch/torch.h>
#include "triton/core/tritonserver.h"
#include "common/common.h"
#define TRITON_ENABLE_GPU 1
#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
#endif
static std::shared_ptr<TRITONSERVER_Server> g_server;
int create_server(std::shared_ptr<TRITONSERVER_Server> &g_server)
{
    std::string model_repository_path = "/workspace/triton_tts/triton_tts/build/tts_model_repo_separate";
    int verbose_level = 1;
    TRITONSERVER_MemoryType requested_memory_type = TRITONSERVER_MEMORY_CPU_PINNED;

    if(model_repository_path.empty()){
            std::cout << "model repo path not set" << '\n';
            return 0;
    }

    uint32_t api_version_major, api_version_minor;
    FAIL_IF_ERR(
            TRITONSERVER_ApiVersion(&api_version_major, &api_version_minor),
            "getting triton version");
    std::cout << "triton version: " << api_version_major << "-" << api_version_minor << '\n';

    if((TRITONSERVER_API_VERSION_MAJOR != api_version_major) || (TRITONSERVER_API_VERSION_MINOR > api_version_minor)){
            FAIL("triton server API version mismatch");
    }

    //Create triton server
    TRITONSERVER_ServerOptions *server_options = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsNew(&server_options),
            "creating server options");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetModelRepositoryPath(server_options, model_repository_path.c_str()),
            "setting model repository path");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetLogVerbose(server_options, verbose_level),
            "setting verbose logging level");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetBackendDirectory(server_options, "/opt/tritonserver/backends"),
            "setting backend directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetRepoAgentDirectory(server_options, "/opt/tritonserver/repoagents"),
            "setting repository agent directory");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetStrictModelConfig(server_options, false),
            "setting strict model configuration");

#ifdef TRITON_ENABLE_GPU
    double min_compute_capability = TRITON_MIN_COMPUTE_CAPABILITY;
#else
    double min_compute_capability = 0;
#endif

    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsSetMinSupportedComputeCapability(server_options, min_compute_capability),
            "setting minimum supported cuda compute capability");

    TRITONSERVER_Server *server_ptr = nullptr;
    FAIL_IF_ERR(
            TRITONSERVER_ServerNew(&server_ptr, server_options),
            "creating server");
    FAIL_IF_ERR(
            TRITONSERVER_ServerOptionsDelete(server_options),
            "deleting server options");
    std::shared_ptr<TRITONSERVER_Server> server(server_ptr, TRITONSERVER_ServerDelete);
    g_server = server;
    size_t health_iters = 0;
    while(true){
            bool live, ready;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsLive(server.get(), &live),
                    "unable to get server liveness");
            FAIL_IF_ERR(
                    TRITONSERVER_ServerIsReady(server.get(), &ready),
                    "unable to get server readiness");
            std::cout << "Server Health is live: " << live << ", ready: " << ready << '\n';
            if(live && ready) {
                    break;
            }
            if(++health_iters >= 10) {
                    FAIL("failed to find healthy inference server");
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    //Print status of the server
    {
            TRITONSERVER_Message *server_metadata_message;
            FAIL_IF_ERR(
                    TRITONSERVER_ServerMetadata(server.get(), &server_metadata_message),
                    "unable to get server metadata message");
            const char *buffer;
            size_t byte_size;
            FAIL_IF_ERR(
                    TRITONSERVER_MessageSerializeToJson(server_metadata_message, &buffer, &byte_size),
                    "unable to serilize server metadata message");
            std::cout << "Server Status: "<<std::endl;
            std::cout << std::string(buffer, byte_size) <<'\n';
            FAIL_IF_ERR(
                    TRITONSERVER_MessageDelete(server_metadata_message),
                    "deleting status metadata");
    }
    return 0;

}
int run(void) {
    create_server(g_server);
    return 0;
}

int main(){
    run();
    return 0;
}

GuanLuo (Contributor) commented Mar 8, 2022

The segfault seems to be coming from the Torch library. Do you still encounter the segfault with a different framework? If not, the issue may not be within Triton.

CoderHam (Contributor) commented Mar 8, 2022

@jackzhou121 was the segfault during inference? If so, you should attempt to run the model outside Triton using the PyTorch C++ API directly, to confirm whether the issue is specific to PyTorch or actually lies in Triton.
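
For reference, a standalone check along those lines could look like the sketch below (the model path and input shape are placeholders and must be replaced with the real TorchScript file and its expected input):

// Minimal libtorch sanity check (sketch): load the TorchScript model and run
// one forward pass on the GPU, entirely outside of Triton.
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    try {
        // Hypothetical path; point this at the .pt file from the model repository.
        torch::jit::script::Module module = torch::jit::load("/path/to/model.pt");
        module.to(torch::kCUDA);

        std::vector<torch::jit::IValue> inputs;
        // Placeholder input shape; adjust to what the model actually expects.
        inputs.push_back(torch::rand({1, 80}, torch::kCUDA));

        torch::Tensor out = module.forward(inputs).toTensor();
        std::cout << "output sizes: " << out.sizes() << std::endl;
    } catch (const c10::Error &e) {
        std::cerr << "error running model: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}

If this standalone run also crashes (especially at exit), the problem likely lies in the model or libtorch rather than in Triton.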

jackzhou121 (Author) commented Mar 8, 2022

static std::shared_ptr<TRITONSERVER_Server> g_server;

If "g_server" is static, whether global or local to my function, I have to call TRITONSERVER_ServerDelete(g_server) before my program exits.

If "g_server" becomes a local, non-static variable, the program can exit successfully.


@jackzhou121 (Author)

@jackzhou121 was the segfault during inference? If so, you should attempt to run the model outside Triton using the PyTorch C++ API directly, to confirm whether the issue is specific to PyTorch or actually lies in Triton.

When the program exited, I guess some resource was double-freed; the last free happened after tritonserver had already released all of its resources.

GuanLuo (Contributor) commented Mar 11, 2022

if "g_server" is static wether gloable or local in my function, i have to call TRITONSERVER_ServerDelete(g_server) before my program exited.

Do you mean that in this case you need to call TRITONSERVER_ServerDelete explicitly in order to exit normally? I would avoid the static keyword: Triton internally creates other static objects, and this can lead to an unintended destruction order, which may be the cause of the segfault. So I think maintaining the server object locally will solve the issue.
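
A minimal sketch of that suggestion, reusing the create_server helper from the code above: keep the server in a local (non-static) shared_ptr, so TRITONSERVER_ServerDelete runs through the deleter before main() returns and well ahead of any static destructors.

int main(){
    std::shared_ptr<TRITONSERVER_Server> server;  // local, not static
    create_server(server);

    // ... load models, run inference, etc. ...

    return 0;  // the shared_ptr deleter (TRITONSERVER_ServerDelete) fires here
}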

@dyastremsky (Contributor)

Closing this issue due to lack of activity. Please re-open it if you would like to follow up.
