Using the Fluid online inference library in a multithreaded environment: core dump when releasing a cloned predictor #19361

Closed
songyiting opened this issue Aug 22, 2019 · 2 comments
Labels
status/close (closed), User (used to mark user issues)


@songyiting

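  • Version and environment:
       1) PaddlePaddle version: Fluid 1.4
       2) CPU: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
       3) OS: Linux
  • Reproduction: internal network environment; reproduces every time the model is updated
  • Problem description:
    Our inference service gives each thread a thread_local PaddlePredictor pointer that points to that thread's own cloned PaddlePredictor object. When the model is updated, the thread releases its object and points the pointer at a newly cloned one. The relevant code: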
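    #include <memory>
    #include "paddle_inference_api.h"  // declares paddle::PaddlePredictor

    class UapTagPredictor {
        ...
        std::unique_ptr<paddle::PaddlePredictor> _predictor;
    };

    // When a model update is detected, release the old _predictor and clone a new one.
    void UapTagPredictor::reset_predictor() {
        _model = ModelManager::get_instance().get_model();
        _predictor = nullptr;                    // destroy the old clone
        _predictor = _model->clone_predictor();  // clone from the current model
        _last_model_time = _model->get_udpate_time();
    }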
    The process core dumps while releasing the cloned predictor. The backtrace:
    (gdb) bt
    #0 0x00007efedb8035f9 in paddle::memory::detail::MetadataCache::load(paddle::memory::detail::MemoryBlock const*) const () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #1 0x00007efedb802e54 in paddle::memory::detail::MemoryBlock::type(paddle::memory::detail::MetadataCache const&) const () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #2 0x00007efedb8022c9 in paddle::memory::detail::BuddyAllocator::Free(void*) () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #3 0x00007efedb7ff365 in void paddle::memory::legacy::Free<paddle::platform::CPUPlace>(paddle::platform::CPUPlace const&, void*, unsigned long) ()
       from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #4 0x00007efedb8000e5 in paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #5 0x00007efeda9a920a in paddle::AnalysisPredictor::~AnalysisPredictor() () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #6 0x00007efeda9a9321 in paddle::AnalysisPredictor::~AnalysisPredictor() () from /home/work/cpufutureattention_a7ae9c75a852f7dbfd5afd52ffcafce7/bin/../lib/libpaddle_fluid.so
    #7 0x000000000048294d in operator() (this=<optimized out>, __ptr=<optimized out>) at /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/include/c++/4.8.2/bits/unique_ptr.h:67
    #8 reset (__p=<optimized out>, this=0x20ae08150) at /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/include/c++/4.8.2/bits/unique_ptr.h:262
    #9 operator= (this=0x20ae08150) at /home/opt/gcc-4.8.2.bpkg-r4/gcc-4.8.2.bpkg-r4/include/c++/4.8.2/bits/unique_ptr.h:213
    #10 cpu::predict::UapTagPredictor::reset_predictor (this=0x20ae080b0) at baidu/cpu/fluid-paddle-predictor/src/predictor/uap_tag_predictor.cpp:25
    #11 0x00000000004779fe in get_predictor () at baidu/cpu/fluid-paddle-predictor/src/predictor/predictor_manager.h:31
    #12 cpu::predict::PredictionServiceImpl::predict (this=<optimized out>, controller=0x237991c00, request=0x20af1bfc0, response=0x21b998800, done=0x21bfdc7b0) at baidu/cpu/fluid-paddle-predictor/src/business/prediction_service_impl.cpp:37
    #13 0x000000000046d7e5 in cpu::predict::PredictionService::CallMethod (this=<optimized out>, method=<optimized out>, controller=<optimized out>, request=<optimized out>, response=<optimized out>, done=<optimized out>)
       at bc_out/baidu/cpu/fluid-paddle-predictor/predict.pb.cc:3933
    #14 0x0000000000515f2d in baidu::rpc::policy::ProcessRpcRequest (msg_base=0x237c30000) at baidu/base/baidu-rpc/src/baidu/rpc/policy/baidu_rpc_protocol.cpp:522
    #15 0x000000000059263a in baidu::rpc::ProcessInputMessage (void_arg=<optimized out>) at baidu/base/baidu-rpc/src/baidu/rpc/input_messenger.cpp:134
    #16 0x000000000059393f in baidu::rpc::InputMessenger::OnNewMessages (m=0x237cc1a00) at baidu/base/baidu-rpc/src/baidu/rpc/input_messenger.cpp:344
    #17 0x00000000004c203d in baidu::rpc::Socket::ProcessEvent (arg=0x237cc1a00) at baidu/base/baidu-rpc/src/baidu/rpc/socket.cpp:1110
    #18 0x0000000000652fda in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at baidu/base/bthread/bthread/task_group.cpp:293
    #19 0x000000000064a681 in bthread_make_fcontext ()
    Backtrace stopped: Cannot access memory at address 0x7efec4bdd000
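The thread never identifies a root cause, so what follows is a hedged sketch rather than a fix. Two assumptions motivate it: clone creation and clone destruction may touch state shared with the parent predictor without a lock, and, if `_model` is the last owner of the previous model, reassigning it on the first line of `reset_predictor()` could destroy the parent before `_predictor = nullptr` releases its clone. The hypothetical helper `SerializedCloner` (not a Paddle API) addresses both on the application side: every `Clone()` and every release for one model version goes through the same mutex, and each clone handle keeps its version, and therefore its parent, alive. It assumes only that the predictor type has a `std::unique_ptr<P> Clone()` member, as the `AnalysisPredictor::Clone()` calls in this thread suggest; `FakePredictor` is a stand-in used to exercise it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical helper (not part of Paddle): owns the parent predictor of one
// model version and serializes Clone() and clone destruction on one mutex.
// P is assumed to expose `std::unique_ptr<P> Clone()`.
template <typename P>
class SerializedCloner
    : public std::enable_shared_from_this<SerializedCloner<P>> {
public:
    // Deleter for clone handles: destroys the clone under the same lock and
    // holds a reference that keeps this helper (and the parent) alive.
    struct Deleter {
        std::shared_ptr<SerializedCloner> owner;
        void operator()(P* clone) const {
            std::lock_guard<std::mutex> lock(owner->mu_);
            delete clone;
        }
    };
    using Handle = std::unique_ptr<P, Deleter>;

    explicit SerializedCloner(std::unique_ptr<P> parent)
        : parent_(std::move(parent)) {
        assert(parent_ && "parent predictor must not be null");
    }

    // Requires this instance to be owned by a std::shared_ptr (make_shared).
    Handle Acquire() {
        std::lock_guard<std::mutex> lock(mu_);
        Deleter deleter{this->shared_from_this()};
        std::unique_ptr<P> clone = parent_->Clone();
        return Handle(clone.release(), std::move(deleter));
    }

private:
    std::mutex mu_;
    std::unique_ptr<P> parent_;
};

// Stand-in predictor used only to exercise the helper; counts live objects.
struct FakePredictor {
    static int live;
    FakePredictor() { ++live; }
    ~FakePredictor() { --live; }
    std::unique_ptr<FakePredictor> Clone() {
        return std::unique_ptr<FakePredictor>(new FakePredictor());
    }
};
int FakePredictor::live = 0;
```

In `reset_predictor()`, `_predictor` would become a `SerializedCloner<paddle::PaddlePredictor>::Handle`, and the two assignments collapse into `_predictor = cloner->Acquire();`, where `cloner` is a hypothetical per-version `std::shared_ptr` held by the model manager. Move assignment first destroys the old clone under its own version's lock and then drops that version's reference, so the old parent is freed last. Assigning `nullptr` instead frees the clone but keeps the old version referenced until the handle is reassigned or destroyed. The lock is only taken on reload, not on the prediction path.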
@Superjomn added the User (used to mark user issues) label Aug 22, 2019
@songyiting
Author

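I wrote a test program in which each thread repeatedly calls AnalysisPredictor::Clone() and then releases the clone. It core dumps after running for a short while, with this output: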
    terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
    terminate called recursively
    what(): 0x2433300 Cannot find 0x21d941840 as kid scope at [/paddle/paddle/fluid/framework/scope.cc:120]
    PaddlePaddle Call Stacks:
    0 0x7f9bd6c91c93p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 563
    1 0x7f9bd6c923f9p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
    2 0x7f9bd6dd980dp paddle::framework::Scope::DeleteScope(paddle::framework::Scope*) const + 1581
    3 0x7f9bd6ca1dcdp paddle::AnalysisPredictor::~AnalysisPredictor() + 93
    4 0x7f9bd6ca2321p paddle::AnalysisPredictor::~AnalysisPredictor() + 17
    5 0x4779a3p
    6 0x46d7e5p
    7 0x5dddedp baidu::rpc::policy::ProcessHttpRequest(baidu::rpc::InputMessageBase*) + 3437
    8 0x591f1ap baidu::rpc::ProcessInputMessage(void*) + 10
    9 0x59321fp baidu::rpc::InputMessenger::OnNewMessages(baidu::rpc::Socket*) + 383
    10 0x4c191dp baidu::rpc::Socket::ProcessEvent(void*) + 13
    11 0x6528bap bthread::TaskGroup::task_runner(long) + 266
    12 0x649f61p
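The "Cannot find ... as kid scope" message raised from `Scope::DeleteScope` inside `~AnalysisPredictor` suggests that each clone registers a kid scope with its parent and removes it on destruction. Assuming (this is not confirmed in the thread) that those registrations race when many threads clone and release at once, the test program described above can be sketched against a hypothetical stand-in: `KidRegistry`, `KidScope`, and `StressCloneRelease` are illustrative names, not Paddle code. Because every read and write of the kid list happens under one mutex, each thread always finds the kid it just added, so the failure count stays at zero and the list ends empty.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kid scope; carries no data in this sketch.
struct KidScope {};

// Hypothetical stand-in for a parent scope's kid list (not framework::Scope).
class KidRegistry {
public:
    // Analogue of a clone registering its kid scope with the parent.
    KidScope* NewKid() {
        std::unique_ptr<KidScope> kid(new KidScope());
        KidScope* raw = kid.get();
        std::lock_guard<std::mutex> lock(mu_);
        kids_.push_back(std::move(kid));
        return raw;
    }

    // Analogue of a released clone removing its kid scope. Returns false
    // when `kid` is not registered ("Cannot find ... as kid scope").
    bool DeleteKid(KidScope* kid) {
        std::lock_guard<std::mutex> lock(mu_);
        for (auto it = kids_.begin(); it != kids_.end(); ++it) {
            if (it->get() == kid) {
                kids_.erase(it);  // the unique_ptr frees the kid
                return true;
            }
        }
        return false;
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mu_);
        return kids_.size();
    }

private:
    std::mutex mu_;  // guards every access to kids_
    std::list<std::unique_ptr<KidScope>> kids_;
};

// Each thread repeatedly "clones" (NewKid) and "releases" (DeleteKid).
// Returns the number of releases that could not find their kid.
int StressCloneRelease(KidRegistry& registry, int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    std::atomic<int> failures(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&registry, &failures, iterations] {
            for (int i = 0; i < iterations; ++i) {
                KidScope* kid = registry.NewKid();
                if (!registry.DeleteKid(kid)) ++failures;
            }
        });
    }
    for (std::thread& th : threads) th.join();
    return failures.load();
}
```

Removing the locks from `NewKid` and `DeleteKid` would turn this into a data race (undefined behavior), which is the kind of failure the reported crash resembles, but it is not a reliable reproducer.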

@shengzhe8688

I'm hitting this error too, using the C++ API. Did the poster above ever solve it?

@paddle-bot (bot) added the status/close (closed) label Jan 11, 2023

5 participants