DCN inference produces a core dump about 5% of the time #21863

Closed
AltenLi opened this issue Dec 20, 2019 · 2 comments

AltenLi commented Dec 20, 2019

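  • Version and environment information:
       1) PaddlePaddle version: 1.6
       2) CPU
       4) System environment: python3.6.7
    Reproduction information: the error below was produced when running on Hadoop; when running locally, many core files are generated.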
    Namespace(batch_size=512, cat_feat_num='./data/poi_all/cat_feature_num.txt', clip_by_norm=100.0, cross_num=6, dnn_hidden_units=[1024, 1024], infer_by_user=True, infer_thre=0.9, is_sparse=False, l2_reg_cross=1e-05, lr=0.0001, model_output_dir='./cluster_model', num_epoch=2, num_thread=20, poi_fea='./data/poi_all/poi-info.infer.dat', pre_output_dir='data/predict_result', print_steps=100, steps=150000, test_epoch='10', test_valid_data_dir='data/test_valid', train_data_dir='data/train', use_bn=True, use_cuda=False, vocab_dir='./data/poi_all/vocab')
    OMP: Error #100: Fatal system error detected.
    OMP: System error #22: Invalid argument
    W1220 15:22:38.570390 215818 init.cc:205] *** Aborted at 1576826558 (unix time) try "date -d @1576826558" if you are using GNU date ***
    W1220 15:22:38.573508 215818 init.cc:205] PC: @ 0x0 (unknown)
    W1220 15:22:38.574007 215818 init.cc:205] *** SIGABRT (@0x1f500034b0a) received by PID 215818 (TID 0x7fc763c85700) from PID 215818; stack trace: ***
    W1220 15:22:38.582072 215818 init.cc:205] @ 0x7fc76385c160 (unknown)
    W1220 15:22:38.733220 215818 init.cc:205] @ 0x7fc762dca3f7 __GI_raise
    W1220 15:22:38.740550 215818 init.cc:205] @ 0x7fc762dcb7d8 __GI_abort
    W1220 15:22:38.743173 215818 init.cc:205] @ 0x7fc739d7e023 __kmp_abort_process
    W1220 15:22:38.749852 215818 init.cc:205] @ 0x7fc739d69aaf __kmp_fatal
    W1220 15:22:38.777932 215818 init.cc:205] @ 0x7fc739d0b4a8 KMPNativeAffinity::Mask::set_system_affinity()
    W1220 15:22:38.788611 215818 init.cc:205] @ 0x7fc739db9517 __kmp_affinity_bind_thread
    W1220 15:22:38.791082 215818 init.cc:205] @ 0x7fc739d03c5d _INTERNAL_26_______src_kmp_affinity_cpp_da295ce7::__kmp_affinity_create_x2apicid_map()
    W1220 15:22:38.801770 215818 init.cc:205] @ 0x7fc739cf97b5 _INTERNAL_26_______src_kmp_affinity_cpp_da295ce7::__kmp_aux_affinity_initialize()
    W1220 15:22:38.812433 215818 init.cc:205] @ 0x7fc739cf8d8b __kmp_affinity_initialize
    W1220 15:22:38.815022 215818 init.cc:205] @ 0x7fc739d7cf98 __kmp_middle_initialize
    W1220 15:22:38.825731 215818 init.cc:205] @ 0x7fc739d5e9ee __kmp_api_omp_get_num_procs
    W1220 15:22:38.847759 215818 init.cc:205] @ 0x7fc73a49105e mkl_serv_get_num_stripes
    W1220 15:22:38.861289 215818 init.cc:205] @ 0x7fc73a370974 mkl_blas_sgemm
    W1220 15:22:38.874635 215818 init.cc:205] @ 0x7fc73a2def09 SGEMM
    W1220 15:22:38.896353 215818 init.cc:205] @ 0x7fc73a29daa1 cblas_sgemm
    W1220 15:22:38.909498 215818 init.cc:205] @ 0x7fc7446654ab paddle::operators::math::Blas<>::MatMul<>()
    W1220 15:22:38.931319 215818 init.cc:205] @ 0x7fc7446659e3 paddle::operators::MulKernel<>::Compute()
    W1220 15:22:38.943982 215818 init.cc:205] @ 0x7fc744665bd3 ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm0EINS0_9operators9MulKernelINS7_16CPUDeviceContextEfEENSA_ISB_dEEEEclEPKcSG_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
    W1220 15:22:38.957010 215818 init.cc:205] @ 0x7fc74564ba9b paddle::framework::OperatorWithKernel::RunImpl()
    W1220 15:22:38.971024 215818 init.cc:205] @ 0x7fc74564c421 paddle::framework::OperatorWithKernel::RunImpl()
    W1220 15:22:38.988966 215818 init.cc:205] @ 0x7fc745646500 paddle::framework::OperatorBase::Run()
    W1220 15:22:39.002995 215818 init.cc:205] @ 0x7fc7440a5736 paddle::framework::Executor::RunPreparedContext()
    W1220 15:22:39.015275 215818 init.cc:205] @ 0x7fc7440a89df paddle::framework::Executor::Run()
    W1220 15:22:39.027966 215818 init.cc:205] @ 0x7fc743ef630d ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE102_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10
    W1220 15:22:39.037434 215818 init.cc:205] @ 0x7fc743f3944e pybind11::cpp_function::dispatcher()
    W1220 15:22:39.037901 215818 init.cc:205] @ 0x4ac717 _PyCFunction_FastCallKeywords
    W1220 15:22:39.038012 215818 init.cc:205] @ 0x543b75 call_function
    W1220 15:22:39.038370 215818 init.cc:205] @ 0x5492f1 _PyEval_EvalFrameDefault
    W1220 15:22:39.038463 215818 init.cc:205] @ 0x543a21 _PyEval_EvalCodeWithName
    W1220 15:22:39.038542 215818 init.cc:205] @ 0x543d1f call_function
    W1220 15:22:39.038889 215818 init.cc:205] @ 0x54917d _PyEval_EvalFrameDefault
    mapper_infer_all.sh: line 11: 215818 Aborted $PYTHON_BIN -u infer_hdp.py --test_epoch 10 --vocab_dir ./data/poi_all/vocab --cat_feat_num ./data/poi_all/cat_feature_num.txt --poi_fea ./data/poi_all/poi-info.infer.dat --model_output_dir ./cluster_model --infer_thre 0.9 --infer_by_user True
wilhelmzh (Contributor) commented Dec 22, 2019

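We have contacted the user and are still discussing the problem. So far, the pip-installed 1.6.1 and 1.6.2 versions have both been tried under python3.6.7, and both show the error above. However, we could not reproduce the error locally using the user's environment.

The machine may not support MKL. We suggest trying a build without MKL, or compiling Paddle with MKL disabled.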
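The stack trace aborts inside the OpenMP runtime while it binds threads to CPUs (`__kmp_affinity_bind_thread`, then `set_system_affinity`), with `System error #22: Invalid argument`. As a minimal diagnostic sketch, not something confirmed in this thread, the snippet below compares the CPUs the machine reports with the CPUs the process is actually allowed to use, and sets `KMP_AFFINITY=disabled`, a documented Intel OpenMP runtime setting, before Paddle is imported. The helper names `affinity_report` and `disable_omp_affinity` are hypothetical, and whether this avoids the crash on the Hadoop nodes is an assumption.

```python
import os


def affinity_report():
    """Compare the CPU count the OS reports with the CPUs this process may run on."""
    total = os.cpu_count()
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = None
    return {"total_cpus": total, "allowed_cpus": allowed}


def disable_omp_affinity(env=os.environ):
    """Ask the Intel OpenMP runtime not to bind threads to cores.

    This only has an effect if it runs before the library that loads the
    OpenMP runtime is imported. An existing user setting is left untouched.
    """
    env.setdefault("KMP_AFFINITY", "disabled")
    return env["KMP_AFFINITY"]


if __name__ == "__main__":
    print(affinity_report())
    print("KMP_AFFINITY =", disable_omp_affinity())
    # Assumption: Paddle is imported only after the variable is set, e.g.
    # import paddle.fluid as fluid
```

If `allowed_cpus` is much smaller than `total_cpus` on the failing nodes, a restricted CPU set on the cluster would be one plausible explanation for why the error appears on Hadoop but not in a local reproduction.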

@paddle-bot-old

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.
