Fix enforce test failed and demangle stack info #4107

gangliao · 2017-09-14T09:12:42Z

Note: If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.

gangliao

fixed enforce

gangliao · 2017-09-14T09:13:18Z

paddle/platform/enforce.h

@@ -61,8 +78,8 @@ struct EnforceNotMet : public std::exception {

      Dl_info info;
      for (int i = 0; i < size; ++i) {
-        if (dladdr(call_stack[i], &info)) {
-          auto demangled = info.dli_sname;
+        if (dladdr(call_stack[i], &info) && info.dli_sname) {


The bug is here:

dli_sname: The name of the nearest runtime symbol with value less than or equal to addr. Where possible, the symbol name shall be returned as it would appear in C source code. If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.
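The guard in the diff above can be sketched in isolation as follows. This is a minimal illustration, not the actual code in enforce.h: the helper names `DemangleOrRaw` and `DescribeFrame` and the `<unknown symbol>` placeholder are assumptions made for the example. It assumes a GCC or Clang toolchain that provides `abi::__cxa_demangle` and a platform with `dladdr` (older glibc versions may need `-ldl` at link time).

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ ABI)
#include <dlfcn.h>   // dladdr, Dl_info

#include <cstdlib>  // std::free
#include <string>

// Hypothetical helper: demangle a symbol name, falling back to the raw
// name when demangling fails and to a placeholder when the name is NULL.
std::string DemangleOrRaw(const char* name) {
  if (name == nullptr) return "<unknown symbol>";
  int status = 0;
  // With a null output buffer, __cxa_demangle returns a malloc'ed string
  // that the caller must release with free().
  char* demangled = abi::__cxa_demangle(name, nullptr, nullptr, &status);
  if (status == 0 && demangled != nullptr) {
    std::string result(demangled);
    std::free(demangled);
    return result;
  }
  return name;  // not a mangled C++ name (e.g. a plain C symbol)
}

// Hypothetical helper: describe one stack frame address. dladdr() can
// succeed and still leave dli_sname set to NULL, so both are checked
// before the name is used.
std::string DescribeFrame(void* addr) {
  Dl_info info;
  if (dladdr(addr, &info) && info.dli_sname) {
    return DemangleOrRaw(info.dli_sname);
  }
  return "<unknown symbol>";
}
```

A caller would typically pass each address returned by `backtrace()` to `DescribeFrame` and append the result to the exception message.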

gangliao · 2017-09-14T09:16:23Z

Passes on an internal test machine:

69: Test command: /home/liaogang/github/Paddle/build/paddle/platform/enforce_test
69: Test timeout computed to be: 9.99988e+06
69: Running main() from gtest_main.cc
69: [==========] Running 19 tests from 9 test cases.
69: [----------] Global test environment set-up.
69: [----------] 3 tests from ENFORCE
69: [ RUN      ] ENFORCE.OK
69: [       OK ] ENFORCE.OK (0 ms)
69: [ RUN      ] ENFORCE.FAILED
69: [       OK ] ENFORCE.FAILED (1 ms)
69: [ RUN      ] ENFORCE.NO_ARG_OK
69: [       OK ] ENFORCE.NO_ARG_OK (0 ms)
69: [----------] 3 tests from ENFORCE (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_EQ
69: [ RUN      ] ENFORCE_EQ.NO_EXTRA_MSG_FAIL
69: [       OK ] ENFORCE_EQ.NO_EXTRA_MSG_FAIL (0 ms)
69: [ RUN      ] ENFORCE_EQ.EXTRA_MSG_FAIL
69: [       OK ] ENFORCE_EQ.EXTRA_MSG_FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_EQ (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_NE
69: [ RUN      ] ENFORCE_NE.OK
69: [       OK ] ENFORCE_NE.OK (0 ms)
69: [ RUN      ] ENFORCE_NE.FAIL
69: [       OK ] ENFORCE_NE.FAIL (1 ms)
69: [----------] 2 tests from ENFORCE_NE (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_GT
69: [ RUN      ] ENFORCE_GT.OK
69: [       OK ] ENFORCE_GT.OK (0 ms)
69: [ RUN      ] ENFORCE_GT.FAIL
69: [       OK ] ENFORCE_GT.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_GT (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_GE
69: [ RUN      ] ENFORCE_GE.OK
69: [       OK ] ENFORCE_GE.OK (0 ms)
69: [ RUN      ] ENFORCE_GE.FAIL
69: [       OK ] ENFORCE_GE.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_GE (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_LE
69: [ RUN      ] ENFORCE_LE.OK
69: [       OK ] ENFORCE_LE.OK (0 ms)
69: [ RUN      ] ENFORCE_LE.FAIL
69: [       OK ] ENFORCE_LE.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_LE (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_LT
69: [ RUN      ] ENFORCE_LT.OK
69: [       OK ] ENFORCE_LT.OK (0 ms)
69: [ RUN      ] ENFORCE_LT.FAIL
69: [       OK ] ENFORCE_LT.FAIL (1 ms)
69: [----------] 2 tests from ENFORCE_LT (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_NOT_NULL
69: [ RUN      ] ENFORCE_NOT_NULL.OK
69: [       OK ] ENFORCE_NOT_NULL.OK (0 ms)
69: [ RUN      ] ENFORCE_NOT_NULL.FAIL
69: [       OK ] ENFORCE_NOT_NULL.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_NOT_NULL (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_USER_DEFINED_CLASS
69: [ RUN      ] ENFORCE_USER_DEFINED_CLASS.EQ
69: [       OK ] ENFORCE_USER_DEFINED_CLASS.EQ (0 ms)
69: [ RUN      ] ENFORCE_USER_DEFINED_CLASS.NE
69: [       OK ] ENFORCE_USER_DEFINED_CLASS.NE (0 ms)
69: [----------] 2 tests from ENFORCE_USER_DEFINED_CLASS (0 ms total)
69:
69: [----------] Global test environment tear-down
69: [==========] 19 tests from 9 test cases ran. (3 ms total)
69: [  PASSED  ] 19 tests.
1/1 Test #69: enforce_test .....................   Passed    0.01 sec

The following tests passed:
	enforce_test

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.02 sec

qingqing01 · 2017-09-14T10:31:05Z

I tried this in the Python unit tests:

Case 1 displays the error normally:

126: Traceback (most recent call last):
126:   File "test_mul_op.py", line 13, in setUp
126:     self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
126: ValueError: shapes (32,84) and (80,100) not aligned: 84 (dim 1) != 80 (dim 0)

Case 2 only reports Exception: SegFault:

126: Test timeout computed to be: 9.99988e+06
1/1 Test #126: test_mul_op ......................***Exception: SegFault 45.98 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =  46.00 sec

Xreki · 2017-09-14T10:52:09Z

@qingqing01 Which one is case 1 and which is case 2? I have run into SegFault many times; usually the cause is that the Output was not set correctly on the Python side.

qingqing01 · 2017-09-14T11:14:21Z

@Xreki I tested with MulOp:

Case 1: change the shape so that it no longer matches:

'Y': np.random.random((84, 100)) -> 'Y': np.random.random((80, 100))

class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y': np.random.random((80, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}

    def test_check_output(self):
        self.check_output()

Case 2: misspell the input name, Y -> Y0

class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y0': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y0'])}

    def test_check_output(self):
        self.check_output()

Case 2 may be caused by insufficient checks inside the C++ code.
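To illustrate the kind of check this hypothesis refers to, here is a minimal sketch of a name lookup that reports a misspelled input instead of dereferencing a missing entry. It is not Paddle's implementation: `InputMap` and `GetInput` are hypothetical names, and a plain `std::map` stands in for whatever structure actually holds an operator's inputs.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator's named inputs.
using InputMap = std::map<std::string, std::vector<float>>;

// Look up an input by name. A missing name raises a descriptive error
// rather than yielding an invalid reference that could crash later.
const std::vector<float>& GetInput(const InputMap& inputs,
                                   const std::string& name) {
  auto it = inputs.find(name);
  if (it == inputs.end()) {
    throw std::invalid_argument("input '" + name + "' not found");
  }
  return it->second;
}
```

With a check like this, passing `Y0` where the operator expects `Y` would surface as an exception carrying the missing name, which an enforce-style mechanism could then extend with stack information.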

qingqing01

LGTM.

qingqing01 · 2017-09-14T11:15:39Z

That said, I would still like case 2 to produce error stack information too.

Xreki · 2017-09-14T11:29:34Z

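Case 2 is strange. On the C++ side, if the name is misspelled when calling createOp, there is an error message and stack info. On the Python side, a misspelled name just gives a SegFault, which is odd. I mentioned both situations in #3899.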

Xreki · 2017-09-14T11:53:58Z

@qingqing01 The error in case 1 is most likely raised by Python itself, because the unit test calls np.dot(), which checks the shapes of its operands.

Fix enforce test failed

59d661b


gangliao requested review from qingqing01 and lcy-seso September 14, 2017 09:12

gangliao commented Sep 14, 2017


gangliao requested review from wangkuiyi and reyoung September 14, 2017 09:14

gangliao changed the title ~~Fix enforce test failed~~ Fix enforce test failed and demangle stack info Sep 14, 2017

qingqing01 approved these changes Sep 14, 2017


Xreki merged commit 13d0005 into develop Sep 14, 2017

qingqing01 mentioned this pull request Sep 15, 2017

Stack trace info is still not very readable after the refactoring #4116

Closed

Xreki mentioned this pull request Sep 15, 2017

SegFault when an input/output parameter name is misspelled in a Python unit test #4117

Closed

luotao1 deleted the enforce_failed branch September 21, 2017 06:32
