Fix enforce test failed and demangle stack info #4107

Merged 1 commit on Sep 14, 2017

@gangliao (Contributor) commented Sep 14, 2017

Note: If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.

Fix: #4072

@gangliao (Contributor, Author) left a comment

fixed enforce

@@ -61,8 +78,8 @@ struct EnforceNotMet : public std::exception {

     Dl_info info;
     for (int i = 0; i < size; ++i) {
-      if (dladdr(call_stack[i], &info)) {
-        auto demangled = info.dli_sname;
+      if (dladdr(call_stack[i], &info) && info.dli_sname) {
@gangliao (Contributor, Author) commented Sep 14, 2017

The bug is here. The old code used dli_sname without checking it, but dladdr can leave it NULL:

dli_sname: The name of the nearest runtime symbol with value less than or equal to addr. Where possible, the symbol name shall be returned as it would appear in C source code. If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.
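
The NULL case quoted above can be handled in one small helper. The following is only a sketch under assumptions, not the code merged in this PR: the name DemangleOrEmpty is hypothetical, and the helper assumes it receives info.dli_sname after a successful dladdr call. It uses abi::__cxa_demangle from <cxxabi.h>, which GCC and Clang provide.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Returns a readable name for a symbol reported by dladdr().
// `mangled` is expected to be `info.dli_sname`, which may be NULL when no
// suitable symbol was found, so it is checked before any use.
// Hypothetical helper for illustration; not the function from this PR.
std::string DemangleOrEmpty(const char* mangled) {
  if (mangled == nullptr) {
    return std::string();  // no symbol: nothing to print for this frame
  }
  int status = 0;
  // __cxa_demangle returns a malloc()-allocated buffer on success,
  // so the buffer is released with std::free.
  std::unique_ptr<char, void (*)(void*)> buffer(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  if (status == 0 && buffer) {
    return std::string(buffer.get());
  }
  // Demangling failed (for example, the name was not mangled): keep it as-is.
  return std::string(mangled);
}
```

In the loop from the diff, such a helper would be called with info.dli_sname once dladdr has returned non-zero; an empty result means the frame has no symbol name to print.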

@gangliao (Contributor, Author) commented:

The test passes on an internal test machine:

69: Test command: /home/liaogang/github/Paddle/build/paddle/platform/enforce_test
69: Test timeout computed to be: 9.99988e+06
69: Running main() from gtest_main.cc
69: [==========] Running 19 tests from 9 test cases.
69: [----------] Global test environment set-up.
69: [----------] 3 tests from ENFORCE
69: [ RUN      ] ENFORCE.OK
69: [       OK ] ENFORCE.OK (0 ms)
69: [ RUN      ] ENFORCE.FAILED
69: [       OK ] ENFORCE.FAILED (1 ms)
69: [ RUN      ] ENFORCE.NO_ARG_OK
69: [       OK ] ENFORCE.NO_ARG_OK (0 ms)
69: [----------] 3 tests from ENFORCE (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_EQ
69: [ RUN      ] ENFORCE_EQ.NO_EXTRA_MSG_FAIL
69: [       OK ] ENFORCE_EQ.NO_EXTRA_MSG_FAIL (0 ms)
69: [ RUN      ] ENFORCE_EQ.EXTRA_MSG_FAIL
69: [       OK ] ENFORCE_EQ.EXTRA_MSG_FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_EQ (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_NE
69: [ RUN      ] ENFORCE_NE.OK
69: [       OK ] ENFORCE_NE.OK (0 ms)
69: [ RUN      ] ENFORCE_NE.FAIL
69: [       OK ] ENFORCE_NE.FAIL (1 ms)
69: [----------] 2 tests from ENFORCE_NE (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_GT
69: [ RUN      ] ENFORCE_GT.OK
69: [       OK ] ENFORCE_GT.OK (0 ms)
69: [ RUN      ] ENFORCE_GT.FAIL
69: [       OK ] ENFORCE_GT.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_GT (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_GE
69: [ RUN      ] ENFORCE_GE.OK
69: [       OK ] ENFORCE_GE.OK (0 ms)
69: [ RUN      ] ENFORCE_GE.FAIL
69: [       OK ] ENFORCE_GE.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_GE (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_LE
69: [ RUN      ] ENFORCE_LE.OK
69: [       OK ] ENFORCE_LE.OK (0 ms)
69: [ RUN      ] ENFORCE_LE.FAIL
69: [       OK ] ENFORCE_LE.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_LE (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_LT
69: [ RUN      ] ENFORCE_LT.OK
69: [       OK ] ENFORCE_LT.OK (0 ms)
69: [ RUN      ] ENFORCE_LT.FAIL
69: [       OK ] ENFORCE_LT.FAIL (1 ms)
69: [----------] 2 tests from ENFORCE_LT (1 ms total)
69:
69: [----------] 2 tests from ENFORCE_NOT_NULL
69: [ RUN      ] ENFORCE_NOT_NULL.OK
69: [       OK ] ENFORCE_NOT_NULL.OK (0 ms)
69: [ RUN      ] ENFORCE_NOT_NULL.FAIL
69: [       OK ] ENFORCE_NOT_NULL.FAIL (0 ms)
69: [----------] 2 tests from ENFORCE_NOT_NULL (0 ms total)
69:
69: [----------] 2 tests from ENFORCE_USER_DEFINED_CLASS
69: [ RUN      ] ENFORCE_USER_DEFINED_CLASS.EQ
69: [       OK ] ENFORCE_USER_DEFINED_CLASS.EQ (0 ms)
69: [ RUN      ] ENFORCE_USER_DEFINED_CLASS.NE
69: [       OK ] ENFORCE_USER_DEFINED_CLASS.NE (0 ms)
69: [----------] 2 tests from ENFORCE_USER_DEFINED_CLASS (0 ms total)
69:
69: [----------] Global test environment tear-down
69: [==========] 19 tests from 9 test cases ran. (3 ms total)
69: [  PASSED  ] 19 tests.
1/1 Test #69: enforce_test .....................   Passed    0.01 sec

The following tests passed:
	enforce_test

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.02 sec

@gangliao changed the title from "Fix enforce test failed" to "Fix enforce test failed and demangle stack info" Sep 14, 2017
@qingqing01 (Contributor) commented:

I tested this in a Python unit test.

Case 1 is reported correctly:

126: Traceback (most recent call last):
126:   File "test_mul_op.py", line 13, in setUp
126:     self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
126: ValueError: shapes (32,84) and (80,100) not aligned: 84 (dim 1) != 80 (dim 0)

Case 2 only reports Exception: SegFault:

126: Test timeout computed to be: 9.99988e+06
1/1 Test #126: test_mul_op ......................***Exception: SegFault 45.98 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =  46.00 sec

@Xreki (Contributor) commented Sep 14, 2017

@qingqing01 Which one is case 1 and which is case 2? I have hit SegFault many times; it usually happens when the Python side does not set the Output correctly.

@qingqing01 (Contributor) commented Sep 14, 2017

@Xreki I tested with MulOp.

Case 1: change the shape so that it no longer matches:

'Y': np.random.random((84, 100)) -> 'Y': np.random.random((80, 100))

class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y': np.random.random((80, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}

    def test_check_output(self):
        self.check_output()

Case 2: misspell the input name, Y -> Y0:

class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y0': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y0'])}

    def test_check_output(self):
        self.check_output()

Case 2 may be caused by insufficient checks inside the C++ code.
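
To make the "insufficient checks" hypothesis concrete, here is a minimal sketch of a checked lookup that raises an exception when a required input name is missing, instead of letting a bad pointer reach the kernel. Everything in it is an assumption for illustration: InputMap and RequireInput are hypothetical names and do not correspond to Paddle's actual types or to the real cause of the SegFault.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator's named inputs; not Paddle's type.
using InputMap = std::map<std::string, std::vector<float>>;

// Looks up a required input by name and throws a descriptive exception when
// it is missing, rather than handing back a null or dangling pointer.
const std::vector<float>& RequireInput(const InputMap& inputs,
                                       const std::string& name) {
  auto it = inputs.find(name);
  if (it == inputs.end()) {
    throw std::invalid_argument("missing required input '" + name + "'");
  }
  return it->second;
}
```

With a check of this kind, passing 'Y0' where the operator expects 'Y' would surface as an exception that can carry a message and a stack trace, which is the behaviour requested for case 2.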

@qingqing01 (Contributor) left a comment

LGTM.

@qingqing01 (Contributor) commented Sep 14, 2017

That said, I would still like case 2 to produce an error stack trace as well.

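@Xreki (Contributor) commented Sep 14, 2017

Case 2 is strange. When the name is misspelled while creating the op on the C++ side (createOp), there is an error message and stack information, but the same mistake on the Python side just gives a SegFault. I mentioned both situations in #3899.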

@Xreki Xreki merged commit 13d0005 into develop Sep 14, 2017
@Xreki (Contributor) commented Sep 14, 2017

@qingqing01 The error in case 1 is most likely raised by Python, because the unit test calls np.dot(), which checks the shapes of its inputs.
