Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing filesystem tests due to memory allocation #7712

Closed
aashishc1988 opened this issue Aug 6, 2018 · 20 comments
Closed

Failing filesystem tests due to memory allocation #7712

aashishc1988 opened this issue Aug 6, 2018 · 20 comments

Comments

@aashishc1988
Copy link
Contributor

aashishc1988 commented Aug 6, 2018

Description

  • Type: Bug
  • Priority: Major

Bug

Target
WIZWIKI_W7500

Toolchain:
GCC_ARM

Tests we are failing:
features-tests-filesystem-buffered_block_device

Steps to reproduce

mbed test -m WIZWIKI_W7500 -t IAR -n features-tests-filesystem-buffered_block_device -DMBED_HEAP_STATS_ENABLED=1 -DMBED_TRAP_ERRORS_ENABLED=1 -v

So, it seems that the call to new would fail, and even though we have nothrow exception defined, the mbed-os error handler is not very forgiving. The failure gets trapped by the error handler.

log snippet

> 
00:26:05.091 mbedgt: test case summary: 4 passes, 0 failures
00:26:05.091 mbedgt: checking for 'host_tests' directory above image directory structure
00:26:05.092 	'host_tests' directory not found: two directory levels above image path checked
00:26:05.092 mbedgt: selecting test case observer...
00:26:05.093 	calling mbedhtrun: mbedhtrun -m WIZWIKI_W7500 -p DUMMY:9600 -f "BUILD/tests/WIZWIKI_W7500/GCC_ARM/features/TESTS/filesystem/buffered_block_device/buffered_block_device.bin" --grm raas_client:goku.austin.arm.com:8000 -C 4 --sync 10 -P 60
00:26:05.093 mbedgt: mbed-host-test-runner: started
00:26:05.895 [1533239535.10][root]Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
00:26:05.970 [1533239535.17][root]Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
00:26:05.993 [1533239535.19][HTST][INF] host test executor ver. 1.4.0
00:26:05.993 [1533239535.19][HTST][INF] copy image onto target... SKIPPED!
00:26:05.994 [1533239535.19][HTST][INF] starting host test process...
00:26:06.007 [1533239535.21][CONN][INF] starting connection process...
00:26:06.017 [1533239535.22][CONN][INF] notify event queue about extra 60 sec timeout for serial port pooling
00:26:06.018 [1533239535.22][CONN][INF] initializing global resource mgr listener...
00:26:06.028 [1533239535.23][HTST][INF] setting timeout to: 60 sec
00:26:06.078 [1533239535.28][GLRM][INF] remote resources initialization: remote(host=goku.austin.arm.com, port=8000)
00:26:06.081 [1533239535.28][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.085 [1533239535.29][urllib3.connectionpool]http://goku.austin.arm.com:8000 "POST /login HTTP/1.1" 200 213
00:26:06.087 [1533239535.29][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.093 [1533239535.29][urllib3.connectionpool]http://goku.austin.arm.com:8000 "GET /socket.io/?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InVzZXIiLCJncm91cCI6InVzZXIiLCJpYXQiOjE1MzMyMzk1MzUsImV4cCI6MTUzMzg0NDMzNX0.fbqrbQuKBPgDg87ij1-AUoR01YR0YRV1NgueWFCYIqs&EIO=3&transport=polling&t=1533239535286-0 HTTP/1.1" 200 101
00:26:06.133 [1533239535.33][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.135 [1533239535.34][urllib3.connectionpool]http://goku.austin.arm.com:8000 "GET /this/resource HTTP/1.1" 200 3651
00:26:06.137 [1533239535.34][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.151 [1533239535.35][urllib3.connectionpool]http://goku.austin.arm.com:8000 "GET /resource HTTP/1.1" 200 55713
00:26:06.155 [1533239535.36][GLRM][INF] remote resources count: 39
00:26:06.172 [1533239535.37][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.175 [1533239535.38][urllib3.connectionpool]http://goku.austin.arm.com:8000 "GET /resource/2201000003b88c5b00000000000000000000000097969902 HTTP/1.1" 200 1847
00:26:06.176 [1533239535.38][GLRM][INF] remote resources flashing with 'BUILD/tests/WIZWIKI_W7500/GCC_ARM/features/TESTS/filesystem/buffered_block_device/buffered_block_device.bin'...
00:26:06.177 [1533239535.38][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:06.182 [1533239535.38][urllib3.connectionpool]http://goku.austin.arm.com:8000 "POST /file HTTP/1.1" 200 257
00:26:20.739 [1533239549.94][GLRM][INF] opening connection to platform at baudrate='9600'
00:26:20.739 [1533239549.94][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:20.927 [1533239550.13][urllib3.connectionpool]http://goku.austin.arm.com:8000 "PUT /resource/2201000003b88c5b00000000000000000000000097969902/connect HTTP/1.1" 200 15
00:26:20.928 [1533239550.13][GLRM][INF] remote resources reset...
00:26:23.189 [1533239552.39][GLRM][TXD] mbedmbedmbedmbedmbedmbedmbedmbedmbedmbed
00:26:23.189 [1533239552.39][CONN][INF] sending up to 10 __sync packets (specified with --sync=10)
00:26:23.189 [1533239552.39][CONN][INF] sending preamble '919e5587-69b0-4500-bad1-0fbc7ca4b290'
00:26:23.190 [1533239552.39][GLRM][TXD] {{__sync;919e5587-69b0-4500-bad1-0fbc7ca4b290}}
00:26:24.334 [1533239553.53][CONN][RXD] mbedmbedmbedmbedmbedmbedmbedmbed
00:26:24.334 [1533239553.54][HTST][INF] sync KV found, uuid=919e5587-69b0-4500-bad1-0fbc7ca4b290, timestamp=1533239553.535081
00:26:24.335 [1533239553.54][CONN][INF] found SYNC in stream: {{__sync;919e5587-69b0-4500-bad1-0fbc7ca4b290}} it is #0 sent, queued...
00:26:24.453 [1533239553.64][CONN][RXD] >>> Running 1 test cases...
00:26:24.454 [1533239553.65][HTST][INF] DUT greentea-client version: 1.3.0
00:26:24.454 [1533239553.65][CONN][INF] found KV pair in stream: {{__version;1.3.0}}, queued...
00:26:24.454 [1533239553.65][HTST][INF] setting timeout to: 30 sec
00:26:24.454 [1533239553.65][CONN][INF] found KV pair in stream: {{__timeout;30}}, queued...
00:26:24.456 [1533239553.66][HTST][INF] host test class: '<class 'mbed_host_tests.host_tests.default_auto.DefaultAuto'>'
00:26:24.456 [1533239553.66][HTST][INF] host test setup() call...
00:26:24.456 [1533239553.66][HTST][INF] CALLBACKs updated
00:26:24.456 [1533239553.66][HTST][INF] host test detected: default_auto
00:26:24.456 [1533239553.66][CONN][INF] found KV pair in stream: {{__host_test_name;default_auto}}, queued...
00:26:24.539 [1533239553.74][CONN][RXD]
00:26:24.539 [1533239553.74][CONN][INF] found KV pair in stream: {{__testcase_name;BufferedBlockDevice functionality test}}, queued...
00:26:24.634 [1533239553.84][CONN][RXD] >>> Running case #1: 'BufferedBlockDevice functionality test'...
00:26:24.635 [1533239553.84][CONN][INF] found KV pair in stream: {{__testcase_start;BufferedBlockDevice functionality test}}, queued...
00:26:24.747 [1533239553.95][CONN][RXD] :34::SKIP: Not enough memory for test
00:26:24.747 [1533239553.95][CONN][INF] found KV pair in stream: {{__testcase_finish;BufferedBlockDevice functionality test;1;0}}, queued...
00:26:24.848 [1533239554.05][CONN][RXD] >>> 'BufferedBlockDevice functionality test': 1 passed, 0 failed
00:26:24.848 [1533239554.05][CONN][RXD]
00:26:24.955 [1533239554.16][CONN][RXD] >>> Test cases: 1 passed, 0 failed
00:26:24.956 [1533239554.16][CONN][RXD]
00:26:24.956 [1533239554.16][CONN][RXD]
00:26:24.956 [1533239554.16][CONN][INF] found KV pair in stream: {{__testcase_summary;1;0}}, queued...
00:26:24.956 [1533239554.16][CONN][INF] found KV pair in stream: {{max_heap_usage;0}}, queued...
00:26:24.956 [1533239554.16][HTST][ERR] orphan event in main phase: {{max_heap_usage;0}}, timestamp=1533239554.156132
00:26:24.956 [1533239554.16][CONN][INF] found KV pair in stream: {{reserved_heap;4624}}, queued...
00:26:24.956 [1533239554.16][HTST][ERR] orphan event in main phase: {{reserved_heap;4624}}, timestamp=1533239554.156136
00:26:25.039 [1533239554.24][CONN][RXD] ++ MbedOS Error Info ++
00:26:25.039 [1533239554.24][CONN][RXD] Error Status: 0x8001011F Code: 287 Module: 1
00:26:25.039 [1533239554.24][CONN][RXD] Error Message: Operator new[] out of memory
00:26:25.139 [1533239554.34][CONN][RXD]
00:26:25.139 [1533239554.34][CONN][RXD] Location: 0x41CF
00:26:25.139 [1533239554.34][CONN][RXD] Error Value: 0xC
00:26:25.332 [1533239554.53][CONN][RXD] Current Thread: Id: 0x2000194C Entry: 0x44F9 StackSize: 0x1000 StackMem: 0x20001998 SP: 0x20002828
00:26:25.333 [1533239554.53][CONN][RXD] -- MbedOS Error Info --
00:26:55.368 [1533239584.57][HTST][INF] test suite run finished after 30.91 sec...
00:26:55.386 [1533239584.59][CONN][INF] received special event '__host_test_finished' value='True', finishing
00:26:55.387 [1533239584.59][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:55.480 [1533239584.68][urllib3.connectionpool]http://goku.austin.arm.com:8000 "PUT /resource/2201000003b88c5b00000000000000000000000097969902/disconnect HTTP/1.1" 200 2
00:26:55.481 [1533239584.68][urllib3.connectionpool]Starting new HTTP connection (1): goku.austin.arm.com:8000
00:26:58.715 [1533239587.92][urllib3.connectionpool]http://goku.austin.arm.com:8000 "PUT /resource/2201000003b88c5b00000000000000000000000097969902/release HTTP/1.1" 200 66
00:26:58.718 [1533239587.92][HTST][INF] CONN exited with code: 0
00:26:58.718 [1533239587.92][HTST][INF] No events in queue
00:26:58.718 [1533239587.92][HTST][INF] stopped consuming events
00:26:58.718 [1533239587.92][HTST][INF] host test result(): None
00:26:58.718 [1533239587.92][HTST][WRN] missing __exit event from DUT
00:26:58.718 [1533239587.92][HTST][WRN] missing __exit_event_queue event from host test
00:26:58.718 [1533239587.92][HTST][ERR] missing __exit_event_queue event from host test and no result from host test, timeout...
00:26:58.718 [1533239587.92][HTST][INF] calling blocking teardown()
00:26:58.718 [1533239587.92][HTST][INF] teardown() finished
00:26:58.718 [1533239587.92][HTST][INF] {{result;timeout}}
[ ] Question
[ ] Enhancement 
[X ] Bug 
@aashishc1988
Copy link
Contributor Author

cc: @davidsaada @SenRamakri @studavekar

@aashishc1988
Copy link
Contributor Author

aashishc1988 commented Aug 6, 2018

One way to handle this, as suggested by @SenRamakri , is that we over-write the mbed-os error handler with a dummy procedure that does nothing when the call to "new" fails. Here is the snippet of what code would look like (with some commentary about the changes)

/**
  * If the flag MBED_TRAP_ERRORS_ENABLED is defined, and the part cannot allocate enough memory for the test, the call to memory allocator will fail and mbed-os error function will be called, and we will be stuck in a loop, which ideally is good for debugging purposes, but since we are aware of the reasons for failure, and we don't want the CI to get stuck due to that, we will overwrite "error" function call with a dummy procedure that returns SUCCESS. Please be wary while using this, as you do not want to use this for cases that can do without it because it will mask some reason failures due to memory allocation, therefore, if a test has multiple cases and not all of them needs this overwrite, consider splitting the test in two files.
*/

#if defined(MBED_TRAP_ERRORS_ENABLED) && MBED_TRAP_ERRORS_ENABLED
void error(const char* format, ...) {
    (void) format;
}

//Override the set_error function to trap the errors 
mbed_error_status_t mbed_error(mbed_error_status_t error_status, const char *error_msg, unsigned int error_value, const char *filename, int line_number) 
{
    return MBED_SUCCESS;
}
#endif

However, this change is only needed for some specific parts, and tool chain. Like the test, features-tests-filesystem-buffered_block_device is failing for WIZWIKI_W7500 on GCC_ARM toolchain. Adding the code snippet above can/will mask failures for other devices that do not need this change in. So, making this change would require having some more ifdef in the test script, and to me it looks like we are entering into a new territory of modifying the test according to specific part numbers which doesn't look as appealing.

@ciarmcom
Copy link
Member

ciarmcom commented Aug 6, 2018

ARM Internal Ref: MBOTRIAGE-1496

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 7, 2018

Why is it failing? not enough memory? There's 48 kB SRAM (if no socket are used, not sure if these are allocated by default)

@davidsaada
Copy link
Contributor

@aashishc1988 The test you are running does not match the output.
The command you showed is for buffered_block_device test, which handles the allocation problems properly. I have tried this test on your board and it works properly.
However, the output you pasted is for the tests-mbedmicro-rtos-mbed-threads test. I haven't handled this test yet, which is indeed described in #7535.
So this issue actually duplicates #7535.

@davidsaada
Copy link
Contributor

Just to add: std::nothrow works perfectly. No need to change anything in the error handler. Shown failure is in a test that doesn't include it yet (and it will take some time to fix, as it's a complicated test).

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 7, 2018

Closing as duplicate, lets continue in 7535 and close that one

@0xc0170 0xc0170 closed this as completed Aug 7, 2018
@aashishc1988
Copy link
Contributor Author

@davidsaada My bad, I pasted the wrong output. I was filing multiple issues and got confused. Sorry about that. Please take a look at the updated log. The current test runs with std::nothrow, however, in our tests, we run with -DMBED_TRAP_ERRORS_ENABLED flag enabled, which will get triggered when memory allocation fails, irrespective of std::nothrow. The idea behind filing this issue is to see if we can figure out a way around it (look at my second comment).

@studavekar
Copy link
Contributor

re-opening as per above comment from @aashishc1988

@studavekar studavekar reopened this Aug 7, 2018
@davidsaada
Copy link
Contributor

@aashishc1988 This issue found me asking the following questions:

  1. Is the -DMBED_TRAP_ERRORS_ENABLED included in our CI builds? Must say I'm not familiar with it.
  2. If so, what's the reason we include it? It seems to nullify the std::nothrow directive.
  3. I'm running the same command you do (with this flag included) on the same board, but still don't see the problem. Getting the same output, but without the exception. Tried on both IAR and GCC_ARM. What can be the reason for it?
  4. I see that in your command the test name doesn't include mbed-os as a prefix. This hints that you have an older version of mbed tools. Could this be the reason for the different results?

@aashishc1988
Copy link
Contributor Author

aashishc1988 commented Aug 7, 2018

@davidsaada I am little unfamiliar with all the terms and environment setup. I will try my best to answer questions to the best of my knowledge and will also tag relevant people.

  1. When you say CI builds, I am assuming you meant the build artifacts generated by Jenkins (aws)? If yes, then -DMBED_TRAP_ERRORS_ENABLED is included. @studavekar

  2. It does seem to nullify std::throw directive in this case, but my assumption is that the addition of this flag and the corresponding error handler is to catch hold of lot of other problems besides memory allocation failures (I would need to dig in deeper), and I am not even sure if all the tests that call new for memory allocation actually use std::throw directive, however we can discuss it further to see if this check for the failure is actually needed in the error handler, given that if proper precaution is taken from the host with regards to memory allocation. Actually, I am not sure that if we expose this flag to customers, it very could be just for internal use. @SenRamakri

  3. Are you also including this flag -DMBED_HEAP_STATS_ENABLED=1 I think we enable some heap traces that might use more memory. So, are you saying that you dont see :34::SKIP: Not enough memory for test , or you do see that but dont see mbed_os error info like in the output log above? The addition of mbed_os error handler is fairly new, could it be that you are on older version of mbed_os?

  4. I dont quite understand. Can you show me what that command actually looks like? Has the directory structure for the test changed?

@davidsaada
Copy link
Contributor

Regarding 3: Yes - I'm using the same flags as you do. I simply copied your command (with the exception I broke it to two, in order to run it on a remote board). I am seeing the :34::SKIP: Not enough memory for test line, and I expect to see it. It is just the line that tells us that the test was skipped due to insufficient memory. That's OK and it doesn't fail the test. What I don't see is the exception that follows it - Error Message: Operator new[] out of memory for instance, which makes your tests fail.

Regarding 4: it's the same as yours, but instead of features-tests-filesystem-buffered_block_device I need mbed-os-features-tests-filesystem-buffered_block_device. AFAIK, with the newer version of the tools, I am obliged to run it with this mbed-os prefix, something that wasn't true in the older version.

@aashishc1988
Copy link
Contributor Author

Interesting.
I updated point 3 above. It says The addition of mbed_os error handler is fairly new, could it be that you are on older version of mbed_os?
Regarding mbed-os as prefix, I need to investigate it more as to why I dont need it. The tools that I am using are all updated to there latest version.

@davidsaada
Copy link
Contributor

Well. I am updated with the latest mbed-os changes. In addition, I'm pretty sure that the PR that included the std::nothrow changes has passed CI with the error handling mechanism already in (it is fairly new). I really don't know what to say. Only way I can tackle it is if I reproduce it, something that didn't happen yet.
As for the tools, I'm no expert here, so my take on it could be wrong. Saw it in one of the previous PRs I was involved in.

@aashishc1988
Copy link
Contributor Author

@davidsaada Can you point me to the CI test log where it passed so that i can compare what is different between the two setups? It's possible that I might be missing something here.

@aashishc1988
Copy link
Contributor Author

aashishc1988 commented Aug 7, 2018

@davidsaada Can you also try this:
Delete the BUILD directory (or the part directory)
Do a fresh build with this command:
mbed test --compile -m WIZWIKI_W7500 -t GCC_ARM -DMBED_HEAP_STATS_ENABLED=1 -DMBED_STACK_STATS_ENABLED=1 -DMBED_TRAP_ERRORS_ENABLED=1 -DMBED_ALL_STATS_ENABLED

and see if you can reproduce the error

@davidsaada
Copy link
Contributor

Thanks @aashishc1988. I have managed to reproduce the problem with your command. Apparently, it has nothing to do with the MBED_TRAP_ERRORS_ENABLED flag, but with the MBED_STACK_STATS_ENABLED flag (which is also turned on if the MBED_ALL_STATS_ENABLED is set). This flag unleashes a whole bunch of monitoring code, which probably allocates some stuff dynamically (likely without std::nothrow). I'll try to find out where and fix it.

@davidsaada
Copy link
Contributor

@aashishc1988 #7730 should fix it. Please verify.

@aashishc1988
Copy link
Contributor Author

@davidsaada #7730 fixes it. Thanks for investigating (and fixing) it :)

@davidsaada
Copy link
Contributor

@aashishc1988 Thanks for verifying. That PR still needs some work, but glad it's in the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants