sanitycheck halt some test cases with parallel running. #21123

Closed
chen-png opened this issue Dec 3, 2019 · 9 comments
Labels
area: Sanitycheck Sanitycheck has been renamed to Twister bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@chen-png
Collaborator

chen-png commented Dec 3, 2019

Describe the bug

I tested the parallel-run feature of the sanitycheck script (see #18603) with two boards, reel_board and frdm_k64f. Both use the pyocd runner, which is in the list of supported runners, but some tests that should pass are sometimes reported as failed.
I found that the test device handler runs
"subprocess.check_output(command, stderr=subprocess.STDOUT)",
and the flash subprocess sometimes fails with ValueError: "The device has no langid". That halts the thread, and the test is reported as failed.
In the run below, "tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading" failed on reel_board, while the other tests on the same board passed.
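
To illustrate the failure mode described above, here is a minimal sketch of how `subprocess.check_output` behaves when the child process exits with a non-zero status. It is not the sanitycheck handler code: the helper name `run_flash_command` is hypothetical, and the failing command is a stand-in Python child process rather than a real `west flash` call.

```python
import subprocess
import sys


def run_flash_command(command):
    """Run a command and report (ok, output) instead of raising."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
        return True, output
    except subprocess.CalledProcessError as exc:
        # check_output raises when the exit status is non-zero;
        # exc.output holds the combined stdout/stderr of the child.
        return False, exc.output


# Stand-in for a failing flash call (assumption, not a real west command):
# a child process that prints an error message and exits with status 1.
failing = [sys.executable, "-c",
           "import sys; print('The device has no langid'); sys.exit(1)"]
ok, output = run_flash_command(failing)
print(ok, output.decode().strip())
```

If the exception is not caught, it propagates out of the worker thread, which would be consistent with the "halted" result in the log below.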

To Reproduce
Steps to reproduce the behavior:

  1. only run reel_board, all tests passed
    cmd: sanitycheck --hardware-map map.yml --device-testing -T tests/kernel/threads/ -v -v
    output:
    4 of 4 tests passed (100.00%), 0 failed, 0 skipped with 0 warnings in 46.43 seconds
    In total 4 test cases were executed on 1 out of total 213 platforms (0.47%)
    Hardware distribution summary:
    reel_board (OSHW000041114e4500623006bcf000220571000097969900): 4

  2. only run frdm_k64f, all tests also passed
    cmd: sanitycheck --hardware-map map.yml --device-testing -T tests/kernel/threads/ -v -v
    output:
    4 of 4 tests passed (100.00%), 0 failed, 0 skipped with 0 warnings in 43.64 seconds
    In total 4 test cases were executed on 1 out of total 213 platforms (0.47%)
    Hardware distribution summary:
    frdm_k64f (0240020116B95E69EB47A3D1): 4

  3. run reel_board and frdm_k64f in parallel; some tests sometimes fail.
    cmd: sanitycheck --hardware-map map.yml --device-testing -T tests/kernel/threads/ -v -v
    output:
    run test: frdm_k64f/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading
    command:['west', 'flash', '--skip-rebuild', '-d', '/home/ztest/work/zephyrproject/zephyr/sanity-out/frdm_k64f/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading', '--runner', 'pyocd', '--board-id', '0240020116B95E69EB47A3D1']
    run test: reel_board/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading
    command:['west', 'flash', '--skip-rebuild', '-d', '/home/ztest/work/zephyrproject/zephyr/sanity-out/reel_board/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading', '--runner', 'pyocd', '--board-id', 'OSHW000041114e4500623006bcf000220571000097969900']

error:b'-- west flash: using runner pyocd
0000433:CRITICAL:main:The device has no langid
raise ValueError("The device has no langid")\nValueError: The device has no langid
ERROR: command exited with status 1: pyocd flash -e sector -t nrf52840 -u OSHW000041114e4500623006bcf000220571000097969900 -f 4000000 /home/ztest/work/zephyrproject/zephyr/sanity-out/reel_board/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading/zephyr/zephyr.hex'

halted
1/8 reel_board tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading FAILED N/A (device 0.750s)
see: sanity-out/reel_board/tests/kernel/threads/no-multithreading/kernel.threads.no-multithreading/handler.log

7 of 8 tests passed (87.50%), 1 failed, 0 skipped with 0 warnings in 44.30 seconds
In total 4 test cases were executed on 2 out of total 213 platforms (0.94%)
Hardware distribution summary:
reel_board (OSHW000041114e4500623006bcf000220571000097969900): 4
frdm_k64f (0240020116B95E69EB47A3D1): 4

Environment (please complete the following information):

  • OS: Fedora28
  • Toolchain: zephyr-sdk-0.10.3
  • Commit ID: bbcb352
@chen-png chen-png added the bug The issue is a bug, or the PR is fixing a bug label Dec 3, 2019
@aescolar aescolar added area: Sanitycheck Sanitycheck has been renamed to Twister priority: low Low impact/importance bug labels Dec 3, 2019
@nashif
Member

nashif commented Dec 4, 2019

@chen-png This is tricky to resolve when dealing with hardware and flashing in parallel. I suggest always using --retry-failed 3 to re-run the failing tests; they usually pass on the second try.
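
The idea behind re-running transient failures can be sketched as follows. This is only an illustration of the retry concept, not how sanitycheck implements `--retry-failed`; `run_with_retries` and `flaky_test` are hypothetical names.

```python
def run_with_retries(run_once, retries):
    """Call run_once() until it returns True or the retries are used up.

    Returns the number of attempts made, or None if every attempt failed.
    """
    for attempt in range(1, retries + 2):  # first run plus `retries` re-runs
        if run_once():
            return attempt
    return None


# Hypothetical flaky test: fails on the first attempt, passes afterwards,
# similar to a flash command that hits a transient USB error once.
calls = {"n": 0}


def flaky_test():
    calls["n"] += 1
    return calls["n"] > 1


attempts = run_with_retries(flaky_test, retries=3)
print(attempts)
```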

@nashif
Member

nashif commented Dec 4, 2019

Also, please check this out: #21096 , it now does better error reporting when flashing.

@chen-png
Collaborator Author

chen-png commented Dec 5, 2019

@nashif thanks, I will give it a try.

@chen-png
Collaborator Author

chen-png commented Dec 5, 2019

I set --retry-failed to 2 and ran all tests on reel_board and frdm_k64f. The number of failed tests dropped from 22 to 5, and finally to 4, which matches the expected result.

@nashif
Member

nashif commented Dec 5, 2019

@chen-png so are we good?

@chen-png
Collaborator Author

chen-png commented Dec 6, 2019

yeah, for this issue, it's ok.

@chen-png
Collaborator Author

chen-png commented Dec 6, 2019

But I have a question: how do you decide whether a runner can be run in parallel?

I found that --generate-hardware-map is only used for device testing; it just uses the manufacturer and product fields to decide which boards can be detected.

The actual parallel running uses concurrent.futures.ThreadPoolExecutor. It seems that if I add all boards to the map file, sanitycheck puts all tests for those boards into one queue for the thread pool to execute, which does not seem to depend on the earlier device detection.
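
The queueing behavior described above can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is a simplified model, not the sanitycheck source: the test names, `run_on_device`, and the per-board lock are assumptions added for illustration, and the flash-and-run step is replaced by a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

BOARDS = ("reel_board", "frdm_k64f")

# Hypothetical work list: four tests per board, all placed in one queue.
tests = [(board, f"test_{i}") for board in BOARDS for i in range(4)]

# Assumption: a physical board can only run one test at a time, so each
# board gets its own lock while tests on different boards may overlap.
board_locks = {board: threading.Lock() for board in BOARDS}
results = []
results_lock = threading.Lock()


def run_on_device(board, test):
    with board_locks[board]:
        outcome = (board, test, "passed")  # placeholder for flash + run
    with results_lock:
        results.append(outcome)
    return outcome


with ThreadPoolExecutor(max_workers=len(BOARDS)) as pool:
    futures = [pool.submit(run_on_device, b, t) for b, t in tests]
    done = [f.result() for f in futures]

print(len(done))
```

With a single shared pool like this, whether two boards can safely be flashed at the same time depends on the flashing tool rather than on the pool itself.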

@nashif
Member

nashif commented Dec 18, 2019

but I have a question, how do you decide whether a runner can be run in parallel?

Not all devices can be run in parallel. There is a note in the documentation that lists which runners and board types are supported. If a board is not supported for parallel runs, it needs to be singled out and tested on its own.

@nashif nashif closed this as completed Dec 18, 2019
@chen-png
Collaborator Author

Not all devices can be run in parallel. There is a note in the documentation which mentions

Do you mean this note: "note: not all types of boards are supported yet, pyocd and nrfjprog flashable devices work fine", or is there a more detailed document about this?
