Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cypress: FPGA: UART random failure #11860

Closed
yarbcy opened this issue Nov 13, 2019 · 23 comments
Closed

Cypress: FPGA: UART random failure #11860

yarbcy opened this issue Nov 13, 2019 · 23 comments

Comments

@yarbcy
Copy link
Contributor

yarbcy commented Nov 13, 2019

Description of defect

Cypress: FPGA: UART random failure

Step to reproduce:

  1. Run UART tests
  2. Comment all test in mbed_hal_fpga_ci_test_shield/uart/main.cpp, uncomment any test for example line 313.
  3. Comment line 252 - 271
  4. Run test (repeate couple times)

Expected: PASS

Actual:
tester.reset() returned:

++ MbedOS Error Info ++
[1573648317.62][CONN][RXD] Error Status: 0x80010133 Code: 307 Module: 1
[1573648317.68][CONN][RXD] Error Message: Mutex: 0x8003B14, Not allowed in ISR context
[1573648317.70][CONN][RXD] Location: 0x100095AD
[1573648317.73][CONN][RXD] Error Value: 0x8003B14
[1573648317.84][CONN][RXD] Current Thread: main Id: 0x8003B38 Entry: 0x10009477 StackSize: 0x1000 StackMem: 0x8004A18 SP: 0x80055D4
[1573648317.94][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80010133&tgt=CY8CKIT_062_WIFI_BT

This test PASSED if comment line 251.

Target(s) affected by this defect ?

All Cypress boards. Tested on CY8CKIT_062_WIFI_BT

Toolchain(s) (name and version) displaying this defect ?

Tested on GCC_ARM

What version of Mbed-os are you using (tag or sha) ?

Latest from main.
f1848f9

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

N/A

How is this defect reproduced ?

50/50

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 13, 2019

@0xc0170 @mprse

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 13, 2019

Latest from main.

please report SHA, helps to get fixed version, edit the above to contain SHA.

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 13, 2019

Latest from main.

please report SHA, helps to get fixed version, edit the above to contain SHA.

Done. Thanks.

@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/MBOTRIAGE-2315

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 15, 2019

@mprse Please take a llok if you have time.

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 15, 2019

@mprse I see this exception from tester (FPGA) side.

@mprse
Copy link
Contributor

mprse commented Nov 15, 2019

So if you run only one test case (any) it fails randomly?

Comment line 252 - 271

What does it mean? What is the purpose of removing these lines:

core_util_critical_section_exit();
tester.rx_start();
tester.tx_start(use_flow_control);
while (core_util_atomic_load_u32(&td.rx_cnt) != GETC_REPS) {
// Wait until the last byte is received to UART RX reg.
};
core_util_critical_section_enter();
serial_irq_set(&serial, RxIrq, 0);
core_util_critical_section_exit();
tester.tx_stop();
tester.rx_stop();
for (int i = 0; i < GETC_REPS; tester_buff++, i++) {
TEST_ASSERT_EQUAL(tester_buff, td.rx_buff[i]);
}
// Make sure TX IRQ was disabled during the last RX test.
TEST_ASSERT_EQUAL_UINT32(checksum, tester.rx_get_checksum());
TEST_ASSERT_EQUAL_UINT32(2 * PUTC_REPS, tester.rx_get_count());
// Cleanup

Mutex: 0x8003B14, Not allowed in ISR context
This test PASSED if comment line 251:

serial_irq_set(&serial, RxIrq, 1);

It looks like something bad happens inside serial interrupt handler - some disallowed operation and we got an exception. serial_irq_set() enables serial interrupts, so if you remove this line probably interrupt mode is disabled, so you don't have the exception.

@fkjagodzinski have you seen such behavior?

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 15, 2019

@mprse

So if you run only one test case (any) it fails randomly?

Yes.

Comment line 252 - 271
What does it mean? What is the purpose of removing these lines:

It is not important. I commented because I wanted to isolate the problem.

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 18, 2019

@fkjagodzinski ?

@fkjagodzinski
Copy link
Member

3. Comment line 252 - 271

++ MbedOS Error Info ++
[1573648317.62][CONN][RXD] Error Status: 0x80010133 Code: 307 Module: 1
[1573648317.68][CONN][RXD] Error Message: Mutex: 0x8003B14, Not allowed in ISR context
[1573648317.70][CONN][RXD] Location: 0x100095AD
[1573648317.73][CONN][RXD] Error Value: 0x8003B14
[1573648317.84][CONN][RXD] Current Thread: main Id: 0x8003B38 Entry: 0x10009477 StackSize: 0x1000 StackMem: 0x8004A18 SP: 0x80055D4
[1573648317.94][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80010133&tgt=CY8CKIT_062_WIFI_BT

This looks suspicious. On current master (and also the SHA you mentioned -- f1848f9) you should not remove line 252 alone; you should also remove 249 & 251.

core_util_critical_section_enter();
// Enable only the RX IRQ.
serial_irq_set(&serial, RxIrq, 1);
core_util_critical_section_exit();

I do not have any Cypress target with an Arduino form factor to test with FPGA shield but I can see a similar crash on K64F if I comment out 252-271:

[1574070560.88][CONN][RXD] ++ MbedOS Error Info ++
[1574070560.92][CONN][RXD] Error Status: 0x80010133 Code: 307 Module: 1
[1574070560.98][CONN][RXD] Error Message: Mutex: 0x2000113C, Not allowed in ISR context
[1574070560.99][CONN][RXD] Location: 0x6817
[1574070561.03][CONN][RXD] Error Value: 0x2000113C
[1574070561.13][CONN][RXD] Current Thread: main Id: 0x20001160 Entry: 0x6747 StackSize: 0x1000 StackMem: 0x200022B0 SP: 0x20002F50
[1574070561.21][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80010133&tgt=K64F
[1574070561.23][CONN][RXD] -- MbedOS Error Info --

If you like to skip DUT RX (IRQ) / FPGA TX part of the test, I suggest commenting out all the related code (lines 239-269):

// DUT RX (IRQ) / FPGA TX
// serial_getc() may return 16-bit as well as 8-bit value cast to an int.
// Use a random initial value, but make sure it is low enouth,
// so the FPGA will not overflow 8 bits when incrementing it.
tester_buff = rand() % ((1 << test_buff_bits) - GETC_REPS);
tester.tx_set_next(tester_buff);
tester.tx_set_count(GETC_REPS);
if (!use_flow_control) {
tester.tx_set_delay(TX_START_DELAY_NS);
}
core_util_critical_section_enter();
// Enable only the RX IRQ.
serial_irq_set(&serial, RxIrq, 1);
core_util_critical_section_exit();
tester.rx_start();
tester.tx_start(use_flow_control);
while (core_util_atomic_load_u32(&td.rx_cnt) != GETC_REPS) {
// Wait until the last byte is received to UART RX reg.
};
core_util_critical_section_enter();
serial_irq_set(&serial, RxIrq, 0);
core_util_critical_section_exit();
tester.tx_stop();
tester.rx_stop();
for (int i = 0; i < GETC_REPS; tester_buff++, i++) {
TEST_ASSERT_EQUAL(tester_buff, td.rx_buff[i]);
}
// Make sure TX IRQ was disabled during the last RX test.
TEST_ASSERT_EQUAL_UINT32(checksum, tester.rx_get_checksum());
TEST_ASSERT_EQUAL_UINT32(2 * PUTC_REPS, tester.rx_get_count());

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 18, 2019

@mprse @0xc0170 Please take a look @fkjagodzinski can reproduce this issue:

I do not have any Cypress target with an Arduino form factor to test with FPGA shield but I can see a similar crash on K64F if I comment out 252-271:

@mprse
Copy link
Contributor

mprse commented Nov 19, 2019

I think that we are missing the point here. @fkjagodzinski reproduced your failure on K64F by removing some random part of the test that you mentioned. Original test version passes on the K64F.

I think that we should start again since I think that no one understands now where or even if we have a problem.

Please provide full test output with the failure (without modifying the test).

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 19, 2019

I think that we are missing the point here. @fkjagodzinski reproduced your failure on K64F by removing some random part of the test that you mentioned. Original test version passes on the K64F.

I think that we should start again since I think that no one understands now where or even if we have a problem.

Please provide full test output with the failure (without modifying the test).

OK.

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 19, 2019

I think that we are missing the point here. @fkjagodzinski reproduced your failure on K64F by removing some random part of the test that you mentioned. Original test version passes on the K64F.

I think that we should start again since I think that no one understands now where or even if we have a problem.

Please provide full test output with the failure (without modifying the test).

@mprse Please see test output (without modifying the test):

[1574157839.82][HTST][INF] host test executor ver. 0.0.9
[1574157839.82][HTST][INF] copy image onto target... SKIPPED!
[1574157839.82][HTST][INF] starting host test process...
[1574157839.83][CONN][INF] starting connection process...
[1574157839.83][CONN][INF] notify event queue about extra 60 sec timeout for serial port pooling
[1574157839.83][CONN][INF] initializing serial port listener...
[1574157839.83][SERI][INF] serial(port=/dev/tty.usbmodem14344302, baudrate=9600, read_timeout=0.01, write_timeout=5)
[1574157839.83][HTST][INF] setting timeout to: 60 sec
[1574157839.83][SERI][TXD] mbedmbedmbedmbedmbedmbedmbedmbedmbedmbed
[1574157839.83][CONN][INF] sending up to 2 __sync packets (specified with --sync=2)
[1574157839.83][CONN][INF] sending preamble '623e60f4-5280-4229-a350-36c6211c7437'
[1574157839.83][SERI][TXD] {{__sync;623e60f4-5280-4229-a350-36c6211c7437}}
[1574157839.96][CONN][RXD] mbedmbedmbedmbedmbedmbedmbedmbed
[1574157840.01][CONN][INF] found SYNC in stream: {{__sync;623e60f4-5280-4229-a350-36c6211c7437}} it is #0 sent, queued...
[1574157840.02][HTST][INF] sync KV found, uuid=623e60f4-5280-4229-a350-36c6211c7437, timestamp=1574157840.014832
[1574157840.04][CONN][INF] found KV pair in stream: {{__version;1.3.0}}, queued...
[1574157840.04][HTST][INF] DUT greentea-client version: 1.3.0
[1574157840.06][CONN][INF] found KV pair in stream: {{__timeout;240}}, queued...
[1574157840.06][HTST][INF] setting timeout to: 240 sec
[1574157840.09][CONN][INF] found KV pair in stream: {{__host_test_name;default_auto}}, queued...
[1574157840.09][HTST][INF] host test class: '<class 'mbed_os_tools.test.host_tests.default_auto.DefaultAuto'>'
[1574157840.09][HTST][INF] host test setup() call...
[1574157840.09][HTST][INF] CALLBACKs updated
[1574157840.09][HTST][INF] host test detected: default_auto
[1574157840.11][CONN][INF] found KV pair in stream: {{__testcase_count;1}}, queued...
[1574157840.14][CONN][RXD] >>> Running 1 test cases...
[1574157840.20][CONN][RXD]
[1574157840.20][CONN][INF] found KV pair in stream: {{__testcase_name;basic, 9600, 8N1, FC off}}, queued...
[1574157840.25][CONN][RXD] >>> Running case #1: 'basic, 9600, 8N1, FC off'...
[1574157840.29][CONN][INF] found KV pair in stream: {{__testcase_start;basic, 9600, 8N1, FC off}}, queued...
[1574157840.32][CONN][RXD] Skipping TX pin D1 (327681)
[1574157840.35][CONN][RXD] Skipping TX pin D14 (393217)
[1574157840.39][CONN][RXD] Skipping TX pin D12 (786433)
[1574157840.42][CONN][RXD] Skipping RX pin D0 (327680)
[1574157840.45][CONN][RXD] Skipping RX pin D15 (393216)
[1574157840.48][CONN][RXD] Skipping RX pin D11 (786432)
[1574157840.53][CONN][RXD] Testing all UART-no-FC peripherals
[1574157840.58][CONN][RXD] 1080098816 - Could not find pins to test peripheral
[1574157840.67][CONN][RXD] 1080164352 - peripheral tested on port: peripheral=(1080164352) TX=(A1) RX=(A0) ...succeeded

No responce...

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 20, 2019

@mprse This test expects data from tester (FPGA):
Last main.cpp call - line 251 (mbed_hal_fpga_ci_test_shield/uart/main.cpp)

RemoteDesktopManager64_NgpXNJVWwo

@mprse
Copy link
Contributor

mprse commented Nov 21, 2019

I'm on sick leave. Will chcek later.

So the last call in the test is enabling the rx interrupt. And then you land in cyhal_uart_getc()? It looks like the rx interrupt fired and it tries to get data from uart buffer... This is probably some issue if getc is called immediately after enabling rx interrupt.

@fkjagodzinski
Copy link
Member

Good point @mprse. If the

static void test_irq_handler(uint32_t id, SerialIrq event)
is called immediately after
serial_irq_set(&serial, RxIrq, 1);
that means the source of RX interrupt is different than a transmission from the FPGA shield. FPGA does not start sending data until
tester.tx_start(use_flow_control);

@yarbcy I suggest looking into

void serial_irq_set(serial_t *obj, SerialIrq irq, uint32_t enable)
and
static void serial_handler_internal(void *handler_arg, cyhal_uart_event_t event)

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 22, 2019

@mprse Get well. @fkjagodzinski

This test consist of 4 test cases:

  1. // DUT TX / FPGA RX
  2. // DUT RX / FPGA TX
  3. // DUT TX (IRQ) / FPGA RX
  4. // DUT RX (IRQ) / FPGA TX

I tried Run 4) separatelly. But it stucked on line 255.
If run 3) and 4) test some time passed because it used tester data from 3).

It looks like test issue.

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 22, 2019

Issue is not reproducible with code change:

1111

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 25, 2019

Issue is not reproducible with code change:

I don't see what is changed. Can you rather include git diff ?

@yarbcy
Copy link
Contributor Author

yarbcy commented Nov 25, 2019

1111

@mprse
Copy link
Contributor

mprse commented Nov 29, 2019

Hello again. According to the defined behavior of serial_irq_set():

mbed-os/hal/serial_api.h

Lines 141 to 145 in 7177d8f

* * ::serial_irq_set enables or disables the serial RX or TX IRQ.
* * If `RxIrq` is enabled by ::serial_irq_set, ::serial_irq_handler will be invoked whenever
* Receive Data Register Full IRQ is generated.
* * If `TxIrq` is enabled by ::serial_irq_set, ::serial_irq_handler will be invoked whenever
* Transmit Data Register Empty IRQ is generated.

In the test, we are enabling RX interrupt which should fire when Receive Data Register is full, however it looks like the interrupt is triggered just after enabling it (where no data should be received). This means that for sure this is a Cypress driver issue.

In the following line we can see that the following interrupts are enabled (CYHAL_UART_IRQ_RX_DONE | CYHAL_UART_IRQ_RX_ERROR | CYHAL_UART_IRQ_RX_NOT_EMPTY):

static const cyhal_uart_event_t interrupt_mask = (cyhal_uart_event_t)
(CYHAL_UART_IRQ_RX_DONE | CYHAL_UART_IRQ_RX_ERROR | CYHAL_UART_IRQ_RX_NOT_EMPTY);

I suggest leaving only CYHAL_UART_IRQ_RX_NOT_EMPTY. That would be consistent with the defined behavior. Maybe CYHAL_UART_IRQ_RX_DONE | CYHAL_UART_IRQ_RX_ERROR are causing that interrupt is fired immediately after enabling when no data is received yet.
Alternatively, try to flush RX buffer before enabling the RX interrupt.

@yarbcy
Copy link
Contributor Author

yarbcy commented Dec 9, 2019

Created #12052

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants