Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add core_mutex_debug to aid debugging deadlocks #18620

Merged
merged 1 commit into from
Apr 25, 2023

Conversation

maribu
Copy link
Member

@maribu maribu commented Sep 21, 2022

Contribution description

Adding USEMODULE += core_mutex_debug to your Makefile results in
on log messages such as

[mutex] waiting for thread 1 (pc = 0x800024d)

being added whenever mutex_lock() blocks. This makes tracing down
deadlocks easier.

Testing procedure

Run e.g.

USEMODULE=core_mutex_debug BOARD=nucleo-f767zi make -C tests/mutex_cancel flash test

which should provide output such as

Welcome to pyterm!
Type '/exit' to exit.
READY
s
[mutex] waiting for thread 1 (pc = 0x8000f35)
START
main(): This is RIOT! (Version: 2022.10-devel-841-g5cc02-core/mutex/debug)
Test Application for mutex_cancel / mutex_lock_cancelable
=========================================================

Test without cancellation: OK
Test early cancellation: OK
Verify no side effects on subsequent calls: [mutex] waiting for thread 1 (pc = 0x800024d)
OK
Test late cancellation: [mutex] waiting for thread 1 (pc = 0x0)
OK
TEST PASSED
$ arm-none-eabi-addr2line -a 0x800024d -e tests/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf 
0x0800024d
/home/maribu/Repos/software/RIOT/tests/mutex_cancel/main.c:51

Issues/PRs references

Depends on and includes #18619

@maribu maribu added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: core Area: RIOT kernel. Handle PRs marked with this with care! labels Sep 21, 2022
@github-actions github-actions bot added the Area: Kconfig Area: Kconfig integration label Sep 21, 2022
core/include/mutex.h Outdated Show resolved Hide resolved
core/mutex.c Outdated Show resolved Hide resolved
core/mutex.c Outdated Show resolved Hide resolved
core/mutex.c Show resolved Hide resolved
core/mutex.c Show resolved Hide resolved
@benpicco
Copy link
Contributor

benpicco commented Oct 13, 2022

Nice idea!
One question: How is pc = 0x0 to be interpreted?
e.g.

START
main(): This is RIOT! (Version: 2022.10-devel-1017-g528b4-HEAD)
[mutex] waiting for thread 2 (pc = 0x804d1b1)
[mutex] waiting for thread 2 (pc = 0x804d1b1)
[mutex] waiting for thread 2 (pc = 0x804d1b1)
[mutex] waiting for thread 3 (pc = 0x804d8be)
[mutex] waiting for thread 3 (pc = 0x0)
SUCCESS

@maribu
Copy link
Member Author

maribu commented Oct 13, 2022

One question: How is pc = 0x0 to be interpreted?

Either MUTEX_INIT_LOCKED was used or something is really wrong :)

@@ -194,6 +211,9 @@ void mutex_unlock(mutex_t *mutex)
sched_change_priority(owner, mutex->owner_original_priority);
}
#endif
#if IS_USED(MODULE_CORE_MUTEX_DEBUG)
mutex->owner_calling_pc = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no - if another thread was waiting on the mutex, it will be locked again here, but we don't have the PC of that one.

Not sure if

Suggested change
mutex->owner_calling_pc = 0;
mutex->owner_calling_pc = cpu_get_caller_pc();

would be any better of if we should just document that. (Or can we get the PC of the blocked thread?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. There is still a brief time window between clearing this and when the new owner updates this in which theoretically a higher priority thread could wake up and try to take the mutex and find a value of 0 here, but that should barely ever happen. I think for a debug print this is good enough and the issue is now documented.

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 25, 2023
@riot-ci
Copy link

riot-ci commented Apr 25, 2023

Murdock results

✔️ PASSED

3e86e39 core: add core_mutex_debug to aid debugging deadlocks

Success Failures Total Runtime
6919 0 6919 10m:21s

Artifacts

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's certainly a handy addition.

@maribu
Copy link
Member Author

maribu commented Apr 25, 2023

Whoopsie, I forgot to push the fix and doc update I claimed I added in the comment above.

Adding `USEMODULE += core_mutex_debug` to your `Makefile` results in
on log messages such as

    [mutex] waiting for thread 1 (pc = 0x800024d)

being added whenever `mutex_lock()` blocks. This makes tracing down
deadlocks easier.
@benpicco
Copy link
Contributor

bors merge

bors bot added a commit that referenced this pull request Apr 25, 2023
18620: core: add core_mutex_debug to aid debugging deadlocks r=benpicco a=maribu

### Contribution description

Adding `USEMODULE += core_mutex_debug` to your `Makefile` results in
on log messages such as

    [mutex] waiting for thread 1 (pc = 0x800024d)

being added whenever `mutex_lock()` blocks. This makes tracing down
deadlocks easier.

### Testing procedure

Run e.g.

```sh
USEMODULE=core_mutex_debug BOARD=nucleo-f767zi make -C tests/mutex_cancel flash test
```

which should provide output such as

```
Welcome to pyterm!
Type '/exit' to exit.
READY
s
[mutex] waiting for thread 1 (pc = 0x8000f35)
START
main(): This is RIOT! (Version: 2022.10-devel-841-g5cc02-core/mutex/debug)
Test Application for mutex_cancel / mutex_lock_cancelable
=========================================================

Test without cancellation: OK
Test early cancellation: OK
Verify no side effects on subsequent calls: [mutex] waiting for thread 1 (pc = 0x800024d)
OK
Test late cancellation: [mutex] waiting for thread 1 (pc = 0x0)
OK
TEST PASSED
```

```sh
$ arm-none-eabi-addr2line -a 0x800024d -e tests/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf 
0x0800024d
/home/maribu/Repos/software/RIOT/tests/mutex_cancel/main.c:51
```

### Issues/PRs references

Depends on and includes #18619

19296: nanocoap: allow to define CoAP resources as XFA r=benpicco a=benpicco



19504: cpu/cc26xx_cc13xx: Fix bogus array-bound warning r=benpicco a=maribu

### Contribution description

GCC 12 create a bogus array out of bounds warning as it assumes that because there is special handling for `uart == 0` and `uart == 1`, `uart` can indeed be `1`. There is an `assert(uart < UART_NUMOF)` above that would blow up prior to any out of bounds access.

In any case, optimizing out the special handling of `uart == 1` for when `UART_NUMOF == 1` likely improves the generated code and fixes the warning.

    /home/maribu/Repos/software/RIOT/cc2650/cpu/cc26xx_cc13xx/periph/uart.c:88:8: error: array subscript 1 is above array bounds of 'uart_isr_ctx_t[1]' [-Werror=array-bounds]
       88 |     ctx[uart].rx_cb = rx_cb;
          |     ~~~^~~~~~
    /home/maribu/Repos/software/RIOT/cc2650/cpu/cc26xx_cc13xx/periph/uart.c:52:23: note: while referencing 'ctx'
       52 | static uart_isr_ctx_t ctx[UART_NUMOF];
          |                       ^~~
    /home/maribu/Repos/software/RIOT/cc2650/cpu/cc26xx_cc13xx/periph/uart.c:89:8: error: array subscript 1 is above array bounds of 'uart_isr_ctx_t[1]' [-Werror=array-bounds]
       89 |     ctx[uart].arg = arg;
          |     ~~~^~~~~~
    /home/maribu/Repos/software/RIOT/cc2650/cpu/cc26xx_cc13xx/periph/uart.c:52:23: note: while referencing 'ctx'
       52 | static uart_isr_ctx_t ctx[UART_NUMOF];
          |                       ^~~

### Testing procedure

The actual change is a pretty obvious one-liner, so that code review and a green CI should be sufficient. If not, running any UART example app without regression should do.

### Issues/PRs references

None

19506: tools/openocd: Fix handling of OPENOCD_CMD_RESET_HALT r=maribu a=maribu

### Contribution description

The OPENOCD_CMD_RESET_HALT was not longer correctly passed to the script. This fixes the issue.

### Testing procedure

Flashing of e.g. the `cc2650-launchpad` with upstream OpenOCD should work again.

### Issues/PRs references

The change was added to #19050 after testing the PR and before merging. I'm not sure if the fix never worked because of this, or if behavior of `target-export-variables` or GNU Make changed.

19507: cpu/cc26x0_cc13x0: Drop feature cortexm_mpu r=benpicco a=maribu

### Contribution description

At least the CC2650 doesn't have an MPU, I assume this is also true for the rest of the family.

The CC2652 does have an MPU according to the datasheet. So I keep the feature there in place.

### Testing procedure

E.g.

```
make BOARD=cc2650-launchpad -C tests/mpu_noexec_ram flash test
```

fails. (Note: A successful test run would also crash but with a mem manage handler rather than a hardfault due to an invalid instruction on the stack being executed.)

It would be nice to also test the same for a `cc2652-launchpad`, for which the MPU should work.

### Issues/PRs references

None

Co-authored-by: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
Co-authored-by: Benjamin Valentin <benjamin.valentin@bht-berlin.de>
Co-authored-by: Benjamin Valentin <benjamin.valentin@ml-pa.com>
@bors
Copy link
Contributor

bors bot commented Apr 25, 2023

Build failed (retrying...):

bors bot added a commit that referenced this pull request Apr 25, 2023
18620: core: add core_mutex_debug to aid debugging deadlocks r=benpicco a=maribu

### Contribution description

Adding `USEMODULE += core_mutex_debug` to your `Makefile` results in
on log messages such as

    [mutex] waiting for thread 1 (pc = 0x800024d)

being added whenever `mutex_lock()` blocks. This makes tracing down
deadlocks easier.

### Testing procedure

Run e.g.

```sh
USEMODULE=core_mutex_debug BOARD=nucleo-f767zi make -C tests/mutex_cancel flash test
```

which should provide output such as

```
Welcome to pyterm!
Type '/exit' to exit.
READY
s
[mutex] waiting for thread 1 (pc = 0x8000f35)
START
main(): This is RIOT! (Version: 2022.10-devel-841-g5cc02-core/mutex/debug)
Test Application for mutex_cancel / mutex_lock_cancelable
=========================================================

Test without cancellation: OK
Test early cancellation: OK
Verify no side effects on subsequent calls: [mutex] waiting for thread 1 (pc = 0x800024d)
OK
Test late cancellation: [mutex] waiting for thread 1 (pc = 0x0)
OK
TEST PASSED
```

```sh
$ arm-none-eabi-addr2line -a 0x800024d -e tests/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf 
0x0800024d
/home/maribu/Repos/software/RIOT/tests/mutex_cancel/main.c:51
```

### Issues/PRs references

Depends on and includes #18619

19296: nanocoap: allow to define CoAP resources as XFA r=benpicco a=benpicco



Co-authored-by: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
Co-authored-by: Benjamin Valentin <benjamin.valentin@bht-berlin.de>
Co-authored-by: Benjamin Valentin <benjamin.valentin@ml-pa.com>
@maribu
Copy link
Member Author

maribu commented Apr 25, 2023

bors r-

@bors
Copy link
Contributor

bors bot commented Apr 25, 2023

Canceled.

@maribu
Copy link
Member Author

maribu commented Apr 25, 2023

bors merge

bors bot added a commit that referenced this pull request Apr 25, 2023
18620: core: add core_mutex_debug to aid debugging deadlocks r=maribu a=maribu

### Contribution description

Adding `USEMODULE += core_mutex_debug` to your `Makefile` results in
on log messages such as

    [mutex] waiting for thread 1 (pc = 0x800024d)

being added whenever `mutex_lock()` blocks. This makes tracing down
deadlocks easier.

### Testing procedure

Run e.g.

```sh
USEMODULE=core_mutex_debug BOARD=nucleo-f767zi make -C tests/mutex_cancel flash test
```

which should provide output such as

```
Welcome to pyterm!
Type '/exit' to exit.
READY
s
[mutex] waiting for thread 1 (pc = 0x8000f35)
START
main(): This is RIOT! (Version: 2022.10-devel-841-g5cc02-core/mutex/debug)
Test Application for mutex_cancel / mutex_lock_cancelable
=========================================================

Test without cancellation: OK
Test early cancellation: OK
Verify no side effects on subsequent calls: [mutex] waiting for thread 1 (pc = 0x800024d)
OK
Test late cancellation: [mutex] waiting for thread 1 (pc = 0x0)
OK
TEST PASSED
```

```sh
$ arm-none-eabi-addr2line -a 0x800024d -e tests/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf 
0x0800024d
/home/maribu/Repos/software/RIOT/tests/mutex_cancel/main.c:51
```

### Issues/PRs references

Depends on and includes #18619

Co-authored-by: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
@maribu
Copy link
Member Author

maribu commented Apr 25, 2023

bors r-
bors r+

@bors
Copy link
Contributor

bors bot commented Apr 25, 2023

Canceled.

@bors
Copy link
Contributor

bors bot commented Apr 25, 2023

Build succeeded:

@bors bors bot merged commit 46af92d into RIOT-OS:master Apr 25, 2023
@maribu maribu deleted the core/mutex/debug branch July 3, 2023 13:22
@benpicco benpicco added this to the Release 2023.07 milestone Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: Kconfig Area: Kconfig integration CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants