otbr-agent received signal SIGSEGV, Segmentation fault. (PriorityQueue) #2475

Open
jinpeng1989 opened this issue Sep 5, 2024 · 11 comments


@jinpeng1989

jinpeng1989 commented Sep 5, 2024

Describe the bug:
The otbr-agent process crashed, and debugging with GDB showed the fault occurring near the PriorityQueue code.
The ot-br-posix code used is: https://github.com/SiliconLabs/simplicity_sdk/tree/v2024.6.1-0/util/third_party/ot-br-posix
Release note: https://github.com/SiliconLabs/simplicity_sdk/releases/tag/v2024.6.1-0
[Screenshot]

@jinpeng1989
Author

The crash occurs here; please see the attached log:
SyslogCatchAll-2024-08-28-1-and-2.zip
[Screenshot]

@jwhui
Member

jwhui commented Sep 5, 2024

Are you able to reference a specific GitHub commit in an OpenThread repo? I did look at the Simplicity SDK link you provided above, but it wasn't obvious which OpenThread repo commit it was using.

Can you provide more details on the specific test scenario so that others can reproduce this issue?

@abtink
Member

abtink commented Sep 5, 2024

Thanks for reporting this.

@jwhui and I investigated this and found a potential cause for this situation.

This scenario can occur when IPv6 fragmentation is enabled and utilized. Could you confirm whether you have OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE enabled in your project?

Brief description of the issue:
- A message using IPv6 fragmentation can be placed in Ip6::mReassemblyList even if it's also marked for transmission to the Thread mesh.
- This can lead to the message being included in two separate queues, which is not allowed and triggers the assert.
- I'll submit a PR later to address this.

Please ignore my earlier comment. On further investigation, there is no issue here: a clone of the message is allocated and added to Ip6::mReassemblyList, so the original message is not placed in two queues.
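The correction hinges on the reassembly path working on a copy, so that no single message object sits in two lists at once. Below is a minimal model of that idea, assuming a message tracks the one list that holds it. The `Message`, `MessageList`, and `Clone()` names are hypothetical and do not reproduce OpenThread's `Message` or `Ip6` classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model, not OpenThread code: a message remembers the one list holding it.
struct Message {
    const void                *mOwner = nullptr;  // list currently holding this message, or nullptr
    std::vector<unsigned char> mBytes;            // payload (contents are irrelevant to the invariant)

    // A clone copies the payload but starts out in no list (mOwner stays nullptr).
    std::unique_ptr<Message> Clone() const {
        auto copy    = std::make_unique<Message>();
        copy->mBytes = mBytes;
        return copy;
    }
};

// Hypothetical list that enforces "at most one list per message".
struct MessageList {
    std::vector<Message *> mItems;

    void Add(Message &aMessage) {
        assert(aMessage.mOwner == nullptr);  // already in another list: not allowed
        aMessage.mOwner = this;
        mItems.push_back(&aMessage);
    }
};
```

Adding `original` to both lists would trip the assert in `Add()`; putting the clone in the reassembly list and the original in the send queue keeps each object in exactly one list.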

@jinpeng1989
Author

The Simplicity SDK release notes identify the upstream code that is used:
The Silicon Labs OpenThread SDK includes all changes from the OpenThread GitHub repo (https://github.com/openthread/openthread) up to and including commit 1fceb225b.
The Silicon Labs OpenThread SDK includes all changes from the OpenThread border router GitHub repo (https://github.com/openthread/ot-br-posix) up to and including commit e56c020.
https://www.silabs.com/documents/public/release-notes/open-thread-release-notes-2.5.1.0.pdf
[Screenshot]

@jwhui
Member

jwhui commented Sep 6, 2024

Can you provide more information about your HW setup? Are you running this on a Raspberry Pi?

This is the first time we've seen this bug reported, so just trying to understand if there's an issue related to your specific setup.

@jinpeng1989
Author

We discovered the issue during a system test involving five device models: three are SEDs, one is a TBR, and one is a REED. The system consists of 1 TBR + 16 TME + 84 SED. However, a large Thread network does not appear to be a necessary condition for this issue: the otbr-agent crash has also been observed in small systems. One notable feature of our setup is that both the diagnostics and mesh diagnostics interfaces are accessed.

@jinpeng1989
Author

In our system, ot-br-posix runs on OpenWrt. This solution has been in use for two or three years. The ot-br-posix code was recently updated to add the mesh diagnostic feature, and many issues occur frequently on this version.
[Screenshot]

@jwhui
Member

jwhui commented Sep 6, 2024

From the stack trace in #2475 (comment), it appears that this assert is getting triggered:

https://github.com/openthread/openthread/blob/4459c54069bb8573579aa4e84c3c6cb6ea82b1cf/src/core/common/message.cpp#L901-L902

However, the first thing that HandleSendQueue() does is call Dequeue(), which does this:

https://github.com/openthread/openthread/blob/4459c54069bb8573579aa4e84c3c6cb6ea82b1cf/src/core/common/message.cpp#L948-L951

So it's not clear yet why the asserts are failing.
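To make the invariant under discussion concrete, here is a minimal C++ model. It assumes, as in the earlier comment about a message ending up in two queues, that the failing assert checks that a message is not already in a queue when it is enqueued. `MessageQueue`, `MoveHead`, and the field names are hypothetical; this is not OpenThread's `PriorityQueue`, and it ignores priorities entirely.

```cpp
#include <cassert>

class MessageQueue;

// Hypothetical message: records the single queue that currently holds it.
struct Message {
    MessageQueue *mQueue = nullptr;  // owning queue, or nullptr if in no queue
    Message      *mNext  = nullptr;  // intrusive link to the next message
};

// Hypothetical FIFO queue (no priorities) enforcing "at most one queue per message".
class MessageQueue {
public:
    void Enqueue(Message &aMessage) {
        assert(aMessage.mQueue == nullptr);  // still queued elsewhere: a bug
        aMessage.mQueue = this;
        aMessage.mNext  = nullptr;
        if (mTail == nullptr) {
            mHead = &aMessage;
        } else {
            mTail->mNext = &aMessage;
        }
        mTail = &aMessage;
    }

    void Dequeue(Message &aMessage) {
        assert(aMessage.mQueue == this);  // must be removed from the queue that holds it
        Message  *prev = nullptr;
        Message **link = &mHead;
        while (*link != &aMessage) {  // terminates because aMessage is in this queue
            prev = *link;
            link = &prev->mNext;
        }
        *link = aMessage.mNext;  // unlink
        if (mTail == &aMessage) {
            mTail = prev;
        }
        aMessage.mNext  = nullptr;
        aMessage.mQueue = nullptr;  // now free to be enqueued again
    }

    Message *GetHead() const { return mHead; }

private:
    Message *mHead = nullptr;
    Message *mTail = nullptr;
};

// Hypothetical send handler: take the head, Dequeue() it first, then hand it on.
void MoveHead(MessageQueue &aFrom, MessageQueue &aTo) {
    Message *message = aFrom.GetHead();
    if (message == nullptr) {
        return;
    }
    aFrom.Dequeue(*message);
    aTo.Enqueue(*message);  // cannot trip the assert: Dequeue() cleared mQueue
}
```

In this model, once `Dequeue()` has run the message carries no queue pointer, so the enqueue assert can only fire if some other path adds the same message without a matching `Dequeue()`, or if the message's queue bookkeeping is otherwise out of sync.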

@jinpeng1989
Author

This issue occurs frequently in our test environment; we have observed it at least once in five days. What else can we do to analyze it further?

@jwhui
Member

jwhui commented Sep 6, 2024


If possible, you can help analyze the code path identified in #2475 (comment) and determine where the assert conditions are no longer true.

@abtink
Member

abtink commented Sep 6, 2024

I would suggest checking whether or not OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE is enabled on your build.

If it is enabled, it would be good to see if you can disable it and test again (this would give a clue whether the fragmentation logic may be impacting this).
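One way to confirm the setting is to let the compiler report it. The sketch below assumes you compile it with the same include paths and `-D` flags as otbr-agent, after adding an `#include` of whichever OpenThread core config header your build uses (not shown, since that depends on the Silicon Labs build setup). Reporting at compile time avoids having to run anything on the OpenWrt target. `kIp6FragmentationSetting` is a hypothetical name.

```cpp
// Assumption: add an #include of your build's OpenThread core config header here,
// and compile with the same include paths and -D flags as otbr-agent.

#if !defined(OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE)
#pragma message("OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE is not defined here (config header not included?)")
constexpr int kIp6FragmentationSetting = -1;  // unknown in this translation unit
#elif OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE
#pragma message("OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE is enabled")
constexpr int kIp6FragmentationSetting = 1;
#else
#pragma message("OPENTHREAD_CONFIG_IP6_FRAGMENTATION_ENABLE is disabled")
constexpr int kIp6FragmentationSetting = 0;
#endif
```

If it reports enabled, setting the macro to 0 in that config header and rebuilding gives the comparison test suggested above.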
