Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in OTBR related to SRP #2519

Open
agners opened this issue Oct 1, 2024 · 1 comment
Open

Crash in OTBR related to SRP #2519

agners opened this issue Oct 1, 2024 · 1 comment

Comments

@agners
Copy link
Contributor

agners commented Oct 1, 2024

Describe the bug A clear and concise description of what the bug is.

I've encountered a crash using a relatively recent commit ff7227e. It coincided with the shutdown of another OTBR on the same network (this might be by chance, but the timing is very suspicious). This has been observed with the Home Assistant OpenThread Border Router add-on 2.11.0.

otbr-agent: /usr/src/ot-br-posix/third_party/openthread/repo/src/core/net/udp6.cpp:178: ot::Error ot::Ip6::Udp::Open(SocketHandle&, ReceiveHandler, void*): Assertion `!IsOpen(aSocket)' failed.
00:03:35.331 [C] Platform------: ------------------ BEGINNING OF CRASH -------------
00:03:35.331 [C] Platform------: *** FATAL ERROR: Caught signal: 6 (Aborted)
00:03:35.351 [C] Platform------: # 0: /usr/sbin/otbr-agent(+0x38d660) [0x5579a8d660]
00:03:35.351 [C] Platform------: # 1: /usr/sbin/otbr-agent(+0x38d7a0) [0x5579a8d7a0]
00:03:35.351 [C] Platform------: # 2: linux-vdso.so.1 __kernel_rt_sigreturn+0x0 [0xb5eb97a0]
00:03:35.351 [C] Platform------: # 3: /lib/aarch64-linux-gnu/libc.so.6(+0x809f0) [0x7fb59a09f0]
00:03:35.351 [C] Platform------: # 4: /lib/aarch64-linux-gnu/libc.so.6 gsignal+0x1c [0xb595a72c]
00:03:35.351 [C] Platform------: # 5: /lib/aarch64-linux-gnu/libc.so.6 abort+0xf0 [0xb594747c]
00:03:35.351 [C] Platform------: # 6: /lib/aarch64-linux-gnu/libc.so.6(+0x341a4) [0x7fb59541a4]
00:03:35.351 [C] Platform------: # 7: /lib/aarch64-linux-gnu/libc.so.6(+0x3420c) [0x7fb595420c]
00:03:35.351 [C] Platform------: # 8: /usr/sbin/otbr-agent ot::Ip6::Udp::Open(ot::Ip6::Udp::SocketHandle&, void (*)(void*, otMessage*, otMessageInfo const*), void*)+0x5c [0x799cfe0c]
00:03:35.351 [C] Platform------: # 9: /usr/sbin/otbr-agent ot::Ip6::Udp::Socket::Open()+0x38 [0x799cfa54]
00:03:35.351 [C] Platform------: #10: /usr/sbin/otbr-agent ot::Srp::Client::Start(ot::Ip6::SockAddr const&, ot::Srp::Client::Requester)+0x78 [0x799c14bc]
00:03:35.351 [C] Platform------: #11: /usr/sbin/otbr-agent ot::Srp::Client::ProcessAutoStart()+0x364 [0x799c5900]
00:03:35.351 [C] Platform------: #12: /usr/sbin/otbr-agent ot::Srp::Client::HandleNotifierEvents(ot::Events)+0x50 [0x799c176c]
00:03:35.351 [C] Platform------: #13: /usr/sbin/otbr-agent ot::Notifier::EmitEvents()+0x148 [0x7997b0d8]
00:03:35.351 [C] Platform------: #14: /usr/sbin/otbr-agent ot::TaskletIn<ot::Notifier, &ot::Notifier::EmitEvents>::HandleTasklet(ot::Tasklet&)+0x18 [0x7997b978]
00:03:35.351 [C] Platform------: #15: /usr/sbin/otbr-agent ot::Tasklet::RunTask()+0x1c [0x7997e6f0]
00:03:35.351 [C] Platform------: #16: /usr/sbin/otbr-agent ot::Tasklet::Scheduler::ProcessQueuedTasklets()+0x68 [0x7997e694]
00:03:35.351 [C] Platform------: #17: /usr/sbin/otbr-agent otTaskletsProcess+0x38 [0x79a2ac20]
00:03:35.351 [C] Platform------: #18: /usr/sbin/otbr-agent otbr::Ncp::RcpHost::Process(otbr::MainloopContext const&)+0x1c [0x79a3fd20]
00:03:35.351 [C] Platform------: #19: /usr/sbin/otbr-agent otbr::MainloopManager::Process(otbr::MainloopContext const&)+0x68 [0x79a56ec4]
00:03:35.351 [C] Platform------: #20: /usr/sbin/otbr-agent otbr::Application::Run()+0x1e8 [0x7990ac34]
00:03:35.351 [C] Platform------: #21: /usr/sbin/otbr-agent(+0x2107e0) [0x55799107e0]
00:03:35.351 [C] Platform------: #22: /usr/sbin/otbr-agent main+0x88 [0x799109a0]
00:03:35.351 [C] Platform------: #23: /lib/aarch64-linux-gnu/libc.so.6(+0x27740) [0x7fb5947740]
00:03:35.351 [C] Platform------: #24: /lib/aarch64-linux-gnu/libc.so.6 __libc_start_main+0x98 [0xb5947818]
00:03:35.351 [C] Platform------: #25: /usr/sbin/otbr-agent _start+0x30 [0x7990a570]
00:03:35.351 [C] Platform------: ------------------ END OF CRASH ------------------

To Reproduce Information to reproduce the behavior, including:

  1. Git commit id: ff7227e
  2. IEEE 802.15.4 hardware platform: EFR32MG21
  3. Build steps
  4. Network topology

Expected behavior A clear and concise description of what you expected to happen.

Console/log output If applicable, add console/log output to help explain your problem.

Additional context Add any other context about the problem here.

@abtink
Copy link
Member

abtink commented Oct 1, 2024

Thanks, @agners.

Based on my analysis, this appears to be a platform UDP issue and not related to SRP client itself. I assume OPENTHREAD_CONFIG_PLATFORM_UDP_ENABLE is enabled and used, which delegates the handling of UDP sockets to the underlying platform (POSIX in this case).

The assertion is happening in Udp::Open() where the socket seems to already be open:

Error Udp::Open(SocketHandle &aSocket, ReceiveHandler aHandler, void *aContext)
{
    Error error = kErrorNone;

    OT_ASSERT(!IsOpen(aSocket));

    aSocket.Clear();
    aSocket.mHandler = aHandler;
    aSocket.mContext = aContext;
    ...

The Srp::Client::ProcessAutoStart() method will always stop the client (which in turn closes the UDP socket) before restarting it again:

    if (IsRunning())
    {
        VerifyOrExit((GetServerAddress() != serverSockAddr) || shouldRestart);
        Stop(kRequesterAuto, kResetRetryInterval);
    }

    ...
    IgnoreError(Start(serverSockAddr, kRequesterAuto));

The Udp::Close() method can fail only when CONFIG_PLATFORM_UDP_ENABLE is used and the platform provided otPlatUdpClose() fails:

Error Udp::Close(SocketHandle &aSocket)
{
    Error error = kErrorNone;

    VerifyOrExit(IsOpen(aSocket));

#if OPENTHREAD_CONFIG_PLATFORM_UDP_ENABLE
    error = otPlatUdpClose(&aSocket);
#endif
    SuccessOrExit(error);

    RemoveSocket(aSocket);
    aSocket.GetSockName().Clear();
    aSocket.GetPeerName().Clear();

exit:
    return error;
}

My suggestion is to investigate the otPlatUdpClose() implementation (whether there are any additional changes in it) and if/how it may fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants