Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SonoBus crash due to out of bounds access in aoo #253

Open
Athrunen opened this issue Aug 21, 2024 · 11 comments
Open

SonoBus crash due to out of bounds access in aoo #253

Athrunen opened this issue Aug 21, 2024 · 11 comments

Comments

@Athrunen
Copy link

For some time now my SonoBus has been crashing after an extended runtime of 6+ hours.
Not a problem for short runs but annoying in continuous operation.

When debugging, this line in aoo causes the actual crash:

std::copy(&buffer_[rdhead_], &buffer_[rdhead_ + n1], buffer);

When the read exceeds the length of the local buffer, rdhead_ + n1 is supposed to read to buffer_.size() - 1 but instead reads to buffer_.size() which is of course out of bounds.

inline int32_t SLIP::read_bytes(uint8_t *buffer, int32_t size){
auto capacity = (int32_t)buffer_.size();
if (size > balance_){
size = balance_;
}
auto end = rdhead_ + size;
int32_t n1, n2;
if (end > capacity){
n1 = capacity - rdhead_;
n2 = end - capacity;
} else {
n1 = size;
n2 = 0;
}
std::copy(&buffer_[rdhead_], &buffer_[rdhead_ + n1], buffer);
std::copy(&buffer_[0], &buffer_[n2], buffer + n1);
rdhead_ += size;
if (rdhead_ >= capacity){
rdhead_ -= capacity;
}
balance_ -= size;
return size;
}

The fix should be to just subtract 1 from n1, but I currently do not have the time to test that.

@Spacechild1
Copy link

Spacechild1 commented Aug 21, 2024

When the read exceeds the length of the local buffer, rdhead_ + n1 is supposed to read to buffer_.size() - 1 but instead reads to buffer_.size() which is of course out of bounds.

C++ generally uses half-open ranges, i.e. the end iterator points to one element past the end of the sequence.

The only problem I see with this code is that it may fail with Visual Studio debug builds if checked iterators are enabled (https://learn.microsoft.com/en-us/cpp/standard-library/checked-iterators?view=msvc-170). &buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced, but a checked [] operator would complain nevertheless. The code can be rewritten as buffer_.begin() + n.

Anyway, the code should work fine in release builds. Can you post a full stack trace?

@Athrunen
Copy link
Author

I did some napkin math because some values are optimized out and that off by one error was my best guess, gonna recompile in Debug now.. but this is the BT for now:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff76a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff764c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff76334c3 in __GI_abort () at abort.c:79
#4  0x00007ffff5ed3bb0 in std::__glibcxx_assert_fail
(file=file@entry=0x7fffa6278be0 "/usr/include/c++/14.2.1/bits/stl_vector.h", line=line@entry=1130, function=function@entry=0x7fffa627a7e8 "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; reference = unsigned char&; size_type = long unsi"..., condition=condition@entry=0x7fffa62700d4 "__n < this->size()") at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#5  0x00007fffa5ec79c0 in std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>)
at /usr/include/c++/14.2.1/bits/stl_vector.h:1128
#6  std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/14.2.1/bits/stl_vector.h:1128
#7  aoo::SLIP::read_bytes (size=26, this=<optimized out>, buffer=0x7fffa5bff890 "\300/aoo/server/ping") at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/SLIP.hpp:59
#8  aoo::net::client::send_server_message_tcp (this=0x555556255d50, data=<optimized out>, size=<optimized out>) at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:918
#9  0x00007fffa5ebe6e9 in aoo::net::client::send_server_message_tcp (this=0x555556255d50, data=<optimized out>, size=<optimized out>)
at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:222
#10 aoo::net::client::run (this=0x555556255d50) at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:222
#11 0x00007fffa607f760 in juce::Thread::createNativeThread(juce::Thread::Priority)::{lambda(void*)#1}::_FUN(void*) [clone .lto_priv.0] ()
at /usr/src/debug/sonobus/sonobus-1.7.2/deps/juce/modules/juce_core/threads/juce_Thread.cpp:98
#12 0x00007ffff76a339d in start_thread (arg=<optimized out>) at pthread_create.c:447
#13 0x00007ffff772849c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@Spacechild1
Copy link

Thanks! Just as I expected, the crash in the debug build is caused by debug iterators. However, this is not the source of the original crash. I think the quickest solution is to compile with RelWithDebInfo instead of Debug. All assertions will be disabled, but you still get (some) debug symbols. This way we can at least narrow down the actual source of the issue.

@Athrunen
Copy link
Author

Weird, the above stacktrace was with RelWithDebInfo, wich is why some stuff is optimized out.
But I'm compiling this by modifying the aur PKGBUILD which might introduce some weirdness.

@Spacechild1
Copy link

That is weird indeed! RelWithDebInfo isn't supposed to enable assertions. I'd suggest to follow the official build instructions.

@Athrunen
Copy link
Author

Found the problem: Arch includes _GLIBCXX_ASSERTIONS by default. Even in release builds. So the crashes might just have been failed asserts.

Adding -Wp,-U_GLIBCXX_ASSERTIONS to the CXXFLAGS should disable the asserts for now.

I will test out the new build and reach out to the aur maintainer in the meantime.

@Spacechild1
Copy link

Spacechild1 commented Aug 21, 2024

Arch includes _GLIBCXX_ASSERTIONS by default.

Ouch! Good catch! Why do they do this? It's terrible for performance!

@Jannik2099
Copy link

Jannik2099 commented Sep 23, 2024

&buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced

no, forming a reference to an out-of-bounds element is UB. It doesn't matter if / how the reference is used afterwards.
If you really do want to form a past-the-array address, I think you may be able to with casting to std::uintptr_t and doing arithmetic on it - but ideally just don't?

A &buffer_[std::min(rdhead_ + n1, buffer_.size() - 1)] should do just fine.

EDIT: no it won't due to the last element. Probably best to use std::copy_n here.

Why do they do this?

Because developers don't know the language semantics, evidently.

@Spacechild1
Copy link

Spacechild1 commented Sep 23, 2024

&buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced

no, forming a reference to an out-of-bounds element is UB

No, forming a reference to one element past the end of the array is completely valid! How else would you implement end() iterators? &buffer_[buffer_.size()] would be 100% valid C code. The only problem really is the debug bound check in the [] operator. (The [] member function does not know that we are immediately taking the address, so it aborts nevertheless.)

Probably best to use std::copy_n here.

Or use buffer_.begin() + n, as I've written above. They really do the same. Here's the implementation of std::copy_n in GCC's libstdc++:

 __copy_n(_RandomAccessIterator __first, _Size __n,
	     _OutputIterator __result, random_access_iterator_tag)
    { return std::copy(__first, __first + __n, __result); }

@Jannik2099
Copy link

Yes sorry, from the original I read rdhead_ as being equal to buffer.size(), not rdhead_ + 1.

Either way, _GLIBCXX_ASSERTIONS are meant for production use similar to FORTIFY_SOURCE. RHEL and Fedora have been using it for a long time, Arch and Gentoo (and I think also Suse?) do too. libc++ has also recently adopted a similar setting that will also check operator[] for < size().

@Spacechild1
Copy link

Spacechild1 commented Sep 23, 2024

Either way, _GLIBCXX_ASSERTIONS are meant for production

It is first and foremost a debug setting. IMO adding range-checks to operator[] by default - even in release builds - breaks C++'s principle of zero-cost abstractions. After all, std::vector already has the at member function... But then again, it's not my business how other people compile my code. If they want these bound checks, so be it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants