Failing tests and deadlocks on linux/aarch64 (Amazon Graviton2) #687
Comments
More info in the duplicate but closed issue: #688 |
I have been installing and testing this package on amd64 and arm64. On my local ARM server, two tests fail. Any pointers would be very helpful.
Log for reference: oneTBB_tests_arm64.txt |
As for the other tests, it is hard to guess what is going wrong (it might be related to the relaxed memory model of aarch64). Is it possible to share core dumps of the hung tests? |
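To make the memory-model hypothesis concrete, here is a minimal sketch of the kind of cross-thread handoff that depends on ordering. It is a hypothetical illustration, not code from oneTBB: the function name `handoff` and the variables are made up. With release/acquire ordering the consumer is guaranteed to see the payload; if both operations were `memory_order_relaxed`, the C++ memory model would no longer guarantee that, and a weakly ordered CPU such as aarch64 is more likely to expose the mistake than x86.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration (not oneTBB code): publish a plain value to
// another thread through an atomic flag.
int handoff() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int observed = -1;

    std::thread consumer([&] {
        // The acquire load pairs with the release store below, so the
        // write to `payload` is visible once `ready` reads true.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    payload = 42;
    ready.store(true, std::memory_order_release);
    consumer.join();  // join also makes `observed` safe to read here
    return observed;
}
```

A hang in a real test could come from a similar flag that is read or written with weaker ordering than the algorithm needs, but that is only a guess without core dumps.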
Due to Christmas vacation we are not likely to return before January. Sorry. |
@alexey-katranov, @anton-potapov Error in Hardware: Raspberry Pi 4 8GB RAM.
Test:
// g++ -pthread -std=c++17 pthread.cpp && ./a.out && echo OK || echo ERROR
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
#include <pthread.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

// Lower the soft limit on the number of processes/threads for this user.
void limitThreads(size_t limit)
{
    rlimit rlim;
    int ret = getrlimit(RLIMIT_NPROC, &rlim);
    if (ret != 0)
    {
        std::cerr << "getrlimit has returned an error" << std::endl;
        exit(1);
    }
    rlim.rlim_cur = (rlim.rlim_max == (rlim_t)RLIM_INFINITY) ? limit : std::min<rlim_t>(limit, rlim.rlim_max);
    ret = setrlimit(RLIMIT_NPROC, &rlim);
    if (ret != 0)
    {
        std::cerr << "setrlimit has returned an error" << std::endl;
        exit(1);
    }
}

static std::mutex m;
static std::condition_variable cv;
static std::atomic<bool> stop{ false };

// Worker threads block here until finalize() sets `stop`.
static void* thread_routine(void*)
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return stop == true; });
    return 0;
}

static void* new_thread_routine(void*)
{
    std::cerr << "sleep" << std::endl;
    sleep(60);
    return 0;
}

class Thread {
    pthread_t mHandle{};
    bool mValid{};
public:
    Thread() {
        mValid = false;
        pthread_attr_t attr;
        if (pthread_attr_init(&attr) != 0) {
            return;
        }
        // Limit the stack size not to consume all virtual memory on 32 bit platforms.
        // Note: if pthread_attr_setstacksize fails, the thread is reported as invalid too.
        if (pthread_attr_setstacksize(&attr, 100*1024) == 0) {
            mValid = pthread_create(&mHandle, &attr, thread_routine, /* arg = */ nullptr) == 0;
        }
        pthread_attr_destroy(&attr);
    }
    bool isValid() const { return mValid; }
    void join() {
        pthread_join(mHandle, nullptr);
    }
};

void check( int error_code, const char* routine )
{
    if (error_code)
    {
        std::cerr << routine << std::endl;
        _exit(1);
    }
}

int main()
{
    // Some systems set really big limit (e.g. >45K) for the number of processes/threads
    limitThreads(1024);
    std::thread /* isolate test */ ([] {
        std::vector<Thread> threads;
        stop = false;
        auto finalize = [&] {
            stop = true;
            cv.notify_all();
            for (auto& t : threads) {
                t.join();
            }
        };
        // Create threads until creation fails, i.e. until the limit is (supposedly) reached.
        for (int i = 0;; ++i) {
            Thread thread;
            if (!thread.isValid()) {
                break;
            }
            threads.push_back(thread);
            if (i == 1024) {
                std::cerr << "setrlimit seems having no effect" << std::endl;
                finalize();
                return;
            }
        }
        // One more thread with default attributes: this is expected to fail.
        pthread_t new_handle;
        pthread_attr_t s;
        check(pthread_attr_init( &s ), "pthread_attr_init has failed");
        void * arg = nullptr;
        int ec = pthread_create( &new_handle, &s, new_thread_routine, arg );
        if (ec) {
            std::cerr << "EXPECTED ERROR: " << "pthread_create has failed" << std::endl;
        } else {
            std::cerr << "UNEXPECTED OK: " << "pthread_create is not failed" << std::endl;
            _exit(1);
        }
        check( pthread_attr_destroy( &s ), "pthread_attr_destroy has failed" );
        // new_handle is not joined: this point is reached only when pthread_create failed.
        finalize();
    }).join();
    return 0;
}
Output:
Maybe a bug in glibc or in the Ubuntu 20.04 kernel? ... Or in my test? |
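One way to narrow this down is to report which pthread call fails and with which error code, instead of treating every failure as "thread limit reached". The following is a minimal sketch under that assumption; `tryCreateThread` and `noopRoutine` are hypothetical names introduced here, not part of the test above or of oneTBB.

```cpp
#include <cstddef>
#include <cstring>
#include <pthread.h>
#include <string>

static void* noopRoutine(void*) { return nullptr; }

// Hypothetical helper: try to start and join one thread with the given
// stack size. Returns "ok", or the name of the first failing call followed
// by the strerror() text of its return code.
std::string tryCreateThread(std::size_t stackSize) {
    pthread_attr_t attr;
    int ec = pthread_attr_init(&attr);
    if (ec != 0) {
        return std::string("pthread_attr_init: ") + std::strerror(ec);
    }

    ec = pthread_attr_setstacksize(&attr, stackSize);
    if (ec != 0) {
        pthread_attr_destroy(&attr);
        return std::string("pthread_attr_setstacksize: ") + std::strerror(ec);
    }

    pthread_t handle;
    ec = pthread_create(&handle, &attr, noopRoutine, nullptr);
    pthread_attr_destroy(&attr);
    if (ec != 0) {
        return std::string("pthread_create: ") + std::strerror(ec);
    }

    pthread_join(handle, nullptr);
    return "ok";
}
```

Printing `tryCreateThread(100 * 1024)` on the affected machine should show whether the attribute setup or the thread creation itself is rejected.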
Maybe |
@alexey-katranov YES!!! https://man7.org/linux/man-pages/man3/pthread_attr_setstacksize.3.html:
On aarch64 |
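The linked man page states that `pthread_attr_setstacksize` fails with `EINVAL` when the requested size is below `PTHREAD_STACK_MIN`. Assuming that is what happens on aarch64 with the `100*1024` request in the test, one possible workaround is to clamp the request to the platform minimum. This is only a sketch: `clampStackSize` and `initAttrWithStack` are hypothetical helpers, and it is not necessarily the fix that was applied in oneTBB.

```cpp
#include <algorithm>
#include <climits>   // PTHREAD_STACK_MIN
#include <cstddef>
#include <pthread.h>

// Hypothetical helper: raise a requested stack size to the platform minimum.
// The cast is needed because PTHREAD_STACK_MIN may not have type size_t.
std::size_t clampStackSize(std::size_t requested) {
    const std::size_t minimum = static_cast<std::size_t>(PTHREAD_STACK_MIN);
    return std::max(requested, minimum);
}

// Hypothetical helper: initialise `attr` with a stack size of at least
// PTHREAD_STACK_MIN. Returns 0 on success or the pthread error code; on
// failure `attr` is left destroyed and must not be used.
int initAttrWithStack(pthread_attr_t& attr, std::size_t requested) {
    int ec = pthread_attr_init(&attr);
    if (ec != 0) {
        return ec;
    }
    ec = pthread_attr_setstacksize(&attr, clampStackSize(requested));
    if (ec != 0) {
        pthread_attr_destroy(&attr);
    }
    return ec;
}
```

With such a helper, the `Thread` constructor in the test could call `initAttrWithStack(attr, 100 * 1024)` so that a too-small stack request no longer looks like a failed `pthread_create`.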
Signed-off-by: Vladislav Shchapov <phprus@gmail.com>
@phprus I guess the issue was closed automatically because you mentioned it. Reopening. |
Release build:
The RelWithDebInfo build works without errors. |
@ulfworsoe is this issue still relevant? |
I don't know if it's relevant with the latest oneapi release. I'll have to check. |
Sure, thank you for a quick response |
I have rerun the tests. I don't have access to a Graviton at the moment, so I ran them on a machine with similar capabilities.
There is one test that hangs, but I can't say if it is the same issue:
This is where it hangs:
|
Found on:
CC=clang CXX=clang++ cmake -GNinja -DCMAKE_INSTALL_PREFIX=$HOME/local/2021.4.0 -DCMAKE_BUILD_TYPE=Release
Running the TBB tests, some tests failed:
test_composite_node: appears to hang, killed manually after ~1000 seconds
test_concurrent_vector: appears to hang, killed manually after ~1000 seconds
test_eh_thread: aborts