If I run my application for the second time, i will always receive a SIGSEGV #288
I have no idea about wsl.... Else, what's with:
The instance is a number, actually a pid (getpid())
|
The m_instance part is not my problem. As I told you, everything is OK on the first run. My concern is this part:
I want to know whether any leftovers remain on the system after an application finishes. |
Are you able to run any of the examples or tests for a second time?
0. I set hugepage number: sudo echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1. I mount hugepages: sudo mount -t hugetlbfs pagesize=1GB /mnt/huge
Maybe this is not related to the problem at hand, but you are reserving 2M pages, but mounting 1G pages.
…________________________________________
From: LinArcX ***@***.***>
Sent: Tuesday, September 12, 2023 19:54
To: OpenFastPath/ofp
Cc: Subscribed
Subject: [OpenFastPath/ofp] If I run my application for the second time, i will always receive a SIGSEGV (Issue #288)
I created a test application that has a structure like this:
if (0 == odp_init_global(&instance, NULL, NULL)) {
    printf("odp_init_global: success!\n");
    if (0 == odp_init_local(instance, ODP_THREAD_CONTROL)) {
        printf("odp_init_local: success!\n");
        ofp_init_global_param(&app_init_params);
    }
    else {
        printf("Error: ODP local init failed.\n");
        odp_term_global(instance);
    }
}
else {
    printf("Error: ODP global init failed.\n");
}
I put the above lines inside the constructor of my class FOO, and I do these things in the destructor:
ofp_term_local();
ofp_term_global();
odp_term_local();
if (m_instance) {
    odp_term_global(m_instance);
}
When I ran my application for the first time, everything was OK. If I close my application and try to rerun it, it crashes, and this is the backtrace output in gdb:
(gdb) bt
#0 0xffffcd4c in ?? ()
#1 0xf798afd2 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#2 0xf75fd306 in clone () from /lib/i386-linux-gnu/libc.so.6
Steps before I run my application:
0. I set hugepage number: sudo echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1. I mount hugepages: sudo mount -t hugetlbfs pagesize=1GB /mnt/huge
I also thought that after the first run some processes, file descriptors, or something else might be left on my system, but htop shows no leftover process.
Also, I tried to remove everything in these directories:
sudo rm -r /mnt/huge/0/
sudo rm -r /dev/shm/0/
But I still receive a SIGSEGV on the second run. Did I miss a step in the cleanup process?
|
I couldn't even run webserver2 for the first time.
Do you mean I should put a number here: /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages ? Actually, I put 11 there and I still have this issue. |
I couldn't even run webserver2 for the first time.
Some of the examples, including webserver2, appear to have bugs in thread creation. Thank you for reporting that.
How about for example test/cunit/ofp_test_init?
Do you mean I should put a number here: /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages ?
Might be better to use the default 2M page size. I usually do it like this:
echo 1000 > /proc/sys/vm/nr_hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
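Whether the reserved pool matches the page size being mounted can also be checked from a program before ODP is initialised. The following is a minimal sketch, assuming the usual /proc/meminfo layout ("Key: value kB"). The helpers meminfo_value and hugepage_pool_kb are hypothetical names for this illustration and are not part of ODP or OFP.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the numeric value stored under `key`
 * (for example "Hugepagesize") in text that uses the /proc/meminfo
 * format, or -1 if no line starts with "key:". */
long meminfo_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');   /* jump to the end of this line */
        if (line)
            line++;                  /* and step onto the next one */
    }
    return -1;
}

/* Hypothetical helper: size of the default hugepage pool in kB,
 * or -1 if either field is missing from the text. */
long hugepage_pool_kb(const char *text)
{
    long total = meminfo_value(text, "HugePages_Total");
    long size_kb = meminfo_value(text, "Hugepagesize");

    if (total < 0 || size_kb < 0)
        return -1;
    return total * size_kb;
}
```

To use it, read /proc/meminfo into a buffer and pass the buffer to hugepage_pool_kb(). A Hugepagesize of 2048 kB together with a mount that requests 1 GB pages would point to the mismatch mentioned earlier in this thread.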
…________________________________________
From: LinArcX ***@***.***>
Sent: Wednesday, September 13, 2023 12:33
To: OpenFastPath/ofp
Cc: Jere Leppanen (Nokia); Comment
Subject: Re: [OpenFastPath/ofp] If I run my application for the second time, i will always receive a SIGSEGV (Issue #288)
Are you able to run any of the examples or tests for a second time?
I couldn't even run webserver2 for the first time.
Maybe this is not related to the problem at hand, but you are reserving 2M pages, but mounting 1G pages.
Do you mean I should put a number here: /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages ?
Actually, I put 11 there and I still have this issue.
|
I just found something that may be related to my problem. After I called: My application still continues to run until some part of my code tries to create another thread like this:
and after the pthread_attr_init(&tattr) call, the application crashes. Maybe this is the reason for the crash? |
I have tried 'webserver2' with @JereLeppanen's fix and more changes, but I cannot reproduce this error on my Ubuntu 22.04 LTS system. Maybe it is related to WSL or to the application. Regarding pthread_attr_init(): remember to call pthread_attr_destroy(), and do not call pthread_attr_init() twice on the same object. Also, out of curiosity, what exactly are you trying to do (what is the use case)? You may also have a look at this more advanced implementation: https://github.com/NetInoSoftware/nfp |
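The pthread_attr_init() advice above can be sketched as follows. This is a minimal illustration in plain POSIX C, not code from OFP or from the reporter's application. The names worker and run_worker_once are hypothetical. The attribute object is initialised once, used for one pthread_create() call, and destroyed once.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker used only for this illustration. */
void *worker(void *arg)
{
    int *value = arg;
    *value = 42;
    return NULL;
}

/* Initialise the attribute object once, use it, destroy it once.
 * Returns 0 on success or the first pthread error code. */
int run_worker_once(int *out)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, out);

    /* The attribute object is not needed after pthread_create(). */
    pthread_attr_destroy(&attr);

    if (rc != 0)
        return rc;

    /* Join only because this sketch wants the worker's result. */
    return pthread_join(tid, NULL);
}
```

Because each call works on a fresh local pthread_attr_t, the function can be called repeatedly without initialising the same object twice.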
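I'm not talking about webserver2. My use case is very simple: I have a huge application into which I am trying to integrate ofp and odp, and I run into crashes in certain cases. Let me clarify the flow of the crash. At the beginning of the application, before doing anything else, I set up ofp/odp like this:
odp_init_global();
odp_init_local();
ofp_init_global()
ofp_init_local()
odph_thread_create()
Notice that I didn't call odph_thread_join() after the above function calls.
... my application starts to run and continues ...
Somewhere in our application we call pthread_attr_init() like this:
pthread_attr_t tattr;
if (pthread_attr_init(&tattr)) {
    throw Exception("Awww");
}
And this is exactly the place where my application crashes. My questions are:
1. Should I call odph_thread_join() after odph_thread_create()?
2. What is the relation between my crash and pthread_attr_init()? (I want to know internally why this happens. Why do ofp and odp cause this problem? If I remove ofp and odp from my application, I never see this crash.)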
|
Hi,
Are you not calling ofp_init_local() in every ofp thread at start? If not, anything can happen.
should i call odph_thread_join() after odph_thread_create()?
It depends on what you are trying to do. If you do not want to wait for a thread to exit, then you should not call it.
what is the relation of my crash and pthread_attr_init()?
I have no idea.
why ofp and odp cause this problem?
Maybe the problem is not caused by ofp and odp but by your code that uses them somehow incorrectly?
Janne
From: LinArcX ***@***.***>
Sent: Friday, September 15, 2023 10:30 AM
To: OpenFastPath/ofp ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [OpenFastPath/ofp] If I run my application for the second time, i will always receive a SIGSEGV (Issue #288)
I don't talk about webserver2. my use case is very simple. I have a huge application that I try to integrate ofp, odp into it. I ran into crashes in certain cases. Let me clarify the flow of the crash:
At the beginning of the application before doing anything i setup ofp/odp like this:
odp_init_global();
odp_init_local();
ofp_init_global()
ofp_init_local()
odph_thread_create()
...
...
* Notice that i didn't call odph_thread_join() after above function calls.
...
...
...
my applications start to run and continue ...
...
...
as I told in, somewhere in our application we call pthread_attr_init() like this:
pthread_attr_t tattr;
if (pthread_attr_init(&tattr)) {
throw Exception("Awww");
}
And exactly this place is where my application will crash.
My questions are clear:
1. should i call odph_thread_join() after odph_thread_create()?
2. what is the relation of my crash and pthread_attr_init()? (i want to know internally why this happens. why ofp and odp cause this problem? since if i remove ofp and odp from my application i never see this crash.)
|
Are you not calling ofp_init_local() in every ofp thread at start? If not, anything can happen. |
How can I know how many ofp threads I have? |
With "ofp thread" I meant a thread that you create and in which you call ofp. IOW, if you create a thread and intend to call ofp functions in it, the first ofp function you call in the thread must be ofp_init_local(). |
Oh, that's so hard. What about setting these parameters: |
'default_event_dispatcher' calls ofp_init_local() indeed. The question is: do you have other threads that are using ofp API ? Those threads should be created with odph_thread_create() and should call ofp_init_local() at the beginning. Note: odph_thread_create() calls underneath odp_init_local() and odp_term_local(). This is why you should use this API for those threads. |
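The per-thread rule described above (local init is the first call in a thread, local term is the last) can be illustrated without the library itself. In the sketch below, stub_init_local(), stub_term_local() and stub_api_call() are hypothetical stand-ins for ofp_init_local(), ofp_term_local() and any other OFP call. They are not the OFP API and only model per-thread state with a thread-local flag.

```c
#include <pthread.h>
#include <stddef.h>

/* Per-thread flag; each thread gets its own copy (C11 _Thread_local). */
static _Thread_local int local_ready = 0;

/* Stand-ins for ofp_init_local()/ofp_term_local(); NOT the real API. */
int stub_init_local(void) { local_ready = 1; return 0; }
int stub_term_local(void) { local_ready = 0; return 0; }

/* Stand-in for any per-thread library call: it fails unless the
 * calling thread has run its own local init first. */
int stub_api_call(void) { return local_ready ? 0 : -1; }

/* Thread entry point that follows the rule: init first, term last. */
void *good_thread(void *arg)
{
    int *result = arg;

    if (stub_init_local() != 0) {
        *result = -1;
        return NULL;
    }
    *result = stub_api_call();
    stub_term_local();
    return NULL;
}

/* Thread entry point that skips the local init. */
void *bad_thread(void *arg)
{
    int *result = arg;

    *result = stub_api_call();
    return NULL;
}

/* Run `entry` in a new thread and return the value it stored,
 * or -2 if the thread could not be created. */
int run_in_thread(void *(*entry)(void *))
{
    pthread_t tid;
    int result = -2;

    if (pthread_create(&tid, NULL, entry, &result) != 0)
        return -2;
    pthread_join(tid, NULL);
    return result;
}
```

In this model, a local init done in the main thread does not carry over to a newly created thread, so bad_thread fails even when the main thread was initialised. The real library may fail in less predictable ways when the rule is broken.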
@bogdanPricope awesome tips, thank you. Just one thing: I'm using processes instead of pthreads for thread_model: should I still follow your approach? I mean, should I call |
For processes you should use odph_linux_process_fork() (or odph_linux_process_fork_n()). Underneath, it calls odp_init_local() in the child process. That means you should call ofp_init_local() when the child process starts and ofp_term_local() / odp_term_local() when the child ends. |
I am using the same
And this is my code for another thread:
This is my code for the main thread(beginning of my application):
|
Recap:
Question: `int odp_init_local(odp_instance_t instance, odp_thread_type_t thr_type)`
....... |
I created a test application that has a structure like this:
I put the above lines inside the constructor of my class FOO, and I do these things in the destructor:
When I ran my application for the first time, everything was OK. If I close my application and try to rerun it, it crashes, and this is the backtrace output in gdb:
Steps before I run my application:
sudo mount -t hugetlbfs pagesize=1GB /mnt/huge
I also thought that after the first run some processes, file descriptors, or something else might be left on my system, but htop shows no leftover process.
Also, I tried to remove everything in these directories:
But I still receive a SIGSEGV on the second run. Did I miss a step in the cleanup process?
It's worth mentioning that I'm developing my application inside WSL (Debian Buster). Maybe that causes the issue? Is there any restriction on WSL?