If I run my application a second time, I always receive a SIGSEGV #288

Open
LinArcX opened this issue Sep 12, 2023 · 18 comments

@LinArcX

LinArcX commented Sep 12, 2023

I created a test application that has a structure like this:

  if (0 == odp_init_global(&instance, NULL, NULL)) {
    printf("odp_init_global: success!\n");

    if (0 == odp_init_local(instance, ODP_THREAD_CONTROL)) {
      printf("odp_init_local: success!\n");
      ofp_init_global_param(&app_init_params);
    }
    else {
      printf("Error: ODP local init failed.\n");
      odp_term_global(instance);
    }
  }
  else {
    printf("Error: ODP global init failed.\n");
  }

I put the above lines inside the constructor of my class FOO, and I do the following in the destructor:

  ofp_term_local();
  ofp_term_global();
  odp_term_local();
  if (m_instance) {
    odp_term_global(m_instance);
  }

When I ran my application for the first time, everything was OK. If I close the application and try to rerun it, it crashes. This is the backtrace output in gdb:

(gdb) bt
#0  0xffffcd4c in ?? ()
#1  0xf798afd2 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#2  0xf75fd306 in clone () from /lib/i386-linux-gnu/libc.so.6

Steps before I run my application:

  1. I set the number of huge pages: sudo echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
  2. I mount the hugepage filesystem: sudo mount -t hugetlbfs pagesize=1GB /mnt/huge

I also suspected that the first run might leave some processes, file descriptors, or something else behind on my system, but htop shows no leftover process.

Also, I tried to remove everything in these directories:

      sudo rm -r /mnt/huge/0/
      sudo rm -r /dev/shm/0/

But I still receive a SIGSEGV on the second run. Did I miss some steps in the cleanup process?

It's worth mentioning that I'm developing my application inside WSL (Debian Buster). Could that be causing the issue? Are there any restrictions on WSL?

@bogdanPricope
Contributor

I have no idea about WSL.

Otherwise, what is the purpose of:

if (m_instance) {
	odp_term_global(m_instance);
}

The instance is a number, actually a pid (getpid())

typedef uint64_t odp_instance_t;

@LinArcX
Author

LinArcX commented Sep 13, 2023

The m_instance part is not my problem. As I said, on the first run everything is OK.

My concern is this part:

I also suspected that the first run might leave some processes, file descriptors, or something else behind on my system, but htop shows no leftover process.

I want to know whether anything is left behind on the system after an application finishes.

@JereLeppanen
Contributor

JereLeppanen commented Sep 13, 2023 via email

@LinArcX
Author

LinArcX commented Sep 13, 2023

Are you able to run any of the examples or tests for a second time?

I couldn't run webserver2 even the first time.

Maybe this is not related to the problem at hand, but you are reserving 2M pages while mounting 1G pages.

Do you mean I should put a number here: /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages ?

Actually, I put 11 there and I still have this issue.
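
For reference, the two sysfs paths mentioned in this thread differ only in the page size embedded in the directory name: hugepages-2048kB for 2M pages and hugepages-1048576kB for 1G pages. The following is a minimal C sketch, not part of ODP or OFP, that builds the pool path for a given page size and reads the reserved count back. The helper names hugepage_pool_path and read_nr_hugepages are hypothetical. Reading the value back is also a cheap way to confirm that an earlier write took effect, since in `sudo echo 512 > file` the redirection is performed by the calling shell rather than by sudo.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Build the sysfs path of the pool counter for a huge page size given in kB.
 * Returns 0 on success, -1 if the buffer is too small. */
int hugepage_pool_path(unsigned long size_kb, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "/sys/kernel/mm/hugepages/hugepages-%lukB/nr_hugepages",
                     size_kb);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Read the number of pages currently reserved in that pool.
 * Returns -1 if the file cannot be opened or parsed (for example when the
 * kernel does not offer this page size). */
long read_nr_hugepages(unsigned long size_kb)
{
    char path[128];
    long n = -1;
    FILE *f;

    if (hugepage_pool_path(size_kb, path, sizeof(path)) != 0)
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fscanf(f, "%ld", &n) != 1)
        n = -1;
    fclose(f);
    return n;
}
```

Calling read_nr_hugepages(2048) and read_nr_hugepages(1048576) before starting the application shows which pool actually has pages reserved, which can then be compared with the page size used for the hugetlbfs mount.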

@JereLeppanen
Contributor

JereLeppanen commented Sep 13, 2023 via email

@LinArcX
Author

LinArcX commented Sep 14, 2023

I just found something that may be related to my problem. After calling odp_init_global(), odp_init_local(), and odph_thread_create(), I never called odph_thread_join().

My application continues to run until, somewhere in my code, it tries to create another thread like this:

	pthread_attr_t tattr;
	if (pthread_attr_init(&tattr)) {
		throw Exception("Awww");
	}

The application crashes right after the pthread_attr_init(&tattr) call.

Could this be the reason for the crash?

@bogdanPricope
Contributor

I tried 'webserver2' with @JereLeppanen's fix plus further changes (ofp_term_global(); odp_term_local(); odp_term_global(instance); 'linux_sigaction', etc.), but I cannot reproduce this error on my Ubuntu 22.04 LTS system.

Maybe it is related to WSL or to the application.

Regarding pthread_attr_init(): remember to call pthread_attr_destroy(), and do not call pthread_attr_init() twice on the same object.
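
The init/destroy pairing described above can be sketched as follows. This is a minimal, self-contained C example using only POSIX threads. The names demo_worker and spawn_with_attr are hypothetical, and the example does not attempt to reproduce the crash discussed in this thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body used only for this illustration. */
static void *demo_worker(void *arg)
{
    int *counter = arg;

    *counter += 1;
    return NULL;
}

/*
 * Initialize an attribute object exactly once, use it to create a thread,
 * destroy it, then join the thread.
 * Returns 0 on success or the first non-zero pthread error code.
 */
int spawn_with_attr(int *counter)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, demo_worker, counter);

    /* The attribute object is not needed after pthread_create() returns;
     * this call pairs with the single pthread_attr_init() above. */
    pthread_attr_destroy(&attr);

    if (rc != 0)
        return rc;

    /* Wait for the worker so that its result is visible to the caller. */
    return pthread_join(tid, NULL);
}
```

Keeping init and destroy in the same function makes it hard to initialize the same pthread_attr_t twice by accident.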

Otherwise, out of curiosity, what exactly are you trying to do (what is the use case)?

You may also have a look at this more advanced implementation: https://github.com/NetInoSoftware/nfp

@LinArcX
Author

LinArcX commented Sep 15, 2023

I am not talking about webserver2. My use case is very simple: I have a huge application into which I am trying to integrate ofp and odp, and I run into crashes in certain cases. Let me clarify the flow that leads to the crash:

At the beginning of the application, before doing anything else, I set up ofp/odp like this:

odp_init_global();
odp_init_local();
ofp_init_global();
ofp_init_local();
odph_thread_create();
...
...

  • Notice that I didn't call odph_thread_join() after the above function calls.

...
...
...

My application starts and keeps running ...
...
...
As I mentioned, somewhere in our application we call pthread_attr_init() like this:

pthread_attr_t tattr;
if (pthread_attr_init(&tattr)) {
	throw Exception("Awww");
}

This is exactly where my application crashes.

My questions are:

  1. Should I call odph_thread_join() after odph_thread_create()?
  2. What is the relation between my crash and pthread_attr_init()? (I want to know why this happens internally. Why do ofp and odp cause this problem? If I remove ofp and odp from my application, I never see this crash.)

@JannePeltonen
Contributor

JannePeltonen commented Sep 15, 2023 via email

@JannePeltonen
Contributor

Are you not calling ofp_init_local() in every ofp thread at start? If not, anything can happen.

@LinArcX
Author

LinArcX commented Sep 15, 2023

How can I know how many ofp threads I have?

@JannePeltonen
Contributor

With "ofp thread" I meant a thread that you create and in which you call ofp. IOW, if you create a thread and intend to call ofp functions in it, the first ofp function you call in the thread must be ofp_init_local().

@LinArcX
Author

LinArcX commented Sep 15, 2023

Oh, that's so hard.

What about setting this parameter: thr_params.start = default_event_dispatcher;? Won't it do the same thing?

@bogdanPricope
Contributor

'default_event_dispatcher' does indeed call ofp_init_local(). The question is: do you have other threads that use the ofp API? Those threads should be created with odph_thread_create() and should call ofp_init_local() at the beginning.

Note: odph_thread_create() calls odp_init_local() and odp_term_local() under the hood. This is why you should use this API for those threads.
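
The rule discussed above (per-thread init first, per-thread term last) can be illustrated without ODP or OFP. In this minimal C sketch every stub_* function is a hypothetical stand-in, not a real library call. The stub reports a clean error when the per-thread state is missing, whereas with the real libraries the outcome is undefined ("anything can happen").

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-thread library state. */
static _Thread_local int local_ready = 0;

/* Hypothetical stand-ins for the per-thread init and term calls. */
static int stub_init_local(void) { local_ready = 1; return 0; }
static void stub_term_local(void) { local_ready = 0; }

/* Hypothetical stand-in for any API call that needs per-thread state. */
static int stub_api_call(void) { return local_ready ? 0 : -1; }

/* Thread body that follows the rule: init first, term last. */
static void *thread_with_init(void *arg)
{
    int *result = arg;

    if (stub_init_local() != 0) {
        *result = -1;
        return NULL;
    }
    *result = stub_api_call();
    stub_term_local();
    return NULL;
}

/* Thread body that skips the per-thread init. */
static void *thread_without_init(void *arg)
{
    int *result = arg;

    *result = stub_api_call();
    return NULL;
}

/* Run one of the two bodies in a new thread and return what the API call
 * reported: 0 on success, -1 if per-thread state was missing, -2 if the
 * thread could not be created or joined. */
int run_in_thread(int do_init)
{
    pthread_t tid;
    int result = -2;

    if (pthread_create(&tid, NULL,
                       do_init ? thread_with_init : thread_without_init,
                       &result) != 0)
        return -2;
    if (pthread_join(tid, NULL) != 0)
        return -2;
    return result;
}
```

Because the state is thread-local, initializing it in the main thread does not help a thread created later; each thread has to run its own init.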

@LinArcX
Author

LinArcX commented Sep 16, 2023

@bogdanPricope awesome tips, thank you. Just one thing: I'm using processes instead of pthreads for thread_model:
https://github.com/OpenDataPlane/odp/blob/master/helper/include/odp/helper/threads.h#L166

Should I still follow your approach? I mean, should I call odph_thread_create() and ofp_init_local() at the beginning of each thread?

@bogdanPricope
Contributor

For processes you should use odph_linux_process_fork() (or odph_linux_process_fork_n()). It calls odp_init_local() in the child process under the hood. That means you should call ofp_init_local() when the child process starts, and ofp_term_local() / odp_term_local() when the child ends.
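
The child-process lifecycle described above can be sketched with a plain fork(). This is only an illustration of the ordering: it uses fork() directly rather than the ODP helper, and the stub_* functions are hypothetical stand-ins for the local init and term calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for per-process library state. */
static int local_ready = 0;

/* Hypothetical stand-ins for the local init/term calls and for the work
 * that depends on them. */
static int stub_init_local(void) { local_ready = 1; return 0; }
static void stub_term_local(void) { local_ready = 0; }
static int stub_work(void) { return local_ready ? 0 : -1; }

/*
 * Fork a child that runs local init, does its work, and runs local term
 * before exiting. The parent waits for the child.
 * Returns the child's exit status (0 on success) or -1 on fork/wait errors.
 */
int run_child(void)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        int rc = 1;                  /* 1: local init failed */

        if (stub_init_local() == 0) {
            rc = (stub_work() == 0) ? 0 : 2;
            stub_term_local();       /* term runs even if the work failed */
        }
        _exit(rc);
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's waitpid() plays the role that joining plays for threads: it lets the parent learn whether the child finished its term sequence cleanly.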

@LinArcX
Author

LinArcX commented Sep 18, 2023

For my other thread I am using the same thr_common.instance that I used for odp_init_global(), but I get this error:

[New Thread 0xd07fdb40 (LWP 21150)]
E 21 4136856064 thread.cpp:123] SUCCESS: odph_thread_create()
ERR: odp_init.c:611:odp_init_local(): Bad instance.
threads.c:56:run_thread(): Local init failed
[Thread 0xd17ffb40 (LWP 21148) exited]
ERR: odp_init.c:611:odp_init_local(): Bad instance.
threads.c:56:run_thread(): Local init failed
[Thread 0xd0ffeb40 (LWP 21149) exited]
[New Thread 0xcfbffb40 (LWP 21151)]
[New Thread 0xcfbffb40 (LWP 21151)]
E 27 4136856064 thread.cpp:123] SUCCESS: odph_thread_create()

And this is my code for the other thread:

	if (0 == ofp_init_local()) {
		odph_thread_t thread_tbl[MAX_WORKERS];
		odph_thread_param_t thr_params;
		odph_thread_common_param_t thr_common;
		memset(thread_tbl, 0, sizeof(thread_tbl));
		/* Start dataplane dispatcher worker threads */
		odph_thread_param_init(&thr_params);
		thr_params.start = default_event_dispatcher;
		thr_params.arg = (void*)ofp_eth_vlan_processing;
		thr_params.thr_type = ODP_THREAD_WORKER;
		odph_thread_common_param_init(&thr_common);
		thr_common.instance = MySingletoonClass::getInstance()->ofpInstance();
		thr_common.cpumask = &cpumask;
		thr_common.share_param = 1;

		if (num_workers == odph_thread_create(thread_tbl, &thr_common, &thr_params, num_workers)) {
			OFP_ERR("SUCCESS: odph_thread_create() .\n");
	
		}
		else {
			OFP_ERR("Error: odph_thread_create() failed.\n");
		}
	}

This is my code for the main thread (the beginning of my application):

	if (0 == odp_init_global(&m_instance, NULL, NULL)) {
		if (0 == odp_init_local(m_instance, ODP_THREAD_CONTROL)) {
			ofp_global_param_t app_init_params;
			ofp_init_global_param(&app_init_params);
			int num_workers = 1;
			char cpumaskstr[64];
			odp_cpumask_t cpumask;
			num_workers = odp_cpumask_default_worker(&cpumask, num_workers);
			if (odp_cpumask_to_str(&cpumask, cpumaskstr, sizeof(cpumaskstr)) < 0) {
				OFP_ERR("Error: Too small buffer provided to odp_cpumask_to_str");
			}
			OFP_INFO("Num worker threads: %i", num_workers);
			OFP_INFO("First CPU:          %i", odp_cpumask_first(&cpumask));
			OFP_INFO("CPU mask:           %s", cpumaskstr);

			char interface[25];
			strncpy(interface, "eth0", sizeof(interface)-1);

			char* interfaces[] = {interface};
			app_init_params.if_count = 1;
			app_init_params.if_names = interfaces;

			if (app_init_params.pktin_mode != ODP_PKTIN_MODE_SCHED) {
				app_init_params.pktin_mode = ODP_PKTIN_MODE_SCHED;
			}
			switch (app_init_params.sched_sync) {
			case ODP_SCHED_SYNC_PARALLEL:
				OFP_WARN("Warning: Packet order is not preserved with parallel RX queues\n");
				break;
			case ODP_SCHED_SYNC_ATOMIC:
				break;
			case ODP_SCHED_SYNC_ORDERED:
				if (app_init_params.pktout_mode != ODP_PKTOUT_MODE_QUEUE) {
					OFP_WARN("Warning: Packet order is not preserved with ordered RX queues and direct TX queues.\n");
				}
				break;
			default:
				OFP_WARN("Warning: Unknown scheduling synchronization mode. Forcing atomic mode.\n");
				app_init_params.sched_sync = ODP_SCHED_SYNC_ATOMIC;
				break;
			}

			app_init_params.pkt_hook[OFP_HOOK_LOCAL] = fastpath_local_hook;

			if (0 == ofp_init_global(m_instance, &app_init_params)) {
				if (0 == ofp_init_local()) {
					odph_thread_t thread_tbl[MAX_WORKERS];
					odph_thread_param_t thr_params;
					odph_thread_common_param_t thr_common;
					memset(thread_tbl, 0, sizeof(thread_tbl));
					/* Start dataplane dispatcher worker threads */
					odph_thread_param_init(&thr_params);
					thr_params.start = default_event_dispatcher;
					thr_params.arg = (void*)ofp_eth_vlan_processing;
					thr_params.thr_type = ODP_THREAD_WORKER;
					odph_thread_common_param_init(&thr_common);
					thr_common.instance = m_instance;
					thr_common.cpumask = &cpumask;
					thr_common.share_param = 1;
					//thr_common.sync = 1;
					thr_common.thread_model = 1;

					if (num_workers == odph_thread_create(thread_tbl, &thr_common, &thr_params, num_workers)) {
						// some internal process
					}
					else {
						OFP_ERR("Error: odph_thread_create() failed.\n");
					}
				}
				else {
					OFP_ERR("Error: OFP local init failed.");
				}
			}
			else {
				OFP_ERR("Error: OFP global init failed.");
			}
		}
		else {
			OFP_ERR("Error: ODP local init failed.");
		}
	}
	else {
		OFP_ERR("Error: ODP global init failed.");
	}

@bogdanPricope
Contributor

Recap:

  • The main thread initializes odp and ofp, then creates num_workers processes. Does this part alone (without the 'another thread') work well?
  • The 'another thread' starts a new worker.

Question:
How is this 'another thread' started? Is it a thread or a process, and if it is a process, when was it forked and with which API?

int odp_init_local(odp_instance_t instance, odp_thread_type_t thr_type)
{
	enum init_stage stage = NO_INIT;

	if (instance != (odp_instance_t)odp_global_ro.main_pid) {
		ODP_ERR("Bad instance.\n");
		goto init_fail;
	}

	.......
Either 'odp_global_ro.main_pid' is not initialized or 'instance' is invalid.
You may try to print the value of MySingletoonClass::getInstance()->ofpInstance() (cast it to pid_t or int) and see if it is valid.
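
The check quoted above can be mimicked in a few lines to show which instance values pass and which fail. This is a self-contained C sketch under the assumption, taken from this thread, that the instance is the pid recorded at global init. The demo_* names are hypothetical stand-ins and not the ODP implementation.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the uint64_t instance type mentioned earlier in this thread. */
typedef uint64_t demo_instance_t;

/* Stand-in for odp_global_ro.main_pid. */
static pid_t demo_main_pid;

/* Record the calling process's pid and hand it out as the instance. */
int demo_init_global(demo_instance_t *instance)
{
    demo_main_pid = getpid();
    *instance = (demo_instance_t)demo_main_pid;
    return 0;
}

/* Accept only the instance value that global init handed out. */
int demo_init_local(demo_instance_t instance)
{
    if (instance != (demo_instance_t)demo_main_pid) {
        fprintf(stderr, "Bad instance: got %d, expected %d\n",
                (int)instance, (int)demo_main_pid);
        return -1;
    }
    return 0;
}
```

Under this assumption, an instance that was never written by global init (for example a zero-initialized class member returned by an accessor) fails the comparison, which is why printing the value is a useful first step.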
