3D NIHT fails when num_auto_parallelize = true #246

Open
hakkelt opened this issue Feb 15, 2021 · 4 comments

@hakkelt
Contributor

hakkelt commented Feb 15, 2021

I'm trying to run the following set of commands:

bart phantom -3 phan
bart fft $(bart bitmask 0 1 2) phan y
bart ones 3 128 128 128 smap
bart pics -R N:$(bart bitmask 0 1 2):0:10 y smap recon

The last command fails with the following assertion error: bart: /home/hakta/.bin/bart/src/num/vecops.c:547: klargest_complex_partsort: Assertion 'k <= N' failed. The command bart pics -R H:$(bart bitmask 0 1 2):0:10 y smap recon fails with the same error.

After some debugging, I found the source of the error in /src/num/optimize.c (lines 705-735):

	int skip = min_blockdim(N, ND, tdims, nstr1, sizes);
	unsigned long flags = 0;

	debug_printf(DP_DEBUG4, "MD-Fun. Io: %d Input: ", io);
	debug_print_dims(DP_DEBUG4, D, dim);

#ifdef USE_CUDA
	if (num_auto_parallelize && !use_gpu(N, nptr1) && !one_on_gpu(N, nptr1)) {
#else
	if (num_auto_parallelize) {
#endif
		flags = dims_parallel(N, io, ND, tdims, nstr1, sizes);

		while ((0 != flags) && (ffs(flags) <= skip))
			skip--;

		flags = flags >> skip;
	}

	const long* nstr2[N];

	for (unsigned int i = 0; i < N; i++)
		nstr2[i] = *nstr1[i] + skip;

#ifdef USE_CUDA
	debug_printf(DP_DEBUG4, "This is a %s call\n.", use_gpu(N, nptr1) ? "gpu" : "cpu");

	__block struct nary_opt_data_s data = { md_calc_size(skip, tdims), use_gpu(N, nptr1) ? &gpu_ops : &cpu_ops };
#else
	__block struct nary_opt_data_s data = { md_calc_size(skip, tdims), &cpu_ops };
#endif

As num_auto_parallelize = true, the value of skip is decreased from 3 to 2, and the size field of data computed by md_calc_size(skip, tdims) becomes incorrect, because 3D NIHT cannot be parallelized along the third dimension (or, at least, this feature is not implemented). Setting num_auto_parallelize = false at line 574 in /src/num/optimize.c solves the problem, but I also experienced a slight performance decrease in FISTA-based reconstructions.
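
To illustrate what goes wrong with the block size (just a standalone sketch, not using the BART headers; my_calc_size only mimics what md_calc_size computes, i.e. the product of the first skip dimensions):

#include <stdio.h>

/* mimics md_calc_size: product of the first D entries of dim */
static long my_calc_size(int D, const long* dim)
{
	long size = 1;

	for (int i = 0; i < D; i++)
		size *= dim[i];

	return size;
}

int main(void)
{
	long tdims[3] = { 128, 128, 128 };

	/* skip = 3: the whole 128x128x128 volume goes into one kernel call */
	printf("skip = 3: block size = %ld\n", my_calc_size(3, tdims));

	/* skip = 2 (after auto-parallelization): only a 128x128 slice per call,
	 * so the joint thresholding kernel no longer sees the full volume */
	printf("skip = 2: block size = %ld\n", my_calc_size(2, tdims));

	return 0;
}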

Is it possible to switch off auto-parallelization only for NIHT reconstructions, or is there an easy way to run NIHT in parallel?

@uecker
Member

uecker commented Feb 17, 2021

Thanks, I will fix this. The problem is that the higher-level function uses the optimization framework incorrectly (as NIHT cannot be parallelized easily in this automatic way).

@sdimoudi
Contributor

Thank you Martin, I will look into this too from my end and get back to you tomorrow. There is also some GPU support I want to finalise and submit within the next couple of weeks; if it is of interest to hakkelt, I will check the 3D case and get back to you.

@uecker
Copy link
Member

uecker commented Feb 18, 2021

I think you can't use optimized_nop for this. You need to reorder the dimensions so that all jointly thresholded dimensions are innermost and all others (which can be processed in parallel) are on the outside, copying into a temporary array if needed (this can be skipped if they are already in this order). You can then reshape the dimensions so that the innermost block is collapsed into one dimension. Finally, you use md_parallel_nary to apply the kernel, processing all blocks in parallel, but each block must be processed by one kernel application.
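
Roughly, the pattern would look something like this (only a sketch: plain OpenMP stands in for md_parallel_nary, block_thresh and thresh_blocks_parallel are made-up names, and the real kernel does k-largest selection instead of a fixed threshold):

#include <complex.h>
#include <math.h>

/* Simplified stand-in for the joint thresholding kernel. The real NIHT
 * kernel keeps the k largest magnitudes of the whole block; here we just
 * zero everything below a fixed threshold to keep the sketch short. */
static void block_thresh(long block_size, complex float* block, float thresh)
{
	for (long i = 0; i < block_size; i++)
		if (cabsf(block[i]) < thresh)
			block[i] = 0.f;
}

/* The jointly thresholded dimensions have already been made innermost
 * (by copying into a temporary array if necessary) and collapsed into a
 * single dimension of length block_size; the remaining dimensions are
 * collapsed into n_blocks. Each block is handled by exactly one kernel
 * call, and the blocks are processed in parallel. */
static void thresh_blocks_parallel(long n_blocks, long block_size, complex float* x, float thresh)
{
	#pragma omp parallel for
	for (long b = 0; b < n_blocks; b++)
		block_thresh(block_size, x + b * block_size, thresh);
}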

@hakkelt
Contributor Author

hakkelt commented Feb 19, 2021

Thank you both for your help. :)
Sofia: Yes, it would be really nice to have this algorithm implemented on GPU, even for the 3D case!
