3D NIHT fails when num_auto_parallelize = true #246

Open
hakkelt opened this issue Feb 15, 2021 · 4 comments

@hakkelt
Contributor

hakkelt commented Feb 15, 2021

I'm trying to run the following set of commands:

bart phantom -3 phan
bart fft $(bart bitmask 0 1 2) phan y
bart ones 3 128 128 128 smap
bart pics -R N:$(bart bitmask 0 1 2):0:10 y smap recon

The last command fails with the following assertion error: bart: /home/hakta/.bin/bart/src/num/vecops.c:547: klargest_complex_partsort: Assertion 'k <= N' failed. The command bart pics -R H:$(bart bitmask 0 1 2):0:10 y smap recon fails with the same error.

After some debugging, I found the source of the error in /src/num/optimize.c (lines 705-735):

	int skip = min_blockdim(N, ND, tdims, nstr1, sizes);
	unsigned long flags = 0;

	debug_printf(DP_DEBUG4, "MD-Fun. Io: %d Input: ", io);
	debug_print_dims(DP_DEBUG4, D, dim);

#ifdef USE_CUDA
	if (num_auto_parallelize && !use_gpu(N, nptr1) && !one_on_gpu(N, nptr1)) {
#else
	if (num_auto_parallelize) {
#endif
		flags = dims_parallel(N, io, ND, tdims, nstr1, sizes);

		while ((0 != flags) && (ffs(flags) <= skip))
			skip--;

		flags = flags >> skip;
	}

	const long* nstr2[N];

	for (unsigned int i = 0; i < N; i++)
		nstr2[i] = *nstr1[i] + skip;

#ifdef USE_CUDA
	debug_printf(DP_DEBUG4, "This is a %s call\n.", use_gpu(N, nptr1) ? "gpu" : "cpu");

	__block struct nary_opt_data_s data = { md_calc_size(skip, tdims), use_gpu(N, nptr1) ? &gpu_ops : &cpu_ops };
#else
	__block struct nary_opt_data_s data = { md_calc_size(skip, tdims), &cpu_ops };
#endif

As num_auto_parallelize = true, the value of skip is decreased from 3 to 2, and the size field of data computed by md_calc_size(skip, tdims) becomes incorrect, because 3D NIHT cannot be parallelized along the third dimension (or, at least, this feature is not implemented). Setting num_auto_parallelize = false at line 574 in /src/num/optimize.c solves the problem, but I also experienced a slight performance decrease in FISTA-based reconstructions.
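
To illustrate what goes wrong with the block size (just a standalone sketch, not using the BART headers; my_calc_size only mimics what md_calc_size computes, i.e. the product of the first skip dimensions):

#include <stdio.h>

/* mimics md_calc_size: product of the first D entries of dim */
static long my_calc_size(int D, const long* dim)
{
	long size = 1;

	for (int i = 0; i < D; i++)
		size *= dim[i];

	return size;
}

int main(void)
{
	long tdims[3] = { 128, 128, 128 };

	/* skip = 3: the whole 128x128x128 volume goes into one kernel call */
	printf("skip = 3: block size = %ld\n", my_calc_size(3, tdims));

	/* skip = 2 (after auto-parallelization): only a 128x128 slice per call,
	 * so the joint thresholding kernel no longer sees the full volume */
	printf("skip = 2: block size = %ld\n", my_calc_size(2, tdims));

	return 0;
}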

Is it possible to switch off auto-parallelization only for NIHT reconstructions, or is there an easy way to run NIHT in parallel?

@uecker
Member

uecker commented Feb 17, 2021

Thanks, I will fix this. The problem is that the higher-level function uses the optimization framework incorrectly (as NIHT cannot be parallelized easily in this automatic way).

@sdimoudi
Contributor

Thank you Martin, I will look into this too from my end and get back to you tomorrow. There is also some GPU support I want to finalise and submit within the next couple of weeks; if it is of interest to hakkelt, I will check the 3D case and get back to you.

@uecker
Copy link
Member

uecker commented Feb 18, 2021

I think you can't use optimized_nop for this. You need to reorder the dimensions so that all jointly thresholded dimensions are innermost and all others (which can be processed in parallel) are on the outside, copying into a temporary array if needed (this can be skipped if they are already in this order). You can then reshape the dimensions so that the innermost block is collapsed into one dimension. Finally, you use md_parallel_nary to apply the kernel, processing all blocks in parallel, but each block must be processed by one kernel application.
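
Roughly, the pattern would look something like this (only a sketch: plain OpenMP stands in for md_parallel_nary, block_thresh and thresh_blocks_parallel are made-up names, and the real kernel does k-largest selection instead of a fixed threshold):

#include <complex.h>
#include <math.h>

/* Simplified stand-in for the joint thresholding kernel. The real NIHT
 * kernel keeps the k largest magnitudes of the whole block; here we just
 * zero everything below a fixed threshold to keep the sketch short. */
static void block_thresh(long block_size, complex float* block, float thresh)
{
	for (long i = 0; i < block_size; i++)
		if (cabsf(block[i]) < thresh)
			block[i] = 0.f;
}

/* The jointly thresholded dimensions have already been made innermost
 * (by copying into a temporary array if necessary) and collapsed into a
 * single dimension of length block_size; the remaining dimensions are
 * collapsed into n_blocks. Each block is handled by exactly one kernel
 * call, and the blocks are processed in parallel. */
static void thresh_blocks_parallel(long n_blocks, long block_size, complex float* x, float thresh)
{
	#pragma omp parallel for
	for (long b = 0; b < n_blocks; b++)
		block_thresh(block_size, x + b * block_size, thresh);
}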

@hakkelt
Contributor Author

hakkelt commented Feb 19, 2021

Thank you both for your help. :)
Sofia: Yes, it would be really nice to have this algorithm implemented on GPU, even for the 3D case!
