antsMultivariateTemplateConstruction2 itk:MemoryAllocationError on big computer #1764
Comments
The pexec options are confusingly written, something I need to address. Running The per-process number of threads is controlled by the environment variable
If From your logs, I suspect it's failing when calling N4BiasFieldCorrection. Could you try setting the number of threads as above, and then running with
I found another potential problem
I see here that your xyzt_units=3, indicating microns. The first images have xyzt_units=0, which will default to being read as mm. The default parameters for N4 are set up for a human-sized brain. I would advise setting all the headers consistently such that the pixel dimensions are read as 2.45 mm.
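For anyone else reading the header values: in the NIfTI-1 format, xyzt_units is a single packed byte, with the spatial unit code in the low three bits and the temporal unit code in the next three. Below is a minimal Python sketch of decoding it. The function name `decode_xyzt_units` is hypothetical and not part of ANTs or ITK; the code tables follow the NIfTI-1 standard.

```python
# Unit codes defined by the NIfTI-1 standard for the xyzt_units header byte.
SPATIAL_UNITS = {0: "unknown", 1: "meter", 2: "mm", 3: "micron"}
TEMPORAL_UNITS = {0: "unknown", 8: "sec", 16: "msec", 24: "usec",
                  32: "hz", 40: "ppm", 48: "rads"}


def decode_xyzt_units(xyzt_units: int) -> tuple[str, str]:
    """Split the packed byte into (spatial unit, temporal unit).

    Hypothetical helper for illustration only.
    """
    spatial = SPATIAL_UNITS.get(xyzt_units & 0x07, "invalid")   # bits 0-2
    temporal = TEMPORAL_UNITS.get(xyzt_units & 0x38, "invalid")  # bits 3-5
    return spatial, temporal


print(decode_xyzt_units(3))  # spatial unit is micron
print(decode_xyzt_units(0))  # both units unknown
```

A value of 0 means the unit is simply not recorded in the file, which is why a reader has to fall back to a default.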
Thank you, @cookpa , for your quick and thorough explanation. The Until now it appears to work (still running) as it should; even before fixing the images, which I am in the process of doing right now, for consistency's sake. Thanks a lot!
Great, I'm glad it's working. Internally, ITK always works in mm coordinates, so images in other units should be converted at run time. In reality, this was not always done consistently (e.g., I made some fixes here). So registrations that appear to work with different spatial units might break in the future when the ANTs ITK version is updated. I think the approach taken in the first images, effectively scaling up the images so that the pixels are ~1 mm, is the right thing to do. There are various numerical issues with very small spacings that are quite hard to debug, and many small animal imagers do this scaling to improve stability with ANTs and other tools.

I also noticed the input images were stored without qform or sform transforms. Reading such images relies on backwards compatibility with Analyze, and ITK will not write NIfTI images without a valid header transform, so the output of registration will have a qform and sform. ITK will try to keep it consistent, but you need to be careful that your template does not get flipped with respect to the input images, and that the input images have the same orientation. You can preview how ANTs interprets the image space by loading images into ITK-SNAP.
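A quick way to check a batch of inputs for a missing qform and sform is to read the two codes straight from the NIfTI-1 header, where they are stored as 16-bit integers at byte offsets 252 and 254. The sketch below assumes single-file NIfTI-1 images (.nii or .nii.gz); the function names are hypothetical, and it only reports the codes, it does not validate the transforms themselves.

```python
import gzip
import struct


def parse_transform_codes(header: bytes) -> dict:
    """Extract qform_code and sform_code from a raw NIfTI-1 header."""
    if len(header) < 348:
        raise ValueError("a NIfTI-1 header is 348 bytes long")
    # sizeof_hdr (first 4 bytes) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    qform_code, sform_code = struct.unpack(endian + "hh", header[252:256])
    return {
        "qform_code": qform_code,
        "sform_code": sform_code,
        # A code of 0 means that transform is not set.
        "has_transform": qform_code > 0 or sform_code > 0,
    }


def read_transform_codes(path: str) -> dict:
    """Read the header from a .nii or .nii.gz file and parse the codes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_transform_codes(f.read(348))
```

Images for which `has_transform` is false are the ones that would be read through the Analyze-compatible fallback described above.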
As it took me a little to read up on
... in PrintHeader and will report back here in the coming days. I highly appreciate the help, thank you very much.
Yes, sorry if I was unclear, I think setting
Unfortunately, I can only report that the template-building based on 12 modified (units=mm) images didn’t run through either - but for a new reason, I guess.
… which means the following jobs failed:
I find that remarkable, as these are not the last jobs (nor the first). From job_0003_metriclog.txt (randomly selected from the failed jobs):
It looks to me like the affine registration has already failed. Is that correct? The stdout-build-template.txt keeps going on but ends up on :
I took the liberty of attaching the cited files, and I’d highly appreciate it if you could have a look at them and help me overcome this (new) hiccup (or should I rather start a new issue on this?).
Can we see the end of
You may be interested in https://github.com/CoBrALab/optimized_antsMultivariateTemplateConstruction which is a re-implementation with better error handling as well as better control of the parallelization of the processes.
"Killed" with no explanation is generally memory, yes
On the head-node of my national HPC they kill processes that use more than a certain total amount of CPU time (you're not supposed to process on the head-node). This is another possibility.
Yeah, wall time limits could also trigger that message. The usual tell for memory is that it consistently happens at the end of a stage, as the registration tries to allocate for the higher resolution.
@gdevenyi, well caught!
However, that is not the only appearance of oom in dmesg. I extracted them ( Is there a clever way (which does not include downsampling) to overcome these memory issues? Maybe by limiting the number of processes run in parallel? I'd highly appreciate any thoughts.
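On limiting the number of parallel processes: a rough budget can be worked out from the peak memory of a single pairwise registration. The Python sketch below is only arithmetic. The 512 GB and 72 cores come from the machine described in this issue, while the 60 GB per job, the 16 GB reserve and the function name are made-up placeholders that would need to be replaced by measured values.

```python
def max_parallel_jobs(total_ram_gb, peak_per_job_gb, reserve_gb=16.0,
                      cores=None, threads_per_job=1):
    """Upper bound on simultaneous jobs given RAM and, optionally, cores.

    Hypothetical helper: returns 0 if not even one job fits in memory.
    """
    if peak_per_job_gb <= 0 or threads_per_job < 1:
        raise ValueError("peak_per_job_gb must be > 0 and threads_per_job >= 1")
    usable_gb = max(0.0, total_ram_gb - reserve_gb)  # keep headroom for the OS
    limit = int(usable_gb // peak_per_job_gb)        # jobs that fit in RAM
    if cores is not None:
        limit = min(limit, cores // threads_per_job)  # jobs that fit on CPUs
    return limit


# 512 GB and 72 cores are from this issue; 60 GB per job is a placeholder.
print(max_parallel_jobs(512, 60, cores=72, threads_per_job=2))
```

With these placeholder numbers memory, not the core count, is the binding limit, which would be consistent with jobs being killed while many cores sit idle.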
Thank you. Yes, at first glance, this appears to be a valid alternative. I will have a deeper look into it over the weekend.
I am in the lucky situation that I am all alone on this machine, with nobody to share it with.
I was not aware that the wall time has an impact on
I'm running the first two images again, on my local Mac. With -c 2 -j 2, It's possible that running with more threads will increase memory use. Also, I believe I noticed before that the input images are different sizes (meaning the grid has a different number of voxels). Whatever is chosen as the initial template will determine the memory use, so it's worth checking the initial template is not too large. I would run one of the pairwise registrations, e.g. job_0.sh, and monitor its memory use to get an idea of how much it needs.
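One way to monitor a single job's memory is to launch it from a small Python wrapper and read the peak resident set size afterwards. This is a sketch under assumptions: it uses only the standard library on Linux or macOS, the function name is hypothetical, and the reported value is the largest single process among the children that were waited for, not the sum over all of them.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(cmd: list[str]) -> float:
    """Run cmd to completion and return the peak RSS of child processes in MB.

    ru_maxrss for RUSAGE_CHILDREN is the maximum over every child this
    Python process has waited for so far, so call it once per interpreter.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Example call, using the job script name mentioned above:
# print(peak_child_rss_mb(["bash", "job_0.sh"]))
```

The result, plus some safety margin, gives the per-job figure needed to decide how many registrations can safely run side by side.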
I wonder if the initial images are very misaligned and the first dumb average is making a huge image?
This is possible if not using an initial template, but the commands in the OP suggest an initial template. But yes, bad misalignment during population averaging can create a very large template. Worth checking.
Operating system and version
Ubuntu 22.04.4
CPU architecture
x86_64 (PC, Intel Mac, other Intel/AMD)
ANTs code version
2.5.1.post73-g3a788f8
ANTs installation type
Compiled from source
Summary of the problem
Hi All,
I am trying to build a template by calling antsMultivariateTemplateConstruction2.sh with 12 files... and after reading the first 5 files properly, I get a memory allocation error from ITK 7 times
... which I can not explain.
I saw this problem popping up in other issues (#917, #934) but the explanations there do not fit my situation.
The system I am using has 72 cores and 512 GB of RAM and the files are not overly big.
From PrintHeader I computed this from one of the bigger files (.nii.gz)
I experimented with ANTSPexec.sh, implementing a memory limit to be checked before submitting another job. However, neither free nor vmstat shows low memory while or before ITK throws the error.

Can somebody please help me understand why I am getting these errors, and how to overcome them?
Commands to reproduce the problem.
Output of the command with verbose output.
stdout-build-template.txt
stderr-build-template.txt
Data to reproduce the problem
data downloadable from https://filesender.renater.fr/?s=download&token=6c3fc7b4-b96a-4511-aa6c-c6617d4a6f51