Seg fault when using batch processing #8

Open
MTclement1 opened this issue Nov 3, 2023 · 7 comments

@MTclement1

Hi,
MotionCor3 works perfectly on a single file (although, like others, I needed the -no-pie option to compile it), but in batch mode the program segfaults after image loading.

The command for my tests (I aliased motionCorr) is: motionCor -InMrc ./data/ -OutMrc ./MotionCor/ -Gpu 0 -Patch 5 5 -Iter 10 -Serial 1

The output shows that the files are found:

added: ./data/test_fram_004.mrc
added: ./data/test_fram_006.mrc
added: ./data/test_fram_005.mrc
added: ./data/test_fram_009.mrc
added: ./data/test_fram_000.mrc
added: ./data/test_fram_010.mrc
added: ./data/test_fram_003.mrc
added: ./data/test_fram_001.mrc
added: ./data/test_fram_008.mrc
added: ./data/test_fram_007.mrc
added: ./data/test_fram_002.mrc

But after seemingly loading 5 files, it segfaults. Here is the last part of the output:

MRC file size mode: 4096 4096 20 1
Rendered size mode: 4096 4096 20 1
MRC file size mode: 4096 4096 20 1
Rendered size mode: 4096 4096 20 1
GPU 0 Allocation time: GPU ( 0.06 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 0.25 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 0.31 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 1.25 GB) 0.01 s, CPU ( 0.00 GB) 0.00 s
Create buffers: total memory allocation 0.17 GB
Create buffers: 0.04 seconds

MRC file size mode: 4096 4096 20 1
Rendered size mode: 4096 4096 20 1
Segmentation fault (core dumped)

Then I reduced the number of files to 4, and it just finishes without writing any output:

MRC file size mode: 4096 4096 20 1
Rendered size mode: 4096 4096 20 1
MRC file size mode: 4096 4096 20 1
Rendered size mode: 4096 4096 20 1
GPU 0 Allocation time: GPU ( 0.06 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 0.25 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 0.31 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 0.01 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 0 Allocation time: GPU ( 1.25 GB) 0.02 s, CPU ( 0.00 GB) 0.00 s
Create buffers: total memory allocation 0.17 GB
Create buffers: 0.04 seconds

Total time: 0.088412 sec

I use an RTX 3080, which SerialEM reports as having 9987 MB of VRAM free, which sounds right.

Does anyone have an idea what I can do?
Thanks

@leetleyang

leetleyang commented Nov 5, 2023

We're seeing a similar issue with batch processing:

MotionCor3 -InMrc /data/stack_folder/ -OutMrc /data/corrected/ -LogDir /data/log/ -Patch 5 5 10 -FmDose 1.1 -Kv 200 -PixSize 0.96 -SumRange 0 0 -InFmMotion 1 -Gpu 0 1 2 3 -Serial 1 -OutStar 1

It seems to add the input files (~5000 gain-normalized MRC stacks) but then breezes through the GPU allocation messages (like those above) without writing any outputs.

It processes a single movie fine, but -OutStar 1 does not lead to a corresponding star file being written.

Something amiss during our compilation, perhaps?

@szhengczii
Collaborator

@MTclement1: Sorry for the tardy response. I will look into this and let you know.
@leetleyang: I will also look into this and let you know.

@jonathanrd

I also have this issue. Have there been any updates or workarounds?
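
Since the reports above say that single-file runs work while batch runs with -Serial 1 fail, one possible stopgap is to call the binary once per movie. The Python sketch below only shows that idea and is not a fix confirmed by the maintainers. It assumes that, without -Serial 1, -InMrc and -OutMrc each take a single file path; the helper names `build_commands` and `run_all`, the binary name, and the default flags (copied from the first command in this thread) are illustrative assumptions.

```python
from pathlib import Path
import subprocess


def build_commands(in_dir, out_dir, binary="MotionCor3",
                   extra_args=("-Gpu", "0", "-Patch", "5", "5", "-Iter", "10")):
    """Return one argv list per .mrc file in in_dir, sorted by file name.

    Assumption: in single-file mode, -InMrc and -OutMrc each take a file path.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    commands = []
    for movie in sorted(in_dir.glob("*.mrc")):
        out_file = out_dir / movie.name  # hypothetical naming: reuse the input name
        commands.append([binary, "-InMrc", str(movie),
                         "-OutMrc", str(out_file), *extra_args])
    return commands


def run_all(commands):
    """Run each command in turn and return the ones that exited non-zero."""
    failed = []
    for argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(argv)
    return failed
```

For example, `run_all(build_commands("./data", "./MotionCor"))` would process the movies one at a time and return the commands that failed, so a crash on one movie does not stop the rest. Any GPU setup cost is paid once per movie with this approach.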

@Fengyun0101

I also have this issue when I process .mrc files of tomo datasets with -Serial 1.
The run ends with the following output:

MRC file size mode: 5760 4092 6 0
Rendered size mode: 5760 4092 6 0
MRC file size mode: 5760 4092 6 0
Rendered size mode: 5760 4092 6 0
GPU 2 Allocation time: GPU ( 0.09 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 3 Allocation time: GPU ( 0.09 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 2 Allocation time: GPU ( 0.35 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 3 Allocation time: GPU ( 0.35 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 2 Allocation time: GPU ( 0.07 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 3 Allocation time: GPU ( 0.07 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 2 Allocation time: GPU ( 0.00 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 3 Allocation time: GPU ( 0.00 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 2 Allocation time: GPU ( 0.26 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
GPU 3 Allocation time: GPU ( 0.26 GB) 0.00 s, CPU ( 0.00 GB) 0.00 s
Create buffers: total memory allocation 0.37 GB
Create buffers: 0.07 seconds

MRC file size mode: 5760 4092 6 0
Rendered size mode: 5760 4092 6 0
Segmentation fault (core dumped)

By the way, I successfully processed the .eer files with MotionCor3 1.1.1. Does anyone have any updates? Many thanks.

@szhengczii
Collaborator

szhengczii commented Mar 14, 2024 via email

@Fengyun0101

Fengyun0101 commented Mar 14, 2024 via email

@Poko18

Poko18 commented May 10, 2024

I have a similar problem. The job gets killed during batch processing.
Command:
/usr/local/MotionCor3 -InTiff dataset/ -InSuffix .tif -OutMrc 3/sum/corrected -Patch 5 5 -Gain dataset/GLP-1_gain.mrc -Gpu 0 -Kv 300 -PixSize 0.83 -FmDose 0.8 -Serial 1 -OutStar 1 -LogDir 3/logdir

Output:

Gain reference has been loaded.
DarkReference not found.

TIFF file size mode: 5760  4092  75  0
Rendered size mode: 5760  4092  75  0

GPU 0 Allocation time: GPU (  0.26 GB)   0.00 s, CPU (  0.00 GB)   0.00 s
GPU 0 Allocation time: GPU (  0.35 GB)   0.00 s, CPU (  0.00 GB)   0.00 s
GPU 0 Allocation time: GPU (  1.65 GB)   0.00 s, CPU (  0.00 GB)   0.00 s
GPU 0 Allocation time: GPU (  0.07 GB)   0.00 s, CPU (  0.00 GB)   0.00 s
GPU 0 Allocation time: GPU (  6.59 GB)   0.00 s, CPU (  0.00 GB)   0.00 s
Create buffers: total memory allocation 0.28 GB
Create buffers: 0.02 seconds

Killed

I'm running on an A10 GPU with 23 GB of memory. Does anyone have an idea what is going on?
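
To put the stack sizes in these logs in context, the Python sketch below estimates the raw data size of one movie from a "size mode" log line. It assumes the four numbers are nx, ny, nz and the MRC data mode, and that the mode on the TIFF line follows the same convention; the bytes-per-pixel table uses the standard MRC mode definitions. The function name `raw_stack_bytes` is hypothetical, and the result is only the uncompressed pixel data, not what MotionCor3 actually allocates.

```python
# Bytes per pixel for standard MRC data modes:
# 0 = 8-bit int, 1 = 16-bit int, 2 = 32-bit float, 6 = 16-bit unsigned int.
BYTES_PER_PIXEL = {0: 1, 1: 2, 2: 4, 6: 2}


def raw_stack_bytes(line):
    """Parse '<type> file size mode: nx ny nz mode' and return raw bytes.

    Assumption: the last four whitespace-separated fields are nx, ny, nz, mode.
    """
    _, _, fields = line.partition("size mode:")
    nx, ny, nz, mode = (int(value) for value in fields.split())
    return nx * ny * nz * BYTES_PER_PIXEL[mode]


if __name__ == "__main__":
    for log_line in ("MRC file size mode: 4096 4096 20 1",
                     "TIFF file size mode: 5760  4092  75  0"):
        size_gib = raw_stack_bytes(log_line) / 2**30
        print(f"{log_line!r}: {size_gib:.2f} GiB of raw pixel data")
```

Multiplying the per-movie figure by the number of movies gives a rough lower bound on how much data would be held if several stacks were in memory at once.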
