fefgGet (line 224) running out of memory #94

Open
soichih opened this issue Feb 23, 2019 · 17 comments

@soichih
Contributor

soichih commented Feb 23, 2019

We've been seeing a lot of out-of-memory issues with encode lately.

Out of memory. Type HELP MEMORY for your options.

Error in unique
Error in fefgGet (line 224)
Error in feSet (line 75)
Error in feConnectomeInit (line 52)

This might be happening because we recently started feeding encode the track.tck output from the mrtrix3 ACT app, which produces somewhat more fibers, and the track.tck file is slightly larger (1.69 GB).

tckinfo.txt
 
***********************************
  Tracks file: "track.tck"
    act:                  5tt.mif
    backtrack:            variable
    count:                675000
    crop_at_gmwmi:        1
    downsample_factor:    3
    fod_power:            0.25
    init_threshold:       variable
    lmax:                 variable
    max_angle:            variable
    max_num_seeds:        15000000
    max_num_tracks:       15000
    max_seed_attempts:    1000
    max_trials:           1000
    method:               variable
    mrtrix_version:       3.0_RC3
    output_step_size:     0.625
    rk4:                  0
    samples_per_step:     4
    sh_precomputed:       1
    source:               variable
    step_size:            variable
    stop_on_all_include:  0
    threshold:            variable
    timestamp:            1550301026.1405482292
    total_count:          675000
    unidirectional:       0

I will benchmark / profile the memory usage, but if @ccaiafa @francopestilli have any hunch as to what might be causing this issue, please let me know!

@francopestilli
Contributor

@soichih is this inside a Docker container?

@francopestilli
Contributor

What is the size of 'val'? And that of fg.fibers?

val = unique(val,'rows');
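
For example, something like this at the MATLAB prompt would show both sizes (a rough sketch; it assumes fg.fibers is a cell array of 3xN coordinate matrices, as in vistasoft-style fiber groups, and that fgRead parses the .tck file, which is questioned later in the thread):

    fg  = fgRead('track.tck');        % reader mentioned later in the thread; .tck support is the open question
    whos fg                           % memory footprint of the whole fiber-group structure
    numel(fg.fibers)                  % number of streamlines
    val = horzcat(fg.fibers{:})';     % all node coordinates, one row per node
    size(val)                         % this row count drives the cost of unique(val,'rows')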

@ccaiafa
Collaborator

ccaiafa commented Feb 23, 2019

Hi @soichih
You may change the number of batches to make sure each batch fits in memory when the ENCODE model is built. Best. Cesar

@soichih
Contributor Author

soichih commented Feb 23, 2019

We are running inside Singularity, and Singularity itself doesn't impose any memory limit. If we run it on Karst, it should have access to all 32 GB available. For app-life, however, we only tell #PBS that we use 24 GB max, so the scheduler will kill the job if we are using more than that, but the error message we are seeing is from MATLAB running out of memory.

@francopestilli
Contributor

@soichih I just realized this is an mrtrix3 file. We never tested those files in MATLAB. Is it being read correctly? There could be something in the header that makes them incompatible with fgRead. I would look into fg.fibers and check how the fibers look and their size.

@daducci

daducci commented Feb 27, 2020

Hi @francopestilli , do you have any update on this issue by any chance? I was trying to run LiFE on a .tck tractogram with 5M streamlines to test the new version (encode), as it looks cool. But we ran out of memory on a server with 64 GB of RAM; it looks like it requires about 100 GB. I was wondering whether we are doing something wrong, as the new version should be optimized for memory consumption. Any insight? Thanks, Ale

@francopestilli
Contributor


@daducci let me look into this. Which branch of the repo are you using? master?

@daducci

daducci commented Feb 27, 2020

Yes, the master branch of this repo (https://github.com/brain-life/encode). We then used the demo_LiFE.m script to process this tractogram. Thanks for the quick reply!

@francopestilli
Contributor

Hi @daducci

Take a look at this code: https://github.com/brain-life/encode/blob/master/life/fe/feConnectomeEncoding.m#L29

and: https://github.com/brain-life/encode/blob/master/life/fe/feConnectomeEncoding.m#L58

We provide some educated guesses for the amount of memory that can be allocated and for the size of the batches used to process the input streamlines.

If the guess is not appropriate for your OS and/or for the number of streamlines you are using (much larger than we normally use), the initialization is likely to fail.

You should be able to change the batch size so that each batch fits in memory. Does that make sense?
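
Roughly, the idea is something like this (a minimal sketch with illustrative names and a made-up memory budget; it is not the exact code at the links above):

    nFibers        = numel(fg.fibers);
    avgNodes       = mean(cellfun(@(f) size(f,2), fg.fibers));  % average nodes per streamline
    bytesPerNode   = 3*8;                                        % x,y,z stored as doubles
    maxMemBytes    = 4e9;                                        % per-batch budget; tune for your machine
    nBatches       = max(1, ceil(nFibers*avgNodes*bytesPerNode / maxMemBytes));
    fibersPerBatch = ceil(nFibers / nBatches);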

@ccaiafa
Collaborator

ccaiafa commented Mar 1, 2020

Thanks @francopestilli for the clarification.
Hi @daducci, let me add something more. As explained in our paper, the information of the whole connectome is encoded into an extremely large but very sparse 3D array (the tensor Phi), whose dimensions correspond to orientation x voxel x fiber. To avoid an explosion of memory requirements while the tensor is built, we divide the connectome into groups of fibers (batches), create a sparse 3D sub-tensor for each group and, finally, obtain the full sparse 3D tensor by concatenating all sub-tensors along the 3rd dimension.
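In pseudocode, the batch-wise construction looks roughly like this (a conceptual sketch based on a coordinate list of nonzeros; batchList, buildSubTensor and the commented-out constructor are placeholders, not the actual encode internals):

    subs = zeros(0,3);  vals = zeros(0,1);      % [orientation, voxel, fiber] indices and weights
    fiberOffset = 0;
    for b = 1:nBatches
        batch  = batchList{b};                  % fiber indices in this batch (placeholder)
        [s, v] = buildSubTensor(batch);         % placeholder helper: nonzero entries for this batch
        s(:,3) = s(:,3) + fiberOffset;          % shifting the fiber index = concatenating along dim 3
        subs   = [subs; s];   vals = [vals; v];
        fiberOffset = fiberOffset + numel(batch);
    end
    % Phi = sptensor(subs, vals, [nOrient, nVoxels, nFibersTotal]);  % e.g. with a sparse-tensor class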
If you still have memory problems, you can try increasing the number of batches (groups of fibers) further.
I hope it helps.
Best
Cesar

@daducci

daducci commented Mar 1, 2020

Thank you both, I really appreciate your help! Indeed, the fitting does not fail; it simply requires about 100 GB of RAM and 55 h/brain to complete (probably because of swapping, I suspect), but then the coefficients look fine. So, if I understood correctly, you suggest reducing the size of the batches so that the amount of memory is reduced, am I right?

@francopestilli
Contributor

Hi @daducci correct, it should. But the RAM usage you measure worries me a little; we have never seen such large memory usage. Maybe your streamlines are also sampled at very high resolution (we use 1 mm node spacing)? Anyway, reducing the batch size should work, I am just confused by the 100 GB! If things are still so crazy (55 h), would you feel comfortable sharing the dataset for a local test on our side (DWI + bvecs/bvals and the .tck file)?

@daducci

daducci commented Mar 3, 2020

Hi Franco and Cesar, of course, there's no problem sharing the data! We are just running a few more tests to provide you with the correct timings and memory requirements we observe: we fit LiFE to all 3 shells of the HCP dataset, but then saw that in your paper you only use the b=2000 shell, so we are running this test and will come back to you asap. We are also taking care of the b0=5 issue.

Meanwhile, we have run some other experiments on a smaller tractogram to match the one in your paper (500k streamlines, iFOD2 algorithm, default step size 0.625 mm, file size 770 MB). The aim was to play with the MAXMEM parameter you suggested. We did observe that the time for constructing the fe-structure increased, as expected, but the total amount of RAM didn't seem to change. Besides, the RAM required at fitting time was about 12 GB (our workstation has 64 GB). I report hereafter some values we got so far:

  • MAXMEM set to default value
    Encoding batch 16 took: 325.317s.
    [feConnectomeInit] fe-structure Memory Storage:3.82 Gb
    Fit process completed in 131.359 minutes
    Memory used by Matlab creating the dictionary 12GB, fitting 10GB
  • MAXMEM set to half its default value
    Encoding batch 32 took: 510.341s.
    [feConnectomeInit] fe-structure Memory Storage:3.82 Gb
    Memory used by Matlab creating the dictionary 12GB, fitting 10GB
    Fit process completed in 131.851 minutes

As you can see, fitting time and memory requirements do not change, only the construction time does. What are we doing wrong?

@ccaiafa
Collaborator

ccaiafa commented Mar 3, 2020

Hi @daducci,
Thanks for sharing the results. I think they make sense.
Building the tensor PHI in batches reduces memory requirements only during construction. The final result, i.e. the full sparse tensor PHI, is the same independently of the number of batches used; each batch constructs a piece of the final tensor.
Regarding the fitting time, your numbers are reasonable, maybe a little bit large. I remember that for our paper results each brain needed approximately 45 min to fit on the Indiana University servers.
Best
Cesar

@daducci

daducci commented Mar 5, 2020

Dear Cesar,

The test on the 4M streamlines, using only the b=2000 shell, just finished; here are the outcomes:

HCP subject 119833
MAXMEM default value
Encoding batch 26 took: 1.33 hours.
[feConnectomeInit] fe-structure Memory Storage: 28.00 Gb
Memory used by Matlab creating the dictionary 100GB, fitting 48GB
Fit process completed in 22.88 hours

Do these numbers make sense? If so, that's ok, we are just worried we are doing something wrong. Here is the link to the data in case you want to give it a try.

@ccaiafa
Collaborator

ccaiafa commented Mar 6, 2020

Dear Alessandro,
Sorry for my late response. Thank you for sharing the results and the data. I think the numbers make sense, although we have never used such a huge connectome.
@francopestilli let us know if you have a chance to run encode on this dataset at IU servers.
Best.
Cesar

@francopestilli
Contributor

Hi @daducci @ccaiafa I have been testing the code with the data Alessandro provided. The code seems to work, but:

  • The heuristic for guessing the number of batches does not seem to scale well with larger tractograms (>1 million streamlines). I edited the code so that we enforce a minimum number of batches independently of the heuristic (see the sketch after this list). This will slow things down for small tractograms (by seconds, likely) but will work for larger ones. @ccaiafa if you have time, would you mind looking at the logic here and below: https://github.com/brain-life/encode/blob/master/life/fe/feConnectomeEncoding.m#L58 It seems to me that we might not need all the zeroes in that ratio, and for some reason the ratio comes out less than 1 (0.23, I believe, with @daducci's data), so when we compute ceil we get 1 and that gives us a single batch.
  • The code is running on my Mac laptop with 32 GB of RAM. It is running while I write and do all the other things, so not too bad. But it did take some time.
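The guard I added amounts to something like this (an illustrative sketch reusing the rough estimate from the earlier comment; the floor value is made up, not the number actually committed):

    minBatches = 16;                                           % illustrative floor, not the committed value
    ratio      = nFibers*avgNodes*bytesPerNode / maxMemBytes;  % same rough per-batch estimate as above
    nBatches   = max(minBatches, ceil(ratio));                 % never collapse to a single batch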

@daducci can you please check whether this version of the code runs for you now? Thanks for the feedback; the more people use the code, the better it gets.
