Reduce volume memory usage #168

Merged: 18 commits merged from Reduce-Volume-Memory-Usage- into master on Mar 17, 2020
Conversation

@glasserm commented Mar 3, 2020

Reduce memory usage when computing variance normalization on volume files.

@mharms left a comment

Were you able to check the memory performance of this relative to the previous version?

@glasserm commented Mar 4, 2020

I will not have time to mess with this any more, so if you wish to make further adjustments, you can do that after it is merged, or we can stick with what we have, which uses more memory. This approach minimizes the memory footprint as much as is easily possible with respect to the zero-valued voxels in the NIFTI volume. The full array is cleared immediately after use, so it is never held in memory during other steps that may temporarily push memory usage above what the full array alone requires (e.g., icaDim). This was tested with both detrending and hp2000, but is anticipated to only be used with detrending given current recommendations.
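
(A minimal sketch of the clear-after-use idea; the reader function, variable names, and file name below are hypothetical, not the actual code in this PR:)

% Load the full volume time series, build the compact masked copy,
% then free the full array before memory-hungry steps such as icaDim.
fullVol = single(niftiread('rfMRI.nii.gz'));        % hypothetical reader/input
fullData = reshape(fullVol, [], size(fullVol, 4));  % voxels x timepoints
clear fullVol;                                      % free the 4-D copy
mask = max(fullData, [], 2) > min(fullData, [], 2); % voxels with any variation
data = fullData(mask, :);                           % compact copy without empty voxels
clear fullData;                                     % never hold the full array past this point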

@glasserm commented Mar 4, 2020

The above changes are fine with me. Were they tested with hp=0 and hp=2000 (and now I guess hp=-1)?

@mharms force-pushed the Reduce-Volume-Memory-Usage- branch from 5740f07 to 8ad2ca5 on March 4, 2020 17:12
@mharms commented Mar 4, 2020

FYI: The push I just forced only corrects the variable indentation. I was hoping that you could quickly test this with what you already have set up, since I'm not really set up to test under interpreted matlab.

Also, it is still unclear whether you have empirically tested the memory reduction, which I would like established before I work with @hodgem to get these changes into the compiled matlab. I wonder if the matlab profiler would make it easy to assess the memory usage...

@glasserm commented Mar 4, 2020

I can run the basic run-to-completion tests; however, I don't think it is necessary to test the memory reduction (and I don't have a way to do that): it will be reduced considerably.

@glasserm commented Mar 4, 2020

The code runs through the VN function for hp=0 and hp=2000. hp=-1 apparently is not allowed.

@mharms commented Mar 4, 2020

That's good. hp=-1 should be able to run successfully through functionhighpassandvariancenormalize.m itself. But it isn't a supported option in hcp_fix_multi_run.

@coalsont commented Mar 4, 2020

Sticking a /usr/bin/time in front of the matlab call would give you some memory numbers to work with, though I seem to recall they may be scaled strangely (so, should not be compared against things like htop).

@mharms commented Mar 4, 2020

?? /usr/bin/time reports the time used by a command, not the memory...

@coalsont commented Mar 4, 2020

tim@timsdev:~$ /usr/bin/time ls
...
0.00user 0.00system 0:00.00elapsed 71%CPU (0avgtext+0avgdata 1088maxresident)k
0inputs+0outputs (0major+342minor)pagefaults 0swaps

"maxresident" refers to memory. In posix mode, it will only report time, so you might need to give it some options in your environment.

@mharms commented Mar 4, 2020

This is interesting (from CHPC):

[mharms@login02 fixTesting]$ time ls
...
real	0m0.016s
user	0m0.000s
sys	0m0.010s
[mharms@login02 fixTesting]$ /usr/bin/time ls
...
0.00user 0.01system 0:00.02elapsed 52%CPU (0avgtext+0avgdata 920maxresident)k
0inputs+0outputs (0major+278minor)pagefaults 0swaps

The time man page reports:

    Note: some shells (e.g., bash(1)) have a built-in time command that provides less
    functionality than the command described here. To access the real command, you may
    need to specify its pathname (something like /usr/bin/time).

But then I'd expect that which time would report that time is a built-in, yet that is not the case:

[mharms@login02 fixTesting]$ which time
/usr/bin/time

I guess that isn't how which functions under bash?

@coalsont commented Mar 4, 2020

tim@timsdev:~$ type time
time is a shell keyword
tim@timsdev:~$ type which
which is /usr/bin/which
tim@timsdev:~$ type type
type is a shell builtin

@mharms commented Mar 4, 2020

Any particular reason bash doesn't report "built-ins" via which, since most people are (I suspect) much more familiar with which?

@mharms commented Mar 4, 2020

Esp. considering that which does report on aliases...

@coalsont commented Mar 4, 2020

which doesn't tell me about aliases:

tim@timsdev:~$ which ls
/bin/ls
tim@timsdev:~$ type ls
ls is aliased to `ls --color=auto'

As I edited my reply to show, which is not a builtin (at least not on my system), so it doesn't know about the internals of bash.

@coalsont commented Mar 4, 2020

And here is your answer on the chpc:

[...@login01 ~]$ type which
which is aliased to `alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
[...@login01 ~]$ type type
type is a shell builtin

It comes from here, via /etc/profile:

[...@login01 ~]$ cat /etc/profile.d/which2.sh
# Initialization script for bash and sh

# export AFS if you are in AFS environment
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'

@mharms commented Mar 4, 2020

I see that my comment about which was tripped up by the fact that the system itself aliases which as part of a default set of aliases (to include the --read-alias flag):

[mharms@login02 ~]$ which which
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
	/usr/bin/which
[mharms@login02 ~]$ which rm
alias rm='rm -i'
	/bin/rm
[mharms@login02 ~]$ /usr/bin/which rm
/bin/rm

I see that you just made the same discovery...

@coalsont commented Mar 4, 2020

With my change to delete(), it appears to complete successfully (with hp=2000, so it should hit that code).

@coalsont commented Mar 4, 2020

From the master commit this PR started from, here are the outputs of /usr/bin/time on just the functionhighpass... lines:

148.10user 61.93system 1:18.69elapsed 266%CPU (0avgtext+0avgdata 3794780maxresident)k
72inputs+2750040outputs (0major+30161541minor)pagefaults 0swaps

110.79user 41.44system 0:49.44elapsed 307%CPU (0avgtext+0avgdata 3721664maxresident)k
40inputs+1246304outputs (0major+20672548minor)pagefaults 0swaps

Before changing single(zeros(...)) to zeros(..., 'single'):

190.61user 77.33system 1:26.93elapsed 308%CPU (0avgtext+0avgdata 2812540maxresident)k
72inputs+2750040outputs (0major+38105443minor)pagefaults 0swaps

104.78user 43.89system 0:44.69elapsed 332%CPU (0avgtext+0avgdata 2737868maxresident)k
40inputs+1246304outputs (0major+19979866minor)pagefaults 0swaps

So, those edits saved a fair amount of memory used by matlab in that function. Strange that it took more time, maybe from the stdev computation?

After changing to zeros(..., 'single'):

190.40user 77.16system 1:25.92elapsed 311%CPU (0avgtext+0avgdata 2812632maxresident)k
72inputs+2750040outputs (0major+37486119minor)pagefaults 0swaps

107.54user 39.42system 0:44.10elapsed 333%CPU (0avgtext+0avgdata 2737484maxresident)k
32inputs+1246304outputs (0major+19668406minor)pagefaults 0swaps

So, not really any change to the maximum there.
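
(For reference, a sketch of the difference between the two allocation forms being compared; n is a made-up size:)

n = 1e8;
a = single(zeros(n, 1));   % builds a double temporary (8*n bytes) first,
                           % then converts it to single
b = zeros(n, 1, 'single'); % allocates single (4*n bytes) directly,
                           % with no double-precision temporary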

@coalsont commented:

My version, with range and masking-bypass:

First pass:

156.81user 38.57system 3:20.25elapsed 97%CPU (0avgtext+0avgdata 1577072maxresident)k
171952inputs+2750040outputs (420major+25930748minor)pagefaults 0swaps
173.58user 46.03system 4:54.48elapsed 74%CPU (0avgtext+0avgdata 1492044maxresident)k
503368inputs+2750040outputs (1831major+28643473minor)pagefaults 0swaps

Second run had more major pagefaults, so ignore the memory difference between the two.

Concatenated:

108.39user 25.36system 4:28.90elapsed 49%CPU (0avgtext+0avgdata 2687976maxresident)k
234824inputs+2487416outputs (1138major+15806211minor)pagefaults 0swaps

Some major pagefaults, so expect another 100 MB or so... I still have /usr/bin/time on melodic, etc., so I can compare to those numbers. We used std in the fix_3_clean changes too, so that needs editing.
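
(For illustration, a sketch of the range-style mask versus the std-based mask it replaced; data is a hypothetical voxels-by-timepoints single array, and range(data, 2) from the Statistics Toolbox computes the same max-minus-min quantity:)

% std-based mask: typically materializes a mean-subtracted temporary
% the same size as data
mask = std(data, [], 2) > 0;
% range-based mask: only a running max and min along the time dimension
mask = max(data, [], 2) > min(data, [], 2);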

On another note, does functionhighpass... really need to spit out mountains of numbers? It makes it slightly more difficult to find the timing results in the output.

@glasserm commented:

The numbers tell you if the golden search is behaving sensibly.

@coalsont commented:

Do you mean if the golden mean search has a bug in it, or do you mean if there isn't anything particularly good for the search to find? If it is only for debugging, then it can be commented out when not debugging.

@glasserm commented:

If there is something wrong with the data, it can floor or ceil, which is bad.

@coalsont commented Mar 11, 2020

That sounds like you can just print the end result and the initial endpoints, and that will tell you whether what it found was an initial endpoint.
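
(A sketch of that reduced logging; lo, hi, and x are hypothetical names for the initial endpoints and the search result:)

% print only the initial endpoints and the final result of the search;
% a result at an endpoint flags a floored/ceiled solution
fprintf('golden search: lo=%g hi=%g result=%g\n', lo, hi, x);
if x <= lo || x >= hi
    fprintf('WARNING: search ended at an initial endpoint\n');
end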

@coalsont commented:

So, for my 2-run test, the concatenated highpass is now only barely above the melodic memory usage, and fix_3_clean (fix -f) is now in line with fix -c:

melodic:

659.51user 6.62system 21:21.15elapsed 51%CPU (0avgtext+0avgdata 2523728maxresident)k
38480inputs+923472outputs (204major+5845224minor)pagefaults 0swaps

fix -f (LOTS of major pagefaults, so this number could be wrong, but it mostly matches a much-earlier run of fix -f):

872.93user 70.86system 56:32.20elapsed 27%CPU (0avgtext+0avgdata 2381880maxresident)k
3677648inputs+35472outputs (25280major+18657473minor)pagefaults 0swaps

fix -c:

29.95user 0.90system 0:39.67elapsed 77%CPU (0avgtext+0avgdata 2952384maxresident)k
2261832inputs+40outputs (73major+766193minor)pagefaults 0swaps

fix -a:

11.04user 4.06system 1:40.66elapsed 15%CPU (0avgtext+0avgdata 2884456maxresident)k
154048inputs+2748856outputs (324major+1970588minor)pagefaults 0swaps

So, my versions appear to peak at only slightly more memory than melodic (which we can't edit easily). Emailing them to Mike for change resolution.

Review comment on lines 27 to 28:
% Apply mask, reusing 'data' variable for memory efficiency
data = data(mask,:);
@coalsont commented:

Input variables don't count as used memory until they are assigned to, nor does a trivial copy (no indexing expression). I prefer not overwriting variables with modified versions, and in this case and data_detrend, it shouldn't improve memory usage versus my version.
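
(A minimal illustration of the copy-on-write behavior being described; the function and variable names are made up:)

function out = maskedcopy(in)   % 'in' still shares memory with the caller's array
    tmp = in;                   % trivial copy: no new allocation happens here
    mask = max(in, [], 2) > 0;  % small logical vector
    out = in(mask, :);          % indexing expression: this is where a real copy is made
    % assigning into 'in' or 'tmp' would likewise trigger a full copy
end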

@mharms replied:
I'll add the usemask stuff, but I don't see the disadvantage of overwriting variables with a modified version, especially since it is a misnomer to name the variable data_detrend when whether it is actually detrended depends on the value of DEMDT.

@mharms commented Mar 12, 2020

Ok, see the latest commits, which roll in Tim's changes. I restored Origdata as the input variable name, but use (and reuse) data for the rest of the code. That seems clean to me.

@glasserm: Given that icaDim isn't actually the memory limiting step, and given that we need to return to using the non-VN'ed data over and over, it doesn't seem worth it to me to bother with deleting data and recomputing it on the fly from the SVD within the while loop. Besides, in this version, the full time series of data_trend is only formed at the front end for the DEMDT==1 condition (which is the only condition for which we actually need to preserve that data throughout the while looping), so for the DEMDT=={0,-1} conditions, this version is actually a wash memory-wise with your earlier 6edc725 version.

Update: I realized that data can/should nonetheless be deleted once we are actually done with it (outside the while loop), so I just added that.

@mharms commented Mar 12, 2020

@coalsont: If you could run this version, that would be appreciated!

@coalsont commented:

For f1b2c5f:
First pass:

152.06user 48.16system 5:57.56elapsed 55%CPU (0avgtext+0avgdata 1574704maxresident)k
301160inputs+2750040outputs (1034major+27044101minor)pagefaults 0swaps
211.18user 58.82system 2:33.85elapsed 175%CPU (0avgtext+0avgdata 1575388maxresident)k
26400inputs+2750040outputs (145major+38181696minor)pagefaults 0swaps

Concatenated VN:

90.43user 34.90system 1:54.81elapsed 109%CPU (0avgtext+0avgdata 2751528maxresident)k
327000inputs+2487416outputs (1050major+14559735minor)pagefaults 0swaps

@glasserm commented:

Let's get this merged, compiled, and running on the CHPC.

@mharms commented Mar 12, 2020

@glasserm: Do you have scripts in place to quickly test all three DEMDT conditions? Not for memory, but just that they run successfully?

@glasserm commented:

It runs in all three conditions.

@mharms commented Mar 12, 2020

Fabulous! @coalsont: Anything else you want to change (or are still reviewing), before I request this gets recompiled?

@coalsont commented:

The excessive amount of text output isn't that important, and is easy enough to comment out when it gets annoying. I'd say go ahead.

@mharms commented Mar 13, 2020

ARGH!!! There is no doubt that we've improved the memory utilization of interpreted matlab. We've been assuming that any such gains there would readily translate to the compiled matlab as well. But, what if that isn't actually the case?!?

I just ran a test of the compiled matlab, and the results of /usr/bin/time on the 1st pass VN from the latest compile (25f3e1f) are no different than what we had from the earlier compile (624c8d1) from the code that added the masking to functionhighpass... but accomplished it via std rather than range. Which is to say, the maxresident values were ~ 4.4 x the size of the input (uncompressed) volume time series, rather than ~ 2.5 x as @coalsont achieved in his testing with interpreted matlab following the switch to 'range'.

I've checked with @hodgem and he is confident that he recompiled from the tip of this PR (although to be absolutely 100% sure, it might be worth adding an fprintf to the code, recompiling, and confirming that we see the expected output in the log).

The results above would be consistent with the hypothesis that in compiled matlab, the range operation is basically the same as std in terms of memory.

@mharms commented Mar 13, 2020

Ok, another wrinkle. The 2nd pass VN stage just completed from my compiled matlab run, and the results are even more confusing (to me at least).

The output of the /usr/bin/time that I prepended to the functionhighpass... call was:

1315.55user 117.50system 32:46.33elapsed 72%CPU (0avgtext+0avgdata 38761832maxresident)k
1338648inputs+22320072outputs (832major+6527444minor)pagefaults 0swaps

i.e., 39 GB, which represents no improvement relative to the version that created the spatial mask using std.
However, the reported Resource usage on CHPC for the overall job was:

Resources: cput=00:31:43,vmem=29324464kb,walltime=00:32:49,mem=23844428kb,energy_used=0

i.e., mem of only 24 GB. This is the biggest discrepancy between the Resources mem value and the /usr/bin/time maxresident value that I've seen.
@coalsont Any insight as to what might be going on?

@coalsont commented:

/usr/bin/time should measure the true memory peak, while other resource tracking may be polling the process, so if the peak was extremely short...

I would have expected compiled matlab to be no worse than interpreted; there doesn't seem to be any good reason their compiler should be dumber (though maybe the compiler exists for bad reasons in the first place...).

@glasserm commented:

I agree we should add an echo statement and retry.

@mharms commented Mar 15, 2020

I ran the 2nd pass VN stage two more times using the compiled matlab, just to see if my previously reported result (with the inconsistent reported peak memory: #168 (comment)) was repeatable; this time, in each case, the reported mem value from TORQUE matched the maxresident value reported by /usr/bin/time on the functionhighpass... call:

Resources: cput=02:19:43,vmem=43890700kb,walltime=02:24:14,mem=37181008kb,energy_used=0

1153.64user 6028.97system 2:24:11elapsed 83%CPU (0avgtext+0avgdata 38652224maxresident)k
1865120inputs+26811104outputs (4323major+31476466minor)pagefaults 0swaps

Resources: cput=06:36:33,vmem=43824136kb,walltime=06:40:50,mem=37187800kb,energy_used=0

1183.44user 16265.38system 6:40:45elapsed 72%CPU (0avgtext+0avgdata 38761520maxresident)k
1900048inputs+23573592outputs (9039major+19870045minor)pagefaults 0swaps

Of note, in each of these other two instances, the duration of the job on the node was much longer than in the earlier test (which took only 33 min), so the longer duration of the latest two tests may have given the TORQUE polling process the opportunity to catch the memory peak (consistent with @coalsont's hypothesis).

I'll ask @hodgem to do a recompile after I add an fprintf to the matlab code, just to be 100% sure that the recompile is hitting the intended code. But at this point, it sure looks like compiled matlab is using memory that interpreted matlab is optimized to not need.

@mharms force-pushed the Reduce-Volume-Memory-Usage- branch from 25f3e1f to ca736b1 on March 16, 2020 15:59
@mharms commented Mar 17, 2020

I have confirmed that the expected output of the fprintf statements that I added is showing up in the log files when using compiled matlab mode for the latest compile. But the memory profile is as before. I'm therefore concluding that the improved (lower) memory profile that @coalsont achieved using interpreted matlab unfortunately does not propagate to the compiled matlab.

We do still have a small reduction (~10% in compiled matlab mode in the 2nd pass VN stage), and of course the more efficient memory usage is clearly beneficial for interpreted matlab. So, I suggest we merge this PR and tag a new release (v4.2) that we can use in the QuNex container for ICAFIX.

@glasserm merged commit 49830a6 into master on Mar 17, 2020
@glasserm deleted the Reduce-Volume-Memory-Usage- branch on March 17, 2020 08:26