KK discrepancies #579

ntustison · 2024-03-16T02:00:11Z

Okay, just a quick update on this issue. It appears that there's something about running KK through the ANTsPy or ANTsR interface which causes a "failure", i.e. instances where the thickness value(s) blow up beyond what is expected. Although KK is deterministic, the problem occurs completely at random, occurs with a single thread or multiple threads, and the deviation from the correct or expected thickness image occurs in the first couple of optimization iterations. Also, as a check, it does not occur when running it via the command line.

Below are minimal examples comparing the three scripts running KK on the same data in a loop of 10 iterations: shell, python, and R. On my machine, the shell script produces 10 images which are exactly the same and have the expected values. The R and Python scripts produce the expected image anywhere from 50% to 80% of the time (very loose estimate).

I'll be diving deeper and seeing which exact internal image variable deviates but just wanted to provide an update. Also, any thoughts are welcome on why running KK through R and Python causes this.

Shell

#! /usr/bin/sh

export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=8

inDirectory=./

seg=${inDirectory}/seg.nii.gz
gm=${inDirectory}/gm.nii.gz
wm=${inDirectory}/wm.nii.gz

iter=3

for i in {0..9};
  do 
    echo "KKsh $i"
    kk=${inDirectory}/sh_kk_${i}.nii.gz
    KellyKapowski -d 3 \
                  -s $seg \
                  -g $gm \
                  -w $wm \
                  -v 1 \
                  -m 1.5 \
                  -r 0.0025 \
                  -t 10 \
                  -c [${iter}] \
                  -x 0 \
                  -o $kk 
  done

Python

import ants
import os

os.environ['ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS'] = '8'

in_directory = "./"

seg = ants.image_read(in_directory + "seg.nii.gz")
gm = ants.image_read(in_directory + "gm.nii.gz")
wm = ants.image_read(in_directory + "wm.nii.gz")

iter=3

for i in range(10):
    print("KKpy " + str(i))
    kk = ants.kelly_kapowski(s=seg, g=gm, w=wm,
                            its=iter, r=0.0025, m=1.5, x=0, t=10,
                            verbose=1)
    ants.image_write(kk, in_directory + "py_kk_" + str(i) + ".nii.gz")

R

library( ANTsR )

Sys.setenv(ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS = "8")

inDirectory <- "./"

seg <- antsImageRead( paste0( inDirectory, "seg.nii.gz" ) )
gm <- antsImageRead( paste0( inDirectory, "gm.nii.gz" ) )
wm <- antsImageRead( paste0( inDirectory, "wm.nii.gz" ) )

iter <- 3

for( i in seq.int(9) )
  {
  kk <- kellyKapowski( s = seg, g = gm, w = wm, its = iter, 
                       r = 0.0025, m = 1.5, x = FALSE, t = 10, 
                       verbose = TRUE )
  antsImageWrite( kk, paste0( inDirectory, "r_kk_", i, ".nii.gz") )   
  }

gm.nii.gz
seg.nii.gz
wm.nii.gz

ncullen93 · 2024-03-16T11:43:43Z

Other people have definitely posted issues with things being different or memory issues in python or R... It must be related to the wrappers, right? Perhaps the C++ versions need to be updated?

cookpa · 2024-03-16T14:16:29Z

As the OP in #552 tried to warn us, it is some kind of memory leak in ANTsPy and presumably R too.

I modified the script slightly

import sys
import os

os.environ['ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS'] = '4'

import ants

in_directory = "./"

seg = ants.image_read(in_directory + "seg.nii.gz")
gm = ants.image_read(in_directory + "gm.nii.gz")
wm = ants.image_read(in_directory + "wm.nii.gz")

iter=3

label=sys.argv[1]

print("KKpy " + label)
kk = ants.kelly_kapowski(s=seg, g=gm, w=wm,
                        its=iter, r=0.0025, m=1.5, x=0, t=10,
                        verbose=1)
ants.image_write(kk, in_directory + "results/py_kk_" + label + ".nii.gz")

This gives identical results if I run a bash for loop calling this multiple times. Iif I run a for loop inside python, it varies some of the time.

ntustison · 2024-03-16T15:59:09Z

Thanks @cookpa and @ncullen93 for looking at this. I've narrowed it down a bit in the actual ITK filter but am still completely perplexed as to why this is happening.

@cookpa, I pulled out the for loop from my original Python script and it looks as though the problem might persist. I'm double checking that right now.

cookpa · 2024-03-16T21:17:03Z

You're right. It happened 0 out of 10 times last night, but today I tried it again and it failed 1 out of 50 times. The 49 other runs were identical to KellyKapowski

ntustison · 2024-03-17T00:01:01Z

Thanks @cookpa for the confirmation.

To add to my confusion, I'm currently running the original Python script where the for loop is internal to the Python script on my GPU linux box. So far I'm 40 iterations in and it hasn't failed once.

cookpa · 2024-03-17T17:47:33Z

Bizarre. I haven't run on Linux but on my Mac, running with a for loop in Python produces different results nearly every run. Without a for loop the failure rate is approximately 1%, it's otherwise consistent with the CLI. I don't get it at all

cookpa · 2024-03-17T17:48:25Z

I tried re-reading the input images at every iteration, but it doesn't help. So I don't think the input images are changing during the for loop.

ntustison · 2024-03-17T19:25:13Z

Yeah, I initially tried that but ultimately put image reading outside the loop to rule out that as a possibility. I also ran the original Python script on the university linux cluster and it reproduced the expected result every time. Within the DiReCT filter I've isolated it to which variable is deviating from the expected value on my Mac but I haven't figured out why yet.

stnava · 2024-03-17T20:18:52Z

I also dont know what's happening and have seen this issue in my large-scale studies that are run on a linux cluster ... but my guess is that the issue comes down to something to do with (reliability of) precision for relatively small float values. if some of the very small values are shrunk ( lost in some sense ) then the algorithm would lead to overly large thickness estimates. I wonder if the "count" image ( or any other image used in the denominator ) should be stored as double precision.

ntustison · 2024-03-17T20:26:10Z

@stnava , In this case I should've mentioned that my initial run on the UVa cluster showed some variation in output. After updating with @ncullen93 's updating of the tags, I was able to get the expected repeatability in output.

In reference to my comment to @cookpa , the two main image variables in the filter in constructing the thickness image, hitImage and totalImage, the hitImage is consistent through the different runs. It's the totalImage that seems to be the immediate cause of variation in output. I'll try using double precision for that image and see if that fixes the issue.

stnava · 2024-03-17T20:31:27Z

@ntustison thanks - good to know. re hit vs total I would have guessed the opposite effect. I wonder if any insight could be gained from writing out every state of any image involved in the total ... if all else fails.

ntustison · 2024-03-17T20:34:26Z

Yeah, will do after I investigate the effect of changing RealType = float to RealType = double in KellyKapowski.cxx. It gets a little messy with the integration step but it would definitely narrow it down.

ntustison · 2024-03-18T03:52:18Z

Okay, I'm pretty sure this was the issue.

These are always bittersweet---glad to have figured it out but it's so damn obvious in hindsight.

cookpa · 2024-03-18T17:17:05Z

Re-opening, we will fix by updating ANTs version to incorporate updates from @ntustison

cookpa mentioned this issue Mar 16, 2024

Descrepancy in performance of AntsPy Behaviour when running in a loop for a long time. #552

Open

ntustison mentioned this issue Mar 18, 2024

BUG: Need to fill buffer for R and Python interfaces. ANTsX/ANTs#1699

Merged

ntustison closed this as completed Mar 18, 2024

cookpa mentioned this issue Mar 18, 2024

To do: Review library code and remove any uninitialized memory access #583

Closed

cookpa added the bug label Mar 18, 2024

cookpa reopened this Mar 18, 2024

cookpa added this to the Find and fix uninitialized memory access milestone Mar 18, 2024

cookpa self-assigned this Mar 21, 2024

cookpa linked a pull request Mar 22, 2024 that will close this issue

ENH: Bump ITK and ANTs, initialize all memory allocated for images #590

Merged

cookpa closed this as completed in #590 Mar 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KK discrepancies #579

KK discrepancies #579

ntustison commented Mar 16, 2024

ncullen93 commented Mar 16, 2024

cookpa commented Mar 16, 2024 •

edited

Loading

ntustison commented Mar 16, 2024

cookpa commented Mar 16, 2024

ntustison commented Mar 17, 2024

cookpa commented Mar 17, 2024

cookpa commented Mar 17, 2024

ntustison commented Mar 17, 2024

stnava commented Mar 17, 2024

ntustison commented Mar 17, 2024

stnava commented Mar 17, 2024

ntustison commented Mar 17, 2024

ntustison commented Mar 18, 2024

cookpa commented Mar 18, 2024

KK discrepancies #579

KK discrepancies #579

Comments

ntustison commented Mar 16, 2024

Shell

Python

R

ncullen93 commented Mar 16, 2024

cookpa commented Mar 16, 2024 • edited Loading

ntustison commented Mar 16, 2024

cookpa commented Mar 16, 2024

ntustison commented Mar 17, 2024

cookpa commented Mar 17, 2024

cookpa commented Mar 17, 2024

ntustison commented Mar 17, 2024

stnava commented Mar 17, 2024

ntustison commented Mar 17, 2024

stnava commented Mar 17, 2024

ntustison commented Mar 17, 2024

ntustison commented Mar 18, 2024

cookpa commented Mar 18, 2024

cookpa commented Mar 16, 2024 •

edited

Loading