Skip to content

Commit

Permalink
Merge branch 'master' into normalizeBeforeMAPCA
Browse files Browse the repository at this point in the history
  • Loading branch information
notZaki committed Jan 25, 2021
2 parents 9b72780 + 82bda7c commit c60d57e
Show file tree
Hide file tree
Showing 8 changed files with 115 additions and 31 deletions.
21 changes: 17 additions & 4 deletions docs/approach.rst
Original file line number Diff line number Diff line change
Expand Up @@ -201,9 +201,9 @@ Here we can see time series for some example components (we don't really care ab

These components are subjected to component selection, the specifics of which
vary according to algorithm.
Specifically, ``tedana`` offers two different approaches that perform this step.
Specifically, ``tedana`` offers three different approaches that perform this step.

The simplest approach (the default ``mdl``, ``aic`` and ``kic`` options for
The recommended approach (the default ``mdl`` option, along with the ``aic`` and ``kic`` options, for
``--tedpca``) is based on a moving average (stationary Gaussian) process
proposed by `Li et al (2007)`_ and used primarily in the Group ICA of fMRI Toolbox (GIFT).
A moving average process is the output of a linear system (which, in this case, is
Expand All @@ -229,8 +229,21 @@ to select the PCA components based on three widely-used model selection criteria
option ``mdl`` might not yield perfect results on your data. We suggest you explore the ``kic``
and ``aic`` options if running ``tedana`` with ``mdl`` returns less components than expected.

In addition to the moving average process-based options described above, we also support a
decision tree-based selection method (similar to the one in the :ref:`TEDICA` section below).
The simplest approach uses a user-supplied threshold applied to the cumulative variance explained
by the PCA.
In this approach, the user provides a value to ``--tedpca`` between 0 and 1.
That value corresponds to the percent of variance that must be explained by the components.
For example, if a value of 0.9 is provided, then PCA components
(ordered by decreasing variance explained)
cumulatively explaining up to 90% of the variance will be retained.
Components explaining more than that threshold
(except for the component that crosses the threshold)
will be excluded.

In addition to the moving average process-based options and the variance explained threshold
described above,
we also support a decision tree-based selection method
(similar to the one in the :ref:`TEDICA` section below).
This method involves applying a decision tree to identify and discard PCA components which,
in addition to not explaining much variance, are also not significantly TE-dependent (i.e.,
have low Kappa) or TE-independent (i.e., have low Rho).
Expand Down
9 changes: 9 additions & 0 deletions tedana/decay.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,15 @@ def _apply_t2s_floor(t2s, echo_times):
eps = np.finfo(dtype=t2s.dtype).eps # smallest value for datatype
temp_arr = np.exp(-echo_times / t2s) # (E x V) array
bad_voxel_idx = np.any(temp_arr == 0, axis=0) & (t2s != 0)
n_bad_voxels = np.sum(bad_voxel_idx)
if n_bad_voxels > 0:
n_voxels = temp_arr.size
floor_percent = 100 * n_bad_voxels / n_voxels
LGR.debug(
"T2* values for {0}/{1} voxels ({2:.2f}%) have been "
"identified as close to zero and have been "
"adjusted".format(n_bad_voxels, n_voxels, floor_percent)
)
t2s_corrected[bad_voxel_idx] = np.min(-echo_times) / np.log(eps)
return t2s_corrected

Expand Down
30 changes: 24 additions & 6 deletions tedana/decomposition/pca.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
"""
import logging
import os.path as op
from numbers import Number

import numpy as np
import pandas as pd
Expand Down Expand Up @@ -73,11 +74,14 @@ def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
Reference image to dictate how outputs are saved to disk
tes : :obj:`list`
List of echo times associated with `data_cat`, in milliseconds
algorithm : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic'}, optional
Method with which to select components in TEDPCA. Default is 'mdl'. PCA
algorithm : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic', float}, optional
Method with which to select components in TEDPCA. PCA
decomposition with the mdl, kic and aic options are based on a Moving Average
(stationary Gaussian) process and are ordered from most to least aggresive.
See (Li et al., 2007).
(stationary Gaussian) process and are ordered from most to least aggressive
(see Li et al., 2007).
If a float is provided, then it is assumed to represent percentage of variance
explained (0-1) to retain from PCA.
Default is 'mdl'.
kdaw : :obj:`float`, optional
Dimensionality augmentation weight for Kappa calculations. Must be a
non-negative float, or -1 (a special value). Default is 10.
Expand All @@ -90,6 +94,7 @@ def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
Whether to output files from fitmodels_direct or not. Default: False
low_mem : :obj:`bool`, optional
Whether to use incremental PCA (for low-memory systems) or not.
This is only compatible with the "kundu" or "kundu-stabilize" algorithms.
Default: False
Returns
Expand Down Expand Up @@ -166,6 +171,10 @@ def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
"connectivity mapping using multiecho fMRI. Proceedings "
"of the National Academy of Sciences, 110(40), "
"16187-16192.")
elif isinstance(algorithm, Number):
alg_str = (
"in which the number of components was determined based on a "
"variance explained threshold")
else:
alg_str = ("based on the PCA component estimation with a Moving Average"
"(stationary Gaussian) process (Li et al., 2007)")
Expand All @@ -191,6 +200,14 @@ def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
mask_img = io.new_nii_like(ref_img, mask.astype(int))
voxel_comp_weights, varex, varex_norm, comp_ts = ma_pca.ma_pca(
data_img, mask_img, algorithm)
elif isinstance(algorithm, Number):
ppca = PCA(copy=False, n_components=algorithm, svd_solver="full")
ppca.fit(data_z)
comp_ts = ppca.components_.T
varex = ppca.explained_variance_
voxel_comp_weights = np.dot(np.dot(data_z, comp_ts),
np.diag(1. / varex))
varex_norm = varex / varex.sum()
elif low_mem:
voxel_comp_weights, varex, comp_ts = low_mem_pca(data_z)
varex_norm = varex / varex.sum()
Expand Down Expand Up @@ -227,9 +244,10 @@ def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw, stabilize=False)
elif algorithm == 'kundu-stabilize':
comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw, stabilize=True)
elif algorithm in ['mdl', 'aic', 'kic']:
else:
alg_str = "variance explained-based" if isinstance(algorithm, Number) else algorithm
LGR.info('Selected {0} components with {1} dimensionality '
'detection'.format(comptable.shape[0], algorithm))
'detection'.format(comptable.shape[0], alg_str))
comptable['classification'] = 'accepted'
comptable['rationale'] = ''

Expand Down
14 changes: 0 additions & 14 deletions tedana/tests/data/nih_five_echo_outputs_verbose.txt
Original file line number Diff line number Diff line change
Expand Up @@ -96,17 +96,3 @@ figures/comp_048.png
figures/comp_049.png
figures/comp_050.png
figures/comp_051.png
figures/comp_052.png
figures/comp_053.png
figures/comp_054.png
figures/comp_055.png
figures/comp_056.png
figures/comp_057.png
figures/comp_058.png
figures/comp_059.png
figures/comp_060.png
figures/comp_061.png
figures/comp_062.png
figures/comp_063.png
figures/comp_064.png
figures/comp_065.png
2 changes: 1 addition & 1 deletion tedana/tests/test_integration.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ def test_integration_five_echo(skip_integration):
data=datalist,
tes=echo_times,
out_dir=out_dir,
tedpca='aic',
tedpca=0.95,
fittype='curvefit',
tedort=True,
verbose=True)
Expand Down
23 changes: 23 additions & 0 deletions tedana/tests/test_workflows_parser_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
"""Test workflow parser utility functions."""
import argparse
import pytest

from tedana.workflows.parser_utils import check_tedpca_value


def test_check_tedpca_value():
"""Test the check_tedpca_value function."""
with pytest.raises(argparse.ArgumentTypeError):
check_tedpca_value("hello", is_parser=True)

with pytest.raises(ValueError):
check_tedpca_value("hello", is_parser=False)

with pytest.raises(argparse.ArgumentTypeError):
check_tedpca_value(1.5, is_parser=True)

with pytest.raises(ValueError):
check_tedpca_value(1.5, is_parser=False)

assert check_tedpca_value(0.95) == 0.95
assert check_tedpca_value("mdl") == "mdl"
26 changes: 26 additions & 0 deletions tedana/workflows/parser_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,32 @@
"""
import os.path as op
import logging
from numbers import Number

import argparse


def check_tedpca_value(string, is_parser=True):
"""Check if argument is a float in range 0-1 or one of a list of strings."""
valid_options = ("mdl", "aic", "kic", "kundu", "kundu-stabilize")
msg = None
if string not in valid_options:
if not isinstance(string, Number):
msg = "Argument must be a float or one of: {}".format(
", ".join(valid_options)
)
elif not (0 <= float(string) <= 1):
msg = "Argument must be between 0 and 1."
else:
string = float(string)

if msg:
if is_parser:
raise argparse.ArgumentTypeError(msg)
else:
raise ValueError(msg)

return string


def is_valid_file(parser, arg):
Expand Down
21 changes: 15 additions & 6 deletions tedana/workflows/tedana.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
reporting, selection, utils)
import tedana.gscontrol as gsc
from tedana.stats import computefeats2
from tedana.workflows.parser_utils import is_valid_file, ContextFilter
from tedana.workflows.parser_utils import is_valid_file, check_tedpca_value, ContextFilter

LGR = logging.getLogger(__name__)
RepLGR = logging.getLogger('REPORT')
Expand Down Expand Up @@ -98,12 +98,15 @@ def _get_parser():
default='t2s')
optional.add_argument('--tedpca',
dest='tedpca',
type=check_tedpca_value,
help=('Method with which to select components in TEDPCA. '
'PCA decomposition with the mdl, kic and aic options '
'is based on a Moving Average (stationary Gaussian) '
'process and are ordered from most to least aggresive. '
'Default=\'mdl\'.'),
choices=['kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic'],
'process and are ordered from most to least aggressive. '
"Users may also provide a float from 0 to 1, "
"in which case components will be selected based on the "
"cumulative variance explained. "
"Default='mdl'."),
default='mdl')
optional.add_argument('--seed',
dest='fixed_seed',
Expand Down Expand Up @@ -262,8 +265,11 @@ def tedana_workflow(data, tes, out_dir='.', mask=None,
Default is 'loglin'.
combmode : {'t2s'}, optional
Combination scheme for TEs: 't2s' (Posse 1999, default).
tedpca : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic'}, optional
Method with which to select components in TEDPCA. Default is 'mdl'.
tedpca : {'mdl', 'aic', 'kic', 'kundu', 'kundu-stabilize', float}, optional
Method with which to select components in TEDPCA.
If a float is provided, then it is assumed to represent percentage of variance
explained (0-1) to retain from PCA.
Default is 'mdl'.
tedort : :obj:`bool`, optional
Orthogonalize rejected components w.r.t. accepted ones prior to
denoising. Default is False.
Expand Down Expand Up @@ -387,6 +393,9 @@ def tedana_workflow(data, tes, out_dir='.', mask=None,
if not isinstance(gscontrol, list):
gscontrol = [gscontrol]

# Check value of tedpca *if* it is a float
tedpca = check_tedpca_value(tedpca, is_parser=False)

LGR.info('Loading input data: {}'.format([f for f in data]))
catd, ref_img = io.load_data(data, n_echos=n_echos)
n_samp, n_echos, n_vols = catd.shape
Expand Down

0 comments on commit c60d57e

Please sign in to comment.