-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ambiguous error during "spatial clustering of components" step #181
Comments
If |
Thank you for your quick response. Hmm I installed Tedana using the instructions on the site (“pip install tedana”). Do I need to do something else to get the latest version? I haven’t processed any other subjects. Thanks for your help! |
To update to the newest release, you can do Once you've updated, please re-run the same command, minus the conv argument, and let me know if there's an error. |
Ah, I see. I updated to the newest release, which seems to have resolved this particular issue (although, for what its worth, I still see --conv as an optional input argument when I type "tedana -h"). I ran the following command ... It finishes without error. It appears to have produced all the expected output files. Including, "dn_ts_OC.nii", which I believe is the denoised ("dn"), time-series("ts"), optimally combined ("OC") .nii file that will be the focus of any subsequent analysis. I do, however, get this warning, "FastICA did not converge. Consider increasing tolerance or the maximum number of iterations." Does this warning mean that my results are invalid? Thanks again for your help, |
We're planning to look into the effect of convergence failure on results in one of our analyses, but I don't believe that it should invalidate the results. Whether the model converges will impact what the components look like, but the components' TE-dependence will be evaluated separately, so the metrics used to remove noisy components from the denoised data should remain valid. @emdupre @handwerkerd What do you think? |
Hi Taylor - Just wanted to follow up --- I tried updating tedana again and I am no longer getting the convergence error, at least for the first few runs of my dataset I have finished analyzing so far. I apologize for the confusion. I think either 1) the software was updated since my initial discussion with you or 2) I was not updating the software correctly before. I have noticed, however, that Tedana now takes much longer to run (>3-4 hrs.). Related to the issue of a long run time, I have a question re: the use of tedana for longer / higher resolution scans. I have a somewhat unique dataset, 5 hrs. collected from a single subject (5 sessions total; 1 hr. per session.... and its fairly high spatial/temporal resolution 2.4 isotropic, TR=1.355) ... which I quickly realized is difficult to denoise as a single concatenated dataset (very RAM intensive), even on a cluster. My solution has been to run tedana on 15 mins of data at a time and then concatenate all the denoised outputs for subsequent analysis. My question is whether there is any reason why denoising in chunks and then concatenating like I described above is not a good idea? Best, |
Hi Chuck! Thanks so much for following up. I'm jumping in here since I think (hope) I might be able to answer some of your questions. Regarding the long runtime: this is a bit difficult to diagnose since we're not sure what version of
These are not exhaustive (there have been a lot of changes to Regarding your chunking of the dataset: since the PCA and ICA steps (which are the crux of |
Great - this is really useful information re: the recent updates that were made to tedana. Re: this issue of chunking. I should clarify --- I have 5 hrs. total; 1 hr. per session; with four 15 min. runs per session (so, 20 x 15 min. runs total). So, I am not chopping up a single continuous hr. run into 15 min. chunks - they were discrete runs to begin with. Does this clarification change your opinion? If not, I think I could try to merge the four runs for each session and see if tedana will run without throwing a memory error (I think I could get away with it). So that is one option... but if we think the noise structure could vary from session to session, presumably it could also differ from run to run. Which makes me think now that maybe denoising each run separately and concatenating the resultant files is the more reasonable approach. Definitely interested in hearing the thoughts/opinions of others here though. Maybe it is useful to provide some more context for this issue. If not, feel free to ignore everything below here. In short, I am interested in resting-state functional connectivity / brain network mapping performed in individuals, and characterizing the reliability (e.g., from session to session) of these individual-defined maps. Given the work showing the effectiveness of me-denoising for cleaning resting-state fMRI data vs traditional strategies, I wanted to test if this approach yields more reliable individual-specific brain network maps / functional connectivity relevant measures. To do address this question, I collected the dataset I described above and plan to examine reliability by iteratively and randomly sampling pairs of non-overlapping epochs of me-dn time-series (varying in size) and comparing their similarity. I plan to then repeat this analysis but without the multi-echo denoising as a control condition [by selecting the single echo most similar to the typical single-echo experiment (30ms) and performing a "traditional" head motion / nuisance regression based denoising]. The hypothesis is that me-dn data will yield more reliable maps with shorter scan durations. This would be useful because right now the general thinking in the field is that one needs large quantities of (single-echo) data per subject to do any reliable mapping (~30-45+ mins of data). So, any approach that can yield higher reliability would be important step, especially for any potential clinical application. Anyways - thank you all for your help and great work putting this toolbox together. Any guidance/thoughts re: this issue of chunking is appreciated! Best, |
Understood! The explanation of your data structure does change things a bit. You absolutely should be processing each run separately, so it sounds like what you're doing (that is, processing each 15 minute run individually then concatenating the output) is the right thing to do. Apologies for confusing you with my misunderstanding! There's a lot of discussions happening in |
Hi tedana experts - I wanted to follow up with some observations and questions re: an analysis I performed, which I described earlier in this comment chain. As a brief reminder, I collected 20 x 15 min. runs of multi-echo resting-state data on myself over a period of five days (4 runs per session, 1 session per day). I wanted to test if reliability of functional connectivity measures across time is better when using tedana to denoise vs. a single-echo strategy . My preliminary results indicate that functional connectivity estimates in the multi-echo denoised dataset were indeed more reliable than those produced using a comparable single-echo denoising strategy (ICA-aroma). Such that with only ~10-15mins of multi-echo data one can achieve high reliability only possible with ~45-60 mins. of single-echo denoised data. But, this was only the case when I used the “kundu-stabilize” tedpca option. I am curious to learn more about the origin and intended use of the different -tedpca options. As far as I can tell from skimming the python code, the kundu-stabilize option is recommended for “low-quality” data, so it was a little surprising to see reliability was highest when using this option When I dug into my data a bit more, I noticed that when using the other -tedpca options (not kundu-stabilize), there are often relatively high kappa components (e.g., components 7,8,10-16 in the “comp_table_ica.txt” file attached here) being rejected that I think (when looking at “betas_OC.nii.gz”) clearly resemble a known functional brain network. Often these rejected components are rejected using the "I006" rationale. I suspect that the reliability of the functional connectivity estimates was negatively impacted when this happened. You can imagine, for example, that if some brain networks were inconsistently classified as "accepted" or "rejected" across runs, it was difficult for them to be reproducible across those runs. As one potential solution, I wrote some code that compares each of the rejected ICA components to a set of independently generated brain network templates (the same ones I am ultimately testing the reliability of) …this code tests whether a rejected component is highly similar (spatially) to any of the brain network templates and outputs a list of components that can then be manually accepted in a second running of tedana. You can see in the .txt file I attached there is a column called “networks” , which indicates whether a component was successfully matched to a brain network and if so, which network. So, my specific questions are...
I am happy to share any of this data/code to help answer these questions. Thanks! PS - It is also probably relevant to point out that I had zero convergence errors when using kundu-stabilize, whereas MLE and kundu often yielded convergence errors. This may have contributed to the differences in reliability, though I havent explored this systematically just yet. |
Thanks so much for posting this investigation, @cjl2007 ! Regarding initial components being rejected: I've just started noticing this as well when reviewing #226. .@tsalo was looking into where this might have been introduced with #229. Regarding the To answer your more specific questions:
Regarding convergence errors more generally: We've just merged in #224, which I think (and hope !) will cut these down dramatically. If you want to try your data with the current github master, I would be curious to see the performance ! Let me know if I can provide any tips on getting that copy of the code into your local environment. |
Thanks for your response, this is definitely useful information. Your points re: the use of network templates are well-taken. I agree that the fewer assumptions we can make about our data, including how BOLD-like signals should manifest spatially, the better. I can only speak to the dataset I am working with currently (N=1; 20 x 15 min. runs), but all of the (initially) rejected components that were subsequently matched to a network template were rejected using the mid-kappa ("I006") rationale ... so if this issue you mention where some of the initial (mid-kappa?) components are rejected is resolved, it would likely make using network templates unnecessary. Although, more generally, I have found that matching the components to canonical networks and including this information in the comp_table_ica.txt output file was useful for me personally from a quality control perspective. For example, I have observed that rejected (kappa < rho) components were never matched to a network template, as one would expect/hope. I would definitely like to try out to newest updates to tedana; especially if some of these issues have been resolved. I am a bit new to GitHub though; so I am unsure how to download this version of the code. I have been updating tedana using "pip install tedana -U" in a miniconda environment that I created on a cluster. |
That's great -- yes, "I006" is exactly what we're looking into :) Happy to hear you're interested in trying it out ! There are some general guidelines for installing commits from GitHub, and for us this will look like: pip install git+git://github.com/me-ica/tedana.git@e44058cc8628880dd6b7cbc5983c74431d548c11 That will automatically install the Then to re-install the current release, you'd run pip install -I tedana==0.0.6 Let me know how that works for you ! |
Great - that was easy, I am rerunning some of the data now and will compare to the output when using 0.0.6. To clarify though, this update does not address the midK/I006 issues, but may help address convergence failure issue (via the added options for max iterations and restarts), correct? |
Yes, exactly. It might (in part) address some midK issues with the new masking procedure, but it's only designed to address convergence issues at present ! |
hi @emdupre - Just wanted to follow up. I am seeing greater likelihoods of convergence after specifying high number of iterations and restarts; ~5k iterations and 10 restarts). I am still a bit concerned, however, about the component classifications, especially with respect to components being rejected due to this mid-Kappa rationale... in my data, these components often contain signal of interest. I have continued to refine a procedure that I run after tedana where spatial templates [both of signal (networks), but also candidate artifacts] and the temporal characteristics (frequency content, relationship with head position traces) of the components are used to automatically fine-tune tedana's initial component classifications (and then I re-run tedana with these manually specified classifications) ...which has been an imrpovement in my hands. For example, this component below which was initially accepted, but is clearly CSF and is rejected for this reason. What I am most concerned about, however, is that "mid-kappa" rejections are very frequent and potentially removing a lot of signal of interest. See below. This is a visual summary of the denoising in a represenative run, and the light green/red dots in the left most panel are components that were manually accepted / rejected, respectively, after fine-tuning. There are about ~10 components that were initially rejected due to this mid-Kappa rationale, which upon visual inspection clearly (I think) represent signal of interest (see component 0 below). This is the premot/somatomotor network. I am assuming that a mid-Kappa rejection means that a components' kappa value falls close to the middle of a distribution of kappa values. I don't understand then how a component with the highest kappa value (e.g., Comp0) can be rejected! By definition, the component with the greatest kappa value is not mid-kappa, right? So, I know from our previous conversation that mid-kappa is on your radar, but I wonder if you could answer the following in the meantime....
Thank you for your time and help! I am python illiterate, so I haven't tried looking into the code myself just yet, maybe I'll take this opportunity to learn (and move away from Matlab finally). Best, |
Hi @cjl2007 -- Thank you so much for sharing this ! Unfortunately I don't have time today to respond in depth, but I just wanted to flag that we have several ongoing conversations on the component selection procedure (#153 is probably the best place to continue those). Essentially, the current selection procedure is directly taken from v2.5 of the ME-ICA algorithm, which was known to have issues with several data types. We're actively working to update this and assess its robustness, and I think it'd be really valuable to have your perspective ! Just to keep our issues focused, I'd love to continue this conversation in #153, bringing in @handwerkerd @javiergcas and @dowdlelt, if that's ok with you 😸 Thanks again ! |
@cjl2007 I agree with @emdupre that what you've done is immensely helpful and that we should continue this discussion in #153, but figured that I should answer your specific questions here.
The components are definitely sorted by Kappa, but mid-Kappa is a bit of a misnomer. It's unfortunately not as simple as identifying components with mid-range Kappa values. I've been elbow-deep in the component selection code for a little while now and I'm still trying to figure out the rationale behind many different steps.
I don't know the conceptual history behind the mid-Kappa rejection criterion, but the mid-Kappa step is supposed to identify components that score high on a combined metric that should index noise in the R2-model and explain a relatively large amount of variance. My guess is that these components should reflect noise that is not being properly identified by the S0 model, but which are also significant enough that they should be rejected instead of ignored. That said, it seems to be a very problematic step. |
I'll add that mid-Kappa is a poor label. the basic way the original decision tree worked is that some components were rejected purely on their kappa & rho values. Then there are a bunch of other factors that are used to reject other components based partially on kappa, but also on other factors. For example, whenever I see a component 0 (highest kappa) component rejected, it was almost always because of that component has relatively high variance (see discussion in #176 ). I've particularly had trouble with the variance-based criteria, which is why I've removed them when I've denoised in the past. Hopefully, as this code gets improved, it will be more obvious why components are rejected, and easier to adjust the decision tree. |
Thanks for the information! Yes, just from qualitatively looking at the component time-courses, I can see that the variability is a bit higher for the mid-kappa rejected components. |
Hello - I am hoping for a little feedback or guidance re: an error I am encountering. To give you some context, I am a new user of tedana / me epi data (<24hrs).
I was initially dealing with a "failure to converge" error -- but after reading through various support forums, I seem to have solved this issue by skull stripping my data first and setting the value of "--conv" to 5000.
Now I am seeing another error, however, later in the processing stream. I have pasted it below at the very bottom of this comment. To give you some more context, I have realigned and skull-stripped the epi data (using FSL command line tools, not AFNI, in case that that matters).
My command looks like this ...
"tedana -d rest_e1.nii.gz rest_e2.nii.gz rest_e3.nii.gz rest_e4.nii.gz rest_e5.nii.gz -e 13.40 31.11 48.82 66.53 84.24 --conv 5000"
Not sure what other information I can provide to help us diagnosis this problem. I am happy to upload my data if that would be helpful. More generally, I understand that this software is a work in progress and I appreciate everyone's time and hard work making this package available in the first place!
Best,
Chuck
INFO:tedana.model.fit:Performing spatial clustering of components
Traceback (most recent call last):
File "/Users/charleslynch/miniconda3/envs/me/bin/tedana", line 11, in
sys.exit(_main())
File "/Users/charleslynch/miniconda3/envs/me/lib/python3.7/site-packages/tedana/workflows/tedana.py", line 391, in _main
tedana_workflow(**vars(options))
File "/Users/charleslynch/miniconda3/envs/me/lib/python3.7/site-packages/tedana/workflows/tedana.py", line 346, in tedana_workflow
n_echos)
File "/Users/charleslynch/miniconda3/envs/me/lib/python3.7/site-packages/tedana/selection/select_comps.py", line 210, in selcomps
rho_elbow = np.mean((getelbow(comptable.loc[ncls, 'rho'], return_val=True),
File "/Users/charleslynch/miniconda3/envs/me/lib/python3.7/site-packages/tedana/selection/_utils.py", line 113, in getelbow
p = coords - coords[:, 0].reshape(2, 1)
IndexError: index 0 is out of bounds for axis 1 with size 0
The text was updated successfully, but these errors were encountered: