Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operate on data as imgs and lists of imgs throughout package #251

Closed
tsalo opened this issue Apr 5, 2019 · 8 comments
Closed

Operate on data as imgs and lists of imgs throughout package #251

tsalo opened this issue Apr 5, 2019 · 8 comments
Labels
discussion issues that still need to be discussed refactoring issues proposing/requesting changes to the code which do not impact behavior

Comments

@tsalo
Copy link
Member

tsalo commented Apr 5, 2019

Summary

We currently use confusing masking and unmasking methods to keep our data in arrays throughout the workflow. Most functions working on data don't support img-like objects. In my opinion, this makes it harder for users (and even devs) to work with individual functions outside the context of the overall workflow. I propose that we shift to operating on imgs and lists of imgs throughout the package.

Additional Detail

I originally started working on this in a PR (#70), but this was closed without merging in part because it became overcomplicated. The problem was that I was coercing our data into five-dimensional (X x Y x Z x E x T) Nifti1Images, but unfortunately 5D images are not supported for anything beyond their initial creation. If we instead choose to operate on lists of 4D images, we'll be able to use standard masking/unmasking functions throughout the package, instead of having to write our own.

The benefits to this are that it will make it easier for users to interact with specific functions within tedana without a deep understanding of package-specific masking, loading, and file writing functions.

The only con I can see is that this will make some steps take slightly longer, but I don't foresee it having an appreciable impact on speed.

@tsalo tsalo added discussion issues that still need to be discussed refactoring issues proposing/requesting changes to the code which do not impact behavior labels Apr 5, 2019
@rmarkello
Copy link
Member

Just wanted to chime in that I think this is a good idea! This was more-or-less how the package originally operated and we removed it because we were thinking of accepting surface-projected data for a period of time. Since that is off the table at this point (largely due to the difficulties in detecting spatial artifacts) I would definitely 👍 switching back to using img-like objects. Sorry for the difficulty of having to re-refactor!

As for your con: I agree that things shouldn't take too much longer -- once the data are loaded once using nibabel they'll be cached in memory so subsequent loads will be approximately as quick as querying a numpy array.

@tsalo
Copy link
Member Author

tsalo commented Apr 6, 2019

Awesome! This is going to be fairly straightforward, but will involve a pretty thorough refactor, so I'm going to hold off on trying to do it until some of the other extensive PRs (e.g., #247 and #152) are handled. But it's definitely on my to-do list.

@jbteves
Copy link
Collaborator

jbteves commented May 2, 2019

@rmarkello @emdupre do you happen to know if this is contained to a certain commit or even group of commits that could be reverted? Even if there are many merge conflicts it will help us get a good handle on which places need to be changed to revert the change.

@jbteves
Copy link
Collaborator

jbteves commented Jun 1, 2019

@tsalo you had mentioned in the video conference that this was involved. Is there anything I can do to help?

@tsalo
Copy link
Member Author

tsalo commented Jun 1, 2019

I think it's just a matter of putting the time in. We can come up with a standard way of handling it and can split up the functions if you want.

Here's my basic summary of how I think it should work:

  • All public functions should operate on imgs and/or lists of imgs. I'm not sure if we should also support numpy arrays or if that would make things too complicated.
  • I think it's reasonable for private functions to still work on arrays. This refactor is mostly about ease of use, so private functions seem like they should work in the most efficient- not useable- way possible.
  • Multi-echo data can be stored as lists of imgs.
  • Adaptive masks can be stored as lists of imgs, unless we think it's necessary to write a specific function for applying adaptive masks to lists of imgs.
  • We'll need to do something substantial with the eimasking used in TEDPCA. I think that we can build a new adaptive mask matching the same procedure and then can stack the echo-specific masked data. I'm not sure if that requires a separate function or if it can just be done in the tedpca function.
  • We can drop io.filewrite and the ref_img variable.
  • Pretty much every function will require a mask argument (although most already have it, so this isn't that big a deal).

@jbteves
Copy link
Collaborator

jbteves commented Jun 3, 2019

Okay. That all looks reasonable. I would think the simplest way to handle it is for me to add your repository as a remote and open PRs to your repository so we can hash out differences there, then you can open the PR when we're completely done. Thoughts?

@tsalo
Copy link
Member Author

tsalo commented Oct 4, 2019

Recently we've pivoted to the idea of operating on files, rather than arrays or even image objects, per #394 and our monthly tedana developers call. As such, I'm comfortable closing this. Does that sound good to everyone?

@jbteves
Copy link
Collaborator

jbteves commented Oct 4, 2019

@tsalo sounds good to me. One more down ; - )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion issues that still need to be discussed refactoring issues proposing/requesting changes to the code which do not impact behavior
Projects
None yet
Development

No branches or pull requests

3 participants