[ENH] Adding STC classes that support frequency and trial dimensions #10721
Support for a frequency dimension within STCs is already on the roadmap: https://mne.tools/dev/overview/roadmap.html#time-frequency-classes and work is underway on that goal in #10184 (so far only for sensor-space containers though)... but that doesn't cover including a dimension for trials.
Thanks for connecting that document and PR, @drammock! What do you think about the idea that it might be assembled more modularly than a …
I guess my gut reaction is that if the data is too large to generate the spectrum/spectrogram in memory, then users should consider either (1) a lower-density source space, or (2) fewer timepoints/frequency bins (AKA downsample/decimate). If neither of those is possible, maybe the pipeline shouldn't be running on a laptop? I know it sounds a bit harsh, but some datasets/analyses are big enough to start to look like "big data", and I'm uncertain how far we should go to accommodate running such jobs on under-resourced hardware. The more magic we do under the hood, the harder that code becomes to maintain.

(Side note: the FFT and an MNE or dSPM inverse are all linear operations, so you can do the FFT on sensors before applying the inverse and get the same answer as doing the inverse first and then the FFT, but FFT-first is much more efficient because in typical cases you're doing the FFT on ~300 channels instead of ~20k vertices.)

As far as across trials, we have some support for a …
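The side note's linearity claim is easy to check numerically. A toy sketch with random matrices (`K` is a stand-in for a linear inverse operator; names and sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_chan, n_vert, n_times = 5, 20, 64         # toy sizes (real: ~300 and ~20k)
K = rng.standard_normal((n_vert, n_chan))   # stand-in for an MNE/dSPM inverse
X = rng.standard_normal((n_chan, n_times))  # sensor-space data

# FFT first (cheap: operates on n_chan rows), then project to source space
src_a = K @ np.fft.rfft(X, axis=-1)
# Project first (expensive at real sizes: n_vert rows), then FFT
src_b = np.fft.rfft(K @ X, axis=-1)

# Both operations are linear and act on different axes, so the orders agree
assert np.allclose(src_a, src_b)
```

The matrix multiply acts on the channel axis and the FFT on the time axis, so the two commute exactly; the only difference is the cost of the FFT step.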
I disagree; you could have use cases where you want time-frequency decomposed data over 72 hours of continuous recording sampled at 1 kHz for 100+ channels, and it just isn't practically feasible yet to have that kind of data in RAM even if you have a lot of money for hardware. In practice, when working with large datasets, I've found it much more efficient to loop over something like channels, where each one is independent or the dependence can be handled in one step (such as solving the leadfield).

As a very practical example, say you have a typical 2 GB file: a TFR with 100 frequencies and trial resolution is 200 GB of RAM, which is just impractical relative to looping over channels and keeping max RAM usage around 2 or 4 GB, with the tradeoff of having to read/write to disk more. In practice, you're often going to decimate to keep less than 200 GB on disk, but you could also use the results right away rather than storing them to disk, which is practical when the analysis can be done for each channel independently.
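The back-of-envelope arithmetic above can be written out explicitly (a sketch that ignores dtype changes and matches the numbers in the comment):

```python
raw_gb = 2.0    # size of the example recording (GB)
n_freqs = 100   # frequency bins in the TFR

# Each frequency bin adds a full copy of the time series, so a
# full-resolution TFR multiplies the footprint by n_freqs
tfr_gb = raw_gb * n_freqs
print(tfr_gb)  # → 200.0

# Looping over channels instead bounds peak RAM near the per-channel cost
n_channels = 100
per_channel_gb = tfr_gb / n_channels
print(per_channel_gb)  # → 2.0
```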
Let me chime in as well with a few assorted thoughts.

Background: we touched upon this subject coming from the visualization side (e.g., visualizing space, frequency, and time interactively for source projections), and then extended our musings towards other use cases (e.g., trials).

Regarding visualization, an array of data would make more sense than a list/generator. One possibility could thus be to assemble that array immediately before or during the call to the viz GUI from the list/generator. This would leave any methods that return such objects alone (e.g. …). However, so we thought, it could also make sense to change the class such methods return: from my experience, the lists are not great to work with, need more memory than necessary (due to redundancy), and usually have to be manually transferred into an array for further usage (e.g., decoding) anyway.

Now, I tend to agree with @drammock that (at least for now) we could assume that the user has enough memory to do such operations. Alternatively, we could leave the methods that work across trials alone and let them return a generator (or have the option to return a generator), which would fix part of this issue?

From the (GSoC) visualization perspective, we should probably just focus on making sure that we use future-proof input, and I think @drammock can give us useful hints for that, so that we can also make sure that sensor-space and source-space objects behave similarly.
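The "lists usually have to be manually transferred into an array" point can be illustrated with a minimal sketch (the per-trial arrays here are hypothetical stand-ins for per-trial STC data):

```python
import numpy as np

# Hypothetical: a method returns one (n_vertices, n_times) array per trial
rng = np.random.default_rng(0)
trial_data = [rng.standard_normal((10, 20)) for _ in range(3)]

# Stacking gives the (n_trials, n_vertices, n_times) array that decoding
# pipelines expect, at the cost of holding everything in memory at once
X = np.stack(trial_data)
print(X.shape)  # → (3, 10, 20)
```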
all good points. For the time-frequency visualization use case, there is no question in my mind that a new class is needed (it's already on the roadmap). But I'm -0.5 on also trying to tackle the across-trials interactive visualization problem at the same time. |
@drammock Yes, I see your concerns, and I think it'd be great to focus the effort as much as possible. It would still be great to write the visualization already anticipating the coming changes to the input (i.e. …). @alexrockhill maybe the version where we have a wrapper for the current state …
Yes, definitely. I haven't finalized a source-space TFR object's methods/properties yet, but based on existing sensor-space TFR objects you can expect …
I thought about what you said more here, @drammock, and perhaps you are right that this is a more reasonable place to start, given that we show in the examples and tutorials how to concatenate over time. Whether you want to loop over channels or over chunks of data in time, I think both should be supported, as I can see use cases for either: channels might suit something like sEEG, where stats might be contact-level because it's difficult to compare across groups with different montages, whereas an MEG/EEG analysis would probably use the same sensor set/montage, and with lots of data the time chunks would be more appropriate. Ideally, both ways would be reasonable for making a source space; I think we're a ways away, but maybe we can get there this summer. Thanks for the feedback, I'll try to get started on this implementation early next week once #10777 and a followup to add the basic GUI merge.
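As a rough illustration of the channel-looping strategy (not MNE-Python's actual API; the Morlet-wavelet construction is a simplified stand-in), a generator keeps only one channel's TFR in memory at a time:

```python
import numpy as np

def per_channel_tfr(data, sfreq, freqs, n_cycles=7.0):
    """Yield a Morlet TFR one channel at a time (toy sketch).

    Only one (n_freqs, n_times) array is held in RAM at once, instead of
    the full (n_channels, n_freqs, n_times) block.
    """
    n_times = data.shape[-1]
    for ch in data:
        out = np.empty((len(freqs), n_times), dtype=complex)
        for fi, f in enumerate(freqs):
            # simplified Morlet wavelet at frequency f
            sigma = n_cycles / (2 * np.pi * f)
            t = np.arange(-5 * sigma, 5 * sigma, 1 / sfreq)
            wavelet = np.exp(2j * np.pi * f * t - t**2 / (2 * sigma**2))
            wavelet /= np.linalg.norm(wavelet)
            out[fi] = np.convolve(ch, wavelet, mode="same")
        yield out
```

A caller can consume each channel's result (e.g., write it to disk) before the next one is computed, bounding peak memory at roughly one channel's footprint.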
Per conversation on Discord, it sounds like everyone is on board with making an STC class that would be returned by an apply …
This would not be consistent with the current STC class, where "space" is really "vertex number" and only uses a 1D …

Also, maybe I'm just misunderstanding your text above, but it seems different from how I remember the conversation. My memory is: "mock up a 3D STC class (i.e., make a one-off with whatever init signature is convenient, and give it only the methods needed for the GUI testing), then instantiate it and use it to test your WIP GUI. No need to make the mocked class actually be returned by any of our existing functions yet." Please correct me if I'm mis-remembering @britta-wstnr @agramfort
Yes, a 1D axis for space with vertno for indexing a source space is what is consistent.

And yes, mock an object just to make the GUI doable. We will see next how to adjust the public functions to return such an object. You can use a function that converts the list of lists of STCs to this new object for testing on real data.
Yup, I agree with @drammock and @agramfort. Also that it is 5D only in principle, but a 3D object in practice.
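A minimal mock of the agreed-upon 3D object might look like the sketch below. The class name, init signature, and properties are hypothetical, chosen only to mirror the existing SourceEstimate conventions of a 1D vertex axis plus `tmin`/`tstep`:

```python
import numpy as np

class SourceTFRMock:
    """Mock 3D source-TFR container for GUI prototyping (hypothetical API).

    ``data`` has shape (n_vertices, n_freqs, n_times); space stays a single
    1D axis indexed via ``vertices`` (per-hemisphere vertex numbers),
    matching the existing SourceEstimate convention.
    """

    def __init__(self, data, vertices, freqs, tmin, tstep, subject=None):
        self.data = np.asarray(data)
        self.vertices = vertices
        self.freqs = np.asarray(freqs)
        self.tmin = float(tmin)
        self.tstep = float(tstep)
        self.subject = subject

    @property
    def times(self):
        # reconstruct the time axis from tmin/tstep, as SourceEstimate does
        return self.tmin + self.tstep * np.arange(self.data.shape[-1])

    @property
    def shape(self):
        return self.data.shape
```

Instantiating this with random data would be enough to drive WIP GUI testing without touching any existing public functions.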
Talking to @britta-wstnr about the options for time-frequency-resolved source estimates (and ideally also trial-resolved source estimates, for forward extensibility), here is what we came up with:
Here is the problem:

- the `stc` attributes all match every time
- extending the `stc` data object would be the easiest, but would break backward compatibility for people using the public attribute `stc.data`

Here is the proposed solution:

Avoiding potential issues:
We couldn't think of any other future extension dimensions that would be good to include, since this proposal is to make new classes, but if there are some, we could consider including them. Connectivity comes to mind, but I think it should live outside the main MNE-Python repository. TFR seems right in the MNE-Python wheelhouse, so I think this would be nice to incorporate so that the visualization GUI can take in an object with all the necessary info and the user doesn't have to cobble it together. What do people think?
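For testing on real data, the converter @agramfort mentioned (list of lists of STCs to the new object's array) could be sketched as below; the function name, nesting order, and 4D output layout are assumptions:

```python
import numpy as np
from types import SimpleNamespace

def stack_stcs(stcs):
    """Convert a list of lists of STC-like objects into one 4D array.

    Assumes ``stcs[trial][freq].data`` has shape (n_vertices, n_times);
    the result has shape (n_trials, n_vertices, n_freqs, n_times).
    """
    return np.stack(
        [np.stack([s.data for s in per_trial], axis=1) for per_trial in stcs]
    )

# 2 trials x 3 freqs of (10 vertices, 20 times) dummy STC stand-ins
dummy = [[SimpleNamespace(data=np.zeros((10, 20))) for _ in range(3)]
         for _ in range(2)]
print(stack_stcs(dummy).shape)  # → (2, 10, 3, 20)
```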
cc @agramfort @larsoner @jasmainak related to GSoC