Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report which custom reference was applied #8962

Open
hoechenberger opened this issue Feb 25, 2021 · 32 comments
Open

Report which custom reference was applied #8962

hoechenberger opened this issue Feb 25, 2021 · 32 comments
Labels

Comments

@hoechenberger
Copy link
Member

Currently, after calling set_eeg_reference() with a custom channel name, or invoking add_reference_channels() (which calls the former after adding all-zeros channels), to my knowledge it's impossible to retrieve which reference channels are being used in the current referencing scheme. The only thing that gets set in info is 'custom_ref_applied'.

import os.path as op
import mne

data_path = mne.datasets.testing.data_path()
examples_dir = op.join(data_path, 'Brainvision')
vhdr_file = op.join(examples_dir, 'Analyzer_nV_Export.vhdr')

raw = mne.io.read_raw(vhdr_file, preload=True)
raw.set_eeg_reference(['F8'])

# do fancy stuff here
# ...
# Wait … WHICH reference did I apply again? Guess I'll have to find the channel
# with all-zero data :(

Isn't there a way to keep track of which references were set?

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

Maybe we can use this existing attribute, but instead of True/False we set it to the reference channel name(s)? I'd also include the average reference case here, which is currently also not stored anywhere.

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

Maybe we can use this existing attribute, but instead of True/False we set it to the reference channel name(s)?

Currently, the following values are supported:

FIFF.FIFFV_MNE_CUSTOM_REF_OFF       = 0
FIFF.FIFFV_MNE_CUSTOM_REF_ON        = 1
FIFF.FIFFV_MNE_CUSTOM_REF_CSD       = 2

(CSD – current source density)

The correct value is then stored at the following location

FIFF.FIFF_MNE_CUSTOM_REF             = 3567     # Whether a custom reference was applied to the data

In any case, when writing the measurement info, we do the following:

mne-python/mne/io/meas_info.py

Lines 1681 to 1682 in 8e3bcef

if info.get('custom_ref_applied'):
write_int(fid, FIFF.FIFF_MNE_CUSTOM_REF, info['custom_ref_applied'])

I don't think we can simply write a different data type there…
At first I thought: Oh, cool, it's an int, so we simply write the index of the reference channel! But … obviously this wouldn't work for channels 0 to 2 ;) Also you wouldn't be able to specify more than just a single reference.

As for the avg reference:
It would be easy to add something like

FIFF.FIFFV_MNE_CUSTOM_REF_AVG       = 3

however this would break compatibility with existing code, as so far FIFF.FIFF_MNE_CUSTOM_REF has been set to 0 (i.e., False) when an average reference had been applied.

@hoechenberger
Copy link
Member Author

(heads-up, I've edited my above post several times)

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

Yeah, that's the old FIFF format compatibility story which is really holding us back. If we can't change info["custom_ref_applied"], what about adding info["reference"]?

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

Yeah, that's the old FIFF format compatibility story which is really holding us back. If we can't change info["custom_ref_applied"], what about adding info["reference"]?

Can one "simply" add new entries to the FIFF files? I wasn't aware of that!

I was also thinking about abusing the new per-channel Annotations somehow… But this is really not my area of expertise. Let's wait for @larsoner to chime in here :)

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

Also we'd want to support the REST ref as well. And maybe future references which haven't been invented yet…

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

No, we can't simply add new FIFF fields, but I thought that exporting to FIFF doesn't simply dump the whole Info object, but instead picks out the relevant fields. If this really works like this, we can add a new field which will be ignored. This field should be a string or list of strings, so any future references are automatically supported.

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

No, we can't simply add new FIFF fields, but I thought that exporting to FIFF doesn't simply dump the whole Info object, but instead picks out the relevant fields.

Yes, that seems to be the case

If this really works like this, we can add a new field which will be ignored.

👌 You mean ignored when read through software that doesn't know about this field, right?

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

I meant ignored by MNE when writing to FIFF - so this field shouldn't be written. However, it would be even better if we wrote that field and programs would just ignore it when reading if they don't know about it. But let's wait for @larsoner, he'll know for sure.

@hoechenberger
Copy link
Member Author

I meant ignored by MNE when writing to FIFF - so this field shouldn't be written.

But then you'd lose the info during an I/O roundtrip, which is super confusing IMHO…

Let's wait for @larsoner ;)

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

But then you'd lose the info during an I/O roundtrip, which is super confusing IMHO…

I know, but FIFF was never designed to hold all that information. If we can easily add optional fields, then everything is fine. Maybe we should think about a new simple format such as pickling raw?

@hoechenberger
Copy link
Member Author

Maybe we should think about a new simple format such as pickling raw?

I do love pickles (Spreewald 🥳) but I don't like pickled data :) It's not really portable and not suitable for long-term storage.

I'd actually prefer something like a simple JSON sidecar file or so…

@cbrnr
Copy link
Contributor

cbrnr commented Feb 26, 2021

I don't like the pickle format either, but two files are also asking for trouble. Maybe we should just store as .mat (I'm only half-joking here)...

@larsoner
Copy link
Member

If this really works like this, we can add a new field which will be ignored.

But then you'd lose the info during an I/O roundtrip, which is super confusing IMHO…

Agreed, Info should only hold things that will survive I/O round trip. If we need to store more information there we should improve the spec.

Maybe we should think about a new simple format such as pickling raw?

I don't think we plan to jump to a new format anytime soon and even if we do, we need to make FIF work as well as possible so I don't see it really helping with the problem.

We can add things to info the right way -- i.e., by enhancing the fiff-constants spec -- but we need to justify it and do the work of coming up with an appropriate spec.

I agree it would be nice to store what each channel's set of references is more flexibly. For example in sEEG if you have a ton if bipolar-reference pairs, it would be nice to store properly these references and be able to recover them. This case is technically possible now because you can store one ref location in info['chs']['loc'][3:6]. However, our system currently only works if each channel only has a single reference, or all channels use an average reference.

Is there a format that stores such arbitrary reference information well? We could look to that for guidance.

Just to give one proposal -- we recently spec'ed out and added support for an enhanced channel information block, mostly to get us past the 15-char limit for FIF. To make a concrete proposal, then, we could add a new tag there like FIFF_CH_REFS, which is a float32 ndarray of shape (N, 4) containing the x/y/z location (first 3 cols) and weight (last col) for the N references used for the channel. For standard Neuromag data and standard one-ref EEG acquisitions, this would just be (ref_x, ref_y, ref_z, 1.). If you did an average reference using a direct application instead of a projector, this would be all channel locations with a value of 1 / N in the last column. For non-EEG channels it's empty.

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

I agree it would be nice to store what each channel's set of references is more flexibly. For example in sEEG if you have a ton if bipolar-reference pairs, it would be nice to store properly these references and be able to recover them. This case is technically possible now because you can store one ref location in info['chs']['loc'][3:6]. However, our system currently only works if each channel only has a single reference, or all channels use an average reference.

But this would only help you recover the location of the reference, not its name or channel index (in case the channel is present in the data). To find out if the specified reference channel is even still part of your data, you'd have to iterate through all channel locations until you hopefully find the matching coordinates … seems awkward?

Which also highlights that there are two different situations we have to deal with:

  1. reference channels that are present in the data – they'd have a channel index, an associated name, and a location
  2. reference channels that are not (currently) present in the data. They'd only have a channel location (if any), and potentially a name.

When I opened this issue, I was only thinking about addressing case 1, assuming this would be easier. We'd force users to add missing reference channels via mne.add_reference_channels(), reducing the amount of special-casing needed to be done surrounding those references.

To make a concrete proposal, then, we could add a new tag there like FIFF_CH_REFS, which is a float32 ndarray of shape (N, 4) containing the x/y/z location (first 3 cols) and weight (last col) for the N references used for the channel. For standard Neuromag data and standard one-ref EEG acquisitions, this would just be (ref_x, ref_y, ref_z, 1.). If you did an average reference using a direct application instead of a projector, this would be all channel locations with a value of 1 / N in the last column. For non-EEG channels it's empty.

I'm not sure we really need the weights. I'd just assume all refs are weighted equally … I mean, I've never seen anything else during EEG processing. But of course, your experience may differ.

Again, here we're only talking about channel locations only, instead of indices or names. Doesn't that make things unnecessarily complicated for case 1 described above?

@larsoner
Copy link
Member

To find out if the specified reference channel is even still part of your data, you'd have to iterate through all channel locations until you hopefully find the matching coordinates … seems awkward?

Let's be clear about the "who" is here. When I'm talking about keeping track of the information in the FIF spec, the "who" I'm talking us as maintainers dealing with our internal storage of the information and I/O possibilities.

For the user -- a very different "who" -- we can easily expose some function or something that converts these locations to channel names, or whatever they need.

So to me the I/O spec should be complete (what I was addressing), and the user-facing API is a separate issue.

To find out if the specified reference channel is even still part of your data, you'd have to iterate through all channel locations until you hopefully find the matching coordinates … seems awkward?

For the base case of Neuromag EEG data, you won't find one because there is a single ref electrode and it's not in the data (it would be all zero anyway). So we're already in this situation for a lot of datasets. And I'm not too worried about having to do this iteration, a suitable np.allclose should get us there. And we update set_montage or anything else that updates channel locations to be "channel reference aware". So I don't think it's too difficult for us to internally manage this.

Doesn't that make things unnecessarily complicated for case 1 described above?

Only in terms of how we internally deal with things, not to users. And it actually supports use case (2), and works like things do now ("store the location of the reference" -> "store the locations of any references").

@larsoner
Copy link
Member

I'm not sure we really need the weights.

I don't know what sorts of refs people have used, but having weights allows for (nearly) arbitrary linear transformations of the data, which is nice. And the storage here is pretty trivial so I don't see any problem with including it, even if we never really end up using it.

@hoechenberger
Copy link
Member Author

hoechenberger commented Feb 26, 2021

Thanks for the clarification, @larsoner! If you think the np.allclose() approach would work to construct location <> channel mappings, I'm +1 for your proposal!

@hoechenberger
Copy link
Member Author

How would we encode the REST referencing scheme?

@larsoner
Copy link
Member

REST is weird as it depends on passing a forward etc. We could think about adding a new constant for it or something but this seems like a far less critical case than standard channel-based references

@hoechenberger
Copy link
Member Author

REST is weird as it depends on passing a forward etc.

Yep. But a way to store a name – simply the string "REST" – could make sense, don't you think?

@larsoner
Copy link
Member

Actually on second thought IIRC REST is also known as an infinity reference, so we can signal it's being used even in main just by setting locs[3:6] to np.inf, and we can also do this in the system I proposed. So no need for a chance just for that

@hoechenberger
Copy link
Member Author

@larsoner Good idea!

What do you think, how should we go about moving forward here? Clearly I'm lacking a big deal of expertise, but if you think I could somehow contribute, please do let me know!

@agramfort
Copy link
Member

using locations to know how channels were referenced should not be the primary source of information. I anticipate that we have EEG data without channel locations but we some channel based references. See channel names you see in EEG world (eg saying that it's Cz-M1 referenced to M1). This suggests that the channel block idea (which I like) should contain a list of channels and not locations). Note that it means that it may contains channel names that don't exist in inst.ch_names or info['chs"] and it contains no data...

@larsoner
Copy link
Member

Note that it means that it may contains channel names that don't exist in inst.ch_names

This is exactly the case for data acquired by Neuromag systems currently for example. So I think just keeping track of channel names won't work for systems that do actually encode physical locations, just like you think keeping track based on locations won't work for systems that don't.

What if we keep track of name+location of each reference? This would just mean one more tag with is a list of names to match the list of locations.

@hoechenberger
Copy link
Member Author

What if we keep track of name+location of each reference? This would just mean one more tag with is a list of names to match the list of locations.

Sounds good to me. @agramfort's comment also reminded me of EEG recordings we'd done a few years back with reference electrodes placed on the tip of the nose. These would not have any proper location associated with them.

@agramfort
Copy link
Member

agramfort commented Mar 1, 2021 via email

@hoechenberger
Copy link
Member Author

but to me you will always have names even if they are ad-hoc names like EEG001,EEG002, etc.

Yep!

@cbrnr
Copy link
Contributor

cbrnr commented Mar 1, 2021

Whatever you decide is best to represent the reference internally, I would really love to see a public API that simply gives me the reference as a string (e.g. "C3", "average", "REST", ...) or a list of strings (e.g. ["M1", "M2"]). Many EEG data sets do not specify the reference, so in that case we could simply set this to "unknown".

Come to think of it, maybe we shouldn't even use "average", because we could use a list of all channel names instead. This would be more explicit and would avoid ambiguities such as whether or not the original reference channel was added back to the data before the average reference was computed. If references consisting of multiple channels that are weighed differently are a thing, we could also store a list of weights in addition (which by default consists of all 1/N elements, where N is the number of reference electrodes).

@larsoner
Copy link
Member

larsoner commented Mar 1, 2021

but to me you will always have names even if they are ad-hoc names like EEG001,EEG002, etc.

It sounds like you're pushing to store only the names? To me this is still problematic because it's lossy. If we eventually want to do anything having to do with the physical arrangement of the recording setup (e.g., forward modeling), we need the locations of the references, which might not be proper channels (in the sense of having an entry in info['chs']) in the dataset, hence they could be missing/gone.

We have some recordings with just channels and no locations. We have other recordings with (at least) a single ref location and no channel name (Neuromag). And in theory it's not hard to imagine a recording with just reference locations stored in the dataset somehow (e.g., a recording with two EEG data channels, Cz and FCz, each of which was recorded referenced to the average of left and right mastoids), and in practice it's easy to get this after some picking/dropping.

To me I don't see a downside to storing names+locations rather than just names. In terms of implementation it's not much harder: if we set an EEG location based on a channel name (set_montage is really the only place we do this I think?) then we need to check the EEG reference names and set those locations, too.

@agramfort
Copy link
Member

agramfort commented Mar 2, 2021 via email

@hoechenberger
Copy link
Member Author

I like that!
It's almost like eating the cake and having it too! 🍰

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants