-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix participant tsv values when Info
contains weight and height information as numpy arrays
#1310
Conversation
Hello, thanks for the contribution! I'm wondering if this shouldn't be addressed upstream in MNE-Python itself. Is this documented behavior? Seems like a bug to me tbh! |
Well, in my opinion, yes but who knows what kind of surprises are waiting considering the high number of formats that MNE-Python supports and the often non-standard implementations of vendors and libraries out there. I'm happy to report and hunt this down in MNE-Python. But I think that if MNE-BIDS expects data to be of a certain format, it is a good idea to a) check if the data fulfills these requirements and b) fixes it if it is an easy fix. And this is basically all this PR does. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1310 +/- ##
=======================================
Coverage 97.45% 97.46%
=======================================
Files 40 40
Lines 8810 8840 +30
=======================================
+ Hits 8586 8616 +30
Misses 224 224 ☔ View full report in Codecov by Sentry. |
I generally agree, but I believe MNE-Python should do the validation and not allow for invalid stuff to appear in its Info structure. I'm surprised this can be even written to disk that way … So in any case, I'd like to escalate this to upstream just to be sure cc @larsoner |
Yes MNE-Python should make sure everything in |
Yes, this should be validated in MNE-Python and yes, this is probably an MNE-Python bug. However, it is always the responsibility of the function that takes arguments to validate its arguments. Especially as MNE-BIDS does not explicitly rely on the reader functions of MNE-Python. The |
Agreed that inputs must be validated but I think for MNE-BIDS this should be essentially Even if you don't buy this argument, there is at least a practical one in that MNE-Python already has a bunch of Info-validation stuff built into it already, so from that standpoint it makes more sense (more DRY) to continue implementing checks there rather than here. |
Thanks a lot @thht for this report! I agree with eric to have the "core validation" in BaseRaw at mne, and then a basic check in mne-bids. |
The thing is, I can do something like this: raw = mne.io.read_raw('valid_data.fif');
raw.info['subject_data']['weight'] = np.array([np.random.rand((200, 20))])
write_raw_bids(raw, ....) And the current implementation would just process it. AFAIK, there is no proper and feasible way for MNE-Python to make sure that the And the point is not to validate the In this particular case, the non-validation leads to a quite confusing error message that the participant is already in the participant file. Which is true but the specified behavior in that case is that if all information about the participant matches, there is no error. Adding this bit of validation produces either the documented behavior or a more meaningful error message. |
We have some processes in place in MNE-Python that at move us in this direction. They are not perfect but incremental progress is usually made when bugs like this show up actually! So at the very least I think we should try to extend these at the MNE-Python end and see how far we can get.
The idea is that they (EDIT: meaning, the enforcement functions) shouldn't need to be documented because the enforcement should automagically happen. And enforcement does happen for some operations, e.g.:
or for something you shouldn't be able to set:
etc. So in this sense I think we "just" need to extend this logic a bit further to the
So there is probably a placeholder stub where we can already start working this validation in. |
generally I agree with @larsoner that new validation code belongs in MNE-Python. But if there are already files out there in the wild that have array-like values in the |
Yes, to make it clear: It's great and important that MNE-Python handles validation of its own datastructures. And if this issue uncovered a bug in that validation and/or implementation of reading fif files, this is valuable information that @larsoner has already appreciated. And I don't think that MNE-BIDS should handle every possible bad/strange/malformed input it gets thrown at. But as @drammock pointed out, stuff that is found in the wild and that can be easily handled, why not handle it? |
We could easily validate (in MNE) at runtime with Pydantic. So saying that this is not possible is not quite right. |
My understanding is that you're essentially saying that there is no need to enforce a proper interface, with which I strongly disagree. It's MNE's responsibility to ensure it provides data as documented. Downstream tools like MNE-BIDS must not need to worry about this; anything else is a bug -- either in MNE or in user code. BIDS validation, OTOH, is a different issue. My impression is that these two things are being conflated here, hence the disagreement. |
No, this is not my point. My point is that if a proper interface is expected, it must be enforced by the software accepting the data. It's the prerogative of the software what to do in case of a violation. It can be a proper error message, or doing a workaround with our without a warning. My entirely subjective opinion is that in this case, the interface violation could be handled within MNE-BIDS, because it's an easy and effective fix that might handle similar errors as well and increase the user experience with minimal risk of creating new unwanted behavior. Yet, I totally get the argument that MNE-BIDS should only accept valid inputs and not correct invalid ones. That is your choice, of course. In any case, my argument is that if MNE-BIDS expects data in a certain format, it is of course the responsibility of the provider to ensure conformity. But it is still the responsibility of the data sink to make sure the data is valid. Because it's correct to require conformity but you can never trust that incoming data meets it. |
I suppose I'm against special-casing your specific use case. I've never seen this problem before and I wonder where one would stop? I'd argue that there's a bug in your code and the proper way to prevent this would be by improving MNE-Python; and to prevent writing invalid BIDS, MNE-BIDS must check the outputs it produces more carefully; but it's not MNE-BIDS's responsibility to ensure that an upstream package uses its own, documented interfaces correctly. This is not web dev where we receive and need to marshal input from an untrusted party. So of course we can rely on the data being submitted in the correct format; and if that fails, we need to fix that bug where it originates from (i.e., upstream), not add more special-casing. |
That said, I really appreciate your input; I just don't think it's the best approach for this specific problem. But in the end it's @sappelhoff who will have to decide 😃 |
I'm pretty sure it's not my code because all it did is that the file extract some information and send it to write_raw_bids. The code should never touch the info property. Might be some side effect, sure. Gonna check tomorrow to be sure. Anyhow, I can agree with you that MNE-BIDS should not correct the wrong input. I still think that the error message should be improved as what it reports at the moment is not what the issue is. If you find a consensus among yourselves, I'm more than happy to alter the code so that it checks what it wants to hand off to the participants tsv writing function and just produces a better error message. And thanks! I appreciate your work on this and I also enjoy this constructive, respectful discussion on different opinions 😀 |
[...]
This comes across as a bit harsh to me. If I look back at the very first sentence of the PR description, @thht says
...which doesn't sound like a case where the user wants to do something unusual / out-of-spec. That said, I'm actually going to take back my earlier statement that adding more permissivity to MNE-BIDS might be alright in this case. The right fix is to adjust the MNE-Python reader function(s) (and possibly to make the MNE-BIDS error message more informative/accurate, e.g., by checking for nested arrays in the relevant fields of |
I think we have found that consensus: We want to improve the way I would be happy to include additional checks in Having said all of this, I personally do not have the capacity to work on the validation in If you can't and nobody else can for the forseeable future, I would be happy to merge the present fix as an interim solution. So to move on:
|
I'm totally fine with merging this as-is, as long as we decide to address this problem upstream sometime in the future. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I also opened mne-tools/mne-python#12874
mne_bids/write.py
Outdated
else: | ||
raise ValueError( | ||
f"Value for key {key} is a list with more " | ||
f"than one element. This is not supported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"than one element. This is not supported." | |
f"than one element. This is not supported. " | |
f"Got: {cur_value}" |
for key in data.keys(): | ||
cur_value = data[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for key in data.keys(): | |
cur_value = data[key] | |
for key, cur_value in data.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that I do:
data[key] = new_value
within the loop. And afaik, one should never change what is iterated over. It might be save here, because care is taken that only the current entry is changed... But I think, it is safer this way....
raw.info["subject_info"] = { | ||
"weight": np.array([75.0]), | ||
"height": np.array([180.0]), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually just thought of an issue... whenever mne-tools/mne-python#12874 is tackled (which I was about to try to see if it's trivial!) this will raise a ValueError
. So I think you need to bypass the soon-to-be-added MNE protection:
raw.info["subject_info"] = { | |
"weight": np.array([75.0]), | |
"height": np.array([180.0]), | |
} | |
# bypass MNE-Python setter protection | |
dict.__setitem__( | |
raw.info, | |
"subject_info", | |
{ | |
"weight": np.array([75.0]), | |
"height": np.array([180.0]), | |
}, | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this does not do the trick because while the initial setting works, the write_raw_bids
function copies the raw
object here. And this calls the validation again. I have not found any way around it, including trying to mock the __setitems__
method to be the original one from dict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a commit using monkeypatch
that should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that monkeypatch will essentially always have to be there now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the test to make sure that MNE-BIDS has its own validation, yes
I have restarted the CIs now that: has been merged. Once this PR has been adjusted to the new state of things in mne, we can merge it! |
Hi! Sorry, I had some work that came up and was super urgent. I'm taking care of this now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly does this PR do, now that mne does not allow weight and height to be arrays anymore?
At least for MNE < 1.9 it will do some helpful validation |
for more information, see https://pre-commit.ci
PR Description
We have raw fiff files at our institution that produce weight and height information as numpy arrays when read with the standard mne functions. These values are not unpacked so they end up as arrays in the participant.tsv. This also leads to wrong behavior when a second run for the same participant and session is written as the detection for already existing rows in participant.tsv then fails. Check the added test for an example and simulation.
This PR adds some sanitation to the handling of participant data, namely making sure that no nested lists/arrays are in there. If it does find a nested list/array and the inner array only has one item, that item gets pulled up. Otherwise, it throws an exception.
Merge checklist
Maintainer, please confirm the following before merging.
If applicable: