Streams in load_xdf should not be ordered by default #35
Comments
I think the issue here is that in an earlier step the streams are collected in a dictionary which has no order, meaning that without sorting they would come back in random order each time the same file is loaded. I’m on mobile now so I can’t easily confirm. If this is the case then maybe the streams should be first loaded into an OrderedDict. |
Thanks for the input @cboulay. You are correct that the streams are saved in a dictionary, but since Python 3.7, dictionaries preserve insertion order. It is still strongly suggested to use an
That's very interesting. Do you have any documentation that claims this so I could read a bit further on this and have a solid proof to argue against relying on order? |
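For reference, the Python documentation for `dict` states that, as of version 3.7, dictionary order is guaranteed to be insertion order. A minimal sketch of what that means here, using made-up stream IDs and a simplified stream layout (both are assumptions for illustration, not taken from the loader):

```python
# Insertion order is a language guarantee for dict since Python 3.7.
streams = {}
for stream_id in (3, 1, 2):  # hypothetical stream IDs, in the order found in a file
    # simplified, assumed stream layout
    streams[stream_id] = {"info": {"name": [f"stream-{stream_id}"]}}

insertion_order = list(streams)  # keys come back in the order they were inserted
sorted_order = sorted(streams)   # sorting is a separate, explicit step
```

Note that this only makes the order reproducible for a given file. It says nothing about whether the order in which streams were written to the file is meaningful.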
Addresses sccn#35 but note: apparently, one should not count on a specific order of streams (it can be arbitrary). I am still not sure if this warning concerns the StreamId as well.
Each stream, including its header, is written by a stream-specific thread. See here. Relying on thread ordering is a bad idea and the discussion could end here. Maybe because of the threading architecture, maybe for other reasons, no attention is paid to stream ordering elsewhere in LabRecorder. Streams are ordered according to the order that |
Ok, I am convinced to not rely on stream ordering and the purpose of this issue is now weak to me. |
I think we should probably just delete the lines that do the sorting. I'll leave this issue open until I find time to delete the sorting (and test the result). If you want to make a PR for this small change then please go ahead. You're right that returning a dictionary would be better than returning a list with dubious ordering, but this would be a breaking change and thus come at a high cost for a relatively small benefit. |
Ok I can do the PR later today. |
I also stumbled upon this recasting part at the end of load_xdf. It makes development tricky. One option to ensure backwards compatibility is to make sorting an option with default true; or making load_xdf a load_xdf_as_dict() which does everything but sort. |
Hello, |
All right, I actually did not do a PR because I am not sure what the consensus is. I have also encountered the same problem described by Raymund; I have some XDF files with streams that have the same name and the current sorting falls back to the second tuple element comparison (the streams) and fails. So far what I gather is that:
I lean towards the dict option and I would only need to know:
|
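The failure with identically named streams can be reproduced with a small sketch. It assumes the sort is done on `(name, stream)` tuples, as described above, and that the name lives at `stream["info"]["name"][0]`; the stream contents are invented for illustration:

```python
# Three fake streams; two of them share the same name.
streams = [
    {"info": {"name": ["EEG"]}, "time_series": [1, 2]},
    {"info": {"name": ["EEG"]}, "time_series": [3, 4]},
    {"info": {"name": ["Markers"]}, "time_series": [5]},
]
names = [s["info"]["name"][0] for s in streams]

# Sorting (name, stream) tuples: on a name tie, Python compares the second
# tuple elements, and dicts do not support "<" in Python 3.
try:
    sorted(zip(names, streams))
    tie_failed = False
except TypeError:
    tie_failed = True

# Sorting with a key function only ever compares the names. Python's sort is
# stable, so streams with equal names keep their original relative order.
by_name = sorted(streams, key=lambda s: s["info"]["name"][0])
```

Whichever return type is chosen, a key-based sort like this would at least avoid the crash on duplicate names.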
Sorry, my original description was a little bit off. My original suggestion should have been (for backwards compatibility) to keep While sorting makes a lot of sense imho for a list, it is pretty unnecessary for a dictionary, and I find the recasting from dict to list at the end of load_xdf unnecessary. I personally could live with a break in backwards compatibility, and even with a regular dict as output instead of an OrderedDict. Consider also that recasting a dict as a list is pretty simple, especially if there is a backwards-compatibility function that does that. So I personally would prefer load_xdf returning by default an unsorted dict of streams, not a list. That means adding a deprecation warning for the time being. But I'd tend to agree with whatever @cboulay thinks is better. A point against my view: if xdf is to become BIDS-compatible, and if there is a use case where we want to sync all streams sample-wise and store everything as a single stream, we might need to find a way to ensure that all signals are sorted identically across all calls to load_xdf. |
What about storing the |
The best solution, however, would be to directly return the ordered dict, because it already contains the stream IDs as keys. How terrible would it be to make this breaking change? If this is properly documented, people using the old list return type would need to change only very little to accommodate for the new dict return type. |
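To give a rough idea of how small the change on the user side could be, here is a sketch assuming a dict keyed by stream ID. The helper `streams_as_list` and the example data are hypothetical and not part of the loader:

```python
def streams_as_list(streams_by_id):
    """Hypothetical compatibility helper: turn a dict of streams into a list.

    The list follows the dict's insertion order; no sorting is applied.
    """
    return list(streams_by_id.values())


# Invented example data, keyed by stream ID (simplified, assumed layout).
streams_by_id = {
    2: {"info": {"name": ["Markers"]}},
    1: {"info": {"name": ["EEG"]}},
}

as_list = streams_as_list(streams_by_id)  # old-style positional access
eeg = streams_by_id[1]                    # new-style lookup by stream ID
```

Code that indexes the list by position would need this one extra call, while code that wants a specific stream could look it up by ID directly.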
So basically I'm +1 on @agricolab's suggestion to make the breaking change (and even use a plain |
Fixed in xdf-modules/pyxdf#3. |
This can be closed. If you would like to continue discussing the type of the returned object (e.g. if you'd rather have a dict instead of a list), please open a new issue in the new repository https://github.com/xdf-modules/xdf-Python. |
I recently found that there was a difference between the order of the streams in the Matlab and Python loader functions. I read the xdf specification, and there is no mention of the order of the streams. However, _load_xdf does this:
Sorting the streams by their name may be useful, but it may break for a user who expects stream 0 to be of a particular type, stream 1 to be of another particular type, and so on, because this might be the way the data is saved by their acquisition procedure. Moreover, this sorting is not documented, so it comes as a surprise.
What I propose is to either drop this sort (a breaking change) or, more conservatively, add a keyword parameter (which defaults to the current behavior) to perform this sorting:
(I simplified the 3-line code to sort the indices)
If you guys agree with this, I can file a PR with this change.
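As an illustration of the conservative option, a minimal sketch of a keyword that defaults to the current sorted behavior. The function name `order_streams`, the flag `sort_by_name`, and the stream layout are all hypothetical and only meant to show the shape of the change:

```python
def order_streams(streams_by_id, sort_by_name=True):
    """Return the streams as a list, optionally sorted by stream name.

    Hypothetical helper. sort_by_name=True mimics sorting by name;
    sort_by_name=False keeps the order in which streams were inserted.
    """
    streams = list(streams_by_id.values())
    if sort_by_name:
        # Key-based, stable sort: equal names keep their insertion order.
        streams = sorted(streams, key=lambda s: s["info"]["name"][0])
    return streams


# Invented example data keyed by stream ID (simplified, assumed layout).
example = {
    7: {"id": 7, "info": {"name": ["Markers"]}},
    3: {"id": 3, "info": {"name": ["EEG"]}},
    5: {"id": 5, "info": {"name": ["EEG"]}},
}

sorted_ids = [s["id"] for s in order_streams(example)]
unsorted_ids = [s["id"] for s in order_streams(example, sort_by_name=False)]
```

Existing callers would see no change, and anyone who wants the streams as found in the file can opt out explicitly.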