-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync timestamps #1
base: main
Are you sure you want to change the base?
Conversation
On a side note, i added the feature before the ordering/recasting currently discussed in sccn/xdf#42 |
Can you make a trivial change and commit it? I'm hoping to trigger a CI build and that it will run the test suite. Maybe add a newline at the end of test_limit_on_synced_stamps.py. |
Seems to work! |
The test in CI gave a warning: |
I guess the next action item is to rebase this commit on the changes in master. |
What's the status of this PR? I'm not even sure what this adds - is this implementing resampling of all streams to a common sampling frequency? @agricolab maybe you could update this PR with a short description in your top comment. |
Updated the top comment with a direct quote from the original pull request. Regarding your question, yes it syncs streams by interpolation/shifting so that samples match. Regarding the status, this is relatively old now, and does therefore not (yet) account for the recent feature of subselection of streams from an xdf-file. |
Nice, this feature is important. I can rebase if you want. |
That'd be great, i am currently quite busy with work, and would probably find the opportunity to work on this not before january. |
cf4b6be
to
fe01058
Compare
Alright, I rebased and fixed some PEP8-related stuff. Maybe I can come back to this before January. |
@agricolab I assume you don't have time to finalize this PR? In addition to rebasing, I think what's currently missing is support for stream selection (which has been added to |
I can make time in the next few weeks. Considering it's been apparently five years in the making, it really is time to resolve this 😅 |
😄 This would be really great! If this is available in PyXDF, downstream projects don't have to roll their own implementations. Speaking of which, we've been discussing the problem of resampling here: cbrnr/mnelab#385 Specifically, we were thinking about a solution when there are long gaps between otherwise relatively regularly spaced samples. @DominiqueMakowski implemented linear interpolation (using pandas), which is certainly better than assuming that samples are regularly spaced, but I am not sure if interpolating such long periods is the best solution. I think that replacing these segments with NaNs would be a better option. WDYT? Is this covered by your changes, and if so, how do you deal with large gaps? |
IIRC there was no consideration for large gaps. I'll make up my mind, but probably it's simplest to let the user decide. Which means even more arguments to the function 🥴 In the end I'd say it depends on the source of the data, or let's say it's complexity / spectrum / non-stationarity and corresponding timescale. |
a927407
to
3db5c6d
Compare
I updated everything to integrate with the most recent version. I also updated the tests, and integrated a smoke test with the example-files (i.e. minimal.xdf) Note that i used a pytest fixture for that.
|
Small note:
I would not only do a special treatment when long gaps are detected and otherwise assume that is regularly sampled. In fact, I would never assume that it is (from experience, it's often not, and a few milliseconds are vital in some applications). + LSL returns the timestamp of all samples so it seems like a shame not to use it and have a loss of precision |
This PR is about creating regularly sampled time series via resampling. So I think large gaps need to be treated somehow in order to avoid interpolating a large gap and instead insert e.g. NaNs. |
@agricolab I was wondering if this PR can also deal with just a single stream, i.e. resample that stream to a regular sampling frequency (and dealing with larger gaps). Or is this already possible with the current release? |
With |
Ok. I implemented shift_alignment as a default, and handle segments. Unit and system tests run fine for segmented and unsegmented streams, and when So i guess it's time to let you fellows play with it. Looking forward to your feedback. Hope i didn't miss a big thing... |
In regards to
I made my life easy. If a user support timestamps that do not cover a span where a stream has samples, shift_alignment will fail with an exception. Therefore, this approach would only work if an interpolation function is supported that can handle this (in fact, supporting an interpolation function for a stream skips any safety checks whether its samples can be assigned to the new timestamps). In that case, it doesn't really matter whether the |
Ok great -- thanks for pushing through! Will have a look this weekend at the latest. |
(Incidentally, is it necessary that |
sorry, that was leftover from an earlier implementation.
yes, and its helper functions. |
Remove docstrings for unused arguments in load_xdf
I refactored the main alignment function |
@agricolab I tried to test this new functionality, but I'm not sure how exactly I'm supposed to use it. For example, I have a data set with a large gap somewhere in the middle of the recording (available here). I would like to load this file and then sync and resample all streams to 256Hz. Then I would like to end up with a 2D NumPy array containing all EEG channels. Is this possible? If so, how? The problem here is that with my resampling implementation in MNELAB, I'm completely ignoring any gaps in the data (cbrnr/mnelab#385). |
Talk about revenants... Oh boy, have to take a look myself after a year or so... So, we have the free function
All you really need to support is the streams, the rest is using defaults. As discussed, the default is So, if you need another interpolation function that can handle resampling, you have to specify the second argument, and assign a callable to those streamIds that need resampling. The module supports in addition to the default That leaves open your question on how are gaps handled. First, gaps are detected as defined by load_xdf IIRC i had to change load_xdf to export this, so it might not work with recent load_xdfs. Second, each aligned timeseries is initialized with NaN and only segments with actual samples are aligned/interpolated, leaving gaps to stay NaNs. See https://github.com/agricolab/xdf-Python/blob/sync_timestamps/pyxdf/align.py#L162-L175 In conclusionSo at least from reading code and docs and memory, you would need to support: The streams, the sampling rate (256 in your case), and a dict defining a callable for your streamids that can handle downsampling. Good luck! If you run into any more troubles, feel free to comment. |
Thanks @agricolab, unfortunately it doesn't work for me using this dataset and the following code: import pyxdf
fname = "sub-13_ses-S001_task-HCT_run-001_eeg.xdf"
streams, header = pyxdf.load_xdf(fname, select_streams=[2]) # EEG stream
pyxdf.align_streams(streams) This results in the following error:
Not sure if this is supposed to work with one single stream in the first place (which would be extremely important for me), but I also tried with two streams which also yield an error: import pyxdf
fname = "sub-13_ses-S001_task-HCT_run-001_eeg.xdf"
streams, header = pyxdf.load_xdf(fname, select_streams=[2, 5]) # EEG and ACC streams
pyxdf.align_streams(streams)
|
Hm. Sorry to hear. I'll take a look asap, probably this weekend. Can you describe a little bit more about your use case or needs for a single stream? I guess you'd want to align that stream to externally supported timestamps? |
Okay, no longer Python 3.12 compatible, because pkg_resources is deprecated... Guess that's another issue. Fixed it by wrapping it into a try-catch. First issueYour first issue stems from the fact that there is no unique mapping from the old to the new timestamps. Likely due to down-sampling. The bug is in the reporting, where i check how many new timestamps are possible for each old timestamp. This is not really relevant, i fixed it for the time being with a catch IndexError. Now it complains
The gist is https://github.com/agricolab/xdf-Python/blob/8fa4645f50904dff6f3fe00ada57c06d1c87d6dc/pyxdf/align.py#L76 In consequence, because your two streams can not be aligned by shifting (because multiple new timestamps are the closest neighbour of the old timestamp), you would need to resample to align both streams. |
Thanks @agricolab! Regarding my use case, I need to interpolate the original data to a regular grid of time stamps. This is necessary, because I cannot just assume that the recorded samples really align on a grid defined by the nominal sampling frequency. This is even true for the effective sampling frequency, because there could be gaps in the samples (and pretending samples are evenly spaced by their effective sampling frequency would neglect that). Therefore, even for only a single stream, I think the safest option is to interpolate to the desired sampling frequency. On the other hand, if you are sure there are no gaps, and you either trust the nominal or the effective sampling frequency, I can just take the samples and pretend they are evenly spaced according to either of those two frequencies. So I think users should be able to make this choice. Do we trust the nominal or effective sampling frequencies? If yes, no interpolation or resampling is required. If no, we have to interpolate (and as a post processing step fill gaps with e.g. NaNs). |
If you rebase onto |
Yeah, but if i rebase, i'll have to fix merge conflicts... :D |
Okay, at first i thought that the second issue stems from some edge case, i..e that new_timestamps older or younger than any old timestamps (i.e. the original timestamps) could not be assigned. That didn't seem to be the problem. Instead it appears that if you do not support timestamps or sampling rate yourself, the function picks it from the stream and creates a new, aligned_timestamps. Now, we face some floating errors
i even had that commented apparently, but alas np.linspace fails as well |
So, maybe it comes from the regularization, i.e. the original timestamps miss the beat somehow, and have an additional sample at a weird timestamp strewn in; or it comes from an interaction between the new aligned_timestamps and having segments.
suggests the latter.... |
Okay, i inspected the data a little bit more and it seems _shift_align can not assign an old_timestamp from the middle of the segment
shift_align goes through all new_timestamps and finds the closest old_timestamp (unless it is already mapped to another new_timestamp. That appears to be the case here. As the new_timestamps (whether created by linspace or arange) throw in both cases (but can be assumed to be unique and more or less correctly spaced, i would assume the issue stems from the fact that timestamps were created differently. By default., pyxdf dejitters timestamps, and this happens with these streams too (othwise we wouldn't have segments). But jitter_removal factually uses different sampling rate for different streams, i.e. it linearizes from first to last timestamp (using linalg.lstsq). But the effective_srate is calculated across all segments. align_streams instead creates new_timestamps evenly spaced across the whole span from the very first to the very last timestamp of the stream. So, i would assume the issue lies there. A small misalignment makes it impossible to shift_align within a segment, because effective srates do not align sufficiently well. In general, i should note that shift_align is expected to fail in most cases, because it has very strict requirements... As a solution, I expanded the exception messaging. If a new_timestamp lies within a segment, but could still not be assigned, _shift_align will now complain that If an old timestamp could not be assigned, it will complain differently, and suggests two solutions: In your case:
|
3d73e6d
to
19b6b58
Compare
When i interpolate instead (i.e.
it passes. using |
Thanks @agricolab, now this makes sense! With thorough documentation (including examples), I think this will be a nice addition! However, I still have questions 😄: Maybe my understanding of pyxdf is too limited, but what exactly are those segments you mention? Are they available in the output? And if the effective sampling frequency ignores segments, it is not really reliable in this case, right? I'm viewing this problem from a user perspective. Basically, I have some streams, and pyxdf gives me those streams on a common time basis. However, I also need them on a uniform time grid, which means that these streams need to be resampled, even if it's just a single stream, but most definitely for two or more streams. For a single stream, I would like to be able to take the samples without any further resampling and just assume the nominal (or effective) sampling frequency is accurate. This is now possible with your |
Easiest is to not use any alignment then at all. Simply discard the information contained in the timestamps, and stick with your assumption. Did it all the time with my pure EEG streams, and it works usually sufficiently reliable if the duration of the recording is not too long. Especially if you use medical grade EEG amplifiers, one can usually assume that no samples are dropped, and sampling rate is sufficiently constant, and assume that all jitter stems from TCP/IP or other transporting or protocol translations. Now, more generally, we face two problems. First, segments. Second, aligning multiple streams with different clocks (typically markers and EEG). Segment can occur when there is a break due to hardware or network issues, and data is delayed or missing. It then depends on your additional knowledge or assumptions. If samples are missing, we are really talking about a gap between segments. If it is just delayed in transport, you might want to ignore the gap and assume constant sampling rate. If you had to change amplifiers during the recording, you might actually have a slightly different clock now. LSL/XDF couldn't really know. All it offers are methods to dejitter and segment streams if gaps are too large. In that sense, it makes total sense if pyxdf assumes that different segments might have slightly different steps between samples. This information about segments is relayed to align_streams. Second, because each stream has its own clock and jitter, no two samples from different streams really perfectly align. Markers are typical examples, and we might want to know the closest sample in another stream. Or we might want to align different physiological signals, e.g. EEG and accelerometer data. LSL/XDF couldn't really know. So you can decide whether you only want to shift the data (works usually quite well with a Marker stream) or actually interpolate. The latter means you have to rely very much on the information in the timestamps (or manipulate it before you pass it to align_streams). Align_streams attempts to map all data to common timestamps. That usually works always if one uses some kind of interpolation. But the default is shift_align which guarantees that the data is not changed. If you feel comfortable with interpolated data, you can do so explicitly. In that sense, the policy of LSL/XDF is that it manipulates and controls timestamps, but passes the actual data (i.e. timeseries) untouched. Now, back to the start. If you have a single, non-segmented stream, and believe that the sampling rate should be uniform, just discard the timestamps and take the manufacturers sampling rate as gospel. |
Thanks @agricolab, that's very reasonable! Just one more minor question, how do I know that pyxdf detected segments? Is this reflected in the output somehow? |
Jupp, you can control detection with jitter_break_threshold_* (i.e. https://github.com/xdf-modules/pyxdf/blob/main/pyxdf/pyxdf.py#L165C1-L173C74) IIRC for some reason, pyxdf exposes this not by default. I had to change https://github.com/agricolab/xdf-Python/blob/sync_timestamps/pyxdf/pyxdf.py#L405C9-L405C36 to actually pass this info. IMHO, should be default :D |
Oh, and obviously, you can turn dejittering and therefore segmenting completely off by setting |
Interesting, I've never paid attention to those arguments. But how does the output structure differ if there are segments? The docs don't mention that... |
That's the point. AFAIK it doesn't. Only the fork from this PR exposes it explicitly in the info field so that align_streams can use that. |
😮 |
As requested, i moved the pull request to the new repo structure according to sccn/xdf#28
As mentioned there:
I had to manually copy the code, as the history is broken, and hope i did not miss anything. Tests did run fine.