# Annotation timing info loss due to datetime.timedelta precision limit #12327

Open · JonathanShor opened this issue Dec 26, 2023 · 9 comments

@JonathanShor

### Description of the problem

Sub-microsecond timing data is lost when it passes through a `datetime.timedelta` step, for example in `mne.Annotations.crop`. Because `raw.set_annotations` calls `crop` internally, this can misalign the created annotations with `raw.times`, leaving the annotations covering slightly different frames than expected.

This only affects cases where `1/sfreq` has sub-microsecond precision, but 512 Hz is common in my experience, and 1/512 = 0.001953125 s.
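For reference, the precision cap is visible with the standard library alone (a minimal demonstration, independent of MNE):

```python
from datetime import timedelta

# timedelta stores at most microsecond resolution, so the sub-microsecond
# part of a 512 Hz sample period is rounded away on construction:
print(timedelta(seconds=1 / 512).total_seconds())
# 0.001953
```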

### Steps to reproduce

```python
import numpy as np
from mne import Annotations, create_info
from mne.io import RawArray

info = create_info(5, 512, "eeg")
data = np.random.randn(5, 10000)
raw = RawArray(data, info)
print(raw.times[1])
# 0.001953125
annotation = Annotations(onset=raw.times[1], duration=1, description="test")
print(annotation[0])
# OrderedDict([('onset', 0.001953125), ('duration', 1.0), ('description', 'test'), ('orig_time', None)])
raw.set_annotations(annotation)
print(raw.annotations[0])
# OrderedDict([('onset', 0.001953), ('duration', 1.0), ('description', 'test'), ('orig_time', None)])
print(raw.time_as_index(raw.annotations[0]['onset']))
# array([0])
```


### Expected results

The annotation timing shouldn't be lost when attached to the raw object: the original range of frames should remain exactly as recoverable from the timing data as it was before attachment. Concretely:

- `annotation[0]['onset']` and `raw.annotations[0]['onset']` should be equal at 0.001953125.
- `raw.time_as_index(raw.annotations[0]['onset'])` should return `array([1])`, matching how the annotation was created.

### Actual results

annotation[0]['onset'] is 0.001953125 while raw.annotations[0]['onset'] is 0.001953.
raw.time_as_index(raw.annotations[0]['onset']) returns array([0])

### Additional information

```
Platform Linux-3.10.0-1160.102.1.el7.x86_64-x86_64-with-glibc2.17
Python 3.11.5 (main, Sep 5 2023, 13:23:33) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Executable /isilon/LFMI/VMdrive/Jonathan/PriorWeightingMooney/Visual/ECoG/Code/.venv/bin/python
CPU x86_64 (20 cores)
Memory 96.2 GB

Core
├☑ mne 1.5.1
├☑ numpy 1.26.2 (OpenBLAS 0.3.23.dev with 20 threads)
├☑ scipy 1.11.4
├☑ matplotlib 3.8.2 (backend=module://matplotlib_inline.backend_inline)
├☑ pooch 1.8.0
└☑ jinja2 3.1.2

Numerical (optional)
├☑ sklearn 1.3.2
├☑ nibabel 5.1.0
├☑ nilearn 0.10.2
├☑ pandas 2.1.3
└☐ unavailable numba, dipy, openmeeg, cupy

Visualization (optional)
├☑ pyvista 0.42.3 (OpenGL unavailable)
├☑ pyvistaqt 0.11.0
├☑ vtk 9.3.0
├☑ ipympl 0.9.3
├☑ ipywidgets 7.8.1
└☐ unavailable qtpy, pyqtgraph, mne-qt-browser, trame_client, trame_server, trame_vtk, trame_vuetify

Ecosystem (optional)
└☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline
```


@agramfort (Member) commented Dec 27, 2023 via email

@JonathanShor (Author)

Getting away from using `datetime.timedelta` is the direct approach, but not being familiar with MNE internals and existing dependencies, I was unsure which replacement would make the most sense.

Using pandas datetime equivalents would achieve nanosecond resolution: `pandas.Timedelta` is, per [the docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html), "the equivalent of python's datetime.timedelta and is interchangeable with it in most cases". I suppose you must already have pandas as a dependency, given the `.to_data_frame` methods, so perhaps this is the easiest fix.

Looking at the `annotations.py` import lines, I imagine `datetime.datetime` would also need to be changed to [`pandas.Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html), which likewise mimics the builtin.

I believe both of these use nanoseconds as their default unit, so proper scaling would need to be checked everywhere, given the current microsecond regime.

Alas, this still doesn't quite fix the root issue (which I only realized after writing all of the above), as the pandas classes just round to the closest nanosecond. While that would fix my issue at 512 Hz, my truly raw 2048 Hz data would still hit the problem: 1/2048 Hz = 488,281.25 ns. Other oddball sample rates would too.
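A quick check of that 2048 Hz case (a minimal sketch relying only on pandas' documented nanosecond resolution):

```python
import pandas as pd

# 1/2048 s = 488281.25 ns; pandas.Timedelta stores integer nanoseconds,
# so the trailing .25 ns is rounded away:
td = pd.Timedelta(seconds=1 / 2048)
print(td.value)            # 488281 (integer nanoseconds)
print(td.total_seconds())  # 0.000488281, not 0.00048828125
```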

Perhaps the best solution is simply to confirm that precision hasn't been lost after any timedelta or datetime step, and to revert if it has (this is what I'm doing in my own code after each offending MNE call). Is there truly a need to use timedelta at all, given that neither the input nor the output is one?

@cbrnr (Contributor) commented Dec 28, 2023

You don't even have to use pandas, since NumPy also provides datetime types with higher precision, namely [`numpy.datetime64` and `numpy.timedelta64`](https://numpy.org/doc/stable/reference/arrays.datetime.html).
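For illustration, a minimal sketch (units chosen to match the sample rates discussed above):

```python
import numpy as np

# timedelta64 counts integer multiples of a unit you choose, down to
# attoseconds, so sub-microsecond sample periods can be exact:
print(np.timedelta64(1953125, "ns") / np.timedelta64(1, "s"))
# 0.001953125  -- 1/512 s, exact at nanosecond resolution
print(np.timedelta64(488281250, "ps") / np.timedelta64(1, "s"))
# 0.00048828125  -- 1/2048 s needs picoseconds (488281.25 ns is not integral)
```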

However, to address the underlying issue, we would have to transition from a time-based annotation processing approach to a sample-based one. I'd be 👍, but it could be a lot of work (though I haven't looked into the code at all yet).

@agramfort (Member) commented Dec 28, 2023 via email

@cbrnr (Contributor) commented Dec 28, 2023

> but then your annotations depend on sampling rates (filtering, resampling, decimation)

Yes, it's a tradeoff. I know this was a deliberate decision; I just wanted to make it clear that we cannot have infinitely accurate timestamps with time-based indexing. We can make the potential errors smaller with `np.timedelta64`, but we can never make them zero in general.

@agramfort (Member) commented Dec 28, 2023 via email

@JonathanShor (Author)

`np.timedelta64` looks better, but it still just pushes the problem down to smaller precisions; "bad" sample rates (e.g. 300 Hz) still fail.
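For example, at nanosecond resolution (a minimal check):

```python
import numpy as np

# 1/300 s = 3333333.33... ns: not an integer at any fixed resolution,
# so even a nanosecond-based timedelta64 must round it
td = np.timedelta64(round(1e9 / 300), "ns")
print(td / np.timedelta64(1, "s"))  # 0.003333333
print(1 / 300)                      # 0.0033333333333333335
```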

> but then your annotations depend on sampling rates (filtering, resampling, decimation). Having one set of timestamped annotations that works whatever the sampling rates, whatever the features you look at (e.g. power over time), is to me super useful and much less error prone. And we have "events" files/arrays for index based events.

Agreed that letting annotations be timestamp-driven is very useful, but ultimately every action taken on the data will be sample-driven. From the user's perspective, MNE is expected to manage this translation consistently behind the scenes.

I appreciate that `Annotations` is its own independent class, but perhaps at least when a raw/epochs/etc. object's method interacts with an `Annotations` instance (`raw.set_annotations`, `raw.crop_by_annotations`, others?), a consistency check is in order: the samples each annotation references before and after the method should not change unless that is an explicit purpose of the method. See the sketch below.
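Something like this hypothetical guard (the helper and its name are my assumption, not an existing MNE API):

```python
import numpy as np

# Hypothetical guard to run inside methods that pass annotations through
# timedelta conversions: the sample each onset maps to must not move.
def check_annotation_samples(raw, onsets_before, onsets_after):
    idx_before = raw.time_as_index(onsets_before)
    idx_after = raw.time_as_index(onsets_after)
    if not np.array_equal(idx_before, idx_after):
        raise RuntimeError(
            f"annotation onsets moved from samples {idx_before} to {idx_after}"
        )
```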

**Alternative: make `onset` and `duration` visible to the user as timedeltas**

The core issue is behavior that is unexpected from the user's perspective, caused by the hidden conversion of user-supplied float timing data into timedeltas with a different precision cap. If `Annotations` insisted that `onset` and `duration` were always immediately converted to timedelta, any loss of precision would be visible right away, and there would be no inconsistency, at least within MNE.

Perhaps raising a warning when precision is lost would still be necessary, since `Annotations` constructed from `raw.times` values could still be inconsistent. But at least it would be up front for the user to address. A sketch of that conversion follows.
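A minimal sketch of that up-front conversion (`to_timedelta_checked` is hypothetical, not an existing MNE API):

```python
import warnings
from datetime import timedelta

def to_timedelta_checked(seconds):
    # Convert a user-supplied onset/duration to timedelta, surfacing any
    # precision loss immediately instead of hiding it inside later calls.
    td = timedelta(seconds=seconds)
    if td.total_seconds() != seconds:
        warnings.warn(
            f"{seconds!r} rounded to {td.total_seconds()!r} by timedelta's "
            "microsecond resolution"
        )
    return td
```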

@JonathanShor (Author)

> see my message in #12324

Interesting. In my local fix, I use a tolerance parameter to decide when post-`set_annotations` or `_crop_by_annotations` `onset`/`duration` values should be overwritten with their prior values: if the difference is less than `tol` (typically `1/sfreq`), the assumption is that it is simply timedelta rounding, and the prior value is restored.
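Roughly like this (`restore_onsets` is hypothetical and assumes the annotations map one-to-one across the call):

```python
import numpy as np

# Hypothetical version of my local fix: after an offending call, overwrite
# values that moved by less than tol, treating the shift as timedelta rounding.
def restore_onsets(before, after, sfreq, tol=None):
    tol = 1.0 / sfreq if tol is None else tol
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float).copy()
    rounded = np.abs(after - before) < tol
    after[rounded] = before[rounded]
    return after
```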
