Annotation timing info loss due to datetime.timedelta precision limit #12327
Thanks for the detailed bug report. Do you have a suggestion for how to address this issue?
Moving away from `datetime.timedelta` to the pandas equivalents would achieve nanosecond resolution: `pandas.Timedelta` has nanosecond resolution and, per the docs (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html), is the "equivalent of python's datetime.timedelta and is interchangeable with it in most cases". I suppose you must already have pandas as a dependency, given the `.to_data_frame` methods, so perhaps this is the easiest fix. Looking at the imports in `annotations.py`, I imagine `datetime.datetime` would also need to be changed to `pandas.Timestamp` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html), which likewise mimics the builtin. I believe both of these use nanoseconds as their default unit, so proper scaling would need to be checked everywhere given the current microsecond regime.

Alas, this still doesn't quite fix the root issue, which I only realized after writing all of the above: the pandas classes just truncate to the closest nanosecond. While nanosecond resolution does fix my issue at 512 Hz, if I were to use my truly raw 2048 Hz data I'd still hit the problem: 1/2048 Hz = 488,281.25 ns. Other oddball sample rates would too.

Perhaps the best solution is to just confirm that precision hasn't been lost after going through any timedelta or datetime steps, and revert if it has (this is what I'm doing in my own code after each offending mne call). Is there truly a need to use timedelta at all, given that neither the input nor the output is one?
You don't even have to use pandas, since NumPy also provides datetime types with higher precision, i.e. numpy.datetime64 and numpy.timedelta64 (https://numpy.org/doc/stable/reference/arrays.datetime.html). However, to address the underlying issue, we would have to transition from a time-based annotation processing approach to a sample-based approach. I'd be 👍, but this could be a lot of work (but I haven't looked into the code at all yet).
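For illustration, a sketch of the NumPy types mentioned above (the origin date is an arbitrary placeholder): `numpy.timedelta64` stores an integer count of a fixed unit, so exactness depends on whether the sample period is an integer number of that unit, with units down to attoseconds available.

```python
import numpy as np

# timedelta64 stores an integer count of a fixed unit, here nanoseconds
step_512 = np.timedelta64(1_953_125, "ns")        # 1/512 s, exact in ns
origin = np.datetime64("2024-01-01T00:00:00", "ns")  # arbitrary placeholder origin

# timestamps of the first few samples at 512 Hz
samples = origin + np.arange(4) * step_512
print(samples[1] - origin)  # 1953125 nanoseconds

# 1/2048 s = 488_281.25 ns is still not an integer nanosecond count,
# but numpy offers finer units: 1/2048 s = 488_281_250 ps exactly
step_2048 = np.timedelta64(488_281_250, "ps")
```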
> You don't even have to use pandas, since NumPy also provides datetime types with higher precision, i.e. numpy.datetime64 and numpy.timedelta64 (https://numpy.org/doc/stable/reference/arrays.datetime.html).

+1

> However, to address the underlying issue, we would have to transition from a time-based annotation processing approach to a sample-based approach. I'd be 👍, but this could be a lot of work (but I haven't looked into the code at all yet).

But then your annotations depend on sampling rates (filtering, resampling, decimation). Having one set of timestamped annotations that works whatever the sampling rate, whatever the features you look at (e.g. power over time), is to me super useful and much less error prone. And we have "events" files/arrays for index-based events.
Yes, it's a tradeoff. I know that this was a deliberate decision, I just wanted to make it clear that we cannot have infinitely accurate timestamps with time-based indexing. We can make potential errors smaller with higher-precision time types, but not eliminate them.
Agreed that allowing the annotations to be timestamp-driven is very useful, but ultimately every action taken on the data will be sample-driven. From the user's perspective, MNE is expected to manage this translation consistently behind the scenes. I appreciate that Annotations is its own independent class, but perhaps at least when a raw/epoch/etc. object's method interacts with an Annotations (raw.set_annotations, raw.crop_by_annotations, others?), a consistency check may be in order: the samples each annotation references before and after the method should not change unless that is an explicit purpose of the method.

Alternative: make 'onset' and 'duration' visible to the user as timedeltas. The core issue is unexpected behavior from the user's perspective, due to the hidden conversion from user-supplied float timing data to timedeltas that have a different precision cap. If Annotations insisted that 'onset' and 'duration' are always immediately converted to timedelta, any loss of precision would be seen immediately, and there would be no inconsistency, at least within MNE. Raising a warning when precision is lost might still be necessary, since Annotations constructed from raw.times values could still be inconsistent, but at least it's up front for the user to address.
Interesting. In my local fix, I use a tolerance parameter to determine when post-.set_annotations or ._crop_by_annotations Annotations 'onset' or 'duration' values should be overwritten with their prior value: if the difference is less than the tolerance, the prior value is restored.
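A hypothetical sketch of that tolerance-based workaround (function name and tolerance value are illustrative, not from MNE): differences smaller than the tolerance are assumed to be timedelta truncation and reverted, while larger differences are assumed to be intentional (e.g. from cropping) and kept.

```python
import numpy as np

def restore_annotation_precision(before, after, tol=1e-5):
    """Restore pre-call onset/duration values when the post-call values
    differ only by (sub-)microsecond truncation. `tol` is illustrative;
    it should be well below one sample period (i.e. < 1/sfreq)."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    truncated = np.abs(after - before) < tol  # small drift => truncation
    return np.where(truncated, before, after)

onsets_before = np.array([1 / 2048, 0.5])
onsets_after = np.array([0.000488, 0.5])  # first onset truncated by a timedelta round trip
restored = restore_annotation_precision(onsets_before, onsets_after)
print(restored)  # first onset reverted to 1/2048, second unchanged
```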
Description of the problem
Sub-microsecond timing data is lost when pushed through a `datetime.timedelta` step, for example in `mne.Annotations.crop`. As `raw.set_annotations` calls `crop` internally, this can lead to misalignment between the created annotations and `raw.times`, with the annotations covering slightly different frames than expected.
This only affects cases where 1/sfreq has sub-microsecond precision, but 512 Hz is common in my experience, and 1/512 = 0.001953125 s.
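The microsecond cap can be seen with the standard library alone: `datetime.timedelta` stores at most microsecond resolution, so 1/512 s = 1953.125 µs is rounded to 1953 µs on the way through.

```python
from datetime import timedelta

onset = 1 / 512  # 0.001953125 s, i.e. 1953.125 microseconds
round_trip = timedelta(seconds=onset).total_seconds()

print(round_trip)           # 0.001953
print(round_trip == onset)  # False: the fractional 0.125 microseconds is lost
```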
Steps to reproduce
Link to data
No response
Expected results
The annotation details shouldn't be lost when attached to the raw object. The correct original range of frames should be obtainable from the timing data as precisely as before it was attached to the raw object.
annotation[0]['onset'] and raw.annotations[0]['onset'] should be equal at 0.001953125.
raw.time_as_index(raw.annotations[0]['onset']) should return array([1]), corresponding to how the annotation was created.
Actual results
annotation[0]['onset'] is 0.001953125 while raw.annotations[0]['onset'] is 0.001953.
raw.time_as_index(raw.annotations[0]['onset']) returns array([0])
Additional information
Platform Linux-3.10.0-1160.102.1.el7.x86_64-x86_64-with-glibc2.17
Python 3.11.5 (main, Sep 5 2023, 13:23:33) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Executable /isilon/LFMI/VMdrive/Jonathan/PriorWeightingMooney/Visual/ECoG/Code/.venv/bin/python
CPU x86_64 (20 cores)
Memory 96.2 GB
Core
├☑ mne 1.5.1
├☑ numpy 1.26.2 (OpenBLAS 0.3.23.dev with 20 threads)
├☑ scipy 1.11.4
├☑ matplotlib 3.8.2 (backend=module://matplotlib_inline.backend_inline)
├☑ pooch 1.8.0
└☑ jinja2 3.1.2
Numerical (optional)
├☑ sklearn 1.3.2
├☑ nibabel 5.1.0
├☑ nilearn 0.10.2
├☑ pandas 2.1.3
└☐ unavailable numba, dipy, openmeeg, cupy
Visualization (optional)
├☑ pyvista 0.42.3 (OpenGL unavailable)
├☑ pyvistaqt 0.11.0
├☑ vtk 9.3.0
├☑ ipympl 0.9.3
├☑ ipywidgets 7.8.1
└☐ unavailable qtpy, pyqtgraph, mne-qt-browser, trame_client, trame_server, trame_vtk, trame_vuetify
Ecosystem (optional)
└☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline