evaluation module #130

remrama · 2023-01-01T00:19:50Z

This PR adds a new yasa.evaluation module to YASA. The purpose of the module is to offer convenience methods for evaluating/comparing sleep-staging from multiple devices or raters. See Issue #78 for more detail.

Major additions:

yasa.evaluation.EpochByEpochEvaluation class. This class takes in two YASA Hypnograms and returns an instance where agreement scores and comparison plots can be returned. Suggested use is from the evaluate Hypnogram method: ebe = hyp_a.evaluate(hyp_b).
yasa.evaluation.SleepStatsEvaluation class. This class takes in a pandas DataFrame that includes sleep statistics calculated from two different raters/devices, for multiple subjects/sessions.

Minor changes:

pingouin>=0.5.3 requirement. Pingouin is used for a lot of the underlying statistics, and 0.5.3 is required because SleepStatsEvaluation.plot_blandaltman uses a feature from pingouin.plot_blandaltman that was introduced in 0.5.3.
Minor changes to yasa.plot_hypnogram to help accommodate EpochByEpochEvaluation.plot_hypnograms. Most notably, the explicit lw keyword argument was removed and replaced with **kwargs, which can include lw but also other parameters controlling line aesthetics (e.g., ls, alpha). Note that because lw is still accepted, this will not break previous code.

Future additions (things to consider but will not be implemented in the current PR):

Keyword argument to jitter the overlaid hypnogram line (to avoid direct overlap when they are the same, which can be confusing). This is tricky now with the way that y-values are derived in yasa.plot_hypnogram, so avoiding for now.
Group EpochByEpoch analysis. There are some ways to analyze group-level Epoch-By-Epoch results (beyond getting sleep statistics and using SleepStatsEvaluation). I'm not sure if this should just be a tutorial with examples or an explicit function/class in YASA. It's mostly just aggregating individual Epoch-By-Epoch results.
A tutorial or function/method to output a detailed "report" rather than individual results. Currently, users can of course compile their own report from the individual outputs, but if there is truly a standardized reporting that is accepted in the field (e.g., so meta-analyses are easier), then we could spit out this exact output.

@raphaelvallat I'm posting this still in draft state, not ready for review yet. A few questions to address before finalizing:

Currently, the plotting code is all baked into the yasa.evaluation classes. Previously, I've been arguing for keeping all plotting code and imports to be isolated in yasa.plotting. So I'm planning to port it over there, but I want to check this is the right approach before I do this. Am I right in thinking they should stay separate or am I overthinking that? And if all the plotting code is isolated in yasa.plotting, it seems like those functions should have leading underscores and be removed from the API docs (the docs would in the class methods instead). This could be how the evaluation plots start, and yasa.plot_hypnogram could be converted to yasa._plot_hypnogram later in v0.8. But again, I'd like a second opinion on whether this is best.
Some terminology I'm not totally satisfied with: test vs reference (ref) devices? true vs pred is more consistent with scikit-learn, but "true" is a bit of a misnomer in the sleep staging case. Could also just be a vs b. Probably overthinking it, I just know we're kinda locked in to whatever we start with.

codecov-commenter · 2023-01-01T00:24:05Z

Codecov Report

Attention: Patch coverage is 18.47291% with 331 lines in your changes are missing coverage. Please review.

Project coverage is 83.90%. Comparing base (776a257) to head (b2e3624).

Files	Patch %	Lines
yasa/evaluation.py	14.06%	330 Missing ⚠️
yasa/hypno.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #130      +/-   ##
==========================================
- Coverage   92.53%   83.90%   -8.64%     
==========================================
  Files          23       24       +1     
  Lines        3108     3498     +390     
==========================================
+ Hits         2876     2935      +59     
- Misses        232      563     +331

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

raphaelvallat · 2023-01-01T04:14:47Z

Just scrolled through the new files very quickly, this looks incredible!! Can't wait to try it and review in details in the next few days.

To answer your questions:

Hmm, I actually might vote for including the new plotting functions in the evaluation.py file. I worry that creating alias for the functions with leading underscore in plotting.py might be a bit confusing code-wise. But if you feel strongly about moving everything matplotlib-related to plotting.py then let's do it.
You're right — we need to choose carefully now. I think test and reference is a good choice. Menghini et al have used reference and device in their original R pipeline, but I like test better than device.

remrama · 2023-01-01T18:25:31Z

I actually might vote for including the new plotting functions in the evaluation.py file.

Agreed 👍

I think test and reference is a good choice. Menghini et al have used reference and device in their original R pipeline, but I like test better than device.

Agreed. device is a bit misleading because it assumes wearables, but this module is in no way restricted to that case.

Some more questions, and then I'll just leave any more for the real review:

I'm thinking I should convert these to "Results" classes, just to match your yasa.detection module convention. Agreed?
I'm adding the more formal @property and __repr__/__str__, is that overkill or appropriate here?
The EpochByEpoch class starts by trimming to the shorter hypnogram. It's slightly cumbersome code and there are probably going to be other needs for that. I don't wanna overpopulate yasa.Hypnogram but do you think there should be a .trim method that takes a new n_epochs and returns a new Hypnogram instance that is otherwise identical? Alternatively, there could be a .trim_to that could take another Hypnogram and return two Hypnograms matched in n_epochs (this would handle the "find Hypnogram with minimum length" part, but less generalizable).

raphaelvallat · 2023-01-02T17:46:52Z

I'm thinking I should convert these to "Results" classes, just to match your yasa.detection module convention.

Yep sounds good

I'm adding the more formal @Property and repr/str, is that overkill or appropriate here?

Appropriate! I would only add the @property decorator to attributes that are useful to the users. Decorated attributes will be automatically included in the Sphinx documentation and we don't want to end up with an endless list of attributes.

The EpochByEpoch class starts by trimming to the shorter hypnogram.

I actually wanted to tell you that this should be the responsibility of the users. I wouldn't trim at all, just return an error/warning if the hypnogram length and index do not match, especially if the index of the Hypnogram is a pandas.RangeIndex. I guess we could trim/join if the index of the two hypnograms are pd.DatetimeIndex

Do you think there should be a .trim method that takes a new n_epochs and returns a new Hypnogram instance that is otherwise identical?

Nope. I think it's easy enough for the users to regenerate a new Hypnogram after using hyp.hypno.loc[].

yasa/evaluation.py

remrama · 2023-01-02T22:45:01Z

@raphaelvallat I'm still in draft mode! I've been making major changes the last few days. Please wait until I leave draft mode before a review, otherwise I'm afraid this will be waste of your time. Although if you're in a rush to merge this, then also let me know and I'll take quicker steps. Otherwise I except to have the PR finalized and ready for review by tomorrow end of day.

raphaelvallat · 2023-01-02T23:17:37Z

Oh shoot sorry about that! I'll be travelling for the next two days and unable to review but I can review later this week. No rush at all on the PR though!

remrama · 2023-01-03T04:39:59Z

No worries! It would just suck if you put in lots of effort on old code. No time wasted so far though, your current comments are still useful and relevant so I'll incorporate them. Thanks! 👍

remrama · 2023-12-23T19:07:33Z

@raphaelvallat, I just pushed some old (and some new) commits. I'm catching up and getting it ready for review. The EpochByEpoch part is done but still working out the SleepStats class, the latter of which is not functional right now. A few more commits and it should be ready for review. I'm with you in publishing the next YASA version over winter break 👍

raphaelvallat · 2024-01-12T20:12:58Z

Hi @remrama ! Unfortunately I was sick for my entire week of vacation during the winter break and did not have a chance to work on the open-source libraries at all (and work has been crazy ever since I got back) 😞 Are you still planning on adding more commits soon?

raphaelvallat · 2024-01-13T14:11:19Z

Can you please rebase to the latest master (version 0.6.4)? Let me know if you need help resolving conflicts.

remrama · 2024-01-16T21:11:06Z

Same situation for me @raphaelvallat -- out sick for the end of break and then lots of stuff going on at work upon return. But yes, more commits coming soon. I will try to rebase as well.

docstrings examples plot_hypnogram lw --> linekwargs heatmap colorbar label pass kwargs through to all pingouin calls docstrings examples update setting attrs, docstrings, var name changes SleepStatsEval takes 2 dataframes as input, reshaping is done internally EpochByEpoch accepts sequences of Hypnograms for group evaluation EpochByEpoch gets sleep stats SleepStats move statistical tests to __init__() better plotting flexibility and baked-in sleepstats_order

cleanup

use hyp integers instead of strings for big skm agreement speed improvements (~2s) fmt quick comment addresses mad typo pd.crosstab --> skm.confusion_matrix

minor normality bug black fmt makes my pandas chains lonnggggg diff_data --> discrepancies trailing not leading _kwargs

class methods alphabetical order

…luation

remrama · 2024-02-12T14:26:34Z

@raphaelvallat this is available to review at your convenience. There are still some things to do, but the core structure is done so I think we could move forward.

After review, I'll still need to:

Add Bland-Altman plots
Test accuracy
Update pytests
Update CHANGELOG

raphaelvallat

What an absolute beast of a PR @remrama ! I read through the entire code and was really impressed by its quality, efficiency and extensive documentation. I have not yet tried to run the full pipeline but I trust that it is working as expected. I think that we should merge this PR now for the sake of simplicity, and then you and I can work on finishing the tasks that you listed in the PR description and adding an extensive tutorial in future PRs.

Once these are done, I think you need to publicize this amazing work (and get some recognition for it!), e.g. as a Twitter thread and/or as a blog post on the NSRR website.

I've included minor comments but none of these are blockers for merging the current PR.

Thanks again, really appreciate all this work. This is going to be a huge contribution to the sleep research/data science ecosystem.

PS: Can you please have a look into the conflicts on plotting.py?

PS2: Please run black . -l 100 with black>=24.1.1

yasa/evaluation.py

remrama · 2024-02-26T15:02:49Z

Merci beaucoup @raphaelvallat, the compliments are really appreciated.

I like your plan of merging and then working on the last bits in separate PR(s). I think it's safest to open up one issue that has all the tasks that must happen before v0.7 (e.g., testing), and then independent issues for things that are ideal but not as necessarily required (e.g., bland-altman plots). Let me know if you would like it implemented a different way.

There's one other thing I'd like to get nailed down before merging this: module name. We started with evaluation, but then I was thinking agreement is nicer because it more implies the evaluation of two scorers against each other. Other options are assessment or, as sklearn handles it, metrics. I assume the classes should match the module name (e.g., evaluate.EpochByEpochEvaluation ), but maybe leaving the module and class names separate (e.g., evaluation.EpochByEpochAgreement) makes sense if you envision other "evaluation" features being part of this module down the road). Let me know your preference and I'll make sure it's consistent before merging.

raphaelvallat · 2024-03-02T11:57:24Z

@remrama I have no strong preference between evaluation.EpochByEpochAgreement or agreement.EpochByEpochAgreement, so if you prefer the latter please go ahead with it!

I think it's safest to open up one issue that has all the tasks that must happen before v0.7 (e.g., testing), and then independent issues for things that are ideal but not as necessarily required (e.g., bland-altman plots).

Sounds great 👍

remrama · 2024-04-01T03:51:17Z

PS: Can you please have a look into the conflicts on plotting.py?

I'm not sure where the conflict started, but I needed to be able to pass **kwargs into yasa.plot_hypnogram for drawing the two overlaid hypnograms in EpochByEpochAgreement. I think it's resolved now. The only change is that lw is not a specific keyword argument now, but it will pass through to the relevant matplotlib functions just the same as a **kwarg, so there should be no breaks in previous code.

remrama added 7 commits December 30, 2022 21:06

first draft

fd6bbff

plot_hypnogramS method

af22cc0

docstrings examples

cfa1f6b

plot_hypnogram lw --> linekwargs

f036da3

heatmap colorbar label

16fc5bd

pass kwargs through to all pingouin calls

c042553

docstrings examples update

70f3627

remrama added the enhancement 🚧 New feature or request label Jan 1, 2023

remrama linked an issue Jan 1, 2023 that may be closed by this pull request

Add standardized framework for wearable validation #78

Closed

remrama self-assigned this Jan 1, 2023

raphaelvallat self-requested a review January 1, 2023 03:53

remrama added 3 commits January 1, 2023 22:32

setting attrs, docstrings, var name changes

cfaf8b6

SleepStatsEval takes 2 dataframes as input, reshaping is done internally

e490cc9

EpochByEpoch accepts sequences of Hypnograms for group evaluation

a8a3b1c

EpochByEpoch gets sleep stats

a9b784a

raphaelvallat reviewed Jan 2, 2023

View reviewed changes

raphaelvallat self-requested a review January 2, 2023 22:41

remrama added 2 commits January 2, 2023 17:59

SleepStats move statistical tests to __init__()

b4df021

better plotting flexibility and baked-in sleepstats_order

ba06ab1

remrama added 3 commits January 3, 2023 20:46

major restructure of attributes/methods and scores calculations

1eb95ce

cleanup

48f1df6

alternate ovr agreement implementation

120b097

remrama added 2 commits December 18, 2023 16:48

major EpochByEpoch restructure

dfd7005

class methods alphabetical order

349bfe7

remrama added 10 commits February 11, 2024 14:19

SleepStatsAgreement major restructure

e0ff2fe

first draft

44869c2

major restructure of attributes/methods and scores calculations

b47385b

cleanup

alternate ovr agreement implementation

28f268b

use hyp integers instead of strings for big skm agreement speed improvements (~2s) fmt quick comment addresses mad typo pd.crosstab --> skm.confusion_matrix

3 group-hypnogram plotting options, need feedback

510b4a6

minor normality bug black fmt makes my pandas chains lonnggggg diff_data --> discrepancies trailing not leading _kwargs

major EpochByEpoch restructure

7d3fd15

class methods alphabetical order

SleepStatsAgreement major restructure

b643d92

Merge branch 'evaluation' of https://github.com/remrama/yasa into eva…

3316763

…luation

formatting

8e575bd

remrama marked this pull request as ready for review February 12, 2024 14:22

raphaelvallat approved these changes Feb 25, 2024

View reviewed changes

yasa/evaluation.py Outdated Show resolved Hide resolved

yasa/evaluation.py Outdated Show resolved Hide resolved

yasa/evaluation.py Show resolved Hide resolved

yasa/evaluation.py Show resolved Hide resolved

remrama added 3 commits February 26, 2024 10:40

remove pingouin requirement

8dbfff2

addressed minor PR review comments

0447edc

formatting

ada6ab0

remrama added 2 commits March 4, 2024 13:27

black formatting

f3ed7aa

Merge branch 'master' into evaluation

b2e3624

remrama merged commit c734753 into raphaelvallat:master Apr 1, 2024
13 of 15 checks passed

remrama deleted the evaluation branch April 1, 2024 04:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

evaluation module #130

evaluation module #130

remrama commented Jan 1, 2023 •

edited

Loading

codecov-commenter commented Jan 1, 2023 •

edited

Loading

raphaelvallat commented Jan 1, 2023

remrama commented Jan 1, 2023

raphaelvallat commented Jan 2, 2023

remrama commented Jan 2, 2023

raphaelvallat commented Jan 2, 2023

remrama commented Jan 3, 2023

remrama commented Dec 23, 2023

raphaelvallat commented Jan 12, 2024

raphaelvallat commented Jan 13, 2024

remrama commented Jan 16, 2024

remrama commented Feb 12, 2024 •

edited

Loading

raphaelvallat left a comment •

edited

Loading

remrama commented Feb 26, 2024 •

edited

Loading

raphaelvallat commented Mar 2, 2024

remrama commented Apr 1, 2024

evaluation module #130

evaluation module #130

Conversation

remrama commented Jan 1, 2023 • edited Loading

codecov-commenter commented Jan 1, 2023 • edited Loading

Codecov Report

raphaelvallat commented Jan 1, 2023

remrama commented Jan 1, 2023

raphaelvallat commented Jan 2, 2023

remrama commented Jan 2, 2023

raphaelvallat commented Jan 2, 2023

remrama commented Jan 3, 2023

remrama commented Dec 23, 2023

raphaelvallat commented Jan 12, 2024

raphaelvallat commented Jan 13, 2024

remrama commented Jan 16, 2024

remrama commented Feb 12, 2024 • edited Loading

raphaelvallat left a comment • edited Loading

Choose a reason for hiding this comment

remrama commented Feb 26, 2024 • edited Loading

raphaelvallat commented Mar 2, 2024

remrama commented Apr 1, 2024

remrama commented Jan 1, 2023 •

edited

Loading

codecov-commenter commented Jan 1, 2023 •

edited

Loading

remrama commented Feb 12, 2024 •

edited

Loading

raphaelvallat left a comment •

edited

Loading

remrama commented Feb 26, 2024 •

edited

Loading