Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Hypnogram class #105

Closed
raphaelvallat opened this issue Nov 26, 2022 · 15 comments · Fixed by #116
Closed

Create Hypnogram class #105

raphaelvallat opened this issue Nov 26, 2022 · 15 comments · Fixed by #116
Labels
enhancement 🚧 New feature or request

Comments

@raphaelvallat
Copy link
Owner

raphaelvallat commented Nov 26, 2022

The way that YASA currently handles hypnogram could be improved. Specifically, I think that we should create a class Hypnogram that would serve as the new default to store and manipulate hypnogram in YASA.

This class would have several methods (from existing functions) such as:

hyp = Hypnogram(values=np.array([...]), resolution="30s")
hyp.get_sleep_statistics()
hyp.get_transition_matrix()
hyp.upsample_to_data()
hyp.find_periods()

hyp.as_str()
hyp.as_int()

# and so much more...!

Any other suggestions welcome!

@remrama
Copy link
Collaborator

remrama commented Nov 26, 2022

Great idea!! I've also been thinking hypnograms could be handled better, but hadn't thought of making a hypnogram class. I like it.

The class would also solve these issues I'd come across:

  • The difference in str/int hypnograms can be confusing, mostly because SleepStaging.predict() returns strings, and most other functions require ints. Of course there are the conversion functions, but it seems like tracking that needn't be on the user. I was thinking each function could have an automatic conversion section if necessary.
  • Checking hypnogram inputs for all the different functions entails lots of repeated code. I was thinking there should at least be a check_hypno() utility function.
  • The str/int difference seems trivial, but in my experience it trips up a lot of new students, especially since YASA currently is not always explicit about the root of the error. The more consistent checks would allow for more explicit errors (though they'd probably become obsolete anyways).

The class idea is a better solution to all these problems. It really puts hypnograms at the center of the package, which I think is appropriate since they are really what makes sleep analyses unique.

After I finish working on the evaluation classes in #78 this is going to be all the more important, because it introduces the concept of hypnograms with different numbers of possible stages. Right now I've added n_stages as a keyword arg to a few functions, which I fear might confuse some users unfamiliar with wearables/actigraphy. This hypnogram class would be a great solution to that. Also the EpochByEpoch evaluation class I'm working on for the same PR could eventually be switch to something like hypno.evaluate_against(hypno). Way simpler.

@raphaelvallat
Copy link
Owner Author

Agree with all the above! These are great points. We do need to come up with a strict input validation that can handle 2, 3, 4 or 5-stages. For example, if you create Hypnogram(values, sf, n_classes=3), then the only accepted values are "WAKE", "NREM", and "REM". With n_classes=5 (default), accepted values are "WAKE", "W", "N1", "N2", "N3", "REM", "R", or 0, 1, 2, 3, 4.

@remrama
Copy link
Collaborator

remrama commented Nov 26, 2022

Yes, initializing the hypnogram class with strings would make checking that way easier because the different staging schemes usually have different names (eg, 2-stage: "Sleep" and "Wake", 4-stage: "Wake", "Light", "Deep", "REM"). I've been struggling to validate n_stages when ints are used because of course someone could request 4 stages but only use a subset of them and it would "look like" 3-stage incorrectly.

So +1 for using strings to start, and then the class could convert to ints with confidence for all the underlying functions 👍

@raphaelvallat
Copy link
Owner Author

Awesome! @remrama do you think we should implement this PR before your PR on performance evaluation? Seems like it would make your life way easier.

@remrama
Copy link
Collaborator

remrama commented Nov 27, 2022

hmm... Well I have it all working at this point (w/ the n_stages argument scattered in a few places). So waiting to implement it wouldn't solve any work burden, but you might want to wait just to avoid having n_stages in a few new places only to be removed shortly after.

Plus I was nervous that n_stages was starting to spread too far into the rest of the codebase anyways. That was almost the only reason I needed to modify a few existing functions (eg, sleep_statistics). It's minimal, but still maybe less than ideal.

So again it probably comes down to whether you are okay with the n_stages implementation having a temporary presence. If so, the evaluation stuff could merge soon and then just get modified whenever the Hypnogram class happens. Actually I was thinking about submitting that PR some time this coming week, so maybe the simplest course of action would be for me to go ahead and submit that PR, then you can see it in more detail and decide whether you think we should wait for the Hypnogram class to merge it or not. If you want to wait, we can leave it sitting there until the Hypnogram class is ready.

@raphaelvallat
Copy link
Owner Author

Looking forward to the PR! Feel free to submit it for now. I'll try to work on the hypnogram PR in the next few weeks. I'd love to release a new major version of YASA (0.7.0) around the new year, with:

  • Your PR
  • The new Hypnogram class
  • An updated classifier for sleep staging

@raphaelvallat
Copy link
Owner Author

FYI I started a new branch for this class here: https://github.com/raphaelvallat/yasa/tree/hypnogram_class

I added a few lines for class creation:

class Hypnogram:
    """Main class for manipulation of hypnogram in YASA."""

    def __init__(self, values, n_stages=5, *, freq="30s", start=None):
        assert isinstance(values, (list, np.ndarray, pd.Series))
        assert isinstance(n_stages, int)
        assert n_stages in [2, 3, 4, 5]
        assert isinstance(freq, str)
        assert isinstance(start, (type(None), str, pd.Timestamp))
        if n_stages == 2:
            accepted = ["S", "W", "SLEEP", "WAKE", "ART", "UNS"]
        elif n_stages == 3:
            accepted = ["WAKE", "W", "NREM", "REM", "R", "ART", "UNS"]
        elif n_stages == 4:
            accepted = ["WAKE", "W", "LIGHT", "DEEP", "REM", "R", "ART", "UNS"]
        else:
            accepted = ["WAKE", "W", "N1", "N2", "N3", "REM", "R", "ART", "UNS"]
        assert all([val.upper() in accepted for val in values]), (
            f"{np.unique(values)} do not match the accepted values for a {n_stages} stages "
            f"hypnogram: {accepted}"
        )
        hypno = pd.Series(values, name="Stage").str.upper()
        hypno = hypno.replace({"S": "SLEEP", "W": "WAKE", "R": "REM"})
        if start is not None:
            hypno.index = pd.date_range(start=start, freq=freq, periods=hypno.size)
        hypno.index.name = "Epoch"
        self._hypno = hypno
        self._freq = freq
        self._start = start
        self._n_stages = n_stages

    def __repr__(self):
        return f"{self._hypno}"

    def __str__(self):
        return f"{self._hypno}"

Example

image

@remrama
Copy link
Collaborator

remrama commented Dec 2, 2022

Beautiful.

I think I'll wait for my evaluation PR until this is available to work with. I had to nuke my previous fork (and in-progress branch) because I'm stupid, and so at this point it'll just be easier to reincorporate my previous code and accommodate this structure simultaneously. Otherwise it will be messy adding an n_stages argument to various places and then reverting it back.

@raphaelvallat
Copy link
Owner Author

Sounds good, I'll try to have a working version with basic methods (plot_hypnogram, sleep_statistics, etc) into master early next week. We can still add new methods later on.

@raphaelvallat
Copy link
Owner Author

FYI made some good progress today on the Hypnogram class. This is going to be a game-changer for YASA!

yasa/yasa/hypno.py

Lines 14 to 351 in f229fab

class Hypnogram:
"""Main class for manipulating sleep hypnogram in YASA."""
def __init__(self, values, n_stages=5, *, freq="30s", start=None):
assert isinstance(values, (list, np.ndarray, pd.Series))
assert isinstance(n_stages, int)
assert n_stages in [2, 3, 4, 5]
assert isinstance(freq, str)
assert isinstance(start, (type(None), str, pd.Timestamp))
if n_stages == 2:
accepted = ["S", "W", "SLEEP", "WAKE", "ART", "UNS"]
mapping = {"WAKE": 0, "SLEEP": 1, "ART": -1, "UNS": -2}
elif n_stages == 3:
accepted = ["WAKE", "W", "NREM", "REM", "R", "ART", "UNS"]
mapping = {"WAKE": 0, "NREM": 2, "REM": 4, "ART": -1, "UNS": -2}
elif n_stages == 4:
accepted = ["WAKE", "W", "LIGHT", "DEEP", "REM", "R", "ART", "UNS"]
mapping = {"WAKE": 0, "LIGHT": 2, "DEEP": 3, "REM": 4, "ART": -1, "UNS": -2}
else:
accepted = ["WAKE", "W", "N1", "N2", "N3", "REM", "R", "ART", "UNS"]
mapping = {"WAKE": 0, "N1": 1, "N2": 2, "N3": 3, "REM": 4, "ART": -1, "UNS": -2}
assert all([val.upper() in accepted for val in values]), (
f"{np.unique(values)} do not match the accepted values for a {n_stages} stages "
f"hypnogram: {accepted}"
)
if isinstance(values, pd.Series):
# Make sure to remove index if the input is a pandas.Series
values = values.to_numpy(copy=True)
hypno = pd.Series(values, name="Stage").str.upper()
# Combine accepted values
map_accepted = {"S": "SLEEP", "W": "WAKE", "R": "REM"}
hypno = hypno.replace(map_accepted)
labels = pd.Series(accepted).replace(map_accepted).unique().tolist()
if start is not None:
hypno.index = pd.date_range(start=start, freq=freq, periods=hypno.size)
timedelta = hypno.index - hypno.index[0]
else:
fake_dt = pd.date_range(start="2022-12-03 00:00:00", freq=freq, periods=hypno.shape[0])
timedelta = fake_dt - fake_dt[0]
hypno.index.name = "Epoch"
self._hypno = hypno
self._n_epochs = hypno.shape[0]
self._freq = freq
self._sampling_frequency = 1 / pd.Timedelta(freq).total_seconds()
self._timedelta = timedelta
self._start = start
self._n_stages = n_stages
self._labels = labels
self._mapping = mapping
def __repr__(self):
return f"{self.hypno}"
def __str__(self):
return f"{self.hypno}"
@property
def hypno(self):
# Q: Should this be called `hyp.stage`?
return self._hypno
@property
def n_epochs(self):
return self._n_epochs
@property
def freq(self):
return self._freq
@property
def sampling_frequency(self):
return self._sampling_frequency
@property
def start(self):
return self._start
@property
def timedelta(self):
return self._timedelta
@property
def n_stages(self):
return self._n_stages
@property
def labels(self):
return self._labels
@property
def mapping(self):
return self._mapping
@mapping.setter
def mapping(self, map_dict):
assert isinstance(map_dict, dict), "`mapping` must be a dictionary, e.g. {'WAKE': 0, ...}"
assert all([val in map_dict.keys() for val in self.hypno.unique()]), (
f"Some values in `hypno` ({self.hypno.unique()}) are not in `map_dict` "
f"({map_dict.keys()})"
)
if "ART" not in map_dict.keys():
map_dict["ART"] = -1
if "UNS" not in map_dict.keys():
map_dict["UNS"] = -2
self._mapping = map_dict
@property
def mapping_int(self):
return {v: k for k, v in self.mapping.items()}
def as_int(self):
"""Return hypnogram as integer.
The default mapping is:
* 2 stages: {"WAKE": 0, "SLEEP": 1, "ART": -1, "UNS": -2}
* 3 stages: {"WAKE": 0, "NREM": 2, "REM": 4, "ART": -1, "UNS": -2}
* 4 stages: {"WAKE": 0, "LIGHT": 2, "DEEP": 3, "REM": 4, "ART": -1, "UNS": -2}
* 5 stages: {"WAKE": 0, "N1": 1, "N2": 2, "N3": 3, "REM": 4, "ART": -1, "UNS": -2}
Users can define a custom mapping:
>>> hyp.mapping = {"WAKE": 0, "NREM": 1, "REM": 2}
"""
return self.hypno.replace(self.mapping).astype(int)
def transition_matrix(self):
counts, probs = transition_matrix(self.as_int())
counts.index = counts.index.map(self.mapping_int)
counts.columns = counts.columns.map(self.mapping_int)
probs.index = probs.index.map(self.mapping_int)
probs.columns = probs.columns.map(self.mapping_int)
return counts, probs
def sleep_statistics(self):
"""
Compute standard sleep statistics from an hypnogram.
Parameters
----------
self : yasa.Hypnogram
Hypnogram, assumed to be already cropped to time in bed (TIB, also referred to as
Total Recording Time, i.e. "lights out" to "lights on").
Returns
-------
stats : dict
Summary sleep statistics.
Notes
-----
All values except SE, SME, SFI and the percentage of each stage are expressed in minutes.
YASA follows the AASM guidelines to calculate these parameters:
* Time in Bed (TIB): total duration of the hypnogram.
* Sleep Period Time (SPT): duration from first to last period of sleep.
* Wake After Sleep Onset (WASO): duration of wake periods within SPT.
* Total Sleep Time (TST): total sleep duration in SPT.
* Sleep Onset Latency (SOL): Latency to first epoch of any sleep.
* SOL 5min: Latency to 5 minutes of persistent sleep (any stage).
* REM latency: latency to first REM sleep.
* Sleep Efficiency (SE): TST / TIB * 100 (%).
* Sleep Maintenance Efficiency (SME): TST / SPT * 100 (%).
* Sleep Fragmentation Index: number of transitions from sleep to wake / hours of TST
* Sleep stages amount and proportion of TST
.. warning::
Artefact and Unscored epochs are excluded from the calculation of the
total sleep time (TST). TST is calculated as the sum of all REM and NREM sleep in SPT.
References
----------
* Iber, C. (2007). The AASM manual for the scoring of sleep and
associated events: rules, terminology and technical specifications.
American Academy of Sleep Medicine.
* Silber, M. H., Ancoli-Israel, S., Bonnet, M. H., Chokroverty, S.,
Grigg-Damberger, M. M., Hirshkowitz, M., Kapen, S., Keenan, S. A.,
Kryger, M. H., Penzel, T., Pressman, M. R., & Iber, C. (2007).
`The visual scoring of sleep in adults
<https://www.ncbi.nlm.nih.gov/pubmed/17557422>`_. Journal of Clinical
Sleep Medicine: JCSM: Official Publication of the American Academy of
Sleep Medicine, 3(2), 121–131.
Examples
--------
Sleep statistics for a 2-stage hypnogram
>>> from yasa import Hypnogram
>>> # Generate a fake hypnogram, where "S" = Sleep, "W" = Wake
>>> values = 10 * ["W"] + 40 * ["S"] + 5 * ["W"] + 40 * ["S"] + 10 * ["W"]
>>> hyp = Hypnogram(values, n_stages=2)
>>> hyp.sleep_statistics()
{'TIB': 52.5,
'SPT': 42.5,
'WASO': 2.5,
'TST': 40.0,
'SE': 76.1905,
'SME': 94.1176,
'SFI': 1.5,
'SOL': 5.0,
'SOL_5min': 5.0,
'WAKE': 12.5}
Sleep statistics for a 5-stages hypnogram, where each epoch is one minute
>>> values = (10 * ["W"] + 4 * ["N1"] + 1 * ["W"] + 30 * ["N2"] + 30 * ["N3"] + 5 * ["W"]
... + 25 * ["REM"] + 15 * ["N2"] + 10 * ["W"])
>>> hyp = Hypnogram(values, freq="1min", n_stages=5)
>>> hyp.sleep_statistics()
{'TIB': 130.0,
'SPT': 110.0,
'WASO': 6.0,
'TST': 104.0,
'SE': 80.0,
'SME': 94.5455,
'SFI': 1.7308,
'SOL': 10.0,
'SOL_5min': 15.0,
'Lat_REM': 80.0,
'WAKE': 26.0,
'N1': 4.0,
'N2': 45.0,
'N3': 30.0,
'REM': 25.0,
'%N1': 3.8462,
'%N2': 43.2692,
'%N3': 28.8462,
'%REM': 24.0385}
"""
hypno = self.hypno.to_numpy()
assert self.n_epochs > 0, "Hypnogram is empty!"
all_sleep = ["SLEEP", "N1", "N2", "N3", "NREM", "REM", "LIGHT", "DEEP"]
all_non_sleep = ["WAKE", "ART", "UNS"]
stats = {}
# TIB, first and last sleep
stats["TIB"] = self.n_epochs
idx_sleep = np.where(~np.isin(hypno, all_non_sleep))[0]
if not len(idx_sleep):
first_sleep, last_sleep = 0, self.n_epochs
else:
first_sleep = idx_sleep[0]
last_sleep = idx_sleep[-1]
# Crop to SPT
hypno_s = hypno[first_sleep : (last_sleep + 1)]
stats["SPT"] = hypno_s.size if len(idx_sleep) else 0
stats["WASO"] = hypno_s[hypno_s == "WAKE"].size if len(idx_sleep) else np.nan
# Before YASA v0.5.0, TST was calculated as SPT - WASO, meaning that Art
# and Unscored epochs were included. TST is now restrained to sleep stages.
stats["TST"] = hypno_s[np.isin(hypno_s, all_sleep)].shape[0]
# Sleep efficiency and fragmentation
stats["SE"] = 100 * stats["TST"] / stats["TIB"]
if stats["SPT"] == 0:
stats["SME"] = np.nan
stats["SFI"] = np.nan
else:
# Sleep maintenance efficiency
stats["SME"] = 100 * stats["TST"] / stats["SPT"]
# SFI is the ratio of the number of transitions from sleep into Wake to TST (hours)
# The original definition included transitions into Wake or N1.
counts, _ = self.transition_matrix()
n_trans_to_wake = np.sum(
counts.loc[
np.intersect1d(counts.index, all_sleep), np.intersect1d(counts.index, ["WAKE"])
].to_numpy()
)
stats["SFI"] = n_trans_to_wake / (stats["TST"] / (3600 * self.sampling_frequency))
# Sleep stage latencies -- only relevant if hypno is cropped to TIB
stats["SOL"] = first_sleep if stats["TST"] > 0 else np.nan
sleep_periods = hypno_find_periods(
np.isin(hypno, all_sleep), self.sampling_frequency, threshold="5min"
).query("values == True")
if sleep_periods.shape[0]:
stats["SOL_5min"] = sleep_periods["start"].iloc[0]
else:
stats["SOL_5min"] = np.nan
if "REM" in self.labels:
# Question: should we add latencies for other stage too?
stats["Lat_REM"] = np.where(hypno == "REM")[0].min() if "REM" in hypno else np.nan
# Duration of each stage
for st in self.labels:
if st == "SLEEP":
# SLEEP == TST
continue
stats[st] = hypno[hypno == st].size
# Remove ART and UNS if they are empty
if stats["ART"] == 0:
stats.pop("ART")
if stats["UNS"] == 0:
stats.pop("UNS")
# Convert to minutes
for key, value in stats.items():
if key in ["SE", "SME"]:
continue
stats[key] = value / (60 * self.sampling_frequency)
# Proportion of each sleep stages
for st in all_sleep:
if st in stats.keys():
if stats["TST"] == 0:
stats[f"%{st}"] = np.nan
else:
stats[f"%{st}"] = 100 * stats[st] / stats["TST"]
# Round to 4 decimals
stats = {key: np.round(val, 4) for key, val in stats.items()}
return stats
def plot_hypnogram(self):
# This function requires mapping to be defined
raise NotImplementedError
def upsample(self, new_freq, **kwargs):
"""Upsample hypnogram to a higher frequency.
Frequency here is defined with a pandas Offset, e.g. "10s" or "1min".
Returns a copy: the original hypnogram is not modified in place.
"""
assert pd.Timedelta(new_freq) < pd.Timedelta(self.freq), (
f"The upsampling `new_freq` ({new_freq}) must be higher than the current frequency of "
f"hypnogram {self.freq}"
)
if isinstance(self.hypno.index, pd.DatetimeIndex):
new_hyp = self.hypno.resample(new_freq, origin="start", **kwargs).ffill()
else:
new_hyp = self.hypno.copy()
new_hyp.index = self.timedelta
new_hyp = new_hyp.resample(new_freq, **kwargs).ffill().reset_index(drop=True)
new_hyp.index.name = "Epoch"
return Hypnogram(values=new_hyp, n_stages=self.n_stages, freq=new_freq, start=self.start)

@remrama
Copy link
Collaborator

remrama commented Dec 3, 2022

Awesome!

btw I had an idea for a method here, something you might already have plans for. Below is a general idea of what I'm thinking, main purpose is to get a dataframe for exporting that has all hypno-info on it. I like to export these at the end of an auto-staging script so I can load it in elsewhere for plotting. So I might overdo it a bit, but I like to have all info possible here. Also, including things like onset and duration make it a BIDS-compatible events file.

Sorry I'm being lazy with variables here (just copy/pasting from my personal scripts), but I think you'll get the idea.

class Hypnogram:
    def to_dataframe(self):
        """Something that compiles all epoch-level info into a single dataframe for exporting."""
        epochs = np.arange(hypno.size)
        df = pd.DataFrame(
            {
                "epoch": epochs,
                "value": yasa.hypno_str_to_int(hypno),
                "stage": hypno,
                "onset": epochs * 30,
                "duration": 30,
            }
        )
        df = df.set_index("epoch").join(sls.predict_proba())
        return df

@remrama
Copy link
Collaborator

remrama commented Dec 3, 2022

Another suggestion: add a scorer or scorer_id property. Any string would be accepted, but would commonly be initials of a human scorer or name of algorithm. For example, the returned hypno from yasa.SleepStaging could have a scorer_id of "YASA" or "YASAv0.6.2" if you wanted to be more specific.

@raphaelvallat
Copy link
Owner Author

Yes and Yes! Love these ideas.

For scorer, we could even make it as the default name of the resulting pd.Series?

@remrama
Copy link
Collaborator

remrama commented Dec 4, 2022

Yes I was thinking that too, using the pd.Series name. I saw right now you name the pd.Series Stage, which also makes sense. Either works.

Maybe too avoid confusion you might not accept a pd.Series as input, which would make it clearer that YASA is going to control that. Not a big difference for users, since they could just use Series.values if they already had one, plus I imagine that 99% of the time users will initiate a new yasa.Hypnogram instance using a list or array anyways. Or maybe you foresee a different use I'm not thinking of.

@raphaelvallat
Copy link
Owner Author

I also wasn't sure about accepting pd.Series but eventually decided that it might make things easier for beginner users who just loaded their hypnogram from a CSV file in Pandas. YASA is converting to an array internally to make sure we don't mess things up with the previous index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🚧 New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants