yasa mice branch #72
This is great @matiasandina!! I'm sure this will be super useful to others. Can I ask: did you re-train a new classifier using these updated features? If so, can you describe the training set as well? And more importantly, would you be willing to share the training set and/or the trained classifier (lightgbm tree paths)?
The dataset was collected from OSF. It contains mouse EEG/EMG recordings (sampling rate: 512 Hz) and sleep stage labels (epoch length: 2.5 sec).
Thanks @matiasandina! I think that if we are to create a new branch (like "yasa_mice"), the minimum that we need is:
And then, we can create a separate repo (see https://github.com/raphaelvallat/yasa_classifier as an example) with the code to reproduce the trained classifier, i.e. model training, data partitioning, performance evaluation, etc. A few questions:
This is what I can do for now, please let me know if you can reproduce it.
Performance is quite high: above 90% for accuracy and a bit lower for Cohen's kappa. I think it can be smoothed out to gain even more performance, but I'm OK so far. You can find performance values in this folder, which includes accuracy, Cohen's kappa, and confusion matrices for each of the 50 4-hour recordings that I used for testing.

The dataset contains one EEG and one EMG channel. I believe the choice of electrode will for sure affect the algorithm. Again, data in mice is not as standard. Even labs that have detailed information see that the brain itself enters stages at slightly different moments; below is a result from LFP electrodes by Soltani 2019. I am collecting with a "high throughput" setup (9 EEG + 2 EMG), so I will have more to say about this in the future.

The most critical thing is that classifier accuracy will be extremely dependent on the nature of the data itself. Mice data is not standard (in the same way human data is somewhat standardized). I expect all labs will need to re-train with data they generated themselves before getting good results with any classifier. I have not done so because I haven't collected enough data yet, but I don't see this as being an issue in the future (provided me and a few others in the lab can label our own data!). An alternative would be to find ways of normalizing datasets so that the feature extractor would extract the same values for the same features in all of them. This sounds easier said than done, but it could be great in light of people sharing open datasets like the one I used for training.
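For reference, a minimal sketch (not the author's actual evaluation code) of how such per-recording metrics could be computed with scikit-learn; `recordings` is a hypothetical mapping of recording ID to (true, predicted) label arrays:

```python
# Hedged sketch: accuracy, Cohen's kappa, and a confusion matrix per held-out recording.
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

def evaluate_recordings(recordings, labels=("W", "NREM", "REM")):
    results = {}
    for rec_id, (y_true, y_pred) in recordings.items():
        results[rec_id] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "kappa": cohen_kappa_score(y_true, y_pred),
            "confusion": confusion_matrix(y_true, y_pred, labels=list(labels)),
        }
    return results
```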
This is really great work @matiasandina, thank you!
Could you say more? What if we only include features that are normalized to zero mean and unit variance within the recording, i.e. we normalize data amplitude across recordings?
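One way to read this suggestion, as a sketch rather than anything YASA currently does: z-score each feature column within a recording so that absolute amplitude differences between setups drop out.

```python
# Hedged sketch: normalize every feature column to zero mean and unit variance
# *within* a single recording, so the classifier only sees relative changes.
# `features` is a hypothetical pandas DataFrame of per-epoch features for one recording.
import pandas as pd

def zscore_per_recording(features: pd.DataFrame) -> pd.DataFrame:
    return (features - features.mean()) / features.std(ddof=0)
```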
I am still undecided on what is the best way to go, i.e. a separate branch/repository vs. keeping everything in the main codebase.
Do you think that the length of the input data will change the output? Can you just pass 24-hour recordings or even multi-day recordings to the sleep staging function? If there are mice researchers following this thread, please feel free to chime in with your ideas and preferences :)
Mice data is not standardized in the same way human data might be. A few quick examples:
In humans, electrode placement follows fairly standardized coordinates; this is not true for mice. The coordinates are usually modified according to what the experimenter wants (one valid reason to do so is that you might want to implant something into the brain in addition to the EEG electrodes, and you don't have space if you follow what others have done).
This is not to say the data is corrupted or low quality; it's just less industrial (?), less plug-and-play than I imagine human data to be.
The training was done with 24 h recordings. I don't think the length of the input data would change the output. I don't have multi-day recordings at hand right now, but it would be nice to try.

Regarding the branch thing... maybe the question is whether the branches would diverge so much that it's more of a burden to code handlers for both than to split them. I think it might be worth keeping it all together. Not sure how they implemented this in code, but the people from DeepLabCut have taken the route of …
@matiasandina thanks for the detailed explanation! Another naive question: is mice sleep similar to rat sleep? Do you think your classifier would work well on rat data? Maybe another option is to have some sort of config file that determines the features, e.g. which features to compute, the length of the epoch, the length of the smoothing window, etc. That way, you would just need the config file (*.json or *.yaml) and the updated classifier to run YASA on another species.
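As a rough illustration of the idea (the file name, keys, and values below are made up, not an existing YASA format), such a config could be written once per species and loaded before feature extraction:

```python
# Hypothetical species config: which spectral bands to use, epoch length,
# smoothing window, etc. Values are illustrative only.
import json

MOUSE_CONFIG = {
    "species": "mouse",
    "epoch_sec": 2.5,            # staging epoch length in seconds
    "smoothing_win_min": 2,      # rolling smoothing window, in minutes
    "bands": {                   # bands used for band-power features
        "delta": [0.5, 4],
        "theta": [4, 8],
        "sigma": [11, 16],
        "beta": [16, 30],
    },
    "use_temporal_axis": False,  # mice sleep polyphasically
}

with open("mouse_config.json", "w") as f:
    json.dump(MOUSE_CONFIG, f, indent=2)
```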
Sorry for the late reply. Provided re-training, I think this is flexible enough to work across species. I like the idea of a config file that determines the features! It's been a bit difficult to find time to work on this since coming back from vacation and trying to get my PhD in motion again, but it's on the list!
Hi, I am also trying to apply YASA to mice data for automatic staging, and I am wondering how much progress you have made so far. I used a 5-s sliding window to split the EEG and EMG data, and I used the following features:
I used the 14 features and trained some lightGBM models (because we have different types of mice, I trained several models) based on our own human-labeled data. Consistent with the YASA paper, I used grid search for the hyperparameters. The models predict Wake and NREM with an F1 score over 0.8, but for REM the median F1 score is only about 0.5 (see the picture below). I think the REM stage is the easiest stage to detect, but somehow the REM accuracy is the lowest. Do you have the same situation, @matiasandina? I then checked the model's predicted probabilities for all stages and found that the REM probability always seems to stay at a low level; I don't know whether this is normal.

Same as @matiasandina, I also found that the predictions are fragmented (#139), so I added some constraints to make them smoother, like this:

I want to solve the low REM accuracy issue; do you have any suggestions? @raphaelvallat Thank you,
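For what it's worth, one way such constraints can be expressed (a sketch with made-up rules, not the constraints used above) is a post-processing pass over the predicted hypnogram that removes implausible transitions and very short bouts:

```python
# Hedged sketch of hypnogram post-processing. `hypno` is a list/array of per-epoch
# labels such as ["W", "N", "R", ...].
import numpy as np

def smooth_hypnogram(hypno, min_bout_epochs=2):
    hypno = np.asarray(hypno, dtype=object).copy()
    original = hypno.copy()
    # Rule 1: a REM epoch directly preceded by Wake is suspicious -> relabel as Wake.
    for i in range(1, len(hypno)):
        if original[i] == "R" and original[i - 1] == "W":
            hypno[i] = "W"
    # Rule 2: bouts shorter than `min_bout_epochs` take the previous epoch's label.
    start = 0
    for i in range(1, len(hypno) + 1):
        if i == len(hypno) or hypno[i] != hypno[start]:
            if (i - start) < min_bout_epochs and start > 0:
                hypno[start:i] = hypno[start - 1]
            start = i
    return hypno

# Example: smooth_hypnogram(["W", "W", "R", "N", "N", "W"]) relabels the isolated REM epoch.
```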
At some point, I started facing a significant bias towards Wake. This problem was much more damaging to the project than the fragmentation and other issues I previously mentioned, and it stopped me from trying to continue integrating this into the package. I tried to re-train the classifier many times (see here), though I could never get to the bottom of why there was some sort of data drift or change in the classifier's behavior. My classifiers give very high wake percentages (around 70-80%). This is especially bad since a single-feature classifier based on …

As I mentioned before, either on this issue or on another one, the fact that the lightGBM classifier uses computed features makes it sensitive to data drift, and I think that in order to really use this, a lab should ensure the train and test sets come from the same recording devices. Upon manual inspection, the open training data and my data do not differ in ways that a human would have issues with (or that I attempt to quantify with features). And yet, the YASA classifier I trained went bananas.

When it was working better, REM was somewhat shorter than what a human would score, and the transitions would be quite strange (e.g., NREM -> W -> REM -> NREM -> W instead of NREM -> REM -> REM -> REM -> W). This last point is understandable since there are no rules to rein in this behavior (W -> REM transitions are allowed). I am happy to talk about this at length and share data/ideas, but I hit a bit of a dead end on my side.
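A crude way to probe that kind of drift, as a sketch rather than something that was actually run here: compare each feature's distribution between the training recordings and the new recordings, e.g. with a two-sample Kolmogorov-Smirnov test.

```python
# Hedged sketch: flag features whose distribution differs between the training set and
# data from a new recording setup. `train_feats` and `new_feats` are hypothetical
# pandas DataFrames with the same feature columns (one row per epoch).
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(train_feats: pd.DataFrame, new_feats: pd.DataFrame, alpha=0.01):
    rows = []
    for col in train_feats.columns:
        stat, pval = ks_2samp(train_feats[col].dropna(), new_feats[col].dropna())
        rows.append({"feature": col, "ks_stat": stat, "p_value": pval,
                     "drifted": pval < alpha})
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)
```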
Well, I see. We have different recording devices, but I found the lightGBM classifiers have almost the same performance across data from different devices; see the comment above, the mousetype3 data was recorded by a different device. In my situation, I have EEG/EMG data from mice of different ages, so I trained a set of lightGBM models across ages (see auto_stage_model). I also distinguished between Frontal and Parietal electrodes, because REM is more specific in the parietal signal.
I also noticed that there is much more Wake data than NREM and REM; my proportion of Wake, NREM and REM is about 14:7:1. So I gathered training data with a 1:1:1 proportion and retrained the model, but found no significant difference. My wrong cases on REM detection are more like …

Maybe you have heard about DeepLabCut; they have a fine-tuning process with your own video data. I was trying to follow that, and the idea is like "I label some data and use the labeled data to fine-tune the lightGBM model", but I found that a small set of labeled data cannot tune the lightGBM model. So I am trying other strategies, but I think this can be a good idea. And I think that if we train the model with a CNN, maybe we can fine-tune it. I saw you trained a model based on AccuSleep? Does that perform better on mice data? I haven't tried that yet. Thanks,
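Two things that might be worth trying here, sketched under assumptions rather than tested on this data: LightGBM's built-in class weighting for the Wake/NREM/REM imbalance, and continued training from an existing booster via `init_model` as a rough analogue of fine-tuning.

```python
# Hedged sketch. X_train/y_train and X_new/y_new are hypothetical feature matrices
# and label vectors; "pretrained_model.txt" is a hypothetical saved booster.
import lightgbm as lgb

# (1) Weight classes instead of resampling (sklearn API).
clf = lgb.LGBMClassifier(class_weight="balanced", n_estimators=400)
# clf.fit(X_train, y_train)

# (2) "Fine-tuning": add extra boosting rounds on new data, starting from a saved model.
# booster = lgb.train(
#     params={"objective": "multiclass", "num_class": 3},
#     train_set=lgb.Dataset(X_new, label=y_new),
#     num_boost_round=50,
#     init_model="pretrained_model.txt",
# )
```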
This issue contains brief details of what I changed to adapt `staging.py` to work with the recordings I had from mice.

The most significant change is the use of `epoch_sec` in `get_features()`, `fit()`, and `sliding_window()`.

I don't remember why I kept this `min()` call. My `epoch_sec` was 2.5 seconds, so I didn't test what happens when `epoch_sec` is different.

I removed the temporal axis because mice don't sleep in one lump. I think this might also help with classifying human napping data. I just commented it out, but it wouldn't be difficult to put a conditional statement there or have a better solution.
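A minimal sketch of that conditional, assuming a hypothetical `use_temporal_axis` flag and feature column name (this is not YASA's actual code):

```python
# Hedged sketch: only add an elapsed-time feature when the caller asks for it.
# `features` is a hypothetical per-epoch DataFrame; `epoch_sec` is the epoch length.
import numpy as np
import pandas as pd

def add_temporal_features(features: pd.DataFrame, epoch_sec: float,
                          use_temporal_axis: bool = True) -> pd.DataFrame:
    if not use_temporal_axis:
        # Mice sleep polyphasically, so elapsed-time features can mislead the classifier.
        return features
    features = features.copy()
    features["time_hour"] = np.arange(len(features)) * epoch_sec / 3600
    return features
```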
I changed the units, which I think has been superseded by #59.
Another minor thing is the naming of features, which hardcodes "min" into the variable names. I would consider using "epoch" instead of "min".
In the future, I also plan to change this, because I expect to be able to run yasa in real-time.
I think these lines create problems for people working with mice data because they usually don't use all these ratios. For my classifier, I used them and I think they contain value, but it would be nice to check whether things are present before calculating ratios.
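A small sketch of that check (column and ratio names are hypothetical, not the ones in `staging.py`): only derive a band-power ratio when both of its inputs are present.

```python
# Hedged sketch: compute power ratios defensively, skipping any whose input bands
# are missing from the feature table. `features` is a hypothetical per-epoch DataFrame.
import pandas as pd

RATIOS = {
    "dt": ("delta", "theta"),  # delta / theta
    "ds": ("delta", "sigma"),  # delta / sigma
    "db": ("delta", "beta"),   # delta / beta
}

def add_ratios(features: pd.DataFrame) -> pd.DataFrame:
    features = features.copy()
    for name, (num, den) in RATIOS.items():
        if num in features.columns and den in features.columns:
            features[name] = features[num] / features[den]
    return features
```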
Below everything you can find the full file.