
How to Handle Unequal Number of Trials Across Different Classes in MVPA Light? #48

Open
darianyao opened this issue Jul 12, 2024 · 7 comments


@darianyao

Hi Mr. Treder, I am currently working with MVPA-Light and have run into an issue: the number of trials differs across classes. Could you please advise on best practices or methods within MVPA-Light for handling this imbalance? Any guidance or examples would be greatly appreciated. Thank you very much!

@darianyao
Author

Additionally, I noticed that there are examples for analyzing MEEG data; these have been tremendously beneficial to me. Could you also provide a similar example for classifying fMRI data with MVPA-Light? This would be extremely helpful for beginners. Thank you again!

@treder
Owner

treder commented Jul 24, 2024

Hi @darianyao! You can use the preprocessing pipeline to either oversample the minority class or undersample the majority class; see the preprocessing examples.
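MVPA-Light is a MATLAB toolbox, so in practice this goes through its preprocessing pipeline, but the idea behind undersampling is language-agnostic. Here is a minimal Python sketch (function name and toy data are illustrative, not part of MVPA-Light) of random undersampling, i.e. keeping only as many trials per class as the smallest class has:

```python
import random

def undersample(X, y, seed=0):
    """Randomly undersample so every class keeps as many trials as the
    smallest class. X is a list of trials, y the class label per trial."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original trial order
    return [X[i] for i in keep], [y[i] for i in keep]

# 5 trials of class 1 vs. 2 trials of class 2 -> keep 2 of each
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [1.1], [1.2]]
y = [1, 1, 1, 1, 1, 2, 2]
Xb, yb = undersample(X, y)
```

Oversampling is the mirror image (duplicating minority-class trials); either way, the resampling should happen inside the cross-validation loop on training data only, which is what the toolbox's preprocessing pipeline takes care of.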

Regarding fMRI data, it would indeed be nice to have concrete examples in the toolbox examples here. For now, you can refer to the example I mentioned in the MVPA-Light paper, namely the analysis of the Haxby dataset. The code is here. I hope this gives you a useful starting point.

Let me know whether this answers your queries.

@darianyao
Author

Thank you for your reply. I have resolved the data imbalance issue by studying the examples in the toolbox. However, I seem to have encountered a bug: when I set cfg.cv = 'leaveout' and cfg.metric = 'auc', the result returned by the mv_classify function is always 0.

@darianyao
Author

Additionally, I have another issue. When using PCA for preprocessing together with neighbourhoods (a searchlight over the time dimension), the mv_classify function throws an error. I suspect this is because the searchlight matrix is defined on the original data, but PCA removes some of the redundant features of the original data. Perhaps it would be simpler for users to pass parameters directly rather than defining a matrix, for example cfg.neighborhoods = 3 for a window containing three points, and cfg.neighborhood_dim = 'channel' or cfg.neighborhood_dim = 'time' to select the dimension. Thank you :).

@treder
Owner

treder commented Jul 27, 2024

AUC is the area under the ROC curve; with one sample in the test set, the ROC curve is essentially a line, so the area is 0. The way it is calculated in MVPA-Light, you need both negative and positive examples (classes 1 and 2) in the test set. You could use leaveout, collect the outputs (e.g., dvals), and then run mv_calculate_performance manually on the collected data.
But I will try to see whether there is a reasonable hack for this situation just for convenience.
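To illustrate why pooling works (MVPA-Light itself is MATLAB; this Python sketch is only a conceptual illustration, and the toy decision values are made up): AUC is a ranking statistic over positive/negative pairs, so it is undefined for a single held-out trial but perfectly well-defined once the decision values from all leave-one-out folds are pooled:

```python
def auc(dvals, labels, pos=1):
    """Rank-based AUC: the fraction of (positive, negative) trial pairs in
    which the positive trial receives the higher decision value
    (ties count as 0.5). Equivalent to the area under the ROC curve."""
    pos_d = [d for d, l in zip(dvals, labels) if l == pos]
    neg_d = [d for d, l in zip(dvals, labels) if l != pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_d for n in neg_d)
    return wins / (len(pos_d) * len(neg_d))

# Pretend these decision values were collected across leave-one-out folds,
# one held-out trial per fold; AUC is then computed once on the pool.
pooled_dvals  = [0.9, 0.3, 0.2, 0.7]
pooled_labels = [1,   1,   0,   0]
auc(pooled_dvals, pooled_labels)  # -> 0.75 (3 of 4 pairs ranked correctly)
```

A per-fold average of this quantity is meaningless for leave-one-out, since each fold contains only one class, which is why pooling first is the right order of operations.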

Re: neighbours
I am not sure I understand the problem exactly. If you run a PCA (say, on your voxels), you lose the notion of a neighbourhood structure (e.g., PC1 is not really a neighbour of PC2, since all PCs are linear combinations of voxels). Is your goal to use, say, PC1-PC3, PC2-PC4, PC3-PC5, and so on in a sliding window? You could fix the number of PCs, or calculate the PCA beforehand (not inside the classification loop) and then define the neighbourhood matrix according to the PCs. Would this work?
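For the sliding-window case, the neighbourhood matrix has a simple banded structure. This Python sketch (MVPA-Light expects an equivalent MATLAB matrix; the function name here is illustrative) builds a binary matrix in which two features are neighbours when they are at most `halfwidth` positions apart:

```python
def sliding_window_neighbours(n_features, halfwidth=1):
    """Binary neighbourhood matrix for a 1-D sliding window: entry (i, j)
    is 1 when features i and j are within `halfwidth` positions of each
    other, giving a window of 2 * halfwidth + 1 points per searchlight."""
    return [[1 if abs(i - j) <= halfwidth else 0 for j in range(n_features)]
            for i in range(n_features)]

# If the number of PCs is fixed beforehand (say 5), the matrix can be
# built over PC indices instead of the original features:
nb = sliding_window_neighbours(5)
```

The key point is that once PCA runs outside the classification loop with a fixed number of components, the matrix dimensions are known in advance and no longer conflict with the searchlight definition.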

@darianyao
Author

Thank you very much for your reply.

Re: AUC

"Thank you for helping me understand AUC better. I'm not from a computer science background, so I asked ChatGPT how to solve the issue of calculating AUC with leave-one-out cross-validation (LOO-CV) since it cannot be calculated in a single iteration. Here is ChatGPT's response: 'To correctly calculate the AUC using leave-one-out cross-validation (LOO-CV), we need to aggregate all the predicted results from each iteration and then calculate the ROC curve and AUC as a whole.' I'm not sure if this will be helpful to you."

Re: neighbourhoods

"Regarding the neighbourhoods issue, my solution is not to run PCA in the preprocessing stage:). I can't think of other methods. I will try the fixed PC numbers method you mentioned, but I'm not sure if it will resolve the errors that occur when using searchlight and PCA together."

@treder
Owner

treder commented Jul 27, 2024

Glad I could help, good luck!

Re: AUC
Yes, this is exactly what I suggested above. For now you have to do this "by hand"; MVPA-Light does not do it automatically because metrics are calculated on each test set and then averaged.
