This repository contains code for detecting batch effects across datasets.
Worked examples are available in the docs folder.
To install the package, clone the repository and run the following command:
pip -q install <PATH TO THE FOLDER>
You need two pandas dataframes: one with the metadata (annotations, donors, hospitals, etc.) and one with the features of the dataset. Initialize the module with:
from batchdetect.batchdetect import BatchDetect
bd = BatchDetect(metadata, features)
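As a minimal sketch of the expected inputs, the two dataframes could be built as shown below. The column names (donor, hospital, feature_0, ...), the sample IDs, and the assumption that both dataframes share the same row index are illustrative and not taken from the package; please check the docs folder for the exact requirements.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sample_ids = [f"sample_{i}" for i in range(6)]

# Metadata: one row per sample, one column per annotation (hypothetical columns).
metadata = pd.DataFrame(
    {
        "donor": ["d1", "d1", "d2", "d2", "d3", "d3"],
        "hospital": ["A", "A", "A", "B", "B", "B"],
    },
    index=sample_ids,
)

# Features: one row per sample, one numeric column per feature (random placeholders).
features = pd.DataFrame(
    rng.normal(size=(len(sample_ids), 4)),
    columns=[f"feature_{j}" for j in range(4)],
    index=sample_ids,
)

# Assumption: rows of both dataframes refer to the same samples in the same order.
assert metadata.index.equals(features.index)
```

Dataframes of this shape can then be passed to BatchDetect as in the call above.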
After that, you can run any of the available methods, including:
bd.low_dim_visualization("pca")
bd.low_dim_visualization("tsne")
bd.low_dim_visualization("umap")
bd.classification_test()
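The idea behind a classification-based check can be sketched as follows: if a classifier can predict the batch label (for example the hospital) from the features with accuracy well above chance, the features carry batch information. The snippet below is a standalone illustration using NumPy and a nearest-centroid classifier on synthetic data. It is not the implementation of classification_test, and the helper name batch_predictability is hypothetical.

```python
import numpy as np

def batch_predictability(X, batch, test_fraction=0.3, seed=0):
    """Hold-out accuracy of a nearest-centroid classifier predicting batch labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    order = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # One centroid per batch, estimated on the training split only.
    labels = np.unique(batch[train_idx])
    centroids = np.stack(
        [X[train_idx][batch[train_idx] == b].mean(axis=0) for b in labels]
    )

    # Assign each test sample to the closest centroid (Euclidean distance).
    dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=2)
    predicted = labels[dists.argmin(axis=1)]
    return float((predicted == batch[test_idx]).mean())

rng = np.random.default_rng(42)
batch = np.repeat(["A", "B"], 100)

# Synthetic features without a batch effect: both batches share one distribution.
clean = rng.normal(size=(200, 5))
# Synthetic features with a batch effect: batch "B" is shifted by 3 in every dimension.
shifted = clean + np.where(batch == "B", 3.0, 0.0)[:, None]

acc_clean = batch_predictability(clean, batch)
acc_shifted = batch_predictability(shifted, batch)
print(f"no batch effect: {acc_clean:.2f}, with batch effect: {acc_shifted:.2f}")
```

With two equally sized batches, chance level is about 0.5, so an accuracy close to 1 on the shifted data indicates a strong batch effect.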
For more examples and details on the available options, please refer to the docs folder.
The package also covers the case where no feature set is available: it provides automatic feature extraction based on first- and second-order image features. The only requirement is a metadata dataframe with a column named file.
You can create a feature dataframe by simply using the following:
from batchdetect.image import automatic_feature_extraction
df_features = automatic_feature_extraction(metadata)
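For orientation, first-order features summarize the distribution of pixel intensities, while second-order features describe relationships between neighboring pixels. The sketch below computes a few of each with NumPy on a synthetic grayscale image. It only illustrates the kind of features meant here and is not the code used by automatic_feature_extraction; the function names and the choice of features are assumptions.

```python
import numpy as np

def first_order_features(image):
    """Statistics of the intensity distribution, ignoring pixel positions."""
    x = np.asarray(image, dtype=float).ravel()
    mean, std = x.mean(), x.std()
    centered = x - mean
    # Skewness and (non-excess) kurtosis are set to 0 for constant images
    # to avoid dividing by zero.
    skew = (centered**3).mean() / std**3 if std > 0 else 0.0
    kurt = (centered**4).mean() / std**4 if std > 0 else 0.0
    return {"mean": float(mean), "std": float(std),
            "skewness": float(skew), "kurtosis": float(kurt)}

def glcm_features(image, levels=8):
    """Contrast and energy from a gray-level co-occurrence matrix (GLCM)
    of horizontally adjacent pixels, a simple second-order descriptor."""
    img = np.asarray(image, dtype=float)
    span = img.max() - img.min()
    # Quantize intensities to `levels` gray levels (all zeros for a constant image).
    if span == 0:
        q = np.zeros(img.shape, dtype=int)
    else:
        q = np.minimum(((img - img.min()) / span * levels).astype(int), levels - 1)
    # Count co-occurrences of (left pixel, right pixel) gray levels.
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return {"contrast": float((glcm * (i - j) ** 2).sum()),
            "energy": float((glcm**2).sum())}

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))  # synthetic 8-bit grayscale image
image_features = {**first_order_features(image), **glcm_features(image)}
print(image_features)
```

Computing such a dictionary per image and stacking the results row by row would give a feature dataframe with one row per file.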
The remaining steps are the same as above, with df_features used as the feature dataframe.
coming soon
- Silhouette Score (UMAP) (Rushin)
- Mean local diversity (Sophia)
- Shannon’s equitability (Sophia)
- Jensen-Shannon distance (Sophia)
- KL-Divergence (Manuel)
- other distribution metrics (Ali)
- Look at scIB (Ali)
- FID score (Manuel)
- Moran's I (Rushin) …..
Add datasets before Wednesday (21st)
LUNG: 3 Cohorts: TCGA, CPTAC, UCL (cis).