BatchDetect: a Python package for detecting systematic batch effects in datasets

marrlab/BatchDetect

BatchDetect

This repository contains the code for detecting batch effects in different datasets.

For worked examples, see the docs folder.

How to install the package

To install the package, clone the repository and run the following command:

pip -q install <PATH TO THE FOLDER>

How to use the package

You need two pandas dataframes. One holds the metadata, such as annotations, donors, hospitals, etc. The other holds the features of the dataset. You can initialize the module by running:

from batchdetect.batchdetect import BatchDetect
bd = BatchDetect(metadata, features)
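
As a minimal sketch of what the two inputs might look like, the snippet below builds a small metadata dataframe and a matching feature dataframe from synthetic values. The column names (`annotation`, `hospital`, `feature_0`, ...) are illustrative, and the assumption that both dataframes have one row per sample in the same order is ours, not something the package documents here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 6

# Metadata: one row per sample. Column names are illustrative only.
metadata = pd.DataFrame(
    {
        "annotation": ["tumor", "normal", "tumor", "normal", "tumor", "normal"],
        "hospital": ["A", "A", "A", "B", "B", "B"],
    }
)

# Features: one row per sample, assumed to be in the same order as the metadata.
features = pd.DataFrame(
    rng.normal(size=(n_samples, 4)),
    columns=[f"feature_{i}" for i in range(4)],
)

# Sanity checks before handing both dataframes to the package.
assert len(metadata) == len(features)
assert metadata.index.equals(features.index)

try:
    from batchdetect.batchdetect import BatchDetect

    bd = BatchDetect(metadata, features)
except ImportError:
    bd = None  # BatchDetect is not installed in this environment
```

Checking that the row counts and indices agree up front tends to catch misaligned inputs before they show up as confusing plots.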

After that, you can run any of the available methods. These include:

  • bd.low_dim_visualization("pca")
  • bd.low_dim_visualization("tsne")
  • bd.low_dim_visualization("umap")
  • bd.classification_test()
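
To illustrate why a low-dimensional projection can reveal a batch effect, here is a small NumPy-only sketch. It is not the package's implementation: it simply runs PCA (via SVD) on synthetic data in which one batch is shifted by a constant offset, and measures how far apart the two batches land on the first principal component. All names and values in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic batches of 50 samples with 5 features each.
# Batch 1 is shifted by a constant offset to mimic a systematic batch effect.
batch = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[batch == 1] += 3.0

# PCA via SVD of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T  # coordinates on the first two principal components

# Distance between the batch means along the first principal component.
pc1_means = [pcs[batch == b, 0].mean() for b in (0, 1)]
gap = abs(pc1_means[0] - pc1_means[1])
print(f"Gap between batch means on PC1: {gap:.2f}")
```

When the batches separate clearly along the leading components, as they do here by construction, that is a hint that batch membership explains a large share of the variance.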

For more examples or more information about the available options, please refer to our docs folder.

Automatic feature extraction

The package also covers the case where no feature set is available. It provides automatic feature extraction based on first- and second-order image features. The only requirement is a metadata dataframe with a column named file.

You can create a feature dataframe as follows:

from batchdetect.image import automatic_feature_extraction

df_features = automatic_feature_extraction(metadata)
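
Before running the extraction, it can help to check that the `file` column is present and that its entries exist on disk. The sketch below assumes that `file` holds paths to image files, which the text above does not state explicitly. The paths and the `hospital` column are hypothetical placeholders.

```python
from pathlib import Path

import pandas as pd

# Hypothetical image paths; replace these with the paths to your own images.
metadata = pd.DataFrame(
    {
        "file": [
            "hypothetical_images/sample_001.png",
            "hypothetical_images/sample_002.png",
        ],
        "hospital": ["A", "B"],
    }
)

# The extraction step requires a column named "file".
assert "file" in metadata.columns

# Collect entries that do not point to an existing file (assumes "file" holds paths).
missing = [f for f in metadata["file"] if not Path(f).is_file()]
if missing:
    print(f"{len(missing)} file(s) not found, e.g. {missing[0]}")
```

With the placeholder paths above, every entry is reported as missing. With real data, an empty `missing` list would suggest the metadata is ready to pass to `automatic_feature_extraction`.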

The rest of the workflow is the same as in the previous section.

How to cite this work

coming soon

Metrics to use:

  • Silhouette Score (UMAP) (Rushin)
  • Mean local diversity (Sophia)
  • Shannon’s equitability (Sophia)
  • Jensen-Shannon distance (Sophia)
  • KL-Divergence (Manuel)
  • other distribution metrics (Ali)
  • Look at scIB (Ali)
  • FID score (Manuel)
  • Moran's I (Rushin) …..
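
For reference, two of the listed metrics can be sketched with NumPy using their standard textbook definitions. How they would be applied inside the package is not specified here, so the inputs below (counts of samples per batch, for example within a neighbourhood) are an assumption, and the function names are hypothetical.

```python
import numpy as np


def shannon_equitability(counts):
    """Shannon entropy of the proportions divided by its maximum, ln(k).

    k is taken to be the number of categories passed in, including empty ones.
    """
    counts = np.asarray(counts, dtype=float)
    k = len(counts)
    if k < 2 or counts.sum() == 0:
        return 0.0
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(k))


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs, range [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        mask = a > 0
        return (a[mask] * np.log2(a[mask] / b[mask])).sum()

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))  # guard against tiny negative rounding


# Perfectly mixed batches versus a single dominant batch.
print(shannon_equitability([10, 10, 10]))
print(shannon_equitability([30, 0, 0]))

# Identical versus completely disjoint distributions.
print(jensen_shannon_distance([1, 1], [1, 1]))
print(jensen_shannon_distance([1, 0], [0, 1]))
```

Under these definitions, an equitability near 1 indicates evenly mixed batches, and a Jensen-Shannon distance near 0 indicates that two distributions are nearly identical.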

Add datasets before Wednesday (21st)

LUNG: 3 Cohorts: TCGA, CPTAC, UCL (cis).
