
[Reference material for an industry-academia-research-hospital collaborative research project]


susooo/Histo2GeneSignatures

 
 


Predict Immune and Inflammatory Gene Signature Expression Directly from Histology Images

Predict the 6 gene signatures that Sangro, Bruno, et al. reported as associated with response to nivolumab and survival in advanced hepatocellular carcinoma (HCC).

Hierarchical clustering was performed on the gene expression data to generate labels for whole slide images (WSIs). On the TCGA-LIHC dataset, the deep learning models were trained (60% of the data), validated (20%) with 10-fold Monte Carlo cross-validation, and tested (20%). Our in-house dataset (from APHP Henri Mondor) was then used for external validation. Results obtained with tumoral annotations (regions of interest drawn by our expert pathologist) were superior to those obtained with all tissue regions.
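As a rough illustration of the 60/20/20 Monte Carlo split described above, the sketch below draws 10 independent random partitions of a list of slide IDs. The function name, the slide IDs, and the choice to resample the test set in every fold are assumptions made for illustration; this README does not state whether the test set is fixed across folds, and a real split would normally be stratified by label and grouped by patient.

```python
import random

def monte_carlo_splits(slide_ids, n_folds=10, train_frac=0.6, val_frac=0.2, seed=0):
    """Yield (train, val, test) lists of slide IDs, one random partition per fold."""
    rng = random.Random(seed)
    ids = sorted(slide_ids)                 # fixed order so the seed is reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    for _ in range(n_folds):
        shuffled = ids[:]
        rng.shuffle(shuffled)               # each fold is an independent random draw
        train = shuffled[:n_train]
        val = shuffled[n_train:n_train + n_val]
        test = shuffled[n_train + n_val:]   # remaining ~20%
        yield train, val, test

# Hypothetical slide IDs, for illustration only
slides = [f"slide_{i:03d}" for i in range(50)]
folds = list(monte_carlo_splits(slides))
```

Unlike k-fold cross-validation, the validation sets of different folds may overlap, because every fold is sampled independently.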

Of note, the discovery series was stained with hematein-eosin (H&E), whereas the external validation series was stained with hematein-eosin-saffron (HES). We therefore tested stain unmixing (3 methods implemented: Macenko PCA, Xu SNMF, or a fixed HES vector) and saffron removal on the external validation series. Color normalization (2 methods: Reinhard or Macenko PCA) was also tested on both the discovery and validation series. In addition, basic on-the-fly geometric augmentations were tested during training.
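To give an idea of what Reinhard color normalization does, the sketch below shows only its core step: shifting and scaling one color channel so that its mean and standard deviation match those of a reference image. It is not the code of this repository. It assumes the patch has already been converted to a perceptual color space such as LAB (for example with `skimage.color.rgb2lab`) and that the step is repeated for each channel; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def match_channel_stats(source, target_mean, target_std):
    """Shift and scale one color channel so its mean and std match the target."""
    src_mean = mean(source)
    src_std = pstdev(source) or 1.0   # avoid division by zero on a flat channel
    return [(v - src_mean) / src_std * target_std + target_mean for v in source]

# Hypothetical channel values of a patch and statistics of a reference image
channel = [50.0, 60.0, 70.0]
normalized = match_channel_stats(channel, target_mean=65.0, target_std=5.0)
```

After this step, the channel has the mean and standard deviation of the reference, while the relative ordering of pixel values is preserved.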

3 deep learning approaches: patch-based, classic MIL, and CLAM.

Results

AUROC in the discovery series (TCGA-LIHC) with/without tumoral annotations:

| Gene signature | Tumor annot | Patch-based: best fold | Patch-based: mean ± sd | Classic MIL: best fold | Classic MIL: mean ± sd | CLAM: best fold | CLAM: mean ± sd |
|---|---|---|---|---|---|---|---|
| 6G Interferon Gamma | | 0.578 | 0.492 ± 0.065 | 0.690 | 0.576 ± 0.102 | 0.734 | 0.600 ± 0.080 |
| 6G Interferon Gamma | ✔️ | 0.661 | 0.560 ± 0.067 | 0.758 | 0.630 ± 0.078 | 0.780 | 0.635 ± 0.097 |
| Gajewski 13G Inflammatory | | 0.780 | 0.666 ± 0.072 | 0.851 | 0.577 ± 0.179 | 0.824 | 0.632 ± 0.107 |
| Gajewski 13G Inflammatory | ✔️ | 0.809 | 0.688 ± 0.062 | 0.893 | 0.694 ± 0.125 | 0.914 | 0.728 ± 0.096 |
| Inflammatory | | 0.673 | 0.523 ± 0.079 | 0.717 | 0.539 ± 0.139 | 0.738 | 0.607 ± 0.090 |
| Inflammatory | ✔️ | 0.706 | 0.580 ± 0.077 | 0.806 | 0.641 ± 0.123 | 0.796 | 0.665 ± 0.081 |
| Interferon Gamma biology | | 0.700 | 0.541 ± 0.088 | 0.672 | 0.562 ± 0.117 | 0.759 | 0.622 ± 0.088 |
| Interferon Gamma biology | ✔️ | 0.783 | 0.561 ± 0.119 | 0.677 | 0.610 ± 0.051 | 0.822 | 0.674 ± 0.102 |
| Ribas 10G Inflammatory | | 0.672 | 0.583 ± 0.081 | 0.652 | 0.552 ± 0.083 | 0.758 | 0.627 ± 0.082 |
| Ribas 10G Inflammatory | ✔️ | 0.727 | 0.640 ± 0.074 | 0.726 | 0.618 ± 0.065 | 0.806 | 0.669 ± 0.067 |
| T cell exhaustion | | 0.661 | 0.490 ± 0.108 | 0.744 | 0.516 ± 0.123 | 0.627 | 0.555 ± 0.063 |
| T cell exhaustion | ✔️ | 0.661 | 0.543 ± 0.073 | 0.788 | 0.606 ± 0.086 | 0.788 | 0.577 ± 0.092 |

AUROC (of the best-fold model) in the external validation series (Mondor) with tumoral annotations:

| Gene signature (with tumor annot ✔️) | Patch-based | Classic MIL | CLAM |
|---|---|---|---|
| 6G Interferon Gamma | 0.694 | 0.745 | 0.871 |
| Gajewski 13G Inflammatory | 0.657 | 0.782 | 0.810 |
| Inflammatory | 0.657 | 0.816 | 0.850 |
| Interferon Gamma biology | 0.755 | 0.793 | 0.823 |
| Ribas 10G Inflammatory | 0.605 | 0.779 | 0.810 |
| T cell exhaustion | 0.810 | 0.868 | 0.921 |
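For readers less familiar with the metric, the sketch below shows one way the AUROC values and the "best fold" and "mean ± sd" summaries in the tables above could be computed. It is a minimal pure-Python illustration, not the evaluation code of this repository; the function names and sample values are hypothetical, and the use of the sample standard deviation is an assumption (the README does not say which estimator was used).

```python
from statistics import mean, stdev

def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize_folds(fold_aurocs):
    """Return (best fold, mean, sample standard deviation) over cross-validation folds."""
    return max(fold_aurocs), mean(fold_aurocs), stdev(fold_aurocs)

# Hypothetical slide-level labels and predicted probabilities
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
best, avg, sd = summarize_folds([0.6, 0.7, 0.8])
```

In practice, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently; the pairwise loop here is only meant to make the definition explicit.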

Visualization / explainability:

Workflow

Part 1. Gene expression clustering

To generate labels for WSIs

  1. Process TCGA FPKM data with gene_clust/codes/tcga_fpkm_processing.ipynb
  2. Perform hierarchical clustering with gene_clust/codes/PlotHeatmapGeneSignature.R (to reproduce the heatmap), or in Python with gene_clust/codes/tcga_fpkm_clustering.ipynb (to get the same clustering results)

All TCGA data used and the clustering results are provided in gene_clust/data and gene_clust/results. For privacy reasons, the data of the Mondor series are not provided, but the commands for external validation are described in this tutorial.
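The sketch below illustrates the general idea of turning hierarchical clustering into binary WSI labels. It is not a reproduction of the R script or notebook listed above: the average linkage, the Euclidean distance, the cut into two clusters, the "higher mean expression means label 1" rule, and the sample values are all assumptions made for illustration. A real analysis would typically use `scipy.cluster.hierarchy` (`linkage`, `fcluster`) rather than this naive loop.

```python
from math import dist

def average_linkage_clusters(points, n_clusters=2):
    """Naive agglomerative clustering with average linkage; returns lists of indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # mean pairwise Euclidean distance between two clusters
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # find the closest pair of clusters and merge them
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def high_low_labels(points):
    """Label a sample 1 if it belongs to the cluster with the higher mean expression."""
    clusters = average_linkage_clusters(points, n_clusters=2)

    def cluster_mean(c):
        return sum(sum(points[i]) for i in c) / (len(c) * len(points[0]))

    high = max(clusters, key=cluster_mean)
    return [1 if i in high else 0 for i in range(len(points))]

# Hypothetical z-scored expression of a 3-gene signature in 6 samples
expr = [[-1.0, -0.8, -1.2], [-0.9, -1.1, -0.7], [1.1, 0.9, 1.3],
        [0.8, 1.2, 1.0], [-1.2, -0.9, -1.0], [1.0, 1.1, 0.7]]
labels = high_low_labels(expr)
```

Each resulting label would then be attached to the WSI of the corresponding sample and used as the classification target in Part 2.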

Part 2. Deep learning

To classify WSIs

The patch-based approach requires a different conda environment from the two MIL approaches. Following the original CLAM repository, there are two options for tessellation: saving both coordinates and images, or saving only coordinates to save storage space (especially for large datasets or multiple modified patch versions) and loading images on the fly during feature extraction (so-called fp). Annotations should be given as coordinates at the highest magnification of the WSI. Both simple annotations in TXT and hierarchical annotations (for example, to exclude necrosis inside a tumor) in NPT are accepted.

  1. Patch-based approach
  2. Classic MIL
  3. CLAM
  4. Other settings: see the tutorial, which covers stain unmixing (and saffron removal), color normalization, and data augmentation.
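The coordinate-only tessellation restricted to tumoral annotations, as described at the start of Part 2, can be sketched as follows. This is a simplified illustration and not the CLAM or repository code: the function names, the non-overlapping grid, the rule of keeping a patch when its center lies inside the annotation polygon, and the example dimensions are all assumptions. Coordinates are taken at the highest magnification of the WSI, as the annotations require.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tumor_patch_coords(slide_w, slide_h, patch_size, tumor_polygon):
    """Return top-left coordinates of grid patches whose center is inside the annotation."""
    coords = []
    for top in range(0, slide_h - patch_size + 1, patch_size):
        for left in range(0, slide_w - patch_size + 1, patch_size):
            cx, cy = left + patch_size / 2, top + patch_size / 2
            if point_in_polygon(cx, cy, tumor_polygon):
                coords.append((left, top))            # store coordinates only, no pixels
    return coords

# Hypothetical 1024 x 1024 slide with a square tumor annotation
polygon = [(0, 0), (600, 0), (600, 600), (0, 600)]
coords = tumor_patch_coords(1024, 1024, 256, polygon)
```

Storing only `coords` keeps the disk footprint small; the patch pixels are then read from the WSI at these positions during feature extraction. A hierarchical annotation (for example, necrosis inside a tumor) could be handled by additionally rejecting patches whose center falls inside an exclusion polygon.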
