Prediction of 6 gene signatures associated with response to nivolumab and survival in advanced hepatocellular carcinoma (HCC), as described by Sangro, Bruno, et al.
- 6-Gene Interferon Gamma (Ayers, Mark, et al.)
- Gajewski 13-Gene Inflammatory (Spranger, Stefani, Riyue Bao, and Thomas F. Gajewski)
- Inflammatory (Sangro, Bruno, et al.)
- Interferon Gamma Biology (Ayers, Mark, et al.)
- Ribas 10-Gene Interferon Gamma (Ayers, Mark, et al.)
- T-cell Exhaustion (Ayers, Mark, et al.)
Hierarchical clustering was performed on the gene expression data to generate labels for the whole slide images (WSIs). The deep learning models were developed on the TCGA-LIHC dataset with 10-fold Monte Carlo cross-validation: in each fold, 60% of the slides were used for training, 20% for validation, and 20% for testing. Our in-house dataset (from APHP Henri Mondor) was then used for external validation. Models using tumoral annotations (regions of interest drawn by our expert pathologist) achieved higher AUROCs than those using all tissue regions.
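Monte Carlo cross-validation draws independent random train/validation/test partitions instead of rotating through disjoint folds. A minimal sketch of the 60/20/20 scheme, assuming slide-level splits drawn with a fixed seed; the function name is hypothetical, and a real pipeline would usually also stratify by label and keep all slides of one patient in the same subset.

```python
import numpy as np


def monte_carlo_splits(n_slides, n_folds=10, train=0.6, val=0.2, seed=0):
    """Yield (train, val, test) index arrays; each fold is an independent random partition."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train * n_slides))
    n_val = int(round(val * n_slides))
    for _ in range(n_folds):
        perm = rng.permutation(n_slides)  # fresh shuffle per fold (Monte Carlo, not k-fold)
        yield perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```

Because folds are resampled, a given slide can land in the test set of several folds or of none, which is why results are reported as a mean ± SD over folds.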
Of note, the discovery series was stained with hematoxylin and eosin (H&E), whereas the external validation series was stained with hematoxylin, eosin and saffron (HES). We therefore tested stain unmixing (3 methods implemented: Macenko PCA, XU SNMF, or a fixed HES vector) followed by saffron removal on the external validation series. Color normalization (2 methods: Reinhard or Macenko PCA) was also tested on both the discovery and validation series. Finally, basic on-the-fly geometric augmentation was tested during training.
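Unmixing with a fixed stain matrix follows the Beer-Lambert color-deconvolution idea: convert RGB to optical density (OD), solve for per-pixel stain concentrations, then rebuild the image without the saffron term. A minimal sketch, assuming a 3 × 3 matrix of OD vectors; the H and E rows are the commonly quoted Ruifrok-Johnston values, while the saffron row is a hypothetical placeholder for a yellow stain, not the fixed HES vector implemented in this repository. Function names are hypothetical.

```python
import numpy as np

# Rows: hematoxylin, eosin, saffron. The saffron row is a PLACEHOLDER (yellow stain,
# absorbing mostly blue), not the fixed HES vector used in this repository.
HES_VECTORS = np.array([
    [0.65, 0.70, 0.29],  # hematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.15, 0.35, 0.92],  # saffron (hypothetical)
])


def rgb_to_od(rgb, background=255.0):
    """Beer-Lambert: RGB intensities -> optical density (clamped to avoid log(0))."""
    rgb = np.maximum(rgb.astype(np.float64), 1.0)
    return -np.log(rgb / background)


def od_to_rgb(od, background=255.0):
    """Optical density -> 8-bit RGB."""
    return np.clip(np.rint(background * np.exp(-od)), 0, 255).astype(np.uint8)


def remove_saffron(rgb, stain_vectors=HES_VECTORS):
    """Unmix an HES image with a fixed stain matrix and rebuild it from H and E only."""
    vectors = stain_vectors / np.linalg.norm(stain_vectors, axis=1, keepdims=True)
    od = rgb_to_od(rgb).reshape(-1, 3)       # (n_pixels, 3)
    conc = od @ np.linalg.inv(vectors)       # solve od = conc @ vectors for each pixel
    conc[:, 2] = 0.0                         # drop the saffron concentration
    return od_to_rgb(conc @ vectors).reshape(rgb.shape)
```

Macenko PCA or XU SNMF would replace the fixed matrix with vectors estimated from each image; the reconstruction step stays the same.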
Three deep learning approaches:
- Patch-based (original repo)
- Two Multiple Instance Learning (MIL) approaches: CLAM and classic MIL (original repo)
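The MIL approaches differ mainly in how patch-level information is pooled into one slide-level prediction. The sketch below contrasts two pooling rules in plain NumPy under simplifying assumptions: the classic MIL variant is modeled as max pooling over patch scores (our assumption about the baseline), and the attention variant uses the gated attention pooling of Ilse et al. (2018) that CLAM builds on. It omits CLAM's instance-level clustering loss and all training; weights are random stand-ins and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def max_pooling_mil(features, w, b):
    """Score every patch and use the top-scoring patch as the slide logit."""
    patch_logits = features @ w + b                       # (n_patches,)
    return patch_logits.max(), int(patch_logits.argmax())


def gated_attention_mil(features, V, U, w_att, w_cls, b_cls):
    """Gated attention pooling: the slide embedding is an attention-weighted mean of patches."""
    gate = np.tanh(features @ V) * sigmoid(features @ U)  # (n_patches, d_att)
    attention = softmax(gate @ w_att)                     # (n_patches,), sums to 1
    slide_embedding = attention @ features                # (d_features,)
    return slide_embedding @ w_cls + b_cls, attention
```

The attention weights returned by the second function are what attention heatmaps typically visualize.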
AUROC in the discovery series (TCGA-LIHC) without (❌) and with (✔️) tumoral annotations:
| Gene signature | Tumor annot | Patch-based: best fold | Patch-based: mean ± SD | Classic MIL: best fold | Classic MIL: mean ± SD | CLAM: best fold | CLAM: mean ± SD |
|---|---|---|---|---|---|---|---|
| 6G Interferon Gamma | ❌ | 0.578 | 0.492 ± 0.065 | 0.690 | 0.576 ± 0.102 | 0.734 | 0.600 ± 0.080 |
| 6G Interferon Gamma | ✔️ | 0.661 | 0.560 ± 0.067 | 0.758 | 0.630 ± 0.078 | 0.780 | 0.635 ± 0.097 |
| Gajewski 13G Inflammatory | ❌ | 0.780 | 0.666 ± 0.072 | 0.851 | 0.577 ± 0.179 | 0.824 | 0.632 ± 0.107 |
| Gajewski 13G Inflammatory | ✔️ | 0.809 | 0.688 ± 0.062 | 0.893 | 0.694 ± 0.125 | 0.914 | 0.728 ± 0.096 |
| Inflammatory | ❌ | 0.673 | 0.523 ± 0.079 | 0.717 | 0.539 ± 0.139 | 0.738 | 0.607 ± 0.090 |
| Inflammatory | ✔️ | 0.706 | 0.580 ± 0.077 | 0.806 | 0.641 ± 0.123 | 0.796 | 0.665 ± 0.081 |
| Interferon Gamma biology | ❌ | 0.700 | 0.541 ± 0.088 | 0.672 | 0.562 ± 0.117 | 0.759 | 0.622 ± 0.088 |
| Interferon Gamma biology | ✔️ | 0.783 | 0.561 ± 0.119 | 0.677 | 0.610 ± 0.051 | 0.822 | 0.674 ± 0.102 |
| Ribas 10G Interferon Gamma | ❌ | 0.672 | 0.583 ± 0.081 | 0.652 | 0.552 ± 0.083 | 0.758 | 0.627 ± 0.082 |
| Ribas 10G Interferon Gamma | ✔️ | 0.727 | 0.640 ± 0.074 | 0.726 | 0.618 ± 0.065 | 0.806 | 0.669 ± 0.067 |
| T cell exhaustion | ❌ | 0.661 | 0.490 ± 0.108 | 0.744 | 0.516 ± 0.123 | 0.627 | 0.555 ± 0.063 |
| T cell exhaustion | ✔️ | 0.661 | 0.543 ± 0.073 | 0.788 | 0.606 ± 0.086 | 0.788 | 0.577 ± 0.092 |
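Each cell above summarizes 10 per-fold test AUROCs. A minimal sketch of how such a summary can be computed, assuming AUROC is taken as the probability that a random positive slide outscores a random negative one (ties count one half) and that "sd" is the sample standard deviation (ddof=1); the standard-deviation convention and the function names are assumptions, not taken from this repository.

```python
import numpy as np


def auroc(labels, scores):
    """Mann-Whitney AUROC: P(score of a positive > score of a negative), ties count 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def summarize_folds(fold_aurocs):
    """Return (best fold, mean, sample SD) over the per-fold test AUROCs."""
    a = np.asarray(fold_aurocs, dtype=float)
    return float(a.max()), float(a.mean()), float(a.std(ddof=1))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ranked correctly.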
AUROC (of the best-fold model) in the external validation series (Mondor) with tumoral annotations:
| Gene signature (with tumor annot ✔️) | Patch-based | Classic MIL | CLAM |
|---|---|---|---|
| 6G Interferon Gamma | 0.694 | 0.745 | 0.871 |
| Gajewski 13G Inflammatory | 0.657 | 0.782 | 0.810 |
| Inflammatory | 0.657 | 0.816 | 0.850 |
| Interferon Gamma biology | 0.755 | 0.793 | 0.823 |
| Ribas 10G Interferon Gamma | 0.605 | 0.779 | 0.810 |
| T cell exhaustion | 0.810 | 0.868 | 0.921 |
Visualization / explainability:

To generate labels for WSIs
- Process TCGA FPKM data with gene_clust/codes/tcga_fpkm_processing.ipynb
- Perform hierarchical clustering either in R with gene_clust/codes/PlotHeatmapGeneSignature.R (which also reproduces the heatmap) or in Python with gene_clust/codes/tcga_fpkm_clustering.ipynb (which gives the same clustering results).
All TCGA data used and the clustering results are provided in gene_clust/data and gene_clust/results. Due to privacy constraints, the data from the Mondor series are not provided, but the commands for external validation are described in this tutorial.
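The clustering step turns continuous signature expression into a binary label per slide. A minimal sketch using SciPy, assuming a log2(FPKM + 1) transform, per-gene z-scoring, Ward linkage, a two-cluster cut, and that the cluster with the higher mean z-score is labeled "high"; all of these choices and the function name are assumptions, and the notebooks and R script above may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def signature_labels(fpkm, genes, signature):
    """Return 1 ('high') or 0 ('low') per sample for one gene signature.

    fpkm: (n_samples, n_genes) FPKM matrix; genes: list of column names.
    """
    cols = [genes.index(g) for g in signature]
    x = np.log2(fpkm[:, cols] + 1.0)                     # log-transform FPKM
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)    # z-score each gene across samples
    tree = linkage(x, method="ward")                     # Ward linkage on Euclidean distances
    clusters = fcluster(tree, t=2, criterion="maxclust") # cut the tree into clusters 1 and 2
    # Label the cluster with the higher mean z-score as "high".
    high = max((1, 2), key=lambda c: x[clusters == c].mean())
    return (clusters == high).astype(int)
```

Usage: `signature_labels(fpkm, gene_names, ["GENE_A", "GENE_B"])`, where the gene names are placeholders for the members of one of the six signatures.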
To classify WSIs
The patch-based approach requires a different conda environment from the two MIL approaches. Following the original CLAM repository, there are two options for tessellation: either save both the patch coordinates and the patch images, or save only the coordinates to economize storage space (especially for large datasets or multiple modified versions of the patches) and load the images on the fly during feature extraction (the so-called fp mode). Annotations must be given as coordinates at the highest magnification of the WSI. Simple annotations are accepted in TXT format and hierarchical annotations (for example, to exclude necrosis inside a tumor) in NPT format.
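Because annotations are expressed at the highest magnification, they can be compared directly with level-0 patch coordinates, and restricting patches to a tumoral region reduces to a point-in-polygon test. A minimal sketch, assuming the TXT or NPT annotation has already been parsed into (n, 2) vertex arrays in level-0 pixels and that a patch is kept when its center lies inside the tumor outline and outside every hole (e.g. necrosis); the center rule and the function names are hypothetical, not this repository's code.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting test; poly is a sequence of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):                         # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keep_patch(coord, patch_size, tumor, holes=()):
    """Keep a patch whose center is inside the tumor and outside every hole."""
    cx, cy = coord[0] + patch_size / 2, coord[1] + patch_size / 2  # coord = top-left corner
    if not point_in_polygon(cx, cy, tumor):
        return False
    return not any(point_in_polygon(cx, cy, hole) for hole in holes)
```

A stricter alternative would require a minimum fraction of the patch area to fall inside the region; the center rule is simply the cheapest test.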
- Patch based approach
  - fp
    - Without annotations: tutorial_patch-based_fp
    - With annotations: tutorial_patch-based_fp_anno
  - not fp
    - Without annotations: tutorial_patch-based
    - With annotations: tutorial_patch-based_anno
- Classic MIL
  - fp
    - Without annotations: tutorial_mil_fp
    - With annotations: tutorial_mil_fp_anno
  - not fp
    - Without annotations: tutorial_mil
    - With annotations: tutorial_mil_anno
- CLAM
  - fp
    - Without annotations: tutorial_clam_fp
    - With annotations: tutorial_clam_fp_anno
  - not fp
    - Without annotations: tutorial_clam
    - With annotations: tutorial_clam_anno
- Other settings: tutorial, including stain unmixing (and saffron removal), color normalization or data augmentation.
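In fp mode only the patch coordinates are stored, and each patch is cut from the slide when it is requested. A minimal sketch of such a lazy dataset, assuming a reader that follows OpenSlide's `read_region(location, level, size)` convention (level-0 top-left location, RGBA output); the class name is hypothetical and this is not the CLAM implementation.

```python
import numpy as np


class LazyPatchDataset:
    """Hypothetical fp-style dataset: keeps level-0 coordinates, reads pixels on demand."""

    def __init__(self, read_region, coords, patch_size, level=0):
        self.read_region = read_region          # e.g. openslide.OpenSlide(path).read_region
        self.coords = np.asarray(coords, dtype=int)
        self.patch_size = patch_size
        self.level = level

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, idx):
        x, y = self.coords[idx]
        size = (self.patch_size, self.patch_size)
        region = self.read_region((int(x), int(y)), self.level, size)
        return np.asarray(region)[..., :3]      # drop the alpha channel of the RGBA region
```

With OpenSlide installed, usage would look like `ds = LazyPatchDataset(openslide.OpenSlide("slide.svs").read_region, coords, 256)`, where `coords` comes from the tessellation step. Storage then stays proportional to the number of coordinates, at the cost of re-reading the slide each time a patch is loaded.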
