src

This folder contains source code used by the repository.

R/Python scripts

  • DTox model

    • dtox.py trains and evaluates the DTox model.
      • dtox_data.py contains data-formatting functions used for DTox model training.
      • dtox_hierarchy.py contains functions used to process sorted DTox hierarchy files and compute model statistics.
      • dtox_nn.py contains functions used to build basic neural network structure for DTox model.
      • dtox_loss.py contains the loss function used in the DTox model.
      • early_stop.py contains the early-stopping function of the DTox model.
      • dtox_learning.py contains deep learning functions used in the DTox model construction.
      • run/run_dtox_implementation.R generates shell scripts that run the DTox model on Tox21 datasets under the Reactome pathway hierarchy.
      • run/run_dtox_shuffle.R generates scripts that run the DTox model on Tox21 datasets under a shuffled Reactome pathway hierarchy.
      • run/run_dtox_null.R generates shell scripts that run the DTox model on outcome-shuffled Tox21 datasets under the Reactome pathway hierarchy.
      • run/run_dtox_feature_shuffle.R generates shell scripts that run the DTox model on feature-shuffled Tox21 datasets under the Reactome pathway hierarchy.
    • predict_dtox.py applies a trained DTox model to predict outcome probability from input feature data.
    • interpret_dtox.py implements layer-wise relevance propagation to evaluate relevance of DTox paths.
      • dtox_lrp.py contains functions used for implementing LRP to evaluate relevance of DTox paths.
      • run/run_interpret_dtox.R generates shell scripts that run the DTox interpretation procedure on optimal models trained for Tox21 datasets.
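
For readers unfamiliar with layer-wise relevance propagation (LRP), the following is a minimal sketch of how relevance can be passed backward through a single fully connected layer. It is not taken from dtox_lrp.py: the function name `lrp_epsilon_dense`, the choice of the epsilon rule, and the bias-free layer are all assumptions made for illustration, and the actual scripts may use a different rule.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-6):
    """Propagate relevance backward through one dense layer (LRP-epsilon rule).

    Hypothetical helper, not part of the repository.
    activations:   list of n_in input activations a_i
    weights:       n_in x n_out nested list of weights w_ij (no bias term)
    relevance_out: list of n_out relevance scores R_j of the layer outputs
    """
    n_in, n_out = len(weights), len(weights[0])
    # z_j = sum_i a_i * w_ij: total contribution received by output neuron j
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    # s_j = R_j / (z_j + eps * sign(z_j)); eps avoids division by zero
    s = [relevance_out[j] / (z[j] + (eps if z[j] >= 0 else -eps))
         for j in range(n_out)]
    # R_i = a_i * sum_j w_ij * s_j: relevance assigned to input neuron i
    return [activations[i] * sum(weights[i][j] * s[j] for j in range(n_out))
            for i in range(n_in)]
```

With a small `eps` and no bias, the relevance summed over the inputs is approximately equal to the relevance summed over the outputs, which is what allows relevance to be traced along a path of connected modules.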
  • Simple machine learning model

    • simple/simple.py develops and evaluates a simple machine learning model (random forest or gradient boosting).
      • simple/simple_learning.py contains functions for building, evaluating, and implementing simple machine learning models.
      • run/run_simple.R generates shell scripts that run simple machine learning models on Tox21 datasets under different hyperparameter settings.
      • simple/interpret_by_lime.py implements the LIME technique to explain sample-level predictions of simple machine learning models.
  • Multi-layer perceptron neural network model

    • mlp/mlp.py develops and evaluates a fully connected multi-layer perceptron (MLP) neural network model with the same number of hidden layers and neurons as the matched DTox model.
      • mlp/mlp_learning.py contains functions used in the Multi-Layer Perceptron (MLP) neural network model.
      • run/run_mlp.R generates shell scripts that run fully connected MLP neural network models on Tox21 datasets, built with the same number of hidden layers and neurons as the matched DTox models.
  • Model performance analysis, comparison, and visualization

  • Model interpretation analysis and comparison

    • Interpretation result analysis
    • Interpretation validation by gene expression
      • analysis_expression/valid_interpret_by_expression.R uses LINCS perturbation gene expression data to validate whether significant DTox paths (identified from model interpretation) are differentially expressed after compound treatment, and compares the proportion of differential expression to that of background DTox paths.
      • analysis_expression/collect_valid_expression_results.R collects computed differential expression proportions of compounds from interpretation-validation result files, and performs a t-test to compare the proportion among significant DTox paths with that among background DTox paths.
      • analysis_expression/visualize_expression_validation.py uses scatter plots to visualize the gene expression validation of DTox interpretation results on Tox21 datasets, comparing the differential expression proportion of compounds among significant DTox paths vs. among background DTox paths.
      • analysis_expression/analyze_interpret_expression.R analyzes gene expression-validated DTox paths from model interpretation on Tox21 datasets, and identifies recurrent differentially expressed DTox paths among compounds.
      • analysis_expression/visualize_recurrent_path.py uses a bar plot to visualize the frequency of recurrent differentially expressed DTox paths from model interpretation results on the Tox21 dataset of interest.
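
The comparison of differential expression proportions described above can be illustrated with a short sketch. The README does not say which t-test variant is used, so this example assumes a paired test (one proportion for significant paths and one for background paths per compound). The function name `paired_t_statistic` is hypothetical and the code is not taken from the repository.

```python
import math
import statistics


def paired_t_statistic(significant, background):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    Hypothetical helper. Each position in the two lists is assumed to hold
    the proportions computed for the same compound.
    """
    # Per-compound difference between the two proportions
    diffs = [s - b for s, b in zip(significant, background)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 denominator)
    sd_diff = statistics.stdev(diffs)
    # t = mean difference divided by its standard error
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

A positive statistic would indicate that, on average, the proportion among significant paths exceeds the proportion among background paths for the same compound. A p-value can then be obtained from the t distribution with the returned degrees of freedom.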
    • Interpretation validation by standard pathway-receptor pattern
      • analysis_standard/valid_interpret_by_standard.R uses standard Reactome pathway-receptor patterns to validate whether significant DTox paths (identified from model interpretation) contain the particular pattern matched with each compound, and compares the observed outcome with the expected probability.
      • analysis_standard/collect_valid_standard_results.R collects the observed outcome and expected probability of compounds from interpretation-validation result files, then computes the observed and expected proportions of validated compounds based on the collected results.
      • analysis_standard/visualize_standard_validation.py uses a density plot and a bar plot to visualize the standard pattern validation of DTox interpretation results on Tox21 datasets, comparing the observed and expected proportions of validated compounds.
      • analysis_standard/interpret_by_read_across.R implements Read-across to connect query compounds with the query target based on their chemical similarity to source compounds in Drugbank/ComptoxAI.
      • analysis_standard/collect_target_standard_results.R collects the validation results by DTox, LIME, and Read-across regarding the interpretation task of connecting active compounds to their respective target receptor in four nuclear receptor assays.
      • analysis_standard/visualize_target_standard_validation.py uses line charts to visualize the validation performance comparison among DTox, LIME, and Read-across regarding the interpretation task of connecting active compounds to their respective target receptor in four nuclear receptor assays.
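
As a rough illustration of the Read-across idea mentioned above (scoring a query compound by its chemical similarity to source compounds), here is a minimal sketch. It assumes that compounds are represented as sets of fingerprint bit indices, that similarity is measured with the Tanimoto coefficient, and that the score is the highest similarity to any source compound of the target. None of these choices is confirmed by the README, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(fp_a | fp_b)
    # Two empty fingerprints are treated as having zero similarity
    return len(fp_a & fp_b) / union if union else 0.0


def read_across_score(query_fp, source_fps):
    """Highest similarity between the query and any source compound.

    Hypothetical helper: source_fps is an iterable of fingerprints for
    source compounds assumed to be linked to the target of interest.
    """
    return max((tanimoto(query_fp, fp) for fp in source_fps), default=0.0)
```

Under these assumptions, a query compound would be connected to the target when its score exceeds a chosen similarity threshold. The threshold itself is a design decision that this sketch leaves open.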
    • Interpretation analysis on HepG2 cell viability assay
      • analysis_viability/analyze_viability_path_assay.R analyzes DTox module relevance scores of viability-related pathways in the context of two viability-related assays (CASP3/7 apoptosis and mitochondria toxicity), compares pathway relevance scores between active and inactive compounds, then uses a survival plot to visualize the comparison.
      • analysis_viability/analyze_viability_path_map.R analyzes viability-related DTox paths from model interpretation results in the context of drug-induced liver injury (DILI) adverse events and ATC drug classification, evaluates the enrichment of DILI events/ATC drug classes among compounds identified with viability-related DTox paths, then visualizes the relationships between viability-related DTox paths and DILI events/ATC drug classes by heatmap.
      • analysis_viability/analyze_viability_network.R uses the visNetwork package to visualize the flow of relevance along viability-related DTox paths between the query compound, hidden pathway modules, and the HepG2 cell viability outcome.
      • analysis_viability/analyze_viability_new_path.R identifies the prevalent target proteins and lowest level pathways that are along the paths of cytotoxic compounds not linked to the viability-related pathways by DTox.
      • analysis_viability/visualize_new_path.py uses a bar plot to visualize the frequency of target proteins and lowest-level pathways among the cytotoxic compounds not linked to the viability-related pathways by DTox.
  • Model prediction

  • functions.R contains R functions required for other scripts in the repository.

Executable shell scripts