hcrs_omics

R package

Under construction: a user-friendly interface to HCR and HCS that allows the use of custom clustering functions. A link to the development version of the package will be placed here.

License

The software is licensed under the GNU General Public License, version 3.

How to run using docker

If the Docker daemon is not already running, start it in a separate terminal window:

sudo dockerd

Then run the following commands:

git clone https://github.com/p100mma/hcrs_omics
cd hcrs_omics
sudo docker build -t hcrs .
sudo docker run -it -v .:/home/hcrs_omics hcrs
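Older Docker releases may reject a relative host path such as `.` in the `-v` option. If the `docker run` line above fails with a volume-related error, one possible workaround is to build the mount from the absolute path of the current directory. The sketch below only prints the resulting command so it can be inspected first; `mount_spec` is an illustrative variable name, not part of the repository.

```shell
#!/bin/sh
# Run from inside the cloned hcrs_omics directory.
# mount_spec is an illustrative name: <absolute host path>:<container path>.
mount_spec="$(pwd):/home/hcrs_omics"

# Print the command for inspection; remove the leading echo to execute it.
echo sudo docker run -it -v "$mount_spec" hcrs
```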

This opens a minimal environment with all necessary R packages and MCL installed. To run a script, execute:

Rscript --no-save <script_name.R>

Uses:

MCL software (ver 14-137):

https://micans.org/mcl/src/mcl-14-137.tar.gz

R version 4.3.1

R packages used:

Hmisc
Matrix
data.table
rmetalog
DynamicTreeCut

Input data

gene_expr_data.rds - gene expression profiles of 1394 breast cancer patients, 8673 genes. Original source:

Pereira B et al. The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nature Communications, 2016; 7.1.
Preprocessed and prefiltered as in:
Polewko-Klim A, Mnich K, Rudnicki W. Robust Data Integration Method for Classification of Biomedical Data. Journal of Medical Systems, 2021; 45.

KIRC_gene_expr_data.rds - data for additional tests, from the TCGA kidney cancer study. In our study we model the profiles of 605 samples, limited to the 15166 genes with the highest variance. Sources:

Peng L, et al. Large-scale RNA-seq transcriptome analysis of 4043 cancers and 548 normal tissue controls across 12 TCGA cancer types. Scientific Reports, 2015; 5.
Initial basic processing: Polewko-Klim A, Rudnicki W. Analysis of ensemble feature selection for correlated high-dimensional RNA-seq cancer data. In: V Krzhizhanovskaya, et al. (eds.), Computational Science - ICCS 2020. Cham: Springer International Publishing, 2020, 525–538.

Code

Functions doing most of the work:

blockwisePCA_R_engine.R

Experiment I (breast cancer data, run in this order):

Note: for BRCA dataset, 16 GB RAM is sufficient.

BRCA_decomposition.R
metalogs.R*
BRCA_simulation.R
WGCNA_clustering.R
WGCNA_simulation.R
plain_SVD.R

*metalogs.R should be run once for each command-line argument value from 1 to 70:

Rscript --no-save metalogs.R 1
#(...)
Rscript --no-save metalogs.R 70

We have used an HPC system to run this part in parallel. To run sequentially through the docker image provided, one can execute run_metalogs.script (it might take 20-30 minutes that way):

./run_metalogs.script
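The contents of run_metalogs.script are not reproduced here. As a rough sketch only, a sequential run over all 70 indices could look like the loop below. The names `run_range` and `RUNNER` are hypothetical and do not come from the repository. By default the loop performs a dry run that just prints each command; setting `RUNNER="Rscript --no-save"` would execute them.

```shell
#!/bin/sh
# RUNNER defaults to a dry run that only prints each command.
# Set RUNNER="Rscript --no-save" to actually execute the scripts.
RUNNER="${RUNNER:-echo Rscript --no-save}"

# run_range SCRIPT FIRST LAST
# Calls SCRIPT once per index, in order, and stops at the first failure.
run_range() {
    script="$1"; first="$2"; last="$3"
    i="$first"
    while [ "$i" -le "$last" ]; do
        # RUNNER is left unquoted on purpose so it splits into command + flags.
        $RUNNER "$script" "$i" || return 1
        i=$((i + 1))
    done
}

run_range metalogs.R 1 70
```

The same helper could be reused for the other indexed scripts, for example `run_range BRCA_nPC.R 2 5`.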

Experiment II (BRCA):

BRCA_nPC.R - should be run with command line argument ranging from 2 to 5:

  Rscript --no-save BRCA_nPC.R 2
  #(...)
  Rscript --no-save BRCA_nPC.R 5

We have used an HPC system to run this part in parallel. Again, to run sequentially through the provided Docker image, execute run_vary_nPC.script. This may take a while.

   ./run_vary_nPC.script
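On a workstation without an HPC scheduler, the four runs could in principle also be started concurrently. The sketch below uses `xargs -P`, which GNU and BSD xargs support but POSIX does not require. It assumes the runs are independent of each other, as the parallel HPC usage suggests, though this is not verified here. `JOBS`, `RUNNER` and `run_parallel` are hypothetical names. Each concurrent run needs its own memory, so `JOBS` should stay small when RAM is limited.

```shell
#!/bin/sh
# JOBS: how many runs to start at once (hypothetical knob, default 2).
JOBS="${JOBS:-2}"
# RUNNER defaults to a dry run that only prints each command.
# Set RUNNER="Rscript --no-save" to actually execute the scripts.
RUNNER="${RUNNER:-echo Rscript --no-save}"

# run_parallel SCRIPT FIRST LAST
run_parallel() {
    script="$1"; first="$2"; last="$3"
    # seq emits one index per line; xargs substitutes each index for {}.
    seq "$first" "$last" | xargs -P "$JOBS" -I{} $RUNNER "$script" {}
}

run_parallel BRCA_nPC.R 2 5
```

Because the runs finish in no particular order, the printed or logged lines may appear out of sequence.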

TOM hierarchical clustering plots (BRCA, run after executing Experiment I scripts):

tom.R
tomPlots.R - generates .jpg files in the main repo directory.

Summary of results (BRCA, run after Experiment I & II):

computation_heavy_metrics.R
nPC_comparison_table.R*
sim_comparison_table.R
concat_tables.R - generates .csv table in the vary_nPC folder.

*should be run with command line argument ranging from 2 to 5:

    Rscript --no-save nPC_comparison_table.R 2
    #(...)
    Rscript --no-save nPC_comparison_table.R 5

We ran this part in parallel on an HPC system. Alternatively, run_comp_tables.script executes these commands sequentially, which may take some time.

    ./run_comp_tables.script

Topology characteristics plots & KS distances (BRCA; the latter are in the supplementary material):

topology_plots_KSdistances.R* - generates .jpg files (plots) and .csv file (table of KS distances), all of them in the main directory.

*Should be run after Experiment I and computation_heavy_metrics.R

Follow up test on KIRC dataset:

Note: The functions are not yet optimized to process data piece-wise under memory constraints. Because this dataset has much higher dimensionality, we used a machine with 60 GB of RAM to process it. If you want to replicate the results on a machine with less memory, it might not be sufficient.
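Before starting the KIRC scripts, it may be worth checking how much memory the machine reports. The sketch below reads `MemTotal` from `/proc/meminfo`, so it assumes a Linux system. Inside a Docker container this file typically shows the host's memory rather than any container limit. `mem_total_gb`, `check_memory` and `REQUIRED_GB` are hypothetical names, and the 60 GB threshold simply mirrors the machine mentioned above rather than a measured minimum.

```shell
#!/bin/sh
# Total RAM in whole GiB, read from a meminfo-style file (MemTotal is in kB).
# Defaults to /proc/meminfo, which exists on Linux.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Threshold mirroring the 60 GB machine mentioned above; adjust as needed.
REQUIRED_GB="${REQUIRED_GB:-60}"

# check_memory [MEMINFO_FILE]
# Returns 0 if enough RAM is reported, 1 if not, 2 if it cannot be determined.
check_memory() {
    total=$(mem_total_gb "$1")
    if [ -z "$total" ]; then
        echo "could not determine total memory"
        return 2
    fi
    if [ "$total" -lt "$REQUIRED_GB" ]; then
        echo "only ${total} GB RAM found, ${REQUIRED_GB} GB suggested"
        return 1
    fi
    echo "${total} GB RAM found"
}
```

A possible use is `check_memory && Rscript --no-save KIRC_decomposition.R`, which starts the first KIRC script only if the check passes.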

KIRC_decomposition.R
KIRC_metalogs.R*
KIRC_simulation.R
KIRC_WGCNA_clustering.R
KIRC_WGCNA_simulation.R
KIRC_plain_SVD.R
KIRC_tom.R
KIRC_computation_heavy_metrics.R
KIRC_tomPlots.R
KIRC_topology_plots.R
KIRC_sim_comparison_table.R - generates the additional 4 rows of Table 1 in the main manuscript

*KIRC_metalogs.R should be run with command line argument ranging from 1 to 93:

Rscript --no-save KIRC_metalogs.R 1
#(...)
Rscript --no-save KIRC_metalogs.R 93

We have used an HPC system to run this part in parallel. To run sequentially through the docker image provided, one can execute run_metalogs.script (it might take 20-30 minutes that way):

./KIRC_metalogs.script

The scripts above generate outputs similar to the BRCA-based ones, but each created file has the prefix KIRC_.
