CRCbiome_virome_2023

This repository contains code for processing sequencing data and virus predictions, and for the data analysis used to generate the results in Istvan et al. For details on this analysis, see the preprint at medRxiv.

Bioinformatic pipeline

The input files for the pipeline are:

VirSorter output files (from sample-based assembly)
Contig statistics
Paired-end reads passing QC
A file with a list of sample names corresponding to the preceding files
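
Before starting a run, it can help to confirm that every sample in the sample list has all of its input files. The sketch below is one way to do that check. The directory layout and file-name patterns in EXPECTED_PATTERNS are hypothetical placeholders, not the names the pipeline actually expects, so they should be adjusted to match the pipeline configuration.

```python
from pathlib import Path

# Hypothetical path patterns (assumptions, not taken from this repository);
# edit them to match the layout your pipeline configuration expects.
EXPECTED_PATTERNS = {
    "virsorter": "virsorter/{sample}",
    "contig_stats": "contig_stats/{sample}.tsv",
    "reads_r1": "reads/{sample}_R1.fastq.gz",
    "reads_r2": "reads/{sample}_R2.fastq.gz",
}


def read_sample_names(sample_list):
    """Return the non-empty, whitespace-stripped lines of the sample list file."""
    with open(sample_list) as handle:
        return [line.strip() for line in handle if line.strip()]


def find_missing_inputs(sample_list, input_root, patterns=EXPECTED_PATTERNS):
    """Map each sample name to the kinds of input that are missing on disk.

    Samples with a complete set of inputs are left out of the result.
    """
    root = Path(input_root)
    missing = {}
    for sample in read_sample_names(sample_list):
        absent = [
            kind
            for kind, pattern in patterns.items()
            if not (root / pattern.format(sample=sample)).exists()
        ]
        if absent:
            missing[sample] = absent
    return missing
```

An empty dictionary returned by find_missing_inputs means that all listed samples have the four kinds of input in place.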

The bioinformatic analyses are written as a Snakemake pipeline for Linux and are intended to be run on an HPC cluster. Software dependencies are provided in conda environment specification files in workflow/envs/.
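
As an illustration of how such a pipeline is typically launched, the sketch below assembles a Snakemake command line that builds the conda environments on demand. The Snakefile location (workflow/Snakefile) and the core count are assumptions, not values confirmed by this repository. Cluster submission options are deliberately left out, since they depend on the Snakemake version and the scheduler in use.

```python
import shlex


def build_snakemake_command(snakefile="workflow/Snakefile", cores=8, dry_run=True):
    """Assemble a Snakemake invocation as an argument list.

    The default Snakefile path is an assumption based on the common
    Snakemake repository layout; pass the real path if it differs.
    """
    cmd = [
        "snakemake",
        "--snakefile", snakefile,
        "--use-conda",          # create environments from the conda spec files
        "--cores", str(cores),
    ]
    if dry_run:
        cmd.append("--dry-run")  # list the planned jobs without running them
    return cmd


# To print a shell-ready command line:
#   print(shlex.join(build_snakemake_command(cores=16, dry_run=False)))
```

Starting with a dry run is a cheap way to confirm that Snakemake can resolve all inputs before any cluster time is spent.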

A more user-friendly version derived from this pipeline is available as VirMake.

Scripts for post-processing analysis

R scripts for analysis of data are available in analyses/. Required R packages include tidyverse, vegan, ape, gridExtra, broom, effectsize, ggrepel, UpSetR, and MaAsLin2.

To test the scripts, datasets and scripts are available for generating 1000 mock samples that include information on a subset of the viral genomes identified in the study. The datasets required for running the mock analysis are available in the folder mock/from_real_data. To run the mock analysis, run the script analyses/scripts/for_manuscript/run_mock_analysis.R. The working directory for the script should be the repository folder. Running the mock analysis should take less than one hour on a regular desktop computer and will generate the statistical tests, figures and tables reported in the manuscript, albeit with limited associations because the input data are randomly generated.
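
Because the mock analysis script expects the repository folder as its working directory, a small wrapper can enforce that regardless of where it is called from. The following is a minimal sketch, assuming Rscript is on the PATH. The helper names are illustrative and not part of the repository.

```python
import subprocess
from pathlib import Path

# Script path as given in this README, relative to the repository folder.
MOCK_SCRIPT = "analyses/scripts/for_manuscript/run_mock_analysis.R"


def mock_analysis_command(repo_root):
    """Return (argument list, working directory) for the mock analysis.

    Raises FileNotFoundError if the script is not found under repo_root,
    which usually means repo_root is not the repository folder.
    """
    root = Path(repo_root).resolve()
    script = root / MOCK_SCRIPT
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is {root} the repository folder?")
    return ["Rscript", MOCK_SCRIPT], root


def run_mock_analysis(repo_root):
    """Run the mock analysis with the repository folder as working directory."""
    cmd, cwd = mock_analysis_command(repo_root)
    return subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_mock_analysis(".") from inside the repository folder is equivalent to running Rscript on the script path directly from that folder.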

This project is licensed under the MIT License; see the LICENCE file for details.