Atlas analysis for controlled-access datasets

This repo was initially created for the analysis of human RNA sequencing data from the European Genome-phenome Archive (EGA), but it will be extended to other sources.

For GTEx RNA-seq data, see https://github.com/ebi-gene-expression-group/atlas-gtex-bulk.

Prerequisites

Snakemake >= 7.25.3
SLURM cluster management and job scheduling system
Two scripts, located in the directory given by the private_script config option:
- ega_bulk_env.sh
- ega_bulk_init.sh
The iRAP human config file:
- homo_sapiens.conf
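
The prerequisites above can be checked before a workflow is launched. The following is a minimal sketch and is not part of this repository: the function names version_ge and check_prereqs are hypothetical, the version comparison assumes GNU sort (for its -V option), and finding sbatch on PATH is taken as a stand-in for "SLURM is available".

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of this repository.

# version_ge A B: succeeds when version A >= version B.
# Assumes GNU sort, whose -V option orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs PRIVATE_SCRIPT_DIR IRAP_CONFIG
# Reports every missing prerequisite on stderr and fails if any is missing.
# The body runs in a subshell "( ... )" so its variables do not leak to the caller.
check_prereqs() (
    private_script="$1"
    irap_config="$2"
    status=0

    if command -v snakemake >/dev/null 2>&1; then
        version="$(snakemake --version)"
        version_ge "$version" "7.25.3" ||
            { echo "Snakemake $version is older than 7.25.3" >&2; status=1; }
    else
        echo "snakemake not found on PATH" >&2
        status=1
    fi

    # Assumption: sbatch on PATH is taken as evidence of a usable SLURM setup.
    command -v sbatch >/dev/null 2>&1 ||
        { echo "sbatch (SLURM) not found on PATH" >&2; status=1; }

    for name in ega_bulk_env.sh ega_bulk_init.sh; do
        [ -f "$private_script/$name" ] ||
            { echo "missing script: $private_script/$name" >&2; status=1; }
    done

    [ -f "$irap_config" ] ||
        { echo "missing iRAP config: $irap_config" >&2; status=1; }

    exit "$status"
)
```

After sourcing the file, a call such as `check_prereqs /path-private_script/gitlab_scripts /path-to-config/homo_sapiens.conf` lists everything that is missing in one pass instead of stopping at the first problem.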

1. Analysis of EGA datasets

1.1 Data preparation

For EGA, download the data and arrange it for analysis as indicated here.

The data and metadata should be in the format:

data
    |- EGAD00001011134
      |- EGAF00008123877
        |- Sample-509_1.fastq.gz
        |- Sample-509_1.fastq.gz.md5
      |- ...
metadata
    |- EGAD00001011134.merged.csv
    |- EGAD00001011134.enaIds.txt

The .enaIds.txt file is provided by curators and contains two columns that map each EGA run ID to its matching ENA run ID.
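
The layout described above can be verified before running the workflow. The sketch below is illustrative only and not part of this repository; the workflow may already perform some of these checks. The function names check_layout and verify_md5 are hypothetical. It assumes that FASTQ files sit one directory below the dataset directory (as in the tree above), that the first field of each .md5 file is the hex digest, that the columns of .enaIds.txt are separated by whitespace, and that md5sum from GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical layout check; not part of this repository.

# verify_md5 FILE: compares FILE's MD5 digest with the first field of FILE.md5.
# Assumption: the .md5 file starts with the hex digest.
verify_md5() (
    expected="$(awk 'NR == 1 { print $1 }' "$1.md5" | tr 'A-F' 'a-f')"
    actual="$(md5sum "$1" | awk '{ print $1 }')"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# check_layout INPUT_PATH METADATA_PATH DATASET_ID
# Reports every problem on stderr and fails if any was found.
# The body runs in a subshell "( ... )" so its variables do not leak to the caller.
check_layout() (
    input_path="$1"
    metadata_path="$2"
    dataset_id="$3"
    status=0

    [ -d "$input_path/$dataset_id" ] ||
        { echo "missing dataset directory: $input_path/$dataset_id" >&2; status=1; }

    for file in "$metadata_path/$dataset_id.merged.csv" \
                "$metadata_path/$dataset_id.enaIds.txt"; do
        [ -f "$file" ] || { echo "missing metadata file: $file" >&2; status=1; }
    done

    # Each FASTQ file should have its .md5 file next to it.
    for fastq in "$input_path/$dataset_id"/*/*.fastq.gz; do
        [ -e "$fastq" ] || continue   # the glob matched nothing
        [ -f "$fastq.md5" ] ||
            { echo "missing checksum: $fastq.md5" >&2; status=1; }
    done

    # Assumption: two whitespace-separated columns per non-empty line.
    ids="$metadata_path/$dataset_id.enaIds.txt"
    if [ -f "$ids" ]; then
        awk 'NF > 0 && NF != 2 { bad = 1 } END { exit bad }' "$ids" ||
            { echo "expected two columns per line in $ids" >&2; status=1; }
    fi

    exit "$status"
)
```

For example, `check_layout /path-to-data/data /path-to-metadata/metadata EGADxxxxxxxxxx` checks one dataset, and `verify_md5` can then be called on each FASTQ file whose download needs confirming.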

Then run the Snakefile-ega workflow:

snakemake --restart-times 1 --keep-going \
  --profile slurm-profile \
  --latency-wait 150 -p --cores 1 \
  --config dataset_id=EGADxxxxxxxxxx \
      input_path=/path-to-data/data \
      metadata_path=/path-to-metadata/metadata \
  -s Snakefile-ega

1.2 Data analysis

The Snakefile-irap workflow validates the FASTQ files, runs iRAP, and prepares the results for aggregation:

snakemake --restart-times 1 --keep-going \
  --profile slurm-profile --latency-wait 150 -p --use-conda \
  --conda-frontend conda --conda-base-path /conda-base-path \
  --conda-prefix /conda-prefix-path/conda \
  --cores 1 \
  --config dataset_id=EGADxxxxxxxxxx \
    metadata_path=/path-to-metadata/metadata \
    read_type=pe \
    atlas_ca_root=/path-to-github-repo/atlas-ca-analysis \
    private_script=/path-private_script/gitlab_scripts \
    irap_config=/path-to-config/homo_sapiens.conf \
  -s Snakefile-irap

1.3 Library aggregation

Finally, collate the irap_single_lib results of the individual libraries by running:

scripts/aggregate_slurm.sh
