Update installation #363

Merged
merged 48 commits into from
Nov 8, 2017

sebastian-luna-valero (Member) commented Nov 1, 2017

Created scripts that scan the repository for Python, R and 3rd-party dependencies and generate a conda environment file from them. The output is kept under the folder "conda/environments".

The goal is to use conda environment files to deploy CGAT code everywhere. Hopefully this will offer portability (easier to install the code outside CGAT systems) and reproducibility (code and runtime environment under version control). Combining this with nosetests and pipeline_testing, we will be able to check in advance whether software updates will break existing code or produce different results.

Briefly, this is how it works:

  • R dependencies are parsed with: grep "library(r-package)"
  • Python dependencies are parsed with snakefood
  • 3rd-party programs are picked up, by walking the Abstract Syntax Tree, if they are called the CGAT way: statement = "commands". Anything called in a different way will not be picked up.
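
To make the first and third points concrete, here is a minimal sketch in Python. It is not the code in cgat_conda_deps.sh: the function names are hypothetical, the regular expression stands in for the grep call, and taking the first word of each pipe-separated command is a simplifying assumption (the result would still need mapping to conda package names). The source is only parsed, never executed.

```python
import ast
import re

# Hypothetical pattern standing in for: grep "library(r-package)"
R_LIBRARY = re.compile(r"library\(\s*['\"]?([A-Za-z][A-Za-z0-9._]*)['\"]?\s*\)")


def find_r_packages(source):
    """Return the R package names that appear as library(<name>) in the text."""
    return sorted(set(R_LIBRARY.findall(source)))


def find_statement_programs(source):
    """Return the programs called through `statement = "commands"` assignments.

    Only plain string literals assigned to a variable named `statement`
    are inspected; anything called in a different way is ignored.
    """
    programs = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "statement" not in names:
            continue
        value = node.value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            # Assumption: the first word of each pipe/semicolon-separated
            # command is the program being called.
            for command in re.split(r"\||;|&&", value.value):
                words = command.split()
                if words:
                    programs.add(words[0])
    return sorted(programs)
```

With this sketch, a program started through os.system("samtools index x.bam") would be missed, which matches the limitation described above.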

The approach is far from perfect, but it gives an automatic way of getting dependencies. Happy to hear about more advanced solutions for doing this.

During the Python 3 migration we shortlisted a subset of production pipelines, but not all of them are tested with Jenkins. Right now, the conda installation only covers the pipelines with Jenkins tests (sorry!), but I am happy to assist with the creation of new tests for the remaining pipelines.

Here is the list of pipelines with a supported conda installation:

  • annotations
  • enrichment
  • intervals
  • mapping
  • peakcalling
  • readqc
  • rnaseqde
  • rnaseqqc
  • scrnaseqqc
  • windows

Below is a list of the conda environments. Please let me know if you see missing dependencies, as my scripts might have skipped important ones.

  • Annotations:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline annotations
# on Wed Nov  1 15:32:32 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- pandas
- pika
- pybigwig
- python-drmaa
- rpy2
- ruffus
- sqlalchemy
# R dependencies
- r-base
- bioconductor-kegg.db
# Misc dependencies
- bedtools
- htslib
- nomkl
- zlib
  • Enrichment:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline enrichment
# on Wed Nov  1 15:34:22 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- matplotlib
- numpy
- pandas
- pika
- python-drmaa
- rpy2
- ruffus
- scipy
- sqlalchemy
- toposort
# R dependencies
- r-base
- bioconductor-hpar
# Misc dependencies
- nomkl
- zlib
  • Intervals:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline intervals
# on Wed Nov  1 15:36:41 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- numpy
- pandas
- pika
- pybedtools
- pysam
- python-drmaa
- ruffus
- sqlalchemy
# Misc dependencies
# WARNING: macs2 is Py2 only. Please install it on a separate conda env
# WARNING: meme is Py2 only. Please install it on a separate conda env
# WARNING: sicer is Py2 only. Please install it on a separate conda env
- bedtools
- gat
- htslib
- nomkl
- peakranger
- picard
- samtools
- ucsc-bedgraphtobigwig
- zlib
  • Mapping:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline mapping
# on Wed Nov  1 15:37:50 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- numpy
- pandas
- pika
- python-drmaa
- ruffus
- sqlalchemy
# Misc dependencies
# WARNING: tophat is Py2 only. Please install it on a separate conda env
- bedtools
- bismark
- bowtie
- bowtie2
- bwa
- cufflinks
- fastq-screen
- fastqc
- gat
- gmap
- hisat2
- htslib
- kallisto
- nomkl
- picard
- samtools
- shortstack
- star
- ucsc-bedgraphtobigwig
- ucsc-gtftogenepred
- zlib
  • Peakcalling:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline peakcalling
# on Wed Nov  1 15:38:57 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- matplotlib
- numpy
- pandas
- pika
- pysam
- python-drmaa
- rpy2
- ruffus
- seaborn
- sqlalchemy
# R dependencies
- r-base
- bioconductor-chipqc
# Misc dependencies
# WARNING: macs2 is Py2 only. Please install it on a separate conda env
# WARNING: sicer is Py2 only. Please install it on a separate conda env
- bedtools
- coreutils
- htslib
- jupyter
- nomkl
- picard
- samtools
- ucsc-bedgraphtobigwig
- zlib
  • Readqc:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline readqc
# on Wed Nov  1 15:40:04 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- pandas
- pika
- python-drmaa
- ruffus
- six
- sqlalchemy
# Misc dependencies
# WARNING: tophat is Py2 only. Please install it on a separate conda env
- bismark
- bowtie
- bowtie2
- bwa
- cufflinks
- fastq-screen
- fastqc
- fastx_toolkit
- gmap
- hisat2
- kallisto
- nomkl
- samtools
- shortstack
- star
- zlib
  • RNAseq differential expression:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline rnaseqde
# on Wed Nov  1 15:42:21 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- numpy
- pandas
- pika
- python-drmaa
- rpy2
- ruffus
- sqlalchemy
# R dependencies
- r-base
- r-hiddenmarkov
- r-mass
- r-rcolorbrewer
- r-wasabi
- bioconductor-cummerbund
# Misc dependencies
- cufflinks
- kallisto
- nomkl
- sailfish
- salmon
- samtools
- stringtie
- zlib
  • RNAseq QC:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline rnaseqqc
# on Wed Nov  1 15:43:21 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- matplotlib
- numpy
- pandas
- pika
- python-drmaa
- rpy2
- ruffus
- scipy
- seaborn
- sqlalchemy
# R dependencies
- r-base
- r-ggplot2
- r-gmd
- r-hmisc
- r-rcolorbrewer
- r-reshape2
# Misc dependencies
# WARNING: tophat is Py2 only. Please install it on a separate conda env
- bedtools
- bismark
- bowtie
- bowtie2
- bwa
- cufflinks
- fastq-screen
- fastqc
- gat
- gmap
- hisat2
- htslib
- kallisto
- nomkl
- picard
- sailfish
- salmon
- samtools
- shortstack
- star
- ucsc-gtftogenepred
- zlib
  • Single-Cell RNAseq QC:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline scrnaseqqc
# on Wed Nov  1 15:44:23 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- pika
- python-drmaa
- ruffus
- sqlalchemy
# Misc dependencies
- nomkl
- picard
- subread
- zlib
  • Windows:
# output generated by /ifs/devel/sebastian/py35-v1/CGATPipelines/scripts/cgat_conda_deps.sh --pipeline windows
# on Wed Nov  1 15:45:25 GMT 2017

name: cgat-p

channels:
- bioconda
- conda-forge
- defaults

dependencies:
# python dependencies
- python
- brewer2mpl
- numpy
- pandas
- pika
- python-drmaa
- rpy2
- ruffus
- sqlalchemy
# Misc dependencies
- bedtools
- gat
- htslib
- nomkl
- picard
- samtools
- ucsc-bedgraphtobigwig
- ucsc-bedtobigbed
- zlib
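
For reference, the layout shared by the files above can be reproduced with a short Python sketch. The function name and its parameters are hypothetical and this is not the code in cgat_conda_deps.sh; it only assumes the conventions visible in the listings (fixed channels, "python" first, "r-base" added when R packages are present, and Py2-only tools emitted as warnings rather than dependencies).

```python
def render_environment(python_deps, r_deps=(), misc_deps=(), py2_only=(), name="cgat-p"):
    """Render a conda environment file in the layout used in this thread."""
    lines = [
        "name: " + name,
        "",
        "channels:",
        "- bioconda",
        "- conda-forge",
        "- defaults",
        "",
        "dependencies:",
        "# python dependencies",
        "- python",
    ]
    lines += ["- " + dep for dep in sorted(python_deps)]
    if r_deps:
        # R packages keep the caller's order (r-* first, then bioconductor-*).
        lines += ["# R dependencies", "- r-base"]
        lines += ["- " + dep for dep in r_deps]
    lines.append("# Misc dependencies")
    for tool in sorted(py2_only):
        # Py2-only tools are reported, not installed in this environment.
        lines.append(
            "# WARNING: %s is Py2 only. Please install it on a separate conda env" % tool
        )
    lines += ["- " + dep for dep in sorted(misc_deps)]
    return "\n".join(lines) + "\n"
```

A file written this way can then be installed with `conda env create -f <file>`.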

AndreasHeger (Member) commented Nov 3, 2017

😄 thanks, very useful

@sebastian-luna-valero sebastian-luna-valero merged commit 3a39da2 into master Nov 8, 2017
@sebastian-luna-valero sebastian-luna-valero deleted the SLV-update-install branch November 8, 2017 15:44