Ac update docs #362

Merged
merged 40 commits into from
Nov 8, 2017
ded1e02
Have updated the documenation for using pipelines
Aug 22, 2017
4445d07
I have changed the theme of cgat report to make it easier for the use…
Aug 22, 2017
c359c79
I have made changes to the rst files to increase documentation
Aug 22, 2017
a543ba1
Added extra images to make documentation of pipelines
Aug 22, 2017
5abd060
I have added config for documentation
Aug 23, 2017
1cbe7c8
Have updated some of the documentation as per meeting points
Sep 12, 2017
3a01d98
I have updated some of the documentation
Oct 2, 2017
b1a2440
I have moved some of the prototype pipelines to the obsolete folder
Oct 2, 2017
0d2ae67
I have updated documentation with the new tutorials
Oct 3, 2017
4283462
updated bowtie2 tutorial
Oct 3, 2017
6adb35c
Updated tutorial options
Oct 3, 2017
9313df1
I have added small changes to the docs
Oct 3, 2017
e0ae7b6
Have updated tutorial documentation
Oct 4, 2017
b1bc81b
Have moved PipelineGO because it doesnt seem to be used by any other …
Oct 4, 2017
b0c9cd9
I have also moved PipelineUCSC to obsolete because it seems like it i…
Oct 4, 2017
76fff01
I have removed the PipelineKEGG to obselete because it is only used i…
Oct 4, 2017
d63e6bb
Accidentally moved PipelineUCSC to obsolete, moved back
Oct 4, 2017
c5dc7fe
Have moved metagenome communities pipeline to obsolete as it is a pro…
Oct 4, 2017
c974bec
I have removed the mapping banchmark pipeline to obseolete because it…
Oct 4, 2017
70c5e2d
I have moved pipeline_rnaseqtranscripts to obsolete because it is lis…
Oct 4, 2017
eee8a4f
I have moved prototype pipelines to obsolete
Oct 4, 2017
74f472c
Merge branch 'master' into AC-updateDocs
Oct 4, 2017
c2476ca
I have moved annotations pipeline as this will be releplaced with bam…
Oct 4, 2017
3b70d39
I have updated pipeline_peakcalling documentation
Oct 5, 2017
4cd9183
I have moved some of the pipelines back to the CGATPipelines folder
Oct 9, 2017
772ca74
bamstats tutorial
Oct 17, 2017
98097f8
Updated tutorials with bamstats
Oct 18, 2017
878983b
Merge branch 'master' into AC-updateDocs
Oct 23, 2017
9400e4e
updated report and background docs
Oct 23, 2017
7204b2b
updated report doc
Oct 24, 2017
e0395d8
Merge branch 'master' into AC-updateDocs
Oct 26, 2017
c6dce66
updated tutorial
Oct 26, 2017
d3dc663
More tutorial doc updates
Oct 26, 2017
73168ae
update installation instructions
sebastian-luna-valero Nov 8, 2017
4925842
update installation instructions
sebastian-luna-valero Nov 8, 2017
be480e4
update installation instructions
sebastian-luna-valero Nov 8, 2017
dab033d
update installation instructions
sebastian-luna-valero Nov 8, 2017
c37b5c9
update installation instructions
sebastian-luna-valero Nov 8, 2017
02f1a38
update developers section
sebastian-luna-valero Nov 8, 2017
6d9a881
update pipelines background
sebastian-luna-valero Nov 8, 2017
5 changes: 1 addition & 4 deletions CGATPipelines/pipeline_mapping.py
@@ -11,7 +11,7 @@
Overview
========

-The pipeline implements various mappers and QC plots. It can be used for
+The pipeline implements various mappers. It can be used for

* Mapping against a genome
* Mapping RNASEQ data against a genome
@@ -23,9 +23,6 @@
mapping
perform all mappings

-qc
-perform all QC steps
-
full
compute all mappings and QC

47 changes: 15 additions & 32 deletions CGATPipelines/pipeline_peakcalling.py
@@ -118,43 +118,18 @@
:doc:`pipeline_annotations`. Set the configuration variables
:py:data:`annotations_database` and :py:data:`annotations_dir`.

-On top of the default CGAT setup, the pipeline requires the following
-software to be in the path:
-
-+---------+------------+------------------------------------------------+
-|*Program*|*Version* |*Purpose* |
-+---------+------------+------------------------------------------------+
-|samtools |>=0.1.16 |bam/sam file manipulation & stats |
-+---------+------------+------------------------------------------------+
-|bedtools | |working with intervals |
-+---------+------------+------------------------------------------------+
-|picard |>=1.42 |duplication stats. The .jar files need to be in |
-| | | your CLASSPATH environment variable. |
-+---------+------------+------------------------------------------------+
-|macs2 |>=2.1.1. |peakcalling |
-+---------+------------+------------------------------------------------+
-|Conda | | ????????????? |
-+---------+------------+------------------------------------------------+
-|python |>= 3.0 |run IDR analysis - currently set up in a |
-| | |conda enviroment that the pipeline calls |
-+---------+------------+------------------------------------------------+
-|IDR |>= 2.0.2 |IDR analysis of peaks (bed files) |
-| | |from: (https://github.com/nboley/idr) |
-+---------+------------+------------------------------------------------+
-|R | | used for QC stats |
-+---------+------------+------------------------------------------------+
-|ChIPQC | | |
-|R Package| | |
-+---------+------------+------------------------------------------------+
-|SICER
-+---------+------------+------------------------------------------------+
+The software environment is handled by the CGATPipelines conda environment
+and all software is installed as part of the installation process.

Usage
=====

See :ref:`PipelineSettingUp` and :ref:`PipelineRunning` for general
information on how to use CGAT pipelines.

+See :ref:`Tutorials` for a comprehensive introduction to running a
+CGATPipeline.
+

Pipeline Input
==============
@@ -167,8 +142,16 @@
pipeline.ini = File containing parameters and options for
running the pipeline

-design.tsv = Design file based on design file for R package DiffBind
-Has the following collumns:
+design.tsv = This is a tab-separated file based on the design file for the R package
+DiffBind
+
+It has the following columns:
+
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
+|SampleID | Tissue | Factor | Condition | Treatment | Replicate | bamReads | ControlID | bamControl   |
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
+|F123     |blood   |H3K4    |normal     |NA         |1          |F123.bam  |           |F123_input.bam|
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
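The design file layout above can be sketched in code. This is a minimal illustration using only the Python standard library, not the pipeline's own parser: the helper names ``write_design`` and ``read_design`` are hypothetical, and the column list is copied from the table above. It writes the example row as tab-separated text and reads it back, checking that every expected column is present.

```python
import csv
import io

# Column names copied from the design table in the docstring.
COLUMNS = ["SampleID", "Tissue", "Factor", "Condition", "Treatment",
           "Replicate", "bamReads", "ControlID", "bamControl"]


def write_design(rows, handle):
    """Write design rows (a list of dicts) as tab-separated text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_design(handle):
    """Read a design file and fail early if a column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("design file is missing columns: %s"
                         % ", ".join(missing))
    return list(reader)


# The example row from the table; ControlID is left empty there.
example_row = {
    "SampleID": "F123", "Tissue": "blood", "Factor": "H3K4",
    "Condition": "normal", "Treatment": "NA", "Replicate": "1",
    "bamReads": "F123.bam", "ControlID": "",
    "bamControl": "F123_input.bam",
}

# An in-memory buffer stands in for design.tsv on disk.
buffer = io.StringIO()
write_design([example_row], buffer)
buffer.seek(0)
design = read_design(buffer)
```

Checking the header before the pipeline starts is a design choice of this sketch: a misspelled column name then fails immediately rather than partway through a run.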


Pipeline output
56 changes: 16 additions & 40 deletions doc/BuildingPipelines.rst
@@ -6,12 +6,20 @@ The best way to build a pipeline is to start from an example. There are several
pipelines available, see :ref:`cgatpipelines`. To start a new project, use
:file:`pipeline_quickstart.py`::

-python <srcdir>pipeline_quickstart.py --set-name=test
+cgatflow quickstart --set-name=test

-This will create a new directory called ``test`` in the current directory.
+This will create a report directory and an src directory.

-Another source of information is the script :file:`pipeline_template.py` in
-the :term:`source directory`.
+If you navigate to the src directory you will find two folders,
+``pipeline_docs/`` and ``pipeline_test/``, and a ``pipeline_test.py`` pipeline task file.
+
+To make our code easier to read and debug, our pipelines are written so that
+a pipeline task file contains the Ruffus tasks and calls functions in an associated module file,
+which contains all of the code to transform and analyse the data.
+
+The module file is not generated when the pipeline_testing.py script is run. Therefore,
+if you wish to create a module file, save it following the naming convention
+``PipelineTest.py`` so that it can be imported into the main pipeline task file (``pipeline_test.py``).
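The split between task file and module file can be sketched as follows. This is a single-file illustration under stated assumptions, not code generated by the quickstart: the function names ``count_lines`` and ``countLines`` are hypothetical, the Ruffus decorator is omitted to avoid the dependency, and a ``types.SimpleNamespace`` stands in for ``import PipelineTest`` so that the sketch runs on its own.

```python
import os
import tempfile
import types


# --- would live in PipelineTest.py (the module file) ---------------------
def count_lines(infile, outfile):
    """Count the lines in infile and write the number to outfile."""
    with open(infile) as inf:
        n = sum(1 for _ in inf)
    with open(outfile, "w") as outf:
        outf.write("%i\n" % n)
    return n


# Stand-in for ``import PipelineTest`` so the sketch is a single file.
PipelineTest = types.SimpleNamespace(count_lines=count_lines)


# --- would live in pipeline_test.py (the task file) ----------------------
# In a real pipeline this function would carry a Ruffus decorator; it is
# left undecorated here so the example has no third-party dependency.
def countLines(infile, outfile):
    """Task: a thin wrapper that delegates the work to the module file."""
    return PipelineTest.count_lines(infile, outfile)


# Small demonstration on a temporary file.
workdir = tempfile.mkdtemp()
infile = os.path.join(workdir, "sample.txt")
outfile = os.path.join(workdir, "sample.counts")
with open(infile, "w") as f:
    f.write("a\nb\nc\n")
n_lines = countLines(infile, outfile)
```

Keeping the task function thin means the analysis code in the module file can be imported and tested without running the pipeline.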

This section describes how CGAT pipelines can be constructed using the
:mod:`Pipeline` module. The Pipeline.py module contains a variety of
@@ -117,16 +125,14 @@ The pipeline will stop and return an error if the command exits with an error co

If you chain multiple commands, only the return value of the last
command is used to check for an error. Thus, if an upstream command
-fails, it will go unnoticed. To detect these errors, insert the
-``checkpoint`` statement between commands. For example::
+fails, it will go unnoticed. To detect these errors, insert
+``&&`` between commands. For example::

    @transform('*.unsorted.gz', suffix('.unsorted.gz'), '.sorted')
    def sortFile(infile, outfile):

-       statement = '''gunzip %(infile)s %(infile)s.tmp;
-       checkpoint;
-       sort -t %(tmpdir)s %(infile)s.tmp > %(outfile)s;
-       checkpoint;
+       statement = '''gunzip -c %(infile)s > %(infile)s.tmp &&
+       sort -T %(tmpdir)s %(infile)s.tmp > %(outfile)s &&
        rm -f %(infile)s.tmp'''
        P.run()
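The difference between ``;`` and ``&&`` can be checked directly. The sketch below is an illustration that does not use the Pipeline module: it assumes a POSIX shell is available, uses the standard ``false`` and ``true`` commands as stand-ins for a failing and a succeeding pipeline step, and the helper name ``exit_code`` is hypothetical.

```python
import subprocess


def exit_code(command):
    """Run command through the system shell and return its exit status."""
    return subprocess.run(command, shell=True).returncode


# `false` always fails and `true` always succeeds.
# With ';' the shell reports only the status of the last command,
# so the upstream failure is hidden.
status_semicolon = exit_code("false; true")

# With '&&' the chain stops at the first failure and reports it.
status_and = exit_code("false && true")

print("';' chain exit status:", status_semicolon)
print("'&&' chain exit status:", status_and)
```

The ``;`` chain reports success even though its first command failed, whereas the ``&&`` chain reports a non-zero status, which is what allows the pipeline to stop on an upstream error.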

@@ -514,36 +520,6 @@ directory`.

.. _PipelinePublishing:

-Publishing data
-===============
-
-To publish data and a report, use the :meth:`Pipeline.publish_report`
-method, such as in the following task::
-
-@follows( update_report )
-def publish_report():
-'''publish report.'''
-
-E.info( "publishing report" )
-P.publish_report()
-
-On publishing a report, the report (in the directory :file:`report`,
-specified by ``report_dir``) will get copied to the directory
-specified in the configuration value ``web_dir``. Also, all files in
-the :file:`export` directory will get copied over and links pointing
-to such files will be automatically corrected.
-
-The report will then be available at
-``http://www.cgat.org/downloads/%(project_id)s/report`` where
-``project_id`` is the unique identifier given to each project. It is
-looked up automatically, but the automatic look-up requires that the
-pipeline is executed within the :file:`/ifs/proj` directory.
-
-If the option *prefix* is given to publish_report, all output
-directories will be output prefixed by *prefix*. This is very useful
-if there is more than one report per project.
-
-See :meth:`Pipeline.publish_report` for more options.

Checking requisites
===================
49 changes: 0 additions & 49 deletions doc/CGATPipelines.rst
@@ -1,51 +1,2 @@
==================
Inactive pipelines
==================

The pipelines are currently not being actively used. This might be
because they have evolved into different pipelines. For example,
pipeline_chipseq is now split into pipeline_mapping,
pipeline_peakcalling and pipeline_intervals. In other cases, pipelines
have addressed a specific issue but have not been reused since.

The pipelines are listed below for completeness:

Pipelines in development
========================

.. toctree::
:maxdepth: 1

pipelines/pipeline_fusion.rst
pipelines/pipeline_benchmark_rnaseqmappers.rst
pipelines/pipeline_cufflinks_optimization.rst
pipelines/pipeline_mappability.rst
pipelines/pipeline_fastqToBigWig.rst
pipelines/pipeline_mapping_benchmark.rst
pipelines/pipeline_capseq.rst
pipelines/pipeline_expression.rst
pipelines/pipeline_transcriptome.rst
pipelines/pipeline_variant_annotation.rst
pipelines/pipeline_variants.rst
pipelines/pipeline_promotors.rst
pipelines/pipeline_transfacmatch.rst
pipelines/pipeline_exome_cancer.rst
pipelines/pipeline_genesets.rst
pipelines/pipeline_idr.rst
pipelines/pipeline_metagenomecommunities.rst
pipelines/pipeline_motifs.rst
pipelines/pipeline_rnaseqqc.rst
pipelines/pipeline_rrbs.rst
pipelines/pipeline_timeseries.rst


Obsolete pipelines
==================

.. toctree::
:maxdepth: 1

pipelines/pipeline_rnaseq.rst
pipelines/pipeline_chipseq.rst
pipelines/pipeline_medip.rst

37 changes: 37 additions & 0 deletions doc/Developers.rst
@@ -0,0 +1,37 @@
Developers
==========

Andreas Heger

Adam Cribbs

Hania Pavlou

David Sims

Reshma Nibhani

Sebastian Luna Valero

Charlotte George

Tom Smith

Ian Sudbery

Jakub Scaber

Mike Morgan

Katy Brown

Nick Ilott

Jethro Johnson

Katherine Fawcett

Steven Sansom

Antonio Berlanga
