CGATOxford · sebastian-luna-valero · Nov 8, 2017 · Aug 22, 2017
diff --git a/CGATPipelines/pipeline_mapping.py b/CGATPipelines/pipeline_mapping.py
@@ -11,7 +11,7 @@
 Overview
 ========
 
-The pipeline implements various mappers and QC plots. It can be used for
+The pipeline implements various mappers. It can be used for
 
 * Mapping against a genome
 * Mapping RNASEQ data against a genome
@@ -23,9 +23,6 @@
 mapping
     perform all mappings
 
-qc
-    perform all QC steps
-
 full
     compute all mappings and QC
 

diff --git a/CGATPipelines/pipeline_peakcalling.py b/CGATPipelines/pipeline_peakcalling.py
@@ -118,43 +118,18 @@
 :doc:`pipeline_annotations`. Set the configuration variable
 :py:data:`annotations_database` and :py:data:`annotations_dir`.
 
-On top of the default CGAT setup, the pipeline requires the following
-software to be in the path:
-
-+---------+------------+------------------------------------------------+
-|*Program*|*Version*   |*Purpose*                                       |
-+---------+------------+------------------------------------------------+
-|samtools |>=0.1.16    |bam/sam file manipulation & stats               |
-+---------+------------+------------------------------------------------+
-|bedtools |            |working with intervals                          |
-+---------+------------+------------------------------------------------+
-|picard   |>=1.42      |duplication stats. The .jar files need to be in |
-|         |            | your CLASSPATH environment variable.           |
-+---------+------------+------------------------------------------------+
-|macs2	  |>=2.1.1.    |peakcalling                                 	|
-+---------+------------+------------------------------------------------+
-|Conda	  |	           |		?????????????		                  	|
-+---------+------------+------------------------------------------------+
-|python   |>= 3.0      |run IDR analysis - currently set up in a        |
-|         | 	       |conda enviroment that the pipeline calls	    |
-+---------+------------+------------------------------------------------+
-|IDR      |>= 2.0.2    |IDR analysis of peaks (bed files)               |
-|         |            |from: (https://github.com/nboley/idr)           |
-+---------+------------+------------------------------------------------+
-|R        |            | used for QC stats                              |
-+---------+------------+------------------------------------------------+
-|ChIPQC   |            |                                                |
-|R Package|            |                                                |
-+---------+------------+------------------------------------------------+
-|SICER
-+---------+------------+------------------------------------------------+
+The software environment is handled by the CGATPipelines conda environment
+and all software is installed as part of the installation process.
 
 Usage
 =====
 
 See :ref:`PipelineSettingUp` and :ref:`PipelineRunning` on general
 information how to use CGAT pipelines.
 
+See :ref:`Tutorials` for a comprehensive introduction to running a
+CGATPipeline.
+
 
 Pipeline Input
 ==============
@@ -167,8 +142,16 @@
 pipeline.ini = File containing paramaters and options for
 running the pipeline
 
-design.tsv = Design file based on design file for R package DiffBind
-Has the following collumns:
+design.tsv = This is a tab-separated file based on the design file for the R package
+DiffBind
+
+It has the following columns:
+
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
+|SampleID | Tissue | Factor | Condition | Treatment | Replicate | bamReads | ControlID | bamControl   |
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
+|F123     |blood   |H3K4    |normal     |NA         |1          |F123.bam  |           |F123_input.bam|
++---------+--------+--------+-----------+-----------+-----------+----------+-----------+--------------+
 
 
 Pipeline output

diff --git a/doc/BuildingPipelines.rst b/doc/BuildingPipelines.rst
@@ -6,12 +6,20 @@ The best way to build a pipeline is to start from an example. There are several
 pipelines available, see :ref:`cgatpipelines`. To start a new project, use 
 :file:`pipeline_quickstart.py`::
 
-   python <srcdir>pipeline_quickstart.py --set-name=test
+   cgatflow quickstart --set-name=test
 
-This will create a new directory called ``test`` in the current directory.
+This will create a report directory and an src directory.
 
-Another source of information is the script :file:`pipeline_template.py` in 
-the :term:`source directory`.
+If you navigate to the src directory, you will see two folders,
+``pipeline_docs/`` and ``pipeline_test/``, and a ``pipeline_test.py`` pipeline task file.
+
+In order to help with debugging and reading our code, our pipelines are written so that
+a pipeline task file contains Ruffus tasks and calls functions in an associated module file,
+which contains all of the code to transform and analyse the data.
+
+The module file is not generated when the pipeline_testing.py script is run. Therefore,
+if you wish to create a module file, save it following our usual naming convention,
+``PipelineTest.py``, and import it into the main pipeline task file (``pipeline_test.py``).
 
 This section describes how CGAT pipelines can be constructed using the
 :mod:`Pipeline` module. The Pipeline.py module contains a variety of
@@ -117,16 +125,14 @@ The pipeline will stop and return an error if the command exits with an error co
 
 If you chain multiple commands, only the return value of the last
 command is used to check for an error. Thus, if an upstream command
-fails, it will go unnoticed.  To detect these errors, insert the
-``checkpoint`` statement between commands. For example::
+fails, it will go unnoticed.  To detect these errors, insert
+``&&`` between commands. For example::
 
    @files( '*.unsorted.gz', suffix('.unsorted.gz'), '.sorted)
    def sortFile( infile, outfile ):
 
-       statement = '''gunzip %(infile)s %(infile)s.tmp; 
-                      checkpoint;
-		      sort -t %(tmpdir)s %(infile)s.tmp > %(outfile)s;
-		      checkpoint;
+       statement = '''gunzip < %(infile)s > %(infile)s.tmp &&
+		      sort -T %(tmpdir)s %(infile)s.tmp > %(outfile)s &&
 		      rm -f %(infile)s.tmp
        P.run()
 
@@ -514,36 +520,6 @@ directory`.
 
 .. _PipelinePublishing:
 
-Publishing data
-===============
-
-To publish data and a report, use the :meth:`Pipeline.publish_report`
-method, such as in the following task::
-
-   @follows( update_report )
-   def publish_report():
-       '''publish report.'''
-
-       E.info( "publishing report" )
-       P.publish_report()
-
-On publishing a report, the report (in the directory :file:`report`,
-specified by ``report_dir``) will get copied to the directory
-specified in the configuration value ``web_dir``. Also, all files in
-the :file:`export` directory will get copied over and links pointing
-to such files will be automatically corrected.
-
-The report will then be available at
-``http://www.cgat.org/downloads/%(project_id)s/report`` where
-``project_id`` is the unique identifier given to each project. It is
-looked up automatically, but the automatic look-up requires that the
-pipeline is executed within the :file:`/ifs/proj` directory.
-
-If the option *prefix* is given to publish_report, all output
-directories will be output prefixed by *prefix*. This is very useful
-if there is more than one report per project.
-
-See :meth:`Pipeline.publish_report` for more options.
 
 Checking requisites
 ===================

diff --git a/doc/CGATPipelines.rst b/doc/CGATPipelines.rst
@@ -1,51 +1,2 @@
-==================
-Inactive pipelines
-==================
 
-The pipelines are currently not being actively used. This might be
-because they have evolved into different pipelines. For example,
-pipeline_chipseq is now split into pipeline_mapping,
-pipeline_peakcalling and pipeline_intervals. In other cases, pipelines
-have addressed a specific issue but have not been reused since.
-
-The pipelines are listed below for completeness:
-
-Pipelines in development
-========================
-
-.. toctree::
-   :maxdepth: 1	
-
-   pipelines/pipeline_fusion.rst
-   pipelines/pipeline_benchmark_rnaseqmappers.rst
-   pipelines/pipeline_cufflinks_optimization.rst
-   pipelines/pipeline_mappability.rst
-   pipelines/pipeline_fastqToBigWig.rst
-   pipelines/pipeline_mapping_benchmark.rst
-   pipelines/pipeline_capseq.rst
-   pipelines/pipeline_expression.rst
-   pipelines/pipeline_transcriptome.rst
-   pipelines/pipeline_variant_annotation.rst
-   pipelines/pipeline_variants.rst
-   pipelines/pipeline_promotors.rst
-   pipelines/pipeline_transfacmatch.rst
-   pipelines/pipeline_exome_cancer.rst
-   pipelines/pipeline_genesets.rst
-   pipelines/pipeline_idr.rst
-   pipelines/pipeline_metagenomecommunities.rst
-   pipelines/pipeline_motifs.rst
-   pipelines/pipeline_rnaseqqc.rst
-   pipelines/pipeline_rrbs.rst
-   pipelines/pipeline_timeseries.rst
-
-
-Obsolete pipelines
-==================
-
-.. toctree::
-   :maxdepth: 1	
-
-   pipelines/pipeline_rnaseq.rst
-   pipelines/pipeline_chipseq.rst
-   pipelines/pipeline_medip.rst
 
diff --git a/doc/Developers.rst b/doc/Developers.rst
@@ -0,0 +1,37 @@
+Developers
+==========
+
+Andreas Heger
+
+Adam Cribbs
+
+Hania Pavlou
+
+David Sims
+
+Reshma Nibhani
+
+Sebastian Luna Valero
+
+Charlotte George
+
+Tom Smith
+
+Ian Sudbery
+
+Jakub Scaber
+
+Mike Morgan
+
+Katy Brown
+
+Nick Ilott
+
+Jethro Johnson
+
+Katherine Fawcett
+
+Steven Sansom
+
+Antonio Berlanga
+