Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge to nf-core dev branch #19

Merged
merged 7 commits into from
Apr 30, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions .github/CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@ We try to manage the required tasks for nf-core/hic using GitHub issues, you pro

However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;)

> If you need help using or modifying nf-core/hic then the best place to go is the Gitter chatroom where you can ask us questions directly: https://gitter.im/nf-core/Lobby
> If you need help using or modifying nf-core/hic then the best place to ask is on the pipeline channel on [Slack](https://nf-core-invite.herokuapp.com/).



## Contribution workflow
If you'd like to write some code for nf-core/hic, the standard workflow
Expand Down Expand Up @@ -42,4 +44,4 @@ If there are any failures then the automated tests fail.
These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code.

## Getting help
For further information/help, please consult the [nf-core/hic documentation](https://github.com/nf-core/hic#documentation) and don't hesitate to get in touch on [Gitter](https://gitter.im/nf-core/Lobby)
For further information/help, please consult the [nf-core/hic documentation](https://github.com/nf-core/hic#documentation) and don't hesitate to get in touch on the pipeline channel on [Slack](https://nf-core-invite.herokuapp.com/).
9 changes: 9 additions & 0 deletions .github/markdownlint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Markdownlint configuration file
default: true,
line-length: false
no-multiple-blanks: 0
blanks-around-headers: false
blanks-around-lists: false
header-increment: false
no-duplicate-header:
siblings_only: true
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ data/
results/
.DS_Store
tests/test_data
*.pyc
6 changes: 6 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ before_install:
# Pull the docker image first so the test doesn't wait for this
- docker pull nfcore/hic:dev
# Fake the tag locally so that the pipeline runs properly
# Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
- docker tag nfcore/hic:dev nfcore/hic:dev

install:
Expand All @@ -25,12 +26,17 @@ install:
- pip install nf-core
# Reset
- mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests
# Install markdownlint-cli
- sudo apt-get install npm && npm install -g markdownlint-cli

env:
- NXF_VER='0.32.0' # Specify a minimum NF version that should be tested and work
- NXF_VER='' # Plus: get the latest NF version and check that it works

script:
# Lint the pipeline code
- nf-core lint ${TRAVIS_BUILD_DIR}
# Lint the documentation
- markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
# Run the pipeline with the test profile
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
23 changes: 12 additions & 11 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,15 @@

## v1.0dev - 2019-04-09

First version of nf-core-hic pipeline which is a Nextflow implementation of the HiC-Pro pipeline [https://github.com/nservant/HiC-Pro].
Note that all HiC-Pro functionalities are not yet all implemented. The current version is designed for protocols based on restriction enzyme digestion.

In summary, this version allows :
* Automatic detection and generation of annotation files based on igenomes if not provided.
* Two-steps alignment of raw sequencing reads
* Reads filtering and detection of valid interaction products
* Generation of raw contact matrices for a set of resolutions
* Normalization of the contact maps using the ICE algorithm
* Generation of cooler file for visualization on higlass [https://higlass.io/]
* Quality report based on HiC-Pro MultiQC module
First version of nf-core-hic pipeline which is a Nextflow implementation of the [HiC-Pro pipeline](https://github.com/nservant/HiC-Pro/).
Note that all HiC-Pro functionalities are not yet all implemented. The current version is designed for protocols based on restriction enzyme digestion.

In summary, this version allows :

* Automatic detection and generation of annotation files based on igenomes if not provided.
* Two-steps alignment of raw sequencing reads
* Reads filtering and detection of valid interaction products
* Generation of raw contact matrices for a set of resolutions
* Normalization of the contact maps using the ICE algorithm
* Generation of cooler file for visualization on [higlass](https://higlass.io/)
* Quality report based on HiC-Pro MultiQC module
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ This Code of Conduct applies both within project spaces and in public spaces whe

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on the [Gitter channel](https://gitter.im/nf-core/Lobby). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-core-invite.herokuapp.com/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

Expand Down
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# ![nf-core/hic](docs/images/nfcore-hic_logo.png)
# nf-core/hic

**Analysis of Chromosome Conformation Capture data (Hi-C)**
**Analysis of Chromosome Conformation Capture data (Hi-C)**.

[![Build Status](https://travis-ci.org/nf-core/hic.svg?branch=master)](https://travis-ci.org/nf-core/hic)
[![Build Status](https://travis-ci.com/nf-core/hic.svg?branch=master)](https://travis-ci.com/nf-core/hic)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.32.0-brightgreen.svg)](https://www.nextflow.io/)

[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
Expand Down
17 changes: 0 additions & 17 deletions assets/email_template.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,23 +17,6 @@ ${errorReport}
} %>


<% if (!success){
out << """####################################################
## nf-core/hic execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:

${errorReport}
"""
} else {
out << "## nf-core/hic execution completed successfully! ##"
}
%>




The workflow was completed at $dateComplete (duration: $duration)

The command used to launch the workflow was as follows:
Expand Down
9 changes: 9 additions & 0 deletions assets/multiqc_config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/hic" target="_blank">nf-core/hic</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/nf-core/hic" target="_blank">documentation</a>.
report_section_order:
nf-core/hic-software-versions:
order: -1000

export_plots: true
31 changes: 28 additions & 3 deletions assets/sendmail_template.txt
Original file line number Diff line number Diff line change
@@ -1,11 +1,36 @@
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfmimeboundary"
Content-Type: multipart/related;boundary="nfcoremimeboundary"

--nfmimeboundary
--nfcoremimeboundary
Content-Type: text/html; charset=utf-8

$email_html

--nfmimeboundary--
<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>

--nfcoremimeboundary--
24 changes: 21 additions & 3 deletions bin/scrape_software_versions.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,22 @@
from collections import OrderedDict
import re

# TODO nf-core: Add additional regexes for new tools in process get_software_versions
# Add additional regexes for new tools in process get_software_versions
regexes = {
'nf-core/hic': ['v_pipeline.txt', r"(\S+)"],
'Nextflow': ['v_nextflow.txt', r"(\S+)"],
'Bowtie2': ['v_bowtie2.txt', r"Bowtie2 v(\S+)"],
'Python': ['v_python.txt', r"Python v(\S+)"],
'Samtools': ['v_samtools.txt', r"Samtools v(\S+)"],
'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"],
}
results = OrderedDict()
results['nf-core/hic'] = '<span style="color:#999999;\">N/A</span>'
results['Nextflow'] = '<span style="color:#999999;\">N/A</span>'
results['Bowtie2'] = '<span style="color:#999999;\">N/A</span>'
results['Python'] = '<span style="color:#999999;\">N/A</span>'
results['Samtools'] = '<span style="color:#999999;\">N/A</span>'
results['MultiQC'] = '<span style="color:#999999;\">N/A</span>'

# Search each file using its regex
for k, v in regexes.items():
Expand All @@ -20,9 +28,14 @@
if match:
results[k] = "v{}".format(match.group(1))

# Remove software set to false in results
for k in results:
if not results[k]:
del(results[k])

# Dump to YAML
print ('''
id: 'nf-core/hic-software-versions'
id: 'software_versions'
section_name: 'nf-core/hic Software Versions'
section_href: 'https://github.com/nf-core/hic'
plot_type: 'html'
Expand All @@ -31,5 +44,10 @@
<dl class="dl-horizontal">
''')
for k,v in results.items():
print(" <dt>{}</dt><dd>{}</dd>".format(k,v))
print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
print (" </dl>")

# Write out regexes as csv file:
with open('software_versions.csv', 'w') as f:
for k,v in results.items():
f.write("{}\t{}\n".format(k,v))
11 changes: 8 additions & 3 deletions conf/awsbatch.config
Original file line number Diff line number Diff line change
@@ -1,10 +1,15 @@
/*
* -------------------------------------------------
* Nextflow config file for AWS Batch
* Nextflow config file for running on AWS batch
* -------------------------------------------------
* Imported under the 'awsbatch' Nextflow profile in nextflow.config
* Uses docker for software depedencies automagically, so not specified here.
* Base config needed for running with -profile awsbatch
*/
params {
config_profile_name = 'AWSBATCH'
config_profile_description = 'AWSBATCH Cloud Profile'
config_profile_contact = 'Alexander Peltzer (@apeltzer)'
config_profile_url = 'https://aws.amazon.com/de/batch/'
}

aws.region = params.awsregion
process.executor = 'awsbatch'
Expand Down
11 changes: 5 additions & 6 deletions conf/base.config
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
/*
* -------------------------------------------------
* Nextflow base config file
* nf-core/hic Nextflow base config file
* -------------------------------------------------
* A 'blank slate' config file, appropriate for general
* use on most high performace compute environments.
Expand All @@ -11,21 +11,20 @@

process {

container = process.container

cpus = { check_max( 2, 'cpus' ) }
// Check the defaults for all processes
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
time = { check_max( 2.h * task.attempt, 'time' ) }

errorStrategy = { task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'terminate' }
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'

// Process-specific resource requirements
withName:makeBowtie2Index {
cpus = { check_max( 1, 'cpus' ) }
memory = { check_max( 10.GB * task.attempt, 'memory' ) }
time = { check_max( 12.h * task.attempt, 'time' ) }
time = { check_max( 12.h * task.attempt, 'time' ) }
}
withName:bowtie2_end_to_end {
cpus = { check_max( 4, 'cpus' ) }
Expand Down
4 changes: 2 additions & 2 deletions conf/igenomes.config
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ params {
}
'Gm01' {
fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa"
bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/genome"
bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/genome"
}
'Mmul_1' {
fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa"
Expand Down Expand Up @@ -96,7 +96,7 @@ params {
}
'AGPv3' {
fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa"
bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/genome"
bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/genome"
}
}
}
3 changes: 1 addition & 2 deletions conf/test.config
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ params {
max_cpus = 2
max_memory = 4.GB
max_time = 1.h

// Input data
readPaths = [
['SRR4292758_00', ['https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz']]
Expand All @@ -31,4 +31,3 @@ params {
// Options
skip_cool = true
}

1 change: 1 addition & 0 deletions docs/configuration/local.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.
First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)

Then, simply run the analysis pipeline:

```bash
nextflow run nf-core/hic -profile docker --genome '<genome ID>'
```
Expand Down
3 changes: 2 additions & 1 deletion docs/configuration/reference_genomes.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,11 +39,12 @@ Multiple reference index types are held together with consistent structure for m
We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is configured to use this by default.
The hosting fees for AWS iGenomes are currently kindly funded by a grant from Amazon.
The pipeline will automatically download the required reference files when you run the pipeline.
For more information about the AWS iGenomes, see https://ewels.github.io/AWS-iGenomes/
For more information about the AWS iGenomes, see [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/)

Downloading the files takes time and bandwidth, so we recommend making a local copy of the iGenomes resource.
Once downloaded, you can customise the variable `params.igenomes_base` in your custom configuration file to point to the reference location.
For example:

```nextflow
params.igenomes_base = '/path/to/data/igenomes/'
```
2 changes: 1 addition & 1 deletion docs/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ Be warned of two important points about this default configuration:
#### 3.1) Software deps: Docker
First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)

Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/hic).
Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from [dockerhub](https://hub.docker.com/r/nfcore/hic).

#### 3.1) Software deps: Singularity
If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative.
Expand Down
12 changes: 6 additions & 6 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ Singletons are discarded, and multi-hits are filtered according to the configura
Note that if the `--dnase` mode is activated, HiC-Pro will skip the second mapping step.

**Output directory: `results/mapping`**

* `*bwt2pairs.bam` - final BAM file with aligned paired data
* `*.pairstat` - mapping statistics

Expand All @@ -50,7 +50,7 @@ Invalid pairs are classified as follow:
* Dangling end, i.e. unligated fragments (both reads mapped on the same restriction fragment)
* Self circles, i.e. fragments ligated on themselves (both reads mapped on the same restriction fragment in inverted orientation)
* Religation, i.e. ligation of juxtaposed fragments
* Filtered pairs, i.e. any pairs that do not match the filtering criteria on inserts size, restriction fragments size
* Filtered pairs, i.e. any pairs that do not match the filtering criteria on inserts size, restriction fragments size
* Dumped pairs, i.e. any pairs for which we were not able to reconstruct the ligation product.

Only valid pairs involving two different restriction fragments are used to build the contact maps.
Expand All @@ -59,12 +59,12 @@ Duplicated valid pairs associated to PCR artefacts are discarded (see `--rm_dup`
In case of Hi-C protocols that do not require a restriction enzyme such as DNase Hi-C or micro Hi-C, the assignment to a restriction is not possible (see `--dnase`).
Short range interactions that are likely to be spurious ligation products can thus be discarded using the `--min_cis_dist` parameter.

* `*.validPairs` - List of valid ligation products
* `*.validPairs` - List of valid ligation products
* `*RSstat` - Statitics of number of read pairs falling in each category

The validPairs are stored using a simple tab-delimited text format ;

```
```bash
read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size / res frag name R1 / res frag R2 / mapping qual R1 / mapping qual R2 [/ allele_specific_tag]
```

Expand Down Expand Up @@ -102,7 +102,7 @@ A contact map is defined by :

Based on the observation that a contact map is symmetric and usually sparse, only non-zero values are stored for half of the matrix. The user can specified if the 'upper', 'lower' or 'complete' matrix has to be stored. The 'asis' option allows to store the contacts as they are observed from the valid pairs files.

```
```bash
A B 10
A C 23
B C 24
Expand All @@ -124,4 +124,4 @@ The pipeline has special steps which allow the software versions used to be repo
* `Project_multiqc_data/`
* Directory containing parsed statistics from the different tools used in the pipeline

For more information about how to use MultiQC reports, see http://multiqc.info
For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info)
Loading