generated from CCBR/CCBR_NextflowTemplate
Merge pull request #105 from CCBR/paired-end
Support paired-end reads and custom genomes
Showing 49 changed files with 1,608 additions and 223 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
I chrI | ||
II chrII | ||
III chrIII | ||
IV chrIV | ||
IX chrIX | ||
MT chrM | ||
V chrV | ||
VI chrVI | ||
VII chrVII | ||
VIII chrVIII | ||
X chrX | ||
XI chrXI | ||
XII chrXII | ||
XIII chrXIII | ||
XIV chrXIV | ||
XV chrXV | ||
XVI chrXVI |
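
The table above maps Ensembl-style contig names (`I`, `MT`) to UCSC-style names (`chrI`, `chrM`). A minimal sketch of how such a two-column map could be applied to tab-separated records follows. The function names `load_contig_map` and `rename_contigs` are hypothetical and not taken from the pipeline, and the whitespace-delimited layout is an assumption based on how the file appears here.

```python
def load_contig_map(lines):
    """Parse 'old new' pairs, one per line, split on any whitespace."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        old, new = fields
        mapping[old] = new
    return mapping


def rename_contigs(records, mapping):
    """Rename the first (contig) column of tab-separated records.

    Contigs that are missing from the map are passed through unchanged.
    """
    renamed = []
    for record in records:
        contig, sep, rest = record.partition("\t")
        renamed.append(mapping.get(contig, contig) + sep + rest)
    return renamed


# Hypothetical inputs for illustration only.
contig_map = load_contig_map(["I chrI", "MT chrM", ""])
bed_lines = ["I\t100\t200", "MT\t5\t50", "2-micron\t1\t10"]
renamed_lines = rename_contigs(bed_lines, contig_map)
```

Passing unknown contigs through unchanged is a design choice for this sketch; a stricter version could raise an error instead.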
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# This is a configuration file for fastq_screen | ||
|
||
############## | ||
## Databases # | ||
############## | ||
## This section allows you to configure multiple databases | ||
## to search against in your screen. For each database | ||
## you need to provide a database name (which can't contain | ||
## spaces) and the location of the bowtie indices which | ||
## you created for that database. | ||
## | ||
## The entries shown below are only suggested examples, you | ||
## can add as many DATABASE sections as required, and you | ||
## can comment out or remove as many of the existing entries | ||
## as desired. | ||
## | ||
## Either the original bowtie or bowtie2 may be used for the | ||
## mapping. Specify the aligner to use with the command line | ||
## flag --aligner with arguments 'bowtie' or | ||
## 'bowtie2' (default). | ||
## | ||
## The configuration file may list paths to both bowtie and | ||
## bowtie2 indices. FastQ Screen automatically detects whether | ||
## a specified index is compatible with bowtie or bowtie2. | ||
## | ||
## Although the configuration file may list paths to both | ||
## bowtie and bowtie2 indices, only one aligner will be used | ||
## for the mapping, as specified by the --aligner flag. | ||
## | ||
## The path to the index files SHOULD INCLUDE THE BASENAME of | ||
## the index, e.g: | ||
## /data/public/Genomes/Human_Bowtie/GRCh37/Homo_sapiens.GRCh37 | ||
## Thus, the indices (Homo_sapiens.GRCh37.1.bt2, Homo_sapiens.GRCh37.2.bt2, etc.) | ||
## are found in a folder named 'GRCh37'. | ||
## | ||
## If the bowtie AND bowtie2 indices of a given genome reside in the SAME FOLDER, | ||
## a SINGLE path may be provided to BOTH sets of indices. | ||
## | ||
## Human - sequences available from | ||
## ftp://ftp.ensembl.org/pub/current/fasta/homo_sapiens/dna/ | ||
DATABASE rRNA fastq_screen_db/rRNA/rRNA |
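
The comments above describe `DATABASE` entries as a name (without spaces) followed by a path that includes the index basename. As a rough illustration, the sketch below collects those entries from the config text. It is not FastQ Screen's own parser, and `parse_databases` is a hypothetical helper name.

```python
def parse_databases(config_text):
    """Return {database_name: index_basename} for every DATABASE line."""
    databases = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        fields = line.split()
        if fields[0] == "DATABASE" and len(fields) >= 3:
            databases[fields[1]] = fields[2]
    return databases


example_config = """
## Databases
DATABASE rRNA fastq_screen_db/rRNA/rRNA
"""
databases = parse_databases(example_config)
```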
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
sample,fastq_1,fastq_2,antibody,control | ||
-SPT5_T0_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,,SPT5,SPT5_INPUT_REP1 | ||
+SPT5_T0_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822153_2.fastq.gz,SPT5,SPT5_INPUT_REP1 | ||
SPT5_T0_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822154_1.fastq.gz,,SPT5,SPT5_INPUT_REP2 | ||
-SPT5_T15_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,,SPT5,SPT5_INPUT_REP1 | ||
+SPT5_T15_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822157_2.fastq.gz,SPT5,SPT5_INPUT_REP1 | ||
SPT5_T15_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/testdata/SRR1822158_1.fastq.gz,,SPT5,SPT5_INPUT_REP2 | ||
-SPT5_INPUT_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,,, | ||
+SPT5_INPUT_REP1,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204809_Spt5-ChIP_Input1_SacCer_ChIP-Seq_ss100k_R2.fastq.gz,, | ||
SPT5_INPUT_REP2,https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/testdata/SRR5204810_Spt5-ChIP_Input2_SacCer_ChIP-Seq_ss100k_R1.fastq.gz,,, |
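
The samplesheet mixes rows with and without a `fastq_2` value, so a consumer has to decide per sample whether it is single-end or paired-end. The sketch below shows one way to do that with the standard `csv` module. It is only an illustration: `read_samplesheet` and the `single_end` key are hypothetical names, and the file names in the example are placeholders rather than the real test data.

```python
import csv
import io


def read_samplesheet(text):
    """Parse samplesheet CSV text and flag each row as single- or paired-end."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # An empty fastq_2 column is treated as a single-end sample.
        row["single_end"] = not row["fastq_2"]
        rows.append(row)
    return rows


# Placeholder rows that mirror the column layout of the samplesheet above.
sheet_text = """sample,fastq_1,fastq_2,antibody,control
A_REP1,a_1.fastq.gz,a_2.fastq.gz,SPT5,INPUT_REP1
A_REP2,b_1.fastq.gz,,SPT5,INPUT_REP2
INPUT_REP1,c_1.fastq.gz,c_2.fastq.gz,,
"""
samples = read_samplesheet(sheet_text)
layout = {row["sample"]: row["single_end"] for row in samples}
```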
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
#!/usr/bin/env python | ||
|
||
""" | ||
source https://github.com/CCBR/Pipeliner/blob/86c6ccaa3d58381a0ffd696bbf9c047e4f991f9e/Results-template/Scripts/bam_filter_by_mapq.py | ||
""" | ||
|
||
import pysam, sys | ||
import argparse | ||
|
||
parser = argparse.ArgumentParser(description="filter PE bamfile by mapQ values") | ||
parser.add_argument("-i", dest="inBam", required=True, help="Input Bam File") | ||
parser.add_argument("-o", dest="outBam", required=True, help="Output Bam File") | ||
parser.add_argument( | ||
"-q", | ||
dest="mapQ", | ||
type=int, | ||
required=False, | ||
help="mapQ value ... default 6", | ||
default=6, | ||
) | ||
args = parser.parse_args() | ||
samfile = pysam.AlignmentFile(args.inBam, "rb") | ||
mapq = dict() | ||
for read in samfile.fetch(): | ||
    if read.is_unmapped: | ||
        continue | ||
    if read.is_supplementary: | ||
        continue | ||
    if read.is_secondary: | ||
        continue | ||
    if read.is_duplicate: | ||
        continue | ||
    if read.is_proper_pair: | ||
        if read.mapping_quality < args.mapQ and read.query_name in mapq: | ||
            del mapq[read.query_name] | ||
        if read.mapping_quality >= args.mapQ and not read.query_name in mapq: | ||
            mapq[read.query_name] = 1 | ||
samfile.close() | ||
samfile = pysam.AlignmentFile(args.inBam, "rb") | ||
pairedreads = pysam.AlignmentFile(args.outBam, "wb", template=samfile) | ||
for read in samfile.fetch(): | ||
    if read.query_name in mapq: | ||
        if read.is_supplementary: | ||
            continue | ||
        if read.is_secondary: | ||
            continue | ||
        if read.is_duplicate: | ||
            continue | ||
        pairedreads.write(read) | ||
samfile.close() | ||
pairedreads.close() |
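
As written, the first pass of the script above appears to depend on the order in which mates are encountered: a pair whose low-MAPQ mate is seen first is kept, while a pair whose low-MAPQ mate is seen second is dropped. The sketch below shows an order-independent alternative that keeps a read name only when both mates pass the threshold. It works on plain `(query_name, mapping_quality)` tuples rather than pysam reads, and `names_passing_mapq` is a hypothetical helper, not part of the commit.

```python
from collections import defaultdict


def names_passing_mapq(reads, min_mapq=6):
    """Return read names for which both mates have MAPQ >= min_mapq.

    `reads` is an iterable of (query_name, mapping_quality) tuples that are
    assumed to be already restricted to primary, non-duplicate, properly
    paired alignments.
    """
    passing = defaultdict(int)
    failed = set()
    for name, mapq in reads:
        if mapq >= min_mapq:
            passing[name] += 1
        else:
            failed.add(name)
    return {
        name
        for name, count in passing.items()
        if count == 2 and name not in failed
    }


# pairB and pairC each have one low-MAPQ mate, seen in opposite orders.
reads = [
    ("pairA", 30), ("pairA", 40),
    ("pairB", 2), ("pairB", 30),
    ("pairC", 30), ("pairC", 2),
]
kept = names_passing_mapq(reads)
```

The second pass of the script could then write only reads whose name is in the returned set.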
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
#!/usr/bin/env python | ||
|
||
from __future__ import print_function | ||
import os | ||
import sys | ||
|
||
|
||
def formatSequencelength(seq, stringlen): | ||
    fseq = "" | ||
    for i in range(len(seq)): | ||
        index = i + 1 | ||
        if index % 80 == 0: | ||
            fseq += "{}{}".format(seq[i], "\n") | ||
        else: | ||
            fseq += seq[i] | ||
    return fseq | ||
|
||
|
||
def parsed(filename): | ||
    fh = open(filename, "r") | ||
    sequence = "" | ||
    chrom = "" | ||
    seqindex = 0 | ||
    seqlen = 0 | ||
    for line in fh: | ||
        line = line.strip() | ||
        if line.startswith(">") and sequence != "": | ||
            yield chrom, formatSequencelength(sequence, seqlen), len(sequence) | ||
            chrom = line.split(" ")[0] | ||
            sequence = "" | ||
        elif line.startswith(">"): | ||
            chrom = line.split(" ")[0] | ||
        else: | ||
            seqindex += 1 | ||
            sequence += line | ||
            if seqindex == 1: | ||
                seqlen = len(line) | ||
    else: | ||
        # formatSequencelength(sequence, seqlen) | ||
        yield chrom, formatSequencelength(sequence, seqlen), len(sequence) | ||
    fh.close() | ||
|
||
|
||
def main(fasta_fn, chrom_sizes_fn, outdir): | ||
    os.mkdir(outdir) | ||
    chromsizesfh = open(chrom_sizes_fn, "w") | ||
|
||
    for chrom, seq, chromsize in parsed(fasta_fn): | ||
        chromsizesfh.write("{}\t{}\n".format(chrom.replace(">", ""), chromsize)) | ||
        outfilename = os.path.join(outdir, chrom.replace(">", "") + ".fa") | ||
        outfh = open(outfilename, "w") | ||
        print("{}\n".format(chrom)) | ||
        outfh.write("{}\n{}\n".format(chrom, seq.rstrip())) | ||
        outfh.close() | ||
|
||
    chromsizesfh.close() | ||
|
||
|
||
if __name__ == "__main__": | ||
    main(sys.argv[1], sys.argv[2], sys.argv[3]) |
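
The script above wraps each sequence at a hard-coded 80 characters by appending one character at a time, and its `stringlen` argument is never used. The sketch below shows the same two ideas, per-record FASTA parsing and fixed-width wrapping, using slices and list joins instead. It is an illustrative alternative under those assumptions, not the committed code; `wrap_sequence` and `read_fasta` are hypothetical names.

```python
def wrap_sequence(seq, width=80):
    """Break a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def read_fasta(lines):
    """Yield (name, sequence) per record; name is the first header word."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split(" ")[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Tiny in-memory FASTA used only for illustration.
fasta_lines = [">chrI first contig", "ACGT", "AC", ">chrM", "GGGTTT"]
records = list(read_fasta(fasta_lines))
chrom_sizes = [(name, len(seq)) for name, seq in records]
wrapped = wrap_sequence("ACGTAC", width=4)
```

Each `(name, length)` pair corresponds to one line of the chromosome sizes file, and `wrap_sequence` produces the body written to each per-chromosome `.fa` file.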
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
params { | ||
config_profile_name = 'Test single-end stubs' | ||
config_profile_description = 'Minimal test dataset with blank references to run stubs with continuous integration' | ||
|
||
outdir = 'results/test' | ||
input = 'assets/samplesheet_test.csv' // adapted from https://github.com/nf-core/test-datasets/blob/chipseq/samplesheet/v2.0/samplesheet_test.csv | ||
|
||
genome = 'custom_genome' | ||
read_length = 50 | ||
|
||
// Genome references | ||
genome_fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genome.fa' | ||
genes_gtf = 'https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genes.gtf' | ||
blacklist = 'tests/data/test.blacklist' | ||
rename_contigs = 'assets/R64-1-1_ensembl2UCSC.txt' | ||
|
||
|
||
max_cpus = 2 // for GitHub Actions https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources | ||
max_memory = '6.GB' | ||
max_time = '6.h' | ||
|
||
publish_dir_mode = "symlink" | ||
|
||
// CCBR shared resource paths | ||
index_dir = "tests/data" | ||
fastq_screen = null | ||
sicer.species = "sacCer1" // supported species https://github.com/zanglab/SICER2/blob/master/sicer/lib/GenomeData.py | ||
|
||
deeptools.bin_size = 10000 // this value is only to make bamCoverage run faster. use smaller value for real data. | ||
deeptools.excluded_chroms = 'chrM' | ||
run.sicer = false // TODO set to true after https://github.com/CCBR/CHAMPAGNE/issues/109 | ||
} | ||
|
||
process { | ||
cpus = 1 | ||
memory = '1 GB' | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,20 @@ | ||
params { | ||
genomes { | ||
'hg38' { | ||
-blacklist = 'hg38.blacklist' | ||
-blacklist_files = "${params.index_dir}/hg38_basic/indexes/hg38.blacklist*" | ||
-reference_files = "${params.index_dir}/hg38_basic/indexes/hg38*" | ||
-effective_genome_size = 2700000000 | ||
+blacklist_index = "${params.index_dir}/hg38_basic/indexes/blacklist/hg38.blacklist_v3.chrM.chr_rDNA.*" | ||
+reference_index = "${params.index_dir}/hg38_basic/bwa_index/hg38*" | ||
+chromosomes_dir = "${params.index_dir}/hg38_basic/Chromsomes/" | ||
chrom_sizes = "${params.index_dir}/hg38_basic/indexes/hg38.fa.sizes" | ||
gene_info = "${params.index_dir}/hg38_basic/geneinfo.bed" | ||
-chromosomes_dir = "${params.index_dir}/hg38_basic/Chromsomes/" | ||
+effective_genome_size = 2700000000 | ||
} | ||
'mm10' { | ||
-blacklist = 'mm10.blacklist' | ||
-blacklist_files = "${params.index_dir}/mm10_basic/indexes/mm10.blacklist*" | ||
-reference_files = "${params.index_dir}/mm10_basic/indexes/mm10*" | ||
-effective_genome_size = 2400000000 | ||
+blacklist_index = "${params.index_dir}/mm10_basic/indexes/blacklist/mm10.blacklist.chrM.chr_rDNA.*" | ||
+reference_index = "${params.index_dir}/mm10_basic/indexes/reference/mm10*" | ||
+chromosomes_dir = "${params.index_dir}/mm10_basic/Chromsomes/" | ||
chrom_sizes = "${params.index_dir}/mm10_basic/indexes/mm10.fa.sizes" | ||
gene_info = "${params.index_dir}/mm10_basic/geneinfo.bed" | ||
-chromosomes_dir = "${params.index_dir}/mm10_basic/Chromsomes/" | ||
+effective_genome_size = 2400000000 | ||
} | ||
} | ||
} |