Create module to filter out aligned reads #13

Merged
merged 22 commits into from
Oct 20, 2023
Changes from 16 commits
Commits
b8dab0f
feat: boilerplate for filter_blacklist module [WIP]
kelly-sovacool Oct 19, 2023
a114200
chore: Merge branch 'main' into filter-aligned
kelly-sovacool Oct 19, 2023
e14e444
feat: write filter_aligned process and test workflow
kelly-sovacool Oct 19, 2023
1abf89f
fix: samtools view doesn't output index
kelly-sovacool Oct 19, 2023
c659e60
test: create tests for samtools/filter_aligned
kelly-sovacool Oct 19, 2023
8eeb99f
fix: lint errors
kelly-sovacool Oct 19, 2023
6efe2f3
test: custom pytest to check unaligned reads
kelly-sovacool Oct 19, 2023
c4212df
feat: index bam
kelly-sovacool Oct 19, 2023
c44c96f
ci: install from requirements.txt
kelly-sovacool Oct 19, 2023
c3427ed
fix(test): handle paired & single differently
kelly-sovacool Oct 19, 2023
f4d4d3a
fix(test): define workflow_dir
kelly-sovacool Oct 19, 2023
1674e23
fix: test md5sums
kelly-sovacool Oct 19, 2023
4e5a732
chore: remove comments
kelly-sovacool Oct 19, 2023
b4c8b29
chore: add PR #s
kelly-sovacool Oct 19, 2023
3c0daef
Merge branch 'main' into filter-aligned
kelly-sovacool Oct 20, 2023
0ea77ce
style: prettier changelog
kelly-sovacool Oct 20, 2023
1e0d6ae
test: switch to mus musculus test data
kelly-sovacool Oct 20, 2023
fb3d981
test: switch mouse to human test data -- no genome fasta for mouse
kelly-sovacool Oct 20, 2023
e215998
test: fix md5sums for human test data
kelly-sovacool Oct 20, 2023
e8cd4a9
ci: require pysam >= 0.22
kelly-sovacool Oct 20, 2023
8b9294e
test: simplify test logic
kelly-sovacool Oct 20, 2023
c2fbcc6
test: simplify single read test logic
kelly-sovacool Oct 20, 2023
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -130,7 +130,7 @@ jobs:
             ${{ runner.os }}-pip-

       - name: Install Python dependencies
-        run: python -m pip install --upgrade pip pytest-workflow cryptography
+        run: python -m pip install --upgrade pip -r tests/requirements.txt

       - name: Setup Nextflow ${{ matrix.NXF_VER }}
         uses: nf-core/setup-nextflow@v1
16 changes: 9 additions & 7 deletions CHANGELOG.md
@@ -1,9 +1,11 @@
 ## development version

-- new modules:
-  - bwa/index
-  - bwa/mem
-    - also runs samtools sort & outputs index in bai format.
-  - custom/bam_to_fastq (#14)
-  - cutadapt
-  - khmer/uniquekmers
+### new modules
+
+- bwa/index
+- bwa/mem
+- custom/bam_to_fastq (#14)
+- cutadapt (#11)
+- khmer/uniquekmers (#7)
+- samtools/filter_aligned (#13)
+  - also runs samtools sort & outputs index in bai format. (#12)
45 changes: 45 additions & 0 deletions modules/CCBR/samtools/filter_aligned/main.nf
@@ -0,0 +1,45 @@

process FILTER_ALIGNED {
    '''
    Given a bam file, filter out reads that aligned.
    '''
    tag { meta.id }
    label 'process_high'

    container 'nciccbr/ccbr_ubuntu_base_20.04:v6'

    input:
        tuple val(meta), path(bam), path(bai)

    output:
        tuple val(meta), path("*.unaligned.bam"), path('*.unaligned.bam.bai'), emit: bam
        path "versions.yml", emit: versions

    when:
        task.ext.when == null || task.ext.when

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    // SAM flag 4 = read unmapped; 12 = 4 + 8 = read and mate both unmapped
    def filter_flag = meta.single_end ? '4' : '12'
    // Read the staged input via ${bam} so a custom ext.prefix cannot break the input path
    """
    samtools view \\
        -@ ${task.cpus} \\
        -f ${filter_flag} \\
        -b \\
        -o ${prefix}.unaligned.bam \\
        ${bam}
    samtools index \\
        -@ ${task.cpus} \\
        ${prefix}.unaligned.bam

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    END_VERSIONS
    """

    stub:
    """
    touch ${meta.id}.unaligned.bam ${meta.id}.unaligned.bam.bai versions.yml
    """
}
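
The choice between `-f 4` and `-f 12` in the process above follows from the bit layout of the SAM FLAG field. The sketch below is only an illustration in Python, not part of the PR: the helper names `passes_required_flags` and `filter_flag` are hypothetical, and the bit values are the ones defined in the SAM specification. It mimics how `samtools view -f` keeps a read only when every requested bit is set.

```python
# Bit values from the FLAG field of the SAM specification.
READ_PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def passes_required_flags(flag: int, required: int) -> bool:
    """Mimic `samtools view -f`: keep a read only if all required bits are set."""
    return (flag & required) == required


def filter_flag(single_end: bool) -> int:
    """Hypothetical mirror of the process logic: 4 for single-end, 12 for paired-end."""
    return READ_UNMAPPED if single_end else READ_UNMAPPED | MATE_UNMAPPED


# Paired, read unmapped, mate unmapped (1 + 4 + 8 = 13): kept by -f 12.
assert passes_required_flags(13, filter_flag(single_end=False))
# Paired, read unmapped, mate mapped (1 + 4 = 5): dropped by -f 12, kept by -f 4.
assert not passes_required_flags(5, filter_flag(single_end=False))
assert passes_required_flags(5, filter_flag(single_end=True))
```

With paired-end data, `-f 12` therefore drops pairs where only one mate failed to align, which is the property the custom pytest for paired reads checks.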
40 changes: 40 additions & 0 deletions modules/CCBR/samtools/filter_aligned/meta.yml
@@ -0,0 +1,40 @@
name: filter_aligned
description: Filter out aligned reads from a BAM file.
keywords:
  - bam
  - filter
  - samtools
tools:
  - samtools:
      description: |
        SAMtools is a set of utilities for interacting with and post-processing
        short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li.
        These files are generated as output by short read aligners like BWA.
      homepage: http://www.htslib.org/
      documentation: http://www.htslib.org/doc/samtools.html
      doi: 10.1093/bioinformatics/btp352
      licence: ["MIT"]
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information.
        e.g. [ id:'test', single_end:false ]
  - bam:
      type: file
      description: Input BAM file
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information.
        e.g. [ id:'test', single_end:false ]
  - bam:
      type: file
      description: Output BAM file with only reads that were not aligned
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
authors:
  - "@kelly-sovacool"
4 changes: 4 additions & 0 deletions tests/config/pytest_modules.yml
@@ -17,3 +17,7 @@ cutadapt:
 khmer/uniquekmers:
   - modules/CCBR/khmer/uniquekmers/**
   - tests/CCBR/khmer/uniquekmers/**
+
+samtools/filter_aligned:
+  - modules/CCBR/samtools/filter_aligned/**
+  - tests/modules/CCBR/samtools/filter_aligned/**
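
How CI consumes this file is not shown in the diff. Assuming it follows the common pattern where changed file paths are matched against these globs to decide which test tags to run, the mapping could be sketched in Python as below. The function `tags_for_changes` and the dict `PATH_FILTERS` are hypothetical, and `fnmatch` is only an approximation of a real glob matcher (its `*` also crosses `/`).

```python
from fnmatch import fnmatch

# Two entries copied from tests/config/pytest_modules.yml, written as a dict.
PATH_FILTERS = {
    "khmer/uniquekmers": [
        "modules/CCBR/khmer/uniquekmers/**",
        "tests/CCBR/khmer/uniquekmers/**",
    ],
    "samtools/filter_aligned": [
        "modules/CCBR/samtools/filter_aligned/**",
        "tests/modules/CCBR/samtools/filter_aligned/**",
    ],
}


def tags_for_changes(changed_files):
    """Return the sorted tags whose patterns match at least one changed file."""
    return sorted(
        tag
        for tag, patterns in PATH_FILTERS.items()
        if any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)
    )


print(tags_for_changes(["modules/CCBR/samtools/filter_aligned/main.nf"]))
```

Under that assumption, a change to either the module or its tests would select only the `samtools/filter_aligned` tag.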
40 changes: 40 additions & 0 deletions tests/modules/CCBR/samtools/filter_aligned/custom_test.py
@@ -0,0 +1,40 @@
import pathlib
import pysam
import pytest


@pytest.mark.workflow("samtools filter_aligned test_filter_aligned_paired_end")
def test_unaligned_paired(workflow_dir):
    unaligned_bam_file = pysam.AlignmentFile(
        pathlib.Path(workflow_dir, "output/filter/test.unaligned.bam"), "rb"
    )
    reads_paired = list()
    reads_unmapped = list()
    read_mates_unmapped = list()
    for read in unaligned_bam_file:
        reads_paired.append(read.is_paired)
        reads_unmapped.append(read.is_unmapped)
        read_mates_unmapped.append(read.mate_is_unmapped)
        # Report any read that should have been filtered out
        if not read.is_paired or not read.is_unmapped or not read.mate_is_unmapped:
            print(
                f"{read.query_name}; is_paired: {read.is_paired} is_unmapped: {read.is_unmapped} mate_is_unmapped: {read.mate_is_unmapped}"
            )
    assert all(reads_paired) and all(reads_unmapped) and all(read_mates_unmapped)


@pytest.mark.workflow("samtools filter_aligned test_filter_aligned_single_end")
def test_unaligned_single(workflow_dir):
    unaligned_bam_file = pysam.AlignmentFile(
        pathlib.Path(workflow_dir, "output/filter/test.unaligned.bam"), "rb"
    )
    reads_single = list()
    reads_unmapped = list()
    for read in unaligned_bam_file:
        is_single = not read.is_paired
        reads_single.append(is_single)
        reads_unmapped.append(read.is_unmapped)
        # Report any read that should have been filtered out
        if not is_single or not read.is_unmapped:
            print(
                f"{read.query_name}; is_single: {is_single} is_unmapped: {read.is_unmapped}"
            )
    assert all(reads_single) and all(reads_unmapped)
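
The two tests above share one idea: every read left in the output must be unmapped, and for paired data its mate must be unmapped too. A minimal sketch of that check without a BAM file or pysam is shown here; `FakeRead` is a hypothetical stand-in that only carries the attribute names the tests read from pysam, and the two helper functions are illustrative rather than code from the PR.

```python
from dataclasses import dataclass


@dataclass
class FakeRead:
    """Hypothetical stand-in exposing the read attributes used by the tests."""
    query_name: str
    is_paired: bool
    is_unmapped: bool
    mate_is_unmapped: bool = False


def offending_paired_reads(reads):
    """Names of reads that are not paired with both mates unmapped."""
    return [
        r.query_name
        for r in reads
        if not (r.is_paired and r.is_unmapped and r.mate_is_unmapped)
    ]


def offending_single_reads(reads):
    """Names of reads that are paired or mapped."""
    return [r.query_name for r in reads if r.is_paired or not r.is_unmapped]


reads = [
    FakeRead("r1", is_paired=True, is_unmapped=True, mate_is_unmapped=True),
    FakeRead("r2", is_paired=True, is_unmapped=True, mate_is_unmapped=False),
]
print(offending_paired_reads(reads))  # ['r2']
```

Collecting the offending read names in one list makes an assertion failure self-explanatory, at the cost of departing from the three boolean lists used in the PR.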
48 changes: 48 additions & 0 deletions tests/modules/CCBR/samtools/filter_aligned/main.nf
@@ -0,0 +1,48 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { BWA_INDEX } from '../../../../../modules/CCBR/bwa/index/main.nf'
include { BWA_MEM } from '../../../../../modules/CCBR/bwa/mem/main.nf'
include { FILTER_ALIGNED } from '../../../../../modules/CCBR/samtools/filter_aligned/main.nf'

//
// Test with single-end data
//
workflow test_filter_aligned_single_end {
    input = [
        [ id:'test', single_end:true ], // meta map
        [
            file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true)
        ]
    ]
    fasta = [
        [id: 'test'],
        file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)
    ]

    BWA_INDEX ( fasta )
    BWA_MEM ( input, BWA_INDEX.out.index )
    FILTER_ALIGNED( BWA_MEM.out.bam )
}

//
// Test with paired-end data
//
workflow test_filter_aligned_paired_end {
    input = [
        [ id:'test', single_end:false ], // meta map
        [
            file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true),
            file(params.test_data['sarscov2']['illumina']['test_2_fastq_gz'], checkIfExists: true)
        ]
    ]
    fasta = [
        [id: 'test'],
        file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)
    ]

    BWA_INDEX ( fasta )
    BWA_MEM ( input, BWA_INDEX.out.index )
    FILTER_ALIGNED( BWA_MEM.out.bam )
}
5 changes: 5 additions & 0 deletions tests/modules/CCBR/samtools/filter_aligned/nextflow.config
@@ -0,0 +1,5 @@
process {

    publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }

}
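
The `publishDir` closure above derives the output subdirectory from the process name, which is why the test expectations look under `output/bwa/` and `output/filter/`. A rough Python rendering of the same string manipulation follows. The function name `publish_subdir` is hypothetical, and the fully qualified name in the first example is an assumption about how Nextflow prefixes a process called from a named workflow.

```python
def publish_subdir(process_name: str) -> str:
    """Approximate tokenize(':')[-1].tokenize('_')[0].toLowerCase() in Python."""
    # Groovy's tokenize drops empty tokens, so filter them out after split.
    simple_name = [t for t in process_name.split(":") if t][-1]
    first_word = [t for t in simple_name.split("_") if t][0]
    return first_word.lower()


for name in ("test_filter_aligned_paired_end:FILTER_ALIGNED", "BWA_MEM", "BWA_INDEX"):
    print(name, "->", publish_subdir(name))
```

One consequence of taking only the first underscore-separated word is that `BWA_INDEX` and `BWA_MEM` publish into the same `bwa` directory.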
41 changes: 41 additions & 0 deletions tests/modules/CCBR/samtools/filter_aligned/test.yml
@@ -0,0 +1,41 @@
- name: samtools filter_aligned test_filter_aligned_single_end
  command: nextflow run ./tests/modules/CCBR/samtools/filter_aligned -entry test_filter_aligned_single_end -c ./tests/config/nextflow.config
  tags:
    - samtools/filter_aligned
    - samtools
  files:
    - path: output/bwa/test.bam
      md5sum: 0f7b7413436295bcd839fec262eabb18
    - path: output/bwa/test.bam.bai
      md5sum: 61faacff2817f9a6051a7a44a5e3a142
    - path: output/bwa/versions.yml
    - path: output/filter/test.unaligned.bam
      md5sum: e583cf9b78d2f635c22d3df53a32bfbc
    - path: output/filter/versions.yml

- name: samtools filter_aligned test_filter_aligned_paired_end
  command: nextflow run ./tests/modules/CCBR/samtools/filter_aligned -entry test_filter_aligned_paired_end -c ./tests/config/nextflow.config
  tags:
    - samtools/filter_aligned
    - samtools
  files:
    - path: output/bwa/test.bam
      md5sum: ae4a49f2dd6a487d75f99c9d3b42858b
    - path: output/bwa/test.bam.bai
      md5sum: 4626411a76af5ef7ece110c14464b52c
    - path: output/bwa/versions.yml
    - path: output/filter/test.unaligned.bam
      md5sum: ae07b575c772327232797e0bb0b306d6
    - path: output/filter/versions.yml

- name: samtools filter_aligned test_filter_aligned_single_end stub
  command: nextflow run ./tests/modules/CCBR/samtools/filter_aligned -entry test_filter_aligned_single_end -c ./tests/config/nextflow.config -stub
  tags:
    - samtools/filter_aligned
    - samtools
  files:
    - path: output/bwa/test.bam
    - path: output/bwa/test.bam.bai
    - path: output/bwa/versions.yml
    - path: output/filter/test.unaligned.bam
    - path: output/filter/versions.yml
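
The `md5sum` entries above pin the exact bytes of each output file, so any change in the produced BAM changes the digest. For anyone regenerating these values by hand, the comparison can be sketched in Python roughly as follows. The helpers `md5sum` and `mismatched_files` are hypothetical, and the demo uses a throwaway temporary file rather than real pipeline output.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(workdir, expected):
    """Relative paths whose current digest differs from the expected one."""
    return [
        rel for rel, want in expected.items()
        if md5sum(Path(workdir, rel)) != want
    ]


# Demo on a temporary file; real use would point at the workflow directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_bytes(b"hello")
    demo_digest = md5sum(Path(tmp, "a.txt"))
    demo_mismatches = mismatched_files(tmp, {"a.txt": "0" * 32})
```

Files listed without an `md5sum`, such as `versions.yml` and the stub outputs, are only checked for existence in the test definitions above.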
2 changes: 2 additions & 0 deletions tests/requirements.txt
@@ -0,0 +1,2 @@
pysam
pytest-workflow