Pipeline updates / addressing open issues #770

Merged 9 commits on Feb 21, 2022
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -7,8 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Enhancements & fixes

* [[#734](https://github.com/nf-core/rnaseq/issues/734)] - Is a vulnerable picard still used ? log4j vulnerability
* [[#744](https://github.com/nf-core/rnaseq/issues/744)] - Auto-detect and raise error if CSI is required for BAM indexing
* [[#752](https://github.com/nf-core/rnaseq/issues/752)] - How to set publishing mode for all processes?
* [[#753](https://github.com/nf-core/rnaseq/issues/753)] - Add warning when user provides `--transcript_fasta`
* [[#754](https://github.com/nf-core/rnaseq/issues/754)] - DESeq2 QC issue linked to `--count_col` parameter
* [[#755](https://github.com/nf-core/rnaseq/issues/755)] - Rename RSEM_PREPAREREFERENCE_TRANSCRIPTS process
* [[#759](https://github.com/nf-core/rnaseq/issues/759)] - Empty lines in samplesheet.csv cause a crash

### Parameters

| Old parameter | New parameter |
|-------------------------------|---------------------------------------|
| | `--publish_dir_mode` |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
>
> **NB:** Parameter has been **added** if just the new parameter information is present.
>
> **NB:** Parameter has been **removed** if new parameter information isn't present.

## [[3.5](https://github.com/nf-core/rnaseq/releases/tag/3.5)] - 2021-12-17

### Enhancements & fixes
129 changes: 65 additions & 64 deletions bin/check_samplesheet.py
@@ -61,79 +61,80 @@ def check_samplesheet(file_in, file_out):

         ## Check sample entries
         for line in fin:
-            lspl = [x.strip().strip('"') for x in line.strip().split(",")]
-
-            ## Check valid number of columns per row
-            if len(lspl) < len(HEADER):
-                print_error(
-                    f"Invalid number of columns (minimum = {len(HEADER)})!",
-                    "Line",
-                    line,
-                )
-
-            num_cols = len([x for x in lspl if x])
-            if num_cols < MIN_COLS:
-                print_error(
-                    f"Invalid number of populated columns (minimum = {MIN_COLS})!",
-                    "Line",
-                    line,
-                )
-
-            ## Check sample name entries
-            sample, fastq_1, fastq_2, strandedness = lspl[: len(HEADER)]
-            if sample.find(" ") != -1:
-                print(
-                    f"WARNING: Spaces have been replaced by underscores for sample: {sample}"
-                )
-                sample = sample.replace(" ", "_")
-            if not sample:
-                print_error("Sample entry has not been specified!", "Line", line)
-
-            ## Check FastQ file extension
-            for fastq in [fastq_1, fastq_2]:
-                if fastq:
-                    if fastq.find(" ") != -1:
-                        print_error("FastQ file contains spaces!", "Line", line)
-                    if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"):
+            if line.strip():
+                lspl = [x.strip().strip('"') for x in line.strip().split(",")]
+
+                ## Check valid number of columns per row
+                if len(lspl) < len(HEADER):
+                    print_error(
+                        f"Invalid number of columns (minimum = {len(HEADER)})!",
+                        "Line",
+                        line,
+                    )
+
+                num_cols = len([x for x in lspl if x])
+                if num_cols < MIN_COLS:
+                    print_error(
+                        f"Invalid number of populated columns (minimum = {MIN_COLS})!",
+                        "Line",
+                        line,
+                    )
+
+                ## Check sample name entries
+                sample, fastq_1, fastq_2, strandedness = lspl[: len(HEADER)]
+                if sample.find(" ") != -1:
+                    print(
+                        f"WARNING: Spaces have been replaced by underscores for sample: {sample}"
+                    )
+                    sample = sample.replace(" ", "_")
+                if not sample:
+                    print_error("Sample entry has not been specified!", "Line", line)
+
+                ## Check FastQ file extension
+                for fastq in [fastq_1, fastq_2]:
+                    if fastq:
+                        if fastq.find(" ") != -1:
+                            print_error("FastQ file contains spaces!", "Line", line)
+                        if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"):
+                            print_error(
+                                "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!",
+                                "Line",
+                                line,
+                            )
+
+                ## Check strandedness
+                strandednesses = ["unstranded", "forward", "reverse"]
+                if strandedness:
+                    if strandedness not in strandednesses:
                         print_error(
-                            "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!",
+                            f"Strandedness must be one of '{', '.join(strandednesses)}'!",
                             "Line",
                             line,
                         )
-
-            ## Check strandedness
-            strandednesses = ["unstranded", "forward", "reverse"]
-            if strandedness:
-                if strandedness not in strandednesses:
+                else:
                     print_error(
-                        f"Strandedness must be one of '{', '.join(strandednesses)}'!",
+                        f"Strandedness has not been specified! Must be one of {', '.join(strandednesses)}.",
                         "Line",
                         line,
                     )
-            else:
-                print_error(
-                    f"Strandedness has not been specified! Must be one of {', '.join(strandednesses)}.",
-                    "Line",
-                    line,
-                )
-
-            ## Auto-detect paired-end/single-end
-            sample_info = []  ## [single_end, fastq_1, fastq_2, strandedness]
-            if sample and fastq_1 and fastq_2:  ## Paired-end short reads
-                sample_info = ["0", fastq_1, fastq_2, strandedness]
-            elif sample and fastq_1 and not fastq_2:  ## Single-end short reads
-                sample_info = ["1", fastq_1, fastq_2, strandedness]
-            else:
-                print_error("Invalid combination of columns provided!", "Line", line)
-
-            ## Create sample mapping dictionary = {sample: [[ single_end, fastq_1, fastq_2, strandedness ]]}
-            if sample not in sample_mapping_dict:
-                sample_mapping_dict[sample] = [sample_info]
-            else:
-                if sample_info in sample_mapping_dict[sample]:
-                    print_error("Samplesheet contains duplicate rows!", "Line", line)
+
+                ## Auto-detect paired-end/single-end
+                sample_info = []  ## [single_end, fastq_1, fastq_2, strandedness]
+                if sample and fastq_1 and fastq_2:  ## Paired-end short reads
+                    sample_info = ["0", fastq_1, fastq_2, strandedness]
+                elif sample and fastq_1 and not fastq_2:  ## Single-end short reads
+                    sample_info = ["1", fastq_1, fastq_2, strandedness]
+                else:
+                    print_error("Invalid combination of columns provided!", "Line", line)
+
+                ## Create sample mapping dictionary = {sample: [[ single_end, fastq_1, fastq_2, strandedness ]]}
+                if sample not in sample_mapping_dict:
+                    sample_mapping_dict[sample] = [sample_info]
                 else:
-                    sample_mapping_dict[sample].append(sample_info)
+                    if sample_info in sample_mapping_dict[sample]:
+                        print_error("Samplesheet contains duplicate rows!", "Line", line)
+                    else:
+                        sample_mapping_dict[sample].append(sample_info)

     ## Write validated samplesheet with appropriate columns
     if len(sample_mapping_dict) > 0:
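
The blank-line guard introduced in this hunk (issue #759) can be illustrated in isolation. The following is a minimal sketch, not the pipeline's script: `parse_rows` is a hypothetical helper, the four-column `HEADER` is assumed from the tuple unpacking in the diff, header-row handling is left out, and errors are raised as `ValueError` rather than routed through `print_error`.

```python
import io

# Assumed column layout; HEADER is defined outside the hunk shown above.
HEADER = ["sample", "fastq_1", "fastq_2", "strandedness"]


def parse_rows(handle):
    """Hypothetical helper: map sample -> list of [single_end, fastq_1, fastq_2, strandedness].

    Blank or whitespace-only lines are skipped instead of failing the column check.
    """
    mapping = {}
    for line in handle:
        if not line.strip():
            # Same idea as the `if line.strip():` guard in the diff.
            continue
        fields = [x.strip().strip('"') for x in line.strip().split(",")]
        if len(fields) < len(HEADER):
            raise ValueError(f"Invalid number of columns (minimum = {len(HEADER)}): {line!r}")
        sample, fastq_1, fastq_2, strandedness = fields[: len(HEADER)]

        # Auto-detect paired-end ("0") versus single-end ("1") rows.
        if sample and fastq_1 and fastq_2:
            info = ["0", fastq_1, fastq_2, strandedness]
        elif sample and fastq_1 and not fastq_2:
            info = ["1", fastq_1, fastq_2, strandedness]
        else:
            raise ValueError(f"Invalid combination of columns provided: {line!r}")

        # Reject exact duplicate rows for the same sample, otherwise append.
        rows = mapping.setdefault(sample, [])
        if info in rows:
            raise ValueError(f"Samplesheet contains duplicate rows: {line!r}")
        rows.append(info)
    return mapping


# Illustrative data rows (made up), including an empty and a whitespace-only line.
rows_text = "A,a_1.fastq.gz,a_2.fastq.gz,reverse\n\n   \nB,b.fastq.gz,,unstranded\n"
result = parse_rows(io.StringIO(rows_text))
print(result)
```

Without the guard, an empty line splits into a single empty field and trips the column-count check, which is the kind of failure the issue title describes.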
2 changes: 1 addition & 1 deletion bin/deseq2_qc.r
@@ -30,7 +30,7 @@ library(pheatmap)

 option_list <- list(
     make_option(c("-i", "--count_file" ), type="character", default=NULL , metavar="path" , help="Count file matrix where rows are genes and columns are samples." ),
-    make_option(c("-f", "--count_col" ), type="integer" , default=2 , metavar="integer", help="First column containing sample count data." ),
+    make_option(c("-f", "--count_col" ), type="integer" , default=3 , metavar="integer", help="First column containing sample count data." ),
     make_option(c("-d", "--id_col" ), type="integer" , default=1 , metavar="integer", help="Column containing identifiers to be used." ),
     make_option(c("-r", "--sample_suffix" ), type="character", default='' , metavar="string" , help="Suffix to remove after sample name in columns e.g. '.rmDup.bam' if 'DRUG_R1.rmDup.bam'."),
     make_option(c("-o", "--outdir" ), type="character", default='./' , metavar="path" , help="Output directory." ),
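
To see what the `--count_col` default controls, here is a minimal Python sketch of 1-based column selection. It is not the R script's code: `split_count_table` is a hypothetical helper, and the table layout (an identifier column, one extra annotation column, then sample counts) is an assumption made only to show why the first count column might be 3 rather than 2.

```python
def split_count_table(header, rows, id_col=1, count_col=3):
    """Hypothetical helper: split a count table using 1-based column numbers.

    Returns (ids, sample_names, counts), where counts holds one list of
    integers per row, taken from count_col to the last column.
    """
    samples = header[count_col - 1:]
    ids = [row[id_col - 1] for row in rows]
    counts = [[int(value) for value in row[count_col - 1:]] for row in rows]
    return ids, samples, counts


# Made-up table: identifier, annotation, then two sample columns (assumed layout).
header = ["gene_id", "gene_name", "S1", "S2"]
rows = [
    ["gene1", "ABC", "10", "0"],
    ["gene2", "XYZ", "3", "7"],
]

ids, samples, counts = split_count_table(header, rows)
print(ids, samples, counts)
```

Under this assumed layout, passing `count_col=2` would pull the annotation column into the counts and fail on the integer conversion, so the value has to match the first column that actually holds sample counts.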