Merge pull request #119 from JoseEspinosa/my_dsl2
Add new version syntax based on yml files to the pipeline
lpantano authored Nov 12, 2021
2 parents 0444a9c + 0019e48 commit 3e25b1c
Showing 41 changed files with 614 additions and 434 deletions.
53 changes: 15 additions & 38 deletions .github/CONTRIBUTING.md
@@ -61,23 +61,21 @@ For further information/help, please consult the [nf-core/smrnaseq documentation

To make the nf-core/smrnaseq code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.

### Adding a new step

If you wish to contribute a new step, please use the following coding standards:

1. Define the corresponding input channel into your new process from the expected previous process channel
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new flags/options to `nextflow.config` with a default (see below).
5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`).
6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
7. Add sanity checks for all relevant parameters.
8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
9. Do local tests that the new code works properly and as expected.
10. Add a new test command in `.github/workflows/ci.yml`.
11. If applicable add a [MultiQC](https://multiqc.info/) module.
12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
### Adding a new step or module

If you wish to contribute a new step or module, please see the [official guidelines](https://nf-co.re/developers/adding_modules#new-module-guidelines-and-pr-review-checklist) and use the following coding standards:

1. Add any new flags/options to `nextflow.config` with a default (see section below).
2. Add any new flags/options to `nextflow_schema.json` with help text via `nf-core schema build`.
3. Add sanity checks for all relevant parameters.
4. Perform local tests to validate that the new code works as expected.
5. If applicable, add a new test command in `.github/workflows/ci.yml`.
6. Add any descriptions of output files to `docs/output.md`.
7. Do local tests that the new code works properly and as expected.
8. Add a new test command in `.github/workflows/ci.yml`.
9. If applicable add a [MultiQC](https://multiqc.info/) module.
10. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
11. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.

### Default values

@@ -102,27 +100,6 @@ Please use the following naming schemes, to make it easy to understand what is g

If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`

### Software version reporting

If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process.

Add to the script block of the process, something like the following:

```bash
<YOUR_TOOL> --version &> v_<YOUR_TOOL>.txt 2>&1 || true
```

or

```bash
<YOUR_TOOL> --help | head -n 1 &> v_<YOUR_TOOL>.txt 2>&1 || true
```

You then need to edit the script `bin/scrape_software_versions.py` to:

1. Add a Python regex for your tool's `--version` output (as stored in the `v_<YOUR_TOOL>.txt` file), to ensure the version is reported as a `v` followed by the version number, e.g. `v2.1.1`
2. Add an HTML entry to the `OrderedDict` for formatting in MultiQC.

### Images and figures

For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
2 changes: 2 additions & 0 deletions .nf-core.yml
@@ -1,5 +1,6 @@
lint:
files_unchanged:
- .github/CONTRIBUTING.md
- .markdownlint.yml
- assets/email_template.html
- assets/email_template.txt
@@ -8,3 +9,4 @@ lint:
files_exist:
- bin/scrape_software_versions.py
- modules/local/get_software_versions.nf

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## v1.3.0dev - [2021-09-15]

* Software version(s) will now be reported for every module imported during a given pipeline execution
* Adapted DSL 2.0
* Updated `nextflow_schema.json` should now display correctly on Nextflow Tower
* Added mirtop logs to multiqc
50 changes: 11 additions & 39 deletions bin/edgeR_miRBase.r
@@ -4,44 +4,17 @@
args = commandArgs(trailingOnly=TRUE)

input <- as.character(args[1:length(args)])
# .libPaths( c( ".", .libPaths()) )
# install.packages("BiocManager", dependencies=TRUE, repos='http://cloud.r-project.org/')

# # Load / install required packages
# if (!require("limma")){
# BiocManager::install("limma", suppressUpdates=TRUE)
# library("limma")
# }

# if (!require("edgeR")){
# BiocManager::install("edgeR", suppressUpdates=TRUE)
# library("edgeR")
# }

# if (!require("statmod")){
# install.packages("statmod", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("statmod")
# }

# if (!require("data.table")){
# install.packages("data.table", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("data.table")
# }

# if (!require("gplots")) {
# install.packages("gplots", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("gplots")
# }

# if (!require("methods")) {
# install.packages("methods", dependencies=TRUE, repos='http://cloud.r-project.org/')
# library("methods")
# }
library("limma")
library("edgeR")
library("statmod")
library("data.table")
library("gplots")
library("methods")

# Put mature and hairpin count files in separated file lists
filelist<-list()
filelist[[1]]<-input[grep(".mature.*stats",input)]
filelist[[2]]<-input[grep(".hairpin.*stats",input)]
filelist[[1]]<-input[grep(".mature.sorted",input)]
filelist[[2]]<-input[grep(".hairpin.sorted",input)]
names(filelist)<-c("mature","hairpin")
print(filelist)

@@ -53,12 +26,11 @@ for (i in 1:2) {
unmapped<-do.call("cbind", lapply(filelist[[i]], fread, header=FALSE, select=c(4)))
data<-as.data.frame(data)
unmapped<-as.data.frame(unmapped)

temp <- fread(filelist[[i]][1],header=FALSE, select=c(1))
rownames(data)<-temp$V1
rownames(unmapped)<-temp$V1
colnames(data)<-gsub(".stats","",basename(filelist[[i]]))
colnames(unmapped)<-gsub(".stats","",basename(filelist[[i]]))
colnames(data)<-gsub("_mature.*","",basename(filelist[[i]]))
colnames(unmapped)<-gsub("_mature.*","",basename(filelist[[i]]))

data<-data[rownames(data)!="*",,drop=FALSE]
unmapped<-unmapped[rownames(unmapped)=="*",,drop=FALSE]
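The column-name change in this hunk strips everything from `_mature` onward in each file's base name. A rough shell analogue of `gsub("_mature.*", "", basename(f))` is sketched below. The file names are hypothetical and assume that both the mature and the hairpin count files carry `_mature` in their names; a file name without `_mature` would pass through unchanged.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input names; the real pipeline's file names may differ.
files=(
    "results/sampleA_mature.sorted.idxstats"
    "results/sampleA_mature_hairpin.sorted.idxstats"
)

sample_names=()
for f in "${files[@]}"; do
    # basename drops the directory; sed removes "_mature" and everything after it,
    # mirroring gsub("_mature.*", "", basename(f)) in the R script.
    sample_names+=("$(basename "$f" | sed 's/_mature.*//')")
done

# Print one derived sample name per input file.
printf '%s\n' "${sample_names[@]}"
```

Under that naming assumption, both lists resolve to the same sample identifier, which is what lets the mature and hairpin tables share column names.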
@@ -114,7 +86,7 @@ for (i in 1:2) {
write.table(MDSdata$distance.matrix, paste(header,"_edgeR_MDS_distance_matrix.txt",sep=""), quote=FALSE, sep="\t")

# Print plot x,y co-ordinates to file
MDSxy = MDSdata$cmdscale.out
MDSxy = data.frame(x=MDSdata$x, y=MDSdata$y)
colnames(MDSxy) = c(paste(MDSdata$axislabel, '1'), paste(MDSdata$axislabel, '2'))

write.table(MDSxy, paste(header,"_edgeR_MDS_plot_coordinates.txt",sep=""), quote=FALSE, sep="\t")
36 changes: 0 additions & 36 deletions bin/scrape_software_versions.py

This file was deleted.

7 changes: 4 additions & 3 deletions conf/test_full.config
@@ -11,14 +11,15 @@
*/

params {
max_memory = 12.GB
max_cpus = 8
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
input = 'https://github.com/nf-core/test-datasets/raw/smrnaseq-better-input/testdata/samplesheet.csv'


genome = 'GRCh37'
genome = 'GRCh37'
mirtrace_species = "hsa"
}


6 changes: 3 additions & 3 deletions docs/usage.md
@@ -22,9 +22,9 @@ It should point to the 3-letter species name used by `miRBase`.

### miRNA related files

* `mirna_gtf`: If not supplied by the user, then `mirna_gtf` will point to the latest GFF3 file in miRBase: `ftp://mirbase.org/pub/mirbase/CURRENT/genomes/${params.mirtrace_species}.gff3`
* `mature`: points to the FASTA file of mature miRNA sequences. `ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz`
* `hairpin`: points to the FASTA file of precursor miRNA sequences. `ftp://mirbase.org/pub/mirbase/CURRENT/hairpin.fa.gz`
* `mirna_gtf`: If not supplied by the user, then `mirna_gtf` will point to the latest GFF3 file in miRBase: `https://mirbase.org/ftp/CURRENT/genomes/${params.mirtrace_species}.gff3`
* `mature`: points to the FASTA file of mature miRNA sequences. `https://mirbase.org/ftp/CURRENT/mature.fa.gz`
* `hairpin`: points to the FASTA file of precursor miRNA sequences. `https://mirbase.org/ftp/CURRENT/hairpin.fa.gz`
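
As a small illustration of how the default `mirna_gtf` location above is put together, the sketch below substitutes a species code into the URL template. The shell variable names are placeholders chosen here, and `hsa` is simply the value set in the full test profile. The sketch only builds the string and does not download anything.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder for the value of --mirtrace_species (3-letter miRBase species code).
mirtrace_species="hsa"

# Same template as the documented default, with the species code substituted.
mirna_gtf="https://mirbase.org/ftp/CURRENT/genomes/${mirtrace_species}.gff3"

echo "$mirna_gtf"
```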

### Genome

30 changes: 9 additions & 21 deletions lib/NfcoreTemplate.groovy
@@ -19,27 +19,16 @@ class NfcoreTemplate {
}

//
// Check params.hostnames
// Warn if a -profile or Nextflow config has not been provided to run the pipeline
//
public static void hostName(workflow, params, log) {
Map colors = logColours(params.monochrome_logs)
if (params.hostnames) {
try {
def hostname = "hostname".execute().text.trim()
params.hostnames.each { prof, hnames ->
hnames.each { hname ->
if (hostname.contains(hname) && !workflow.profile.contains(prof)) {
log.info "=${colors.yellow}====================================================${colors.reset}=\n" +
"${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" +
" but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" +
" ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" +
"=${colors.yellow}====================================================${colors.reset}="
}
}
}
} catch (Exception e) {
log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}."
}
public static void checkConfigProvided(workflow, log) {
if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) {
log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" +
"This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" +
" (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" +
" (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" +
" (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" +
"Please refer to the quick start section and usage docs for the pipeline.\n "
}
}

@@ -168,7 +157,6 @@ class NfcoreTemplate {
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-"
}
} else {
hostName(workflow, params, log)
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-"
}
}
6 changes: 3 additions & 3 deletions lib/WorkflowMain.groovy
@@ -60,6 +60,9 @@ class WorkflowMain {
// Print parameter summary log to screen
log.info paramsSummaryLog(workflow, params, log)

// Check that a -profile or Nextflow config has been provided to run the pipeline
NfcoreTemplate.checkConfigProvided(workflow, log)

// Check that conda channels are set-up correctly
if (params.enable_conda) {
Utils.checkCondaChannels(log)
@@ -68,9 +71,6 @@ class WorkflowMain {
// Check AWS batch settings
NfcoreTemplate.awsBatch(workflow, params)

// Check the hostnames against configured profiles
NfcoreTemplate.hostName(workflow, params, log)

// Check input has been provided
if (!params.input) {
log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.csv'"
3 changes: 3 additions & 0 deletions modules.json
@@ -6,6 +6,9 @@
"cat/fastq": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
"custom/dumpsoftwareversions": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
"fastqc": {
"git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae"
},
18 changes: 12 additions & 6 deletions modules/local/bowtie_genome.nf
@@ -1,11 +1,15 @@
// Import generic module functions
include { saveFiles; initOptions; getSoftwareName } from './functions'
include { saveFiles; initOptions; getSoftwareName; getProcessName } from './functions'

params.options = [:]
options = initOptions(params.options)

process INDEX_GENOME {
tag "$fasta"
label 'process_medium'
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process)+"/${options.suffix}", meta:meta, publish_by_meta:['id']) }

conda (params.enable_conda ? 'bioconda::bowtie=1.3.0-2' : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
@@ -18,21 +22,23 @@ process INDEX_GENOME {
path fasta

output:
path 'genome*ebwt' , emit: bt_indices
path 'genome.edited.fa' , emit: fasta
path 'genome*ebwt' , emit: bt_indices
path 'genome.edited.fa', emit: fasta
path "versions.yml" , emit: versions

script:
def software = getSoftwareName(task.process)

"""
# Remove any special base characters from reference genome FASTA file
sed '/^[^>]/s/[^ATGCatgc]/N/g' $fasta > genome.edited.fa
sed -i 's/ .*//' genome.edited.fa
# Build bowtie index
bowtie-build genome.edited.fa genome --threads ${task.cpus}
cat <<-END_VERSIONS > versions.yml
${getProcessName(task.process)}:
bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
END_VERSIONS
"""

}
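
The script block above does two things besides building the index: it sanitises the FASTA file and it writes the tool version to `versions.yml`. The standalone sketch below walks through both steps outside Nextflow. Everything in it is illustrative: the FASTA records are made up, `fake_bowtie_version` is a hypothetical stand-in whose output layout is an assumption about what `bowtie --version` prints, and the process name is hard-coded where the module calls `getProcessName(task.process)`. The in-place `sed -i` is replaced by a temporary file so the sketch does not depend on GNU sed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the sketch leaves no files in the current one.
workdir="$(mktemp -d)"
cd "$workdir"

# Made-up FASTA: an ambiguity code (R, Y) in the sequences and a header description.
printf '>chr1 test sequence\nACGTRNacgt\n>chr2\nGGYCC\n' > genome.fa

# Step 1: on lines that do not start with '>', replace every character
# that is not A/T/G/C (either case) with N.
sed '/^[^>]/s/[^ATGCatgc]/N/g' genome.fa > genome.edited.fa

# Step 2: drop everything from the first space onward, trimming header descriptions.
sed 's/ .*//' genome.edited.fa > genome.tmp && mv genome.tmp genome.edited.fa

# Hypothetical stand-in for `bowtie --version`; the layout is an assumption.
fake_bowtie_version() {
    printf '/usr/local/bin/bowtie-align-s version 1.3.0\n64-bit\nBuilt on example-host\n'
}

# Step 3: the unquoted $(...) inside echo collapses the output to one line;
# sed then keeps only the token that follows "bowtie-align-s version ".
version="$(echo $(fake_bowtie_version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*$//')"

# Step 4: write the version file, keyed by the (here hard-coded) process name.
cat <<-END_VERSIONS > versions.yml
INDEX_GENOME:
    bowtie: ${version}
END_VERSIONS

cat genome.edited.fa versions.yml
```

Keeping the version capture inside each module's script means the reported version comes from the same environment that ran the tool, which is the motivation for dropping the central `scrape_software_versions.py` script in this commit.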