Merge pull request #93 from apeltzer/master
Move lots of general docs to webpage
apeltzer authored Apr 7, 2019
2 parents b351951 + 80135f9 commit 9e1bd2b
Showing 5 changed files with 217 additions and 2 deletions.
81 changes: 81 additions & 0 deletions markdown/usage/adding_own_config.md
@@ -0,0 +1,81 @@
---
title: Adding your own cluster configuration
subtitle: How to handle your own specific configuration for other clusters
---

It is entirely possible to run nf-core pipelines on other clusters, though you will need to set up your own config file so that the pipeline knows how to work with your cluster.

> If you think that there are other people using the pipeline who would benefit from your configuration (e.g. other common cluster setups), please let us know. We can add a new configuration and profile which can be used by specifying `-profile <name>` when running the pipeline. The config file will then be hosted at `nf-core/configs` and will be pulled automatically before the pipeline is executed.

If you are the only person running this pipeline, you can create your config file as `~/.nextflow/config` and it will be applied every time you run Nextflow. Alternatively, save the file anywhere and reference it when running the pipeline with `-c path/to/config` (see the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more).
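
As a rough sketch of the second option, the snippet below assembles a `nextflow run` command that passes a custom config with `-c`. The config location `$HOME/my_cluster.config` and the pipeline name are placeholders chosen for illustration, not values defined by nf-core. The command is only printed so that it can be checked before launching.

```bash
# Hypothetical location of your own config file; adjust to wherever you saved it.
config_file="$HOME/my_cluster.config"

# Build the command as an array so it can be inspected before running.
cmd=(nextflow run nf-core/YOUR_PIPELINE_NAME -c "$config_file")
echo "${cmd[*]}"

# Uncomment to actually launch the pipeline:
# "${cmd[@]}"
```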

A basic configuration comes with all nf-core pipelines. This means that you only need to configure the specifics for your system and overwrite any defaults that you intend to change.

## Cluster Environment
By default, the pipeline uses the `local` Nextflow executor - in other words, all jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad, as all jobs will run on the head node.

To specify your cluster environment, add the following line to your config file:

```nextflow
process.executor = 'YOUR_SYSTEM_TYPE'
```

Many different cluster types are supported by Nextflow. For more information, please see the [Nextflow documentation](https://www.nextflow.io/docs/latest/executor.html).

Note that you may need to specify cluster options, such as a project or queue. To do so, use the `clusterOptions` config option:

```nextflow
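process {
    // Executor names are written in lower case in the Nextflow documentation
    executor = 'slurm'
    // Extra options passed to the scheduler, e.g. the project to charge
    clusterOptions = '-A myproject'
}
```

## Software Requirements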
To run your selected nf-core pipeline, several software packages are required. How you satisfy these requirements is essentially up to you and depends on your system. If possible, we _highly_ recommend using either Docker or Singularity.

Please see the [`installation documentation`](usage/installation) for how to use the methods below for a one-off run. The instructions here cover setting up a config file for repeated use.

### Docker
Docker is a great way to run nf-core pipelines, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems.

Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required. At run time, Nextflow automatically fetches the Docker image that we have created and host on DockerHub.

To add Docker support to your own config file, add the following:

```nextflow
docker.enabled = true
```

> Note that the DockerHub organisation name annoyingly can't have a hyphen, so it's `nfcore` and not `nf-core`.

### Singularity image
Many HPC environments are not able to run Docker due to security issues.
[Singularity](http://singularity.lbl.gov/) is a tool, very similar to Docker, that is designed to run on such HPC systems.

To specify singularity usage in your pipeline config file, add the following:

```nextflow
singularity.enabled = true
```

If you intend to run the pipeline offline, Nextflow will not be able to download the Singularity image for you automatically. Instead, you'll have to download it manually first, transfer the image file and then point to it.

First, pull the image file where you have an internet connection:

```bash
singularity pull --name nfcore-YOURPIPELINENAME-VERSION.simg docker://nfcore/YOURPIPELINENAME:VERSION
```

Then transfer this file and point the config file to the image:

```nextflow
singularity.enabled = true
process.container = "/path/to/nfcore-YOURPIPELINENAME-VERSION.simg"
```

### Conda
If you're not able to use Docker or Singularity, you can instead use Conda to manage the software requirements.
To use Conda in your own config file, add the following:

```nextflow
process.conda = "$baseDir/environment.yml"
```
4 changes: 2 additions & 2 deletions markdown/usage/installation.md
@@ -146,8 +146,8 @@ To use it first ensure that you have conda installed (we recommend [miniconda](h
 ### Configuration profiles
-See [`docs/configuration/adding_your_own.md`](configuration/adding_your_own.md)
+See [Adding your own configuration profile](usage/adding_own_config).
 ## Reference genomes
-See [`docs/configuration/reference_genomes.md`](configuration/reference_genomes.md)
+See [Reference genomes](usage/reference_genomes)
49 changes: 49 additions & 0 deletions markdown/usage/local_installation.md
@@ -0,0 +1,49 @@
---
title: Local Configuration of pipelines
subtitle: How to configure your local system to run nf-core pipelines.
---

If you are running a pipeline in a local environment, we highly recommend using either Docker or Singularity.

## Docker
Docker is a great way to run nf-core pipelines, as it manages all software installations and allows the pipelines to be run in an identical software environment across a range of systems.

Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required. The `docker` profile provides a ready-made configuration for Docker, making it very easy to use. It also comes with the required presets to use the AWS iGenomes resource, meaning that if you use common reference genomes you just specify the reference ID and it will be automatically downloaded from AWS S3 to your local system.

First, install Docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)

Then, simply run the analysis pipeline:

```bash
nextflow run nf-core/YOUR_PIPELINE_NAME -profile docker --genome '<genome ID>'
```

Nextflow will recognise `nf-core/YOUR_PIPELINE_NAME` and download the pipeline from GitHub. The `-profile docker` configuration lists the respective Docker image on DockerHub that we have created and this is downloaded automatically for you. Note that the usage of `--genome` only works when using iGenomes. For more information about this and how to work with reference genomes, see [`Reference Genomes`](usage/reference_genomes).

### Pipeline versions
The public Docker images are tagged with the same version numbers as the code, which you can use to ensure reproducibility. When running the pipeline, specify the pipeline version with `-r`, for example `-r 1.0`. This uses the pipeline code and Docker image from that tagged version.
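
A minimal sketch of a pinned run is shown below. The pipeline name and the release `1.0` are placeholders taken from the surrounding text; substitute a release that actually exists for your pipeline. The command is printed rather than executed so that it can be checked first.

```bash
# Placeholder pipeline name and release tag; replace both with real values.
pipeline="nf-core/YOUR_PIPELINE_NAME"
version="1.0"

# -r selects the tagged revision of the pipeline code.
cmd=(nextflow run "$pipeline" -profile docker -r "$version")
echo "${cmd[*]}"

# Uncomment to actually launch the pipeline:
# "${cmd[@]}"
```

Recording the value passed to `-r` alongside your results makes it possible to repeat the analysis later with the same code and image.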

## Singularity image
Many HPC environments are not able to run Docker due to security issues. [Singularity](http://singularity.lbl.gov/) is a tool, very similar to Docker, that is designed to run on such HPC systems. Even better, it can create images directly from DockerHub.

To use Singularity for a single run, use `-with-singularity`, or use the profile `-profile singularity`. This will download the Docker container from DockerHub and create a Singularity image for you dynamically.

If you intend to run the pipeline offline, Nextflow will not be able to download the Singularity image for you automatically. Instead, you'll have to download it manually first, transfer the image file and then point to it.

First, pull the image file where you have an internet connection:

> NB: The "tag" at the end of this command corresponds to the pipeline version.
> Here, we're pulling the Docker image for version 1.0 of the `YOUR_PIPELINE_NAME` pipeline.
> Make sure that this tag corresponds to the version of the pipeline that you're using.

```bash
singularity pull --name YOUR_PIPELINE_NAME-1.0.img docker://nfcore/YOUR_PIPELINE_NAME:1.0
```

Then transfer this file and run the pipeline with this path:

```bash
nextflow run /path/to/YOUR_PIPELINE_NAME -with-singularity /path/to/YOUR_PIPELINE_NAME-1.0.img
```

Note that DockerHub doesn't support hyphens in organisation names, so the URI to DockerHub always uses `nfcore` instead of `nf-core`.
55 changes: 55 additions & 0 deletions markdown/usage/reference_genomes.md
@@ -0,0 +1,55 @@
---
title: Reference Genomes
subtitle: How reference genomes are handled in nf-core
---

Many nf-core pipelines need a reference genome for alignment, annotation or similar purposes.

Paths to such files can be supplied on the command line at run time, but for convenience it's often better to save these paths in a nextflow config file. See below for instructions on how to do this.
Read [Adding your own cluster configuration](usage/adding_own_config) to find out how to set up custom config files.

## Adding paths to a config file
Specifying long paths every time you run the pipeline is a pain.
To make this easier, the pipeline comes configured to understand reference genome keywords which correspond to preconfigured paths, meaning that you can just specify `--genome ID` when running the pipeline.

Note that this genome key can also be specified in a config file if you always use the same genome.

To use this system, add paths to your config file using the following template:

```nextflow
params {
    genomes {
        'YOUR-ID' {
            fasta = '<PATH TO FASTA FILE>/genome.fa'
        }
        'OTHER-GENOME' {
            // [..]
        }
    }
    // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified on command line
    genome = 'YOUR-ID'
}
```

You can add as many genomes as you like as long as they have unique IDs.

## Illumina iGenomes
To make the use of reference genomes easier, Illumina has developed a centralised resource called [iGenomes](https://support.illumina.com/sequencing/sequencing_software/igenome.html).
Multiple reference index types are held together with consistent structure for multiple genomes.

We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is configured to use this by default.
The hosting fees for AWS iGenomes are currently kindly funded by a grant from Amazon.
The pipeline will automatically download the required reference files when you run the pipeline.
For more information about the AWS iGenomes, see [iGenomes](https://ewels.github.io/AWS-iGenomes/).

Downloading the files takes time and bandwidth, so we recommend making a local copy of the iGenomes resource.
Once downloaded, you can customise the variable `params.igenomes_base` in your custom configuration file to point to the reference location.
For example:

```nextflow
params.igenomes_base = '/path/to/data/igenomes/'
```

## Help

If you need help with any of this, please don't hesitate to ask in the `#igenomes` or `#help` channels on our [Slack](https://nf-core-invite.herokuapp.com/).
30 changes: 30 additions & 0 deletions markdown/usage/troubleshooting.md
@@ -0,0 +1,30 @@
---
title: Troubleshooting
subtitle: How to troubleshoot common mistakes and issues
---


## Input files not found

If no files, only one input file, or only read one and not read two is picked up, then something is wrong with your input file declaration. Check the following:

1. The path must be enclosed in quotes (`'` or `"`).
2. The path must have at least one `*` wildcard character. This applies even if you are only running one paired-end sample.
3. When using the pipeline with paired-end data, the path must use `{1,2}` or `{R1,R2}` notation to specify read pairs.
4. If you are running single-end data, make sure to specify `--singleEnd`.

If the pipeline can't find your files, you will get the following error:

```bash
ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz
```

Note that if your sample names are "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to read. Because the name contains several other `1` and `2` characters, specifying `*{1,2}*.gz` won't give you what you want, whilst `*{R1,R2}*.gz` will.
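
To see why the looser pattern is ambiguous, the shell sketch below creates two empty files named like the example and counts what each pattern matches. It is only an approximation: bash brace expansion is not the matcher that Nextflow uses, and the temporary directory and the `R2` file name are made up for illustration.

```bash
# Work in a throwaway directory with two empty files named like the example.
workdir="$(mktemp -d)"
cd "$workdir"
touch L1-1-D-2h_S1_L002_R1_001.fastq.gz L1-1-D-2h_S1_L002_R2_001.fastq.gz

# Loose pattern: any "1" or "2" in the name satisfies {1,2}, so each
# alternative matches both files.
loose=( *{1,2}*.gz )

# Specific pattern: only the R1/R2 read token can satisfy the braces.
strict=( *{R1,R2}*.gz )

echo "loose:  ${#loose[@]} matches"    # prints "loose:  4 matches"
echo "strict: ${#strict[@]} matches"   # prints "strict: 2 matches"
```

With the loose pattern every file is matched by both alternatives, so nothing in the pattern tells read one from read two. The specific pattern matches each file exactly once.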

## Data organization
The pipeline can't take a list of multiple input files - it takes a single glob expression. If your input files are scattered across different paths, we recommend that you create a directory of symlinks to them. If running in paired-end mode, please make sure that your files are sensibly named so that they can be properly paired (see the previous section).
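
One possible way to build such a directory is sketched below. The `run1`, `run2` and `all_reads` directory names and the sample file names are invented for the example, and the files are created empty under a temporary directory so that the snippet can be tried safely.

```bash
# Hypothetical layout: two run folders holding FASTQ files, plus one
# directory that will collect symlinks to all of them.
base="$(mktemp -d)"
mkdir -p "$base/run1" "$base/run2" "$base/all_reads"
touch "$base/run1/sampleA_R1.fastq.gz" "$base/run1/sampleA_R2.fastq.gz"
touch "$base/run2/sampleB_R1.fastq.gz" "$base/run2/sampleB_R2.fastq.gz"

# Link every FASTQ file into the shared directory. Absolute targets keep
# the links valid regardless of the current working directory.
for fastq in "$base"/run*/*.fastq.gz; do
    ln -s "$fastq" "$base/all_reads/$(basename "$fastq")"
done

ls "$base/all_reads"
```

A single quoted glob such as `'/path/to/all_reads/*_{R1,R2}.fastq.gz'` then covers every sample. Note that files with the same name in different source folders would collide, so rename them as you link if necessary.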

## Extra resources and getting help
If you still have an issue with running the pipeline then feel free to contact us via the [Slack](https://nf-core-invite.herokuapp.com/) channel or by opening an issue in the respective pipeline repository on GitHub asking for help.

If you have problems that are directly related to Nextflow and not to our pipelines or the nf-core framework [tools](https://github.com/nf-core/tools), then check out the [Nextflow Gitter channel](https://gitter.im/nextflow-io/nextflow) or the [Google group](https://groups.google.com/forum/#!forum/nextflow).
