diff --git a/docs/channel.md b/docs/channel.md deleted file mode 100644 index 3b4d099ff9..0000000000 --- a/docs/channel.md +++ /dev/null @@ -1,79 +0,0 @@ -(dataflow-page)= - -# Dataflow - -Nextflow uses a **dataflow** programming model to define workflows declaratively. In this model, {ref}`processes ` in a pipeline are connected to each other through *dataflow channels* and *dataflow values*. - -(dataflow-type-channel)= - -## Channels - -A *dataflow channel* (or simply *channel*) is an asynchronous sequence of values. - -The values in a channel cannot be accessed directly, but only through an operator or process. For example: - -```nextflow -channel.of(1, 2, 3).view { v -> "channel emits ${v}" } -``` - -```console -channel emits 1 -channel emits 2 -channel emits 3 -``` - -### Factories - -A channel can be created by factories in the `channel` namespace. For example, the `channel.fromPath()` factory creates a channel from a file name or glob pattern, similar to the `files()` function: - -```nextflow -channel.fromPath('input/*.txt').view() -``` - -See {ref}`channel-factory` for the full list of channel factories. - -### Operators - -Channel operators, or *operators* for short, are functions that consume and produce channels. Because channels are asynchronous, operators are necessary to manipulate the values in a channel. Operators are particularly useful for implementing glue logic between processes. - -Commonly used operators include: - -- {ref}`operator-combine`: emit the combinations of two channels - -- {ref}`operator-collect`: collect the values from a channel into a list - -- {ref}`operator-filter`: select the values in a channel that satisfy a condition - -- {ref}`operator-flatMap`: transform each value from a channel into a list and emit each list element separately - -- {ref}`operator-grouptuple`: group the values from a channel based on a grouping key - -- {ref}`operator-join`: join the values from two channels based on a matching key - -- {ref}`operator-map`: transform each value from a channel with a mapping function - -- {ref}`operator-mix`: emit the values from multiple channels - -- {ref}`operator-view`: print each value in a channel to standard output - -See {ref}`operator-page` for the full list of operators. - -(dataflow-type-value)= - -## Values - -A *dataflow value* is an asynchronous value. - -Dataflow values can be created using the {ref}`channel.value ` factory, and they are created by processes (under {ref}`certain conditions `). - -A dataflow value cannot be accessed directly, but only through an operator or process. For example: - -```nextflow -channel.value(1).view { v -> "dataflow value is ${v}" } -``` - -```console -dataflow value is 1 -``` - -See {ref}`stdlib-types-value` for the set of available methods for dataflow values. diff --git a/docs/conf.py b/docs/conf.py index 5955fa32ff..47d5739387 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -57,7 +57,8 @@ 'workflow-outputs.md': 'tutorials/workflow-outputs.md', 'flux.md': 'tutorials/flux.md', 'developer/plugins.md': 'plugins/developing-plugins.md', - 'plugins.md': 'plugins/plugins.md' + 'plugins.md': 'plugins/plugins.md', + 'channel.md': 'workflow.md' } # Add any paths that contain templates here, relative to this directory. 
diff --git a/docs/index.md b/docs/index.md
index 0f2a41d082..9e9d3d43e6 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -78,7 +78,6 @@ script
working-with-files
process
process-typed
-channel
workflow
module
notifications
diff --git a/docs/reference/stdlib-types.md b/docs/reference/stdlib-types.md
index 85e5dd39f6..5d46dca09f 100644
--- a/docs/reference/stdlib-types.md
+++ b/docs/reference/stdlib-types.md
@@ -731,7 +731,7 @@ The following methods are available for listing and traversing directories:
: Returns the first-level elements (files and directories) in a directory.

`listFiles() -> Iterable<Path>`
-: :::{deprecated}
+: :::{deprecated} 26.04.0
  Use `listDirectory()` instead.
  :::
: Returns the first-level elements (files and directories) in a directory.
diff --git a/docs/workflow.md b/docs/workflow.md
index cba0047f41..ad103258c5 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -2,7 +2,7 @@

# Workflows

-In Nextflow, a **workflow** is a function that is specialized for composing processes and dataflow logic (i.e. channels and operators).
+In Nextflow, a **workflow** is a function that is specialized for composing {ref}`processes <process-page>` and dataflow logic.

See {ref}`syntax-workflow` for a full description of the workflow syntax.

@@ -97,6 +97,259 @@ workflow {

The default value can be overridden by the command line, params file, or config file. Parameters from multiple sources are resolved in the order described in {ref}`cli-params`.

+(workflow-output-def)=
+
+## Outputs
+
+:::{versionadded} 25.10.0
+This feature is available as a preview in Nextflow {ref}`24.04 `, {ref}`24.10 `, and {ref}`25.04 `.
+:::
+
+:::{note}
+Workflow outputs are intended to replace the {ref}`publishDir <process-publishdir>` directive. See {ref}`migrating-workflow-outputs` for guidance on migrating from `publishDir` to workflow outputs.
+:::
+
+A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs.
+
+Here is a basic example:
+
+```nextflow
+process fetch {
+    // ...
+
+    output:
+    path 'sample.txt'
+
+    // ...
+}
+
+workflow {
+    main:
+    ch_samples = fetch(params.input)
+
+    publish:
+    samples = ch_samples
+}
+
+output {
+    samples {
+        path '.'
+    }
+}
+```
+
+In the above example, the output of process `fetch` is assigned to the `samples` workflow output. How this output is published to a directory structure is described in the next section.
+
+(workflow-publishing-files)=
+
+### Publishing files
+
+The top-level output directory of a workflow run can be set using the `-output-dir` command-line option or the `outputDir` config option:
+
+```bash
+nextflow run main.nf -output-dir 'my-results'
+```
+
+```groovy
+// nextflow.config
+outputDir = 'my-results'
+```
+
+The default output directory is `results` in the launch directory.
+
+By default, all output files are published to the output directory. Each output in the output block can define where files are published using the `path` directive. For example:
+
+```nextflow
+workflow {
+    main:
+    ch_step1 = step1()
+    ch_step2 = step2(ch_step1)
+
+    publish:
+    step1 = ch_step1
+    step2 = ch_step2
+}
+
+output {
+    step1 {
+        path 'step1'
+    }
+    step2 {
+        path 'step2'
+    }
+}
+```
+
+The following directory structure is created:
+
+```
+results/
+├── step1/
+│   └── ...
+└── step2/
+    └── ...
+```
+
+All files received by an output are published into the specified directory.
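+To make this concrete: given the output block above, a file emitted by `step1` lands under `results/step1/`. Here is a minimal sketch of such a process (the body and file name are illustrative assumptions, not taken from the example above):
+
+```nextflow
+process step1 {
+    output:
+    path 'data.txt'
+
+    script:
+    """
+    echo 'hello' > data.txt
+    """
+}
+```
+
+With the output block above, `data.txt` would be published to `results/step1/data.txt`.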
Lists, maps, and tuples are recursively scanned for nested files. For example: + +```nextflow +workflow { + main: + ch_samples = channel.of( + tuple( [id: 'SAMP1'], [ file('1.txt'), file('2.txt') ] ) + ) + + publish: + samples = ch_samples // 1.txt and 2.txt are published +} +``` + +The `path` directive can also be a closure which defines a custom publish path for each channel value: + +```nextflow +workflow { + main: + ch_samples = channel.of( + [id: 'SAMP1', fastq_1: file('1.fastq'), fastq_2: file('2.fastq')] + ) + + publish: + samples = ch_samples +} + +output { + samples { + path { sample -> "fastq/${sample.id}/" } + } +} +``` + +The above example publishes each channel value to a different subdirectory. In this case, each pair of FASTQ files is published into a subdirectory based on the sample ID. + +The closure can even define a different path for each individual file using the `>>` operator: + +```nextflow +output { + samples { + path { sample -> + sample.fastq_1 >> "fastq/${sample.id}/" + sample.fastq_2 >> "fastq/${sample.id}/" + } + } +} +``` + +Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, it is treated as the directory in which source files are published. Otherwise, it is treated as the target filename of a source file. Only files that are published with the `>>` operator are saved to the output directory. + +:::{note} +Files that do not originate from the work directory are not published. +::: + +### Index files + +Each output can create an index file of the values that were published. An index file preserves the structure of channel values, including metadata, which is simpler than encoding this information with directories and file names. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, maps, or tuples. + +For example: + +```nextflow +workflow { + main: + ch_samples = channel.of( + [id: 1, name: 'sample 1', fastq_1: '1a.fastq', fastq_2: '1b.fastq'], + [id: 2, name: 'sample 2', fastq_1: '2a.fastq', fastq_2: '2b.fastq'], + [id: 3, name: 'sample 3', fastq_1: '3a.fastq', fastq_2: null] + ) + + publish: + samples = ch_samples +} + +output { + samples { + path 'fastq' + index { + path 'samples.csv' + } + } +} +``` + +The above example writes the following CSV file to `results/samples.csv`: + +``` +"1","sample 1","results/fastq/1a.fastq","results/fastq/1b.fastq" +"2","sample 2","results/fastq/2a.fastq","results/fastq/2b.fastq" +"3","sample 3","results/fastq/3a.fastq","" +``` + +You can customize the index file with additional directives, for example: + +```nextflow +index { + path 'samples.csv' + header true + sep '|' +} +``` + +This example produces the following index file: + +``` +"id"|"name"|"fastq_1"|"fastq_2" +"1"|"sample 1"|"results/fastq/1a.fastq"|"results/fastq/1b.fastq" +"2"|"sample 2"|"results/fastq/2a.fastq"|"results/fastq/2b.fastq" +"3"|"sample 3"|"results/fastq/3a.fastq"|"" +``` + +:::{note} +Files that do not originate from the work directory are not published, but are included in the index file. +::: + +See [Output directives](#output-directives) for the list of available index directives. + +### Output directives + +The following directives are available for each output in the output block: + +`index` +: Create an index file containing a record of each published value. 
+
+  The following directives are available in an index definition:
+
+  `header`
+  : When `true`, the keys of the first record are used as the column names (default: `false`). Can also be a list of column names. Only used for CSV files.
+
+  `path`
+  : The name of the index file relative to the base output directory (required). Can be a CSV, JSON, or YAML file.
+
+  `sep`
+  : The character used to separate values (default: `','`). Only used for CSV files.
+
+`label`
+: Specify a label to be applied to every published file. Can be specified multiple times.
+
+`path`
+: Specify the publish path relative to the output directory (default: `'.'`). Can be a path, a closure that defines a custom directory for each published value, or a closure that publishes individual files using the `>>` operator.
+
+Additionally, the following options from the {ref}`workflow <config-workflow>` config scope can be specified as directives:
+- `contentType`
+- `enabled`
+- `ignoreErrors`
+- `mode`
+- `overwrite`
+- `storageClass`
+- `tags`
+
+For example:
+
+```nextflow
+output {
+    samples {
+        mode 'copy'
+    }
+}
+```
+
## Named workflows

A *named workflow* is a workflow that can be called by other workflows:

@@ -116,7 +369,7 @@ The above example defines a workflow named `my_workflow` which is called by the

### Takes and emits

-The `take:` section is used to declare the inputs of a named workflow:
+The `take:` section declares the inputs of a named workflow:

```nextflow
workflow my_workflow {
@@ -138,7 +391,7 @@ workflow {
}
```

-The `emit:` section is used to declare the outputs of a named workflow:
+The `emit:` section declares the outputs of a named workflow:

```nextflow
workflow my_workflow {
@@ -172,30 +425,110 @@ The result of the above workflow can be accessed using `my_workflow.out.my_data`

Every output must be assigned to a name when multiple outputs are declared.
:::

-:::{versionadded} 25.10.0
-:::
+:::{versionadded} 25.10.0
+:::
+
+When using the {ref}`strict syntax <strict-syntax-page>`, workflow takes and emits can specify a type annotation:
+
+```nextflow
+workflow my_workflow {
+    take:
+    data: Channel<Path>
+
+    main:
+    ch_hello = hello(data)
+    ch_bye = bye(ch_hello.collect())
+
+    emit:
+    my_data: Value<Path> = ch_bye
+}
+```
+
+In the above example, `my_workflow` takes a channel of files (`Channel<Path>`) and emits a dataflow value with a single file (`Value<Path>`). See {ref}`stdlib-types` for the list of available types.
+
+(dataflow-page)=
+
+## Dataflow
+
+Workflows consist of *dataflow* logic, in which processes are connected to each other through *dataflow channels* and *dataflow values*.
+
+(dataflow-type-channel)=
+
+### Channels
+
+A *dataflow channel* (or simply *channel*) is an asynchronous sequence of values.
+
+The values in a channel cannot be accessed directly, but only through an operator or process. For example:
+
+```nextflow
+channel.of(1, 2, 3).view { v -> "channel emits ${v}" }
+```
+
+```console
+channel emits 1
+channel emits 2
+channel emits 3
+```
+
+**Factories**
+
+A channel can be created by factories in the `channel` namespace. For example, the `channel.fromPath()` factory creates a channel from a file name or glob pattern, similar to the `files()` function:
+
+```nextflow
+channel.fromPath('input/*.txt').view()
+```
+
+See {ref}`channel-factory` for the full list of channel factories.
+
+**Operators**
+
+Channel operators, or *operators* for short, are functions that consume and produce channels. Because channels are asynchronous, operators are necessary to manipulate the values in a channel.
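+For example, a minimal sketch that chains two operators to manipulate the values in a channel (the values and closures are illustrative):
+
+```nextflow
+channel.of(1, 2, 3, 4)
+    .filter { v -> v % 2 == 0 }  // keep only the even values
+    .map { v -> v * 10 }         // transform each remaining value
+    .view()                      // prints 20 and 40
+```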
Operators are particularly useful for implementing glue logic between processes.
+
+Commonly used operators include:
+
+- {ref}`operator-combine`: emit the combinations of two channels
+
+- {ref}`operator-collect`: collect the values from a channel into a list
+
+- {ref}`operator-filter`: select the values in a channel that satisfy a condition
+
+- {ref}`operator-flatMap`: transform each value from a channel into a list and emit each list element separately
+
+- {ref}`operator-grouptuple`: group the values from a channel based on a grouping key
+
+- {ref}`operator-join`: join the values from two channels based on a matching key
+
+- {ref}`operator-map`: transform each value from a channel with a mapping function
+
+- {ref}`operator-mix`: emit the values from multiple channels
+
+- {ref}`operator-view`: print each value in a channel to standard output
+
+See {ref}`operator-page` for the full list of operators.
+
+(dataflow-type-value)=
+
+### Values
+
+A *dataflow value* is an asynchronous value.

-When using the {ref}`strict syntax <strict-syntax-page>`, workflow takes and emits can specify a type annotation:
+Dataflow values can be created using the {ref}`channel.value <channel-value>` factory, and they are created by processes (under {ref}`certain conditions `).

-```nextflow
-workflow my_workflow {
-    take:
-    data: Channel<Path>
+A dataflow value cannot be accessed directly, but only through an operator or process. For example:

-    main:
-    ch_hello = hello(data)
-    ch_bye = bye(ch_hello.collect())
+```nextflow
+channel.value(1).view { v -> "dataflow value is ${v}" }
+```

-    emit:
-    my_data: Value<Path> = ch_bye
-}
+```console
+dataflow value is 1
```

-In the above example, `my_workflow` takes a channel of files (`Channel<Path>`) and emits a dataflow value with a single file (`Value<Path>`). See {ref}`stdlib-types` for the list of available types.
+See {ref}`stdlib-types-value` for the set of available methods for dataflow values.

(workflow-process-invocation)=

-## Calling processes and workflows
+### Calling processes and workflows

Processes and workflows are called like functions, passing their inputs as arguments:
@@ -360,7 +693,7 @@ The fully qualified process name can be used as a {ref}`process selector ` is enabled. Using these operators will prevent the type checker from validating your code.
:::

-### Pipe `|`
+**Pipe `|`**

The `|` *pipe* operator can be used to chain processes, operators, and workflows:

@@ -406,7 +739,7 @@ workflow {
}
```

-### And `&`
+**And `&`**

The `&` *and* operator can be used to call multiple processes in parallel with the same channel(s):
@@ -457,9 +790,9 @@ workflow {

(workflow-recursion)=

-## Process and workflow recursion
+### Process and workflow recursion

-:::{versionadded} 21.11.0-edge
+:::{versionadded} 22.04.0
:::

:::{note}
@@ -480,7 +813,7 @@ In the above example, the `count_down` process is first invoked with the value `

The recursive output can also be limited using the `times` method:

-```groovy
+```nextflow
count_down
    .recurse(params.start)
    .times(3)
@@ -502,252 +835,3 @@ Workflows can also be invoked recursively:

- A recursive process or workflow must have matching inputs and outputs, such that the outputs for each iteration can be supplied as the inputs for the next iteration.
- Recursive workflows cannot use *reduction* operators such as `collect`, `reduce`, and `toList`, because these operators cause the recursion to hang indefinitely after the initial iteration.
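+As an illustrative sketch of the first caveat, a recursive process must emit the same shape of value that it takes. The following definition is hypothetical (it is not the `count_down` process used above):
+
+```nextflow
+process decrement {
+    input:
+    val n
+
+    output:
+    val result
+
+    exec:
+    // the output is a single value, matching the single-value input,
+    // so each iteration can feed the next one
+    result = n - 1
+}
+```
+
+With recursion enabled, such a process could be iterated with `decrement.recurse(10).times(3)`.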
- -(workflow-output-def)= - -## Workflow outputs - -:::{versionadded} 25.10.0 -This feature is available as a preview in Nextflow {ref}`24.04 `, {ref}`24.10 `, and {ref}`25.04 `. -::: - -A script can define an *output block* which declares the top-level outputs of the workflow. Each output should be assigned in the `publish` section of the entry workflow. Any channel in the workflow can be assigned to an output, including process and subworkflow outputs. This approach is intended to replace the {ref}`publishDir ` directive. - -Here is a basic example: - -```nextflow -process fetch { - // ... - - output: - path 'sample.txt' - - // ... -} - -workflow { - main: - ch_samples = fetch(params.input) - - publish: - samples = ch_samples -} - -output { - samples { - path '.' - } -} -``` - -In the above example, the output of process `fetch` is assigned to the `samples` workflow output. How this output is published to a directory structure is described in the next section. - -(workflow-publishing-files)= - -### Publishing files - -The top-level output directory of a workflow run can be set using the `-output-dir` command-line option or the `outputDir` config option: - -```bash -nextflow run main.nf -output-dir 'my-results' -``` - -```groovy -// nextflow.config -outputDir = 'my-results' -``` - -The default output directory is `results` in the launch directory. - -By default, all output files are published to the output directory. Each output in the output block can define where files are published using the `path` directive. For example: - -```nextflow -workflow { - main: - ch_step1 = step1() - ch_step2 = step2(ch_step1) - - publish: - step1 = ch_step1 - step2 = ch_step2 -} - -output { - step1 { - path 'step1' - } - step2 { - path 'step2' - } -} -``` - -The following directory structure will be created: - -``` -results/ -└── step1/ - └── ... -└── step2/ - └── ... -``` - -All files received by an output will be published into the specified directory. Lists and maps are recursively scanned for nested files. For example: - -```nextflow -workflow { - main: - ch_samples = channel.of( - [ [id: 'SAMP1'], [ file('1.txt'), file('2.txt') ] ] - ) - - publish: - samples = ch_samples // 1.txt and 2.txt will be published -} -``` - -The `path` directive can also be a closure which defines a custom publish path for each channel value: - -```nextflow -workflow { - main: - ch_samples = channel.of( - [id: 'SAMP1', fastq_1: file('1.fastq'), fastq_2: file('2.fastq')] - ) - - publish: - samples = ch_samples -} - -output { - samples { - path { sample -> "fastq/${sample.id}/" } - } -} -``` - -The above example will publish each channel value to a different subdirectory. In this case, each pair of FASTQ files will be published to a subdirectory based on the sample ID. - -The closure can even define a different path for each individual file using the `>>` operator: - -```nextflow -output { - samples { - path { sample -> - sample.fastq_1 >> "fastq/${sample.id}/" - sample.fastq_2 >> "fastq/${sample.id}/" - } - } -} -``` - -Each `>>` specifies a *source file* and *publish target*. The source file should be a file or collection of files, and the publish target should be a directory or file name. If the publish target ends with a slash, it is treated as the directory in which source files are published. Otherwise, it is treated as the target filename of a source file. Only files that are published with the `>>` operator are saved to the output directory. 
- -:::{note} -Files that do not originate from the work directory are not published. -::: - -### Index files - -Each output can create an index file of the values that were published. An index file preserves the structure of channel values, including metadata, which is simpler than encoding this information with directories and file names. The index file can be a CSV (`.csv`), JSON (`.json`), or YAML (`.yml`, `.yaml`) file. The channel values should be files, lists, or maps. - -For example: - -```nextflow -workflow { - main: - ch_samples = channel.of( - [id: 1, name: 'sample 1', fastq_1: '1a.fastq', fastq_2: '1b.fastq'], - [id: 2, name: 'sample 2', fastq_1: '2a.fastq', fastq_2: '2b.fastq'], - [id: 3, name: 'sample 3', fastq_1: '3a.fastq', fastq_2: null] - ) - - publish: - samples = ch_samples -} - -output { - samples { - path 'fastq' - index { - path 'samples.csv' - } - } -} -``` - -The above example will write the following CSV file to `results/samples.csv`: - -``` -"1","sample 1","results/fastq/1a.fastq","results/fastq/1b.fastq" -"2","sample 2","results/fastq/2a.fastq","results/fastq/2b.fastq" -"3","sample 3","results/fastq/3a.fastq","" -``` - -You can customize the index file with additional directives, for example: - -```nextflow -index { - path 'samples.csv' - header true - sep '|' -} -``` - -This example will produce the following index file: - -``` -"id"|"name"|"fastq_1"|"fastq_2" -"1"|"sample 1"|"results/fastq/1a.fastq"|"results/fastq/1b.fastq" -"2"|"sample 2"|"results/fastq/2a.fastq"|"results/fastq/2b.fastq" -"3"|"sample 3"|"results/fastq/3a.fastq"|"" -``` - -:::{note} -Files that do not originate from the work directory are not published, but are included in the index file. -::: - -See [Output directives](#output-directives) for the list of available index directives. - -### Output directives - -The following directives are available for each output in the output block: - -`index` -: Create an index file which will contain a record of each published value. - - The following directives are available in an index definition: - - `header` - : When `true`, the keys of the first record are used as the column names (default: `false`). Can also be a list of column names. Only used for CSV files. - - `path` - : The name of the index file relative to the base output directory (required). Can be a CSV, JSON, or YAML file. - - `sep` - : The character used to separate values (default: `','`). Only used for CSV files. - -`label` -: Specify a label to be applied to every published file. Can be specified multiple times. - -`path` -: Specify the publish path relative to the output directory (default: `'.'`). Can be a path, a closure that defines a custom directory for each published value, or a closure that publishes individual files using the `>>` operator. - -Additionally, the following options from the {ref}`workflow ` config scope can be specified as directives: -- `contentType` -- `enabled` -- `ignoreErrors` -- `mode` -- `overwrite` -- `storageClass` -- `tags` - -For example: - -```nextflow -output { - samples { - mode 'copy' - } -} -```