Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying a mutable object (like Map) in a duplicated channel affects all instances #2660

Closed
Midnighter opened this issue Feb 18, 2022 · 11 comments · Fixed by #3740
Closed

Comments

@Midnighter
Copy link

Bug report

Expected behavior and actual behavior

I don't know if this is a bug or just a huge gotcha. Either way, I've been affected by it multiple times and so have others. DSL2 is supposed to automatically copy/duplicate channels when they get re-used multiple times. In general, this works perfectly fine. However, when the channel contains a mutable objects like a Map and those get modified in place, this affects all channel instances. I don't know the underlying code but I guess for memory reason channels get copied only shallowly, i.e., they reference the same objects. This can lead to more or less subtle bugs in pipelines and surprising behavior.

Steps to reproduce the problem

Consider the following workflow:

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

process ECHO {
    input:
    val meta

    output:
    val meta

    script:
    """
    echo '${meta.id}'
    """
}

process ECHO2 {
    input:
    val meta

    output:
    val meta

    script:
    """
    echo '${meta.id}'
    """
}

workflow {
    def ch_input = Channel.of([id: 'test1', idx: 1], [id: 'test2', idx: 2])

    ECHO(ch_input).view()

    ECHO2(
        ch_input.map { meta ->
            meta.id = 'foo'
            meta
        }
    ).view()
}

This produces below output where all id keys contain the 'foo' value. If instead you clone the map, the output is more in line with what I expect.

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

process ECHO {
    input:
    val meta

    output:
    val meta

    script:
    """
    echo '${meta.id}'
    """
}

process ECHO2 {
    input:
    val meta

    output:
    val meta

    script:
    """
    echo '${meta.id}'
    """
}

workflow {
    def ch_input = Channel.of([id: 'test1', idx: 1], [id: 'test2', idx: 2])

    ECHO(ch_input).view()

    ECHO2(
        ch_input.map { meta ->
            def dup = meta.clone()
            dup.id = 'foo'
            dup
        }
    ).view()
}

Program output

N E X T F L O W  ~  version 21.10.6
Launching `mutable.nf` [friendly_koch] - revision: 49b9711e9b
executor >  local (4)
[98/ed1c1a] process > ECHO (1)  [100%] 2 of 2 ✔
[a9/4e05a2] process > ECHO2 (2) [100%] 2 of 2 ✔
[id:foo, idx:2]
[id:foo, idx:1]
[id:foo, idx:2]
[id:foo, idx:1]

Output for the "cloned" version:

N E X T F L O W  ~  version 21.10.6
Launching `mutable.nf` [romantic_bhabha] - revision: 6df7bba35e
executor >  local (4)
[0a/3aed82] process > ECHO (1)  [100%] 2 of 2 ✔
[c3/029147] process > ECHO2 (1) [100%] 2 of 2 ✔
[id:test2, idx:2]
[id:test1, idx:1]
[id:foo, idx:2]
[id:foo, idx:1]

Environment

  • Nextflow version: 21.10.6
  • Java version: openjdk 11.0.13 2021-10-19
  • Operating system: Linux 5.15.15-76051515-generic #202201160435164269382420.04~97db1bb-Ubuntu
  • Bash version: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
@adamrtalbot
Copy link
Collaborator

I've hit this problem as well, is there a workaround?

@Midnighter
Copy link
Author

The workaround is to clone the map as shown in the second example.

@adamrtalbot
Copy link
Collaborator

Ah sorry, clearly not paying enough attention! Using .clone() has fixed it.
For what it's worth, here are my details

  • Nextflow version: 21.10.6
  • Java version: openjdk 1.8.0
  • OS: Ubuntu 18.04.6 LTS
  • BASH: 4.4.20

@bentsherman
Copy link
Member

Good thing to add to the docs under a section on troubleshooting or gotchas.

@bentsherman bentsherman changed the title [DSL2] modifying a mutable object (like Map) in a duplicated channel affects all instances Modifying a mutable object (like Map) in a duplicated channel affects all instances Aug 9, 2022
@bentsherman
Copy link
Member

Been thinking about this issue. Fundamentally the mutable object needs to be cloned, the question is whether Nextflow should do it automatically or the user should do it when they need it.

I am leaning towards the latter approach. For Nextflow to perform a deep copy every time a channel is referenced (i.e. implicit into from DSL1) would add needless overhead in most cases, and in the worst case might copy some large in-memory object (e.g. a CSV file loaded into memory) that either takes a while or causes an OOM error. Also, mutating state in this way kinda goes against the functional paradigm employed by Nextflow.

So I think the best solution is to just use a functional style that doesn't require mutation. Here is your example reworked to produce a new map without mutating or using clone() at all:

        ch_input.map { meta ->
            meta + [id: 'foo']
        }

See this article

@Midnighter
Copy link
Author

Midnighter commented Aug 14, 2022

From the linked article:

The easiest way to merge two maps in Groovy is to use + operator. This method is straightforward - it creates a new map from the left-hand-side and right-hand-side maps.

that means it is equivalent to using clone and the modifying the state. On one hand, style-wise, I agree that your suggestion is nice since it can modify multiple keys in one statement. On the other hand, it may not be immediately obvious to every reader of the code that values of the LHS for keys in RHS get overwritten by RHS' values.

Personally, I would then probably prefer to implement a 'merge' function with explanatory docstring just as in the article and use that.

@diego-rt
Copy link

Hey guys, just want to note in this thread as well that .clone() does not work on maps that carry a groupKey.

This works:

#!/usr/bin/env nextflow
nextflow.enable.dsl = 1

/*
 * Defines the pipeline inputs parameters
 */

Channel
    .from( [ "sample_name":"sample_A" ], [ "sample_name":"sample_B" ], [ "sample_name":"sample_C" ] )
    .set { samples_ch }

Channel
    .from( ["scaffolds", "file_s"], ["contigs", "file_c"])
    .set { genomes_ch }

samples_ch
    .combine( genomes_ch )
    .map { sample_info, genome_name, index -> 
            def tmp = sample_info.clone()
            tmp.genomeName = genome_name
            return [tmp, tuple(genome_name), index] }
    .view()

This does not work:

#!/usr/bin/env nextflow
nextflow.enable.dsl = 1

/*
 * Defines the pipeline inputs parameters
 */

Channel
    .from( [ "sample_name":"sample_A" ], [ "sample_name":"sample_B" ], [ "sample_name":"sample_C" ] )
    .set { samples_ch }

Channel
    .from( ["scaffolds", "file_s"], ["contigs", "file_c"])
    .set { genomes_ch }

samples_ch
    .combine( genomes_ch )
    .map { sample_info, genome_name, index -> tuple(groupKey(sample_info, 1), genome_name, index )}
    .map { sample_info, genome_name, index -> 
            def tmp = sample_info.clone()
            tmp.genomeName = genome_name
            return [tmp, tuple(genome_name), index] }
    .view()

@bentsherman
Copy link
Member

@Midnighter I think the + syntax will work great for most people, but it's yet another example of how I think the Nextflow docs just need to have more comprehensive docs on Groovy syntax, instead of only linking to other sources.

@diego-rt Would the + syntax I showed above solve your problem without having to use clone()?

@diego-rt
Copy link

Actually yes, the + syntax seems to work just fine. It doesn't seem to preserve the groupKey of the object though.

Btw, also another thing that seems to be missing AFAIK is a way to remove the groupKey from an object.

@stale
Copy link

stale bot commented Jan 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 16, 2023
@stale stale bot removed the stale label Jan 16, 2023
@stale stale bot removed the stale label Jan 16, 2023
@Midnighter
Copy link
Author

In order to keep this alive, just wanted to comment that I've started using the + operator more and more @bentsherman . I've grown to like it a lot. It's very concise yet explicit about what's happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants