Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

join doesn't match the same keys #3939

Open
berguner opened this issue May 12, 2023 · 4 comments
Open

join doesn't match the same keys #3939

berguner opened this issue May 12, 2023 · 4 comments

Comments

@berguner
Copy link

Bug report

Expected behavior and actual behavior

Sometimes join function fails matching entries with same key values. Tricky bit is that this happens most of the time but not always.

Steps to reproduce the problem

You can checkout this commit and run the test command:
nextflow run main.nf -profile test,tools_mutect2_ms,docker

Program output

Complete log file: nextflow-join-bug.log

Error message:

May-12 11:30:17.804 [Actor Thread 92] ERROR nextflow.extension.OperatorImpl - @unknown
nextflow.exception.AbortOperationException: Join mismatch for the following entries: 
- key=[id:multi_tumor_normal, normal_id:normal, num_intervals:2, patient:multi_tumor_normal, sex:XX] values=[/home/berguner/src/nf-core-sarek/work/25/e0c8134415c2e51506ffff18af1242/multi_tumor_normal.mutect2.vcf.gz, /home/berguner/src/nf-core-sarek/work/25/e0c8134415c2e51506ffff18af1242/multi_tumor_normal.mutect2.vcf.gz.tbi, /home/berguner/src/nf-core-sarek/work/42/bc146f28446582925d8788cc58320b/multi_tumor_normal.mutect2.vcf.gz.stats, /home/berguner/src/nf-core-sarek/work/05/d1a405c69b2445900fc000ba12ee4a/multi_tumor_normal.mutect2.artifactprior.tar.gz] 
- key=[id:multi_tumor_normal, normal_id:normal, num_intervals:2, patient:multi_tumor_normal, sex:XX] values=[[/home/berguner/src/nf-core-sarek/work/f7/853c49e5096c6241223dbda50ef62d/tumour_copy_vs_normal.mutect2.segmentation.table, /home/berguner/src/nf-core-sarek/work/16/95d99ed60da6dac8c7ea56ac776072/tumour_vs_normal.mutect2.segmentation.table]]
	at nextflow.extension.JoinOp.checkForMismatch(JoinOp.groovy:290)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1036)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1019)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
	at nextflow.extension.JoinOp$_handler_closure2.doCall(JoinOp.groovy:135)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at nextflow.extension.DataflowHelper$3.afterStop(DataflowHelper.groovy:254)
	at groovyx.gpars.dataflow.operator.DataflowProcessor.fireAfterStop(DataflowProcessor.java:324)
	at groovyx.gpars.dataflow.operator.DataflowProcessorActor.afterStop(DataflowProcessorActor.java:59)
	at jdk.internal.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1024)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1015)
	at groovyx.gpars.actor.Actor.callDynamic(Actor.java:369)
	at groovyx.gpars.actor.Actor.handleTermination(Actor.java:320)
	at groovyx.gpars.actor.AbstractLoopingActor.terminate(AbstractLoopingActor.java:204)
	at groovyx.gpars.dataflow.operator.DataflowProcessor.terminate(DataflowProcessor.java:147)
	at groovyx.gpars.dataflow.operator.DataflowProcessorActor.checkPoison(DataflowProcessorActor.java:115)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:83)
	at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
	at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
	at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
	at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
	at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
May-12 11:30:17.806 [Actor Thread 88] ERROR nextflow.extension.OperatorImpl - @unknown
nextflow.exception.AbortOperationException: Join mismatch for the following entries: key=[id:multi_tumor_normal, normal_id:normal, num_intervals:2, patient:multi_tumor_normal, sex:XX] values=[[/home/berguner/src/nf-core-sarek/work/f7/853c49e5096c6241223dbda50ef62d/tumour_copy_vs_normal.mutect2.contamination.table, /home/berguner/src/nf-core-sarek/work/16/95d99ed60da6dac8c7ea56ac776072/tumour_vs_normal.mutect2.contamination.table]]
	at nextflow.extension.JoinOp.checkForMismatch(JoinOp.groovy:290)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1036)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1019)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
	at nextflow.extension.JoinOp$_handler_closure2.doCall(JoinOp.groovy:135)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at nextflow.extension.DataflowHelper$3.afterStop(DataflowHelper.groovy:254)
	at groovyx.gpars.dataflow.operator.DataflowProcessor.fireAfterStop(DataflowProcessor.java:324)
	at groovyx.gpars.dataflow.operator.DataflowProcessorActor.afterStop(DataflowProcessorActor.java:59)
	at jdk.internal.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1024)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1015)
	at groovyx.gpars.actor.Actor.callDynamic(Actor.java:369)
	at groovyx.gpars.actor.Actor.handleTermination(Actor.java:320)
	at groovyx.gpars.actor.AbstractLoopingActor.terminate(AbstractLoopingActor.java:204)
	at groovyx.gpars.dataflow.operator.DataflowProcessor.terminate(DataflowProcessor.java:147)
	at groovyx.gpars.dataflow.operator.DataflowProcessorActor.checkPoison(DataflowProcessorActor.java:115)
	at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:83)
	at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
	at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
	at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
	at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
	at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
May-12 11:30:17.840 [Actor Thread 92] DEBUG nextflow.Session - Session aborted -- Cause: Join mismatch for the following entries: 
- key=[id:multi_tumor_normal, normal_id:normal, num_intervals:2, patient:multi_tumor_normal, sex:XX] values=[/home/berguner/src/nf-core-sarek/work/25/e0c8134415c2e51506ffff18af1242/multi_tumor_normal.mutect2.vcf.gz, /home/berguner/src/nf-core-sarek/work/25/e0c8134415c2e51506ffff18af1242/multi_tumor_normal.mutect2.vcf.gz.tbi, /home/berguner/src/nf-core-sarek/work/42/bc146f28446582925d8788cc58320b/multi_tumor_normal.mutect2.vcf.gz.stats, /home/berguner/src/nf-core-sarek/work/05/d1a405c69b2445900fc000ba12ee4a/multi_tumor_normal.mutect2.artifactprior.tar.gz] 
- key=[id:multi_tumor_normal, normal_id:normal, num_intervals:2, patient:multi_tumor_normal, sex:XX] values=[[/home/berguner/src/nf-core-sarek/work/f7/853c49e5096c6241223dbda50ef62d/tumour_copy_vs_normal.mutect2.segmentation.table, /home/berguner/src/nf-core-sarek/work/16/95d99ed60da6dac8c7ea56ac776072/tumour_vs_normal.mutect2.segmentation.table]]

Environment

  • Nextflow version: 23.04.1 build 5866
  • Java version: openjdk version "11.0.18" 2023-01-17
  • Operating system: Ubuntu 20.04
  • Bash version: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
@stenglein-lab
Copy link

FWIW: I've noticed this behavior too using nextflow version 22.10.8 and have noticed that it happens sporadically. I've attempted to create a minimal reproducible example but have not succeeded in reproducing the behavior in a simple example nextflow script.

Copy link

stale bot commented Dec 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 15, 2023
@sbthandras
Copy link

For me what closed this is that one of the tuples was a nextflow.taskpath variable while the other was a java.string. What caused this is I had an upstream process where the tuple key was the same as its path:

output:
tuple val(sample_id), path("${sample_id}")

So I think it's worth trying to look if they are the same type of tuple with

def classOfElements = YOURPROCESS_HERE.out.collect { list ->
list.collect { it.class }
}
classOfElements.view().each { list ->
list.each { println it }
}

and then seeing if you have something similar going on than I had

@bentsherman
Copy link
Member

@sbthandras are you saying the path element in that output tuple was a TaskPath? I wouldn't expect that based on how the task processor works, but if true it should probably be unwrapped under the hood

@stale stale bot removed the stale label Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants