DSL1: Input file name collision when merging two already merged BAMs in additional_library_merge step. #1017

Closed
3 of 5 tasks
TCLamnidis opened this issue Jul 26, 2023 · 3 comments · Fixed by #1021

Comments

@TCLamnidis
Collaborator

TCLamnidis commented Jul 26, 2023

Check Documentation

I have checked the following places for your error:

Description of the bug

In niche cases where multiple UDG treatments exist for a sample, and multiple libraries have each of these treatments, a file name collision kills the pipeline at the additional_library_merge step.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run ... (any input that requires merging two already-merged library-level BAMs during the additional_library_merge step).
  2. See error:
Caused by:
  Process `additional_library_merge` input file name collision -- There are multiple input files for each of the following file names: MIS010_ss_libmerged.trimmed.bam, MIS010_ss_libmerged.trimmed.bam.bai
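
By default Nextflow stages each input file into the task's work directory under its own file name, so two inputs that come from different directories but share a name cannot be staged side by side. As a rough illustration only, the condition behind this error can be sketched in Python. The directory paths below are made up (only the file names are taken from the error above), and `find_colliding_names` is a hypothetical helper, not part of Nextflow or the pipeline.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_colliding_names(paths):
    """Return {file name: [paths]} for every name shared by more than one path."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Hypothetical work-directory paths; only the file names come from the error message.
inputs = [
    "work/aa/111/MIS010_ss_libmerged.trimmed.bam",
    "work/bb/222/MIS010_ss_libmerged.trimmed.bam",
    "work/aa/111/MIS010_ss_libmerged.trimmed.bam.bai",
    "work/bb/222/MIS010_ss_libmerged.trimmed.bam.bai",
]

collisions = find_colliding_names(inputs)
for name in sorted(collisions):
    print(name, "<-", collisions[name])
```

Each reported name maps to two source paths, which is the situation the `additional_library_merge` process rejects.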

Expected behaviour

The BAMs produced by the initial library merge should have unique names to avoid such errors.

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file
  • The exact error:

System

  • Hardware: HPC
  • Executor: sge
  • OS: Ubuntu
  • Version 20.04.6 LTS

Nextflow Installation

  • Version: 21.10.6 build 5660

Container engine

  • Engine: Singularity
  • version: singularity version 3.7.1
  • Image tag: 2.4.5

Additional context

@TCLamnidis TCLamnidis added the bug Something isn't working label Jul 26, 2023
@jfy133
Member

jfy133 commented Jul 27, 2023

In niche cases where multiple UDG treatments exist for a sample, and multiple libraries have each of these treatments, a file name collision kills the pipeline at the additional_library_merge step.

That's what confuses me... shouldn't they have been merged at the first post-dedup merging step? 🤔

@TCLamnidis
Collaborator Author

They are, and that's the problem, as they end up with the same name. It could be an issue with the naming in the initial library merge step, or in the trimming step.

@TCLamnidis
Collaborator Author

TCLamnidis commented Aug 15, 2023

To give a better overview: say we have a sample with 4 libraries with the following attributes:

Sample  Library  UDG_Treatment  Strandedness  Lane
ABC001  A0101    half           double        1
ABC001  A0102    half           double        1
ABC001  B0101    none           double        1
ABC001  B0102    none           double        1

The BAMs of the first two libraries will be merged at the initial lib_merge and named ABC001_udghalf_libmerged.bam.
Likewise, the BAMs of the last two libraries will be merged at the initial lib_merge and named ABC001_udgnone_libmerged.bam.
However, once they undergo BAM trimming, the outputs lose their UDG attribute, and both become ABC001_libmerged.bam.
When the two come together for the additional_library_merge step, the two input files share a name, which triggers the file name collision.
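
The sequence described here can be reproduced in a few lines. The following Python sketch is not the pipeline's code: the function names are hypothetical, the file names follow the simplified pattern used in this comment, and keeping the UDG treatment in the trimmed file name is only one assumed way to make the names unique (the thread does not say how the linked pull request resolved it).

```python
from collections import Counter

# The four libraries from the example: (sample, library, udg, strandedness, lane)
libraries = [
    ("ABC001", "A0101", "half", "double", 1),
    ("ABC001", "A0102", "half", "double", 1),
    ("ABC001", "B0101", "none", "double", 1),
    ("ABC001", "B0102", "none", "double", 1),
]


def initial_merge_name(sample, udg):
    """Name after the initial library merge: the UDG treatment is part of the name."""
    return f"{sample}_udg{udg}_libmerged.bam"


def trimmed_name_lossy(sample, udg):
    """Name after trimming as described in the comment: the UDG treatment is dropped."""
    return f"{sample}_libmerged.bam"


def trimmed_name_keep_udg(sample, udg):
    """Assumed alternative: carry the UDG treatment through trimming."""
    return f"{sample}_udg{udg}_libmerged.bam"


def duplicated(names):
    """Return the sorted names that occur more than once."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


# The initial merge produces one BAM per (sample, UDG treatment) group.
groups = sorted({(sample, udg) for sample, _lib, udg, _strand, _lane in libraries})

merged = [initial_merge_name(s, u) for s, u in groups]
lossy = [trimmed_name_lossy(s, u) for s, u in groups]
kept = [trimmed_name_keep_udg(s, u) for s, u in groups]

print(merged)             # two distinct names after the initial merge
print(duplicated(lossy))  # the shared name that collides downstream
print(duplicated(kept))   # empty when the UDG treatment stays in the name
```

Any attribute that distinguishes the merged groups would work equally well in the name, as long as it survives every step between the initial merge and additional_library_merge.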

@TCLamnidis TCLamnidis self-assigned this Aug 15, 2023
@TCLamnidis TCLamnidis mentioned this issue Aug 18, 2023
@TCLamnidis TCLamnidis linked a pull request Aug 23, 2023 that will close this issue
@TCLamnidis TCLamnidis mentioned this issue Feb 16, 2024