[BUG] Does --allocate-multi-mappings 4 actually impact results? #175

Open
adamklie opened this issue Dec 4, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@adamklie

adamklie commented Dec 4, 2024

Describe the bug
I've run chromap with and without --allocate-multi-mappings 4 (all else equal) and I get the same result, even though the log file says "Will allocate multi-mappings after mapping." in one run and "Won't allocate multi-mappings after mapping.\nOnly output unique mappings after mapping." in the other.

To Reproduce
Steps to reproduce the behavior:

  1. Describe the data you are using and provide a sample of your data if possible: I'm working with paired-end scATAC-seq data generated by 10x scATAC-seq. The read length is 100bp and the barcode length is 16bp.
  2. Get the Chromap version by running chromap -v and post it here: 0.2.7-r493
  3. Run with --allocate-multi-mappings 4:
chromap --trim-adapters --remove-pcr-duplicates --remove-pcr-duplicates-at-cell-level --Tn5-shift --allocate-multi-mappings 4 --low-mem --BED -l 2000 --bc-error-threshold 2 --bc-probability-threshold 0.9 --read-format r1:0:-1,r2:0:-1,bc:0:-1 -x /cellar/users/aklie/data/ref/genomes/GHOST_KY2021/chromap/index -r /cellar/users/aklie/data/ref/genomes/GHOST_KY2021/HT.RefwMG0.fasta -q 30 -t 16 -1 /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R1_001.fastq.gz -2 /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R3_001.fastq.gz -b /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R2_001.fastq.gz --barcode-whitelist /cellar/users/aklie/data/ref/bc_whitelists/737K-cratac-v1.rc.txt -o /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/medium/5_5hr_scATAC_1.bed --summary /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/medium/5_5hr_scATAC_1_barcode_log.txt > /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/medium/5_5hr_scATAC_1_alignment_log.txt 2>&1
  4. Run without --allocate-multi-mappings 4:
chromap --trim-adapters --remove-pcr-duplicates --remove-pcr-duplicates-at-cell-level --Tn5-shift --low-mem --BED -l 2000 --bc-error-threshold 2 --bc-probability-threshold 0.9 --read-format r1:0:-1,r2:0:-1,bc:0:-1 -x /cellar/users/aklie/data/ref/genomes/GHOST_KY2021/chromap/index -r /cellar/users/aklie/data/ref/genomes/GHOST_KY2021/HT.RefwMG0.fasta -q 30 -t 16 -1 /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R1_001.fastq.gz -2 /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R3_001.fastq.gz -b /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/fastq/2024_01_22/5_5hr_scATAC_1_S1_L006_R2_001.fastq.gz --barcode-whitelist /cellar/users/aklie/data/ref/bc_whitelists/737K-cratac-v1.rc.txt -o /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/strict/5_5hr_scATAC_1.bed --summary /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/strict/5_5hr_scATAC_1_barcode_log.txt > /cellar/users/aklie/data/datasets/scATACseq_55hpf-Ciona/processed/2024_10_30/chromap/strict/5_5hr_scATAC_1_alignment_log.txt 2>&1
  5. I've attached the logs for both runs

Expected behavior
I would expect that the Number of output mappings (passed filters) would be greater when multimappers are allowed. I would also expect to see many metrics differing between the two runs. Instead the metrics are identical. I've attached the barcode summary files as well.
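One way to confirm that two runs are truly identical, and not just matching in the summary metrics, is to compare the output files directly. The sketch below is only an illustration: `compare_runs` is a hypothetical helper name, and the paths passed to it stand in for the two BED (or summary) files from the runs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of chromap): print the line count of each
# file and report whether the two files are byte-for-byte identical.
compare_runs() {
    local a="$1" b="$2"
    printf '%s lines: %s\n' "$a" "$(wc -l < "$a" | tr -d ' ')"
    printf '%s lines: %s\n' "$b" "$(wc -l < "$b" | tr -d ' ')"
    # cmp -s is silent and exits 0 only when the files are identical.
    if cmp -s "$a" "$b"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

For example, `compare_runs medium/5_5hr_scATAC_1.bed strict/5_5hr_scATAC_1.bed` would print both line counts followed by `identical` or `different`. If the record order might differ between runs, sorting both files first would be a safer comparison.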

Screenshots/files
With multimappers
image
with_multimappers_barcode.txt
with_multimappers.txt
Without multimappers
image
no_multimappers_barcode.txt
no_multimappers.txt

Environment (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.9 (Green Obsidian)
Release: 8.9
Codename: GreenObsidian

Installed chromap from source

Additional context
Add any other context about the problem here.

@adamklie adamklie added the bug Something isn't working label Dec 4, 2024
@adamklie
Author

adamklie commented Dec 4, 2024

May have already answered my own question. My guess is that the -q 30 parameter "overrides" the multimapping parameter because multimappers have MAPQ=0.

So I guess what I'm after is a way to keep multimappers and discard reads with MAPQ < 30 that aren't multimappers. Do those even exist?

@haowenz
Owner

haowenz commented Dec 8, 2024

Allocating multi-mappings is kind of deprecated. If you want to try it, please do not use low memory mode, set -q to 0, and try again.

@akundaje

akundaje commented Dec 8, 2024

Just want to chime in here. Multimappers are VERY important to avoid missing out on several key promoters and enhancers that get missed with unique mapping reads only. I would strongly recommend keeping it in if possible. It would disrupt the pipelines we are developing for several large consortia.

@haowenz
Owner

haowenz commented Dec 8, 2024

I think it should be --allocate-multi-mappings --drop-repetitive-reads 4 -q 0. And please avoid using low mem mode since it needs to see all the mappings in memory to allocate multi-mappings. Can you try this?

The function for allocating multi-mapped reads has always been there. It uses all the mappings and runs a more advanced algorithm to find an optimal allocation. But it was not published as part of the Chromap paper. We did some tests before and saw that it was working. But it has been a while, and a lot of new development has been done on Chromap since, so you should do some testing before you use it.
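For anyone wanting to test this, here is a rough sketch of how the original command might look with the suggested changes applied. It only assembles and prints the command (a dry run). All paths are shortened placeholders, not the real locations, and the flags are copied from the commands earlier in this thread rather than checked against the current chromap help output.

```shell
#!/usr/bin/env bash
# Placeholder paths (assumptions): substitute your own index, reference,
# FASTQ, whitelist, and output locations.
index=ref/chromap/index
ref=ref/genome.fasta
r1=fastq/sample_R1.fastq.gz
r2=fastq/sample_R3.fastq.gz
bc=fastq/sample_R2.fastq.gz
whitelist=ref/737K-cratac-v1.rc.txt
out=out/sample.bed
summary=out/sample_barcode_log.txt

# Same options as the original run, with three changes:
#   1. --low-mem removed (allocation needs all mappings in memory)
#   2. -q lowered from 30 to 0
#   3. the 4 now belongs to --drop-repetitive-reads
set -- chromap --trim-adapters --remove-pcr-duplicates \
    --remove-pcr-duplicates-at-cell-level --Tn5-shift \
    --allocate-multi-mappings --drop-repetitive-reads 4 \
    --BED -l 2000 --bc-error-threshold 2 --bc-probability-threshold 0.9 \
    --read-format r1:0:-1,r2:0:-1,bc:0:-1 \
    -x "$index" -r "$ref" -q 0 -t 16 \
    -1 "$r1" -2 "$r2" -b "$bc" \
    --barcode-whitelist "$whitelist" -o "$out" --summary "$summary"

# Dry run: print the assembled command for review instead of executing it.
printf '%s ' "$@"; echo
# To actually run it, replace the printf line with: "$@"
```

Running this against the same FASTQ files as the -q 30 run and comparing the "Number of output mappings (passed filters)" line in the two logs would show whether multi-mapping allocation now has an effect.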

@akundaje

akundaje commented Dec 8, 2024

That is good to know that the functionality is still in there. We will test it out and get back to you.

3 participants