-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/cell level dedup #60
base: fix/cell_level_dedup
Are you sure you want to change the base?
Fix/cell level dedup #60
Conversation
DongqingSun96
commented
Sep 23, 2020
- Cell-level deduplication
- Call peaks from bed files
Feat/cicd 1
Merge from liulab-dfci
Merge from liulab-dfci
@@ -200,10 +154,21 @@ if config["platform"] == "microfluidic": | |||
"Result/Benchmark/%s_QCMerge.benchmark" %(config["outprefix"]) | |||
shell: | |||
"python " + SCRIPT_PATH + "/scATAC_microfluidic_QC.py --log-dir {params.log} --directory {params.outdir};" | |||
|
|||
rule scatac_celldedup: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make it as an option at config.yaml
see how I did it https://github.com/crazyhottommy/pyflow-single-cell/blob/feat/dedup/scATAC/rules/sc_atac_dedup.smk
|
||
rule scatac_allpeakcall: | ||
input: | ||
bam = "Result/minimap2/%s.sortedByPos.rmdp.CBadded.bam" %(config["outprefix"]) | ||
frag_dedup = "Result/minimap2/fragments_corrected_celllevel_dedup.bed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peaks are called using the deduplicated fragment file. in peak count, we should also use the deduplicated fragment file rather than using the original fragment file as in https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scATAC/Snakefile#L872
"macs2 callpeak -f BAMPE -g {params.genome} --outdir Result/Analysis/ -n {params.name} -B -q 0.05 --nomodel --extsize=50 --SPMR --keep-dup all -t {input.bam}" | ||
"macs2 callpeak -f BEDPE -g {params.genome} --outdir Result/Analysis -n {params.name} -B -q 0.05 --nomodel --extsize=50 --SPMR --keep-dup all -t {input.frag_dedup}" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taoliu for -f BAMPE or -f BEDPE , --extsize=50 has no effect and can be removed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
@@ -562,50 +497,35 @@ if config["platform"] == "10x-genomics" or config["platform"] == "sci-ATAC-seq": | |||
shell: | |||
"python " + SCRIPT_PATH + "/scATAC_FragmentGenerate.py -B {input.bam} -O {params.outdir} --CBtag CB --count;" | |||
|
|||
rule scatac_rmdp: | |||
rule scatac_fragmentsample: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to keep the mark duplicate step, but rename scatac_rmdp to scatac_markdp
and the bam file to "Result/minimap2/%s.sortedByPos.markdp.CBadded.bam".
because in the later bulk qc https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scATAC/Snakefile#L154 we used samtools flagstat to get the read stats, and the duplication rate need to be there.