Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

just align (not global alignment) PE reads (without FLASH) to the amplicon sequence and quantify indel rate? #120

Open
YichaoOU opened this issue Aug 5, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@YichaoOU
Copy link

YichaoOU commented Aug 5, 2021

Is your feature request related to a problem? Please describe.

I think it's quite common that FLASH can't really merge the PE reads, For example:

    1. amplicon sequence too long or too short
    1. read is noisy
      • need to trim reads because the 3'-end reads looks like AAAAAAAAAA or TTTTTTTTTTTTT
      • overall read quality is too low

Describe the solution you'd like
Just use BWA to align PE reads to the amplicon sequence and extract indel frequency?

Thanks,
Yichao

@YichaoOU
Copy link
Author

YichaoOU commented Aug 5, 2021

BTW, I think crispresso can take bam as input, instead of fastq, but looking at the code, I think what the program really does is extracting the reads from the bam file and then do a global alignment again, right?

@kclem kclem added the enhancement New feature or request label Aug 5, 2021
@kclem
Copy link
Member

kclem commented Aug 5, 2021

Hi Yichao,

Thanks for using CRISPResso -- we've been thinking about this as well, and recognize that flash is a potential source of errors.

You are correct that CRISPResso realigns every read even if provided a bam input. Our biologically-informed alignment is our special sauce that makes the quantification accurate, and we like to think it's a lot better than bwa or another aligner.

I think it's probably best to do trimming before CRISPResso or to use trimmomatic or the filtering within crispresso to get rid of the AAAAA or TTTTTT sequences as well as sequences that are the wrong length.

As we work on the flash-less alignment, do you have a case of reads and a reference that are particularly challenging? Just to make sure that there is nothing we are missing?

@YichaoOU
Copy link
Author

YichaoOU commented Aug 5, 2021

Thanks for the quick reply! I don't have a very challenging dataset, I can just solve my problems by some data cleaning.

Looking forward to this FLASH-less version!

Thanks,
Yichao

@eul2021
Copy link

eul2021 commented Sep 23, 2021

Hi Kendell,
I have a ~400bp amplicon and ~210bp pair-end illumina amplicon sequencing. When I run CRISPResso on a unedited control ~100% of reads are filtered out. I tried various options from your past issues, but cannot find a solution to fix this. It's probably related to having long/unedited amplicons and reads merely overlapping. Is there are set of parameters to tweak FLASH or to omit running it all together? Thank you very much.

@kclem
Copy link
Member

kclem commented Sep 24, 2021

Try the parameter --force_merge_pairs. This is highly experimental and to be used with caution. R1 and R2 will be joined using Flash, and then if they cannot be joined, they will be pasted together. So a lot of apparent indels at the site of merging will be created.

@eul2021
Copy link

eul2021 commented Sep 28, 2021

Thank you, Kendell, I tried running with this parameter, but it returned error with the following:
"CRITICAL @ Tue, 28 Sep 2021 17:50:55:
Unexpected error, please check your input.
ERROR: 71 "
I was using 2.2.5 from last week, but let me know if there's a never one I should try.

Also, regarding the "force_merge_pairs" - I would imaging using this option or relaxing "min_pair_end_reads_overlap" to <10 could introduce some artefacts. Would you know what type of issues to watch out for? Thank you!

@kclem
Copy link
Member

kclem commented Sep 29, 2021

Sorry -- I just pushed a fix for the force_merge_pairs error. You can either wait for a next release in a week or two, or install from source.

The artifacts would be that if reads were merged, there could be deletions or insertions at the merge site.

For example, for:

ref: AATTCCGG
read1: AAT
read2: CGG
force-merged read:  AATCGG
Alignment: AAT--CGG
           AATTCCGG

In this case, there is an apparent deletion that is induced by the merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants