Consider switching from cutadapt to pTrimmer #552

Closed
6 tasks done
donkirkby opened this issue Apr 3, 2020 · 3 comments

Comments


donkirkby commented Apr 3, 2020

With the changes for SARS-CoV-2 in #549, we may need to trim many more primer sequences. Look at pTrimmer and its paper to see if it's more efficient than cutadapt.

  • Trim ARTIC primers with cutadapt after trimming adapters.
  • Compare pTrimmer results with cutadapt results.
  • Add HCV primers.
  • Fix microtest examples to work with new primer trimming.
  • Time cutadapt and pTrimmer calls for FASTQ files of sizes 20MB, 100MB, and 500MB, as well as for SARS-CoV-2, HCV, and HIV samples.
  • Add options to skip different trimming steps: phiX, adapters, and primers.
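
For the timing task, a minimal sketch of a wall-clock harness might look like the following. The `time_command` helper is hypothetical, and the placeholder command should be replaced by the actual cutadapt or pTrimmer invocation; no specific command-line flags for either tool are assumed here.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return elapsed wall-clock minutes."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the tool exits with an error,
    # so a failed run is never reported as a (fast) timing.
    subprocess.run(args, check=True)
    return (time.perf_counter() - start) / 60


if __name__ == "__main__":
    # Placeholder command: substitute the real trimming call for each
    # FASTQ file being compared.
    minutes = time_command([sys.executable, "-c", "pass"])
    print(f"{minutes:.4f} minutes")
```

Running each tool on the same input files with the same harness keeps the comparison consistent.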

donkirkby commented Apr 15, 2020

The primer sequences to use for SARS-CoV-2 are in the artic project, and SRR11314339-SARS_S12 is a (poor) example of ARTIC/PrimalSeq data. (Since it was processed with Nextera XT, it is likely that some read pairs will contain zero or one primer, rather than two.)


donkirkby commented May 5, 2020

Clarification from Slack discussion:

  • Adapters have different rules from primers.
  • Primers only appear at the ends of reads; adapters can appear in the middle of a read if sequencing runs off the end of the fragment.
  • Trim adapters first, then primers.
  • A primer-dimer scenario could look like [LEFT]-[RC RIGHT]-[ADAPTER]-[random garbage]

Check the documentation of adapter types.
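
To illustrate why the order matters, here is a toy sketch in plain Python. It is not how cutadapt or pTrimmer work internally: it uses exact string matching only, and the sequences `LEFT`, `RIGHT`, and `ADAPTER` are made-up placeholders rather than real primers or adapters.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_adapter(read, adapter):
    """Cut at the first adapter occurrence, wherever it is in the read."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def trim_primers(read, left, right):
    """Remove the left primer from the start and RC(right) from the end."""
    rc_right = reverse_complement(right)
    if read.startswith(left):
        read = read[len(left):]
    if read.endswith(rc_right):
        read = read[:len(read) - len(rc_right)]
    return read


# Hypothetical sequences, for illustration only.
LEFT, RIGHT, ADAPTER = "ACGTAC", "TTGCAG", "AGATCG"
INSERT, GARBAGE = "GATTACA", "TTTT"

# A read that ran off the end of a short fragment.
normal = LEFT + INSERT + reverse_complement(RIGHT) + ADAPTER + GARBAGE
# Primer dimer: [LEFT]-[RC RIGHT]-[ADAPTER]-[random garbage]
dimer = LEFT + reverse_complement(RIGHT) + ADAPTER + GARBAGE

# Adapters first, then primers: only the insert survives.
adapters_first = trim_primers(trim_adapter(normal, ADAPTER), LEFT, RIGHT)
# Same order on the dimer: nothing is left.
dimer_trimmed = trim_primers(trim_adapter(dimer, ADAPTER), LEFT, RIGHT)
# Primers first: the right primer is hidden behind the adapter and survives.
primers_first = trim_adapter(trim_primers(normal, LEFT, RIGHT), ADAPTER)

print(repr(adapters_first), repr(dimer_trimmed), repr(primers_first))
```

With primers trimmed first, the reverse-complemented right primer is not at the end of the read yet, so it is missed; trimming the adapter first exposes it.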


donkirkby commented May 20, 2020

Time in minutes for cutadapt and other trimming steps:

| sample | read count | censor | adapters | primers |
|---|---:|---:|---:|---:|
| S8CT25MIXATOH-NT-Unknown_S29 | 108,287 | 1 | 0 | 2 |
| P90786A-HCV_S86 | 475,577 | 3 | 1 | 8 |
| S3CT27-IP-Unknown_S8 | 482,083 | 3 | 0 | 9 |
| HIV3428H2-E21-HIV_S39 | 544,536 | 4 | 1 | 10 |
| POS1TO1K-IP-Unknown_S2 | 1,632,059 | 20 | 2 | 30 |
| POS1TO1KMIX-IP-Unknown_S4 | 1,751,820 | 21 | 2 | 32 |
| POS1TO20K-IP-Unknown_S1 | 2,228,806 | 27 | 3 | 40 |
| POS1TO20KMIX-IP-Unknown_S3 | 4,376,862 | 52 | 5 | 78 |
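
As a rough check on how the primer step scales, the numbers above can be turned into reads per minute. This is only a sketch: the times are whole minutes, so the rates are approximate, and `reads_per_minute` is a helper name made up for this example.

```python
# (sample, read count, censor, adapters, primers); times in minutes,
# copied from the measurements above.
TIMINGS = [
    ("S8CT25MIXATOH-NT-Unknown_S29", 108_287, 1, 0, 2),
    ("P90786A-HCV_S86", 475_577, 3, 1, 8),
    ("S3CT27-IP-Unknown_S8", 482_083, 3, 0, 9),
    ("HIV3428H2-E21-HIV_S39", 544_536, 4, 1, 10),
    ("POS1TO1K-IP-Unknown_S2", 1_632_059, 20, 2, 30),
    ("POS1TO1KMIX-IP-Unknown_S4", 1_751_820, 21, 2, 32),
    ("POS1TO20K-IP-Unknown_S1", 2_228_806, 27, 3, 40),
    ("POS1TO20KMIX-IP-Unknown_S3", 4_376_862, 52, 5, 78),
]


def reads_per_minute(reads, minutes):
    """Approximate throughput for one trimming step."""
    return reads / minutes


primer_rates = {
    sample: reads_per_minute(reads, primers)
    for sample, reads, _censor, _adapters, primers in TIMINGS
}

for sample, rate in primer_rates.items():
    print(f"{sample}: {rate:,.0f} reads/minute in the primer step")
```

On these samples the primer step stays between roughly 50,000 and 60,000 reads per minute, which suggests its cost grows about linearly with read count, and it is the slowest of the three steps in every row.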
