support BGISEQ/DNBSEQ/MGISEQ? #196

Open
orangeSi opened this issue Oct 16, 2024 · 9 comments

Comments

@orangeSi

orangeSi commented Oct 16, 2024

Hello, does TrimGalore support automatic adapter filtering for BGISEQ/DNBSEQ/MGISEQ reads yet? The adapter sequences are listed at OpenGene/fastp#259

Thanks~
Si

@FelixKrueger
Owner

Would you be able to send me a small-ish test dataset (e.g. 100K reads) so I can take a look?

@orangeSi
Author

Thanks~ I downloaded SRR28167102, which was sequenced on a DNBSEQ-G400, from NCBI SRA; a subset of it is in SRR28167102_part.DNBSEQ-G400.zip.

@FelixKrueger
Owner

Hmm, I downloaded the entire dataset and added the following adapter sequences to the adapter file of FastQC:

AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA  MGI/BGI forward
AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG   MGI/BGI reverse
AAGTCGGA    MGI/BGI universal
[image: Screenshot 2024-10-16 at 13 43 17]

It doesn't look like this is a good example of 'contamination' with the MGI/BGI adapter (the universal sequence is just about visible at the end but it also is only 8bp long...).
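As a rough sketch of the setup described above: rather than editing FastQC's bundled adapter file, the same three entries could be kept in a separate list. This assumes FastQC's `--adapters` option, which to my knowledge expects one `name<TAB>sequence` entry per line; the file name `mgi_adapters.txt` and the entry labels are illustrative, not something used in this thread.

```shell
# Write a custom adapter list in name<TAB>sequence layout.
# The file name and the labels are illustrative assumptions.
adapters=mgi_adapters.txt
printf '%s\t%s\n' \
  'MGI/BGI forward'   'AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA' \
  'MGI/BGI reverse'   'AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG' \
  'MGI/BGI universal' 'AAGTCGGA' > "$adapters"

# Then point FastQC at the list (requires FastQC on PATH; not run here):
# fastqc --adapters "$adapters" SRR28167102.DNBSEQ-G400.r1.fq.gz
```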

@orangeSi
Author

The FastQC result is weird, because when I checked the r1 fq file by simply running zgrep with the forward adapter, I got matches like this:
[image]

@FelixKrueger
Owner

FelixKrueger commented Oct 17, 2024

Can you include a -c to count how many times it is found in total? Maybe it is just so low that it doesn't show up in a % plot?

Update:

I've just done this myself: there are 162 instances of this adapter sequence, or 0.6% of total sequences, starting at different positions. This is indeed not something you would see accumulating very clearly in a FastQC plot...

zgrep -c AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz
162
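To get the count and the percentage in one pass, something along these lines might work. It is only a sketch: the function name `adapter_fraction` is made up for illustration, and it assumes strict 4-line FASTQ records, searching the sequence line only.

```shell
# adapter_fraction FILE ADAPTER   (hypothetical helper, not part of any tool)
# Prints "<hits> <total reads> <percent>" for a gzipped FASTQ file.
# Assumes strict 4-line records; only the sequence line (2nd of 4) is searched.
adapter_fraction() {
  gzip -dc "$1" | awk -v adapter="$2" '
    NR % 4 == 2 { total++; if (index($0, adapter) > 0) hits++ }
    END {
      if (total) printf "%d %d %.2f\n", hits, total, 100 * hits / total
      else print "0 0 0.00"
    }'
}

# Example call (file name as used in this thread; not run here):
# adapter_fraction SRR28167102.DNBSEQ-G400.r1.fq.gz AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA
```

Unlike `zgrep -c`, which counts every matching line, this only looks at sequence lines, so the two counts could differ if a header or quality line happened to contain the pattern.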

@orangeSi
Author

Can you extract only those 162 read pairs into fq files and run FastQC on them? If FastQC then shows 100% adapter content, it is behaving normally~

@FelixKrueger
Owner

that works:

zgrep -A 2 -B 1 AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz | grep -v "^--$" > hits.fastq
[image: Screenshot 2024-10-18 at 09 56 16]

This is the plot for the Read 2 adapter (80 sequences):

[image: Screenshot 2024-10-18 at 09 58 56]
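A record-aware alternative to grep context lines, for extracting the adapter-containing reads, could look roughly like this. It is a sketch under the assumption of strict 4-line FASTQ records; the function name `extract_adapter_reads` is hypothetical. Because it never filters lines by their first character, quality strings that happen to begin with `-` are kept intact.

```shell
# extract_adapter_reads FILE ADAPTER   (hypothetical helper)
# Prints every complete 4-line FASTQ record whose sequence line contains ADAPTER.
# Assumes strict 4-line records (no wrapped sequence lines).
extract_adapter_reads() {
  gzip -dc "$1" | awk -v adapter="$2" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      # $0 is the quality line; emit the whole record on a match
      if (index(seq, adapter) > 0) print header "\n" seq "\n" plus "\n" $0
    }'
}

# Example call (file names as used in this thread; not run here):
# extract_adapter_reads SRR28167102.DNBSEQ-G400.r1.fq.gz AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA > hits.fastq
```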

I suppose adding a flag --bgiseq wouldn't be too difficult. If this type of sequencing becomes more common, we could also add it to the auto-detection.

@orangeSi
Author

--bgiseq is ok for now.

@FelixKrueger
Owner

The option --bgiseq is now available from the dev branch. Can you let me know if it works as expected for you? If yes, it will be part of the next release.
