
Possibility of extracting subset of fq according to barcodeTranslate? #301

Open
yuyuleung opened this issue Aug 7, 2024 · 11 comments

Comments

@yuyuleung

Hello Dr. Liu,

Thank you for creating such a great tool and keeping it updated. I am trying to assemble TCRs from my spatial transcriptomics data in single-cell mode. Because my datasets are always large, I have to split them into several parts and run the assembly in parallel. However, splitting the FASTQ files is slow. I noticed that the read extraction step depends on the barcodeTranslate file provided, but when I give it only a subset of the barcodeTranslate file together with the complete FASTQ file, it encounters barcodes in the FASTQ file that are missing from the translation file, and an error occurs.

Therefore, I was wondering if it is possible to modify FastqExtractor to skip reads whose barcodes are not in the barcodeTranslate file. This would allow me to skip the splitting step before assembly.

Thank you very much for your help!

Best wishes,
Yuyu

@mourisl
Collaborator

mourisl commented Aug 7, 2024

Could you please elaborate on why you provide a subset of the barcodeTranslate file? I can't recall your read layout. Is this a split barcode, so there is no "full-barcode" whitelist?

@yuyuleung
Copy link
Author

yuyuleung commented Aug 7, 2024 via email

@mourisl
Collaborator

mourisl commented Aug 8, 2024

I just pushed a new update to the "dev" branch. It introduces a new option, "--skipBarcodeErrorRead", to the fastq-extractor program. With this option, it will skip reads with uncorrectable barcode errors or whose barcodes are not in the translation table. Is this what you need? If it works fine on your data, I will merge it into the master branch. Thank you!
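To illustrate what "skip instead of error" means for the extraction loop, here is a minimal Python sketch. It is not TRUST4's fastq-extractor code. It assumes the barcode is read from a separate barcode FASTQ at a fixed offset (`bc_start`/`bc_end`, hypothetical parameters) and that the translation table is two tab-separated columns (raw barcode, translated barcode); both are assumptions. It also does no barcode error correction, which the real option handles.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# A FASTQ record as (header, sequence, plus line, quality).
Record = Tuple[str, str, str, str]


def fastq_records(fh: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        header = fh.readline()
        if not header:
            return
        seq = fh.readline().rstrip("\n")
        plus = fh.readline().rstrip("\n")
        qual = fh.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def load_translation(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'raw_barcode<TAB>translated_barcode' lines (assumed table format)."""
    table: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def filter_by_barcode(
    reads: Iterable[Record],
    barcodes: Iterable[Record],
    table: Dict[str, str],
    bc_start: int,
    bc_end: int,
) -> Iterator[Tuple[Record, str]]:
    """Yield (read, translated_barcode) pairs; skip reads whose barcode
    is absent from the table instead of raising an error."""
    for read, bc_rec in zip(reads, barcodes):
        translated = table.get(bc_rec[1][bc_start:bc_end])
        if translated is None:
            continue  # barcode not in the (possibly partial) translation table
        yield read, translated


if __name__ == "__main__":
    table = load_translation(io.StringIO("AAAA\tspot1\nCCCC\tspot2\n"))
    reads = fastq_records(io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGGG\n+\nIIII\n"))
    bcs = fastq_records(io.StringIO("@r1\nAAAATT\n+\nIIIIII\n@r2\nTTTTTT\n+\nIIIIII\n"))
    for read, spot in filter_by_barcode(reads, bcs, table, 0, 4):
        print(read[0], spot)  # prints "@r1 spot1"; @r2 is skipped
```

Note that even when only a few reads are kept, every record is still scanned, so the run time depends on the total input size rather than on the size of the subset.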

@yuyuleung
Author

yuyuleung commented Aug 8, 2024 via email

@yuyuleung
Author

Hello Dr. Liu,

Should I add the "--skipBarcodeErrorRead" option directly to run-trust4?

Thanks.
Yuyu

@mourisl
Collaborator

mourisl commented Aug 8, 2024

That option is for fastq-extractor only, it is not in the run-trust4 wrapper.

@yuyuleung
Author

Hello Dr. Liu,

Thank you so much for your effort. I have tested it with my data (434M reads in total), extracting the reads of two spots (35 reads). The extraction step took around 2 hours, which still seems a little slow. I will keep trying with different datasets and check its efficiency. Do you have any suggestions on how I can split my dataset efficiently into several parts to speed up the assembly?

Thank you so much!

Best wishes,
Yuyu
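One general way to avoid running the extraction once per part is to split all parts in a single streaming pass, routing each read to a part chosen by its barcode so that all reads of one spot end up in the same part. The Python sketch below shows that idea. It is not a TRUST4 feature; the barcode position (`bc_start`/`bc_end`) and the separate barcode FASTQ are assumptions, gzip handling is omitted, and the helper names are hypothetical.

```python
import io
import zlib
from typing import Iterable, Iterator, List, Sequence, TextIO, Tuple

# A FASTQ record as (header, sequence, plus line, quality).
Record = Tuple[str, str, str, str]


def fastq_records(fh: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        header = fh.readline()
        if not header:
            return
        seq = fh.readline().rstrip("\n")
        plus = fh.readline().rstrip("\n")
        qual = fh.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def write_record(fh: TextIO, rec: Record) -> None:
    """Write one record back out in 4-line FASTQ form."""
    fh.write("\n".join(rec) + "\n")


def bucket_of(barcode: str, n_parts: int) -> int:
    """Stable part index. zlib.crc32 gives the same value in every run,
    unlike Python's built-in hash() for str, which is salted per process."""
    return zlib.crc32(barcode.encode("ascii")) % n_parts


def partition_by_barcode(
    reads: Iterable[Record],
    barcodes: Iterable[Record],
    bc_start: int,
    bc_end: int,
    read_outs: Sequence[TextIO],
    bc_outs: Sequence[TextIO],
) -> List[int]:
    """Stream paired records once, writing each pair to the part chosen
    by its barcode. Returns the number of reads written to each part."""
    n_parts = len(read_outs)
    counts = [0] * n_parts
    for read, bc_rec in zip(reads, barcodes):
        part = bucket_of(bc_rec[1][bc_start:bc_end], n_parts)
        write_record(read_outs[part], read)
        write_record(bc_outs[part], bc_rec)
        counts[part] += 1
    return counts


if __name__ == "__main__":
    reads = fastq_records(io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGGG\n+\nIIII\n"))
    bcs = fastq_records(io.StringIO("@r1\nAAAA\n+\nIIII\n@r2\nCCCC\n+\nIIII\n"))
    read_outs = [io.StringIO() for _ in range(2)]
    bc_outs = [io.StringIO() for _ in range(2)]
    print(partition_by_barcode(reads, bcs, 0, 4, read_outs, bc_outs))
```

With real data, `read_outs` and `bc_outs` would be N pairs of open output files. The cost is one pass over the input regardless of N, whereas extracting each part separately costs one full pass per part.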

@mourisl
Collaborator

mourisl commented Aug 9, 2024

Is this TCR-targeted sequencing data, or it is gene expression data?

@yuyuleung
Author

It is TCR-targeted sequencing data.

I have many different TCR/BCR-targeted sequencing datasets. Depending on the enrichment efficiency, I get around 50M - 200M TCR/BCR reads.

What I tested before (434M reads) was the raw data (including adapters and reads from other genes). I have just tested clean reads (around 50M) and extracted the same 35 reads; it took only 10 minutes. So is extraction simply faster with smaller input?

I am still testing with larger data (like 200M TCR-targeted reads).

Thanks a lot again!
Yuyu

@mourisl
Collaborator

mourisl commented Aug 9, 2024

That makes sense: 50M is about 1/9 of 434M, so if 50M takes about 10 minutes, 434M would take about 1.5 hours. I feel that for 50M reads, TRUST4 without splitting might be fast enough?
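The estimate above assumes extraction time grows roughly linearly with the number of input reads, since extracting even a small subset still requires scanning the whole FASTQ:

```latex
t_{434\mathrm{M}} \approx t_{50\mathrm{M}} \times \frac{434}{50}
  \approx 10\ \text{min} \times 8.7
  \approx 87\ \text{min}
  \approx 1.5\ \text{h}
```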

@yuyuleung
Author

yuyuleung commented Aug 12, 2024 via email
