Question Regarding UMI and Barcode Parameters for UMI-based Bulk BCR-Seq Data Assembly #336

Open
Yaxiong293 opened this issue Dec 19, 2024 · 7 comments

@Yaxiong293

Dear Developer,

I am working with UMI-based bulk BCR sequencing data, and I would like to merge reads that share the same UMI. Should I use the --barcode option or the --UMI option for this purpose?

Thank you for your assistance!

@Yaxiong293
Author

I would like to ask why the results I obtained with the --UMI option and the --barcode option are so different. The clone counts and clone types obtained with --barcode are about half of those obtained with --UMI. Aren't both options supposed to assemble reads that share the same UMI?

Additionally, I would like to know how to output the unassembled reads. I appreciate your assistance on this matter. Thank you!

@mourisl
Collaborator

mourisl commented Dec 19, 2024

The --UMI option is for the 10x Genomics platform, and TRUST4 only uses that information for abundance estimation. For UMI-based BCR-seq, you should treat the UMI as the barcode (use the --barcode option) and add the option "--barcodeLevel molecule" to specify that this barcode is actually a UMI. Hope this helps.

@Yaxiong293
Author

Dear Mourisl,

Thank you for your assistance!

I have a few more questions regarding the parameters I'm using. My UMI is located in the first 34 bases of R1, and here are the parameters I am currently using:

--barcodeLevel molecule
--barcode $READ1_FILE
--readFormat bc:0:33

With these parameters, can I achieve assembly based on the same UMI as well as UMI deduplication?

Additionally, I inputted 20 million reads, but only 5 million were assembled. Is it possible for TRUST4 to output these unassembled reads?

@mourisl
Collaborator

mourisl commented Dec 20, 2024

The parameters look right. Does read 1 also contain BCR data? For example, if the BCR sequence follows the first 34 bp, you may need to use --readFormat bc:0:33,r1:34:-1 so that the sequence used in the assembly skips the UMI.
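To make the coordinates concrete, here is a minimal Python sketch of how a specification such as bc:0:33,r1:34:-1 can be read. It assumes 0-based positions with an inclusive end and -1 meaning "to the end of the read", which is how the thread pairs "the first 34 bases" with bc:0:33. The helper names (parse_read_format, extract) and the example read are made up for illustration and are not part of TRUST4.

```python
def parse_read_format(spec):
    """Parse e.g. 'bc:0:33,r1:34:-1' into {'bc': (0, 33), 'r1': (34, -1)}.

    Illustrative helper only; handles the three-field name:start:end form
    used in this thread.
    """
    fields = {}
    for item in spec.split(","):
        name, start, end = item.split(":")
        fields[name] = (int(start), int(end))
    return fields


def extract(seq, start, end):
    """Return seq[start..end], 0-based and inclusive; end == -1 means end of read."""
    stop = len(seq) if end == -1 else end + 1
    return seq[start:stop]


fmt = parse_read_format("bc:0:33,r1:34:-1")

# Made-up read 1: a 34-base UMI followed by a short stretch of insert sequence.
read1 = "ACGT" * 8 + "AC" + "CAGGTGCAGCTGGTG"

umi = extract(read1, *fmt["bc"])     # bases 0..33, used as the barcode
insert = extract(read1, *fmt["r1"])  # bases 34..end, used for assembly

print(len(umi), insert)
```

With bc:0:33 alone, read 1 would still be passed to the assembler in full, UMI included; the r1:34:-1 part is what trims it.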

In the results, the assembly is done per UMI (see the *_barcode_airr.tsv and *_barcode_report.tsv files). The abundance summarized by UMI count (deduplication) should be in the airr and report files whose names do not contain the term "barcode".

TRUST4 outputs the candidate reads and the assembled reads. You can subtract the assembled reads from the input reads to get the unassembled reads.
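The subtraction can be sketched in Python as a set difference on read IDs. This is only an illustration under stated assumptions: the input is FASTQ, the assembled reads are in a FASTA-like file whose header lines start with ">" followed by the read ID, and the IDs match between the two files. The function names and the tiny in-memory records are hypothetical; with real data you would open your input FASTQ and TRUST4's assembled-reads output instead.

```python
import io


def fastq_ids(handle):
    """Collect read IDs from FASTQ header lines (every 4th line, starting at 0)."""
    ids = set()
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            ids.add(line[1:].split()[0])  # drop '@', keep the first token
    return ids


def fasta_ids(handle):
    """Collect read IDs from FASTA header lines ('>' followed by the ID)."""
    return {line[1:].split()[0] for line in handle if line.startswith(">")}


# Made-up stand-ins for the input reads and the assembled reads.
input_fq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGT\n+\nIIII\n"
)
assembled_fa = io.StringIO(">r1 0 10\nACGT\n>r3 1 5\nACGT\n")

# Input reads minus assembled reads = unassembled reads.
unassembled = fastq_ids(input_fq) - fasta_ids(assembled_fa)
print(sorted(unassembled))
```

Holding 20 million IDs in a set is feasible on most machines, but if the read names differ between files (for example a /1 suffix or an appended barcode), they would need to be normalized before the subtraction.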

@Yaxiong293
Author

Thank you for your response!

Yes, my Read1 does contain BCR data, and since I'm using paired-end sequencing, should my --readFormat be set as bc:0:33,r1:34:-1,r2:0:-1?

I appreciate your assistance!

@mourisl
Collaborator

mourisl commented Dec 20, 2024

"r2:0:-1" can be skipped as TRUST4 will use the full sequence if the region is not specified.
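Putting the thread's advice together, the options could be assembled as in the Python sketch below. It only builds and prints the argument list; it does not run anything. The --barcode, --barcodeLevel and --readFormat values come from this thread. The remaining flags (-f, --ref, -1, -2, -o) are not discussed here and are written from TRUST4's general usage, so check them against the run-trust4 help output. All file names are placeholders.

```python
import shlex


def build_trust4_command(read1, read2, out_prefix, coord_fasta, ref_fasta):
    """Build a run-trust4 argument list for UMI-as-barcode bulk BCR-seq.

    Hypothetical helper for illustration; the file arguments are placeholders.
    """
    return [
        "run-trust4",
        "-f", coord_fasta,          # assumption: V/D/J/C gene reference
        "--ref", ref_fasta,         # assumption: annotation reference
        "-1", read1,
        "-2", read2,
        "--barcode", read1,         # the UMI is read from read 1
        "--readFormat", "bc:0:33,r1:34:-1",  # r2 omitted: full read is used
        "--barcodeLevel", "molecule",        # the barcode is a UMI, not a cell
        "-o", out_prefix,
    ]


cmd = build_trust4_command(
    "sample_R1.fq.gz", "sample_R2.fq.gz", "sample_out",
    "bcrtcr.fa", "IMGT+C.fa",
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to hand to subprocess.run later without shell quoting problems.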

@Yaxiong293
Author

Thank you! It helps a lot.
