'Invalid BAM record: read: "1" is missing tag: "CR"' #12

Closed
marcjwilliams1 opened this issue Jul 31, 2019 · 7 comments

Comments

@marcjwilliams1

I get the following error when trying to run bamtofastq:
thread 'main' panicked at 'Invalid BAM record: read: "1" is missing tag: "CR"', src/main.rs:509:25

Here are the comment tags in the bam header:

@CO	10x_bam_to_fastq:I1(BC:QT)
@CO	10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)
@CO	10x_bam_to_fastq:R2(SEQ:QUAL)

and here is the first read in the BAM file:

52080637	256	chr10	13047	3	25S125M	*	0	0	GTGGTATCAACGCAGAGTACATGGGGGCTCCAACCCTCGGGATGCCTCATGCTCACCCTTTGGCACCCACCTGACAGCTCAGCATGTCTGCTCTCTGCCATCCTCAATGCCTGCTCTAGACAAGCCCAAGTCCGCCAGGAGTGGCAGAGG	FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FF:FFFFFFFFFFFFF:FFFFFFFFFF:FFFFF:FFFF	RG:Z:A3_inflamed:MissingLibrary:1:H3TWHDMXX:1	NH:i:2	NM:i:2

Wondering if something is formatted incorrectly. Any help much appreciated, thanks!
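
To make the header lines quoted above easier to read, here is a minimal Python sketch that turns them into a list of the tags they name. The line format (`@CO`, a tab, `10x_bam_to_fastq:`, a read name, then comma-separated `SEQ_TAG:QUAL_TAG` pairs in parentheses) is an assumption inferred only from the three lines in this issue, and the function names `parse_10x_header` and `expected_tags` are hypothetical, not part of bamtofastq.

```python
import re

# Assumed format, inferred only from the three @CO lines quoted in this issue:
#   @CO<TAB>10x_bam_to_fastq:<READ>(<SEQ_TAG>:<QUAL_TAG>,...)
HEADER_RE = re.compile(r"^@CO\t10x_bam_to_fastq:(\w+)\((.*)\)$")


def parse_10x_header(lines):
    """Map each FASTQ read name (I1, R1, R2) to its (sequence, quality) tag pairs."""
    spec = {}
    for line in lines:
        match = HEADER_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a 10x_bam_to_fastq comment line
        read_name, body = match.groups()
        spec[read_name] = [tuple(pair.split(":")) for pair in body.split(",")]
    return spec


def expected_tags(spec):
    """List the tags the header refers to, skipping the SEQ/QUAL columns."""
    tags = []
    for pairs in spec.values():
        for pair in pairs:
            for name in pair:
                if name not in ("SEQ", "QUAL") and name not in tags:
                    tags.append(name)
    return tags


header = [
    "@CO\t10x_bam_to_fastq:I1(BC:QT)",
    "@CO\t10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)",
    "@CO\t10x_bam_to_fastq:R2(SEQ:QUAL)",
]
spec = parse_10x_header(header)
print(expected_tags(spec))
```

Read this way, the header suggests that R1 is rebuilt from per-read tags such as CR, while R2 comes from the record's own SEQ and QUAL columns. None of the listed tags appear in the read shown above, which only carries RG, NH and NM.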

@pmarks
Contributor

pmarks commented Aug 1, 2019

Hi @marcjwilliams1: Where did this BAM file come from? Some archives 're-process' the BAM file and remove all but a few tags. This BAM appears to be missing the tags that Cell Ranger puts on to every read, which bamtofastq needs to reconstruct the full original sequence. In this case bamtofastq is looking for BC,QT,CR,CY,UR,UY,TR,TQ.

You'll need to get hold of the original BAM file that Cell Ranger produced in order to run bamtofastq.
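
A quick way to check a file before running bamtofastq is to look at the optional fields of a few records. The sketch below is a rough illustration in plain Python, assuming the record is available as a tab-separated SAM text line (for example from `samtools view`). The helper names are hypothetical, the tag list is the one given in the comment above, and this does not reproduce bamtofastq's own validation logic. The example record is the first read from this issue with SEQ and QUAL shortened for brevity.

```python
# Tags named in the comment above as the ones bamtofastq looks for in this file.
REQUIRED_TAGS = ["BC", "QT", "CR", "CY", "UR", "UY", "TR", "TQ"]


def tags_in_sam_record(record):
    """Return the set of optional tag names on one tab-separated SAM line."""
    fields = record.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; the rest are TAG:TYPE:VALUE.
    return {field.split(":", 1)[0] for field in fields[11:]}


def missing_tags(record, required=REQUIRED_TAGS):
    """List the required tags that are absent from the record, in order."""
    present = tags_in_sam_record(record)
    return [tag for tag in required if tag not in present]


# First read from this issue; SEQ and QUAL are truncated here for illustration only.
record = "\t".join([
    "52080637", "256", "chr10", "13047", "3", "25S125M", "*", "0", "0",
    "GTGGTATCAACGC", "FFFFFFFFF:FFF",
    "RG:Z:A3_inflamed:MissingLibrary:1:H3TWHDMXX:1", "NH:i:2", "NM:i:2",
])

print(sorted(tags_in_sam_record(record)))
print(missing_tags(record))
```

If the second list is non-empty for the reads you sample, the BAM has most likely been re-processed and the original Cell Ranger output is needed.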

@marcjwilliams1
Author

OK, thanks. I got it from SRA. The FASTQ files also seem to be missing information: only one FASTQ file is dumped rather than two. I was hoping the BAM files would have all the information, but it doesn't look like that's the case.

@pmarks
Contributor

pmarks commented Aug 1, 2019

@marcjwilliams1 can you share the SRA accession you're looking at? In theory SRA is not supposed to munge 10x BAM files for exactly this reason, but maybe this data isn't properly tagged as 10x? I'd like to look into this with SRA.

@marcjwilliams1
Author

marcjwilliams1 commented Aug 2, 2019

Sure, the accession for one of the bams is SRR7420402 (the whole project has GEO accession GSE116222).

I used the following command to download it:
sam-dump SRR7420402 | samtools view -bS - > SRR7420402

Also, when I try to download the FASTQ files, I only get 1 FASTQ file rather than the 2 I would have expected:
fasterq-dump --split-files SRR7420402

Do let me know if you find anything out. Thanks.

@qingnanl

I have exactly the same issue: only 1 FASTQ file, which is not usable with the Cell Ranger pipeline. So I also thought about using the BAM file from SRA to get compatible FASTQ files, but ran into the same problem.

@pmarks
Contributor

pmarks commented Sep 18, 2019

@marcjwilliams1 @qingnanl I was poking around on SRA trying to understand what happened. I found the 'Original format' section in the Run Browser. In that section there's a BAM file link which appears to be the original Cell Ranger BAM file, and it works with bamtofastq, so I think that's your path forward.

@KforKuma

I am sorry for 'necrobumping' this post, but I am stuck on GSE116222 too. Before trying the BAM file, I would like to know whether it works. At first I too thought the FASTQ file was concatenated or interleaved, or had some naming issue. But it seems the UMI and barcode have been trimmed from both sides, keeping only the overlapping sequence, if I haven't got it wrong?
