Problems preprocessing COVID-19 Sample from Paper #9

yeredh · 2020-07-03T02:35:14Z

Hello,

I downloaded the FASTQ files for sample GSM4339771 (SRR11181956) from SRA in the original format from https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11181956

So I end up with two files

C143_R1.fastq.gz.1
C143_R2.fastq.gz.1

I was able to identify the cell barcodes with umi_tools

umi_tools whitelist --stdin C143_R1_test.fastq.gz  \
                    --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
                    --set-cell-number=100 \
                    --log2stderr > whitelist.txt;

However, when I tried the next step, extracting the barcodes and UMIs and adding them to the read names:

umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
                  --stdin C143_R1.fastq.gz \
                  --stdout C143_R1_extracted.fastq.gz \
                  --read2-in C143_R2.fastq.gz  \
                  --read2-out=C143_R2_extracted.fastq.gz \
                  --filter-cell-barcode \
                  --whitelist=whitelist.txt;

I get the following error message

ValueError: 
Read pairs do not match
CL200152206L1C001R001_0/1 != CL200152206L1C001R001_0/2

What am I doing wrong?

Best,

Yered


Dragonlongzhilin · 2020-07-03T08:09:12Z

I guess the read IDs do not match one-to-one between read 1 and read 2. You should check the FASTQ files.
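One way to run that check is sketched below. It walks both files in parallel and reports the first record whose IDs differ, optionally ignoring a trailing `/1` or `/2` (the only difference visible in the error message above). The function names are made up for this illustration, and the sketch assumes plain four-line FASTQ records, gzipped or not.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, detecting gzip by its magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def read_ids(path):
    """Yield the read ID (first token of the header, without '@') per record."""
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each 4-line record
                fields = line[1:].split()
                yield fields[0] if fields else ""


def strip_pair_suffix(read_id):
    """Drop a trailing /1 or /2 from a read ID, if present."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def first_mismatch(r1_path, r2_path, ignore_suffix=True):
    """Return (record_index, id1, id2) for the first mismatch, else None.

    A missing mate (files of different length) is reported with None
    in place of the absent ID.
    """
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 is None or id2 is None:
            return index, id1, id2
        if ignore_suffix:
            id1, id2 = strip_pair_suffix(id1), strip_pair_suffix(id2)
        if id1 != id2:
            return index, id1, id2
    return None
```

If `first_mismatch("C143_R1.fastq.gz", "C143_R2.fastq.gz")` returns `None` while the same call with `ignore_suffix=False` reports record 0, the files are properly paired and only the `/1` and `/2` suffixes differ.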

PierreBSC · 2020-07-03T08:39:42Z

Hi Yered,

So basically you are doing it completely right, and the problem comes from the files.
UMI-tools was designed to process FASTQ files produced by Illumina devices. The files you mention were generated by a BGI machine, so the headers are a bit different.
This is problematic but can be solved. First, you need to install a specific version of UMI-tools: https://github.com/CGATOxford/UMI-tools/tree/%7BTS%7D-IgnoreReadPairSuffix. You then need to modify the extract line as described here: CGATOxford/UMI-tools#325, and it should do the job!
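If installing that branch is not an option, a possible alternative is to rewrite the headers yourself so that both mates carry the same name. The sketch below is only an illustration: the function names are hypothetical, it assumes gzipped four-line FASTQ records whose read name ends in `/1` or `/2`, and it has not been tested against UMI-tools itself.

```python
import gzip


def strip_pair_suffix_from_header(header):
    """Remove a trailing /1 or /2 from the read name in a FASTQ header line.

    Anything after the first space (the description) is kept unchanged.
    """
    name, sep, rest = header.rstrip("\n").partition(" ")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name + sep + rest + "\n"


def rewrite_fastq(in_path, out_path):
    """Copy a gzipped FASTQ file, stripping /1 or /2 from every header line."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line_number, line in enumerate(src):
            if line_number % 4 == 0:  # only the header of each record
                line = strip_pair_suffix_from_header(line)
            dst.write(line)
```

Only every fourth line is touched, so a quality string that happens to end in `/1` is left alone. You would run `rewrite_fastq` once per mate file and then point `umi_tools extract` at the rewritten copies.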

Hope this will help,

Best

Pierre

yeredh · 2020-07-03T10:47:34Z

Thank you Pierre for your prompt reply!

yeredh closed this as completed Jul 3, 2020

TomSmithCGAT mentioned this issue Jul 3, 2020

{ts} ignore read pair suffixes CGATOxford/UMI-tools#421

Merged
