transcript numbers don't match between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_models.gtf #195

dudududu12138 · 2024-05-29T03:22:33Z

Hi, I recently used isoquant(version 3.4.1) to detect isoforms from multiple samples. But the results are a little odd. Below are my quaetions:

There are 167413 transcripts in OUT.transcript_models.gtf while 166857 in OUT.transcript_model_tpm.tsv.
I didn't see the <combine.*> files. There are only <grouped> files. So what is the result file that report the expression levels of multisamples?
Below is all my outputs:

aux                          OUT.gene_grouped_counts.tsv  OUT.transcript_counts.tsv          OUT.transcript_model_grouped_counts.tsv  OUT.transcript_model_tpm.tsv
OUT.corrected_reads.bed.gz   OUT.gene_grouped_tpm.tsv     OUT.transcript_grouped_counts.tsv  OUT.transcript_model_grouped_tpm.tsv     OUT.transcript_tpm.tsv
OUT.extended_annotation.gtf  OUT.gene_tpm.tsv             OUT.transcript_grouped_tpm.tsv     OUT.transcript_model_reads.tsv.gz
OUT.gene_counts.tsv          OUT.read_assignments.tsv.gz  OUT.transcript_model_counts.tsv    OUT.transcript_models.gtf

Below is my codes:

samples=`ls ../minimap2/*sorted.bam`

isoquant.py -t 30 \
        --reference $ref \
        --genedb $anno --complete_genedb \
        --bam $samples \
        --data_type pacbio_ccs \
        -o $out/

Thank you very much!

The text was updated successfully, but these errors were encountered:

andrewprzh · 2024-06-07T14:45:13Z

Dear @dudududu12138

Could you send me the files? This looks unexpected.
Currently, if you provide several BAM files, they will treated as distinct samples within the same experiment. Thus, a single output folder with a single transcript_models.gtf files will be created. "Per-file" count/TPM tables are stored in _grouped files.
Combined tables are now created only if you provide 2 or more experiments via YAML dataset file.

Best
Andrey

dudududu12138 · 2024-06-24T01:50:26Z

Dear @dudududu12138

Could you send me the files? This looks unexpected.

Currently, if you provide several BAM files, they will treated as distinct samples within the same experiment. Thus, a single output folder with a single transcript_models.gtf files will be created. "Per-file" count/TPM tables are stored in _grouped files.
Combined tables are now created only if you provide 2 or more experiments via YAML dataset file.

Best Andrey

Sorry for the late reply. Below are the OUT.transcript_model_counts.tsv and OUT.transcript_models.gtf files. There are more transcripts in the gtf file. That is ,the transcripts in *counts.tsv are subset of transcripts in *models.gtf. It is odd.
OUT.transcript_models.gtf.gz
OUT.transcript_model_counts.tsv.gz

Moreover, I noticed that in my result, OUT.transcript_model_reads.tsv.gz and OUT.transcript_models.gtf have the same number of transcripts. While OUT.transcript_model_tpm.tsv and OUT.transcript_model_counts.tsv have the same number of transcripts and they are the subset of OUT.transcript_models.gtf

dudududu12138 · 2024-06-24T03:17:37Z

By the way, I checked myself. I extracted an example of a transcript that is reported in the OUT.transcript_models.gtf file but not in the OUT.transcript_model_counts.tsv file.
Below is the transcript id and read id that support the transcript that extracted from OUT.transcript_model_reads.tsv.gz.

Then I extracted the these reads from OUT.read_assignments.tsv.gz and saved them into file(example.read.assignment.txt.)
It is odd that some reads that support the novel transcript are full splice match and uniquely assigned to the reference transcript. So I am very confused.
Is the OUT.transcript_model_counts.tsv just counts the reads that uniquely mapped to the transcript?

andrewprzh · 2024-06-27T21:27:42Z

@dudududu12138

Yes, this is really odd.
Could you also send me the full log file?

andrewprzh · 2024-06-28T10:42:46Z

@dudududu12138

Here is what I found so far. Output GTF and supporting reads are likely correct. However, expression estimation algorithm is independent from transcript discovery and for some reason assigns 0 to some of the transcripts.
Transcripts with 0 counts/TPMs are not printed to transcript_model_counts.tsv. Hence the difference.

I will now check why expression estimation algorithm assigns 0 to these transcripts despite the fact they are supported by reads.

Best
Andrey

dudududu12138 · 2024-07-04T08:14:27Z

@dudududu12138

Here is what I found so far. Output GTF and supporting reads are likely correct. However, expression estimation algorithm is independent from transcript discovery and for some reason assigns 0 to some of the transcripts. Transcripts with 0 counts/TPMs are not printed to transcript_model_counts.tsv. Hence the difference.

Yes, I also found this reason.

I will now check why expression estimation algorithm assigns 0 to these transcripts despite the fact they are supported by reads.

Thanks, looking forward to your checking results!

dudududu12138 · 2024-07-04T08:19:36Z

@dudududu12138

Yes, this is really odd. Could you also send me the full log file?

This is the log file.
isoquant.log

dudududu12138 · 2024-07-04T08:20:01Z

@dudududu12138

Yes, this is really odd. Could you also send me the full log file?

andrewprzh · 2024-07-04T18:14:21Z

I think I found the problem. IsoQuant outputs a few novel isoforms that are very similar to the reference ones, and that's where read-to-isoform assignment goes wrong and the novel isoform gets 0 counts because all assignments are ambiguous. So, these similar isoforms should not be there in the first place.

I'll make a bug-fix release soon.

andrewprzh · 2024-07-14T21:58:04Z

Will be fixed in 3.5.0.

andrewprzh · 2024-08-03T11:08:00Z

Should be fixed in https://github.com/ablab/IsoQuant/releases/tag/v3.5.0

andrewprzh added the bug Something isn't working label Jun 28, 2024

dudududu12138 closed this as completed Jul 4, 2024

dudududu12138 reopened this Jul 4, 2024

andrewprzh added the fixed in dev Issue resolved but not released yet label Jul 14, 2024

andrewprzh mentioned this issue Jul 23, 2024

Remove similar transcripts before assigning reads #216

Merged

andrewprzh closed this as completed in #216 Jul 23, 2024

andrewprzh added the fixed in release Issue resolved and the fix is released, waiting for approval label Aug 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

transcript numbers don't match between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_models.gtf #195

transcript numbers don't match between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_models.gtf #195

dudududu12138 commented May 29, 2024 •

edited

Loading

andrewprzh commented Jun 7, 2024

dudududu12138 commented Jun 24, 2024 •

edited

Loading

dudududu12138 commented Jun 24, 2024 •

edited

Loading

andrewprzh commented Jun 27, 2024

andrewprzh commented Jun 28, 2024

dudududu12138 commented Jul 4, 2024 •

edited

Loading

dudududu12138 commented Jul 4, 2024

dudududu12138 commented Jul 4, 2024

andrewprzh commented Jul 4, 2024

andrewprzh commented Jul 14, 2024 •

edited

Loading

andrewprzh commented Aug 3, 2024

transcript numbers don't match between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_models.gtf #195

transcript numbers don't match between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_models.gtf #195

Comments

dudududu12138 commented May 29, 2024 • edited Loading

andrewprzh commented Jun 7, 2024

dudududu12138 commented Jun 24, 2024 • edited Loading

dudududu12138 commented Jun 24, 2024 • edited Loading

andrewprzh commented Jun 27, 2024

andrewprzh commented Jun 28, 2024

dudududu12138 commented Jul 4, 2024 • edited Loading

dudududu12138 commented Jul 4, 2024

dudududu12138 commented Jul 4, 2024

andrewprzh commented Jul 4, 2024

andrewprzh commented Jul 14, 2024 • edited Loading

andrewprzh commented Aug 3, 2024

dudududu12138 commented May 29, 2024 •

edited

Loading

dudududu12138 commented Jun 24, 2024 •

edited

Loading

dudududu12138 commented Jun 24, 2024 •

edited

Loading

dudududu12138 commented Jul 4, 2024 •

edited

Loading

andrewprzh commented Jul 14, 2024 •

edited

Loading