Add RNA-Seq TIN QC support #56

jonperdomo · 2024-08-08T19:33:23Z

Add TIN values for RNA-Seq QC from BAM files, including unit tests.

jonperdomo · 2024-08-22T17:44:40Z

I tested with a GTEx RNA-seq file, GTEX-14BMU-0526-SM-5CA2F_rep.FAK93376.bam, and compared the results with RSeQC. RSeQC's tin.py has default parameters for minimum coverage and sample size, so I implemented both parameters to allow direct comparison and so that users can expect results consistent with RSeQC. For transcripts, I downloaded the latest GENCODE v46 file of basic gene annotations for the GRCh38 reference chromosomes, gencode.v46.basic.annotation.bed, from https://www.gencodegenes.org/human/release_46.html
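For readers unfamiliar with the metric, here is a minimal sketch of a TIN-style score for a single transcript. It assumes the published RSeQC definition (100 × exp(Shannon entropy of coverage) / number of sampled positions); the function name `tin_score` and the toy coverage lists are hypothetical, and this is not the code added in this PR. It also leaves out the minimum-coverage filter and the position sampling discussed above.

```python
import math


def tin_score(coverage):
    """Return a TIN-style score (0-100) for one transcript.

    coverage: read depth at each sampled position of the transcript.
    Assumption: TIN = 100 * exp(H) / k, where H is the Shannon entropy
    of the normalized coverage and k is the number of sampled positions.
    """
    k = len(coverage)
    total = sum(coverage)
    if k == 0 or total == 0:
        return 0.0
    # Shannon entropy over positions with non-zero coverage.
    entropy = -sum((c / total) * math.log(c / total) for c in coverage if c > 0)
    # exp(entropy) is the effective number of evenly covered positions.
    return 100.0 * math.exp(entropy) / k


if __name__ == "__main__":
    print(tin_score([5, 5, 5, 5]))   # uniform coverage, close to 100
    print(tin_score([10, 0, 0, 0]))  # all reads at one position, close to 25
```

Under this definition, uniform coverage gives a score near 100, and coverage concentrated at a few positions pushes the score toward 100 / k.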

I set minimum coverage to 2, and sample size to 100.
RSeQC:

tin.py -i "${mod_bam}" -r "${bed_file}" -c 2 -n 100
Number of scores: 67069
Mean TIN: 67.089549182989
Median TIN: 74.25578864168884
Standard deviation of TIN: 26.001131242677577

LongReadSum:

longreadsum bam -i "${mod_bam}" -o "${output_dir}" -t 12 --genebed "${bed_file}" --min-coverage 2 --sample-size 100
Number of scores: 67069
Mean TIN: 67.0683
Median TIN: 74.25
Standard deviation of TIN: 26.0379

jonperdomo · 2024-08-22T19:26:58Z

This PR will also address the help text error from issue #57.

jonperdomo · 2024-08-28T17:37:03Z

Updated results with higher output precision.

TIN Results

RSeQC:

tin.py -i "${mod_bam}" -r "${bed_file}" -c 2 -n 100
Number of scores: 67069
Mean TIN: 67.089549182989
Median TIN: 74.25578864168884
Standard deviation of TIN: 26.001131242677577

LongReadSum:

longreadsum bam -i "${mod_bam}" -o "${output_dir}" -t 12 --genebed "${bed_file}" --min-coverage 2 --sample-size 100
Number of scores: 67069
Mean TIN: 67.06832655372376
Median TIN: 74.24996965188242
Standard deviation of TIN: 26.03788585287367
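To put the remaining gap between the two tools in numbers, the following sketch computes the absolute and relative differences of the summary statistics reported above. The values are copied from this comment; the dictionary and function names are illustrative only.

```python
# Summary statistics copied from the results above.
rseqc = {
    "mean": 67.089549182989,
    "median": 74.25578864168884,
    "stdev": 26.001131242677577,
}
longreadsum = {
    "mean": 67.06832655372376,
    "median": 74.24996965188242,
    "stdev": 26.03788585287367,
}


def differences(reference, candidate):
    """Return {statistic: (absolute difference, relative difference in %)}."""
    result = {}
    for key, ref_value in reference.items():
        abs_diff = candidate[key] - ref_value
        result[key] = (abs_diff, 100.0 * abs_diff / ref_value)
    return result


if __name__ == "__main__":
    for name, (abs_diff, rel_pct) in differences(rseqc, longreadsum).items():
        print(f"{name}: {abs_diff:+.4f} ({rel_pct:+.3f}%)")
```

On these numbers, all three statistics differ from RSeQC by less than 0.15% in relative terms, and the number of scores is the same (67069).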

Performance comparison for Slurm jobs (--mem=50G, --cpus-per-task=8, --time=12:00:00), as reported by seff:

RSeQC:

Nodes: 1
Cores per node: 8
CPU Utilized: 07:55:21
CPU Efficiency: 12.45% of 2-15:39:12 core-walltime
Job Wall-clock time: 07:57:24
Memory Utilized: 166.25 MB
Memory Efficiency: 0.32% of 50.00 GB

LongReadSum:

Nodes: 1
Cores per node: 8
CPU Utilized: 02:48:34
CPU Efficiency: 12.67% of 22:10:56 core-walltime
Job Wall-clock time: 02:46:22
Memory Utilized: 5.91 GB
Memory Efficiency: 11.83% of 50.00 GB
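The seff figures above can be reduced to a speedup and a memory ratio. This is a small sketch using only the reported values; the helper `to_seconds` is hypothetical, and the conversion assumes seff's GB means 1024 MB.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string, as printed by seff, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Wall-clock and CPU times copied from the seff output above.
rseqc_wall = to_seconds("07:57:24")
lrs_wall = to_seconds("02:46:22")
rseqc_cpu = to_seconds("07:55:21")
lrs_cpu = to_seconds("02:48:34")

wall_speedup = rseqc_wall / lrs_wall
cpu_speedup = rseqc_cpu / lrs_cpu

# Memory: 166.25 MB (RSeQC) vs 5.91 GB (LongReadSum).
# Assumption: 1 GB = 1024 MB.
memory_ratio = (5.91 * 1024) / 166.25

if __name__ == "__main__":
    print(f"Wall-clock speedup: {wall_speedup:.2f}x")
    print(f"CPU-time speedup: {cpu_speedup:.2f}x")
    print(f"Memory ratio (LongReadSum / RSeQC): {memory_ratio:.1f}x")
```

On these numbers LongReadSum finishes roughly 2.9 times faster in wall-clock time while using roughly 36 times more memory. Both CPU efficiencies are close to 12.5%, which is what one fully busy core out of the eight allocated would give.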

jonperdomo · 2024-08-28T17:40:19Z

Add a unit test to complete this PR.

jonperdomo · 2024-09-19T16:54:27Z

This PR adds a new feature for calculating TIN scores. It writes the scores and their summary statistics in TSV format and adds the summary to the HTML report.

Update style and update base modification unit tests

1f8b92b

jonperdomo linked an issue Aug 8, 2024 that may be closed by this pull request

Add TIN QC for RNA-seq data #54

Closed

jonperdomo self-assigned this Aug 8, 2024

jonperdomo added 16 commits August 8, 2024 16:12

Remove modification testing code

f20de6e

Work on TIN

2124d42

Work on TIN

1f6cec9

work on TIN

0916f19

Update coverage to use mpileup parameters

66301a4

Update coverage

429d3d3

Update coverage values

c46b40c

Update implementation

65ef602

Update TIN

1228742

updates from multi exon tests

475f765

Fix sigma Ci error

8070125

Add TIN summary statistics

3f621a9

Tested sample size parameter

b8ab6d8

Added sample size and mincov parameters

cd663a7

Fix the mincov implementation

a56c015

GTEX tests

e2096b6

Work on saving TIN files, and fixing help text error

bd4df9e

jonperdomo added 7 commits August 23, 2024 12:13

Update README.md

585d100

Fix tsv output order

431f85b

Update README

ac0db6c

Update README.md

441a7be

Update README

bd7b6a9

Skip deletions like RSeQC

8c69409

Undo introduced error, remove test code

805c32e

Remove debug output

7eba3e9

jonperdomo added 17 commits August 28, 2024 15:35

Add TIN multi-file support and add to HTML report

ad58986

Remove debug output

110ab0b

Set output precision

f86791c

Update README

d016881

File parser update

c4cc4b7

Update README

b53ad18

Fix fasta read lengths error

92c15eb

Update README and remove unused seqtxt parameters

0953b50

Add TIN unit tests, and remove debug outputs

c052dc2

Update github actions and channel priorities

a2040b8

Update github actions

5b6c2d7

Update github actions

f9414cf

Set the library path

ff23e9d

Update actions

a6dfd2a

update makefile

c0825c3

remove debug code

19b9deb

Remove commented code

bf50b48

jonperdomo marked this pull request as ready for review September 19, 2024 16:54

jonperdomo merged commit b000dfb into main Sep 19, 2024
1 check passed
