prsconv2

Improved tool to convert Tombo per-read statistics files into CSV files

Usage

Suppose you have a Tombo per-read statistics file at /path/to/prs/file, and you want to transform it into a CSV file at /path/to/future/csv/file.csv. Then run something like this at your terminal:

$ python prsconv2.py --chromosome "truncated_hiv_rna_genome" --strand "+" --start 0 --end 1000000000 /path/to/prs/file /path/to/future/csv/file.csv

If you leave the --chromosome, --strand, --start, and --end options out, the script will default to the same values written explicitly in this example The default values are probably correct for the people using this script in July 2020, but they won't work for every per-read statistics file.

This tool will not delete the per-read statistics file. It will not overwrite a preexisting csv file either (unless the user supplies the --overwrite option).

For hints on command-line usage, run

$ python prsconv2.py --help

Output Format

The output CSV file will have a row for every read and a column for every nucleotide position. The entry in row x, column y is a statistic representing the likelihood that read x was modified at position y. This statistic is usually a p-value, but it can also be a log-likelihood ratio. (It depends on the options that tombo used to produce the per-read statistics file.)

Rows are labeled by labeled by their read IDs (which look like "01c40adf-7144-4f6f-95d7-4cc08abc52d2"). The column label is a number representing a nucleotide position along the reference sequence that was used for resquiggling. That means that if you resquiggled your reads to a reference fasta that looks like this:

> truncated_hiv_rna_genome
ACTGGTCAGATTACAACGTGTACCATGTCA...

then the column labeled as "2" corresponds to the same position as the first T in the reference Fasta. (Note that this is a zero-based index; position 2 corresponds to T, not C.)

The first and last few columns will usually be NaN values.

Troubleshooting

Make sure Python is installed (or that a Python module is loaded). This script was written to be run with Python 2.
Look at the reference sequence you used for resquiggling. If its name is something other than "truncated_hiv_rna_genome", you need to supply the --chromosome option. This script uses Tombo's Python API to read the per-read statistics file. Unfortunately, Tombo's API requires us to specify upfront the name of the chromosome for which we want statistics. If you really can't figure out the chromosome name, try opening up the per-read statistics file and looking for the chrom attribute. On my per-read statistics files, the following command works. (You will have to tweak it.)

$ h5dump -a /Statistic_Blocks/Block_0/chrm test1_depth_50k_fisher_3_v14.tombo.per_read_stats

Notes

This script was not heavily tested. I did my best, but be careful.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
tests		tests
.gitignore		.gitignore
README.md		README.md
cli.py		cli.py
prsconv2.py		prsconv2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

prsconv2

Usage

Output Format

Troubleshooting

Notes

About

Releases

Packages

Languages

SycamoreLeaf/prsconv2

Folders and files

Latest commit

History

Repository files navigation

prsconv2

Usage

Output Format

Troubleshooting

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages