BAM file to count table
bamToCountTable.py is a flexible script that converts a BAM file into a count table in which the rows are samples and the columns are features. The script can extract information from tags and/or any other read attribute exposed by the pysam API, such as is_paired, is_duplicate, reference_name, reference_start, reference_end, is_qcfail, is_read1 and mapping_quality.
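The lookup described above (a tag or a pysam read attribute) can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes that two-character names are SAM tags and everything else is a read attribute, and it uses a hypothetical FakeRead class in place of pysam.AlignedSegment so that it runs without a BAM file. has_tag and get_tag are real AlignedSegment methods, so get_value should also accept genuine pysam reads.

```python
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment (illustration only)."""

    def __init__(self, tags, **attributes):
        self._tags = dict(tags)
        for name, value in attributes.items():
            setattr(self, name, value)

    def has_tag(self, tag):
        return tag in self._tags

    def get_tag(self, tag):
        return self._tags[tag]


def get_value(read, name):
    """Return the value of a SAM tag or a read attribute.

    Assumption: two-character names are SAM tags (the SAM specification
    defines tags as two characters); other names, such as reference_name
    or mapping_quality, are looked up as attributes of the read.
    Missing tags yield None.
    """
    if len(name) == 2:
        return read.get_tag(name) if read.has_tag(name) else None
    return getattr(read, name)


read = FakeRead({"SM": "cell_1"}, reference_name="chr1",
                reference_start=1500, mapping_quality=60, is_duplicate=False)
print(get_value(read, "SM"), get_value(read, "reference_name"),
      get_value(read, "mapping_quality"))  # cell_1 chr1 60
```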
The -sampleTags argument defines which tags/attributes in the BAM file determine the sample of a read. If the sample is defined by multiple tags/attributes, separate them with commas and no spaces.
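A sketch of how a comma-separated -sampleTags value could be turned into one sample key per read. The reads are plain dicts and the combination SM,RG is only an example; how bamToCountTable.py combines multiple values internally is not described here, so the tuple key is an assumption.

```python
def parse_tag_list(argument):
    """Split a comma-separated argument such as "SM,RG" into tag names."""
    return argument.split(",")


def sample_key(read_tags, sample_tags):
    """Combine the values of all sample-defining tags into one key.

    read_tags is a plain dict standing in for a BAM record (hypothetical).
    Returns None when any of the sample tags is missing from the read.
    """
    values = []
    for tag in sample_tags:
        if tag not in read_tags:
            return None
        values.append(read_tags[tag])
    return tuple(values)


tags = parse_tag_list("SM,RG")
print(tags)  # ['SM', 'RG']
print(sample_key({"SM": "cell_1", "RG": "lib_A"}, tags))  # ('cell_1', 'lib_A')
```

Because this sketch splits on commas only, "SM, RG" would produce the tag name " RG", which is one reason to leave out spaces.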
The -joinedFeatureTags argument defines a group of tags/attributes in the BAM file whose combined values form the columns of the output table. By default, the occurrences of each tag combination are counted; to sum tag values instead, use the -byValue argument.
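The difference between the default counting mode and -byValue can be sketched with a Counter keyed by (sample, feature combination). The reads are hypothetical dicts, and XV is an invented numeric tag (the SAM specification reserves X? tags for local use); the sketch only illustrates counting versus summing and is not the script's implementation.

```python
from collections import Counter

# Hypothetical reads as plain dicts; XV is an invented numeric tag.
reads = [
    {"SM": "cell_1", "reference_name": "chr1", "XV": 2},
    {"SM": "cell_1", "reference_name": "chr1", "XV": 3},
    {"SM": "cell_1", "reference_name": "chr2", "XV": 1},
    {"SM": "cell_2", "reference_name": "chr1", "XV": 4},
]


def count_table(reads, sample_tags, joined_feature_tags, by_value=None):
    """Tally (sample, feature combination) pairs.

    Without by_value every read adds 1 (occurrence counting); with by_value
    the read adds the value of that tag instead (summing, as with -byValue).
    """
    table = Counter()
    for read in reads:
        sample = tuple(read[tag] for tag in sample_tags)
        feature = tuple(read[tag] for tag in joined_feature_tags)
        table[sample, feature] += read[by_value] if by_value is not None else 1
    return table


counts = count_table(reads, ["SM"], ["reference_name"])
sums = count_table(reads, ["SM"], ["reference_name"], by_value="XV")
print(counts[("cell_1",), ("chr1",)])  # 2 (two reads)
print(sums[("cell_1",), ("chr1",)])    # 5 (2 + 3)
```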
The -featureTags argument can be used instead of -joinedFeatureTags; it counts every supplied tag/attribute as a separate feature (column).
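For comparison, a sketch of the -featureTags mode, where each tag contributes to its own column instead of to one joined combination. Keying columns as (tag, value) pairs is an assumption made for illustration; the reads are hypothetical dicts.

```python
from collections import Counter


def count_separate_features(reads, sample_tags, feature_tags):
    """Count every feature tag on its own.

    Each read adds 1 to one column per tag; columns are keyed as
    (tag, value) pairs, which is an assumption of this sketch.
    """
    table = Counter()
    for read in reads:
        sample = tuple(read[tag] for tag in sample_tags)
        for tag in feature_tags:
            table[sample, (tag, read[tag])] += 1
    return table


reads = [
    {"SM": "cell_1", "reference_name": "chr1", "is_read1": True},
    {"SM": "cell_1", "reference_name": "chr2", "is_read1": True},
]
table = count_separate_features(reads, ["SM"], ["reference_name", "is_read1"])
print(table[("cell_1",), ("is_read1", True)])  # 2
```

With -joinedFeatureTags reference_name,is_read1 the same two reads would instead fall into the combined columns (chr1, True) and (chr2, True), one count each.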
By default the output is written as a CSV file. When the filename supplied with -o ends in .pickle.gz, the output is written as a gzipped pickle file instead.
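The output rule can be sketched with the standard library alone. The nested {sample: {feature: count}} layout and the function name write_table are hypothetical; the script itself may store its table differently.

```python
import csv
import gzip
import os
import pickle
import tempfile


def write_table(table, path):
    """Write a {row: {column: count}} table as a gzipped pickle or as CSV.

    A path ending in .pickle.gz gives a gzipped pickle; any other path is
    written as CSV with one row per sample and missing counts filled with 0.
    """
    if path.endswith(".pickle.gz"):
        with gzip.open(path, "wb") as handle:
            pickle.dump(table, handle)
        return
    columns = sorted({column for row in table.values() for column in row})
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([""] + columns)
        for row_name in sorted(table):
            writer.writerow([row_name] + [table[row_name].get(c, 0) for c in columns])


# Example: write the same table both ways into a temporary directory.
table = {"cell_1": {"chr1": 2, "chr2": 1}, "cell_2": {"chr1": 1}}
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "counts.csv")
pickle_path = os.path.join(out_dir, "counts.pickle.gz")
write_table(table, csv_path)
write_table(table, pickle_path)
```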
Count reads per sample (-sampleTags SM) per chromosome (-joinedFeatureTags reference_name):
bamToCountTable.py test.bam -sampleTags SM -joinedFeatureTags reference_name -o reads_per_sample.csv
Count the number of molecules (--dedup) per sample (-sampleTags SM) per chromosome (-joinedFeatureTags reference_name):
bamToCountTable.py test.bam -sampleTags SM -joinedFeatureTags reference_name --dedup -o molecules_per_sample.csv
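A sketch of the difference between counting reads and counting molecules. It assumes that --dedup works by skipping reads whose duplicate flag (is_duplicate) is set, which would require a duplicate-marked BAM file; the Read dataclass and its sample field are hypothetical stand-ins for pysam reads.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical stand-in exposing only the fields used below.
    sample: str
    reference_name: str
    is_duplicate: bool


def count_per_chromosome(reads, dedup=False):
    """Count reads (or, with dedup=True, molecules) per sample and chromosome.

    Assumption: each molecule is counted once by skipping reads whose
    duplicate flag is set.
    """
    table = Counter()
    for read in reads:
        if dedup and read.is_duplicate:
            continue
        table[read.sample, read.reference_name] += 1
    return table


reads = [
    Read("cell_1", "chr1", False),
    Read("cell_1", "chr1", True),   # duplicate of the first read
    Read("cell_1", "chr2", False),
]
print(count_per_chromosome(reads))              # read counts
print(count_per_chromosome(reads, dedup=True))  # molecule counts
```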
Bin molecules into 250 kb bins (-bin 250_000), where the bin is determined by the DS tag (-binTag DS), and count only molecules with a mapping quality of at least 20 (-minMQ 20):
bamToCountTable.py test.bam -joinedFeatureTags reference_name -minMQ 20 --dedup -binTag DS -bin 250_000 -o molecules_binned_250k.csv
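Fixed-size binning with a mapping-quality cut-off can be sketched as integer division of the position by the bin size. The sketch assumes that the DS tag holds an integer coordinate and labels each bin by its start coordinate; both are assumptions, and the reads are hypothetical dicts.

```python
from collections import Counter


def bin_counts(reads, bin_size, min_mq=0, bin_tag="DS"):
    """Count reads per (chromosome, bin start).

    The position is read from bin_tag, reads with a mapping quality below
    min_mq are skipped, and each bin is labelled by its start coordinate.
    """
    table = Counter()
    for read in reads:
        if read["mapping_quality"] < min_mq:
            continue
        bin_start = (read[bin_tag] // bin_size) * bin_size
        table[read["reference_name"], bin_start] += 1
    return table


reads = [
    {"reference_name": "chr1", "DS": 10_000, "mapping_quality": 60},
    {"reference_name": "chr1", "DS": 249_999, "mapping_quality": 30},
    {"reference_name": "chr1", "DS": 250_000, "mapping_quality": 60},
    {"reference_name": "chr1", "DS": 300_000, "mapping_quality": 5},  # below 20, skipped
]
table = bin_counts(reads, bin_size=250_000, min_mq=20)
```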
Bin molecules into 250 kb bins (-bin 250_000) with windows sliding in 50 kb steps (-sliding 50_000), where the location is determined by the start position of the read (-binTag reference_start), ignoring reads with alternative hits (--filterXA) and reads with a mapping quality below 60 (-minMQ 60):
bamToCountTable.py test.bam -joinedFeatureTags reference_name -minMQ 60 --filterXA --dedup -binTag reference_start -bin 250_000 -sliding 50_000 -o molecules_binned_250k_sliding_50k.csv
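Overlapping bins can be sketched by listing every window that contains a position. This assumes that -sliding sets the step between window starts (windows of bin_size bases starting at 0, step, 2*step, ...) and that --filterXA drops reads carrying the XA tag, which BWA uses to list alternative hits. sliding_bins and sliding_counts are hypothetical names, and the reads are plain dicts.

```python
from collections import Counter


def sliding_bins(position, bin_size, step):
    """Return the start of every window [start, start + bin_size) containing position.

    Window starts are multiples of step, so a position lies in at most
    bin_size // step windows.
    """
    last_start = (position // step) * step         # rightmost start <= position
    first_start = max(0, position - bin_size + 1)  # window must still reach position
    first_start = -(-first_start // step) * step   # round up to a multiple of step
    return list(range(first_start, last_start + 1, step))


def sliding_counts(reads, bin_size, step, min_mq=60, filter_xa=True):
    """Count reads per (chromosome, window start) using overlapping windows."""
    table = Counter()
    for read in reads:
        if read["mapping_quality"] < min_mq:
            continue
        if filter_xa and "XA" in read:  # read has alternative hits
            continue
        for start in sliding_bins(read["reference_start"], bin_size, step):
            table[read["reference_name"], start] += 1
    return table


reads = [
    {"reference_name": "chr1", "reference_start": 120_000, "mapping_quality": 60},
    {"reference_name": "chr1", "reference_start": 130_000, "mapping_quality": 60,
     "XA": "chr5,+100,50M,0;"},  # skipped by the XA filter
    {"reference_name": "chr1", "reference_start": 140_000, "mapping_quality": 40},  # below 60
]
table = sliding_counts(reads, bin_size=250_000, step=50_000)
print(sliding_bins(120_000, 250_000, 50_000))  # [0, 50000, 100000]
```

Listing only the windows that contain each position keeps the work per read proportional to bin_size // step instead of scanning every window on the chromosome.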
Count reads per chromosome with --divideMultimapping:
bamToCountTable.py test.bam -joinedFeatureTags reference_name --divideMultimapping -o counts.csv