
pbfrandsen/fasta_metadata_parser


FASTA metadata parser

This parser takes a scaffold FASTA file as input and calculates both scaffold and contig statistics (N50, L50, etc.). It derives the contigs by breaking each scaffold wherever there is a run of more than one N (that is, two or more consecutive Ns), and then calculates statistics for both the scaffolds and the contigs.
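The scaffold-to-contig step described above can be sketched in a few lines of Python. This is a minimal illustration, not the parser's own code. The function name `scaffold_to_contigs` is hypothetical, and the sketch assumes that a gap is any run of two or more consecutive Ns (upper- or lowercase), so a single N stays inside its contig.

```python
import re

# Assumption: a gap is any run of two or more consecutive Ns (either case);
# a single N is treated as an ambiguous base and stays inside the contig.
GAP = re.compile(r"[Nn]{2,}")


def scaffold_to_contigs(scaffold: str) -> list[str]:
    """Split one scaffold sequence into contigs at runs of 2+ Ns (hypothetical helper)."""
    # re.split drops the gap runs; empty strings appear when a scaffold
    # starts or ends with a gap, so filter them out.
    return [piece for piece in GAP.split(scaffold) if piece]


if __name__ == "__main__":
    contigs = scaffold_to_contigs("ACGTNACGTNNNNNGGCCNNTTAA")
    print(contigs)                    # ['ACGTNACGT', 'GGCC', 'TTAA']
    print([len(c) for c in contigs])  # [9, 4, 4]
```

Treating a lone N as part of the contig keeps single ambiguous base calls from fragmenting the assembly; only genuine gap runs split a scaffold.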

It requires NumPy and scikit-bio, both of which can be easily obtained through the Anaconda Python distribution.

#### Usage

To run the parser, place the script in the same directory as your FASTA file and run:

```
$ python fasta_meta_data_parser.py <fasta_file_name>
```

It then prints the statistics to the screen. The output looks something like this:

```
Contig statistics:
Total number of base pairs: 1106252391
Total number of contigs: 8104
N10: 2049786
N20: 1376365
N30: 1058529
N40: 829321
N50: 647974
L10: 43
L20: 112
L30: 204
L40: 321
L50: 472
GC content: 42.88%
Median contig size: 21973.5
Mean contig size: 136506.96
Longest contig is: 4781189.0
Shortest contig is: 1231.0

Scaffold statistics:
Total number of base pairs: 1106317551
Total number of scaffolds: 7918
N10: 2170181
N20: 1496579
N30: 1109545
N40: 872285
N50: 685849
L10: 41
L20: 105
L30: 192
L40: 304
L50: 447
GC content: 42.88%
Median scaffold size: 21818.5
Mean scaffold size: 139721.84
Longest scaffold is: 4781189.0
Shortest scaffold is: 1231.0
```
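For readers unfamiliar with these metrics, the sketch below shows one common way to compute the Nx/Lx values and summary statistics reported above from a list of sequence lengths. It is not taken from the parser, and the helper names (`nx_lx`, `summarize`, `gc_content`) are hypothetical. In particular, excluding Ns and other ambiguity codes from the GC-content denominator is an assumption.

```python
from statistics import mean, median


def nx_lx(lengths: list[int], x: float) -> tuple[int, int]:
    """Return (Nx, Lx) for a list of sequence lengths (hypothetical helper).

    Sort lengths longest first and keep a running total. Nx is the length of
    the sequence at which the running total first reaches x% of the total
    size; Lx is how many sequences it took to get there.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


def summarize(lengths: list[int]) -> dict:
    """Collect the same kinds of statistics the parser prints."""
    stats = {"total_bp": sum(lengths), "count": len(lengths)}
    for x in (10, 20, 30, 40, 50):
        stats[f"N{x}"], stats[f"L{x}"] = nx_lx(lengths, x)
    stats["median"] = median(lengths)  # averages the two middle values for even counts
    stats["mean"] = round(mean(lengths), 2)
    stats["longest"] = max(lengths)
    stats["shortest"] = min(lengths)
    return stats


def gc_content(sequences: list[str]) -> float:
    """Percentage of G+C among A/C/G/T bases.

    Assumption: Ns and other ambiguity codes are left out of the denominator.
    """
    joined = "".join(sequences).upper()
    gc = joined.count("G") + joined.count("C")
    acgt = gc + joined.count("A") + joined.count("T")
    return round(100 * gc / acgt, 2)


if __name__ == "__main__":
    stats = summarize([100, 80, 50, 30, 20, 10, 10])
    print(stats["N50"], stats["L50"])  # 80 2
    print(gc_content(["GGCCNNATAT"]))  # 50.0
```

Passing contig lengths yields the contig block of the report, and passing scaffold lengths yields the scaffold block. Because the longest sequences are counted first, N50 rises and L50 falls as an assembly becomes more contiguous.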
