Performance
csw edited this page Jun 28, 2012 · 4 revisions
Since MAF files can be hundreds of GB in size, performance is an important consideration. So far, the chunked parser is the fastest of the parsers tested. User CPU times for parsing a 315 MB file and counting MAF blocks:
- chunked parser: 10.1 s
- line-based parser: 16.0 s
- bx-python parser: 22.7 s
- PHAST: <= 18.3 s (not strictly comparable, since it was also writing MAF output)
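To put these figures on a common scale, the short sketch below converts each time into throughput and into a slowdown factor relative to the fastest entry. The numbers are copied from the list above; the helper names (`throughput`, `relative_to_fastest`) are made up for this illustration and are not part of bio-maf.

```ruby
# Timings in seconds, copied from the list above, for the 315 MB test file.
FILE_MB = 315.0
TIMES = {
  "chunked parser"    => 10.1,
  "line-based parser" => 16.0,
  "bx-python parser"  => 22.7,
  "PHAST"             => 18.3 # upper bound: this run also wrote MAF output
}

# Throughput in MB/s for a single timing.
def throughput(megabytes, seconds)
  megabytes / seconds
end

# Map each entry to how many times slower it is than the fastest entry.
def relative_to_fastest(times)
  best = times.values.min
  times.each_with_object({}) { |(name, secs), out| out[name] = secs / best }
end

relative = relative_to_fastest(TIMES)
TIMES.each do |name, secs|
  printf("%-18s %5.1f s  %5.1f MB/s  %.2fx\n",
         name, secs, throughput(FILE_MB, secs), relative[name])
end
```

On these numbers the chunked parser works out to roughly 31 MB/s, and the bx-python parser takes about 2.25 times as long.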
Also, JRuby 1.7 on Java 7 appears to be about 1.6 times as fast at MAF parsing as CRuby 1.9.3 once it warms up, averaging 16 µs per alignment block compared to 25 µs.
Disabling JRuby's ObjectProxyCache with the -Xji.objectProxyCache=false option gives a massive performance gain (about 2.5x in my testing) for multithreaded index scans by eliminating lock contention.
[bx-python]
$ time maf_count.py < ~/maf/chrY.maf
95437
real 0m23.136s
user 0m22.685s
sys 0m0.390s
[bio-maf, chunked parser]
$ time bin/maf_count --parser ChunkParser ~/maf/chrY.maf
Parsed 95437 MAF alignment blocks.
real 0m10.481s
user 0m10.140s
sys 0m0.249s
[bio-maf, original parser]
$ time bin/maf_count ~/maf/chrY.maf
Parsed 95437 MAF alignment blocks.
real 0m16.445s
user 0m16.003s
sys 0m0.285s
[PHAST]
$ time maf_parse ~/maf/chrY.maf > /dev/null
real 0m18.607s
user 0m18.325s
sys 0m0.255s
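All four runs above perform the same task: counting alignment blocks, each of which starts with a line beginning with "a" in the MAF format. As a rough illustration of the difference between reading line by line and reading in large chunks, here is a minimal sketch of both approaches. It is not the bio-maf ChunkParser or the original parser; the method names and the default chunk size are assumptions chosen for the example.

```ruby
require 'stringio'

# Line-based: visit every line and count those that open a block.
def count_blocks_by_line(io)
  io.each_line.count { |line| line.start_with?("a") }
end

# Chunked: read fixed-size chunks and count "a" characters that sit at the
# start of a line, including one that is the first character of a chunk.
def count_blocks_chunked(io, chunk_size = 8 * 1024 * 1024)
  count = 0
  at_line_start = true # true when the previous chunk ended with a newline
  while (chunk = io.read(chunk_size))
    count += 1 if at_line_start && chunk.start_with?("a")
    count += chunk.scan(/\na/).size
    at_line_start = chunk.end_with?("\n")
  end
  count
end

# Tiny hand-written sample with two alignment blocks.
sample = "##maf version=1\n" \
         "a score=1.0\ns x 0 4 + 10 ACGT\n\n" \
         "a score=2.0\ns y 0 4 + 10 ACGT\n\n"

puts count_blocks_by_line(StringIO.new(sample))
puts count_blocks_chunked(StringIO.new(sample), 16)
```

The chunked version has to track whether a chunk boundary fell right after a newline; a real parser would also need to handle records that span two chunks.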
[MRI vs. JRuby]
$ ruby -v
jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04) [darwin-x86_64-java]
$ bin/maf_parse_bench -w ~/maf/chrY.maf
0.000016 0.000000 0.000016 ( 0.000015)
=========================
$ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
$ bin/maf_parse_bench -w ~/maf/chrY.maf
0.000025 0.000000 0.000025 ( 0.000026)
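The four columns printed by maf_parse_bench have the same layout as Ruby's Benchmark::Tms output (user, system, total, and real time in parentheses); that reading is an inference from the format, not something stated here. Under that assumption, a per-block average measured after a warm-up phase can be sketched as follows. `parse_block` is a hypothetical stand-in workload, not the bio-maf parser, and `per_call_time` is a name invented for this example.

```ruby
require 'benchmark'

# Hypothetical stand-in for parsing one alignment line; NOT bio-maf code.
def parse_block(line)
  line.split(/\s+/)
end

# Run the block `warmup` times untimed (useful on JRuby, where the JIT needs
# time to kick in), then time `iterations` runs and return the average cost
# per call as a Benchmark::Tms.
def per_call_time(iterations, warmup: 0)
  warmup.times { yield }
  total = Benchmark.measure { iterations.times { yield } }
  total / iterations
end

line = "s hg18.chrY 1000 20 + 57772954 ACGTACGTACGTACGTACGT"
avg = per_call_time(50_000, warmup: 50_000) { parse_block(line) }

# Columns: user, system, total, (real), all in seconds per call.
puts avg.format("%10.6u %10.6y %10.6t %10.6r")
```

Dividing the Benchmark::Tms by the iteration count is what turns a whole-run measurement into per-item figures in the microsecond range.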