A fork of @shkm's project to test improvements to simple_xlsx_reader.
Benchmarking the speed of reading xlsx files with various Ruby libraries.
To run the benchmarks:

```sh
bundle
./benchmark.sh
../big-benchmark.sh
```
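For reference, each gem is exercised by reading a sample workbook end to end. A minimal full read with simple_xlsx_reader looks roughly like the sketch below; it is based on the gem's documented `open`/`sheets`/`rows` API and is not the benchmark script itself.

```ruby
require "simple_xlsx_reader"

# Walk every row of every sheet so the whole workbook gets parsed.
doc = SimpleXlsxReader.open("10k_sales_records.xlsx")
doc.sheets.each do |sheet|
  sheet.rows.each do |row|
    # row is an array of typed cell values
  end
end
```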
The project comes with three sample files originally found here:
- `10k_sales_records_non_unique.xlsx`: simplest to parse, having mostly non-unique strings, which is what I could find in stock sample data. Excel's shared strings usage is a great optimization for this kind of file.
- `10k_sales_records.xlsx`: its first four string columns have unique values, which is IMO more representative of many real-world data files and makes the shared strings portion of the xlsx archive significant (see the archive-listing sketch after this list).
- `300k_sales_records.xlsx`: also using unique/representative strings, but much larger.
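To see why the unique-strings files lean so heavily on shared strings, you can list the entries inside the xlsx archive (an xlsx file is just a zip). This is a quick sketch assuming the rubyzip gem, using one of the repo's sample filenames:

```ruby
require "zip"

# Print each entry in the archive with its uncompressed size; for the
# unique-strings samples, xl/sharedStrings.xml is a large share of the data.
Zip::File.open("10k_sales_records.xlsx") do |archive|
  archive.each do |entry|
    puts format("%-40s %10d bytes", entry.name, entry.size)
  end
end
```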
Test them via:

```sh
WORKBOOK=10k_sales_records_non_unique.xlsx ./benchmark.sh
```
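If you want to sweep all three sample files, a hypothetical wrapper (not part of the repo) could drive `benchmark.sh` like this:

```ruby
# Run the benchmark once per bundled sample file by setting WORKBOOK.
%w[
  10k_sales_records_non_unique.xlsx
  10k_sales_records.xlsx
  300k_sales_records.xlsx
].each do |workbook|
  system({ "WORKBOOK" => workbook }, "./benchmark.sh") ||
    abort("benchmark failed for #{workbook}")
end
```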
All benchmarks were run on a 2021 MacBook Pro (M1).
The TL;DR result set is `./benchmark.sh` run against `10k_sales_records.xlsx`, which best shows the difference between simple_xlsx_reader and the others:
Gem | Parses/second | RSS Increase | Allocated Mem | Retained Mem | Allocated Objects | Retained Objects |
---|---|---|---|---|---|---|
simple_xlsx_reader | 1.13 | 36.94mb | 614.51mb | 1.13kb | 8796275 | 3 |
roo | 0.75 | 74.0mb | 164.47mb | 2.18kb | 2128396 | 4 |
creek | 0.65 | 107.55mb | 581.38mb | 3.3kb | 7240760 | 16 |
xsv | 0.61 | 75.66mb | 2127.42mb | 3.66kb | 5922563 | 10 |
rubyxl | 0.27 | 373.52mb | 716.7mb | 2.18kb | 10612577 | 4 |
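For context, the memory columns above are the kind of numbers the memory_profiler gem reports. A minimal sketch of collecting them around one parse (not the project's actual harness) could look like this:

```ruby
require "memory_profiler"
require "simple_xlsx_reader"

# Profile allocations while reading the whole workbook once.
report = MemoryProfiler.report do
  SimpleXlsxReader.open("10k_sales_records.xlsx").sheets.each do |sheet|
    sheet.rows.each { |_row| }
  end
end

puts "allocated: #{report.total_allocated_memsize} bytes across #{report.total_allocated} objects"
puts "retained:  #{report.total_retained_memsize} bytes across #{report.total_retained} objects"
```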
It's interesting to note what happens when a file doesn't make much use of shared strings, as with `10k_sales_records_non_unique.xlsx`; there, simple_xlsx_reader and creek are neck and neck:
Gem | Parses/second | RSS Increase | Allocated Mem | Retained Mem | Allocated Objects | Retained Objects |
---|---|---|---|---|---|---|
creek | 1.32 | 36.06mb | 520.48mb | 3.2kb | 6613688 | 15 |
simple_xlsx_reader | 1.3 | 34.31mb | 563.62mb | 1.13kb | 8056277 | 3 |
roo | 0.86 | 54.53mb | 141.48mb | 2.18kb | 1895656 | 4 |
xsv | 0.76 | 43.0mb | 1893.92mb | 3.9kb | 5483908 | 10 |
rubyxl | 0.3 | 359.95mb | 662.25mb | 2.18kb | 9822929 | 4 |
Then there's the really big file, `300k_sales_records.xlsx`, which, like our first example, has lots of unique strings (note: MemoryProfiler results are omitted because the profiler adds too much overhead for this test, and parses/second is replaced by the total time for a single parse):
Gem | Time | RSS Increase |
---|---|---|
simple_xlsx_reader | 28.32s | 95.66mb |
roo | 43.05s | 900.86mb |
creek | 48.41s | 820.64mb |
xsv | 50.09s | 272.97mb |
rubyxl | 278.47s | 3423.31mb |
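For the big-file runs, the time and RSS columns boil down to timing one parse and checking resident memory before and after. Here is a rough sketch of that idea, using `ps` (available on macOS and Linux); it is not the project's actual harness:

```ruby
require "benchmark"
require "simple_xlsx_reader"

# Resident set size of this process in kilobytes, via ps.
rss_kb = -> { `ps -o rss= -p #{Process.pid}`.to_i }

before = rss_kb.call
elapsed = Benchmark.realtime do
  SimpleXlsxReader.open("300k_sales_records.xlsx").sheets.each do |sheet|
    sheet.rows.each { |_row| }
  end
end
after = rss_kb.call

puts format("time: %.2fs, RSS increase: %.2fmb", elapsed, (after - before) / 1024.0)
```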
Also, here are the results for a real-world file I can't share, but which was 26mb with a large portion of shared strings:
Gem | Time | RSS Increase |
---|---|---|
simple_xlsx_reader | 28.71s | 148.77mb |
roo | 40.25s | 1322.08mb |
xsv | 45.82s | 391.27mb |
creek | 60.63s | 886.81mb |
rubyxl | 238.68s | 9136.3mb |