polydbms/sheetreader-core

SheetReader

SheetReader is a fast and memory-efficient parser for tabular data in Excel OOXML (.xlsx) files, implemented in C++. Other spreadsheet parsers build on general-purpose XML parsers, which over-utilize CPU and memory because they process redundant XML information and materialize an inflated in-memory XML tree. In contrast, SheetReader leverages the fixed structure of spreadsheet files, employs parallelism at different levels, and manages memory efficiently.
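To make the contrast with tree-building XML parsers concrete, here is a minimal sketch of the general idea. It is written in Python for brevity and is not SheetReader's C++ implementation. It scans worksheet XML for cell elements directly and never builds an XML tree. The names `CELL`, `column_index`, `scan_cells`, and `SAMPLE` are hypothetical, and the sketch assumes cells of the simple form `<c r="A1"><v>...</v></c>`.

```python
import re

# Assumption: cells look like <c r="B2" ...><v>42</v></c>, optionally with a
# formula element before the value. Anything else is skipped by this sketch.
CELL = re.compile(rb'<c r="([A-Z]+)(\d+)"[^>]*>(?:<f>[^<]*</f>)?<v>([^<]*)</v></c>')


def column_index(letters: bytes) -> int:
    """Convert a column reference such as b"A" or b"AB" to a zero-based index."""
    index = 0
    for ch in letters:  # iterating over bytes yields integers
        index = index * 26 + (ch - ord("A") + 1)
    return index - 1


def scan_cells(sheet_xml: bytes):
    """Yield (row, column, raw_text) for each matched cell, in document order.

    The raw text is returned as-is: for cells marked t="s" it is an index
    into the shared-strings table, which this sketch does not resolve.
    """
    for match in CELL.finditer(sheet_xml):
        letters, row, value = match.groups()
        yield int(row) - 1, column_index(letters), value.decode()


# Hypothetical worksheet fragment used only for illustration.
SAMPLE = (
    b'<sheetData>'
    b'<row r="1"><c r="A1"><v>1</v></c><c r="B1" t="s"><v>0</v></c></row>'
    b'<row r="2"><c r="AB2"><v>3.5</v></c></row>'
    b'</sheetData>'
)

cells = list(scan_cells(SAMPLE))
print(cells)
```

In a real .xlsx file the worksheet XML is a member of a ZIP archive, so it would have to be decompressed before scanning. The sketch covers only the "fixed structure" point. It does not attempt the parallelism or memory management mentioned above.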

Bindings

We also provide bindings for several environments:

  • R language: load spreadsheets into dataframes; also available via CRAN.
  • Python language: load spreadsheets into Pandas dataframes.
  • PostgreSQL FDW: execute SQL on spreadsheets and combine spreadsheets with DBMS tables.

Paper

SheetReader was published in the journal Information Systems:

@article{DBLP:journals/is/GavriilidisHZM23,
  author       = {Haralampos Gavriilidis and
                  Felix Henze and
                  Eleni Tzirita Zacharatou and
                  Volker Markl},
  title        = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal      = {Inf. Syst.},
  volume       = {115},
  pages        = {102183},
  year         = {2023},
  url          = {https://doi.org/10.1016/j.is.2023.102183},
  doi          = {10.1016/J.IS.2023.102183},
  timestamp    = {Mon, 26 Jun 2023 20:54:32 +0200},
  biburl       = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgements

SheetReader includes and uses the following C/C++ libraries:
