Memory efficient Shared Strings Table implementation for POI streaming

pjfanning/poi-shared-strings

poi-shared-strings

Memory efficient Shared Strings Table implementation for POI xlsx streaming.

Supports read and write use cases when used with POI 5.x. poi-shared-strings 1.x supports POI 4.x.

https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

The TempFileSharedStringsTable uses an H2 MVStore to store the Excel shared string data. The MVStore data can be encrypted using a generated password.

This class can be used instead of the POI SharedStringsTable and ReadOnlySharedStringsTable. It is only worthwhile if you expect to handle a large number of shared string entries.
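
For readers unfamiliar with what a shared strings table does, here is a minimal in-memory sketch of the idea: each distinct string is stored once and cells refer to it by index. The class name `SharedStringsSketch` and its methods are hypothetical and exist only for illustration; this is not the library's implementation, which keeps the entries in an MVStore rather than on the heap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified illustration of a shared strings table. */
final class SharedStringsSketch {
    private final Map<String, Integer> indexByText = new HashMap<>();
    private final List<String> textByIndex = new ArrayList<>();
    private int count; // total references added, including repeats

    /** Returns the index of the text, storing it only if it has not been seen before. */
    int add(String text) {
        count++;
        Integer existing = indexByText.get(text);
        if (existing != null) {
            return existing;
        }
        int index = textByIndex.size();
        textByIndex.add(text);
        indexByText.put(text, index);
        return index;
    }

    String getItemAt(int index) {
        return textByIndex.get(index);
    }

    int getCount() {
        return count;
    }

    int getUniqueCount() {
        return textByIndex.size();
    }

    public static void main(String[] args) {
        SharedStringsSketch sst = new SharedStringsSketch();
        int a = sst.add("apple");
        int b = sst.add("banana");
        int again = sst.add("apple"); // repeat: reuses the existing index
        System.out.println(a + " " + b + " " + again);                    // prints "0 1 0"
        System.out.println(sst.getCount() + " " + sst.getUniqueCount()); // prints "3 2"
    }
}
```

Both maps in this sketch grow with the number of distinct strings, which is why a heap-based table becomes costly for very large workbooks and why a store that can spill to a temp file helps.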

Usage

When reading files, use new TempFileSharedStringsTable(opcPackage, true) to have the shared strings loaded from the xlsx package.

If you are using the TempFileSharedStringsTable when writing files (e.g. with SXSSFWorkbook), use new TempFileSharedStringsTable(true) to create an empty table to which you can add shared string entries.

Comments Table

v2.0.2 added support for TempFileCommentsTable which works in a similar way to TempFileSharedStringsTable.

Before v2.4.0, TempFileCommentsTable only supported read use cases. From v2.4.0 onwards, it can also be used with SXSSFWorkbook, as TempFileSharedStringsTable can.

This class can be used instead of the POI CommentsTable. It is only worthwhile if you expect to handle a large number of comment entries.

Full Format

v2.1.0 added support for parsing the shared strings and comments while keeping their formatting data. This is optional and not enabled by default, and it requires a little extra memory. The default is to extract only the text of the shared strings and comments. Full Format support may not work when the files are in Strict OOXML format (POI Issue).

The current implementation of Full Format can be a lot slower than the plain text solution due to the extra XML parsing overhead.
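
To illustrate the difference between the default plain-text mode and Full Format, the sketch below takes a simplified shared string item with two formatted runs and extracts only its text, discarding the run properties. It uses the JDK's DOM parser rather than this library, and the class name `SharedStringItemSketch` is hypothetical. It also assumes simplifications: real shared string XML is namespaced and may contain phonetic runs, which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical illustration of plain-text extraction from a shared string item. */
final class SharedStringItemSketch {

    /** Concatenates the text of every <t> element, dropping run formatting such as <rPr>. */
    static String plainText(String siXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siXml.getBytes(StandardCharsets.UTF_8)));
            NodeList textNodes = doc.getElementsByTagName("t");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < textNodes.getLength(); i++) {
                sb.append(textNodes.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse shared string item", e);
        }
    }

    public static void main(String[] args) {
        // Simplified item: a bold run followed by an unformatted run (no namespace).
        String si = "<si><r><rPr><b/></rPr><t>Hello</t></r>"
                  + "<r><t xml:space=\"preserve\"> World</t></r></si>";
        System.out.println(plainText(si)); // prints "Hello World"
    }
}
```

A Full Format approach would instead have to retain the run structure (here, the bold marker on the first run), which is consistent with the extra memory and parsing cost noted above.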

Map-Backed implementations

Since v2.5.0, you can avoid temp files altogether by using the Map-backed implementations:

  • MapBackedSharedStringsTable
  • MapBackedCommentsTable

Samples

There is an xlsx reading sample and also an xlsx writing sample at https://github.com/pjfanning/poi-shared-strings-sample.