Unable to read large excel files #266

Open
fivetran-satvikpatil opened this issue Nov 7, 2023 · 6 comments

@fivetran-satvikpatil

Hi guys,
I am unable to read large Excel files.

I am using the following code to open the workbook:

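import com.monitorjbl.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Workbook;

Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory
        .bufferSize(4096)     // buffer size used when copying the InputStream to a temp file
        .open(inputStreamSupplier.get());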

It fails with the following error:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
	at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:185)
	at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
	at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:136)
	at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
	at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
	at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:210)
	at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:194)
	at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:168)
	at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:149)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:277)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:186)
	at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:113)
	at com.monitorjbl.xlsx.impl.StreamingWorkbookReader.init(StreamingWorkbookReader.java:91)
	at com.monitorjbl.xlsx.StreamingReader$Builder.open(StreamingReader.java:251)

The JVM has 8 GB of memory assigned, but it still fails with this error. From the stack trace and the heap dump, I can see that a single byte array with a total size of around 2.1 GB is being created.
How can we fix this issue?
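Why 8 GB of heap does not help: the trace shows POI copying an entire ZIP entry (an .xlsx file is a ZIP archive) into a `ByteArrayOutputStream`. Java arrays are indexed by `int`, so a single `byte[]` can never hold much more than about 2 GiB (`Integer.MAX_VALUE` is 2,147,483,647), no matter how large `-Xmx` is. The following is a minimal, hypothetical stdlib-only diagnostic (the class `XlsxEntrySizes` and its safety margin are assumptions, not part of this library or of POI) that lists which parts of a workbook exceed that limit.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical diagnostic: lists the parts of an .xlsx (a ZIP archive) whose
 * uncompressed size is too large to be buffered into a single byte[].
 */
public class XlsxEntrySizes {

    // Conservative upper bound for a byte[] length. Array lengths are ints and
    // HotSpot rejects lengths just below Integer.MAX_VALUE; the margin of 8 is
    // an assumption mirroring the JDK's own internal soft limit.
    static final long MAX_SAFE_ARRAY_LENGTH = Integer.MAX_VALUE - 8L;

    /** True if an entry of this uncompressed size cannot fit in one byte[]. */
    static boolean tooLargeForByteArray(long uncompressedSize) {
        // ZipEntry.getSize() returns -1 when the size is unknown; that is not flagged.
        return uncompressedSize > MAX_SAFE_ARRAY_LENGTH;
    }

    /** Returns "name (size bytes)" for every entry that would not fit in a byte[]. */
    static List<String> oversizedEntries(Path xlsx) throws IOException {
        List<String> result = new ArrayList<>();
        // ZipFile reads sizes from the central directory without decompressing entries.
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && tooLargeForByteArray(entry.getSize())) {
                    result.add(entry.getName() + " (" + entry.getSize() + " bytes)");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String line : oversizedEntries(Path.of(args[0]))) {
            System.out.println("too large for a byte[]: " + line);
        }
    }
}
```

Run it with the workbook path as the only argument; a very large sheet typically shows up as an `xl/worksheets/sheetN.xml` entry.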

@pjfanning
Contributor

pjfanning commented Nov 7, 2023

This project is unmaintained as far as I can see. I have a fork, and its docs describe some POI settings that can further reduce memory usage: https://github.com/pjfanning/excel-streaming-reader#reading-very-large-excel-files

There is also https://github.com/dhatim/fastexcel
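The POI settings in the linked README are the authoritative reference. As far as I know, POI 5.1 and later can spill large ZIP entries to temp files instead of `byte[]`s, via a temp-file threshold on `ZipInputStreamZipEntrySource` (the class in the stack trace above); check the README and the POI javadoc for the exact calls for your version. The sketch below does not use POI. It is a hypothetical stdlib-only illustration of the same idea (`ZipEntrySpooler` and `spoolEntries` are made-up names): each entry is streamed to disk in small chunks, so its size is bounded by disk space rather than by the maximum array length.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical illustration of "temp file instead of byte[]": every entry of
 * a ZIP stream is copied to its own temp file, so no entry must fit in memory.
 */
public class ZipEntrySpooler {

    /** Copies each file entry into dir; returns entry name -> temp file. Closes in. */
    static Map<String, Path> spoolEntries(InputStream in, Path dir) throws IOException {
        Map<String, Path> spooled = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Generated file name instead of the entry name, so a malicious
                // name such as "../../x" cannot escape dir.
                Path target = Files.createTempFile(dir, "zip-entry-", ".part");
                // ZipInputStream signals end-of-stream at the end of the current
                // entry, so this copies exactly one entry, chunk by chunk.
                Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                spooled.put(entry.getName(), target);
            }
        }
        return spooled;
    }
}
```

The caller is responsible for deleting the temp files once the parts have been parsed.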

@mcv

mcv commented Jan 9, 2024

Is your version sufficiently maintained that I should be using yours instead of forking this project myself? (I've got my own fork where I fixed a bunch of issues.)

And if not, is it possible to take over the maintenance of this project? I need this for a project, and I might be in a position to take over if necessary.

@pjfanning
Contributor

@mcv

mcv commented Jan 9, 2024

Looking at it now. You've got a lot more work in it than I have. Does this mean your fork is meant to be the official branch now? I'll switch over. That does make my life a lot easier.

@pjfanning
Contributor

> Looking at it now. You've got a lot more work in it than I have. Does this mean your fork is meant to be the official branch now? I'll switch over. That does make my life a lot easier.

It's not the official fork, but I am not aware of any forks that have as many changes as mine. PRs on my fork are welcome.

@mcv

mcv commented Jan 9, 2024

Unfortunately I can't use your version because of dependency conflicts. I'll stick with my own for now. Hopefully I can switch to yours later.
