Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Customizable row/cell limit #541

Merged
merged 3 commits into from
Apr 24, 2018

Conversation

papandreou
Copy link
Contributor

@papandreou papandreou commented Apr 13, 2018

Hi!

We're running an app where customers can import data by uploading Excel spreadsheets. Over the years, we've seen some pretty nasty examples of files that contain millions of cells, which causes the import process to stall for a long time and run out of memory.

It seems like these files are not maliciously crafted, and there aren't millions of rows with actual data, but those cells still exist in the XML markup with a reference to some styling or format. My suspicion is that the user typed Cmd+A and applied a format or style, and then the resulting file needs to contain all the rows as placeholders for the styling. Unfortunately, I don't have an example handy, but you might be familiar with that kind of thing already :)

I've come up with the enclosed hack to abort the reading when too many rows have been read. Think of it as a proof of concept -- I'm aware that it's not mergeable in its current form, mostly because of how the maxRows is defined and parsed down. At the very least, it should be part of an options object or something. Also, it would be nice if it could limit the number of cells instead of rows.

At this time, this is basically the only patch that's keeping us from switching back to the latest officially released version of exceljs, so I'm quite eager to get this feature added in some shape or form.

Is this something that other users would like in general?
Any input about how to do it "the right way"?

@guyonroche
Copy link
Collaborator

@papandreou This sounds like a very sensible feature - especially as it will reduce servers vulnerability to bad xlsx based inputs.

Regarding implementation, my thoughts are:

  • add an options argument to XLSX.readFile() and read() which can contain things like maxRows (or maxCols, etc). It can also indicate appropriate action - e.g. throw, crop, etc.
  • The options would be used in all the XLSX.process****Entry functions and passed into the relevant Xform objects
  • WorksheetXform would translate an option like "maxRows" into a "maxLength" option to give to the ListXform that handles the "sheetData" tag
  • ListXform can perform the appropriate action if the "maxLength" requirement is about to fail

@papandreou
Copy link
Contributor Author

@guyonroche, thanks for the input! I changed the API now so that the limit is configured via an options object passed to XLSX.read(..., {maxRows: 123}) instead. It feels a little awkward to explicitly pass it down like that in so many steps, am I missing something? Is there a more central place where it can be stored?

@guyonroche
Copy link
Collaborator

I don't think there is a completely 'tidy' solution - passing options down through the call stack is ok

@guyonroche guyonroche merged commit a3c7065 into exceljs:master Apr 24, 2018
@guyonroche
Copy link
Collaborator

@papandreou - I tweaked your changes a bit...

  • removed the options from the Workbook document as they weren't used
  • moved the check from RowXform (the child) to ListXform (the parent)
  • changed the text of the error to 'Max row count exceeded' as it now comes from ListXform
  • modified the error handling in base-xform
    Hope that's ok

@papandreou
Copy link
Contributor Author

@guyonroche, thanks a lot for getting back to me so soon and helping me get this into shape! You're an exemplary maintainer.

@Clowting
Copy link

Hi, trying to get this to work as well. Do you have a code example handy that shows the implementation, I'm struggling to get it to work.

@guyonroche
Copy link
Collaborator

@Clowting this feature adds an argument to the workbook.xlsx.readFile function.
There is an integration test added by Andreas that demonstrates how it works:
https://github.com/guyonroche/exceljs/blob/master/spec/integration/workbook-xlsx-reader.spec.js

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants