Invalid array length on "large xml" file #20

Open
nunomarks opened this issue Mar 18, 2024 · 8 comments

@nunomarks

nunomarks commented Mar 18, 2024

There seems to be a limit on the size of the XML that the tool can validate. The following error is thrown for a "large XML file":

RangeError [Error]: Invalid array length

I was able to reproduce the issue on an XML file that surpasses 134_217_724 bytes, or in hex, 0x7FFFFFC. It seems like an odd limit, but I'm fairly sure this is not an xmllint issue, as I'm able to validate the same file via the command line.

Is it possible to take a look at this issue?
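
For reference, a minimal sketch of the numbers involved. It only checks the arithmetic of the reported threshold and shows that plain JavaScript raises the same error class when an array length is out of range; it does not claim that this is the code path inside the library. The names `thresholdBytes`, `tryInvalidLength` and `lengthError` are made up for illustration.

```javascript
// The threshold reported in this issue, written both ways.
const thresholdBytes = 134_217_724;
console.log(thresholdBytes === 0x7FFFFFC);   // true: same number in hex
console.log(thresholdBytes === 2 ** 27 - 4); // true: 4 bytes short of 128 MiB

// Asking for an array length outside the valid range throws a RangeError.
// (Illustration only; the library may hit its limit in a different way.)
function tryInvalidLength() {
  try {
    return new Array(2 ** 32); // one past the largest valid array length
  } catch (error) {
    return error;
  }
}

const lengthError = tryInvalidLength();
console.log(lengthError instanceof RangeError, lengthError.message);
```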

nunomarks changed the title from "Invalid array length on very large xml file" to "Invalid array length on "large xml" file" on Mar 18, 2024
@noppa
Owner

noppa commented Mar 18, 2024

I could take a look, but I'd need to be able to reproduce the error first.

Can you post the output of

npm version

Are you running on Mac, Windows or Linux, and if on Mac, is it Intel or M1 or M2?

This doesn't look like a memory limit issue per se, but you could try raising the max memory limit

const {validateXML, memoryPages} = require('xmllint-wasm');

validateXML({
    // ...other options
    maxMemoryPages: memoryPages.GiB,
});
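
As a rough sanity check on the memory question, here is a small sketch of how page counts relate to bytes. It assumes the limit is counted in standard WebAssembly pages of 64 KiB and that `memoryPages.GiB` stands for the number of pages in 1 GiB; `WASM_PAGE_BYTES` and `pagesForBytes` are hypothetical helpers, not part of the library.

```javascript
// WebAssembly linear memory is sized in pages of 64 KiB.
const WASM_PAGE_BYTES = 64 * 1024;

// Hypothetical helper: how many whole pages are needed to hold `bytes`.
function pagesForBytes(bytes) {
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

const gibPages = pagesForBytes(1024 ** 3);    // pages in 1 GiB
const filePages = pagesForBytes(134_217_724); // pages for the file size from this issue

console.log(gibPages, filePages);
```

Under these assumptions the input alone needs only a fraction of a 1 GiB limit, which fits with the impression that this is not a plain memory limit issue.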

@nunomarks
Author

nunomarks commented Mar 18, 2024

npm version
{
  npm: '10.2.4',
  node: '20.11.1',
  acorn: '8.11.2',
  ada: '2.7.4',
  ares: '1.20.1',
  base64: '0.5.1',
  brotli: '1.0.9',
  cjs_module_lexer: '1.2.2',
  cldr: '43.1',
  icu: '73.2',
  llhttp: '8.1.1',
  modules: '115',
  napi: '9',
  nghttp2: '1.58.0',
  nghttp3: '0.7.0',
  ngtcp2: '0.8.1',
  openssl: '3.0.13+quic',
  simdutf: '4.0.4',
  tz: '2023c',
  undici: '5.28.3',
  unicode: '15.0',
  uv: '1.46.0',
  uvwasi: '0.0.19',
  v8: '11.3.244.8-node.17',
  zlib: '1.2.13.1-motley-5daffc7'
}

> Are you running on Mac, Windows or Linux, and if on Mac, is it Intel or M1 or M2?

Mac Apple M1 Pro

> This doesn't look like a memory limit issue per se, but you could try raising the max memory limit

I did that and increased to the maximum memory possible before creating this issue (and it didn't work), so you are right, it's not a memory limit issue.

@noppa
Owner

noppa commented Mar 18, 2024

Thanks, I don't have a Mac available but I'll try to reproduce with the other environment specs

@nunomarks
Author

> Thanks, I don't have a Mac available but I'll try to reproduce with the other environment specs

Happy to provide you anything else you might need to help you reproduce it. I attached the XML I used to test.

very-large.xml.zip

noppa added a commit that referenced this issue Mar 18, 2024
This commit aims to fix issue #20.

Use the Emscripten FS.writeFile API for accepting XML input files,
instead of the createDataFile and especially the intArrayFromString
function. Those were inherited from the parent upstream project, but
this writeFile API seems to be simpler to use and performs better.

The bigger fix, though, is on the output side: pushing stdout piece by
piece (I guess it was pushing one byte at a time?) caused the
stdoutBuffer array to eventually grow so large that it'd throw

> RangeError [Error]: Invalid array length

when the output was very big, like when normalizing a big input XML, as
described in #20.

Here, too, we can switch to the print/printErr APIs, which seem to be not
only simpler but also more resilient to the input size growing.
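
A minimal sketch of the idea described in this commit message, not the actual patch. It assumes that an Emscripten-style module calls `print` and `printErr` with one piece of text at a time, and that the input can be handed over as a typed array (for example to `FS.writeFile`). `encodeInput` and `createOutputCollector` are hypothetical names used only for illustration.

```javascript
// Input side: encode the XML once into a typed array, instead of building
// a plain Array that holds one element per byte.
function encodeInput(xmlText) {
  return new TextEncoder().encode(xmlText); // Uint8Array of UTF-8 bytes
}

// Output side: collect whole pieces of text, so the number of array
// elements grows with the number of pieces rather than the number of bytes.
function createOutputCollector() {
  const stdoutLines = [];
  const stderrLines = [];
  return {
    print: (text) => { stdoutLines.push(text); },
    printErr: (text) => { stderrLines.push(text); },
    result: () => ({
      stdout: stdoutLines.join('\n'),
      stderr: stderrLines.join('\n'),
    }),
  };
}

// Usage with made-up data standing in for real tool output.
const inputBytes = encodeInput('<a/>');
const collector = createOutputCollector();
collector.print('<root>');
collector.print('</root>');
collector.printErr('example warning');
const collected = collector.result();
console.log(inputBytes.length, collected.stdout, collected.stderr);
```

In a real setup the `print` and `printErr` functions would be passed in the module configuration; how that is wired up in this project is not shown here.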
@noppa
Owner

noppa commented Mar 19, 2024

Thanks for the test XML, I was able to reproduce the issue.

I have a possible (not yet very well tested) fix, here: v5.0.0-alpha (PR #21). Could you test with this prerelease version, e.g. by installing it with

npm i xmllint-wasm@https://github.com/noppa/xmllint-wasm/releases/download/v5.0.0-alpha/xmllint-wasm.tgz

@nunomarks
Author

The solution from #21 works. I have tested with several large XMLs (all UTF-8 encoded) and everything is flawless 👌

@noppa
Owner

noppa commented Mar 19, 2024

Sweet. I'm a bit busy for a few days but will try to test and craft a proper release soonish. Meanwhile, the github alpha release should work fine.

noppa added a commit that referenced this issue Apr 14, 2024
@noppa
Owner

noppa commented Apr 14, 2024

This is also now in npm, with version 5.0.0-rc.0. I'll also test this in prod a bit before I dare to tag that as the latest release.
