Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output large json stats objects #129

Merged
merged 5 commits into from
Jan 18, 2018

Conversation

ryan953
Copy link
Contributor

@ryan953 ryan953 commented Nov 2, 2017

Fixes #119

I've seen this problem appear in the logs as:

fs.writeFileSync(statsFilepath, JSON.stringify(stats, null, 2));
RangeError: Invalid string length

The problem is that we're trying to serialize the whole stats object into JSON at once, which causes memory issues with larger projects. The stats object has a shape including {"assets": [...], "entrypoints": {...}, "chunks": [...], "modules": [...]} and it's the chunks and modules fields that are really large lists.

To fix this I imported JSONStream and paired it with a writable stream to serialize and flush each key on it's own. For larger projects we might need to go further and iterate/write each item within the chunks/modules/children keys individually.

@jsf-clabot
Copy link

jsf-clabot commented Nov 2, 2017

CLA assistant check
All committers have signed the CLA.

@ryan953 ryan953 force-pushed the stream-json-stringify branch 3 times, most recently from 972ee30 to 132868f Compare November 2, 2017 21:42
@valscion valscion self-assigned this Nov 3, 2017
@valscion
Copy link
Member

valscion commented Nov 3, 2017

Thanks for taking a stab at this!

I haven't used JSONStream myself, so I had to dig a bit deeper to find out whether the new fileUtils tests work.

I discovered that setImmediate cannot be await on, so it turns out that the assertions are actually never ran in the current form. It's also a bit of a red herring to have to use setImmediate in the test — shouldn't it be enough for us to wait on the streamObjectToJson promise to be resolved? Shouldn't that signal that the stats.json file is ready to be read now?

If I modify the assertions to run immediately after streamObjectToJson promise is resolved, I've got a new issue:

const fileContent = fs.readFileSync(statsFilename, 'utf8');

When my code hits this line, it reads a partially written file or one that has no contents at all, and I receive a SyntaxError: Unexpected end of JSON input from JSON.parse line.

I played around with adding a setTimeout before resolving the streamObjectToJson promise, but it only resulted in the test sometimes passing, sometimes reading no content at all and sometimes reading only a partial file. It seems like the streamObjectToJson has some subtle timing issues when writing the JSON to file — and unfortunately, I'm not that well-versed in node.js streams or JSONStream to know what's up with that.

screen shot 2017-11-03 at 11 12 51

That screenshot was me running only the new test a few times with the following diff applied:

diff --git a/src/fileUtils.js b/src/fileUtils.js
index e9a17cb..94d70fd 100644
--- a/src/fileUtils.js
+++ b/src/fileUtils.js
@@ -7,19 +7,21 @@ module.exports = {
   streamObjectToJson
 };
 
-async function streamObjectToJson(filepath, data) {
-  await new Promise((resolve, reject) => {
+function streamObjectToJson(filepath, data) {
+  return new Promise((resolve, reject) => {
     mkdir.sync(path.dirname(filepath));
 
     const transformStream = JSONStream.stringifyObject();
     const writableStream = fs.createWriteStream(filepath);
     transformStream.pipe(writableStream);
     transformStream.on('error', (err) => {
-      reject(err);
+      setTimeout(() => reject(err), 0);
+      // reject(err);
     });
     transformStream.on('end', () => {
       writableStream.end();
-      resolve();
+      setTimeout(() => resolve(), 0);
+      // resolve();
     });
     Object.keys(data).forEach((key) => {
       transformStream.write([key, data[key]]);
diff --git a/test/fileUtils.js b/test/fileUtils.js
index c7ed681..cc59446 100644
--- a/test/fileUtils.js
+++ b/test/fileUtils.js
@@ -6,17 +6,18 @@ const { streamObjectToJson } = require('../lib/fileUtils');
 
 const OUTPUT_DIR = `${__dirname}/output`;
 
-describe('fileUtils', function () {
+describe.only('fileUtils', function () {
   before(function () {
     del.sync(`${__dirname}/output`);
   });
 
-  afterEach(function () {
-    del.sync(`${__dirname}/output`);
-  });
+  // afterEach(function () {
+  //   del.sync(`${__dirname}/output`);
+  // });
 
   it('should print objects to files', async function () {
     const statsFilename = path.resolve(OUTPUT_DIR, 'stats.json');
+
     await streamObjectToJson(statsFilename, {
       string: 'webpack-bundle-analyzer',
       list: ['foo', 'bar', 'baz'],
@@ -26,20 +27,18 @@ describe('fileUtils', function () {
         baz: null
       }
     });
+    expect(fs.existsSync(statsFilename)).to.be.true;
 
-    await setImmediate(() => {
-      expect(fs.existsSync(statsFilename)).to.be.true;
-
-      const fileContent = fs.readFileSync(statsFilename, 'utf8');
-      expect(JSON.parse(fileContent)).to.deep.equal({
-        string: 'webpack-bundle-analyzer',
-        list: ['foo', 'bar', 'baz'],
-        obj: {
-          name: 'webpack-bundle-analyzer',
-          foo: 'bar',
-          baz: null
-        }
-      });
+    const fileContent = fs.readFileSync(statsFilename, 'utf8');
+    console.log('content:', fileContent);
+    expect(JSON.parse(fileContent)).to.deep.equal({
+      string: 'webpack-bundle-analyzer',
+      list: ['foo', 'bar', 'baz'],
+      obj: {
+        name: 'webpack-bundle-analyzer',
+        foo: 'bar',
+        baz: null
+      }
     });
   });
 });

Copy link
Member

@valscion valscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment about the new module and tests

Copy link
Collaborator

@th0r th0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm actually not sure that this PR in it's current implementation will solve "huge-stats-file" serialization error because according to my experience 99% of size in such stats files takes modules field (and sometimes chunks) which you stringify in one chunk.

Actually, it was a surprise to me that streamed stringifying of JSON is such a complex task. Why JSONStream has to be fed with object keys/values manually? Isn't there any other library that can just take an object as input and convert it to writable stream automatically?

@ryan953
Copy link
Contributor Author

ryan953 commented Nov 3, 2017

@valscion thanks for digging in to much!

I was seeing that SyntaxError: Unexpected end of JSON input too when I initially wrote the tests, but I was mostly testing that my codebase would build, so when the message disappeared with the setImmediate I trusted it too much. I wonder now if a clock.tick(1); would would help, in addition to better waiting for the promise. I'm no expert in streams, maybe I can force the flush quicker before resolving too.

@th0r I didn't find too many streaming libraries on npm, nothing jumped out at me claiming to work for complex/nested data structures like we have here. My codebase jump came up against the memory limit this week, so (for now) splitting chunks/modules gives us a little headroom. The redux devtools have a button to show/serialize all the store data, I can open that up and see if they've got something extra, otherwise we can make one.

@ryan953 ryan953 force-pushed the stream-json-stringify branch from 132868f to 366bc70 Compare November 5, 2017 03:26
@ryan953
Copy link
Contributor Author

ryan953 commented Nov 5, 2017

@valscion & @th0r I kept looking and found a thing! It's called bfj: https://github.com/philbooth/bfj. Looks like it can just take an object and output json.

I haven't wired this up to any big projects yet, but it's a new&simpler diff now, so I wanted to get it up early.

@ryan953 ryan953 changed the title Use JSONStream and createWriteStream() to output large json stats objects Output large json stats objects Nov 5, 2017
@ryan953
Copy link
Contributor Author

ryan953 commented Nov 5, 2017

Oh noes! bfj wants node >= 6.0.0
webpack itself supports node >=4.3.0 right now, but after webpack/webpack#5920 it will be on 6+

@joshwiens
Copy link
Member

@ryan953 - For reference, now that Node 8 is officially LTS the next major version of every loader & plugin in webpack-contrib will drop support for Node 4.x.

The pull request you are referencing is merging into next a.k.a. Webpack v4.0.0. The current 3.x major version range will continue to support Node 4.

@philbooth
Copy link

bfj wants node >= 6.0.0

fwiw, there's also a node-4 branch that is published to npm if you need it:

https://www.npmjs.com/package/bfj-node4

@valscion
Copy link
Member

valscion commented Nov 5, 2017

BFJ looks very promising! Great job with the README over there, @philbooth 👏.

Let's try the node-4 branch package as specified above -- we can later use node-6 version when we bump major version

@ryan953 ryan953 force-pushed the stream-json-stringify branch from 978d0f6 to 846999a Compare November 5, 2017 20:33
Copy link
Member

@valscion valscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me!

@valscion
Copy link
Member

valscion commented Nov 7, 2017

@th0r, would you want to check changes as well? If it's OK to you, I'd like to test drive merging this PR, adding a changelog and releasing new patch version with this fix 😊

.catch((error) => {
this.logger.error(
`${bold('Webpack Bundle Analyzer')} error saving stats file to ${bold(statsFilepath)}.`,
JSON.stringify(error, null, "\n") // eslint-disable-line quotes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'\n' and "\n" are the same thing in JavaScript, so instead of disabling the ESLint rule on this line, we could use '\n' and ESLint would be happy ☺️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, and lets use \t too

Copy link
Collaborator

@th0r th0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add a test for this.
We can either borrow some huge stats json from guys that faced this problem (e.g. in #119) or just generate something.

iterables: 'ignore',
circular: 'ignore'
};
return bfj.write(statsFilepath, stats, options)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's better use async/await here

this.logger.info(
`${bold('Webpack Bundle Analyzer')} saved stats file to ${bold(statsFilepath)}`
);
const options = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline this object into bfj.write call please

.catch((error) => {
this.logger.error(
`${bold('Webpack Bundle Analyzer')} error saving stats file to ${bold(statsFilepath)}.`,
JSON.stringify(error, null, '\t')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is the best way to show error. Have you tried that? It prints {} for me.

if we want to show stack trace, then let's use error.stack instead.

If we only want to show error message, then we can simply inline error object into the template string like this: ${bold('Webpack Bundle Analyzer')} error saving stats file to ${bold(statsFilepath)}: ${error}.

@chasemgray
Copy link

This solved the problem we were having with builds running out of memory. Good Job!

@ryan953
Copy link
Contributor Author

ryan953 commented Nov 14, 2017

@th0r I'm not sure how to write a test for this right now :(

I've tried to await on both webpackCompile() and plugin.generateStatsFile() but neither seem to be working. The file does showup in the output folder eventually, but I'm not sure what to await on in order to get the expect to come in afterwards. I'm probably doing something silly, here's what I've got:

it('with generateStatsFile should output a json file', async function (done) {
    this.timeout(5000);
    const plugin = new BundleAnalyzerPlugin({
      generateStatsFile: true,
      analyzerMode: 'disabled',
      openAnalyzer: false,
      logLevel: 'error'
    });
    plugin.compiler = {
      outputPath: `${__dirname}/output`
    };

    await plugin.generateStatsFile(
      readStatsFile('with-children-array.json')
    );

    expect(fs.existsSync(`${__dirname}/output/stats.json`)).to.be.true;
    expect(fs.readFileSync(`${__dirname}/output/stats.json`, 'utf8')).to.have.lengthOf(5378);

    done();
  });

  function readStatsFile(statsFilename) {
    return JSON.parse(fs.readFileSync(`${__dirname}/stats/${statsFilename}`));
  }

@th0r
Copy link
Collaborator

th0r commented Jan 18, 2018

Ok, let's ship it without tests because I just don't know how to emulate "out of memory" error.

@th0r th0r merged commit 5cb03ba into webpack-contrib:master Jan 18, 2018
@ryan953 ryan953 deleted the stream-json-stringify branch January 19, 2018 03:17
@valscion
Copy link
Member

valscion commented Feb 4, 2018

Thank you @ryan953 for your contributions ☺️. This has been released in v2.10.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plugin crashes when trying to stringify a large stat blob
7 participants