Implement `jest-haste-map` instead of `node-haste` #896

cpojer · 2016-04-15T07:36:25Z

This is a new haste map implementation for Jest which is much more scalable than node-haste.

node-haste2 isn't well designed and not scalable for short-lived services like Jest. The startup time of node-haste1 vs. node-haste2 on www is almost the same, both between 6 and 8 seconds which is not acceptable for our engineers. This implementation is attempting to accomplish a much reduced and more scalable startup time. It also has reduced scope – the goal of this is to only build a haste map and provide a way to resolve a one-level deep dependency tree, which is all that Jest really needs from node-haste. jest-haste-map can serve as an ideal basis for rewriting node-haste2 (into node-haste3!).

With this, the cold start time (building the entire haste map) is now about 14 seconds on www (that's acceptable) but the incremental invocation is only 2 seconds (@kyldvs will love me). I haven't heavily micro-optimized the JavaScript in packages/jest-haste-map/src/index.js and there is a bunch of data copying with HasteResolver.js – I believe I can get close to 1 second with these optimizations once I'm done. One of the goals I have is to allow tacking on data (list of mocks) to the haste map and serialize the haste map and read that one in directly in the workers instead of keeping two caches.

On every startup node-haste2 asks watchman (or node) for a list of all files. It builds up an in memory-file system and then processes every single file through an insane amount of promise chaining. It takes about 3 seconds on react-native to do this on a warm start. With this new implementation the cold start is about 1.5 seconds and warm start is way less than a second.

jest-haste-map only does the minimum processing necessary: it retrieves a list of all files only on the first start when using watchman and subsequent invocations only receive the deltas of changed files since the last time Jest has run. It further processes only files that have been changed or added. The node crawler isn't as optimized but it should be faster than node-haste2 as well.

What to review:

packages/jest-haste-map/src/index.js
packages/jest-haste-map/src/worker.js
packages/jest-haste-map/src/crawlers/
The other modules in jest-haste-map were lifted from node-haste2 and just received small code style updates (removed babel compilation etc.)
src/resolvers/HasteResolver.js for usage.

cc @davidaurelio @kentaromiura @jeffmo @vjeux
cc @amasad – if you could take a quick look, I would greatly appreciate it :)

cpojer · 2016-04-15T07:47:04Z

Actually the time it takes to run a fast test (~10ms) on www with this is 2s overall. That includes the 500ms overhead of loading jsdom too, so we are already pretty close to 1s for building the haste map.

cpojer · 2016-04-15T08:09:55Z

packages/jest-haste-map/src/crawlers/node.js

+    path.sep + '**' + path.sep + '*.{' + extensions.join(',') + '}';
+
+  return Promise.all(roots.map(
+    root => globPromise(root + pattern)


this nice and simple implementation of the node crawler actually turns out to have 2x worse performance than the one in node-haste. I might revert to that one instead. Didn't think glob would be sooo slow.

cpojer · 2016-04-15T09:47:35Z

Done in e3189c4: "Clean up HasteResolver and file caching for all of this". Now we only create one cache file. It is amended by "HasteResolver" with mocking information and is directly read from a TestWorker.

One nice side-effect of all this work is that Jest's own test suite is almost 2x as fast now. It only takes 2.5s instead of 4-5s.

davidaurelio · 2016-04-15T20:56:15Z

packages/jest-haste-map/src/__tests__/fastpath-test.js

+/*
+ * original author: dead_horse <dead_horse@qq.com>
+ * ported by: yaycmyk <evan@yaycmyk.com>
+ */


It would be nice if we could take over the “fastpath” name on npm :-)

"fasterpath" :)

davidaurelio · 2016-04-15T22:26:59Z

congrats on the speed improvements, that’s awesome!

I don’t have too much to say about the code. The indentation level of promises can also get quite deep :-)
What was our case for multiple roots? Do we really need that?

davidaurelio · 2016-04-15T22:31:27Z

Should we write a dex with the haste resolution rules?

davidaurelio · 2016-04-15T22:37:43Z

we should try to get the “fastpath” module and npm and publish the two file crawlers seperately + an watchman auto-detection function.

davidaurelio · 2016-04-15T22:57:25Z

A quick synthetic benchmark seems to suggest that native Map and Set are really fast in node.js, even faster than property lookups on objects with many entries. So there isn’t really a case for Object.create(null). Deserialization from JSON also doesn’t need the setPrototypeOf trick. Serialization, on the other hand, is uglier (.stringify(Array.from(map.entries())))

cpojer · 2016-04-16T10:02:36Z

Thanks for the review.

We use testPathDirs as separate roots extensively on www. This reduces the scope of files to crawl
Yes, we should probably write a dex :D
It is probably fine to publish this as jest-haste-map as a first version and then consider splitting out certain packages into separate modules, depending on what other people might need.
Yes, Map and Set are super fast but they don't serialize well to JSON. Is writing to a Map or Set as fast as it is to an Object? We are doing so many writes during startup, which is quite critical when you have 100k+ items. https://github.com/cognitect/transit-js could help to speed up serialization – it is basically a custom serializer that is really fast and can encode/decode to immutable data structures. I think right now that is more than I need but if it yields real performance gains, I'd consider going for it :)

cpojer · 2016-04-19T05:42:58Z

I made a bunch of micro-optimizations in ed2ee78

Since both my metadata objects for all files/modules contain the same fixed amount of items, using constant keys and arrays was a great optimization and the code complexity stayed pretty much the same (module[H.ID] instead of module.id, for example).

The file size reduction of using constant keys and arrays instead of long string names and objects was 10 mb (!) down from 26 to 16 on a large code base. I was then able to exclude 5 % of the files from the original list and am now down to 14 mb. The parse time went down from 500ms to 250ms. The write time, interestingly, is still at 300ms. I looked into compacting the paths but that would only yield a win of another 2mb or so but it would add additional post-processing. I believe with this model the file size will grow to 30 mb over the next 3-4 years and decode in less than a second which will be more than good enough. Then, someone can consider different serialization models but right now we are in the diminishing-returns territory. The data model is simple enough and the code is easy enough to change.

Things I tried:

JSONStream: this is memory efficient but actually very slow
Decoding every metadata object individually so I could store them using my own data format. loop over JSON.parse(item) had exactly the same performance characteristics as JSON.parse(allItems);.

davidaurelio · 2016-04-21T10:48:41Z

packages/jest-haste-map/src/crawlers/node.js

+function findNative(roots, extensions, ignore, callback) {
+  const args = [].concat(roots);
+  args.push('-type', 'f');
+  extensions.forEach((ext, index) => {


you need parentheses around the extensions

find x -type f -name '*.json' -o -name '*.js' is not equivalent to
find x -type f \( -name '*.json' -o -name '*.js' \)

wait, what's the difference? It seems to work fine on www for example. This is what it was like in node-haste1.

the first one would also match directories with ending with “.js”

davidaurelio · 2016-04-21T13:25:39Z

packages/jest-haste-map/src/index.js

+  static getCacheFilePath(tmpdir) {
+    const hash = crypto.createHash('md5');
+    Array.from(arguments).slice(1).forEach(arg => hash.update(arg));
+    return path.join(tmpdir, hash.digest('hex'));


it would be nice to add a human-readable prefix here. That makes it easier to delete these files manually if necessary for debugging.

good idea! Let's use the name from the options and make it tmpdir/<name>-hash.

davidaurelio · 2016-04-21T13:31:41Z

packages/jest-haste-map/src/index.js

+    return path.join(tmpdir, hash.digest('hex'));
+  }
+
+  build() {


Personally, I’d prefer this to be a static method that returns a promise that resolves to an instance. That way, there’s no need to call this.build().then(…) / hasteMap.build().then(…) anywhere.

But that’s a cosmetic change, so feel free to ignore this comment :-)

davidaurelio · 2016-04-21T14:59:12Z

This looks solid in general.

To make this the base of node-haste (and have RN packager continue to use node-haste), we need better decoupling of caching, crawling, building the map, and extracting requires.

github-actions · 2021-05-14T16:09:00Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

facebook-github-bot added the CLA Signed ✔️ label Apr 15, 2016

cpojer force-pushed the jest-haste-map branch 3 times, most recently from ab2582a to db220d7 Compare April 15, 2016 08:05

cpojer reviewed Apr 15, 2016
View reviewed changes

cpojer force-pushed the jest-haste-map branch 2 times, most recently from 8e447d9 to 10f8caf Compare April 15, 2016 09:03

davidaurelio reviewed Apr 15, 2016
View reviewed changes

cpojer force-pushed the jest-haste-map branch 7 times, most recently from 01bd200 to b743e43 Compare April 18, 2016 08:13

cpojer force-pushed the jest-haste-map branch 2 times, most recently from 0607cf7 to ead50f7 Compare April 19, 2016 06:49

davidaurelio reviewed Apr 21, 2016
View reviewed changes

Tests, documentation and fixes.

9b39c4e

cpojer force-pushed the jest-haste-map branch from 3298a26 to 9b39c4e Compare April 21, 2016 12:01

Add more test run combinations to npm test.

f131b9f

davidaurelio reviewed Apr 21, 2016
View reviewed changes

Make cache file path human readable.

c2f246c

davidaurelio reviewed Apr 21, 2016
View reviewed changes

Properly clear file metadata on watchman reset.

90e56aa

ghost closed this in 052af96 Apr 22, 2016

cpojer mentioned this pull request Apr 25, 2016

Incorrect assertion for Object.create(null) #933

Closed

cpojer deleted the jest-haste-map branch June 15, 2016 20:15

jzaefferer mentioned this pull request Mar 30, 2017

Make jasmine.DEFAULT_TIMEOUT_INTERVAL configurable #652

Closed

github-actions bot locked as resolved and limited conversation to collaborators May 14, 2021

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `jest-haste-map` instead of `node-haste` #896

Implement `jest-haste-map` instead of `node-haste` #896

cpojer commented Apr 15, 2016 •

edited

Loading

cpojer commented Apr 15, 2016 •

edited

Loading

cpojer Apr 15, 2016

cpojer commented Apr 15, 2016

davidaurelio Apr 15, 2016 •

edited

Loading

janicduplessis Apr 23, 2016

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

cpojer commented Apr 16, 2016

cpojer commented Apr 19, 2016

davidaurelio Apr 21, 2016

cpojer Apr 21, 2016

davidaurelio Apr 21, 2016

davidaurelio Apr 21, 2016

cpojer Apr 21, 2016

davidaurelio Apr 21, 2016

davidaurelio commented Apr 21, 2016 •

edited

Loading

github-actions bot commented May 14, 2021

Implement jest-haste-map instead of node-haste #896

Implement jest-haste-map instead of node-haste #896

Conversation

cpojer commented Apr 15, 2016 • edited Loading

cpojer commented Apr 15, 2016 • edited Loading

Choose a reason for hiding this comment

cpojer commented Apr 15, 2016

davidaurelio Apr 15, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

davidaurelio commented Apr 15, 2016

cpojer commented Apr 16, 2016

cpojer commented Apr 19, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidaurelio commented Apr 21, 2016 • edited Loading

github-actions bot commented May 14, 2021

Implement `jest-haste-map` instead of `node-haste` #896

Implement `jest-haste-map` instead of `node-haste` #896

cpojer commented Apr 15, 2016 •

edited

Loading

cpojer commented Apr 15, 2016 •

edited

Loading

davidaurelio Apr 15, 2016 •

edited

Loading

davidaurelio commented Apr 21, 2016 •

edited

Loading