module: support multi-dot file extensions #23416
CI: https://ci.nodejs.org/job/node-test-pull-request/17738/
1fb0f8b
to
9f4d00d
Compare
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (Module._extensions[currentExtension]) return currentExtension;
Couldn't we just store and return the function instead to avoid a second, unnecessary lookup?
@mscdex Would you clarify your question (maybe with pseudo code)?
The value of Module._extensions[currentExtension] is thrown away here. However, after this function returns, the Module._extensions[extension] lookup is done a second time. It would be nice if we could avoid this duplicate lookup.
So instead of:
    currentExtension = name.slice(index);
    if (Module._extensions[currentExtension]) return currentExtension;
we could do:
    const loader = Module._extensions[name.slice(index)];
    if (loader)
      return loader;
and change the calling code appropriately:
    findExtension(filename)(this, filename);
The function name and the default return value would need to be changed, of course.
I wrote some code to do a performance test:
const path = require('path');

// Mock Module for testing
const Module = {
  _extensions: {
    '.js': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.json': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.node': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    }
  }
};

// Generate 100000 filenames, cycling through the registered extensions
// (the keys already include the leading dot)
const extensions = Object.keys(Module._extensions);
const filenames = [];
for (let i = 0; i < 100000; ++i) {
  filenames.push(`${i}${extensions[i % 3]}`);
}

// Current version
function findExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (Module._extensions[currentExtension]) return currentExtension;
  }
  return '.js';
}

// Altered version that returns the loader itself rather than the extension,
// so that there’s only one lookup
function findLoaderByExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  let loader;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (loader = Module._extensions[currentExtension]) return loader;
  }
  return Module._extensions['.js'];
}

const run = {
  test1: function() {
    console.time('test1');
    filenames.forEach((filename) => {
      var extension = findExtension(filename);
      Module._extensions[extension](this, filename);
    });
    console.timeEnd('test1');
  },
  test2: function() {
    console.time('test2');
    filenames.forEach((filename) => {
      var loader = findLoaderByExtension(filename);
      loader(this, filename);
    });
    console.timeEnd('test2');
  }
};

run[process.argv[2]]();
Here are my results:
✦ node -v
v10.12.0
✦ node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1
test1: 85.108ms
test1: 85.046ms
test1: 87.216ms
test1: 85.767ms
test1: 85.926ms
test1: 85.650ms
test1: 84.461ms
test1: 85.011ms
test1: 83.525ms
test1: 87.337ms
✦ node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2
test2: 90.877ms
test2: 86.902ms
test2: 86.268ms
test2: 85.143ms
test2: 90.064ms
test2: 86.636ms
test2: 84.882ms
test2: 86.778ms
test2: 89.278ms
test2: 85.963ms
- Test 1 average time: 85.505ms
- Test 2 average time: 87.279ms
@GeoffreyBooth My suggestion was not so much about a huge performance improvement as about removing duplicated effort.
Also, if you really want to do benchmarks, they should be done via the official benchmark mechanism for the best comparison.
The first version has two lookups, but the second version has an extra assignment (loader = Module._extensions[currentExtension]). So the “return the loader” version saves a second lookup at the cost of an extra assignment, which I think explains why the benchmarks are more or less equivalent between the two versions. Personally I think a helper that returns an extension is easier to grasp (and potentially more useful in the future) than a helper that returns a loader, but that’s a subjective call.
    let startIndex = 0;
    while ((index = name.indexOf('.', startIndex)) !== -1) {
      startIndex = index + 1;
      if (index === 0) continue; // Skip dotfiles like .gitignore
If we are to skip the dotfiles, shouldn't this just break out of here?
If we are to skip the dotfiles, shouldn't this just break out of here?
No, because it's not skipping the whole dotfile, just the first dot of the dotfile.
path.extname(".dotfile")
// => ""
path.extname(".dotfile.ext")
// => ".ext"
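To make the continue-versus-break distinction concrete, here is a minimal sketch of the loop under discussion. The `registered` map and the `useBreak` flag are hypothetical stand-ins added for illustration only; they are not part of the PR.

```javascript
const path = require('path');

// Hypothetical stand-in for Module._extensions (illustration only).
const registered = { '.js': true, '.ext': true };

// useBreak is a hypothetical switch to compare `continue` with `break`.
function findExtension(filename, useBreak = false) {
  const name = path.basename(filename);
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) {
      if (useBreak) break; // gives up on the whole dotfile
      continue;            // skips only the leading dot
    }
    const ext = name.slice(index);
    if (registered[ext]) return ext;
  }
  return '.js';
}

console.log(findExtension('.dotfile'));           // '.js' (default)
console.log(findExtension('.dotfile.ext'));       // '.ext'
console.log(findExtension('.dotfile.ext', true)); // '.js' (the extension is missed)
```

With `break`, a file such as .dotfile.ext would never reach its real extension, which is why the loop uses `continue`.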
A test was added for this case.
    // find the longest (possibly multi-dot) extension registered in
    // Module._extensions
    function findExtension(filename) {
      const name = path.basename(filename);
If you're only interested in the extension, is calling path.basename() necessary? You could just scan backwards for dots, checking that there are no path separators in between:
    let dot = -1;
    let sep = -1;
    for (let i = filename.length; --i > 0; /* empty */) {
      const c = filename[i];
      if (c === '.')
        dot = i;
      else if (c === '/') { // FIXME handle backslash on windows
        sep = i;
        break;
      }
    }
    if (dot > sep + 1) {
      const ext = filename.slice(dot);
      if (Module._extensions[ext]) return ext;
    }
    return '.js'; // dot file or no extension
Only one slice == less garbage to collect.
The current algorithm scans forwards to short-circuit at the longest matched extension. So say you have a file like mr.robot.coffee.md, and a loader registered for .coffee.md. The current algorithm will iterate like:
- Is .robot.coffee.md registered? No, continue.
- Is .coffee.md registered? Yes, break.
This way, even if .md is also registered, the .coffee.md loader takes precedence. This allows multi-dot extensions like .es6.js, for example.
If I’m reading your code correctly, it’s checking only the longest possible multi-dot extension, which for this example would be .robot.coffee.md, a false match. We want multi-dot extensions to work without prohibiting periods elsewhere in the filename.
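The forward scan described above can be sketched roughly as follows. The `registry` object is a hypothetical stand-in for Module._extensions, and its values are placeholder strings rather than real loaders.

```javascript
const path = require('path');

// Hypothetical registry standing in for Module._extensions.
const registry = {
  '.js': 'js loader',
  '.md': 'markdown loader',
  '.coffee.md': 'literate coffee loader'
};

// Try each candidate from longest to shortest; the first registered one wins.
function findLongestRegisteredExtension(filename) {
  const name = path.basename(filename);
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // skip the leading dot of a dotfile
    const candidate = name.slice(index);
    if (registry[candidate]) return candidate;
  }
  return '.js'; // default when nothing registered matches
}

console.log(findLongestRegisteredExtension('mr.robot.coffee.md')); // '.coffee.md'
console.log(findLongestRegisteredExtension('notes.md'));           // '.md'
console.log(findLongestRegisteredExtension('README'));             // '.js'
```

Because candidates are tried from the leftmost dot onwards, .coffee.md is found before .md, while the unregistered .robot.coffee.md is simply passed over.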
Perhaps naming the function findRegisteredExtension would be more self-descriptive?
@richardlau Yes, or perhaps findLongestRegisteredExtension would be even more descriptive (because we’re returning .coffee.md instead of .md, if both of those happened to be registered).
efbf50a
to
6908da4
Compare
@addaleax I rewrote the commit messages to get the build to pass. Does this PR need anything else?
6908da4
to
64aab09
Compare
I believe this is |
Landed in 22f7d0a
Support multi-dot file extensions like '.coffee.md' in Module.load. PR-URL: nodejs#23416 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
Following up from #23228 and nodejs/citgm#604.
This PR adds support for registering “multi-dot” file extensions (like .coffee.md) to the module loader. Previously, userland modules like CoffeeScript were patching Module.prototype.load so that code like require.extensions['.foo.bar'] = ... would work. After this PR, such patching is no longer necessary. This also resolves the original issue that the reverted #23228 was addressing, fixing an inconsistency between load and _findPath.
Written with @jdalton.
Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes