readline: add support for async iteration #23916
550bb04 to b631b4c
Should we benchmark this comparatively with the |
Good work!
};
this.on('line', lineListener);
this.on('close', closeListener);
this[kLineObjectStream] = readable;
Would it make sense to expose this as a separate method? Converting to a stream might be an issue for multiple people.
I consider this an implementation detail of the @@asyncIterator method. A major reason why the performance of this method isn't on par with the 'line' event, as you have noted in #23916 (comment), is the double buffering necessitated by the intermediate stream, so I'd rather not expose the stream at the moment.
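To make the double buffering concrete, here is a minimal sketch of the general technique: 'line' events are pushed into an intermediate object-mode Readable, and that stream is what gets async-iterated. The helper name `linesToReadable` and its internals are assumptions for illustration, not the code in this PR. Lines are buffered once inside readline's input stream and a second time in the intermediate stream.

```javascript
'use strict';
const { Readable } = require('stream');
const readline = require('readline');

// Hypothetical helper (not the PR's implementation): adapt 'line' events
// into an object-mode Readable that can be consumed with for await.
function linesToReadable(rl) {
  let closed = false;
  const readable = new Readable({
    objectMode: true,
    // The consumer wants more data, so let the interface flow again.
    read() {
      if (!closed) rl.resume();
    },
    // Detach from the interface when the stream ends or is abandoned early.
    destroy(err, callback) {
      rl.off('line', onLine);
      rl.off('close', onClose);
      if (!closed) {
        closed = true;
        rl.close();
      }
      callback(err);
    },
  });
  function onLine(line) {
    // push() returns false once the intermediate buffer is full; the line
    // is still buffered, but the interface is paused until read() is called.
    if (!readable.push(line)) rl.pause();
  }
  function onClose() {
    closed = true;
    readable.push(null); // signal end of stream
  }
  rl.on('line', onLine);
  rl.on('close', onClose);
  return readable;
}
```

With this helper, `for await (const line of linesToReadable(rl))` yields each line in order. The helper has to be called in the same tick as `createInterface()`, otherwise lines emitted in between are missed.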
I've compiled the branch and tested with a 1 GB file (22 514 395 lines):
Scripts and results:
'use strict';
const fs = require('fs');
const readline = require('readline');
let counter = 0;
let dummy;
console.time('event');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
rl.on('line', (line) => {
counter++;
dummy = line;
}).on('close', () => {
console.timeEnd('event');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
});
'use strict';
const fs = require('fs');
const readline = require('readline');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIterator');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
for await (const line of rl) {
counter++;
dummy = line;
}
console.timeEnd('asyncIterator');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
So the event implementation is currently 3 times as fast as the async iterator implementation. Maybe we should warn about this.
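If both async/await flow and speed matter, one possible mixed approach is to keep the 'line' listener and only await the 'close' event. The sketch below assumes a hypothetical helper named `countLines`; it uses `events.once()`, which returns a promise that resolves when the given event fires.

```javascript
'use strict';
const { once } = require('events');
const readline = require('readline');

// Hypothetical helper: count lines using the 'line' event, but expose the
// result through a promise so callers can still use await.
async function countLines(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let counter = 0;
  rl.on('line', () => {
    counter++;
  });
  // The interface closes itself once the input stream ends.
  await once(rl, 'close');
  return counter;
}
```

For the benchmark above this would be called as `await countLines(fs.createReadStream('big-file.txt', 'utf8'))`. No timing is claimed for this variant here; it would need to be measured against the two scripts.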
Can you check reading the same file with |
@mcollina Do you mean to compare async iterating over unsplit chunks vs async iterating over split lines? If so, I get a ~4x factor:
Scripts and results:
const fs = require('fs');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorChunks');
for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
counter++;
dummy = chunk;
}
console.timeEnd('asyncIteratorChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
'use strict';
const fs = require('fs');
const readline = require('readline');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorLines');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
for await (const line of rl) {
counter++;
dummy = line;
}
console.timeEnd('asyncIteratorLines');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
@mcollina Or do you mean to compare async iterating over unsplit chunks vs the event implementation for unsplit chunks? If so, I get a 1:1 factor, i.e. the same speed:
Scripts and results:
'use strict';
const fs = require('fs');
let counter = 0;
let dummy;
console.time('eventChunks');
const readable = fs.createReadStream('big-file.txt', 'utf8');
readable.on('data', (chunk) => {
counter++;
dummy = chunk;
}).on('close', () => {
console.timeEnd('eventChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
});
'use strict';
const fs = require('fs');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorChunks');
for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
counter++;
dummy = chunk;
}
console.timeEnd('asyncIteratorChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
#23901 has landed, so it seems those commits can be excluded to simplify review.
And beware of #23929: we may have conflicts.
@vsemozhetbyt thanks for those benchmarks, those are quite interesting. Specifically the fact that using stream iteration is now essentially on par with Related to |
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com> Fixes: nodejs#18603 Refs: nodejs#18904
Documentation changes
b631b4c to f8ff7c7
@vsemozhetbyt @mcollina I've updated this PR to address the documentation comments. Please take a look.
LGTM
Docs LGTM. Thank you!
Maybe cc @nodejs/streams ?
@devsnek maybe? Anyway, this could land because it's older than a week. I'd recommend waiting 2 days and then landing if no one objects.
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com> Fixes: nodejs#18603 Refs: nodejs#18904 PR-URL: nodejs#23916 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Gus Caplan <me@gus.host>
Notable Changes:
* console,util:
  * `console` functions now handle symbols as defined in the spec. nodejs#23708
  * The inspection `depth` default is now back at 2. nodejs#24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`. nodejs#23798
* http:
  * Choosing between the http parser is now possible per runtime flag. nodejs#24739
* readline:
  * The `readline` module now supports async iterators. nodejs#23916
* repl:
  * The multiline history feature is removed. nodejs#24804
* tls:
  * Added min/max protocol version options. nodejs#24405
  * The X.509 public key info now includes the RSA bit size and the elliptic curve. nodejs#24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB. nodejs#23720
* Windows:
  * Tools are not installed using Boxstarter anymore. nodejs#24677
  * The install-tools scripts are now included in the dist. nodejs#24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth. nodejs#24655
PR-URL: nodejs#24854
PR-URL: nodejs#26472 Refs: nodejs#23916 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
const TOTAL_LINES = 18;

(async () => {
  const readable = new Readable({ read() {} });
Looking at this now (to solve another bug), I think this backpressure behaviour is confusing, since other consumers can listen to 'line' on the stream and it's surprising that we pause() it for them.
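The effect described here can be sketched without the iterator at all. In the assumed demo below, `rl.pause()` is called by hand as a stand-in for what backpressure handling would do internally, and an unrelated 'line' listener on the same interface stops receiving lines until `rl.resume()` is called. The function name `demo` and the `tick` helper are hypothetical.

```javascript
'use strict';
const { PassThrough } = require('stream');
const readline = require('readline');

// Wait until pending stream callbacks for the current turn have run.
const tick = () => new Promise((resolve) => setImmediate(resolve));

async function demo() {
  const input = new PassThrough();
  const rl = readline.createInterface({ input, crlfDelay: Infinity });

  // An unrelated consumer that only listens for 'line' events.
  const seenByOther = [];
  rl.on('line', (line) => seenByOther.push(line));

  input.write('first\n');
  await tick();

  // Stand-in for the pause() issued on behalf of a slow iterator consumer.
  rl.pause();
  input.write('second\n');
  await tick();
  const whilePaused = [...seenByOther]; // 'second' has not arrived yet

  rl.resume();
  await tick();
  rl.close();
  return { whilePaused, afterResume: seenByOther };
}
```

Pausing the interface pauses the shared input stream, so every listener is affected, not just the consumer that triggered the pause.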
Rewritten version of #18904, using more existing streams mechanisms.
Depends on #23901 for some of the edge case tests (relevant commits included within this PR).
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
/cc @mcollina @devsnek @prog1dev
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes