-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streaming / iterative fs.readdir #583
Comments
Using generators/iterators in io.js where they make sense would be a good addition. |
Does this depend on the work in joyent/libuv#1574 to be merged first? I don't think it's in libuv yet. |
👍 in the future asynchronous version could return an observable: for (dir on fs.readdir(__dirname)) {
} |
👍 |
I'm all for the new named versions. We don't want to break compatibility with existing code by changing behaviour of existing functions, but new stuff could work. I'm not sure what the policy is yet on new interfaces that deviate from node.js though. @iojs/tc Thoughts? |
Where else does an interface like this make sense? Would this be a one-off of would there be a flood of requests to add the same style interface to other core APIs? |
@rvagg I looked for any other APIs that would look like this but I couldn't find any. I think this may be a one-off. |
I think the way forward on this is for someone here to propose a change in a PR and then we'll escalate that to a TC discussion because it would be good to cover the question of whether adopting a new style of API is desirable. If it's too much work to come up with an initial implementation then we could just elevate this issue, it'd just be more of a hypothetical discussion then. |
I think the streams version makes sense. I would test the iterator design in userland though. It can apply to any sort of stream, really. And it's easy enough to abstract a stream as an iterator in userland. |
The eventual observables thing sounds pretty rad. |
Just use co? (or something similar) |
OK, a few things:
So I don't think this should be a new style of API. Just an object mode stream is fine. |
Whoa, yeah this is a totally bogus suggestion as iterators and generators can't be used for iterating over async. This tripped me up at first as well. But note that though it's not exactly what you had in mind |
So re-reading the OP: readdirSync already returns an array which is iterable |
It does, but readdirSync() is (too) eager: it slurps the directory in one pass. If you apply it to a directory with 1M+ entries, it will likely bring down the process. An iterator that walks the entries piecemeal would be great but it requires a number of changes to libuv. @misterdjules already did some of the prep work but I recall that there were still some loose ends. I'm not even sure if we reached consensus that his approach was really the best way forward. |
they can! for (let promise of iterator) {
let val = await promise;
} |
@bnoordhuis I see, so you'd block on each call to @vkurchatkin yes, you can invent contracts on top of iterators. Especially ones that only work in ES2016. But they won't necessarily be respected. Someone could "forget" an |
@vkurchatkin essentially you are proposing async iterable as For example in your example the producer can't decide when to return or throw from the generator side, because the producer has to wait for the async process to complete before doing so, but the only way to signal completion or errors is synchronous return/throw. |
Yes, but I'm mostly concerned with memory usage; an iterator brings it down from O(n) to O(1). Currently, trying to scan a large directory exhausts the heap and terminates the process. |
@domenic agreed, ugly and far from ideal.
I can't see how this can be achieved using existing ES2015/2016 primitives. Thanks for the link, probably it has explanation. |
Yeah the iterative version in my head was a synchronous version that doesn't load the entire array in memory at once. But i care way more about the async version. |
@domenic While generators on their own are sync, they can be used in an async way via co. Maybe you don't like the weird promise trampoline hack it uses, but it works and many people are already using it in that way. I suggested using co because I had already suggested doing the iterator thing in userland, which means those sort of tools are available. If you prefer some different async iterator module, you can write a streams abstraction on that. I wouldn't mind this being in core, I just suggested a userland approach because I expected some members of the core team might disagree. |
btw let's not do the iterator design. i realized that if something like this happens: var i = 0;
for (name of fs.readdirIter(__dirname)) {
if (i++ > 5) throw new Error('boom');
} assuming so... let's just do the stream version for now :D |
@jonathanong as long as the iterator implements |
@domenic why isn't |
@vkurchatkin same reason |
not working: var arr = [1,2,3,4,5];
var it = arr[Symbol.iterator]();
it.return = function() {
cobsole.log('return');
}
for (var v of it) {
if (v > 3) break;
console.log(v)
} |
@rijnhard That's gona add a whole bunch of overhead... But to be clear, you want to: have an object stream of directory entries, transform those into file reads, in the same stream? That doesn't really make sense to me. You end up muliplexing/splitting the stream anyways and that leads to a very similar can of worms as just "iterating" and "awaiting" until a stream is finished to pull the next entry. Maybe you could elaborate more? I am presently avoiding making this a stream due to complexity and lack of solid use cases. |
@Fishrock123 I think I misunderstood, I just need to iterate through directories and not their contents. |
While you are working with it, would it be a lot of trouble to add a function with glob or just '?' and '*' wildcard functionality? |
@paragi This API will do no filtering and will not return entries from subdirectories. That will be up to a module to do. |
Ok folks, the pull request is up: #29349 const fs = require('fs');
async function print(path) {
const dir = await fs.promises.opendir(path);
for await (const dirent of dir) {
console.log(dirent.name);
}
}
print('./').catch(console.error); |
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs#583 Refs: libuv/libuv#2057
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: #583 Refs: libuv/libuv#2057 PR-URL: #29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
Landed in cbd8d71 as |
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: #583 Refs: libuv/libuv#2057 PR-URL: #29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs/node#583 Refs: libuv/libuv#2057 PR-URL: nodejs/node#29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs/node#583 Refs: libuv/libuv#2057 PR-URL: nodejs/node#29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs/node#583 Refs: libuv/libuv#2057 PR-URL: nodejs/node#29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
This has been released in 12.12.0 |
What about also making directories sync iterable (as initially suggested)? I think this could be done by using The commit message could be "fs: make directories sync iterable". |
@ma11hew28 sorry to tell you that Sync can't be iterable as its Sync :) a iterable is a async type |
There are sync iterators too. It's entirely possible to make a sync version. |
@Qard why should i use a iterator for a Sync call that will return after all is in mermory already but ok your right it could exist it can be done. i for my self would suggest for...of as iterate method but ok |
Thank you, @frank-dspeed and @Qard, for responding. I'm sorry for not responding promptly. @frank-dspeed, I'm sorry, but I think you misunderstood me. What you suggested is what I meant, as it is the first part of what @jonathanong initially suggested. Ie, if we make directories sync iterable, then you could do something like this: const fs = require('fs')
const dir = fs.opendirSync('.')
for (const dirent of dir) {
console.log(dirent.name)
} instead of something like this: const fs = require('fs')
const dir = fs.opendirSync('.')
let dirent
while ((dirent = dir.readSync()) !== null) {
console.log(dirent.name)
}
dir.closeSync() As for the second sentence of my initial comment on this issue, by "this", I meant "making directories sync iterable". Ie, I think a directory's default sync iterator's |
@frank-dspeed It doesn't have to all be loaded into memory with a sync iterator. If you have a directory with millions of entries in it, a sync iterator could read and return entries one at a time, or in batches, but synchronously. |
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs/node#583 Refs: libuv/libuv#2057 PR-URL: nodejs/node#29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
This adds long-requested methods for asynchronously interacting and iterating through directory entries by using `uv_fs_opendir`, `uv_fs_readdir`, and `uv_fs_closedir`. `fs.opendir()` and friends return an `fs.Dir`, which contains methods for doing reads and cleanup. `fs.Dir` also has the async iterator symbol exposed. The `read()` method and friends only return `fs.Dirent`s for this API. Having a entry type or doing a `stat` call is deemed to be necessary in the majority of cases, so just returning dirents seems like the logical choice for a new api. Reading when there are no more entries returns `null` instead of a dirent. However the async iterator hides that (and does automatic cleanup). The code lives in separate files from the rest of fs, this is done partially to prevent over-pollution of those (already very large) files, but also in the case of js allows loading into `fsPromises`. Due to async_hooks, this introduces a new handle type of `DIRHANDLE`. This PR does not attempt to make complete optimization of this feature. Notable future improvements include: - Moving promise work into C++ land like FileHandle. - Possibly adding `readv()` to do multi-entry directory reads. - Aliasing `fs.readdir` to `fs.scandir` and doing a deprecation. Refs: nodejs/node-v0.x-archive#388 Refs: nodejs/node#583 Refs: libuv/libuv#2057 PR-URL: nodejs/node#29349 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: David Carlier <devnexen@gmail.com>
since we're in ES6 territory now, i'm thinking the sync version should be an iterable
and have the async version be an object stream:
See: nodejs/node-v0.x-archive#388
The text was updated successfully, but these errors were encountered: