Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not traversing subdirectories if some filter functions applied #17

Closed
manidlou opened this issue Jan 27, 2017 · 3 comments
Closed

Not traversing subdirectories if some filter functions applied #17

manidlou opened this issue Jan 27, 2017 · 3 comments

Comments

@manidlou
Copy link
Collaborator

manidlou commented Jan 27, 2017

I was playing with #11 to see if I can find a solution, then I noticed another issue if filter function applied. This is different than #11 that the root directory is part of the result array although filter function is applied. This issue is about not traversing subdirectories at all if some filter functions applied, such as the following example.

Imagine we have

tmp
  |_dir1
  |   |_foo.md
  |_bar.md
  |_baz.md

and we want to get only .md files. If we use it like

var filterFunc = function (item) {
  return path.extname(item) === '.md'
}
var items = []
klaw('tmp', {filter: filterFunc}).on('data', function (item){
  items.push(item.path)
}).on('end', function () {
  console.dir(items)
})

the result array is ['tmp', 'bar.md', 'baz.md'].

So, I checked the code again and based on what I understood it happens because when a filter function is passed, all contents of the root directory pass through the filter function and since return path.extname(item) === '.md' fails for all subdirectories , so none of them will be read. Therefore, only items in the root directory itself are returned. Please correct me if I am wrong.

Edit

However, if we run the same example using pipe for filtering, everything is just fine. Apparently, the problem only arises when filter function is used.

var filter = through2.obj(function (item, enc, next) {
  if (path.extname(item.path) === '.md') this.push(item)
  next()
})

var items = []
klaw('tmp')
  .pipe(filter)
  .on('data', function (item) {
    items.push(item.path)
  })
  .on('end', function () {
    console.dir(items) // => ['bar.md', 'baz.md', 'dir1/foo.md']
  })
@manidlou
Copy link
Collaborator Author

manidlou commented Mar 4, 2017

In general, wouldn't be better to remove/deprecate the use of filter function and instead point users to use piping for filtering?

It seems like using filter function in a stream interface can be problematic. Please see my comment above. By removing/deprecating the filter function, users still be able to achieve the filter functionality by using either through2 or node native streams with correct results.

What are your thoughts on this?

@amadsen
Copy link

amadsen commented Nov 4, 2017

The only reason to use the filter function instead of piping through a filter stream would be to avoid calling lstat on paths you don't care about. In many cases this will be a premature optimization, but it can be useful.

This is actually also the reason the filter function causes not traversing subdirectories - they have been filtered out by name before checking to see if they are a directory.

I would argue that this is expected behavior and document it specifically. For most use cases (like the one described in this issue) a filter stream is what you actually want. For some specific use optimizations where early termination (especially skipping whole directory trees) is important a piped filter stream will not work and the filter function is exactly what is needed.

@RyanZim
Copy link
Collaborator

RyanZim commented Dec 30, 2022

Closing as expected behavior, PR welcome to clarify docs here.

@RyanZim RyanZim closed this as completed Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants