Use tree-seq to make iterate-dir lazy #105
Open
The previous implementation of `iterate-dir` was not lazy: it eagerly traversed the entire directory hierarchy and loaded it all into memory before the function returned. When used on very large directory trees, this could be very slow and could also produce a `java.lang.OutOfMemoryError`. (See issue #38)

By using `tree-seq` to lazily traverse the directory tree, the `iterate-dir` function now immediately returns a lazy sequence (even on very large directory trees) because it does not need to first traverse and load the tree. This also means that `iterate-dir` will not itself produce an `OutOfMemoryError` when used on large directory trees. Unfortunately, however, it seems that Clojure's lazy sequences are still susceptible to `OutOfMemoryError`s. Even using the lazy `tree-seq` approach, an `OutOfMemoryError` can be produced when processing very large directory trees (e.g. with `dorun` or `doseq`, or even `count`):

`OutOfMemoryError GC overhead limit exceeded`
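
For illustration, a lazy file traversal built on `tree-seq` might look roughly like the sketch below. This is not the code in this PR: `lazy-dir-seq` is a hypothetical name, and it yields plain `java.io.File` objects, which may differ from what `iterate-dir` actually returns.

```clojure
(require '[clojure.java.io :as io])

(defn lazy-dir-seq
  "Hypothetical helper, not the code in this PR: returns a lazy,
  depth-first sequence of java.io.File objects, starting with root."
  [root]
  (tree-seq
    ;; branch?: only directories have children
    (fn [^java.io.File f] (.isDirectory f))
    ;; children: .listFiles returns nil when the directory cannot be
    ;; read, and (seq nil) is nil, so such directories yield no children
    (fn [^java.io.File d] (seq (.listFiles d)))
    (io/file root)))

;; Only a short prefix is realized here, so the call returns quickly
;; regardless of how large the tree under "." is.
(doseq [f (take 5 (lazy-dir-seq "."))]
  (println (.getPath f)))
```

Because `tree-seq` wraps each node in a lazy sequence, a directory's contents are listed only when the consumer reaches that directory.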

Nevertheless, this is still an improvement over the previous implementation, since the `OutOfMemoryError` is not produced immediately upon calling `iterate-dir`, but rather only after processing a very large portion of the results. This seems to be a limitation in Clojure itself, in any case.
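
The "work happens only as results are consumed" behavior can be checked with a small in-memory experiment. The sketch below is an assumption-laden stand-in, not a test from this PR: nested vectors play the role of directories, and `listings` and `counted-children` are hypothetical names used to count how often a "directory" is listed.

```clojure
;; In-memory stand-in for a directory tree: vectors are "directories",
;; keywords are "files". The root holds 1000 directories of 2 files each.
(def tree (vec (repeat 1000 [:a :b])))

;; Counts how many times a "directory" has its children listed.
(def listings (atom 0))

(defn counted-children [node]
  (swap! listings inc)
  (seq node))

(def nodes (tree-seq vector? counted-children tree))

;; Constructing the sequence has not listed anything yet.
(println "listings after construction:" @listings)

;; Realizing a short prefix lists only a small part of the tree.
(doall (take 3 nodes))
(println "listings after taking 3:" @listings)
```

Note that binding the sequence to a var, as `nodes` is here, keeps every realized element reachable. That is harmless for a tree this small but is worth avoiding when walking very large trees.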