Add tree walking functions #199
base: master
I switched the names to ...
Random thought: would it make sense to implement the https://github.com/JuliaCollections/AbstractTrees.jl interface for the trees in this package?
Good question. Let me think...
On a related note: currently we can implement a `parent` function for BallTree nodes because the tree stores the associated regions explicitly, but I don't see an obvious way to do this for KDTree nodes without storing the min and max values of the dimension that was split. A `parent` function has some key advantages; for example, it makes tree traversal and iteration possible without a stack (and I think some of the AbstractTrees methods would need it). On the other hand, the core NearestNeighbors functions don't need this extra information. How amenable would you be to adding a `split_minmax` field to the `KDTree` struct? It would store a tuple with the bounds of the split dimension so that they can be restored by a `parent` call.
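To make the `split_minmax` idea concrete, here is a minimal sketch, assuming a region is an axis-aligned box stored as `mins`/`maxs` vectors. `HyperRect`, `split_region`, and `parent_region` are hypothetical names, not NearestNeighbors.jl API. Splitting a box along one dimension overwrites one bound in each child, so the parent box can only be recovered if the original `(min, max)` of that dimension is kept somewhere:

```julia
# Hypothetical axis-aligned box; not the package's own region type.
struct HyperRect{T}
    mins::Vector{T}
    maxs::Vector{T}
end

# Split `r` along dimension `d` at `value`. Returns the two child boxes and
# `split_minmax`, the (min, max) of dimension `d` before the split.
function split_region(r::HyperRect{T}, d::Int, value::T) where {T}
    split_minmax = (r.mins[d], r.maxs[d])
    left_maxs = copy(r.maxs); left_maxs[d] = value    # left child: upper bound moves down
    right_mins = copy(r.mins); right_mins[d] = value  # right child: lower bound moves up
    left = HyperRect(copy(r.mins), left_maxs)
    right = HyperRect(right_mins, copy(r.maxs))
    return left, right, split_minmax
end

# Recover the parent box from either child, given the split dimension and
# the stored bounds of that dimension.
function parent_region(child::HyperRect{T}, d::Int, split_minmax::Tuple{T,T}) where {T}
    mins = copy(child.mins); mins[d] = split_minmax[1]
    maxs = copy(child.maxs); maxs[d] = split_minmax[2]
    return HyperRect(mins, maxs)
end

r = HyperRect([0.0, 0.0], [4.0, 2.0])
left, right, mm = split_region(r, 1, 1.5)
p = parent_region(right, 1, mm)   # same mins/maxs as r
```

Without `mm`, the right child alone only tells us that dimension 1 starts at `1.5`; the parent's lower bound of `0.0` is lost.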
src/tree_ops.jl
```diff
@@ -12,6 +12,175 @@ function show(io::IO, tree::NNTree{V}) where {V}
     print(io, " Reordered: ", tree.reordered)
 end
 
+struct NNTreeNode{T <: NNTree, R}
+    index::Int
+    tree::T
```
I have a general "philosophy" that storing something big (a full KD tree) in something that is conceptually small (a tree node) is often a mistake.
As you traverse the tree, you will create all these nodes, and every one of them will contain the same `tree`. What do you think about dropping the `tree` field and instead requiring the user to pass the tree as an argument to the traversal functions?
This is a good question with a good rationale. In my experience, Julia is very good at optimizing code when the types are immutable, so I doubt it really creates separate copies of the tree when a node is used inside a function.
My argument for the current organization is that node ids are tied to the tree, so bundling them avoids an extra argument hanging around everywhere; it makes it easy and simple to write code that does the right thing and gets the answer right. But, as I said, I hadn't considered your particular perspective here.
Is there a test we could do to resolve whether this is an issue (i.e., to convince me that your perspective is correct, or for me to convince you that storing the tree isn't a problem because the compiler really is smart enough)?
Maybe vectors of nodes would be bad if they included the tree? But do we ever actually need them?
Another argument for keeping them linked is that the AbstractTrees interface is node-oriented: you define `children`, `parent`, etc. at the node level, which would require keeping the tree as part of the struct.
Okay, you are right: this does make a big difference. I took trivial code that walks the tree and counts up the sizes of the leaves, so it benchmarks just the traversal (100k points in total). Storing the NNTree in each node takes ~131 μs. Using raw node ids and passing the tree as a parameter to the function takes ~29 μs. But if I store a ref to the tree rather than the full tree structure, I get all the functionality and it takes ~45 μs. I think the latter is worth doing, so I'll implement that and update the pull request. Note that all of this skips the region computations for the KDTree, which will narrow the difference.
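The benchmark code itself is not shown in the thread. The following is a hypothetical reconstruction of its shape, not the code behind the timings above: a toy tree with heap-style numbering (root `1`, children `2i` and `2i + 1`; an assumption here) and two of the traversal styles being compared, raw node ids with the tree passed as an argument, and a node value that embeds the tree. `HeapTree`, `TreeNode`, and `count_points` are illustrative names:

```julia
# Hypothetical toy tree: nodes 1:n_internal are internal and
# nodes n_internal+1 : 2n_internal+1 are leaves.
struct HeapTree
    n_internal::Int
    leaf_sizes::Vector{Int}   # number of points stored in each leaf
end

isleaf(t::HeapTree, i::Int) = i > t.n_internal
leafsize(t::HeapTree, i::Int) = t.leaf_sizes[i - t.n_internal]

# Style 1: raw node ids; the tree is passed to every recursive call.
function count_points(t::HeapTree, i::Int = 1)
    isleaf(t, i) && return leafsize(t, i)
    return count_points(t, 2i) + count_points(t, 2i + 1)
end

# Style 2: each node value embeds the whole tree.
struct TreeNode
    index::Int
    tree::HeapTree
end

function count_points(n::TreeNode)
    t, i = n.tree, n.index
    isleaf(t, i) && return leafsize(t, i)
    return count_points(TreeNode(2i, t)) + count_points(TreeNode(2i + 1, t))
end

t = HeapTree(3, [10, 20, 30, 40])
count_points(t)                # 100
count_points(TreeNode(1, t))   # 100
```

To compare the styles, each call could be wrapped in `@btime` from BenchmarkTools.jl on a tree with many leaves; no timings are claimed for this sketch.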
> But... if I store a ref to the tree rather than the full tree structure

I don't fully understand what that means.
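Here's the updated structure. It stores a reference (a pointer) to the tree instead of a copy of all of its data.
```julia
struct NNTreeNode{TreeRef <: Ref{<:NNTree}, R}
    index::Int        # node id within the tree
    treeref::TreeRef  # shared reference to the tree, e.g. created with Ref(tree)
    region::R         # region covered by this node
end
```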
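`Ref(tree)` returns a `Base.RefValue` wrapping the tree. Because `RefValue` is mutable, a field of that type is stored as a pointer, so every node built from it carries only a pointer-sized handle and all nodes share the same wrapper. A minimal sketch of the pattern on a hypothetical toy tree (`HeapTree`, `RefNode`, `children`, and `count_points` are illustrative names, and the heap-style numbering is an assumption):

```julia
# Hypothetical toy tree with heap numbering: children of node i are 2i and 2i + 1.
struct HeapTree
    n_internal::Int           # nodes 1:n_internal are internal
    leaf_sizes::Vector{Int}   # point counts of leaves n_internal+1 : 2n_internal+1
end

# Node holding a shared reference to the tree instead of the tree itself.
struct RefNode{R <: Ref{HeapTree}}
    index::Int
    treeref::R   # created once with Ref(tree) and reused by every node
end

tree(n::RefNode) = n.treeref[]
isleaf(n::RefNode) = n.index > tree(n).n_internal
children(n::RefNode) = (RefNode(2 * n.index, n.treeref), RefNode(2 * n.index + 1, n.treeref))

# Recursive walk: only the node is passed, the tree is reached through the Ref.
function count_points(n::RefNode)
    isleaf(n) && return tree(n).leaf_sizes[n.index - tree(n).n_internal]
    l, r = children(n)
    return count_points(l) + count_points(r)
end

t = HeapTree(3, [10, 20, 30, 40])
root = RefNode(1, Ref(t))
count_points(root)                           # 100
children(root)[1].treeref === root.treeref   # true: one shared Ref
```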
The issue I have with the iterate analogy is that `iterate` is designed to execute within a single function context, and it has some nice syntax (`for` loops) that hides the complexity and the different types of objects involved. Most tree-walking functions, by contrast, are designed to execute recursively, and there is no such affordance for recursion that I know of. So you'd have to pass the tree structure, as well as the node structure, to every subfunction.
The current design is simply meant to be easy to use; it's also feasible to adapt it to the AbstractTrees.jl interface (although I haven't done that yet), which does the same thing with `parent`/`children`/etc. functions.
But it seems like you are still leaning against it, even though the overhead is minimal. Is that correct?
Just to be precise, the interface you would like is:
```
children(T::Tree, n::Node) -> (nl::Node, nr::Node)
parent(T::Tree, n::Node) -> (p::Node)
region(n::Node) ->
leaf_points(T::Tree, n::Node) -> something that iterates over points in the leaf node
etc...
```
where `Node` is something simple like:
```julia
struct Node{R}
    index::Int
    region::R
end
```
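As a sanity check of that interface shape, here is a minimal sketch on a hypothetical 1-D tree (not `KDTree`): heap numbering, equal-sized leaves, and a per-node `split_minmax` so that `parent` can restore the region. `Toy1DTree`, `build!`, and `root` are illustrative names; `parent` is added as a method of `Base.parent` for these new types only:

```julia
struct Node{R}
    index::Int
    region::R
end

# Hypothetical 1-D tree over sorted points; children of node i are 2i and 2i + 1.
struct Toy1DTree
    data::Vector{Float64}                         # sorted points
    n_internal::Int                               # internal nodes are 1:n_internal
    leafsize::Int                                 # points per leaf
    split_vals::Vector{Float64}                   # split value of each internal node
    split_minmax::Vector{Tuple{Float64,Float64}}  # region of each internal node, kept for `parent`
end

function build!(split_vals, split_minmax, data, n_internal, i, lo, hi, ifirst, ilast)
    i > n_internal && return nothing
    m = ifirst + (ilast - ifirst + 1) ÷ 2 - 1   # last index of the left half
    s = (data[m] + data[m + 1]) / 2             # split halfway between the halves
    split_vals[i] = s
    split_minmax[i] = (lo, hi)
    build!(split_vals, split_minmax, data, n_internal, 2i, lo, s, ifirst, m)
    build!(split_vals, split_minmax, data, n_internal, 2i + 1, s, hi, m + 1, ilast)
    return nothing
end

# Perfect tree of the given depth; length(data) must be a multiple of 2^depth.
function Toy1DTree(data::Vector{Float64}, depth::Int)
    sorted = sort(data)
    nleaves = 2^depth
    length(sorted) % nleaves == 0 || throw(ArgumentError("length(data) must be a multiple of 2^depth"))
    n_internal = nleaves - 1
    split_vals = zeros(n_internal)
    split_minmax = fill((0.0, 0.0), n_internal)
    build!(split_vals, split_minmax, sorted, n_internal, 1, sorted[1], sorted[end], 1, length(sorted))
    return Toy1DTree(sorted, n_internal, length(sorted) ÷ nleaves, split_vals, split_minmax)
end

root(T::Toy1DTree) = Node(1, (T.data[1], T.data[end]))
isleaf(T::Toy1DTree, n::Node) = n.index > T.n_internal
region(n::Node) = n.region

function children(T::Toy1DTree, n::Node)
    lo, hi = n.region
    s = T.split_vals[n.index]
    return Node(2 * n.index, (lo, s)), Node(2 * n.index + 1, (s, hi))
end

# The parent's region is read back from the stored bounds.
function Base.parent(T::Toy1DTree, n::Node)
    p = n.index ÷ 2
    return Node(p, T.split_minmax[p])
end

# In a perfect heap-numbered tree, leaves appear left to right in index order.
function leaf_points(T::Toy1DTree, n::Node)
    k = n.index - T.n_internal
    return view(T.data, (k - 1) * T.leafsize + 1 : k * T.leafsize)
end

T = Toy1DTree(collect(1.0:8.0), 2)
l, r = children(T, root(T))   # regions (1.0, 4.5) and (4.5, 8.0)
```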
Just a quick nudge on this interface question. I'd love to get this wrapped up in the next week or so, before some school obligations start.
Okay, since I had a moment, I just implemented the interface above. As a check, we can do non-recursive exploration of the tree using the current children, parent, and next/prev sibling functions; see, e.g., the points iterator.
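Non-recursive exploration needs only three moves: go to the first child, go to the next sibling, and go up to the parent. A minimal sketch on a hypothetical heap-numbered tree (ids only, regions omitted; `HeapTree`, `nextsibling`, and `preorder` are illustrative names, not the functions in the PR):

```julia
# Hypothetical heap-numbered tree: nodes 1:n_internal are internal,
# nodes n_internal+1 : 2n_internal+1 are leaves.
struct HeapTree
    n_internal::Int
end

isleaf(t::HeapTree, i::Int) = i > t.n_internal
firstchild(i::Int) = 2i
parent_index(i::Int) = i ÷ 2
# A left child has an even id and its sibling is i + 1; right children and the root have none (0).
nextsibling(i::Int) = (i > 1 && iseven(i)) ? i + 1 : 0

# Preorder visit of every node without recursion or an explicit stack.
function preorder(t::HeapTree)
    order = Int[]
    i = 1
    while true
        push!(order, i)
        if !isleaf(t, i)
            i = firstchild(i)   # descend
            continue
        end
        # climb until we reach a node that has a next sibling (or the root)
        while i != 1 && nextsibling(i) == 0
            i = parent_index(i)
        end
        i == 1 && return order
        i = nextsibling(i)
    end
end

preorder(HeapTree(3))   # [1, 2, 4, 5, 3, 6, 7]
```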
Hi, sorry for the slow response here, and sorry for being a bit "annoying" in trying to figure out the "best" interface to use.
One reason is that this was my first Julia package, so it holds a special place in my heart, and I have also worked quite a bit on reducing its memory footprint and improving its performance.
I can add your package so that it is tested as part of the CI here. You could then implement whatever tree-walking interface you want there at any time, and it would not be broken (or at least it could be updated if changes made here are incompatible with it).
As a check of the functionality here, it would be nice to reimplement https://github.com/KristofferC/NearestNeighbors.jl/blob/master/examples/balltree_illustration.ipynb using these official traverse functions. It doesn't strictly have to be done here, but it would serve as a use-case check.
This is an initial take at how to set up the tree-walking code.
This addresses #194.
I still need to add more documentation, but this should be enough to get some initial feedback before I write more docs.