Could you have a look at RedBlackTrees.jl? #5
Comments
A red-black tree would be a great addition to this package. I just took a brief look at the code. I think the following line would be a problem in terms of performance: https://github.com/pygy/RedBlackTrees.jl/blob/master/src/RedBlackTrees.jl#L75 Since a red-black tree is a binary tree, each node has at most two links. I don't think you need an array here; you can have left and right fields instead.
The use of an array is intentional: it makes the code simpler and probably faster. See here for the justification. Now that I think of it, I could emulate that behavior using I'll have to benchmark it.
Julia's array is much heavier than C's array. Tuples are much more efficient, and you can still use the getindex syntax.
A few general comments.
I'll try to look at the implementation itself at some point. I'll add that Dahua has written some of the fastest Julia code around, so it would be well worth your time to listen to his advice. ;-)
> Julia's array is much heavier than C's array. Tuples are much more efficient, and you can still use the getindex syntax.
Definitely, and in the original, the array is actually part of the struct, which improves memory locality and cacheability. The thing with tuples is that they are immutable. I'll have to allocate new ones for rotations and insertions, and a branch will be needed every time. I'll have to benchmark it to know what's worse in this case: pipeline thrashing or cache misses? To give you an example, here's the code for the rotations. It is branch free.
const left = 1
const right = 2
const opposite = (right, left)
function single_rotation(root::RBNode, dir::Int)
    other = opposite[dir]
    save = root.link[other]
    root.link[other] = save.link[dir]
    save.link[dir] = root
    root.red = true
    save.red = false
    save
end
function double_rotation(root::RBNode, dir::Int)
    other = opposite[dir]
    root.link[other] = single_rotation(root.link[other], other)
    single_rotation(root, dir)
end
The same goes for node creation and deletion, even though it is less problematic. How heavy are arrays compared to tuples?
@kmsquire: I've seen the other red black tree implementation, but it looks like it suffers from acute haskelliform typitis. Red and black nodes are different types, and insertions/deletions are sliced into several functions that call each other in order, with a different version for each color (up to I'll have a look at the associative collection interface. Currently, it only implements sets.
Another tweak would be to keep a pointer either to the parent node or to the next and previous values in the tree, to make iteration lighter. Currently, the iterator keeps its state in an O(log n) stack which is allocated on
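For readers who want to run the rotation above on a current Julia release, here is a minimal self-contained sketch. The RBNode layout (a key, a color flag, and a two-slot link vector) and the one-argument constructor are assumptions made for illustration, not the package's actual definitions, and 3 - dir stands in for the opposite tuple lookup.

```julia
# Hypothetical node type for illustration; not the package's actual definition.
mutable struct RBNode{T}
    key::T
    red::Bool
    # link[1] is the left child, link[2] the right child; `nothing` marks a leaf.
    link::Vector{Union{RBNode{T}, Nothing}}
end

# Convenience constructor (an assumption): new nodes are red with two empty links.
RBNode(key::T) where {T} =
    RBNode{T}(key, true, Union{RBNode{T}, Nothing}[nothing, nothing])

const LEFT = 1
const RIGHT = 2

# Maps 1 -> 2 and 2 -> 1 arithmetically, so picking the other side needs no branch.
opposite(dir::Int) = 3 - dir

# Rotate the subtree rooted at `root` in direction `dir`; returns the new subtree root.
function single_rotation(root::RBNode, dir::Int)
    other = opposite(dir)
    save = root.link[other]
    root.link[other] = save.link[dir]
    save.link[dir] = root
    root.red = true
    save.red = false
    return save
end

# Build the right-leaning chain 1 -> 2 -> 3 and rotate it left around node 1.
a, b, c = RBNode(1), RBNode(2), RBNode(3)
a.link[RIGHT] = b
b.link[RIGHT] = c
top = single_rotation(a, LEFT)
```

After the call, node 2 is the subtree root with node 1 as its left child and node 3 as its right child. The direction is an index rather than a pair of mirrored code paths, which is the property the comment above is defending.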
I think at this point, we should focus on getting the API consistent with Julia's convention. It would be useful to call it
Agreed. Before I go further, I'd like your opinion on this: I think that mutable values should be deep copied by default when used as dict keys or set elements. Otherwise, mutating an object could break the order. That being said, I'd like to offer the possibility to either do a shallow copy or no copy at all for performance-conscious users who know what they are doing. Do you agree with the approach? If so, I suppose I should provide a constructor with a named argument that defaults to
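To make the trade-off concrete, here is a small sketch of why copying matters for an ordered container. It uses a plain sorted vector rather than a tree, and the Key type, the insert_sorted! helper, and the copykey keyword are all hypothetical names chosen for this example.

```julia
# Hypothetical mutable key type for illustration.
mutable struct Key
    x::Int
end

# Hypothetical helper: insert `k` into a vector kept sorted by `x`.
# With copykey = true the container stores its own deep copy of the key.
function insert_sorted!(v::Vector{Key}, k::Key; copykey::Bool = true)
    stored = copykey ? deepcopy(k) : k
    i = searchsortedfirst(v, stored; by = key -> key.x)
    insert!(v, i, stored)
    return v
end

k = Key(1)
aliased = insert_sorted!(Key[], k; copykey = false)  # shares `k` with the caller
safe = insert_sorted!(Key[], k; copykey = true)      # holds a private copy
insert_sorted!(aliased, Key(2))
insert_sorted!(safe, Key(2))

k.x = 10  # the caller mutates its key after insertion

aliased_ok = issorted(aliased; by = key -> key.x)  # false: order invariant broken
safe_ok = issorted(safe; by = key -> key.x)        # true: the copy is unaffected
```

The aliased container silently loses its ordering invariant, so later binary searches can miss elements that are present. The copying variant pays for a deepcopy on every insertion, which is the cost the opt-out is meant to avoid.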
I would suggest following the behavior of the
julia> import Base: isequal, hash
julia> type MyMutableType
x::Int
end
julia> isequal(x::MyMutableType, y::MyMutableType) = (x.x == y.x)
isequal (generic function with 34 methods)
julia> hash(x::MyMutableType) = hash(x.x)
hash (generic function with 25 methods)
julia> x = MyMutableType(1)
MyMutableType(1)
julia> x == MyMutableType(1)
true
julia> y = MyMutableType(2)
MyMutableType(2)
julia> d = [x=>'x', y=>'y']
[MyMutableType(2)=>'y',MyMutableType(1)=>'x']
julia> x.x = 3
3
julia> x in keys(d)
false
julia> y in keys(d)
true
This is also the typical behavior in, e.g., Java and iOS programming. I'm hopeful that Julia will eventually support deeply immutable objects (as much as possible).
Can you rerun this test with
using SortingAlgorithms
push!(array, item); sort!(array, alg=TimSort)
Note that this should not take anything away from this work--I truly think that we need a
Mutating a key can break the whole tree, not just that entry...
Regarding the naive array-based version, note that it doesn't check for dupes. A smarter approach would be to do a binary search and, if needed, grow the array and shift the values bigger than the one you want to insert. You can also decide to tolerate dupes in the array, and only sort when needed to read or iterate over the collection (for sets only). The iteration code and With more precise benchmarks, the threshold with the standard sort is actually 1500 items. TimSort beats rbtree up to 2200 items.
For the array vs. tuple vs. type field debate: we can get the best of the three worlds by using
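The binary-search variant of the array baseline described above can be sketched as follows. The function name push_unique_sorted! is hypothetical; searchsortedfirst and insert! are standard Base functions.

```julia
# Hypothetical helper: keep `v` sorted and duplicate-free while inserting `item`.
function push_unique_sorted!(v::Vector{T}, item) where {T}
    x = convert(T, item)
    # Binary search: first index whose element is >= x (length(v) + 1 if none).
    i = searchsortedfirst(v, x)
    if i > length(v) || v[i] != x
        # Not present: insert! grows the array and shifts the larger values right.
        insert!(v, i, x)
    end
    return v
end

s = Int[]
for x in (5, 1, 3, 1, 5, 2)
    push_unique_sorted!(s, x)
end
# s is now [1, 2, 3, 5]
```

The search is O(log n), but the shift performed by insert! is O(n) in the worst case, so this is still a baseline to benchmark against the tree rather than a replacement for it.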
On Monday, January 27, 2014, Pierre-Yves Gérardy
There was talk before of creating an
I'm not as familiar with this. If you can code it up and show an obvious
Cheers!
You can already access fields using an integer argument, directly in Julia, via
That's good news! No problem with using v0.3 (I'll have to install it, though). Do you know if it is also fast if the initial field types are homogeneous, even though they vary afterwards? Specifically, for
type RBNode{K,V}
    left::RBNode{K,V}
    right::RBNode{K,V}
    key::K
    value::V
    red::Bool
end
Edit: No, it isn't, since the fast case is reserved for homogeneous or "all pointer" types. I can use It would be nice if it were possible to make this kind of function completely private, though.
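As an illustration of integer-indexed field access, here is a sketch in current Julia syntax. It assumes the built-in getfield and setfield! functions, which accept an Int field index; the thread does not say which function was meant, and nothing here measures performance. The Node type and the child, setchild!, and rotate helpers are hypothetical.

```julia
# Hypothetical node with separate left/right fields; `nothing` marks a leaf.
mutable struct Node{K}
    left::Union{Node{K}, Nothing}
    right::Union{Node{K}, Nothing}
    key::K
    red::Bool
end

Node(key::K) where {K} = Node{K}(nothing, nothing, key, true)

# Field 1 is `left` and field 2 is `right`, so a direction in 1:2 doubles as a field index.
child(n::Node, dir::Int) = getfield(n, dir)
setchild!(n::Node, dir::Int, c) = setfield!(n, dir, c)

# Same rotation as in the thread, written against fields instead of a link array.
function rotate(root::Node, dir::Int)
    other = 3 - dir
    save = child(root, other)
    setchild!(root, other, child(save, dir))
    setchild!(save, dir, root)
    root.red = true
    save.red = false
    return save
end

a, b = Node(1), Node(2)
a.right = b
top = rotate(a, 1)  # rotate left around `a`
```

This keeps the direction-as-index style of the array version while storing the children inline in the node. Whether it is actually faster than the array or tuple variants would still have to be benchmarked.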
In terms of mutating keys, let's follow Julia Base's behavior: Dict does not make a deep copy of a mutable key:
julia> type X
           x::Int
       end
julia> k = X(1)
X(1)
julia> d = (X=>Int)[k=>10]
[X(1)=>10]
julia> k.x = 2
2
julia> d
[X(2)=>10]
This may cause problems in certain situations, but it doesn't seem to be a real problem in practice. Of course, we may allow other behaviors through options.
Did this work ever get pushed anywhere?
The sorted containers are based on 2-3 trees, which are mostly functionally equivalent to red-black trees (although the latter may have a performance edge). If someone wants to implement the sorted containers using red-black trees as the substrate, there may be a performance boost. However, my own timing tests seem to indicate that the main performance drag on the currently implemented 2-3 trees is main-memory fetching (i.e., cache misses). Therefore, it may be more profitable to implement something like Bender et al.'s cache-oblivious B-trees, SIAM J. Comput., 2005.
Since this hasn't been touched since 2014, shall it just be closed?
I ported this C red black tree library (see here for a complete description), and I'd like to know if you're interested in adding it to this package.
https://github.com/pygy/RedBlackTrees.jl
It implements push!(), delete!(), in(), size(), length(), show() and the iterator protocol.
A tree is created using RedBlackTree(some_type). You can iterate backwards using for i in RedBlackTrees.Backward(tree) ...
show(tree) only displays the type and size of the tree. You can see the content of the tree using show(tree.root).
On my machine, insertion beats push!(array, item); sort!(array) from ~3000 items on.
I'll add the possibility to pass custom cmp() and copy() functions (copy for node insertion). Currently, it uses Base.cmp and doesn't make any copy.
There are probably some more things left to tweak. For example, according to @time, iteration allocates ~400 bytes per item in the tree, even though it shouldn't.
It's also the first time I've used parametric types, and I may have overused parametric function definitions.
Lastly, I use an alias RBLNode Union(RBNode{T}, Nothing) as the type for nodes, with Nothing representing leaves. It may or may not help to use an abstract type and have both RBNode and Leaf depend on it, but it would require some tweaks: since the nodes are parametrized, I'd have to keep a typed leaf around, or create one each time I need one, which may slow things down.