Skip to content

Latest commit

 

History

History
372 lines (189 loc) · 13.3 KB

erstar.md

File metadata and controls

372 lines (189 loc) · 13.3 KB

Module erstar

Implementation of R* tree data structure.

Data Types

rtree() = {'?MODULE', {pos_integer(), pos_integer(), pos_integer(), non_neg_integer()}, treenode()}
treeleaf() = {erstar_bound:bound(), any()}
treenode() = {node, empty | erstar_bound:bound(), [treenode() | treeleaf()]}

Function Index

around/4Locates all the leaves which are closer than CloserThan units to the given point.
at/3Locates all the leaves which contain the given point in its bound.
clear/1Removes all leaves from a tree.
fold/3Folds over entire tree in a depth-first traversal.
foldwide/3Folds over entire tree in a breadth-first traversal.
inbetween/5Locates all the leaves which are in between FartherThan and ButCloserThan units far from the given point.
insert/2Inserts a bulk of leafs simultaneously.
insert/3Inserts new leaf into an R* tree, given its bound and arbitrary term to associate with it.
instance/1Verifies that the given term is an R* tree.
leaves/1Gathers plain list of all leaves in an R* tree.
locate/2Locates all the leaves getting inside the given bound.
locate/3Locates all the leaves enclosed inside or overlapping the given bound.
new/1Creates an empty R* tree with specific maximum node capacity.
new/2Creates an empty R* tree with specific minimum and maximum node capacity.
new/4Creates an empty R* tree with specific node capacities, choose-subtree cutout value and reinserts count.
remove/2Removes the bulk of leaves with a single call.
remove/3Removes the leaf from an R* tree, given its bound and arbitrary term.
size/1Computes the size of an R* tree.
walk/2Walks over nodes and leaves in a tree, effectively gathering list of leaves accepted by the user-defined function.
walkfold/3Walks over nodes and leaves in a tree, simultaneously folding over these which were accepted by the user-defined function.

Function Details

around/4

around(X::number(), Y::number(), CloserThan::number(), RStar::rtree()) -> [treeleaf()]



Locates all the leaves which are closer than CloserThan units to the given point. Distance to a bound is said to be the distance to its center.

at/3

at(X::number(), Y::number(), RStar::rtree()) -> [treeleaf()]



Locates all the leaves which contain the given point in its bound.

See also: locate/3.

clear/1

clear(X1::rtree()) -> rtree()



Removes all leaves from a tree.

fold/3

fold(FoldFun, Acc, X3::rtree()) -> Acc

Folds over entire tree in a depth-first traversal. Accumulates result with each call to the user-defined function. Each call is given entry type, node or leaf, its bound, the thing it contain, numeric level in a tree and the accumulator. Leaves contain arbitrary terms passed to insert while nodes contain list of their children.

foldwide/3

foldwide(FoldFun, Acc, X3::rtree()) -> Acc

Folds over entire tree in a breadth-first traversal. Otherwise, the same as fold/3.

See also: fold/3.

inbetween/5

inbetween(X::number(), Y::number(), FartherThan::number(), ButCloserThan::number(), RStar::rtree()) -> [treeleaf()]



Locates all the leaves which are in between FartherThan and ButCloserThan units far from the given point. Distance to a bound is said to be the distance to its center.

insert/2

insert(Leafs::[treeleaf()], RStar::rtree()) -> rtree()



Inserts a bulk of leafs simultaneously. Each leaf is a tuple containing its bound and arbitrary term.

See also: insert/3.

insert/3

insert(Bound::erstar_bound:bound(), Data::any(), RStar::rtree()) -> rtree()



Inserts new leaf into an R* tree, given its bound and arbitrary term to associate with it.

Generally, it is expensive operation. If you expect large number of inserts with your usage plan, consider lowering reinserts count or maximum node capacity, or both.

instance/1

instance(X1::any()) -> boolean()



Verifies that the given term is an R* tree.

leaves/1

leaves(RTree::rtree()) -> [treeleaf()]



Gathers plain list of all leaves in an R* tree.

locate/2

locate(Where::erstar_bound:bound(), RStar::rtree()) -> [treeleaf()]



Locates all the leaves getting inside the given bound.

See also: locate/3.

locate/3

locate(X1::enclose | intersect, Where::erstar_bound:bound(), RStar::rtree()) -> [treeleaf()]



Locates all the leaves enclosed inside or overlapping the given bound. Thus, enclose and intersect specify which set will be returned, respectively.

new/1

new(MaxCapacity::pos_integer()) -> rtree()



Creates an empty R* tree with specific maximum node capacity. Same as to create a tree with minimum node capacity equal to 40% of the MaxCapacity.

See also: new/2.

new/2

new(MinCapacity::pos_integer(), MaxCapacity::pos_integer()) -> rtree()



Creates an empty R* tree with specific minimum and maximum node capacity. Same as to create a tree with given node capacities, choose-subtree cutout of 32 and reinserts count equal to 30% of the MaxCapacity.

See also: new/4.

new/4

new(MinCapacity, MaxCapacity, ChooseCutout::pos_integer(), ReinsertCount::non_neg_integer()) -> rtree()
  • MinCapacity = pos_integer()
  • MaxCapacity = pos_integer()

Creates an empty R* tree with specific node capacities, choose-subtree cutout value and reinserts count.

Node capacities dictate how many children each node should contain. Only the root node allowed to contain lesser than MinCapacity number of children. Minimum node capacity should be at least 2 and maximum capacity should be at least double the minimum one. Practically, in the most of cases wide nodes are preferred, starting from 16 or 32, and wider.

Choose-subtree cutout value says how many children at each level are being considered during each insertion. Generally, the lesser this value is, the faster each insert will be, but later lookups may degrade in performance instead.

Finally, reinserts count states how many nodes should be inserted again during each node split. Theretically, greater values would give better R* tree lookup performance in the long term. But on the other hand they will incur notable insert performace penalty. However, author's observations show that there was no notable improvement in the tree quality on different, even pretty big, datasets when recommended values were used.

As paper on the subject states, the general optimal value for the minimum node capacity is 40% of the maximum one, for the choose-subtree cutout is 32 and for the reinserts count is at 30% of the maximum node capacity.

remove/2

remove(Leaves::[treeleaf()], RStar::rtree()) -> rtree()



Removes the bulk of leaves with a single call.

See also: remove/3.

remove/3

remove(Bound::erstar_bound:bound(), Data::any(), RStar::rtree()) -> rtree()



Removes the leaf from an R* tree, given its bound and arbitrary term. Does nothing if tree does not contain such leaf.

size/1

size(RTree::rtree()) -> non_neg_integer()



Computes the size of an R* tree.

Please note that the computation involves traversal of the whole tree.

walk/2

walk(WalkFun, X2::rtree()) -> [treeleaf()]

Walks over nodes and leaves in a tree, effectively gathering list of leaves accepted by the user-defined function. User-defined function should decide, given entry type and its bound, if a node should be visited, should be walked over as a whole (as if to decide to walk over any descendants of this node unconditionally), or a leaf should appear in the result. If you want to issue some specific locate query on a tree, start here.

walkfold/3

walkfold(WalkFun, Acc, X3::rtree()) -> Acc
  • WalkFun = fun((node | leaf, erstar_bound:bound(), any(), pos_integer(), Acc) -> Result)
  • Result = {ok, Acc} | {descend, Acc} | {done, Acc}
  • Acc = any()

Walks over nodes and leaves in a tree, simultaneously folding over these which were accepted by the user-defined function. User-defined function should decide, given entry type, its bound, thing it contains, its level in a tree and the accumulator, what it wants to do further. If {ok, Acc} is returned, the walk operation continues over other siblings in a tree. On the other hand, if {descend, Acc} is returned, operation continues with all children of the current node. It is not allowed to descend when walking over a leaf. Finally, {done, Acc} ends the walk operation instantly, returning Acc as the final result.

This operation is a hybrid of fold/3 and walk/2.

See also: fold/3, walk/2.