Skip to content

Latest commit

 

History

History
298 lines (235 loc) · 9.11 KB

README.md

File metadata and controls

298 lines (235 loc) · 9.11 KB

Build Status

Note: Although this README mostly talks about Python, the included files avl_.h and avl_.c implement a completely generic C Library that can easily be used outside of Python.

AVL Tree Objects for Python

This package implements a dual-personality builtin object for Python, using AVL trees. AVL trees (named after the inventors, Adel'son-Vel'skii and Landis) are balanced binary search trees, where the difference in height between any one node's left and right branches is kept less than or equal to one. Whenever a key is inserted into a position in a way that would upset this balance, a 'rotation' is performed. These rotations happen less than half the time when inserting. Deletion is similar, but may result in up to log(n) rotations.

These trees can be used to implement dictionaries with fast lookup and deletion of any single item, but can also act like sequential lists. (In fact, the printed representation is just like that of a list).

Each of the following operations can be done in O(log n) time:

  1. Find an item having a given key
  2. Find the kth item, given k
  3. Insert an item at a specified place
  4. Delete a specified item

Other operations are also possible in comparable time, including concatenation and splitting.

Objects are inserted and deleted based on their order with respect to the builtin Python compare function, PyObject_RichCompare(). The __getitem__ method is provided, and returns a list rather than a new tree object. __getitem__ woprks with slices also.

Each node uses four pointers (key, left, right, parent) and a long (rank and balance), so the size overhead is probably comparable to that of Python's dictionary object [which uses two pointers and a long, and resizes whenever more than half full]. Another difference is that the dictionary object holds (key, value) pairs, but avl objects hold only single object (the key).

The algorithms are taken directly from Knuth's "Art of Computer Programming, Volume 2: Searching and Sorting", and were prototyped using Python (avl_tree.py). They were then translated directly into C (avl_.c, avl.h), and a Cython module wrapper was written (avl.pyx, avl.pxd)

Several of the newer methods rely on an interesting order analysis for the get_successor() and get_predecessor() functions in avl_.c. If you're an analysis of algorithms guru, please take a look at the comments in avl_tree.py and check my work! 8^)

The avl 'library' (avl_.c, avl_.h) contains nothing Python-specific, so it should be useful for other projects. (as if you'd want to after becoming a Python convert!). However it was necessary to implement a few methods (getslice, concat) directly in avl.pyc.

Building:

Just the usual

python setup.py install

Usage:

>>> t = avl.newavl()
>>> t
[]
>>> t.insert(50)
>>> t
[50]
>>> t.insert(45)
>>> t
[45, 50]
>>> t.remove(50)
>>> t
[45]

>>> t = avl.newavl()
>>> for x in range(20): t.insert(random.randint(0, 100))
...
>>> t
[3, 8, 12, 15, 15, 29, 32, 42, 44, 48, 50, 53, 55, 57, 69, 74, 75, 76, 79, 81, 88]
# [Note: print_internal_structure() prints directly to stdout,
#   so it doesn't work in pythonwin, but will work with the
#   console-mode python.exe]
>>> t.print_internal_structure()
                                       +-[- 88 001]
                          +-[/ 81 001]-|
             +-[\ 79 008]-|
             |            |                         +-[- 76 001]
             |            |            +-[- 75 002]-|
             |            |            |            +-[- 74 001]
             |            +-[- 69 004]-|
             |                         |            +-[- 57 001]
             |                         +-[- 55 002]-|
             |                                      +-[- 53 001]
+-[- 50 011]-|
             |                                      +-[- 48 001]
             |                         +-[/ 44 001]-|
             |            +-[/ 42 002]-|
             |            |            +-[- 32 001]
             +-[- 29 006]-|
                          |                        +-[- 15 001]
                          |           +-[- 15 002]-|
                          |           |            +-[- 12 001]
                          +-[/ 8 002]-|
                                      +-[- 3 001]

# [now isn't that pretty?.]
# FYI, the leftmost char in each node represents the 'balance factor'
# for that node, whether the tree is heavy on the left ('\'), right
# ('/'), or not at all ('-').  the middle item is 'repr(key)', and the
# final number is the 'rank' of the node, used internally to quickly
# locate a node of a certain index - it's the number of nodes in the
# left subtree, plus one.

# create a new tree from a list.
# Note: this sorts the list as a side-effect, if you don't
# want that, use a copy (i.e., 'l = avl.from_list(my_list[:])')

>>> t = avl.newavl (range(30))
>>> len(t)
30
>>>

# You can address individual elements in the sequence:
# Note: you cannot assign to the items in the tree, as
# this might change the relative order of the keys.

>>> t[25]
25
>>> t[-1]
30

# You can take slices of the list. The default slice
# operation returns a new avl tree, not a list.
# [If you want a list, use the slice_as_list() method]

>>> random_tree[3:8]
[45, 87, 29, 73, 12]
>>> type(_)
<type 'AVL tree'>

# You can insert any python object:

>>> t = avl.newavl(range(10))
>>> t.insert('Hey, where will this go?')
>>> t
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'Hey, where will this go?']
>>> 

# is a certain element in the tree?

>>> t.lookup (5)
5
>>> t.lookup (34)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
KeyError: 34
>>> t.has_key (5)
1
>>> t.has_key (34)
0
>>>

lookup is a little tricky, because it returns the first object in the tree that compares equal to the key you give. For builtin objects, the builtin compare is used to order the tree, but by using your own objects with your own cmp functions, you can do all sorts of interesting things.

Note: It is important that you not directly change an object stored in an avl tree in a way that would change its ordered position. If you need to do this, first remove() the item, change it, then reinsert it! This is still very fast, as both operations are O(log n)

The results of such an illegal modification are unspecified, and are likely to crash the program. The only way to completely avoid this problem is to use only immutable types. This restriction seemed too severe to enforce. So be careful out there!

# to copy a tree:
>>> t2 = avl.newavl(t)

# to concatenate two trees
>>> t3 = t1 + t2
# Note: this is the equivalent of creating a
#  copy of t1, and then inserting all the elements
#  of t2 in turn.

# the 'repeat' operation (t1 * 5) is currently undefined
# drop me a line if you need it. (and describe what you think
# it should do)

release 2.0 (february 1997)

Major changes to the C library have been applied for this version. The comparison function is now a member of the avl_tree structure, instead of being passed in with all of the API functions. An extra compare_arg pointer is included to allow attaching 'state' to the compare function - this is a bit ugly, but was necessary to cleanly interface to python.

Several new functions are visible from python, including two from David Ascher david_ascher@brown.edu (at_least() and at_most()).

# 't.span()' returns a pair of indices into the tree that 'span' the
# given key or key pair.  For example:

>>> t
[1,2,3,4,5]
>>> t.span(4)
(3,4)
>>> t[3:4]
[4]
>>> t = avl.newavl(map (lambda x: random.randint(0,1000), range(10)))
>>> t
[15, 34, 371, 536, 659, 691, 714, 754, 847, 936]
>>> t.span(200,500)
(2, 3)
>>> t[2:3]
[371]
>>> t.span(300,700)
(2, 6)
>>> t[2:6]
[371, 536, 659, 691]
>>> 

# at_least() returns the first object comparing greater to or equal to the <key> argument.

>>> t.at_least (400)
536

# at_most() returns the first object comparing less than or equal to the <key> argument.
>>> t.at_most (800)
754

release 2.1.0 (Jun 2005)

Berthold Höllmann finally spurs me into doing the long-awaited merge.

release 2.1.1 (Jun 2005)

Lots of fixes/changes merged from various sources.

Only one change in functionality:

  • insert_by_key() now returns the index of insertion.

Thanks To:

  • Berthold Höllmann (modernization, 64-bit fixes, setup.py)
  • Charlie Kemp (another setup.py)
  • Paul Cameron (subtle bug when removing non-existent key )

And From IronPort:

  • Martin Baker (made slicing match Python's, other changes)
  • Eric Huss (refcount touch-ups, lots of other work)
  • Sam Rushing

release 2.1.2 (Aug 2005)

Fixed a bug reported by Kenneth Duda:

import avl
z = avl.newavl( None, lambda x,y: cmp(x[0],y[0]) )
z.insert('hello')
z.span('h')

exceptions.SystemError: 'NULL object passed to Py_BuildValue'

release 2.1.3 (Sep 2005)

Win32 compilation issue reported by Berthold Höllmann:

  • AVLmodule.c(855) : error C2099: initializer is not a constant

release 2.2.0 (May 2019)

Changing wrapper code to Cython, which also makes the module compatible to Python 3.