33 Binary Search Trees

33 Binary Search Trees

DEFINITION: A binary search tree is a binary tree with the further property that, for every node,

the value of the element in the left child is less than the value of the element at the node, and
the value of the element in the right child is greater than the value of the element at the node.

Here's a definition of a class for binary search trees:

template <class Etype>
class Binary_Search_Tree
{
  protected:
    void Make_Empty(Tree_Node<Etype> * &);
    Tree_Node<Etype> * Find(const Etype &, Tree_Node<Etype> * ) const;
    Tree_Node<Etype> * Find_Max(Tree_Node<Etype> *) const;
    Tree_Node<Etype> * Find_Min(Tree_Node<Etype> *) const;
    void Insert(const Etype &, Tree_Node<Etype> * &);
    void Remove(const Etype &, Tree_Node<Etype> * &);

Tree_Node<Etype> *Root;
Tree_Node<Etype> *Last_Find;

  public:
    Binary_Search_Tree( ) : Root( NULL ), Last_Find( NULL ) { }
    Binary_Search_Tree( Binary_Search_Tree & Value );
    virtual ~Binary_Search_Tree( )
      { Make_Empty( Root ); Root = Last_Find = NULL; }
    const Etype & operator ( ) ( ) const
       { return Last_Find != NULL ? Last_Find>Element : NULL; }
     virtual int Find( const Etype & X )
       { return int( Last_Find = Find( X, Root ) ); }
    virtual int Find_Max()
      { return int (Last_Find = Find_Max(Root)); }
    virtual int Find_Min()
       { return int (Last_Find = Find_Min(Root)); }
    virtual void Insert( const Etype & X )
      { Insert( X, Root ); }
    void Make_Empty( )
       { Make_Empty( Root ); Root = Last_Find = NULL; }
    virtual void Remove( const Etype & X )
      { Remove( X, Root ); }
};

The Root of the tree is kept as a protected member of the class.

The constructor Binary_Search_Tree( ) simply sets Root and Last_Find to NULL.

As with the List class, a pointer Last_Find is kept to the last node
accessed in the Find, Find_Max, and Find_Min operations. The Find operations return Boolean, not the node found.

A number of member functions are overloaded. For example, Insert is

33.1 Insert

The Insert function is overloaded. The Insert in the public section
passes the element to be inserted and Root to the protected section Insert function.

Insert traverses the tree rooted at T to find the insertion point. If X is already
in the tree, it is not inserted. When the insertion point is found, a new Tree_Node
is created and "attached" at the insertion point.

template <class Etype>
void Binary_Search_Tree<Etype>::
Insert( const Etype & X, Tree_Node<Etype> * & T )
{
  if( T == NULL )
  {
    T = new Tree_Node<Etype>( X );
    if( T == NULL )
        Error("Out of space");
  }

else if( X < T>Element )
Insert( X, T>Left );

else if( X > T>Element )
Insert( X, T>Right );
}

QUESTION: Why is T passed as a reference to pointer?

ANSWER:

33.2 Make Empty

This is a recursive definition. It deletes nodes bottomup
(i.e. from the leaves towards the root)
The figure below shows the recursive calls to Make_Empty for a binary search tree.

template <class Etype>
void Binary_Search_Tree<Etype>::
Make_Empty( Tree_Node<Etype> * & T )
{
  if( T != NULL )
  {
    Make_Empty( T>Left );
    Make_Empty( T>Right );
    delete T;
    T = NULL;
  }
}

33.3 Remove

The Remove operation must do more than simply delete a node. That could result in the tree becoming disconnected.
Consider, for example, what would happen if the Root were removed.

When a node is removed, another node must replace it to keep the tree connected.
Moreover, the binary search tree property must be maintained.

When the node to be removed has been found, there are three cases to consider:

           1. The node to be removed is a leaf node.
                  Since it has no children, just delete it
           2. The node to be removed has only one child.
                  Replace it by that child.
           3. The node to be removed has two children
                  Replace it by the smallest node which is larger.
                  This is the leftmost right descendent.

The figure below shows a binary search tree (BST) before and after removal of Root.
Note that the smallest node larger than Root replaced Root.

33.4 Find, Find_Min, and Find_Max

The various public Find operations return a Boolean which indicates
if the node has been found. If it has been found, Last_Find points at it.
The contents of the node can then be retrieved via operator().

Each of the public functions calls the appropriate protected function,
passing it the Root of the tree.

The following are the protected functions:

Find traverses the tree rooted at T, guiding the traversal by the size of the element at each node
compared to the size of the element sought

template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find( const Etype & X, Tree_Node<Etype> * T ) const
{
  if( T == NULL )
    return NULL;
  else if( X < T>Element )
    return Find( X, T>Left );
  else if( X > T>Element )
     return Find( X, T>Right );
  else
    return T;
}

Find_Min traverses the tree rooted at node T,
but only searches along the leftmost edges

template <class Etype>
Tree_Node<Etype>*
Binary_Search_Tree<Etype>::
Find_Min( Tree_Node<Etype> *T ) const
{
  if( T == NULL )
    return NULL;
  else if( T>Left == NULL )
    return T;
  else
  return Find_Min( T>Left );
}

Find_Max traverses the tree rooted at node T,
but only searches along the rightmost edges

template <class Etype>
Tree_Node<Etype>*
Binary_Search_Tree<Etype>::
Find_Max( Tree_Node<Etype> *T ) const
{
  if( T != NULL )
    while( T>Right != NULL )
      T = T>Right;
  return T;
}

33.5 Asymptotic Analysis of Binary Search Trees

BST operations on any single node are O(d), where d is the

depth of the node.

The worst case depth of a node in a BST is O(n)

The average case depth of a node in a BST is O(lg n)

33.5.1 Enumeration of Binary Trees

In order to understand what is meant by the "average" binary tree, we
must look at all possible binary trees.

The figure below shows all the binary trees of three nodes. Since these trees are binary trees, not binary search trees, they are differentiated strictly on

structure, not on values at the nodes.

The figure below also shows the binary search trees constructed from all the permutations of {1, 2, 3}. The six permutations are {1,2,3}, {1,3,2}, {2,1,3}, {2,3,1}, {3,1,2}, and {3,2,1}. These trees are differentiated on the basis of the values at the nodes.

The number of threenode binary trees is not the same as the number of hreenode binary search trees.

33.5.2 Averagecase Analysis

The average depth over all nodes in a binary search tree is O(lg n)
(assuming all BST's of n nodes are equally likely.

Average depth over all nodes means

1/N * SUM_i=1^N Depth(t_i), where Depth(t) is
the depth of node t

DEFINITION: The internal path length (IPL) of a tree is the sum of the
depths of all the nodes in the tree.
Average depth of a tree can therefore be given as IPL / N

Theorem 31.1 The internal path length of a binary tree T which has n nodes is
D(n) = D(i) + D(n - i - 1) + n - 1
where i is the number of nodes in the left subtree of T

Proof:

          D( n ) = D( i )               // IPL of left subtree
                   + D( n - i - 1 )     // IPL of right subtree
                   + i                     // 1 for each node of left subtree
                   + (n - i - 1)         // 1 for each node of right subtree

                   = D( i )  + D( n - i - 1) + n - 1

Note that D(1) = 0.

For binary search trees, all subtree sizes are equally likely. This is not true for binary trees.
We demonstrate by example. Consider binary trees of size 3.
Let P ( i , j ) be the probability of a left subtree of size i and a right subtree of size j.

Examining the figure above, the probabilities for binary trees are

P(1, 1) = 1/5 , P(2, 0) = 2/5, and P(0, 2) = 2/5

the probabilities for binary search trees are

P(1, 1) = 1/3, P(2, 0) = 1/3, and P (0, 2) = 1/3

Theorem 33.2 In binary search trees, the average value of D(n) is O(n lg n)
Proof: Theorem 33.1 gives a recurrence for D(n) based on the sizes of left and rightsubtrees.
Since all subtrees are equally likely for binary search trees,
the average value of both D( i ) and D( n - i - 1 ) is

1/n * SUM_j=0^n-1 D(j)
therefore the average value of D(n) is

2 * 1/n SUM_j=0^n-1 D(j) + n - 1

It can be shown by this recurrence D(n) = O(n lg n)

Theorem 33.3 D( n ) = 2/n SUM _j=0^n-1 D( j ) + n - 1 = O(n lg n)
Proof: Multiply by n, then evaluate for (n - 1) to get 2 equations

nD(n) = 2 SUM _j=0^n-1 D(j) + n² - n
and

(n - 1)D(n) = 2 SUM _j=0^n-2 D(j) + (n - 1)² - (n - 1)

Subtract the second equation from the first to get

nD(n) - (n - 1)D(n - 1) = 2D(n - 1) + 2n + 2

Rearrange, dropping insignificant constant 2 and dividing by n(n - 1)

             D(n)      =   D(n - 1)  +      2
            --------     -----------    --------
             n + 1           n            n + 1

We can now write the following sequence of equations by repetitively
substituting n - 1 for n

             D(n)      =   D(n - 1)  +      2
            --------     -----------    --------
             n + 1           n            n + 1

             D(n-1)      =   D(n-2)  +      2
            --------        --------    --------
               n             n -1            n 

             D(n-2)      =   D(n - 3)  +      2
            --------       -----------    --------
             n - 1             n-2          n - 1

               D( 2 )     =     D( 1 )    +     2
               --------        -------        ------
                   3               2             3

Adding these (remembering that D(1) = 0 yields

D(n) = 2 SUM_i=3ⁿ⁺¹ 1/i = ln(n + 1)
n + 1

Therefore, D(n) = O(n lg n)

It is NOT true that the average depth of the nodes in ANY
binary search tree is O(lg n). Consider the binary search tree
formed by inserting keys in sorted order. The average depth in
this tree is O(n).

It is true that, averaged over all binary search trees, the
expected depth of any node is O(lg n)

34 Attempts to Guarantee O(lg n) Behavior

There are a number of variations on the binary search tree which attempt
to guarantee that operations on the tree can be done in O(lg n) time.

Tight Balance Rearrange the binary search tree so it has no
extraordinarily long paths. The AVL tree uses this approach and
guarantees that the depth of the tree is O(lg n).

Loose Balance Rearrange the binary search tree so it is pretty well
balanced. Redblack trees do this and guarantee O(lg n) tree height.
The constant of proportionality is not as good as for AVL.

Splay No explicit balance conditions are maintained. Rather, the tree is
reorganized on each operation in such a way that the amortized cost
of each operation is O(lg n). This means that n operations will cost
O(n lg n).