Skip to main content

Binary trees

Tree is a dynamic data structure composed of multi-levels of linked lists. A tree is defined as a node which consists of a value together with a list of references (pointers) to other such nodes. The top most node is called root of the tree. Nodes are called parent if they points to other nodes, which are called children.

The simplest type of tree is a binary tree. A node in binary tree can have a maximum of two references (called children), denoted by left and right. Nodes without children is called leaf. The parent, grand parent, ... of a node is called ancestor nodes.

Depth: the depth of a node is the number of its ancestors. It is also the number of edges from that particular node to the root node. Root node has depth 0.

Height: height of a node is the number of edges in the longest downward path. It is also the maximum depth of subtree rooted on that particular node. All leafs have height 0.

Size: of a node is the number of nodes (including the node itself) in the subtree rooted on that node.

Set vs. Sequence: when we work on a set interface, we look for items in a tree using the key. On the other hand, in case of a sequence implementation, we look for items by its index. When looking for items by index, size of a subtree is important. Below is a pseudo code to find item by index:

tree *search(tree *node, i) {
size_left = size(node->left)
if (i < size_left) {
search(node->left, i)
} else if (i == size_left) {
return node;
} else { // i > size_left
search(node->right, i - size_left - 1)
}
}

Subtree augmentation: we can maintain certain properties of a subtree as a data member of a node, e.g, size of a subtree, minimum, maximum. Then we can calculate size of any tree/subtree in constant time. Whenever, we insert or delete new item to the tree, we need to update those node properties as well. We must be able to efficiently (time complexity better than or same as O(h)\mathcal{O}(h)) maintain such properties during tree modification. There are properties that we can not maintain efficiently such as index of a node; that would be O(n)\mathcal{O}(n) operation.

Binary Search Tree

A Binary Search Tree (BST) is a binary tree with following properties:

  1. All node keys are distinct
  2. The left subtree of a node contains only keys less than its key
  3. Right subtree contains only keys greater than current node key
  4. Both left and right subtrees are also binary search tree.

Example:

X here represents NULL pointer.

Since BST provides an ordering among the node keys, operations such search, find minimum, maximum becomes easier. The search depth of BST depends on the shape of the tree. If a tree is balanced the complexity is O(h)\mathcal{O}(h). In the worst case, when the tree is highly unbalanced (a singly linked list) the time complexity of search is O(n)\mathcal{O}(n).

src/c/data-structure/binary-tree.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct tree
{
int key;
struct tree *left, *right;
} tree;

tree *create_tree(int key)
{
tree *node = (tree *)malloc(sizeof(tree));
assert(node);
node->key = key;
node->left = NULL;
node->right = NULL;

return node;
}

tree *insert_node(tree *node, int key)
{
if (node == NULL)
{
return (create_tree(key));
}

if (key < node->key)
{
node->left = insert_node(node->left, key);
}
else if (key > node->key)
{
node->right = insert_node(node->right, key);
}

return node;
}

void print_tree(tree *node)
{
if (node == NULL)
{
return;
}
else if (node->left == NULL && node->right == NULL)
{
printf("%d", node->key);
return;
}
else
{
printf("%d", node->key);

if (node->left != NULL && node->right != NULL)
{
printf("[");
print_tree(node->left);
printf(", ");
print_tree(node->right);
printf("]");
}
else if (node->right == NULL /* && (node->left != NULL) */)
{
printf("[");
print_tree(node->left);
printf("]");
}
else /* if (node->left == NULL && (node->left != NULL)) */
{
printf("[");
print_tree(node->right);
printf("]");
}
}
}

// depth first search
tree *search(tree *node, int key)
{
if (node == NULL || node->key == key)
{
return node;
}

if (key < node->key)
{
return search(node->left, key);
}
else
{
return search(node->right, key);
}
}

// build queue data structure
typedef struct queue
{
struct tree *item; // content of queue, a tree pointer
struct queue *next; // pointer to next queue item

} queue;

void enqueue(queue **q, tree *node)
{
queue *p = *q;
queue *r = (queue *)malloc(sizeof(queue));
assert(r);
r->item = node;
r->next = NULL;

if (p == NULL)
{
*q = r; // if the queue is empty, newly allocated item is the queue
}
else
{
while (p->next) // go to the end of queue and attach new item
{
p = p->next;
}
p->next = r;
}
}

tree *dequeue(queue **q)
// removes first item from queue and returns the item
{
queue *p = *q;

if (p)
{
*q = p->next;
}

return p->item;
}

int queue_length(queue *q)
{
int len = 0;

while (q)
{
q = q->next;
len++;
}

return len;
}

// breadth first traversal to print binary tree level by level
void print_tree_levels(tree *root)
{
int number_of_nodes;
if (root == NULL)
{
return;
}

queue *q = NULL;
enqueue(&q, root);

while (1)
{
number_of_nodes = queue_length(q);
if (number_of_nodes == 0)
{
break;
}

while (number_of_nodes > 0)
{
tree *node = dequeue(&q);
printf("%d", node->key);
if (number_of_nodes > 1)
{
printf(", ");
}

if (node->left != NULL)
{
enqueue(&q, node->left);
}

if (node->right != NULL)
{
enqueue(&q, node->right);
}

number_of_nodes--;
}
printf("\n");
}
}

int main()
{
const int SIZE = 8;
int node_keys[8] = {5, 3, 2, 4, 7, 6, 8, 1};
int i;
tree *root = NULL;

root = insert_node(root, node_keys[0]);

for (i = 1; i < SIZE; i++)
{
insert_node(root, node_keys[i]);
}

printf("Print tree by in-order (depth first) traversal order:\n");
print_tree(root);
printf("\n");

tree *find = search(root, 7);
printf("\nSearch: looking for '7' and got '%d'\n", find->key);

printf("\nPrinting tree level by level (breadth first traversal):\n");
print_tree_levels(root);

return 0;
}

Traversal algorithms

Pre-order traversal: nodenode->leftnode->right

In-order traversal: node->leftnodenode->right. In the above tree diagram, in-order traversal would be: 2, 5, 6, 7, 9, 10, 12, 13, 14.

Post-order traversal: node->leftnode->rightnode.

Note that there could be more than one (unique) tree representation for the same traversal order.

Above traversal algorithms are depth first algorithm, which use a stack for back-tracking. They can be implemented by recursion as shown in the above code example.

Balanced binary tree (AVL tree)

How do we ensure O(h)=O(logn)\mathcal{O}(h) = \mathcal{O}(\log n), where hh is the height of a tree and nn is the total number of items in the tree? We know that in case of a balanced binary tree O(h)=O(logn)\mathcal{O}(h) = \mathcal{O}(\log n). There are several category of balanced binary trees.

Rotation: We must ensure the traversal order after rotation. We must ensure that traversal order is unchanged after rotation. In case of AVL tree, we maintain the height balance. Skew of a node is defined as height(node->right) - height(node->left). We like to maintain the skew of any node to be either 0, or 1 or -1. If we can maintain height balance, O(h)=O(logn)\mathcal{O}(h) = \mathcal{O}(\log n), indeed.

tree-rotation

After rotation we need to update the subtree properties of A, B, and all their ancestors, which could be done in O(logn)\mathcal{O}(\log n) time.

Resources