Home Classroom Algorithms


1. Introduction

In the last article, we looked at the Stack, which is a LIFO (Last In First Out) structure. This means that the element which goes in last (and is therefore on the top) comes out first (when popped). We will now have a look at a data structure called the Queue, and its variants.

2. Queue

Recollect how a Stack had a ‘top’. Queue is a data structure with two points, a ‘front’ and a ‘rear’. Elements are added to the rear and removed from the front. Thus, the elements are removed in the order of their respective insertion into the queue. Hence, it can be said that the Queue is a FIFO (First In First Out) data structure.

2.1 Operations

Thus, there are two basic operations in a queue:

i) Enqueue: To insert an element at the rear of the queue. The complexity of this operation is expected to be O(1).

ii) Dequeue: To remove an element from the front of the queue. The complexity of this operation is expected to be O(1).


A queue and its operations.

2.2 Implementation

As with a stack, a queue can also be represented using either an array or a linked list of elements.

The linked-list implementation should be fairly simple to understand if you have grasped the concept of linked lists. The array implementation of the queue keeps two pointers, front and rear, which point to indices within the array that act as the front and the rear. In the dequeue operation, the front pointer is incremented by one, and in the enqueue operation, the rear is incremented by one. However, once rear reaches the fixed boundary of the array, it can no longer be incremented (even though there may still be free space in the array) until certain special adjustments are made.

To use the array without any problems, we implement a slightly different way of manipulating the front and rear pointers. This concept is known as a Circular Queue or in general, a Circular Buffer.

The array is considered to be a ‘circular’ one, joined end-to-end. So, once the rear reaches the boundary, it simply goes to the next position which is the beginning of the array (provided front does not lie there), and this process can go on until rear meets front.


A Circular Queue

After repeated deletions, the front may reach the boundary as well, and it wraps around in the same way.
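A minimal sketch of an array-based circular queue in C++ (the capacity MAXSIZE and all names are illustrative; a separate count of stored elements is kept so that a full queue can be told apart from an empty one):

#include <iostream>
using namespace std;

const int MAXSIZE = 100;          // capacity of the underlying array

struct CircularQueue {
    int elements[MAXSIZE];
    int front;                    // index of the front element
    int rear;                     // index of the rear (most recently added) element
    int count;                    // number of elements currently stored
};

void queue_init(CircularQueue &q) {
    q.front = 0;
    q.rear = -1;
    q.count = 0;
}

bool enqueue(CircularQueue &q, int e) {
    if (q.count == MAXSIZE) {             // queue is full
        cout << "Queue Overflow";
        return false;
    }
    q.rear = (q.rear + 1) % MAXSIZE;      // rear wraps around past the boundary
    q.elements[q.rear] = e;
    ++q.count;
    return true;
}

bool dequeue(CircularQueue &q, int &e) {
    if (q.count == 0) {                   // queue is empty
        cout << "Queue Underflow";
        return false;
    }
    e = q.elements[q.front];
    q.front = (q.front + 1) % MAXSIZE;    // front wraps around in the same way
    --q.count;
    return true;
}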

2.3 Other types of Queues

A double-ended queue allows enqueuing and dequeuing at both the front and the rear. It is also known as a deque (pronounced “deck”). The complexity of both operations remains O(1).

A priority queue requires each element to be added with a ‘priority’. When an element is to be removed, the element with the greatest priority is removed first, regardless of when it arrived. A simple implementation is O(N) for Enqueue and O(1) for Dequeue if a list of elements sorted by priority is maintained, where N is the number of elements in the queue. O(1) Enqueue and O(N) Dequeue is also common when a sorted list is not maintained and the element with the highest priority is looked up in the list at the time of dequeuing.

However, common implementations use a special data structure called Heap, which gives O(log N) insertion and removal complexities.
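In the C++ STL, std::priority_queue is such a heap-backed priority queue; by default the largest value is treated as the highest priority. A small usage sketch:

#include <iostream>
#include <queue>
using namespace std;

int main() {
    priority_queue<int> pq;   // backed by a heap: push and pop are O(log N)
    pq.push(3);
    pq.push(14);
    pq.push(9);
    cout << pq.top() << "\n"; // prints 14, the highest-priority element
    pq.pop();                 // removes 14
    cout << pq.top() << "\n"; // prints 9
    return 0;
}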

2.4 Standard Implementations

The queue is implemented in the C++ STL (Standard Template Library) as the std::queue container adapter, which provides push (enqueue at the rear), pop (dequeue from the front), front and back. The double-ended queue is available as std::deque, with push_front, push_back, pop_front and pop_back. There are implementations of queues in most modern languages; however, one is expected to be comfortable with implementing them without external help.
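A brief usage sketch of both containers:

#include <deque>
#include <iostream>
#include <queue>
using namespace std;

int main() {
    queue<int> q;          // FIFO queue
    q.push(1);             // enqueue at the rear
    q.push(2);
    cout << q.front();     // prints 1, the element that will be dequeued next
    q.pop();               // dequeue from the front

    deque<int> d;          // double-ended queue
    d.push_back(1);        // insert at the rear
    d.push_front(2);       // insert at the front
    d.pop_back();          // remove from the rear
    d.pop_front();         // remove from the front
    return 0;
}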

3.0 Tasks for You
  1. Implement the Circular Queue
  2. Implement Double-Ended Queue.
4.0 References

[1] http://en.wikipedia.org/wiki/Queue_(data_structure)


1. Introduction

A data structure is, as the name suggests, a structure for storing data. Every data structure has a particular way of storing data, and the data is retrieved and manipulated using certain operations defined for that data structure. The efficiency of these operations decides the suitability of a data structure for a particular purpose.

 

2. Linked List

A linked list is a data structure which consists of a sequence of elements. Each element contains data and a link to the next element in the list. The links are maintained using pointers. The first element in the list is known as the head, and a pointer to the head is stored in order to perform operations on the linked list.

As we can see, the elements are linked together. The last link is a null pointer, signifying the end of the list. Since we only have the head pointer with us, we cannot move to an arbitrary element in the list directly; we need to start from the head and move sequentially from one element to the next until we arrive at the desired element. This is known as traversing the list.

The most important operations on a Linked List are:

a) Insert: Insert an element at the beginning / middle / end of the list. Insertion at the beginning is O(1). If a pointer to the last element is stored, insertion at the end can also be done in O(1). To insert in the middle, we need to traverse the list, so if there are n elements in the list, the insertion is O(n).

b) Delete: Delete an element from the beginning / middle / end of the list. Deletion has the same complexity as insertion.

 

Other types of Linked List:

a) Doubly Linked List:
A normal linked list has a link only to the next element. In a doubly linked list, each element has a link to the next and previous element.

This type of linked list allows traversal in both directions.

b) Circular Linked List:
In a circular linked list, the last element links to the head of the linked list. Thus, the next element after the last element, is actually the first one in the list.

#include <iostream>
using namespace std;

typedef int element;   // 'element' is a user-defined data type; int is used here as an example

struct node {
    element e;
    node *next;
};

node *head = NULL;     // pointer to the first element of the list

void insert_in_front(node *n) {
    if (head == NULL) {            // empty list: n becomes the only element
        head = n;
        n->next = NULL;
        return;
    }
    n->next = head;                // link n before the current head
    head = n;
}

void delete_in_front() {
    if (head == NULL) {
        cout << "Cannot delete";   // nothing to remove
        return;
    }
    node *ret = head;
    head = head->next;
    delete ret;                    // use free(ret) instead if the node was allocated with malloc
}
element is a user-defined data-type. You can use templates to make it generic.
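For the doubly linked list described above, each node additionally keeps a pointer to the previous element. A minimal sketch, reusing the element type from the listing above (the names dnode and dhead are just illustrative):

struct dnode {
    element e;
    dnode *next;    // following element (NULL at the tail)
    dnode *prev;    // preceding element (NULL at the head)
};

dnode *dhead = NULL;

void dinsert_in_front(dnode *n) {
    n->prev = NULL;
    n->next = dhead;
    if (dhead != NULL)
        dhead->prev = n;   // the old head now points back to the new node
    dhead = n;
}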

 

3. Stack

A stack is a data structure in which elements are added to and removed from the ‘top’ of the stack. We can only access the element at the top of the stack. A stack can be maintained using a linked list restricted as mentioned above, i.e. insertion and deletion happen only at the front of the list. A stack can also be implemented using an array.

 

The two operations involved in a stack are:

a) Push: Adding an element to the top of a stack is known as pushing an element. Pushing an element is an O(1) operation. If the number of elements in a stack exceeds the desired limit (or if it runs out of space when implemented using an array), it is known as a stack overflow error.

b) Pop: Removing an element from the top of a stack is known as popping an element. Popping an element is an O(1) operation. If an element cannot be popped (because the stack is empty), it is known as a stack underflow error.

c) Peek: This operation returns the element at the stack top. This is also an O(1) operation.

#include <iostream>
using namespace std;

const int MAXSIZE = 100;   // capacity of the stack
typedef int element;       // 'element' is a user-defined data type; int is used here as an example

struct Stack {
    int stack_top;          // index of the current top element; -1 when the stack is empty
    element stack[MAXSIZE];
};

void stack_init(Stack &s) {
    s.stack_top = -1;
}

int stack_push(Stack &s, element e) {
    if (s.stack_top == MAXSIZE - 1) {   // no room left in the array
        cout << "Stack Overflow";
        return -1;
    }
    s.stack[++s.stack_top] = e;         // advance the top, then store the element
    return 1;
}

int stack_pop(Stack &s, element &e) {
    if (s.stack_top == -1) {            // nothing to remove
        cout << "Stack Underflow";
        return -1;
    }
    e = s.stack[s.stack_top--];         // return the top element, then shrink
    return 1;
}

int stack_peek(Stack &s, element &e) {
    if (s.stack_top == -1) {
        cout << "Empty stack";
        return -1;
    }
    e = s.stack[s.stack_top];
    return 1;
}

 

4. Tasks for you:
  1. Complete the code for Linked List to include insertion and deletion in middle and at rear.
  2. Extend the Linked List functions for Doubly Linked List and Circular Linked List.
  3. Study the Josephus’ problem and solve it using Circular Linked Lists.
  4. Using stacks, find the reverse of a string.
  5. A stack can be used to check if parentheses are well-formed. For example, “(())” and “()(())” are well-formed parentheses strings, but “((()” is not. Given a string, determine whether it is a well-formed parentheses string using a stack.


Elementary Search Algorithms and Hash Tables

 

1. Introduction

Till now, we have been discussing sorting algorithms. There is another important category of algorithms known as Search Algorithms. These deal with the problem: ‘Given a list of elements, find if a particular element is present in the list’.

 

2. Linear / Sequential Search

The linear search algorithm is the simplest search algorithm. It checks each element sequentially and compares it with the element to be searched. Hence, it tries every element in the list. Of the approaches discussed here, it has the worst asymptotic behavior: the cost of the search is proportional to the number of elements in the list. In other words, the complexity is O(n).

 

The linear search algorithm falls under the category of brute-force algorithms. A brute-force algorithm is one which enumerates all possible solutions and then chooses a suitable one. In some cases there are algorithms which exploit certain properties of the problem to eliminate the need to enumerate certain parts of the search space, and therefore achieve a better complexity. Thus, wherever possible, a non-brute-force solution is preferred.

 

Linear-Search(List L, Element E)
1. For I = 1 to L.Size
1.1 If L[I] == E
Return I
2. Return -1

 

3. Binary Search

Binary Search is yet another application of the Divide-and-Conquer paradigm. It is also quite intuitive. Given a sorted list of elements, Binary Search can find whether a particular element exists in the list in O(log n) time.

 

Binary-Search(List L, Element E)
1. Low = 1, High = L.Size
2. While(High > Low)
2.1 Mid = (High + Low)/2
2.2 If(E > L[Mid])
2.2.1 Low = Mid + 1
2.2.2 Continue
2.3 If(E < L[Mid])
2.3.1 High = Mid - 1
2.3.2 Continue
2.4 Return Mid
3. If(L[Low] == E) Return Low
4. Return -1

 

The algorithm requires the list to be sorted. Initially, the element can be anywhere between positions 1 and L.Size (inclusive). However, we can reduce the problem to a smaller size by choosing a ‘probe’ element, such as the middle element of the list. Suppose Mid = (1 + L.Size)/2. If the element at position Mid is less than the element E to be searched, then it is easy to see that if E exists in the list, it must lie between positions (1 + L.Size)/2 + 1 and L.Size (inclusive).

Similarly, if the element at Mid is greater than E, then E, if it exists in the list, must lie between positions 1 and (1 + L.Size)/2 - 1 (inclusive). The search continues in the same fashion until the search space is reduced to one element or the desired element is found.

 

Thus binary search halves the size of the problem in every iteration. Hence, given a list of size n, binary search will take at most log₂ n iterations to decide if the element exists in the list. Thus the complexity of the algorithm is O(log n).
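For reference, a sketch of the same idea in C++, using 0-based indices; it returns the position of E in the sorted array L of size n, or -1 if E is not present:

int binary_search(const int L[], int n, int E) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // written this way to avoid overflow of low + high
        if (L[mid] == E)
            return mid;
        else if (L[mid] < E)
            low = mid + 1;                  // E can only lie in the right half
        else
            high = mid - 1;                 // E can only lie in the left half
    }
    return -1;
}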

 

4. Hash Tables

A Hash Table is a data structure which makes use of a Hash Function. A Hash Function simply maps the key of the element to an index referring to the position in the array where that particular element will be stored. What this implies is, given an element to search, we can use the Hash Function to compute the index where it was supposed to be stored. We can then check if the element exists at that position or not.

 

However, there are two problems associated with this. Firstly, the array size is usually finite, and the hash function should map all keys to indices within this range. Secondly, because the key space is often larger than the array size, and also because of imperfections in the hash function, not all keys map to a unique index. Thus, more than one key may map to the same index. This is known as a collision.

 

Let us take an example of a hash function for a string s of up to 10 characters.

H(s) = Sum of ASCII values of all the characters of s.

For this hash function, we need an array of 256*10 = 2560 elements, which is a reasonable size. For example, if s = “cd”, H(s) = 199. But for s = “be”, H(s) = 199 too. Thus, there is a collision.

 

Hence Hash Tables have to handle collisions. One method to handle them is Chaining. In chaining, instead of placing just one element at an index, we make that particular index a bucket. Hence, it can contain more than one element, linked together by pointers (a linked list). Thus, when we obtain a particular hash value, we visit the bucket at that index and traverse the linked list to check whether the element exists in it.
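A minimal sketch of a chained hash table in C++, using the string hash function from the example above and an std::list as the bucket at each index (the table size and the function names are illustrative):

#include <list>
#include <string>
using namespace std;

const int TABLE_SIZE = 2560;      // array size from the example above

// Hash function from the example: sum of the ASCII values of the characters of s.
int H(const string &s) {
    int sum = 0;
    for (size_t i = 0; i < s.size(); ++i)
        sum += s[i];
    return sum % TABLE_SIZE;      // keep the index within the table
}

list<string> table[TABLE_SIZE];   // each index is a bucket: a linked list of keys

void hash_insert(const string &s) {
    table[H(s)].push_back(s);     // colliding keys simply share a bucket
}

bool hash_search(const string &s) {
    list<string> &bucket = table[H(s)];
    for (list<string>::const_iterator it = bucket.begin(); it != bucket.end(); ++it)
        if (*it == s)
            return true;          // found after traversing the chain
    return false;
}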

 

Let us suppose there are n elements to be added to the hash table, and the table has m indices available for the elements. If the hash function maps these elements uniformly to the indices, there would be n/m elements at each index. Thus, supposing the hash value computation takes O(1) time, searching for the element in the linked list takes O(n/m) time, which works quite well in practice.

 

Chaining
 

5. Tasks for You

1. Read about Ternary Search. Use it to find the maximum of the function f(x) = sin x in the range [0, π] and the minimum in the range [π, 2π].
2. Can you use Binary Search to do the same?

 

6. References

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms, second edition, Prentice-Hall.
[2] http://en.wikipedia.org/wiki/File:Hash_table_5_0_1_1_1_1_1_LL.svg


Divide-and-Conquer and QuickSort

1. Introduction

In the previous article, we discussed finding the complexities of algorithms. In the same article, we also came to know about Selection Sort and Bubble Sort, two simple algorithms with an average-case complexity of O(n²). But we have a better algorithm at our disposal, QuickSort, which has an average-case complexity of O(n log n). Its worst-case behavior is O(n²), but that can be avoided in most cases if it is implemented properly.

 

2. Divide-and-Conquer

QuickSort makes use of the Divide-and-Conquer strategy, which entails breaking the existing problem into two or more sub-problems and then solving the sub-problems independently. Finally, the solved sub-problems are used to solve the bigger problem. (Note: I assume you are familiar with recursion. If you are not, I suggest reading up about recursion from [1].) For example, Merge Sort is a Divide-and-Conquer algorithm. It takes the unsorted list; if the list is of size 2 or less it is trivial to sort, otherwise the list is divided into two sub-lists. These sub-lists are recursively sorted. Once the two sub-lists have been sorted, the bigger sorted list is created by comparing the first element of each sub-list; whichever is smaller is placed into the new list and removed from its sub-list. This process goes on till both sub-lists are empty.

There are many applications of the divide-and-conquer strategy. One of them we have just discussed, and another is QuickSort, as I mentioned. Apart from these, there is Euclid’s algorithm to find the Greatest Common Divisor of two numbers, which can in turn be used to find the Lowest Common Multiple; I suggest you read up on it in case you are not aware of it. Also, the famous Tower of Hanoi problem is a classic example of Divide-and-Conquer. Here I list the algorithm for recursively solving Tower of Hanoi using Divide-and-Conquer.
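A sketch of the recursive procedure in C++ (the tower names Beg, Aux and End follow the description below; disks are numbered 1 to N from smallest to largest):

#include <iostream>
using namespace std;

// Move N disks from tower Beg to tower End, using Aux for intermediate storage.
void Hanoi(int N, char Beg, char Aux, char End) {
    if (N == 0) return;                      // no disks left to move
    Hanoi(N - 1, Beg, End, Aux);             // move the top N-1 disks from Beg to Aux
    cout << "Move disk " << N << " from " << Beg << " to " << End << "\n";
    Hanoi(N - 1, Aux, Beg, End);             // move those N-1 disks from Aux to End
}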

What this does is: Beg is the tower from which the disks are to be transferred to End, using Aux as the auxiliary tower for intermediate storage. The algorithm first transfers the N-1 disks from the top of Beg to Aux, using End as the auxiliary storage. Once this is done, the bottom-most disk is transferred to End. Thereafter, the N-1 disks we had put on Aux are transferred to End, using Beg as the auxiliary. So elegant, isn’t it? Other slightly more complicated examples are Karatsuba’s algorithm to multiply two very large numbers (of the order of 1000 digits) and the Cooley-Tukey algorithm to compute the Discrete Fourier Transform.

 

3. Algorithm

Once you get the hang of what QuickSort does, you will find it to be one of the most intuitive sorting algorithms, which comes with the bonus of good complexity. As I said, QuickSort uses the Divide-and-Conquer paradigm. What it essentially does in each step is choose a ‘pivot’. A pivot can be any element in the list; the list is then divided into two sub-lists: one consists of elements less than the pivot (the left sub-list) and the other consists of elements greater than the pivot (the right sub-list). What is left to do is sort these sub-lists recursively using the same procedure and then concatenate the left sub-list, the pivot and the right sub-list.

It is easy to visualize how this works. The problem is that every iteration eats up O(N) extra space for the new lists. We can improve on that; removing the space requirement will speed up the execution even more. To improve the algorithm, we will use two functions. The first is Partition, which places the pivot at its correct position in the existing array and ensures that all the elements to the left of the pivot are less than it and all the elements to its right are greater than it. The second is the QuickSort function, which chooses the pivot, partitions the list around it, and then recursively sorts the two sub-lists. Let us look at the partition function first.
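A sketch of such a partition function in C++ (assuming an integer array; the pivot is chosen here as the middle element and moved to index right, and the variable names follow the description below):

#include <algorithm>   // for std::swap

// Partitions A[left..right] around a pivot and returns the pivot's final position.
int Partition(int A[], int left, int right) {
    std::swap(A[(left + right) / 2], A[right]);   // choose a pivot and put it at index right
    int pivot = A[right];
    int FinalPosition = left;                     // next slot for a value <= pivot
    for (int i = left; i < right; ++i) {
        if (A[i] <= pivot) {
            std::swap(A[i], A[FinalPosition]);
            ++FinalPosition;
        }
    }
    std::swap(A[right], A[FinalPosition]);        // place the pivot at its final position
    return FinalPosition;
}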

Thus, what this function does is: it first puts the pivot at the index right and maintains a position variable, FinalPosition (initially equal to left). Every time it finds a value smaller than or equal to the pivot, it moves that value to the index FinalPosition and increments FinalPosition. In the end, all the elements before FinalPosition are smaller than or equal to the pivot, so the pivot is finally placed at FinalPosition, and this value is returned. Let us see how the partition function is used to complete the QuickSort.
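A sketch of the QuickSort function built on Partition; for an array of n elements it is called as QuickSort(A, 0, n - 1):

void QuickSort(int A[], int left, int right) {
    if (left >= right) return;               // zero or one element: already sorted
    int p = Partition(A, left, right);        // the pivot is now fixed at index p
    QuickSort(A, left, p - 1);                // recursively sort the left sub-list
    QuickSort(A, p + 1, right);               // recursively sort the right sub-list
}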

I think it is quite apparent that we are making a lot of savings in terms of space as well as time. Since no lists are returned, which could become quite unwieldy as the size of the list grows, we are definitely making better use of space and time.

4. Tasks for You

  1. Read up about the Master Theorem.
  2. Use the Master Theorem to find the average-case complexity of Quick Sort.

 

5. References

[1] Yedidyah Langsam, Moshe J. Augenstein, Aaron M. Tenenbaum, Data Structures Using C and C++, second edition, Prentice-Hall.

[2] http://en.wikipedia.org/wiki/Quicksort

[3] http://en.wikipedia.org/wiki/File:Partition_example.svg


1. What is an Algorithm?

Given an input I, an algorithm is a series of clearly-defined computation steps which transforms it into the desired output O.

For example, given a sequence of numbers I = {9, 18, 3, 14, 1}, find O, where O is the minimum of all the numbers in the set. The algorithm for this problem is trivial to design. We can iterate over all the elements of the set, and find the minimum element.

However, consider the famous sorting problem. Given a sequence of numbers I = {9, 18, 3, 14, 1}, find O, where O is I sorted in ascending order. There are several possible solutions to this problem. One of them can be designed using the algorithm we designed previously: find the minimum number in I, add it to (the end of) O, and remove that element from I. Continue finding the minimum element of I and adding it to the end of O until I is empty. This is actually a well-known algorithm for sorting numbers, known as Selection Sort. There are quite a few algorithms that do the same job, but the key differentiator between any two algorithms is their efficiency. By the end of this article, you will know how we define this ‘efficiency’ and realize why this algorithm is not the best one to use for sorting.

 

2. Complexity Analysis

An algorithm is roughly judged on the basis of two major metrics:

a) Time Complexity: It is an indicator of the running time of the algorithm.
b) Space Complexity: It is an indicator of the computational memory required by the algorithm.

 

2.1 Time Complexity:

The notion of time complexity deals with the running time of the algorithm. It gives us an idea of the number of steps executed by the algorithm. This is basically to judge the algorithm from a machine-independent view.

Let us look at the pseudo-code for the algorithm we designed for finding the minimum number in the sequence I.
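A sketch of the procedure in C++ (assuming the sequence I is stored in an array of n integers, n >= 1):

int find_minimum(const int I[], int n) {
    int min = I[0];                 // candidate minimum so far
    for (int j = 1; j < n; ++j)     // examine every remaining element once
        if (I[j] < min)
            min = I[j];
    return min;
}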

Now, let us associate a cost with the execution of each statement, and the number of times it will get executed. Assume the size of I to be n.

Now, we can see that the total cost of running this algorithm for I of size n is of the form T(n) = an + b, where a and b are constants determined by the costs of the individual statements.

We now realize that the cost of running this algorithm is linear with respect to n. That is, the cost grows linearly with increase in n. For the purpose of complexity analysis, we are interested only in the highest order terms, because as the value of n increases, the contribution made by the lower order terms to the running time becomes insignificant.

Now, let us study the sorting algorithm that we discussed earlier.
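A sketch of Selection Sort in C++, following the description from the introduction (the input is sorted in place rather than being copied into a separate output sequence):

#include <algorithm>   // for std::swap

void selection_sort(int I[], int n) {
    for (int i = 0; i < n - 1; ++i) {
        int min_index = i;                    // position of the minimum of I[i..n-1]
        for (int j = i + 1; j < n; ++j)
            if (I[j] < I[min_index])
                min_index = j;
        std::swap(I[i], I[min_index]);        // move that minimum into position i
    }
}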

Now, reconstructing the table in the same fashion as we did for the previous algorithm, we can prove that the T(n) for Selection Sort is of the form an² + bn + c. Thus, T(n) grows with the second power of n.

 

2.1.1 The Big-O Notation

Though we can express the running-time of any algorithm exactly, as we did earlier, it is usually very tedious. We are only interested in how the running-time grows with the increase in the size of the input. The extra-precision involved in the exact description of T(n) is not worth it. If an algorithm A grows at a slower rate than algorithm B to perform the same task, it will usually be more efficient in performing the task. Thus, what we are looking for is the asymptotic efficiency of the algorithm.

The Big-O notation (pronounced “big-oh”) is one of the most commonly used notations to describe the complexity of an algorithm. According to [1], T(n) = O(g(n)) if there exist positive constants c and n₀ such that 0 ≤ T(n) ≤ c·g(n) for all n ≥ n₀.

g(n) is said to be an upper bound for the running time of the algorithm.

For the above two algorithms, the Big-O notations are O(n) and O(n²) respectively. There are other asymptotic notations for describing the complexity of algorithms; you can read more about them in [1]. If an algorithm has T(n) of the form T(n) = a, its time complexity is said to be O(1); that is, the algorithm operates in constant time. If T(n) is of the form T(n) = anˣ + bnˣ⁻¹ + ... + c, its time complexity is O(nˣ) and the algorithm is said to operate in polynomial time. Similarly, if T(n) = aⁿ + b, its time complexity is O(aⁿ) and it is said to be an exponential-time algorithm. Exponential-time algorithms are the least desirable, and we usually strive to obtain a polynomial-time solution to problems with such algorithms.

 

2.1.2 Best, Average and Worst Case Analysis

Let us look at another algorithm to sort I. This algorithm is known as Bubble Sort and is quite intuitive.

This algorithm basically iterates over the input sequence, and swaps adjacent elements if they are not in order. The sequence is traversed once again if a swap was performed in the last traversal through the sequence.
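A sketch of Bubble Sort in C++ (assuming an integer array I of n elements):

#include <algorithm>   // for std::swap

void bubble_sort(int I[], int n) {
    bool swapped = true;
    while (swapped) {                         // repeat until a full pass performs no swap
        swapped = false;
        for (int j = 0; j + 1 < n; ++j) {
            if (I[j] > I[j + 1]) {
                std::swap(I[j], I[j + 1]);    // adjacent elements out of order: swap them
                swapped = true;
            }
        }
    }
}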

In several algorithms, T(n) varies with the input. For example, in most cases the time complexity of the above algorithm is O(n²); this is known as the average case. But realize that if the input is already sorted in ascending order, the time complexity becomes O(n), since the algorithm will iterate over I just once. Thus, the best-case complexity of Bubble Sort is O(n). The worst case is the opposite situation: an input sorted in descending order forces the maximum number of passes, and the worst-case complexity is O(n²).

 

2.1.3 How efficient is Selection Sort?

It turns out that the best, average and worst-case complexities for Selection Sort are all O(n²). However, there are other algorithms such as Quick Sort where the average-case complexity is O(n log n). Similarly Merge Sort also has an average complexity of O(n log n). There are other sorting methods which do not rely on comparison of elements and have still better complexities, such as Counting Sort, but we shall not get into the details of those here.

We realize that since algorithms like Quick Sort have an average-case complexity of O(n log n), the number of steps required to sort a sequence I with Quick Sort will generally be far fewer than the number required to sort I with Selection Sort. It becomes quite tedious to sort using Selection Sort when n grows beyond about 10⁴, whereas an O(n log n) algorithm can still sort a 10⁷-element array with relative ease.

 

2.2 Space Complexity

You already have the tools to find the space complexity of an algorithm, which is nothing else but an asymptotic bound on the space taken by the algorithm for a given input size n.

Just like we associated the big-O notation with time complexity, we can do the same with space-complexity, and find the best-case, average-case and worst-case space complexities for the algorithm.

 

3. Tasks for you:

  1. Study about the other asymptotic notations for time complexity.
  2. Derive the time complexity of Insertion Sort.

 

4. References

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms, second edition, Prentice-Hall.