Home Classroom Algorithms Basics of Algorithms Part 3

# Basics of Algorithms Part 3

1834

Elementary Search Algorithms and Hash Tables

1. Introduction

Till now, we have been discussing sorting algorithms. There is another important category of algorithms known as Search Algorithms. These deal with the problem, ‘Given a list of elements, find if a particular is present in the list’.

2. Linear / Sequential Search

The linear search algorithm is the simplest search algorithm. It checks each element sequentially and compares it with the element to be searched. Hence, it tries every element in the list sequentially. This algorithm has the worst asymptotic behavior. The cost of the search is proportional to the number of elements in the list. In other words, the complexity is O(n).

The linear search algorithm falls under the category of a brute-force algorithm. A brute force algorithm is one which enumerates over all possible solutions and then chooses the suitable solution. In some cases there are algorithms which utilize certain properties of the problem which help in eliminating the need to enumerate over certain parts of the search-space, and give a better better complexity. Thus, wherever possible, a non brute-force solution is selected.

Linear-Search(List L, Element E)
1. For I = 1 to L.Size
1.1 If L[I] == E
Return I
2. Return -1

3. Binary Search

Binary Search is yet another application of the Divide-and-Conquer paradigm. It is also quite intuitive. Given, a sorted list of elements, Binary Search can find if a particular element exists in the list in O(log n) time.

Binary-Search(List L, Element E)
1. High = 1, Low = L.Size
2. While(High > Low)
2.1 Mid = (High + Low)/2
2.2 If(E > L[Mid])
2.2.1 Low = Mid + 1
2.2.2 Continue
2.3 If(E < L[Mid])
2.3.1 High = Mid – 1
2.3.2 Continue
3. If(L[Low] == E) Return Low
4. Return -1

The algorithm requires the list to be sorted. Now, initially the element can be anywhere between 1 and L.Size (inclusive). However, we can break the problem into a smaller size. This can be done by choosing a ‘probe’ element, such as the middle element of the list. Now suppose if Mid = ( 1 + L.Size ) /2. If the element at position Mid is lesser than the element E to be searched, then it is easy to see that if E exists in the list, it would be between (1 + L.Size)/2 + 1 and L.Size (inclusive).

Similarly, if the element at Mid is greater than the element E, then E, if it exists in the list, must be at a position between 1 and (1 + L.Size)/2 – 1 (inclusive). The search continues in the same fashion, until the search space is reduced to one element or the desired element is found.

Thus binary search reduces the size of the problem by 2, in every iteration. Hence given a list of size n, binary search will take at the most log2n iterations to decide if the element exists in the list. Thus the complexity of the algorithm is O(log n).

4. Hash Tables

A Hash Table is a data structure which makes use of a Hash Function. A Hash Function simply maps the key of the element to an index referring to the position in the array where that particular element will be stored. What this implies is, given an element to search, we can use the Hash Function to compute the index where it was supposed to be stored. We can then check if the element exists at that position or not.

However, there are two problems associated with this. Firstly, the array size is usually finite and the hash function should provide map all keys to indexes within this range. Secondly, because of the above limitation, often because the key space is larger than the array size, and also because of imperfections in the hash function, not all keys map to a unique index. Thus, more than one keys map to the same index. This is known as collision.

Let us take an example of a hash function for a string s of upto 10 characters.

H(s) = Sum of ASCII values of all the characters of s.
For this hash function, we need an array size of 256*10 = 2560 elements. Which is a reasonable size of For example, if s = “cd”, H(s) = 199. But for s = “be”, H(s) = 199 too. Thus, there is a collision.

Hence Hash Tables have to handle collisions. One method to handle them is through Chaining. In chaining, instead of placing just one element at the index, we make that particular index a bucket. Hence, it contains more than one elements, linked together by pointers (a linked list). Thus, when we obtain a particular hash value. We need to visit the bucket at that particular index, and traverse the linked list to check if the element exists in that.

Let us suppose there are n elements to be added to the hash table. The table has m indexes available for the elements. Now, if the hash functions map these elements uniformly to the indexes, there would be n /m elements at each index. Thus supposing the hash value computation takes O(1) time, the time for searching the element in the linked list would take O(n/m) time. Which works quite well in practice.

Chaining

1.Read about Ternary Search. Use it to find the maxima of the function f(x) = sin x in the range [0, Π] and minima in the range [Π, 2Π].
2.Can you use Binary Search to do the same?

6. References

[1] Thomas H. Cormen, Charles E. Lesiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms, second edition, Prentice-Hall.
[2] http://en.wikipedia.org/wiki/File:Hash_table_5_0_1_1_1_1_1_LL.svg

Gaurav is a final year student of Computer Engineering at Thadomal Shahani Engineering College, University of Mumbai (India). His handle on TopCoder is ‘red.dragon’.His areas of interest are Algorithms, Grid Computing and Cryptography