Binary search

In computer science, Binary search is a search algorithm for searching a set of sorted data for a particular data. In its most simple form, binary search assumes the data is sorted and takes advantage of that characteristic. As the data has to be sorted in the first place, it cannot be applied to data such as compound structure unless the programmer writes a special method to compare structures. However, not all languages support this ability to automatically utilise programmer created functions.

Binary search is often used together with other algorithms such as insertion sort or data structures.

Many of standard libraries of languages provide a way to do binary search. C++'s STL (Standard Template Library) provides algorithm functions lower_bound and upper_bound. Java offers binarySearch methods in Arrays class.

Binary search is a logarithmic algorithm and executes in O(log n). Specifically, 1 + log₂ N iterations are needed to return an answer. It is a considerably faster than a linear search. It can be implemented using recursion or iteration, although in many languages it is more elegantly expressed recursively.

This sample Ruby implementation returns true if at least one of the elements of array is equal to value, otherwise return false

def binary_search(array,value)
   first=0
   last=array.size - 1
   while (first <= last)
       mid = (first + last) / 2
       if (value > array[mid])
           first = mid + 1
       elsif (value < array[mid])
           last = mid - 1
       else
          return true
       end
   end
   return false
end

An example of binary search in action is a simple guessing game in which a player has to guess a positive integer selected by another player between 1 and N, using only questions answered with yes or no. Supposing N is 16 and the number 11 is selected, the game might proceed as follows.

Is the number greater than 8? (Yes)
Is the number greater than 12? (No)
Is the number greater than 10? (Yes)
Is the number greater than 11? (No)

Therefore, the number must be 11. At most ceiling(log₂ N) questions are required to determine the number, since each question halves the search space. Note that one less question (iteration) is required than for the general algorithm, since the number is constrained to a particular range.

If N is unknown or infinite, we can still find the mysterious number k in at most steps by first computing an N which is larger than or equal to k:

def find_N(array,value)
   N=1
   while (value>array[N-1])
       N=N*2
   end
   return N
end

In the guessing game when we don't know the value of N, the game would be:

Is the number greater than 1? (Yes)
Is the number greater than 2? (Yes)
Is the number greater than 4? (Yes)
Is the number greater than 8? (Yes)
Is the number greater than 16? (No, N=16, proceed as above)
Is the number greater than 12? (No)
Is the number greater than 10? (Yes)
Is the number greater than 11? (No)

Also observe that we don't need to ask about 8 twice.

In Wikipedia, it is possible to use a binary search to see which user added content to an article. One can find a revision which does not include the content, and do a binary search through the history to see where it appeared. Due to the asymptotic analysis above, this is far quicker than checking every difference.