How your binary search implementation is broken and how to fix it

Posted by on . Last Updated on . Tagged:dotnetjava

I have recently begun studying the theory and implementations of algorithms. During this study, I came across this post which describes how a bug exists in most implementations of a binary search algorithm.

I’ll begin with the example given in the post, which is taken from java.util.Arrays.

public static int BinarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        int midVal = a[mid];
        if (midVal < key)
        {
            low = mid + 1;
        }
        else if (midVal > key)
        {
            high = mid - 1;
        }
        else
        {
            return mid; // key found
        }
    }
    return -(low + 1);  // key not found.
}

For all intensive purposes this implementation is correct. Except for a bug which can raise an Overflow Exception. The problematic line is this:

int mid = (low + high) / 2;

If the result of low + high is greater than maximum value of an 32-bit integer, the exception is raised. Luckily, there are ways that we can fix this:

int mid = low + (high - low) / 2;

Or if you’re in Java:

int mid = (low + high) >>> 1;

If you are working in .Net languages you will need to work with the unchecked keyword in order to use the bit shift method:

int mid;
unchecked
{
    mid = (low + high) >> 1;
}

The unchecked keyword is used to suppress overflow-checking for integral-type arithmetic operations and conversions.

In an unchecked context, if an expression produces a value that is outside the range of the destination type, the overflow is not flagged. For example, because the calculation in the following example is performed in an unchecked block or expression, the fact that the result is too large for an integer is ignored, and int1 is assigned the value -2,147,483,639.

(Source)

So here is the differential method in C#:

public static int BinarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        int midVal = a[mid];
        if (midVal < key)
        {
            low = mid + 1;
        }
        else if (midVal > key)
        {
            high = mid - 1;
        }
        else
        {
            return mid; // key found
        }
    }
    return -1;  // key not found.
}

And using the bit shift method:

public static int BinarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid;
        unchecked
        {
            mid = (low + high) >> 1;
        }
        int midVal = a[mid];
        if (midVal < key)
        {
            low = mid + 1;
        }
        else if (midVal > key)
        {
            high = mid - 1;
        }
        else
        {
            return mid; // key found
        }
    }
    return -1;  // key not found.
}

Personally, I prefer the differential version over the bit shift version but I have provided both for completeness. Even though this bug only manifests itself with well over a billion elements in an array, I still thought it was worth pointing out. I encourage you to read the other articles for more complete context.

Stuart Blackler is a seasoned technologist with over 15 years of commercial experience in the .NET ecosystem. Holding a degree in Computer Science, Stuart has earned certifications as a C# developer through Microsoft and as an AWS Solutions Architect and Developer. Stuart is the creator of the popular YouTube channel CodeWithStu, where he delves into topics close to his heart, including .NET, AWS, DevOps, and software architecture with a commitment to sharing knowledge and fostering a community of learners.