Efficiency of Python & ldquo; In & rdquo; Keyword for sorted list


If I have a list that is already sorted and use the in keyword, for example:

a = [1,2,5,6,8,9,10]
print 8 in a

I think this should do a sequential search but can I make it faster by doing binary search? Is there a pythonic way to search in a sorted list?

There is a binary search for Python in the standard library, in module bisect. It does not support in/contains as is, but you can write a small function to handle it:

from bisect import bisect_left
def contains(a, x):
    """returns true if sorted sequence `a` contains `x`"""
    i = bisect_left(a, x)
    return i != len(a) and a[i] == x


>>> contains([1,2,3], 3)
>>> contains([1,2,3], 4)

This is not going to be very speedy though, as bisect is written in Python, and not in C, so you'd probably find sequential in faster for quite a lot cases. bisect has had an optional C acceleration in CPython since Python 2.4.

It is hard to time the exact break-even point in CPython. This is because the code is written in C; if you check for a value that is greater to or less than any value in the sequence, then the CPU's branch prediction will play tricks on you, and you get:

In [2]: a = list(range(100))
In [3]: %timeit contains(a, 101)
The slowest run took 8.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 370 ns per loop

Here, the best of 3 is not representative of the true running time of the algorithm.

But tweaking tests, I've reached the conclusion that bisecting might be faster than in for lists having as few as 30 elements.

However, if you're doing really many in operations you ought to use a set; you can convert the list once into a set (it does not even be sorted) and the in operation will be asymptotically faster than any binary search ever would be:

>>> a = [10, 6, 8, 1, 2, 5, 9]
>>> a_set = set(a)
>>> 10 in a_set

On the other hand, sorting a list has greater time-complexity than building a set, so most of the time a set would be the way to go.