If I have a list that is already sorted and use the in keyword, for example:
a = [1,2,5,6,8,9,10] print 8 in a
I think this should do a sequential search but can I make it faster by doing binary search? Is there a pythonic way to search in a sorted list?
There is a binary search for Python in the standard library, in module
bisect. It does not support
contains as is, but you can write a small function to handle it:
from bisect import bisect_left def contains(a, x): """returns true if sorted sequence `a` contains `x`""" i = bisect_left(a, x) return i != len(a) and a[i] == x
>>> contains([1,2,3], 3) True >>> contains([1,2,3], 4) False
This is not going to be very speedy though, as
bisect is written in Python, and not in C, so you'd probably find sequential
in faster for quite a lot cases.
bisect has had an optional C acceleration in CPython since Python 2.4.
It is hard to time the exact break-even point in CPython. This is because the code is written in C; if you check for a value that is greater to or less than any value in the sequence, then the CPU's branch prediction will play tricks on you, and you get:
In : a = list(range(100)) In : %timeit contains(a, 101) The slowest run took 8.09 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 370 ns per loop
Here, the best of 3 is not representative of the true running time of the algorithm.
But tweaking tests, I've reached the conclusion that bisecting might be faster than
in for lists having as few as 30 elements.
However, if you're doing really many
in operations you ought to use a
set; you can convert the list once into a set (it does not even be sorted) and the
in operation will be asymptotically faster than any binary search ever would be:
>>> a = [10, 6, 8, 1, 2, 5, 9] >>> a_set = set(a) >>> 10 in a_set True
On the other hand, sorting a list has greater time-complexity than building a set, so most of the time a set would be the way to go.