import bisect
import sys
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]
ROW_FMT = '{0:2d} @ {1:2d} {2}{0:<2d}'
def demo(bisect_fn):
for needle in reversed(NEEDLES):
position = bisect_fn(HAYSTACK, needle) # (1)
offset = position * ' |' # (2)
print(ROW_FMT.format(needle, position, offset)) # (3)
if __name__ == '__main__':
if sys.argv[-1] == 'left': # (4)
bisect_fn = bisect.bisect_left
else:
bisect_fn = bisect.bisect
print('DEMO:', bisect_fn.__name__) # (5)
print('haystack ->', ' '.join(f'{n:2}' for n in HAYSTACK))
demo(bisect_fn)
Managing Ordered Sequences with Bisect
The bisect
module offers two main functions that use the binary search algorithm to quickly find and insert items in any sorted sequence.
Searching with bisect
bisect(haystack, needle)
does a binary search for needle
in haystack—which must be a sorted sequence—to locate the position where needle
can be inserted while maintaining haystack
in ascending order.
In other words, all items appearing up to that position are less than or equal to needle
.
You could use the result of bisect(haystack, needle)
as the index
argument to haystack.insert(index, needle)
—however, using insort
does both steps, and is faster.
Tip
|
Raymond Hettinger wrote a |
Example 1 uses a carefully chosen set of "needles" to demonstrate the insert positions returned by bisect
.
Its output is in Figure 1.
-
Use the chosen
bisect
function to get the insertion point. -
Build a pattern of vertical bars proportional to the
offset
. -
Print formatted row showing needle and insertion point.
-
Choose the
bisect
function to use according to the last command-line argument. -
Print header with name of function selected.
The behavior of bisect
can be fine-tuned in two ways.
First, a pair of optional arguments, lo
and hi
, allow narrowing the region in the sequence to be searched when inserting. lo
defaults to 0 and hi
to the len()
of the sequence.
Second, bisect
is actually an alias for bisect_right
, and there is a sister function called bisect_left
.
Their difference is apparent only when the needle compares equal to an item in the list: bisect_right
returns an insertion point after the existing item, and bisect_left
returns the position of the existing item, so insertion would occur before it.
With simple types like int
, inserting before or after makes no difference,
but if the sequence contains objects that are distinct yet compare equal, then it may be relevant.
For example, 1
and 1.0
are distinct, but 1 == 1.0
is True
. Figure 2 shows the result of using bisect_left
.
An interesting application of bisect
is to perform table lookups by numeric values—for example,
to convert test scores to letter grades, as in Example 2.
>>> breakpoints = [60, 70, 80, 90]
>>> grades='FDCBA'
>>> def grade(score):
... i = bisect.bisect(breakpoints, score)
... return grades[i]
...
>>> [grade(score) for score in [55, 60, 65, 70, 75, 80, 85, 90, 95]]
['F', 'D', 'D', 'C', 'C', 'B', 'B', 'A', 'A']
The code in Example 2 is from the bisect
module documentation, which also lists functions to use bisect
as a faster replacement for the index
method when searching through long ordered sequences of numbers.
When used for table lookups, bisect_left
produces very different results[1].
Note the letter grade results in Example 3.
bisect_left
maps a score of 60 to grade 'F', not 'D' as in Example 2.>>> breakpoints = [60, 70, 80, 90]
>>> grades='FDCBA'
>>> def grade(score):
... i = bisect.bisect_left(breakpoints, score)
... return grades[i]
...
>>> [grade(score) for score in [55, 60, 65, 70, 75, 80, 85, 90, 95]]
['F', 'F', 'D', 'D', 'C', 'C', 'B', 'B', 'A']
These functions are not only used for searching, but also for inserting items in sorted sequences, as the following section shows.
Inserting with insort
Sorting is expensive, so once you have a sorted sequence, it’s good to keep it that way.
That is why bisect.insort
was created.
insort(seq, item)
inserts item
into seq
so as to keep seq
in ascending order.
See Example 4 and its output in Figure 3.
import bisect
import random
SIZE = 7
random.seed(1729)
my_list = []
for i in range(SIZE):
new_item = random.randrange(SIZE * 2)
bisect.insort(my_list, new_item)
print(f'{new_item:2d} -> {my_list}')
Like bisect
, insort
takes optional lo
, hi
arguments to limit the search to a sub-sequence.
There is also an insort_left
variation that uses bisect_left
to find insertion points.