I open-sourced some code today. The iterstuff package for Python is a tool for working with iterables and generators, and allows for some interesting tricks. All very nice, and open-sourcing anything is good. But that’s not what I wanted to write about.
The package consists of a main class (called Lookahead) and some example recipes – bits of code that use the Lookahead. One of the recipes, batch, yields slices of an iterable. A non-Lookahead version of it would be something like this:
```python
from itertools import islice, chain

def batch(iterable, size):
    """
    Yield iterables for successive slices of `iterable`, each containing
    up to `size` items, with the last being shorter than `size` if there
    are not sufficient items in `iterable`. Pass over the input iterable
    once only.

    @param iterable: an input iterable.
    @param size: the maximum number of items yielded by any output iterable.
    """
    it = iter(iterable)
    while True:
        batchiter = islice(it, size)
        try:
            # The call to next() is done so that when we have sliced
            # to the end of the input and get an empty generator,
            # StopIteration is raised. Since PEP 479 it can no longer
            # bubble out of a generator, so we catch it and return,
            # which ends iteration over the whole batch.
            yield chain([next(batchiter)], batchiter)
        except StopIteration:
            return
```
What interests me about this code is that it demonstrates a blind spot that I’ve seen in my own code several times. See where the code calls next(batchiter)? It reads the first element from the iterable, and then uses itertools.chain to ‘push’ that element back onto the front of the iterable. It’s clever, but a little clumsy.
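The ‘push back’ trick is worth seeing in isolation. This is a minimal sketch of just that step, separate from the package:

```python
from itertools import chain

it = iter([1, 2, 3])
first = next(it)                # consume the first element: 1
restored = chain([first], it)   # 'push' it back onto the front
print(list(restored))           # [1, 2, 3]
```

Note that `restored` is itself a lazy iterator: nothing beyond the first element is consumed from the original iterable until the caller asks for it.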
And later it occurred to me that patterns are to blame. Many of the examples that explain generators have the same pattern: a loop that yields values. For example:
```python
def count_from_one(limit):
    for i in range(limit):
        yield i + 1
```
This pattern says “a generator does some work and then yields each element from a loop”. But there’s no rule that says a generator can only yield once.
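For instance, nothing stops a generator from using several separate yield statements in sequence, with no loop at all (a toy example, not from the package):

```python
def greetings():
    # Two distinct yield statements in one generator body
    yield "hello"
    yield "world"

print(list(greetings()))  # ['hello', 'world']
```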
So we can rewrite batch in a neater way:
```python
from itertools import islice

def batch(iterable, size):
    it = iter(iterable)

    # Instances of this closure generator are
    # yielded as iterables.
    def slicer(first_element, slice_of_it):
        # Yield the first element
        yield first_element
        # Yield the rest of the slice
        for element in slice_of_it:
            yield element

    while True:
        chunk = islice(it, 0, size)
        try:
            # If chunk is empty, next(chunk) raises StopIteration.
            # Catching it and returning ends the loop (required
            # since PEP 479).
            yield slicer(next(chunk), chunk)
        except StopIteration:
            return
```
Here we use a nested generator that contains two yields, one for the first element and then a loop for the rest. It avoids the use of chain because it doesn’t try to follow the ‘usual’ pattern for generators.
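As a quick check that the rewrite behaves like the chain version, here is a runnable sketch (the definition is repeated so the snippet stands alone, with a try/except around next() so the loop terminates cleanly under PEP 479):

```python
from itertools import islice

def batch(iterable, size):
    it = iter(iterable)

    def slicer(first_element, slice_of_it):
        yield first_element
        for element in slice_of_it:
            yield element

    while True:
        chunk = islice(it, 0, size)
        try:
            yield slicer(next(chunk), chunk)
        except StopIteration:
            return

for group in batch("abcdefgh", 3):
    print(list(group))
# ['a', 'b', 'c']
# ['d', 'e', 'f']
# ['g', 'h']
```

One caveat, true of both versions: each yielded slice must be consumed before advancing to the next one, since all slices draw lazily from the same underlying iterator.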
Design patterns are good things, but they can lead you astray.