One of the more impressive features of the Python language is its “for-each” looping construct, which has awed me ever since I started out with Python. For the uninitiated, here is a simple for loop that prints the first 10 natural numbers:
for num in range(1, 11):
    print(num)
We can also loop over built-in types such as lists, tuples, dictionaries and strings in a similar way:
numbers = [1, 2, 3, 4, 5]
record = ('Kshitij', 21, 'Loves Python')
details = {'name': 'Kshitij', 'age': 21}

for num in numbers:
    print(num) # 1 2 3 4 5

for data in record:
    print(data) # Kshitij 21 Loves Python

for key, value in details.items():
    print(key, value) # name Kshitij, then age 21 (insertion order on Python 3.7+)
Once you have implemented a few data structures in Python using class, you will want to loop over the data stored in their instances. This is where the Iterator Protocol comes into the picture.
Let us suppose we are tasked with implementing a standard 52-card deck. A sample implementation might look something like this:
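Here is a minimal sketch of what such an implementation could look like. The namedtuple-based Card, its rank and suit fields, and the __repr__ format are assumptions made for illustration; only the Deck class and its cards list come from the discussion in this post. The sketch could live in a module named cards.py.

```python
from collections import namedtuple

# Hypothetical card type: a namedtuple gives a readable repr and ordering for free.
Card = namedtuple('Card', ['rank', 'suit'])


class Deck:
    """A standard 52-card deck (illustrative sketch)."""

    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = ['Clubs', 'Diamonds', 'Hearts', 'Spades']

    def __init__(self):
        # The cards are stored in a plain list on the instance.
        self.cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]

    def __repr__(self):
        return 'Deck({} cards)'.format(len(self.cards))
```

Note that this class defines neither __iter__ nor __getitem__, which matters for what follows.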
This works fine for creating new instances of Deck and printing them. However, a major pain point of this implementation is that we cannot iterate over a Deck object.
>>> from cards import Deck
>>> new_deck = Deck() # New deck instantiated
>>> print(new_deck)
... # Works great
>>> for card in new_deck:
...     print(card)
TypeError: 'Deck' object is not iterable
A curious user can explore the instance new_deck and conclude that the cards attribute holds the data required for iteration and that it is, in fact, a list. With this knowledge, the loop above can be hacked as follows:
>>> for card in new_deck.cards:
...     print(card)
Card(...)
....
This code works. However, the end user must know internal details of the implementation to perform the iteration. Our code loses the advantages of data abstraction, and the implementation leaves much to be desired.
There must be a better way!
Inspired by Raymond Hettinger's enthusiasm, I searched for ways to make my implementation work with Python's for loop.
And soon I found the answer: the Iterator Protocol.
In order to learn what the Protocol is and how to implement it in Python, we need to understand some basic terms.
Iterables: objects that can be looped over. We can use the iter() built-in function to get an iterator for them.
Iterators: objects on which we can call the next function to get the next item. If there is no next item (because we reached the end), a StopIteration exception is raised.
Iterators are also iterables: we can get an iterator for them using the iter() built-in.
How does the iter() built-in work?
Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x). The iter built-in function:
1. Checks whether the object implements the __iter__ method and calls that to obtain an iterator.
2. If the __iter__ method is not implemented, but the __getitem__ method is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0.
3. If that fails, Python raises a TypeError exception saying <classname> object is not iterable.
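The terms and the lookup order described above can be tried out in a few lines. This is only a sketch: the Squares and Opaque classes are hypothetical names introduced here to trigger the __getitem__ fallback and the TypeError respectively.

```python
# An iterable hands out an iterator; the iterator yields items until exhausted.
numbers = [1, 2, 3]
iterator = iter(numbers)
first, second, third = next(iterator), next(iterator), next(iterator)

try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True  # no items left

# An iterator is itself iterable: iter() hands back the same object.
same_object = iter(iterator) is iterator


class Squares:
    """Hypothetical class that defines only __getitem__, not __iter__."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
squares = list(Squares())


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


try:
    iter(Opaque())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(squares)        # [0, 1, 4, 9, 16]
print(error_message)  # 'Opaque' object is not iterable
```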
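I will present two approaches to implementing the Iterator Protocol:
The first approach uses a separate iterator class:
1. Create an iterator class, DeckIterator.
2. Implement two methods in it:
__next__: returns the next item in the iterable.
__iter__: returns itself, i.e. self.
3. Define an __iter__ method in the class over whose instances you want to iterate, i.e. class Deck. The method should return an instance of DeckIterator.
The second approach is to implement the __iter__ method in the Deck class as a generator function.
This is the list of all the features that our object magically seems to support as soon as we implement the protocol:
1. Looping through a for loop.
2. Unpacking, similarly to tuples.
3. List comprehensions.
4. Built-in functions (min, max) which consume an iterable.
>>> new_deck = Deck()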
>>> # 1. Looping through a for loop
>>> for card in new_deck:
...     print(card) # Works great!
>>> # 2. Unpacking similarly to tuples
>>> first_card, *rest, last_card = new_deck
>>> # 3. List Comprehensions
>>> spades = [card for card in new_deck if card.suit == 'Spades']
>>> # 4. Built-in functions
>>> max_card, min_card = max(new_deck), min(new_deck)
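For reference, the two approaches to implementing the protocol could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the namedtuple Card, the RANKS and SUITS constants, and the GeneratorDeck subclass (used only so both variants can sit side by side) are hypothetical names introduced here.

```python
from collections import namedtuple

Card = namedtuple('Card', ['rank', 'suit'])  # hypothetical card type
RANKS = [str(n) for n in range(2, 11)] + list('JQKA')
SUITS = ['Clubs', 'Diamonds', 'Hearts', 'Spades']


class DeckIterator:
    """Approach 1: a separate iterator class that tracks its own position."""

    def __init__(self, cards):
        self._cards = cards
        self._index = 0

    def __next__(self):
        if self._index >= len(self._cards):
            raise StopIteration  # no cards left
        card = self._cards[self._index]
        self._index += 1
        return card

    def __iter__(self):
        return self  # an iterator is its own iterator


class Deck:
    def __init__(self):
        self.cards = [Card(rank, suit) for suit in SUITS for rank in RANKS]

    def __iter__(self):
        # A fresh iterator per call, so the deck can be looped over repeatedly.
        return DeckIterator(self.cards)


class GeneratorDeck(Deck):
    """Approach 2: __iter__ written as a generator function."""

    def __iter__(self):
        for card in self.cards:
            yield card


for deck in (Deck(), GeneratorDeck()):
    first_card, *rest, last_card = deck
    spades = [card for card in deck if card.suit == 'Spades']
    print(first_card, last_card, len(rest), len(spades))
```

The generator version is shorter because Python builds the iterator object for us; the explicit class is mainly useful when the iterator needs extra state or methods of its own.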
I hope that knowing the Iterator Protocol will help you when writing Python. To raise awareness of this seemingly underappreciated feature of Python, I have proposed a talk on this topic at PyCon India 2017.