If you’re new to Python, you may have come across the terms “iteration” and “membership” and wondered what they mean. These concepts are fundamental to understanding how Python handles collections of data, such as lists, tuples, and dictionaries. Python employs special dunder methods to enable these functionalities.
But what exactly are dunder methods? Dunder/Magic methods are special methods in Python that start and end with a double underscore, hence the name “dunder.” They are used to implement various protocols and can be used to perform a wide range of tasks, such as checking membership, iterating over elements, and more. In this article, we will be focusing on two of the most important dunder methods: __contains__ and __iter__. So, let’s get started.
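As a small illustration of how a dunder method plugs into built-in syntax, here is a minimal sketch. The Playlist class is hypothetical and is not part of the directory example used later; it only shows that the built-in len() ends up calling __len__.

```python
class Playlist:
    """Hypothetical class used only to illustrate a dunder method."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # Called by the built-in len() function
        return len(self._songs)


playlist = Playlist(["intro", "verse", "outro"])
print(len(playlist))  # 3
```

The same idea applies to __iter__ and __contains__: we write the method, and Python calls it when the matching syntax is used.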
Understanding Pythonic Loops with Iter Method
Consider a basic implementation of a file directory using Python classes as follows:
from typing import List

class File:
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

class Directory:
    def __init__(self, files: List[File]) -> None:
        self._files = files
This is straightforward code: the directory has an instance attribute that holds a list of File objects. Now, if we want to iterate over the directory object, we should be able to use a for loop as follows:
directory = Directory(
    files=[File(f"file_{i}") for i in range(10)]
)
for _file in directory:
    print(_file)
We initialize a directory object with ten sequentially named files and use a for loop to iterate over each item. Simple enough, but whoops! We get an error message: TypeError: 'Directory' object is not iterable.
What went wrong? Well, our Directory class isn’t set up to be looped through. In Python, the standard way to make an object iterable is for its class to implement the __iter__ dunder method. Built-in iterables such as lists, dictionaries, and sets all implement it, which is why we can use them in a loop.
So, to make our Directory object iterable, we need to create an iterator. Think of an iterator as a helper that gives us items one by one when we ask for them. For example, when we loop over a list, the iterator provides the next element on each iteration until we reach the end of the list. That is simply how an iterator is defined and implemented in Python.
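A quick way to see this for yourself is to check a few built-in objects for the __iter__ attribute. This is only a sketch; the samples list and results dictionary are names chosen for illustration.

```python
# Check which built-in objects expose __iter__
samples = [[1, 2], {"a": 1}, {1, 2}, (1, 2), "ab", 42]
results = {type(s).__name__: hasattr(s, "__iter__") for s in samples}

print(results)
# {'list': True, 'dict': True, 'set': True, 'tuple': True, 'str': True, 'int': False}
```

The integer has no __iter__, which is why a loop such as `for x in 42` fails with the same kind of TypeError.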
In Python, an iterator must know how to provide the next item in a sequence. It does this using a method called __next__. When there are no more items to give, it raises a special signal called StopIteration to say, “Hey, we’re done here.” In the case of an infinite iteration, we do not raise the StopIteration exception.
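To make the infinite case concrete, here is a minimal sketch of an iterator that never raises StopIteration. The CountUp class is hypothetical and exists only for this illustration. It also defines __iter__ returning itself, which is the usual convention for iterator objects.

```python
class CountUp:
    """Hypothetical infinite iterator: produces 0, 1, 2, ... forever."""

    def __init__(self) -> None:
        self._current = 0

    def __iter__(self):
        # An iterator conventionally returns itself from __iter__
        return self

    def __next__(self) -> int:
        value = self._current
        self._current += 1
        return value  # StopIteration is never raised


counter = CountUp()
first_three = [next(counter) for _ in range(3)]
print(first_three)  # [0, 1, 2]
```

Because there is no end signal, a plain for loop over CountUp() would run forever unless we break out of it ourselves.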
Let us create an iterator class for our directory. It takes the list of files as an argument and implements the __next__ method to give us the next file in the sequence. It keeps track of the current position using an index. The implementation looks as follows:
class FileIterator:
    def __init__(self, files: List[File]) -> None:
        self.files = files
        self._index = 0  # Position of the next file to return

    def __next__(self):
        # Signal the end of iteration once every file has been returned
        if self._index >= len(self.files):
            raise StopIteration
        value = self.files[self._index]
        self._index += 1
        return value
We initialize an index value at 0 and accept the files as an initialization argument. The __next__ method checks whether the index has moved past the end of the list. If it has, it raises a StopIteration exception to signal the end of the iteration. Otherwise, it returns the file at the current index and moves to the next one by incrementing the index. This process continues until all files have been iterated over.
However, we are not done yet! We have still not implemented the __iter__ method, which must return an iterator object. Now that we have the FileIterator class, we can finally write it.
class Directory:
    def __init__(self, files: List[File]) -> None:
        self._files = files

    def __iter__(self):
        return FileIterator(self._files)
The __iter__ method simply creates a FileIterator from the directory’s list of files and returns it. That’s all it takes! With this implementation, we can now loop over our Directory object with a for loop. Let’s see it in action:
directory = Directory(
    files=[File(f"file_{i}") for i in range(10)]
)
for _file in directory:
    # Print the path: File defines no __str__, so print(_file) would show a default object repr
    print(_file.file_path, end=", ")
# Output: file_0, file_1, file_2, file_3, file_4, file_5, file_6, file_7, file_8, file_9,
Under the hood, the for loop calls the __iter__ method once to obtain an iterator and then calls __next__ on it repeatedly until StopIteration is raised. Although this works, you might still be confused about the underlying workings of the iterator in Python. To understand it better, let’s use a while loop to implement the same mechanism manually.
directory = Directory(
    files=[File(f"file_{i}") for i in range(10)]
)
iterator = iter(directory)
while True:
    try:
        # Get the next item if available. Raises StopIteration if no item is left.
        item = next(iterator)
        print(item.file_path, end=', ')
    except StopIteration:
        break  # Catch the exception and exit the while loop
# Output: file_0, file_1, file_2, file_3, file_4, file_5, file_6, file_7, file_8, file_9,
We call the built-in iter function on the directory object to obtain a FileIterator. Then we call the built-in next function, which invokes the __next__ dunder method on the FileIterator object. We catch the StopIteration exception to exit the while loop cleanly once all items have been exhausted. As expected, we get the same output as before!
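A separate iterator class is not the only option. As an alternative sketch, and not the approach this article builds on, __iter__ can be written as a generator function using yield, in which case Python creates the iterator object for us. The classes are redefined here so the snippet runs on its own, and three files are used instead of ten to keep the output short.

```python
from typing import Iterator, List


class File:
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path


class Directory:
    def __init__(self, files: List[File]) -> None:
        self._files = files

    def __iter__(self) -> Iterator[File]:
        # Generator function: every call returns a fresh iterator
        for _file in self._files:
            yield _file


directory = Directory(files=[File(f"file_{i}") for i in range(3)])
paths = [f.file_path for f in directory]
print(paths)  # ['file_0', 'file_1', 'file_2']
```

The explicit FileIterator class is more verbose, but it makes the __next__ and StopIteration mechanics visible, which is the point of the walkthrough above.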
Testing for Membership with Contains Method
Checking whether an item exists in a collection is a fairly common use case. In our example, we will often need to check whether a file exists in a directory. Python makes this syntactically simple with the “in” operator.
print(0 in [1,2,3,4,5]) # False
print(1 in [1,2,3,4,5]) # True
Such checks are mostly used in conditional expressions. But what happens if we try this with our directory example?
print("file_1" in directory) # False
print("file_12" in directory) # False
Both give us False, even though file_1 is in the directory! Why? To check for membership, Python looks for the __contains__ dunder method. When it is not implemented, Python falls back to the __iter__ method and compares each item it yields using the == operator. In our case, it iterates over the files and checks whether the “file_1” string equals any File object in the list. Since we’re comparing a string to custom File objects, none of them match, so the result is False.
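The fallback can be observed directly with a small sketch. The classes are redefined here so the snippet is self-contained, and as a simplifying assumption __iter__ delegates to the list’s own iterator rather than to FileIterator.

```python
from typing import Iterator, List


class File:
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path


class Directory:
    # Deliberately defines __iter__ but no __contains__
    def __init__(self, files: List[File]) -> None:
        self._files = files

    def __iter__(self) -> Iterator[File]:
        # Assumption: reuse the list's iterator to keep the sketch short
        return iter(self._files)


first = File("file_1")
directory = Directory([first, File("file_2")])

string_found = "file_1" in directory           # str compared with File objects
same_object_found = first in directory         # the identical object is in the list
lookalike_found = File("file_1") in directory  # same path, but a different object

print(string_found, same_object_found, lookalike_found)  # False True False
```

Only the identical File object is found, because File does not define __eq__ and so two File objects are equal only when they are the same object.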
To fix this, we need to implement the __contains__ dunder method in our Directory class.
class Directory:
    def __init__(self, files: List[File]) -> None:
        self._files = files

    def __iter__(self):
        return FileIterator(self._files)

    def __contains__(self, item):
        for _file in self._files:
            # Check if file_path matches the item being checked
            if item == _file.file_path:
                return True
        return False
Here, we change the functionality to iterate over each object and match the file_path from the File object with the string being passed to the function. Now if we run the same code to check for existence, we get the correct output!
directory = Directory(
    files=[File(f"file_{i}") for i in range(10)]
)
print("file_1" in directory)  # True
print("file_12" in directory)  # False
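As a possible variation, __contains__ can be condensed with the built-in any() and extended to accept either a path string or a File object. This is a sketch of one design choice rather than a required pattern; accepting both types is an assumption added here and is not part of the implementation above.

```python
from typing import List, Union


class File:
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path


class Directory:
    def __init__(self, files: List[File]) -> None:
        self._files = files

    def __contains__(self, item: Union[str, File]) -> bool:
        # Normalize the argument to a path string, then compare paths
        path = item.file_path if isinstance(item, File) else item
        return any(path == _file.file_path for _file in self._files)


directory = Directory(files=[File(f"file_{i}") for i in range(10)])
print("file_1" in directory)        # True
print(File("file_1") in directory)  # True
print("file_12" in directory)       # False
```

any() stops at the first match, so it behaves like the explicit loop with an early return.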
Wrapping Up
And that’s it! Using our simple directory example, we built an iterator and a membership checker to understand the internal workings of Pythonic loops and the “in” operator. Such design decisions and implementations appear fairly often in production-level code, and this example covered the core concepts behind the __iter__ and __contains__ methods. Keep practicing with these techniques to strengthen your understanding and become a more proficient Python programmer!
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.