Originally published on melvinkoh.me
“RuntimeError: Dictionary changed size during iteration”, Python slaps you in your face when you try to add/remove entries in
dict
object during iteration.In this post, I will walk us through how we can filter a
dict
collection. First, let us try to understand the problem by running this snippet in your interpreter:from typing import Dict
def get_menu() -> Dict[str, dict]:
return {
"TIMMY_BLACK": {
"item": "Timmy's Coffee Barista's Black",
"sugar_free": True,
"with_milk": False,
},
"TIMMY_LATTE": {
"item": "Timmy's Coffee Latte",
"sugar_free": True,
"with_milk": False,
},
"TIMMY_SWEETENED_LATTE": {
"item": "Timmy's Coffee Latte - Sweetened",
"sugar_free": True,
"with_milk": True,
},
"TIMMY_SIGNATURE_CARAMEL": {
"item": "Timmy's Signature - Caramel Latte",
"sugar_free": None, # Unknown
"with_milk": True,
}
}
coffees = get_menu()
# Filter unsweetened coffees -> RuntimeError
for code, details in coffees.items():
if details.get("sugar_free") is True:
del coffees[code]
> Try to run this snippet! ☝️
You will get a RuntimeError, as expected.
> “RuntimeError: dictionary changed size during iteration”, Python interpreter
The error message is self-explanatory and this makes perfect sense, you shouldn’t be able to expand/shrink the collection while you are looping through it. Imagine if you are allowed to do so, and you keep expanding the collection in your loop body:
import random
data = {
1: 1,
2: 2,
3: 3
}
# This will raise an Error
for k in data:
rand = random.randint(4, 2^31)
data[rand] = rand
If the above snippet is valid in Python, we will slump into infinite loop until memory is run out.
So, in any event we want to filter a
dict
object in Python, can we circumvent away from the RuntimeError? Of course!This is the least favorable solution, given that we have to create a new copy of the dictionary.
"""
Solution #1: Create a shallow copy of dict
"""
coffees = get_menu() # I use this function to generate my dict object
coffee_copy = {**coffees} # Create a shallow copy
for code, details in coffee_copy.items():
if details.get("sugar_free") is True:
del coffees[code]
The above code will work, but it’s less hygienic to me.
Another solution is instead, create a copy of dictionary keys as our iterator:
"""
Solution #2: Create a copy of keys
"""
coffees = get_menu()
key_copy = tuple(coffees.keys()) # Feel free to use any iterable collection
for k in key_copy:
if coffees[k].get("sugar_free") is True:
del coffees[k]
If you are against of create any copy of either the source dictionary
or the keys, we can also use the “deferred update” strategy. Deferred
update, as the name tells, is to defer any mutation to the dictionary
until the iteration is over.
"""
Solution #3: Deferred Update
Risk: What if an Exception is raised half way?
"""
coffees = get_menu()
keys_to_be_removed = []
for code, details in coffees.items():
if details.get("sugar_free") is True:
keys_to_be_removed.append(code)
for k in keys_to_be_removed:
del coffees[k]
However, a caveat here. Imagine an uncaught exception in between line
9 and 10, the iteration will break, all pending updates will be void
and your
dict
object will be left intact.I personally in favor of using copy of dictionary keys and deferred
update. In the subsequent part of this post, I will use the former to
illustrate how we can make our filter logic reusable.
At this point, I decided to use a copy of dictionary keys to mitigate the problem when attempting to delete entries during iteration. Let’s see how we can make the logic reusable.
In my case, I created
FilterableDict
— a subclass of the built-in dict
class to add filter()
method:class FilterableDict(dict):
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
def filter(self, predicate):
key_copy = tuple(self.keys()) # use Tuple to further reduce memory footprint
for k in key_copy:
if predicate(k, self.get(k)):
del self[k]
return self
def __repr__(self):
return "FilterableDict({})".format(super().__repr__())
> FilterableDict, subclass of built-in dict class
To further reduce the memory footprint of our key copy, I use a
tuple
instead of a list
collection to store them.Let’s make use of our
FilterableDict
and observe the output:> Using our FilterableDict
Here we go, a reusable class to support filtering for a dictionary in Python.
from typing import Dict
# Helper function to generate dict object
def get_menu() -> Dict[str, dict]:
return {
"TIMMY_BLACK": {
"item": "Timmy's Coffee Barista's Black",
"sugar_free": True,
"with_milk": False,
},
"TIMMY_LATTE": {
"item": "Timmy's Coffee Latte",
"sugar_free": True,
"with_milk": False,
},
"TIMMY_SWEETENED_LATTE": {
"item": "Timmy's Coffee Latte - Sweetened",
"sugar_free": True,
"with_milk": True,
},
"TIMMY_SIGNATURE_CARAMEL": {
"item": "Timmy's Signature - Caramel Latte",
"sugar_free": None, # Unknown
"with_milk": True,
}
}
# Our dict subclass
class FilterableDict(dict):
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
def filter(self, predicate):
key_copy = tuple(self.keys())
for k in key_copy:
if predicate(k, self.get(k)):
del self[k]
return self
def __repr__(self):
return "FilterableDict({})".format(super().__repr__())
coffees = FilterableDict(get_menu())
coffees # FilterableDict({...})
len(coffees) # 4
sugar_free_filter = lambda _, v: not v["sugar_free"]
coffees.filter(sugar_free_filter)
len(coffees) # 3
A different way to achieve the same purpose? Leave me a comment :)