While developing a program, duplicated code can appear as a consequence of the data structures you choose. An appropriate data structure reflects the object’s properties and leads to a concise, intuitive solution. In contrast, the wrong data structure complicates the code and makes further development harder. How you organize your data therefore has a large effect on the code that processes it.
Let’s consider an example I came across in practice, where the choice of data structure led to the same for loop being duplicated in several methods of one class. I have simplified the code a bit to convey the idea.
from typing import List, Tuple


def process_south_group(uid: int, settings: List[Tuple[str, int]]) -> None:
    for name, value in settings:
        if name == "working_time":
            pass
        elif name == "timeout":
            pass
        elif name == "temperature":
            pass


def process_north_group(uid: int, settings: List[Tuple[str, int]]) -> None:
    for name, value in settings:
        if name == "temperature":
            pass
        elif name == "connected_to":
            pass
In practice, the class contained 3–4 methods with the same pattern for processing settings. I have shown only two of them to illustrate the pattern:
for name, value in settings:
    ...
Each method takes a settings parameter, a list of tuples that represents a sensor’s settings. There is a predefined set of settings (working_time, temperature, timeout, connected_to), and any of them is optional. For instance, after reading from a file, settings may be:
settings = [("working_time", 15), ("timeout", 2), ("temperature", 25)]
or
settings = [("working_time", 15)]
or
settings = [("connected_to", 5)]
Because of this inappropriate choice of data structure, we have to iterate over the whole list just to get hold of a single value. For instance, if a method needed to process the temperature value, we would have to write:
for name, value in settings:
    if name == "temperature":
        pass  # do something with value
Since every lookup means iterating over the tuples in the list, we end up with the same for loop in each method.
Let’s see how to change the code so that the for loop is not duplicated in each method. We’d like to retrieve the temperature value like this:
settings.temperature
Since we need a data structure that stores data without associated behavior, a dataclass is an appropriate candidate:
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    working_time: Optional[int] = None
    temperature: Optional[int] = None
    timeout: Optional[int] = None
    connected_to: Optional[int] = None
# or, from Python 3.10, you can use the union syntax instead of Optional
@dataclass
class Settings:
    working_time: int | None = None
    temperature: int | None = None
    timeout: int | None = None
    connected_to: int | None = None
Each of the parameters is None by default because any of them may be omitted. The original code transforms into:
def process_south_group(uid: int, settings: Settings) -> None:
    if settings.working_time is not None:
        pass
    if settings.timeout is not None:
        pass
    if settings.temperature is not None:
        pass


def process_north_group(uid: int, settings: Settings) -> None:
    if settings.temperature is not None:
        pass
    if settings.connected_to is not None:
        pass
We have to check explicitly that the value is not None in order to distinguish parameters that are not set from parameters set to 0. A plain truthiness check would skip both:
if settings.timeout:  # skip if timeout == 0 or timeout is None
    pass
if settings.timeout is not None:  # skip only if timeout is None
    pass
The data class has helped us get rid of the for loops in the methods. Besides, you can now look at a method signature and go to the definition of Settings to find out which parameters it may have. Previously, we had to read the source code of several methods to discover all possible settings, because each method handled a different subset of them.
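If the settings still arrive as a list of tuples, for example from the code that reads the file, a single conversion at that boundary is enough. The following is a minimal sketch, not the original project's code: the helper name settings_from_pairs and the decision to reject unknown names are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class Settings:
    working_time: Optional[int] = None
    temperature: Optional[int] = None
    timeout: Optional[int] = None
    connected_to: Optional[int] = None


def settings_from_pairs(pairs: List[Tuple[str, int]]) -> Settings:
    """Hypothetical helper: build Settings from the legacy list of tuples."""
    known = {f.name for f in fields(Settings)}
    unknown = [name for name, _ in pairs if name not in known]
    if unknown:
        # Assumption: an unknown name is treated as an error, not ignored.
        raise ValueError(f"unknown setting(s): {unknown}")
    return Settings(**dict(pairs))


settings = settings_from_pairs([("working_time", 15), ("timeout", 0)])
print(settings)
```

Settings that are missing from the list keep their None default, while a timeout of 0 is preserved as a real value.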
The current implementation is also far easier to extend. For example, if you have different types of sensors with different settings, you can create a hierarchy of settings classes derived from the base data class. If the object later acquires more behavior, you can replace the data class with a regular Python class.
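Such a hierarchy could look roughly like the sketch below. The subclass HumiditySensorSettings, its humidity field, and the describe function are hypothetical names invented for the example; only Settings and its four fields come from the code above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Settings shared by every sensor type."""
    working_time: Optional[int] = None
    temperature: Optional[int] = None
    timeout: Optional[int] = None
    connected_to: Optional[int] = None


@dataclass
class HumiditySensorSettings(Settings):
    """Hypothetical sensor type with one extra optional setting."""
    humidity: Optional[int] = None


def describe(settings: Settings) -> str:
    # Accepts the base class and any class derived from it.
    if settings.temperature is None:
        return "temperature not set"
    return f"temperature = {settings.temperature}"


south = HumiditySensorSettings(temperature=25, humidity=40)
print(describe(south))
```

Because every field of the base class has a default value, fields added in a subclass need defaults as well; otherwise the dataclass decorator raises a TypeError when the subclass is defined.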
Duplicated code can also appear at the loop level. Sometimes it stems from a data structure that is a poor abstraction of the object, which forces you to build more complex logic to process it. In that case, consider a data structure that better reflects the object’s properties and lets you avoid loops where they are redundant. Also bear in mind that a variable with a complex type annotation can be a sign that the object should be split into several.