188 reads

Pythonic Techniques for Handling Sequences

by Nadim JendoubiMarch 13th, 2023

Too Long; Didn't Read

Advanced Tips for better List and Tuple Manipulation and Sequence Handling, including ListComp, GenExp, Tuple manipulation, Sequence Pattern Matching and Unpacking.

featured image - Pythonic Techniques for Handling Sequences

Sequences and Collections are an integral part of any programming language. Python particularly handles them uniformly, i.e. any type of sequence from Strings and XML elements to arrays, lists, and Tuples are handled in a uniform way.

Understanding the variety of these sequences will spare us the reinvention of the wheel as the existing common sequence interface allows us to leverage and support any new sequence types.

The built-in sequences and how to group them

There are 2 types of sequences:

The container sequences: these hold items of different types, including nested containers. Like list, tuple, and double-ended queues.
The flat sequences: these hold items of simple types. Like str and byte.

A container sequence holds references to the objects it contains, which may be of any type, while a flat sequence stores the value of its contents in its own memory space, not as distinct Python objects.

Another way to group sequences is Mutability VS Immutability:

Mutable sequences: sequences that can change their values, for example, list, bytearray, array.array, and collections.deque.
Immutable sequences: sequences that can not change their values, for example, tuple, str, and bytes.

Let’s see what we can learn about Sequences! ✌️

ListComps and GenExps

List Comprehensions(ListComps) and Generator expressions(GenExps) are Python's solutions to build sequences in a readable and clear way.

The first deals mainly with Lists, the latter with any other type of sequences.

List Comprehensions

ListComp is a very concise way to create a list from an existing list, sometimes by operating on the existing items, using a simpler cleaner more compact syntax, all of this without having to deal with Python lambda.

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlistUsingForLoop = []
newlistUsingListComp = []

# using traditional for loop
for x in fruits:
  if "a" in x:
    newlistUsingForLoop.append(x)

print(newlistUsingForLoop) # ['apple', 'banana', 'mango']

# using ListComp
newlistUsingListComp = [x for x in fruits if "a" in x]

print(newlistUsingListComp) # ['apple', 'banana', 'mango']

# we can notice how easily readable the list comprehension version

Generator Expressions

A generator expression(GenExps) is an expression that returns a generator object, i.e. a function that contains a yield statement and returns a generator object.

GenExps use the same syntax as ListComps, but are enclosed in parentheses rather than brackets.

# create the generator object
squares_generator = (i * i for i in range(5))

# iterate over the generator and print the values
for i in squares_generator:
    print(i)

# this will output the square of numbers from 0 to 4 

colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in (f'{c} {s}' for c in colors for s in sizes):
    print(tshirt)

# black S
# black M
# black L
# white S
# white M
# white L

Tuples are Not Just Immutable Lists

Python presents Tuples as Immutable Lists, only this does not do them justice as they can be used as immutable lists and also as records with no field names.

Tuples as records

Tuples can serve as temporary records with no field names, this can be done only if the number of items is fixed and the order of items is respected as it is important.

coordinates = [(33.9425, -118.408056), (31.9425, -178.408056)]
for lat, _ in coordinates:
    print(f 'cordinate latitude: {lat}')

# this will print the latitude each time ignoring the longitude value

There are two ways of creating tuples with named fields but in this instance, they will be regarded as data classes and it is not what we want to discuss here.

Tuples as immutable lists

Tuples are also highly used in Python standard library for their immutability which brings clarity and performance optimization to the code.

a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
print(a == b) 
# True
b[-1].append(99)
print(a == b) 
# False
print(b)
# (10, 'alpha', [1, 2, 99])

There is one caveat to mention, as portrayed above in the code example, only the references contained in the tuple are immutable, the objects held in the references can change their values.

Changing the value of an item in a tuple can lead to serious bugs as tuples are hashable, it is better to use a different data structure for your specific use case.

How to Unpack and Pattern Match sequences?

Unpacking Sequences and Iterables

Unpacking is an operation that consists of assigning an iterable of values to a list of variables in a single assignment statement, it avoids unnecessary and error-prone use of indexes to extract elements from sequences.

coordinates = (33.9425, -118.408056)
latitude, longitude = coordinates # unpacking
print(latitude) 
# 33.9425
print(longitude) 
# -118.408056

We can use the excess operator * for tuples and the excess operator ** for dictionaries when unpacking too.

The target of unpacking can use nesting, i.e. something like (a, b, (c, d)). Python will do the right thing if the value has the same nesting structure.

# Tuple Example
fruits = ("apple", "mango", "papaya", "pineapple", "cherry")
(green, *tropic, red) = fruits
print(green)
# apple
print(tropic)
# ['mango', 'papaya', 'pineapple']
print(red)
# cherry

# Dictionary Example
def myFish(**fish):
    for name, value in fish.items():
        print(f'I have {value} {name}')

fish = {
    'guppies': 2,
    'zebras' : 5,
    'bettas': 10
}
myFish(**fish)

# I have 2 guppies
# I have 5 zebras
# I have 10 bettas

# Nested Unpacking Example
metro_areas = [
('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
for name, _, _, (lat, lon) in metro_areas:
    if lon <= 0:
        print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')

#                     | latitude | longitude
# Mexico City         | 19.4333 | -99.1333
# New York-Newark     | 40.8086 | -74.0204
# São Paulo           | -23.5478 | -46.6358

Sequence Pattern Matching

Pattern matching is done with the match/case statement. It is very similar to the if-elif-else statement only cleaner more readable and wields the power of Destructuring.

Destructuring is a more advanced form of unpacking, as it allows writing sequence patterns as tuples or lists or any combination of both.

Here’s a nice example I found on the GUICommits website that could simplify pattern matching with sequences.

baskets = [
    ["apple", "pear", "banana"], 
    ["chocolate", "strawberry"], 
    ["chocolate", "banana"], 
    ["chocolate", "pineapple"], 
    ["apple", "pear", "banana", "chocolate"],
]
match basket:

    # Matches any 3 items
    case [i1, i2, i3]: 
        print(f"Wow, your basket is full with: '{i1}', '{i2}' and '{i3}'")
    # Matches >= 4 items
    case [_, _, _, *_] as basket_items:
        print(f"Wow, your basket has so many items: {len(basket_items)}")

    # 2 items. First should be chocolate, second should be strawberry or banana
    case ["chocolate", "strawberry" | "banana"]:
        print("This is a superb combination")

    # Any amount of items starting with chocolate
    case ["chocolate", *_]:
        print("I don't know what you plan but it looks delicious")

    # If nothing matched before
    case _:
        print("Don't be cheap, buy something else")

One thing to point out, it’s that the *_ matches any number of items, without binding them to a variable. Using * extra instead of *_ would bind the items to extra as a list with 0 or more items.

Time to Slice them Up

A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations.

We slice a list like this: seq[start, stop, step], step is the number of items to skip. To evaluate the expression seq[start:stop:step], Python calls The Special Method seq.getitem(slice(start, stop, step)).

# On a list
l = [10, 20, 30, 40, 50, 60]
l[:2] # [10, 20]
l[2:] # [30, 40, 50, 60]
l[:3] # [10, 20, 30]

# On a str
s = 'bicycle'
s[::3] # 'bye'
s[::-1] # 'elcycib'
s[::-2] # 'eccb'

# Add, Mul, Del, assign operations on slices
l = list(range(10)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[2:5] = [20, 30] # [0, 1, 20, 30, 5, 6, 7, 8, 9]
del l[5:7] # [0, 1, 20, 30, 5, 8, 9]

print(5 * 'abcd') # 'abcdabcdabcdabcdabcd'

What's next?

There are different uses for different types of sequences, and depending on your use case, there are better options. Arrays, Memory views and Double ended queues are prime examples of that.

Memory Views

The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around.

The memoryview.cast returns yet another memoryview object, always sharing the same memory.

octets = array('B', range(6)) # array of 6 bytes (typecode 'B')

m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]

m2 = m1.cast('B', [2, 3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]

m3 = m1.cast('B', [3, 2])
m3.tolist() # [[0, 1], [2, 3], [4, 5]]

m2[1,1] = 22
m3[1,1] = 33
print(octets) # array('B', [0, 1, 2, 33, 22, 5])

Also published here.