Sequences and Collections are an integral part of any programming language. Python particularly handles them uniformly, i.e. any type of sequence from Strings and XML elements to arrays, lists, and Tuples are handled in a uniform way.
Understanding the variety of these sequences will spare us the reinvention of the wheel as the existing common sequence interface allows us to leverage and support any new sequence types.
There are 2 types of sequences:
A container sequence holds references to the objects it contains, which may be of any type, while a flat sequence stores the value of its contents in its own memory space, not as distinct Python objects.
Another way to group sequences is Mutability VS Immutability:
Let’s see what we can learn about Sequences! ✌️
List Comprehensions(ListComps) and Generator expressions(GenExps) are Python's solutions to build sequences in a readable and clear way.
The first deals mainly with Lists, the latter with any other type of sequences.
ListComp is a very concise way to create a list from an existing list, sometimes by operating on the existing items, using a simpler cleaner more compact syntax, all of this without having to deal with Python lambda.
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlistUsingForLoop = []
newlistUsingListComp = []
# using traditional for loop
for x in fruits:
if "a" in x:
newlistUsingForLoop.append(x)
print(newlistUsingForLoop) # ['apple', 'banana', 'mango']
# using ListComp
newlistUsingListComp = [x for x in fruits if "a" in x]
print(newlistUsingListComp) # ['apple', 'banana', 'mango']
# we can notice how easily readable the list comprehension version
A generator expression(GenExps) is an expression that returns a generator object, i.e. a function that contains a yield statement and returns a generator object.
GenExps use the same syntax as ListComps, but are enclosed in parentheses rather than brackets.
# create the generator object
squares_generator = (i * i for i in range(5))
# iterate over the generator and print the values
for i in squares_generator:
print(i)
# this will output the square of numbers from 0 to 4
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in (f'{c} {s}' for c in colors for s in sizes):
print(tshirt)
# black S
# black M
# black L
# white S
# white M
# white L
Python presents Tuples as Immutable Lists, only this does not do them justice as they can be used as immutable lists and also as records with no field names.
Tuples can serve as temporary records with no field names, this can be done only if the number of items is fixed and the order of items is respected as it is important.
coordinates = [(33.9425, -118.408056), (31.9425, -178.408056)]
for lat, _ in coordinates:
print(f 'cordinate latitude: {lat}')
# this will print the latitude each time ignoring the longitude value
There are two ways of creating tuples with named fields but in this instance, they will be regarded as data classes and it is not what we want to discuss here.
Tuples are also highly used in Python standard library for their immutability which brings clarity and performance optimization to the code.
a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
print(a == b)
# True
b[-1].append(99)
print(a == b)
# False
print(b)
# (10, 'alpha', [1, 2, 99])
There is one caveat to mention, as portrayed above in the code example, only the references contained in the tuple are immutable, the objects held in the references can change their values.
Changing the value of an item in a tuple can lead to serious bugs as tuples are hashable, it is better to use a different data structure for your specific use case.
Unpacking is an operation that consists of assigning an iterable of values to a list of variables in a single assignment statement, it avoids unnecessary and error-prone use of indexes to extract elements from sequences.
coordinates = (33.9425, -118.408056)
latitude, longitude = coordinates # unpacking
print(latitude)
# 33.9425
print(longitude)
# -118.408056
We can use the excess operator * for tuples and the excess operator ** for dictionaries when unpacking too.
The target of unpacking can use nesting, i.e. something like (a, b, (c, d)). Python will do the right thing if the value has the same nesting structure.
# Tuple Example
fruits = ("apple", "mango", "papaya", "pineapple", "cherry")
(green, *tropic, red) = fruits
print(green)
# apple
print(tropic)
# ['mango', 'papaya', 'pineapple']
print(red)
# cherry
# Dictionary Example
def myFish(**fish):
for name, value in fish.items():
print(f'I have {value} {name}')
fish = {
'guppies': 2,
'zebras' : 5,
'bettas': 10
}
myFish(**fish)
# I have 2 guppies
# I have 5 zebras
# I have 10 bettas
# Nested Unpacking Example
metro_areas = [
('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]
print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
for name, _, _, (lat, lon) in metro_areas:
if lon <= 0:
print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
# | latitude | longitude
# Mexico City | 19.4333 | -99.1333
# New York-Newark | 40.8086 | -74.0204
# São Paulo | -23.5478 | -46.6358
Pattern matching is done with the match/case statement. It is very similar to the if-elif-else statement only cleaner more readable and wields the power of Destructuring.
Destructuring is a more advanced form of unpacking, as it allows writing sequence patterns as tuples or lists or any combination of both.
Here’s a nice example I found on the
baskets = [
["apple", "pear", "banana"],
["chocolate", "strawberry"],
["chocolate", "banana"],
["chocolate", "pineapple"],
["apple", "pear", "banana", "chocolate"],
]
match basket:
# Matches any 3 items
case [i1, i2, i3]:
print(f"Wow, your basket is full with: '{i1}', '{i2}' and '{i3}'")
# Matches >= 4 items
case [_, _, _, *_] as basket_items:
print(f"Wow, your basket has so many items: {len(basket_items)}")
# 2 items. First should be chocolate, second should be strawberry or banana
case ["chocolate", "strawberry" | "banana"]:
print("This is a superb combination")
# Any amount of items starting with chocolate
case ["chocolate", *_]:
print("I don't know what you plan but it looks delicious")
# If nothing matched before
case _:
print("Don't be cheap, buy something else")
One thing to point out, it’s that the *_ matches any number of items, without binding them to a variable. Using * extra instead of *_ would bind the items to extra as a list with 0 or more items.
A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations.
We slice a list like this: seq[start, stop, step], step is the number of items to skip. To evaluate the expression seq[start:stop:step], Python calls The Special Method seq.getitem(slice(start, stop, step)).
# On a list
l = [10, 20, 30, 40, 50, 60]
l[:2] # [10, 20]
l[2:] # [30, 40, 50, 60]
l[:3] # [10, 20, 30]
# On a str
s = 'bicycle'
s[::3] # 'bye'
s[::-1] # 'elcycib'
s[::-2] # 'eccb'
# Add, Mul, Del, assign operations on slices
l = list(range(10)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[2:5] = [20, 30] # [0, 1, 20, 30, 5, 6, 7, 8, 9]
del l[5:7] # [0, 1, 20, 30, 5, 8, 9]
print(5 * 'abcd') # 'abcdabcdabcdabcdabcd'
There are different uses for different types of sequences, and depending on your use case, there are better options. Arrays, Memory views and Double ended queues are prime examples of that.
The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around.
The memoryview.cast returns yet another memoryview object, always sharing the same memory.
octets = array('B', range(6)) # array of 6 bytes (typecode 'B')
m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]
m2 = m1.cast('B', [2, 3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]
m3 = m1.cast('B', [3, 2])
m3.tolist() # [[0, 1], [2, 3], [4, 5]]
m2[1,1] = 22
m3[1,1] = 33
print(octets) # array('B', [0, 1, 2, 33, 22, 5])
Also published here.