In this post, we will be talking about how Python likes to deal with "list-like objects". We will be diving into some quirks of Python that might seem a bit weird and, in the end, we will hopefully teach you how to build something that could actually be useful while avoiding common mistakes.

Part 1: Fake lists

Let's start with this snippet:

```python
class FakeList:
    def __getitem__(self, index):
        if index == 0:
            return "zero"
        elif index == 1:
            return "one"
        elif index == 2:
            return "two"
        elif index == 3:
            return "three"
        elif index == 4:
            return "four"
        elif index == 5:
            return "five"
        elif index == 6:
            return "six"
        else:
            raise IndexError(index)

f = FakeList()
```

A lot of people will be familiar with this:

```python
f[3]
# <<< 'three'
```

`__getitem__` is the method you override if you want your instances to respond to the square bracket notation. Essentially, `f[3]` is equivalent to `f.__getitem__(3)`.

What you may not know is this:

```python
for i, n in enumerate(f):
    print(i, n)
# 0 zero
# 1 one
# 2 two
# 3 three
# 4 four
# 5 five
# 6 six

list(f)
# <<< ['zero', 'one', 'two', 'three', 'four', 'five', 'six']
```

or this:

```python
'three' in f
# <<< True
'apple' in f
# <<< False
```

Before I explain what I think is going on, let's try to tweak the snippet to see how it reacts:

```
 class FakeList:
     def __getitem__(self, index):
         if index == 0:
             return "zero"
         elif index == 1:
             return "one"
         elif index == 2:
             return "two"
         elif index == 3:
             return "three"
-        elif index == 4:
-            return "four"
         elif index == 5:
             return "five"
         elif index == 6:
             return "six"
         else:
             raise IndexError(index)

 f = FakeList()
 list(f)
```

Although this would be a reasonable outcome:

```python
list(f)
# <<< ['zero', 'one', 'two', 'three', 'five', 'six']  # wrong
```

It turns out that the actual result is this:

```python
list(f)
# <<< ['zero', 'one', 'two', 'three']
```

Let's try another tweak now:

```
 class FakeList:
     def __getitem__(self, index):
         if index == 0:
             return "zero"
         elif index == 1:
             return "one"
         elif index == 2:
             return "two"
         elif index == 3:
             return "three"
         elif index == 4:
             return "four"
         elif index == 5:
             return "five"
         elif index == 6:
             return "six"
-        else:
-            raise IndexError(index)

 f = FakeList()
 list(f)
```

If you try to run this, it will get stuck and you will have to stop it with Ctrl-C. To see why this is the case, let's tweak some more:

```python
for i, n in enumerate(f):
    print(i, n)
    input("Press Enter to continue")
# 0 zero
# Press Enter to continue
# 1 one
# Press Enter to continue
# 2 two
# Press Enter to continue
# 3 three
# Press Enter to continue
# 4 four
# Press Enter to continue
# 5 five
# Press Enter to continue
# 6 six
# Press Enter to continue
# 7 None
# Press Enter to continue
# 8 None
# Press Enter to continue
# 9 None
# Press Enter to continue
# 10 None
# Press Enter to continue
# 11 None
# Press Enter to continue
# ...
```

And our final tweak:

```
 class FakeList:
     def __getitem__(self, index):
         if index == 0:
             return "zero"
         elif index == 1:
             return "one"
         elif index == 2:
             return "two"
         elif index == 3:
+            3 / 0
             return "three"
         elif index == 4:
             return "four"
         elif index == 5:
             return "five"
         elif index == 6:
             return "six"
         else:
             raise IndexError(index)

 f = FakeList()

 for i, n in enumerate(f):
     print(i, n)
# 0 zero
# 1 one
# 2 two
# ZeroDivisionError: division by zero
```

With all of this in mind, let's try to figure out what Python does when you try to iterate over an object. The steps are, in order:

1. See if the object has an `__iter__` method. If it does, call it and yield the results.
2. See if the object has a `__next__` method. If it does, call it repeatedly, yielding each result, until at some point it raises a `StopIteration` exception.

It would be reasonable to assume that Python would give up at this point, but it looks like it has yet another trick up its sleeve:

3. See if the object has a `__getitem__` method. If it does:
   - Call it with `0`, yield the result
   - Call it with `1`, yield the result
   - Call it with `2`, yield the result
   - And so on...
   - If at some point you get an `IndexError`, stop the iteration
   - If at some point you get any other exception, raise it
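To make step 3 concrete, here is a rough sketch of what the `__getitem__` fallback behaves like. This is an illustration only, not CPython's actual implementation, and the helper name `iterate_with_getitem` is made up:

```python
def iterate_with_getitem(obj):
    # Simplified model of the fallback described above: call __getitem__
    # with 0, 1, 2, ... until an IndexError ends the iteration. Any other
    # exception simply propagates to the caller.
    index = 0
    while True:
        try:
            yield obj[index]
        except IndexError:
            return
        index += 1

# Using the original FakeList from the top of the post:
list(iterate_with_getitem(FakeList()))
# <<< ['zero', 'one', 'two', 'three', 'four', 'five', 'six']
```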
This explains all our examples:

- When we removed the `elif index == 4` part, it went straight to the `IndexError` and stopped the iteration
- When we removed the `raise IndexError(index)` part, it went to the end of the body of the `__getitem__` method, which in Python means that the method returns `None`; `None` is a perfectly acceptable value for `__getitem__` to return, so the iteration went on forever
- When we injected a `3 / 0` somewhere, it raised a `ZeroDivisionError` in the middle of the iteration

Let's now revert to our first example, the "correct" one, and try throwing some more curveballs at it:

```python
len(f)
# TypeError: object of type 'FakeList' has no len()

list(reversed(f))
# TypeError: object of type 'FakeList' has no len()
```

To be honest, the first time I tried these, I expected `len()` to work. Python would simply have to try an iteration and count how many steps it took to reach an `IndexError`. But it doesn't. It probably makes sense, since iterable sequences may also be infinite sequences and Python would get stuck.

The fact that `reversed()` doesn't work wasn't surprising, especially since `len()` didn't work. How would Python know where to start? In fact, when we called `reversed()`, Python complained about the missing `len()` of FakeList, not `reversed()`. But it seems that we can fix both problems by adding `len()` to our FakeList:

```
 class FakeList:
     def __getitem__(self, index):
         if index == 0:
             return "zero"
         elif index == 1:
             return "one"
         elif index == 2:
             return "two"
         elif index == 3:
             return "three"
         elif index == 4:
             return "four"
         elif index == 5:
             return "five"
         elif index == 6:
             return "six"
         else:
             raise IndexError(index)

+    def __len__(self):
+        return 7
```

```python
f = FakeList()
len(f)
# <<< 7
list(reversed(f))
# <<< ['six', 'five', 'four', 'three', 'two', 'one', 'zero']
```

So, to sum up, what can we do with our `FakeList` object?

- We can use the square bracket notation (no surprises there): `f[3] == "three"`
- We can call `len()` on it (again, no surprises): `len(f) == 7`
- We can iterate over it: `for n in f: print(n)`, `list(f)`
- We can reverse it: `for n in reversed(f): print(n)`, `list(reversed(f))`
- We can find things in it with `in`: `'three' in f == True`

So, our `FakeList` appears to behave like a list in almost all respects. But how can we be sure that we have covered all the bases? Are we missing something? Is there a defined "interface" for "list-like objects" in Python?

Part 2: Abstract Base Classes

Abstract Base Classes, or ABCs, are a feature of Python that is not all that well known. There is some theory behind them: they try to strike a balance between "static typing", which in Python usually means using `isinstance` a lot to determine whether a value conforms to the type you are expecting, and "duck typing", which usually means "don't check the types of any value; instead, interact with them as if they have the type you expect, and deal with the exceptions that will be raised if they don't conform to your expected type's interface". ABCs introduce something that in the Python ecosystem is called "goose typing".
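To make the contrast concrete before we move on, here is a rough, hypothetical illustration of the two styles (the function names are made up):

```python
def shout_static(value):
    # "static typing" style: check the type up front with isinstance
    if not isinstance(value, str):
        raise TypeError("expected a string")
    return value.upper() + "!"

def shout_duck(value):
    # "duck typing" style: just use the value as if it were a string and
    # let a failure surface as an exception on its own
    return value.upper() + "!"

shout_static("hello")
# <<< 'HELLO!'
shout_duck("hello")
# <<< 'HELLO!'
shout_duck(42)
# AttributeError: 'int' object has no attribute 'upper'
```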
Long story short, Abstract Base Classes allow you to call `isinstance(obj, cls)` and have it return `True`, when in fact obj is not an instance of cls or one of its subclasses. Let's see it in action:

```python
from collections.abc import Sized

class NotSized:
    def __len__(self, *args, **kwargs):
        pass

isinstance(NotSized(), Sized)
# <<< True
```

You can write your own ABCs, and the theory behind why they are needed and how they work is interesting, but it is not what I want to talk about here. Because, apart from defying `isinstance`, they also have some functionality built in. If you visit the documentation page of collections.abc, you will see a table listing, for each ABC, the abstract methods you have to implement and the mixin methods you get in return. The row for `Sequence` tells us the following:

If your class subclasses `Sequence` and defines the `__getitem__` and `__len__` methods, then:

- calling `isinstance(obj, Sequence)` will return `True`
- and they will also have the other 5 methods: `__contains__`, `__iter__`, `__reversed__`, `index` and `count`

(You can verify the second statement by checking out the source code of `Sequence`; it's neither big nor complicated.)

The first statement is not really surprising, but it is important because it turns out that `isinstance(obj, Sequence) == True` is the "official" way of saying that obj is a readable list-like object in Python.

What is interesting here is that, even without subclassing from `Sequence`, Python already gave `__contains__`, `__iter__` and `__reversed__` to our `FakeList` class from Part 1. Let's put the last two mixin methods to the test:

```python
f.index('two')
# AttributeError: 'FakeList' object has no attribute 'index'
f.count('two')
# AttributeError: 'FakeList' object has no attribute 'count'
```

We can fix this by subclassing FakeList from Sequence:

```
+from collections.abc import Sequence

-class FakeList:
+class FakeList(Sequence):
     def __getitem__(self, index):
         ...
```

```python
f.index('two')
# <<< 2
f.count('two')
# <<< 1
```

So the bottom line of all this is: if you want to make something that can be "officially" considered a readable list-like object in Python, make it subclass `Sequence` and implement at least the `__getitem__` and `__len__` methods.

The same conclusion holds true for all the ABCs listed in the documentation. For example, if you want to make a fully legitimate read-write list-like object, you would simply have to subclass from `MutableSequence` and implement the `__getitem__`, `__len__`, `__setitem__`, `__delitem__` and `insert` methods (the ones in the 'Abstract Methods' column).

There is a note in the documentation which is interesting, so we are going to include it here verbatim:

Implementation note: Some of the mixin methods, such as `__iter__()`, `__reversed__()` and `index()`, make repeated calls to the underlying `__getitem__()` method. Consequently, if `__getitem__()` is implemented with constant access speed, the mixin methods will have linear performance; however, if the underlying method is linear (as it would be with a linked list), the mixins will have quadratic performance and will likely need to be overridden.
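To make the read-write case mentioned above concrete, here is a minimal sketch of a `MutableSequence` subclass. This is our own toy example (the class name `WrappedList` is made up), not something from the documentation:

```python
from collections.abc import MutableSequence

class WrappedList(MutableSequence):
    # Implement only the five abstract methods; everything else
    # (append, extend, pop, remove, __iadd__, ...) comes from the mixins.
    def __init__(self, data=None):
        self._data = list(data or [])

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, value)

w = WrappedList(["zero", "one"])
w.append("two")   # append() is one of the mixin methods we got for free
w[1] = "ONE"
list(w)
# <<< ['zero', 'ONE', 'two']
```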
Part 3: Chainable Methods

We are going to shift topics away from list-like objects now. Don't worry, everything will come together in the end. Let's make another useless class:

```python
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1

    def __repr__(self):
        return f"<Counter: {self._count}>"

c = Counter()
c.increment()
c.increment()
c.increment()
c
# <<< <Counter: 3>
```

Nothing surprising here.

It would be nice if we could make the `.increment` calls chainable, i.e., if we could do:

```python
c = Counter().increment().increment().increment()
c
# <<< <Counter: 3>
```

The easiest way to accomplish this is to have `.increment()` return the `Counter` object itself:

```
 class Counter:
     def __init__(self):
         self._count = 0

     def increment(self):
         self._count += 1
+        return self

     def __repr__(self):
         return f"<Counter: {self._count}>"
```

However, this is not advisable. Here is an email from Guido van Rossum (the creator of Python) from 2003:

    I'd like to explain once more why I'm so adamant that sort() shouldn't
    return 'self'.

    This comes from a coding style (popular in various other languages, I
    believe especially Lisp revels in it) where a series of side effects
    on a single object can be chained like this:

        x.compress().chop(y).sort(z)

    which would be the same as

        x.compress()
        x.chop(y)
        x.sort(z)

    I find the chaining form a threat to readability; it requires that the
    reader must be intimately familiar with each of the methods. The
    second form makes it clear that each of these calls acts on the same
    object, so even if you don't know the class and its methods very well,
    you can understand that the second and third call are applied to x
    (and that all calls are made for their side-effects), and not to
    something else.

    I'd like to reserve chaining for operations that return new values,
    like string processing operations:

        y = x.rstrip("\n").split(":").lower()

    There are a few standard library modules that encourage chaining of
    side-effect calls (pstat comes to mind). There shouldn't be any new
    ones; pstat slipped through my filter when it was weak.

    --Guido van Rossum (home page: http://www.python.org/~guido/)

Here is how I interpret this. If someone reads this snippet:

```python
obj.do_something()
```

they will assume that `.do_something()`:

- mutates obj in some way, and/or has an interesting side-effect
- probably returns None

When they read this snippet:

```python
obj2 = obj1.do_something()
```

they will assume that:

- `.do_something()` does not change obj1 in any way
- obj2 will have a new value, either of a different type (e.g. a result status) or a slightly mutated copy of obj1

These assumptions break down when methods return self:

```python
c1 = Counter().increment()
c2 = c1.increment()
c1
# <<< <Counter: 2>
c2
# <<< <Counter: 2>
c1 == c2
# <<< True
```

Someone not familiar with the implementation of `Counter` would assume that c1 would hold the value 1.

How do we fix this? My suggestion is: make the class's initializer accept any optional arguments required to fully describe the instance's state. Then, chainable methods will return a new instance with the appropriate, slightly changed, state.

```
 class Counter:
-    def __init__(self):
-        self._count = 0
+    def __init__(self, count=0):
+        self._count = count

     def increment(self):
-        self._count += 1
-        return self
+        return Counter(self._count + 1)

     def __repr__(self):
         return f"<Counter: {self._count}>"
```

Let's try it out:

```python
c1 = Counter().increment()
c2 = c1.increment()
c1
# <<< <Counter: 1>
c2
# <<< <Counter: 2>
c1 == c2
# <<< False
```

It might be a little better if we also do this:

```
 class Counter:
     def __init__(self, count=0):
         self._count = count

     def increment(self):
-        return Counter(self._count + 1)
+        return self.__class__(self._count + 1)

     def __repr__(self):
         return f"<Counter: {self._count}>"
```

so that `.increment()` works for subclasses of `Counter`.

We essentially made the `Counter` objects immutable, unless someone changes the "private" `_count` attribute by hand.
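To see why returning `self.__class__(...)` rather than `Counter(...)` matters, consider a hypothetical subclass (the name `VerboseCounter` is made up for illustration):

```python
class VerboseCounter(Counter):
    # Only customizes the repr; increment() is inherited from Counter
    def __repr__(self):
        return f"<VerboseCounter with value {self._count}>"

c = VerboseCounter().increment().increment()
c
# <<< <VerboseCounter with value 2>

# Had increment() returned Counter(self._count + 1) instead of
# self.__class__(self._count + 1), `c` would have been a plain Counter here.
```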
Part 4: Bringing Everything Together

It's now time to build something actually useful. Let's consume an API and access the responses like lists. We are going to use the Transifex API (v3). Let's start with a snippet:

```python
import os
import requests

class TxCollection:
    HOST = "https://rest.api.transifex.com"

    def __init__(self, url):
        response = requests.get(
            self.HOST + url,
            headers={'Content-Type': "application/vnd.api+json",
                     'Authorization': f"Bearer {os.environ['API_TOKEN']}"},
        )
        response.raise_for_status()
        self.data = response.json()['data']

organizations = TxCollection("/organizations")
organizations.data[0]['attributes']['name']
# <<< 'diegobz'
```

Now let's make this behave like a list:

```
-import os
+import os, reprlib, collections
 import requests

-class TxCollection:
+class TxCollection(collections.abc.Sequence):
     HOST = "https://rest.api.transifex.com"

     def __init__(self, url):
         response = requests.get(
             self.HOST + url,
             headers={'Content-Type': "application/vnd.api+json",
                      'Authorization': f"Bearer {os.environ['API_TOKEN']}"},
         )
         response.raise_for_status()
-        self.data = response.json()['data']
+        self._data = response.json()['data']

+    def __getitem__(self, index):
+        return self._data[index]
+
+    def __len__(self):
+        return len(self._data)
+
+    def __repr__(self):
+        result = ", ".join((reprlib.repr(item['id']) for item in self))
+        result = f"<TxCollection ({len(self)}): {result}>"
+        return result
```

```python
organizations = TxCollection("/organizations")
organizations
# <<< <TxCollection (3): 'o:diegobz', 'o:kb_org', 'o:transifex'>
organizations[2]
# <<< {'id': 'o:transifex',
# ...  'type': 'organizations',
# ...  'attributes': {
# ...      'name': 'Transifex',
# ...      'slug': 'transifex',
# ...      'logo_url': 'https://txc-assets-775662142440-prod.s3.amazonaws.com/mugshots/435381b2e0.jpg',
# ...      'private': False},
# ...  'links': {'self': 'https://rest.api.transifex.com/organizations/o:transifex'}}
```

What is interesting here is that we know that our class is a legitimate readable list-like object, because we fulfilled the requirements we set in Part 2: we subclassed from `collections.abc.Sequence` and implemented the `__getitem__` and `__len__` methods.
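Since `TxCollection` is now an official `Sequence`, the mixin methods from Part 2 come along for free as well. A quick, hypothetical session (the exact IDs and counts depend on the account, as above):

```python
orgs = TxCollection("/organizations")

first = orgs[0]
orgs.index(first)   # index() is a Sequence mixin method
# <<< 0
orgs.count(first)   # and so is count()
# <<< 1

[item['id'] for item in reversed(orgs)]   # and __reversed__()
# <<< ['o:transifex', 'o:kb_org', 'o:diegobz']
```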
Now, if you are familiar with Django querysets, you will know that you can apply filters to them and that their evaluation is applied lazily, i.e. they are evaluated on demand, after the filters have been set. Let's try to apply this logic here, first by making our collections lazy:

```
 import os, reprlib, collections
 import requests

 class TxCollection(collections.abc.Sequence):
     HOST = "https://rest.api.transifex.com"

     def __init__(self, url):
+        self._url = url
+        self._data = None

+    def _evaluate(self):
+        if self._data is not None:
+            return
         response = requests.get(
-            self.HOST + url,
+            self.HOST + self._url,
             headers={'Content-Type': "application/vnd.api+json",
                      'Authorization': f"Bearer {os.environ['API_TOKEN']}"},
         )
         response.raise_for_status()
         self._data = response.json()['data']

     def __getitem__(self, index):
+        self._evaluate()
         return self._data[index]

     def __len__(self):
+        self._evaluate()
         return len(self._data)

     def __repr__(self):
         result = ", ".join((reprlib.repr(item['id']) for item in self))
         result = f"<TxCollection ({len(self)}): {result}>"
         return result
```

```python
organizations = TxCollection("/organizations")
organizations
# <<< <TxCollection (3): 'o:diegobz', 'o:kb_org', 'o:transifex'>
```

Our lazy evaluation:

1. Will only be triggered when we try to access the collection like a list
2. Will abort early if the collection has already been evaluated

To drive point 1 home, I will point out that our `__repr__` method (the one that was called when we typed `organizations` <ENTER> into our Python terminal) does not explicitly trigger an evaluation, but triggers it nevertheless. The `for item in self` part in its first line will start an iteration, which will call `__getitem__` (as we saw in Part 1), which will trigger the evaluation. Even if it didn't, the `len(self)` part in the second line would also trigger the evaluation.

Playing with metaprogramming, which in this context means making things behave like things that they are not, can be tricky, dangerous and can cause bugs, as anyone who has played with `__setattr__` and run into RecursionErrors can attest. This is the beauty of the conclusion from Part 2: we want to make `TxCollection` behave like a list and we know exactly which parts of the code trigger that behavior: `__getitem__` and `__len__`. Those are the only parts we need to add our lazy evaluation to in order to be 100% confident that `TxCollection` will properly behave like a readable list.
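Before we move on to filtering, here is a quick, hypothetical way to see the lazy evaluation in action; it pokes at the "private" `_data` attribute purely for demonstration purposes and assumes the same three organizations as before:

```python
orgs = TxCollection("/organizations")   # no HTTP request has been made yet
orgs._data is None
# <<< True

len(orgs)   # the first list-like access triggers _evaluate()
# <<< 3
orgs._data is None
# <<< False

len(orgs)   # already evaluated: _evaluate() returns early, no second request
# <<< 3
```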
Now let's apply filtering. We will intentionally do it the wrong way, by returning self, so that we can see the flaws outlined in Part 3 in the context of this example. Then we will fix it.

```
 class TxCollection(collections.abc.Sequence):
     HOST = "https://rest.api.transifex.com"

     def __init__(self, url):
         self._url = url
         self._data = None
+        self._params = {}

     def _evaluate(self):
         if self._data is not None:
             return
         response = requests.get(
             self.HOST + self._url,
             headers={'Content-Type': "application/vnd.api+json",
                      'Authorization': f"Bearer {os.environ['API_TOKEN']}"},
+            params=self._params,
         )
         response.raise_for_status()
         self._data = response.json()['data']

     # def __getitem__, __len__, __repr__

+    def filter(self, **filters):
+        self._params.update({f'filter[{key}]': value
+                             for key, value in filters.items()})
+        return self
```

Let's take this out for a spin:

```python
TxCollection("/resource_translations").\
    filter(resource="o:kb_org:p:kb1:r:fileless", language="l:el")
# <<< <TxCollection (3): 'o:kb_org:p:k...72e4fdb0:l:el',
# ...                    'o:kb_org:p:k...e877d7ee:l:el',
# ...                    'o:kb_org:p:k...ed953f8f:l:el'>
```

(Note: there are some Transifex-API-v3-specific things here, like how filtering is applied and what the IDs of the objects look like, that you don't have to worry about. If you are interested, you can check out the documentation.)

And now let's demonstrate the flaw we outlined in Part 3:

```python
c1 = TxCollection("/resource_translations").\
    filter(resource="o:kb_org:p:kb1:r:fileless", language="l:el")
c2 = c1.filter(translated="true")
c1
# <<< <TxCollection (1): 'o:kb_org:p:k...72e4fdb0:l:el'>
c2
# <<< <TxCollection (1): 'o:kb_org:p:k...72e4fdb0:l:el'>
c1 == c2
# <<< True
```

We know from our previous run that c1 should have a size of 3, but it got overwritten when we applied `.filter()` to it.

Also,

```python
c1 = TxCollection("/resource_translations").\
    filter(resource="o:kb_org:p:kb1:r:fileless", language="l:el")
_ = list(c1)
c2 = c1.filter(translated="true")
c1
# <<< <TxCollection (3): 'o:kb_org:p:k...72e4fdb0:l:el',
# ...                    'o:kb_org:p:k...e877d7ee:l:el',
# ...                    'o:kb_org:p:k...ed953f8f:l:el'>
c2
# <<< <TxCollection (3): 'o:kb_org:p:k...72e4fdb0:l:el',
# ...                    'o:kb_org:p:k...e877d7ee:l:el',
# ...                    'o:kb_org:p:k...ed953f8f:l:el'>
c1 == c2
# <<< True
```

We forced an evaluation before we applied the second filter (with `_ = list(c1)`), so the second filter was ignored, in both c1 and c2.

To fix this, we will do the same thing we did in Part 3: we will add optional arguments to the initializer that describe the whole state of a `TxCollection` object and have `.filter()` return a slightly mutated copy of self.

```
 class TxCollection(collections.abc.Sequence):
     HOST = "https://rest.api.transifex.com"

-    def __init__(self, url):
+    def __init__(self, url, params=None):
+        if params is None:
+            params = {}
         self._url = url
         self._data = None
-        self._params = {}
+        self._params = params

     # def _evaluate
     # def __getitem__, __len__, __repr__

-    def filter(self, **filters):
-        self._params.update({f'filter[{key}]': value
-                             for key, value in filters.items()})
-        return self
+    def filter(self, **filters):
+        params = dict(self._params)  # Make a copy
+        params.update({f'filter[{key}]': value
+                       for key, value in filters.items()})
+        return self.__class__(self._url, params)
```

(Note: we didn't set params={} as the default value in the initializer because you shouldn't use mutable default arguments.)

```python
c1 = TxCollection("/resource_translations").\
    filter(resource="o:kb_org:p:kb1:r:fileless", language="l:el")
c2 = c1.filter(translated="true")
c1
# <<< <TxCollection (3): 'o:kb_org:p:k...72e4fdb0:l:el',
# ...                    'o:kb_org:p:k...e877d7ee:l:el',
# ...                    'o:kb_org:p:k...ed953f8f:l:el'>
c2
# <<< <TxCollection (1): 'o:kb_org:p:k...72e4fdb0:l:el'>
c1 == c2
# <<< False
```

Works like a charm!

We concluded Part 3 by saying that the class we made creates immutable objects, which is why it is safe to use chainable methods on them. What is interesting here is that `TxCollection` objects are not immutable. So, how do we ensure that implementing chainable methods is safe? The answer is that the state of a `TxCollection` consists of two parts:

- The `_url` and `_params` attributes, which are immutable.
- The `_data` attribute, which is dynamic. But: it will only be evaluated once, and it has a deterministic relationship with the immutable parts. The only way for `_data` to be evaluated differently is to change `_url` and `_params`, which can only happen if we make a mutated copy of the original object via `.filter()`.

Conclusion

I hope this has been interesting. You can write powerful and expressive code with what is explained here, hopefully without introducing bugs.

(Authored by Konstantinos Bairaktaris)