In this article you'll learn how CPython lists actually work under the hood:

- how a static array model explains fast indexing and "no holes" behavior
- why contiguity matters, and how indexing reduces to one computation
- what really happens when the list grows: allocate → copy → repoint
- how size vs capacity drives CPython's over-allocation strategy
- why append() is amortized O(1), even though resizes are sometimes O(N)
- why insert() / delete-from-middle are O(N) due to shifting
- how the list is structured (metadata "head" + contiguous "body" of references)
- how to answer common list interview questions from first principles (not memorization)

Before we start, one note: everything below began as a personal note I wrote while prepping for interviews. But the deeper I dug into CPython and the more I looked under Python's hood, the clearer the big picture became. It's like a puzzle: you don't just assemble it, you first have to dig up the pieces.

I won't drag out the intro. I just want to say this: the material below helped me (and a few friends I shared it with) do more than memorize interview answers about lists and arrays. It helped me understand how this data structure actually works. And once you understand the "why," you can answer almost any interviewer question without cramming. I'm rewriting my note into an article for two reasons: to organize what I know, and to help readers understand the topic. If this helps even one person, then I've done what I wanted.

Static arrays

To fully understand how lists work in CPython's standard implementation, we first need to understand static arrays. It might sound scary. It's not: this data structure is very simple once you get a few basic principles.

One thing to clarify: Python doesn't have "static arrays" as a built-in data structure. We're going one level down, into C, because CPython is written in C. No C code here, though. We don't need it; we just need the idea. If you're wondering why you should learn arrays when you came here for lists: because Python's standard list is basically a static array with a couple of extra settings. (In CPython, a list is a dynamic array, but a dynamic array is just a modified static array.)

Let's go step by step. Setting the formal definition aside, a static array is a contiguous chunk of memory. Imagine memory as a grid of cells. An array is a sequence of, say, 3–4 cells sitting next to each other. Each cell can hold some data type, like int or str, but in a typed array the elements have the same size (usually the same type). So if the array is declared for numbers, the next slot can't suddenly become a string. If it did, address arithmetic would break.

And the last property: you can't change an array's size.

That's basically the full list of important properties.
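By the way, you can poke at a genuine static array from Python using the standard ctypes module. This is just a demonstration of the properties above, not everyday Python code:

```python
import ctypes

# Five contiguous C ints: a genuine fixed-size, single-type array.
arr = (ctypes.c_int * 5)(10, 20, 30, 40, 50)

print(arr[2])   # 30: instant access by index
arr[2] = 99     # overwriting in place is fine

# arr.append(60)  # AttributeError: a ctypes array has no append; its size is fixed
# arr[0] = "hi"   # TypeError: every element must be a c_int
```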
But like I promised, we're not just memorizing facts here. We're going to understand how it works. So for each property, we'll answer one question: "Why?"

Why can't you change an array's size?

Because the array's memory gets allocated as one contiguous block at creation time, and you usually can't "extend" that block in place. Let's use the cell-grid analogy. We have a field of cells: that's our memory. We allocate 5 cells for our array. Then we allocate, say, 3 more cells right after it for another object. Now we realize we need more slots in the array. But right after the 5th cell, the 6th, 7th, and 8th cells are already taken. We can't expand into occupied cells. And we can't "skip" them either, because arrays have the second property: contiguity. Which leads to the next question.

Why is an array a contiguous data structure?

Short answer: because contiguity lets an array access any element instantly. Now let's unpack how that works.

First, lock in one fact: the array name stores the address of the first cell. In our cell analogy, imagine a field of 500 × 500 cells. The array starts at cell 100. Its length is 40 cells. So it begins at 100, and the end boundary is 140 (not included), which means the last occupied cell is 139. When we create the array and assign it to a variable, that variable holds the "100." So arr = [..] basically means arr = 100. In C, the variable stores the address (a pointer) to the start of the array. In Python, a variable stores a reference to an object. I'm using the "address" model here so it's clear where fast index access comes from.

Now why do we care? Back to the cells: assume 1 cell = 1 byte. Our array of 40 cells takes 40 bytes. Let's assume an int takes 4 bytes in our example (in practice it's often 4, but it can vary). That means our array can store exactly 10 numbers, each taking 4 cells (bytes).

Now watch closely: we know the array's base address, and we know the size of each element. How do we access, say, the 5th element in one step, without iterating? We compute arr (the base address) + index * object_size (4 bytes/cells):

Address = Base + (Index * Size)

So for the 5th element (index 4): 100 + (4 * 4) = 116. Instant access by index in one operation: a simple calculation any kid can do, instead of walking through the array. Simple and brilliant. And it works because the array is contiguous: every cell (and every element) sits right after the previous one. Without contiguity, none of the above would work.

At this point you can probably guess the answer to the last question.

Why do array elements have the same size (usually the same type)?

We already saw that fast index access only works if every element has the same size (the same number of "cells"). If an int is 4 bytes in C, a char is one character (usually 1 byte). And there's also a "+1 byte" that applies not to a char, but to a C-string (an array of char), which ends with the '\0' character (what that is doesn't matter right now; that's a different article). So "hi" takes 3 bytes, and "hello" takes 6 bytes. If each element in the array had a different size, the runtime simply wouldn't be able to compute the correct address for instant access.

By the way, if it was ever unclear why indices in most programming languages start at 0 instead of 1, now you know. An index is not a "human" ordinal number; it's an offset from the start, measured in elements: index 0 is the first element, index 4 is the fifth element, and so on.
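Here's that arithmetic as a tiny Python sketch. The base address and element size are the made-up numbers from our example, not real memory addresses:

```python
BASE = 100       # the array starts at cell 100 (made-up, as in the example)
ELEM_SIZE = 4    # each int occupies 4 cells (bytes)

def address_of(index: int) -> int:
    """One computation, no iteration: Base + Index * Size."""
    return BASE + index * ELEM_SIZE

print(address_of(0))   # 100: the first element sits at the base address
print(address_of(4))   # 116: the fifth element, reached in a single step
```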
Now let's quickly go through basic operations on a static array, and then go back up to Python and dynamic arrays.

It's worth mentioning the "size vs capacity" idea: allocate more memory than you need right now, so you resize less often as the array grows. We take 10 cells, but "paint" (fill with elements) only 5 of them, for example. You end up with something like [10, 20, 30, 40, 50, _, _, _, _, _]: capacity 10, but size 5. We'll keep using these two variables.

Appending to the end (append)

Appending means inserting a new object into the next free slot. There are two scenarios: when there are free slots ([10, 20, 30, 40, 50, _, _, _, _, _]) and when the array is full ([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]). Let's walk through both, starting with the first case.

Step 1: Before appending, the runtime must make sure we won't go past the allocated memory, so it checks if size < capacity. In our example, 5 < 10 is True, so we move to step 2. (If it's False, we'll cover that next.)

Step 2: Compute the append address. We already know how the array reaches any element; the first free slot is the same idea. If the array has capacity 10 and each slot is 4 bytes, then we reserved 40 bytes total. Once you look at it, you'll notice something: size is always equal to the index of the first free slot. Let's confirm. Say the array variable stores 1000 (you remember what that means: the memory address where the array starts). Total length is 40 bytes, but only 5 slots are occupied, each 4 bytes. We need the 6th slot, and size is 5. So: 1000 + (5 * 4) = 1020. This can feel confusing because indexing starts at 0, but lay it out and it becomes obvious: slots 0 through 4 hold the 5 existing elements, so the first free slot is index 5, which is exactly size. The runtime doesn't need to "search"; it jumps straight to the right place using size.

Step 3: Write the value into that slot: arr[size] = 60; (or arr[5] = 60;). That's it. No three paragraphs here, because the operation is as simple as it gets.

Step 4: Commit the change by increasing the "filled slots" counter: size = size + 1. If you don't do this, the next read will still think the array has only 5 elements and won't "see" 60.

Done. This is O(1), constant time, because we don't scan the array to find the first free slot: we jump directly to it.
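Modeled in Python, the "free slot" path looks roughly like this (toy names; a plain list with None standing in for empty slots):

```python
def append_fast_path(slots, size, capacity, value):
    """Append `value` when there's still room; returns the new size."""
    assert size < capacity      # Step 1: stay inside the allocated block
    slots[size] = value         # Steps 2-3: `size` is the first free index; write there
    return size + 1             # Step 4: commit by bumping the counter

slots = [10, 20, 30, 40, 50, None, None, None, None, None]  # capacity 10, size 5
size = append_fast_path(slots, 5, 10, 60)
print(slots[:size], size)   # [10, 20, 30, 40, 50, 60] 6
```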
Now case #2: the array is full. There's no space left in the pre-allocated memory. The fix is simple: the runtime creates a new array 1.5–2× bigger and moves all values there. Let's break it down.

Step 1: Just like before, the runtime (the data structure's code plus the memory allocator) checks size < capacity. If True, we use the first scenario. If False, we go to the next step.

Step 2: The runtime needs a new memory block that's bigger than the old one; usually it grows by 1.5–2×. How exactly that growth factor gets chosen is a really interesting topic (and one that splits developers into two camps), but that's for another time. What matters here is why it's not a constant step like "+100 slots each time." This might seem obvious, but I'll still spell it out with an example. If you grow capacity in small fixed steps (like +100), going from 1,000,000 to 2,000,000 requires a lot of resizes, and every resize means copying elements. With multiplicative growth (×1.5/×2), you do far fewer resizes, so the whole series of appends stays efficient overall.

Step 3: After the runtime knows how much larger the new block must be, it finds a suitable free region. If the old array started at 1000, maybe the new one starts at 3000. It creates a new array there with a capacity 1.5–2× bigger (for simplicity, let's say ×2). At this point it's empty: size is 0, even though capacity is old capacity * 2.

Step 4: The most expensive part is copying. The runtime copies every object from the old array into the new one, in order: copy arr[0] (from 1000) -> new_arr[0] (to 3000), copy arr[1] (from 1004) -> new_arr[1] (to 3004), and so on. This is O(N), linear time, because N can be 5 or 5 million.

Step 5: After copying everything, the runtime frees the old array's memory to avoid a leak.

Step 6: The runtime updates the variable's pointer: arr must forget 1000 and store 3000 instead. We also update capacity to the new value.

Step 7: Now we have a fresh array with free space. We can append the new element and then update size. The append steps are exactly the same as in the first case.

Insertion (insert)

Insertion differs from append in that append always puts the new element at the end, while insertion means we want to insert at the start or somewhere in the middle. In short, we need to free the slot where we want to insert. To free it, we have to move the existing element somewhere else, and in our case "somewhere else" can only be the next slot. If that slot is occupied too, we move that one as well, and so on. You get the idea. Now step by step.

Step 1: The runtime uses size to decide what to do: if index (where we want to insert) < size, we must shift. If index >= size, it's basically insertion at the end (like append). In our case, index < size, so we shift the tail to the right.

Step 2: For example, let's insert 99 at index 2 into the array [10, 20, 30, 40, 50, _, _] (capacity=7, size=5). The shift always starts from the end. We run a loop from the last element (size - 1) down to the insertion point (index), and the runtime moves each element one slot forward:

[10, 20, 30, 40, 50, 50, _] (yes, 50 appears twice temporarily)
[10, 20, 30, 40, 40, 50, _]
...and again, until:
[10, 20, 30, 30, 40, 50, _]

We reached index 2. The value that used to be at index 2 has already been copied to the next slot, so the data is safe. Now we can overwrite index 2 with what we want.

Step 3: Write 99 into index 2.

Step 4: Update size: size = size + 1.

Done. Insertion is complete. As you can see, we have to touch a bunch of elements, so insertion is O(N). You might ask: what if the array is already at full capacity when we need to insert? Then we resize first, then shift. And yes, that's exactly how CPython does it.
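Here's a toy sketch tying the resize and insertion walkthroughs together: a grow step (allocate, copy, repoint) plus a shift-based insert. It models the behavior we just described, not CPython's actual C code:

```python
def grow(old_slots, size):
    """Allocate a bigger block, copy everything over, hand back the new block."""
    new_slots = [None] * max(1, 2 * len(old_slots))  # Steps 2-3: new block, ~2x capacity
    for i in range(size):                            # Step 4: the O(N) copy
        new_slots[i] = old_slots[i]
    return new_slots                                 # Step 6: caller repoints to this

def insert_at(slots, size, index, value):
    """Insert `value` at `index`, shifting the tail right; returns (slots, new size)."""
    if size == len(slots):                    # full? resize first, then shift
        slots = grow(slots, size)
    for i in range(size - 1, index - 1, -1):  # shift from the end down to `index`
        slots[i + 1] = slots[i]
    slots[index] = value
    return slots, size + 1

slots = [10, 20, 30, 40, 50, None, None]      # capacity 7, size 5
slots, size = insert_at(slots, 5, 2, 99)
print(slots, size)   # [10, 20, 99, 30, 40, 50, None] 6
```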
One last key operation: deletion. Honestly, if you've made it this far, you can probably explain deletion without me. Once you understand the array model, it becomes intuitive. But I still need to lay it out clearly.

Delete from the end: decrement size and remove the last reference. That's O(1).

If we want to delete any element other than the last, it's the same as insertion, but in reverse. Say we delete 99 at index 2 from [10, 20, 99, 30, 40, 50, _]:

[10, 20, 30, 30, 40, 50, _] (we overwrite the unwanted slot to keep the array consistent)
[10, 20, 30, 40, 40, 50, _]
[10, 20, 30, 40, 50, 50, _]
[10, 20, 30, 40, 50, _, _] (and then we "cut off" the extra element using the method described literally one paragraph above)

Deleting from the middle/start requires shifting the tail to close the "hole," so it's O(N).

At this point, that's everything you need to know about a static array. Now we can go one level up, to the dynamic array in CPython.

List in Python

Unlike a static array, a Python list doesn't have the same restrictions. A list doesn't have a fixed size: we can add objects without doing a manual resize. The objects the list refers to can live anywhere in memory. But inside the list, there's a contiguous array of references to those objects. That's why index access is fast, and that's why we can mix anything inside a list, from plain int values all the way to functions.

A fair question might pop up now: "You said a list is built on a static array, so why did we spend time learning arrays?" I'll explain.

List anatomy

Let's start with what's inside a list: its "anatomy." Internally, a list has two blocks that people usually call the "Head" and the "Body."

The head is basically a struct. Without diving into C types, think of it as a strict class with a fixed set of attributes: you can't add new fields, and its memory size is fixed. The head contains 3 variables (in reality there are some service fields too, but they don't matter for us right now), and we've already seen all three ideas:

- ob_size is the same as size in a static array: the actual length (the occupied slots). When you call len() on a list, you get ob_size.
- allocated is the same as capacity: how many slots are reserved for future growth.
- ob_item stores a reference to the "Body." In other words, it's a pointer to the start of the static array. This is exactly like the arr variable in our static-array discussion, where we stored addresses like 1000 or 3000. And this is the value that changes on each resize.

The "Body" is a contiguous array of references with the same size/capacity idea we already covered. While there's spare capacity, appending to the end is fast. When space runs out, the array grows and elements get moved. Same logic as the array, just with automatic resizing.

You might ask: how can we store different data types if a static array can only store one type? Easy. This array stores references (this is important!) to arbitrary objects. The array itself stores one type: references. When we index into the list, the interpreter uses the same formula we discussed to find the right reference in the body, then follows that reference to return the object. The objects themselves can sit anywhere in memory, totally scattered. They don't need to be contiguous, because the interpreter just jumps to them using references. But the list's internal array of references is still contiguous. Don't mix those two ideas up.

We can also assign a reference to another reference. So if a = 10 and b = a, then b refers to the same object as a: not "another 10," but the same one (so evaluating b gives 10). I'm keeping this short because variables and types in Python are a big separate topic.
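You can watch the reference model in action straight from Python:

```python
# Each slot holds a reference, so wildly different objects can share one list.
a = [1, "two", 3.0, print]
b = a                 # assignment copies the reference, not the list
b.append("four")

print(a)              # [1, 'two', 3.0, <built-in function print>, 'four']
print(a is b)         # True: both names point to the very same object
```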
All operations like delete, insert, and index access work the same way we already discussed. Like I keep repeating: the "Body" is a static array, so the mechanics are identical.

Also, list is an ordered collection. Everything you add goes to the end by default, unless you explicitly insert into the middle or the beginning. That's why I said "by default": those are two different operations.

Assignment by index can't "create a hole." You can't do lst[7] = x if the list length is 5; the list won't grow. But insert with an index that's too large just appends to the end (no holes). Can you guess why? Because the references live in a static array, and as we already learned, it must be contiguous: no gaps.

Resize

A few words about resizing. Conceptually it's the same mechanics; what changes is the growth strategy. I'll say a bit about it, but don't stress about memorizing the formula or trying to compute anything by hand. Just understand the idea. In my experience, no interviewer has ever demanded the exact formula.

Important: below is a simplified formula that shows CPython's strategy, "over-allocate a bit," so append is fast in amortized terms. Different CPython versions and different growth steps have extra details (like allocation alignment and small adjustments), so the exact capacity numbers may differ slightly, but the idea is the same.

New_Capacity = New_Size + (New_Size >> 3) + (3 if New_Size < 9 else 6)

Now let's look at how it works. Before that, one point: this formula exists to optimize memory so the list doesn't waste too much space without a reason. At its core, it uses a bit shift and a constant. We'll keep it surface-level and won't dive into what a bit shift is (if you're curious, you can always read more).

New_Size >> 3 is a right shift by 3. Mathematically, it's integer division by 8. Meaning: on resize, the list grows by roughly 12.5%, not by 1.5–2× like in a static array. If that still feels unclear: 1/8 = 0.125. More precisely, the New_Size >> 3 part gives "roughly +12.5%," but the final growth isn't fixed: the +3/+6 constant and rounding/alignment affect it. So at small sizes the growth is much more aggressive, and at large sizes it trends closer to ~1.125×.

+3 or +6 is the constant. It's there so small lists grow faster than ~12%. Without it, going from 0 to 1 would give a capacity of exactly 1, and you'd resize constantly.

I'll show real numbers so it's easier to see. Capacity can grow like this: 0-4-8-16-25-35-46-58-72-88... That sequence is an illustration for a specific implementation/version (historically you can find exactly this growth). In modern CPython, the sequence often looks different (for example, 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...) because of extra rules like allocation alignment, but the core idea is the same: rare expensive growth plus lots of cheap appends. As you can see, early on (0 to 16) it doubles; later the growth slows down, and at larger sizes it tends toward ×1.125 (+12.5%).
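If you're curious, you can reproduce that historical sequence by simulating the simplified formula, appending one element at a time (modern CPython adds alignment rules, so real numbers may differ):

```python
def new_capacity(new_size: int) -> int:
    # The simplified over-allocation formula from above.
    return new_size + (new_size >> 3) + (3 if new_size < 9 else 6)

capacity, history = 0, []
for size in range(1, 89):       # pretend we append 88 times
    if size > capacity:         # no free slot left: resize
        capacity = new_capacity(size)
        history.append(capacity)

print(history)   # [4, 8, 16, 25, 35, 46, 58, 72, 88]
```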
Why this exists, in a bit more detail: at large sizes, say a list that already takes 1 GB, growing by ×2 to 2 GB would be too wasteful, but +12.5% sounds much more reasonable. The constant also helps at small sizes: it prevents the interpreter from constantly resizing just to add 1–2 extra slots. Instead, it grabs 6 right away; even if 5 of them stay empty, you win because you avoid frequent allocator calls.

One more resize fact in Python: unlike many other languages, Python may shrink the list if the actual size becomes significantly smaller than the allocated capacity (a common rule of thumb is "less than half"). This can happen after you delete a bunch of elements.

And the last thing you need to know about resizing. This part can feel confusing: we learned that expansion forces copying all elements, which is O(N). But in Python (and other languages), people say that appending to the end has amortized O(1) complexity. What does that mean? The expensive resize happens rarely (for example at 4, 8, 16 elements, etc.). All other appends (when there's still space: the 5th, 6th, 7th elements) happen instantly in O(1). If you take the "cost" of one expensive resize and spread it across all the cheap appends that happened before it, the average cost per append stays low. That's amortized complexity.
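A quick way to convince yourself: tally the total work for N appends under a simple doubling model. This is an illustration of amortization, not CPython's exact growth formula; the average per append settles around a small constant:

```python
def total_append_cost(n: int) -> int:
    """Count 'element writes' for n appends with doubling growth."""
    cost, capacity = 0, 1
    for size in range(1, n + 1):
        if size > capacity:        # rare, expensive: copy everything over
            capacity *= 2
            cost += size - 1       # the O(N) copy
        cost += 1                  # the cheap write itself
    return cost

for n in (10, 1_000, 1_000_000):
    print(n, total_append_cost(n) / n)   # stays a small constant (~2-3), independent of n
```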
Summary

This should be enough knowledge to answer almost any list/array question for a junior/middle role, and maybe even senior in some companies. I'm not going to list all list methods and functions, because there are thousands of descriptions online already. Here I wanted to concentrate the theory I've seen across different sources, and I really hope it helps you.

To wrap up, I want to bring up a few popular interview questions and try to answer them. I won't lean on random facts pulled out of context; I'll answer at the understanding level, using what we already covered in this article. One note: some interview questions focus on how list methods behave, and we obviously won't cover those here. Like I said (again), this article targets theory. (If you want an article with the same level of detail about how list functions work, plus breakdowns of the most common cases, let me know in the comments.) Let's begin.

Are Python lists a mutable data type?

A very common question. Did we cover the answer? I didn't literally write "a Python list is mutable," but we talked about insertion, deletion, and appending. Do I really need to explain that if list supports those operations, it's mutable?

What data types can a list store?

Did I answer this? Yes and no. We discussed that a list can hold different data types at the same time because it stores not the values themselves, but references to them. So technically it stores one "type" of thing, references, but each reference can point to a totally arbitrary data type.

Are Python lists ordered collections?

Yes. I spent a small paragraph on that, plus a whole block on static arrays to explain why.

What happens at the reference level if you assign an existing list to a new variable (for example, list_a = list_b)?

We covered that too. Both variables point to the same object in memory, because assignment in Python copies only the reference.

Operation complexity

And finally, a popular group of questions is about the time complexity of list operations. We went through each one in detail, but to summarize: append is O(1) (amortized), insert into the middle or beginning is O(N), deleting the last element is O(1), and deleting from the middle or beginning is O(N).
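And if you'd rather measure than take my word for it, a quick (unscientific) check with the standard timeit module shows the gap:

```python
import timeit

# append touches only the end: stays fast no matter how big the list gets.
print(timeit.timeit("lst.append(0)", setup="lst = []", number=100_000))

# insert(0, ...) shifts the whole tail every time: slows down as the list grows.
print(timeit.timeit("lst.insert(0, 0)", setup="lst = []", number=100_000))
```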