Mastering Python Variables: Deep Dive into Memory, Mutability, and Beyond

Welcome to my blog post, where we will dive into the world of Python variables. If you are new to programming, don't worry, I have got you covered! In my previous blog post, I provided a refresher on the basics of the Python programming language. Now, we will move on to the next level and take a closer look at variables in Python. Variables are one of the fundamental concepts in programming, and mastering them is essential for writing efficient and effective code. So, let's get started and explore Python variables in depth!

Table of Contents

Memory
Variable
Reference Counting
  getrefcount()
  ctypes
Garbage Collection
Object Mutability
  Immutable
  Mutable
  Mutable Datatype Within Immutable Datatype
Function Arguments and Mutability
  Immutable Objects as Arguments
  Mutable Objects as Arguments
  Mutable Objects inside Mutable Objects as Arguments
Shared References and Mutability
Python Interning
  Integer Interning
  String Interning
Variable Equality
Everything is an Object
Conclusion

Memory

You can think of memory as a set of blocks where each block has a unique address, much like each house on a street has a unique address. Now, let's dive into variables.

Variable

What happens when you write a = 5?

Python creates an object in memory at some address, let's say 0x1000. The value 5 is stored in this object. You can think of a as an alias for the memory address 0x1000. Here, a doesn't hold the value 5 itself; instead, it refers to the memory address 0x1000, and that address holds the object whose data is 5.

To find out the memory address of the object that a variable is referencing, you can use the built-in id() function. (In CPython, the value returned by id() is the object's memory address; other interpreters only guarantee that it is unique for the lifetime of the object.)
Here is id() applied to an integer:

    # declare a variable a and store the value 10
    a = 10

    # print the value, its memory address in decimal, and its memory address in hex
    print(a)
    print(id(a))
    print(hex(id(a)))

    # ---------------------- OUTPUT ------------------ #
    # 10
    # 4376986128
    # 0x104e38210

Here, I declared a variable a with a value of 10. Let's understand what happened under the hood. First, Python created an object at some memory address, let's say 0x1000. In that object, Python put the value 10. Finally, the variable a refers to the memory address 0x1000 that holds the object with the value 10. In the above code, I printed the value, then the address that a refers to in decimal format, and then the same address in hexadecimal format.

Let's take a look at another example.

    s = "hello"
    print(s)
    print(id(s))
    print(hex(id(s)))

    # ------------------------- OUTPUT --------------------- #
    # hello
    # 4702080944
    # 0x118440fb0

First, Python created an object at some memory address with the value 'hello'. Then, the variable s refers to the memory address that holds the object with the value 'hello'. As before, I printed the value, the decimal format of the address that s refers to, and the hexadecimal format of that address.

Reference Counting

So, we just learned how variables reference a memory address where an object is stored. We can also count how many variables point to that same memory address.

Let's say we declared a variable a = 5, and the object gets created at memory address 0x1000. Then the reference count of that object is 1.

Now let's say we declared another variable, b = a. Here b is not assigned the value 5 directly; instead, b references the same object that a references, at memory location 0x1000. Hence, two variables point to the memory address 0x1000, so the reference count of the object at 0x1000 is 2.
Now suppose b goes away, either because it goes out of scope or because it gets assigned to a different object. The reference count of the object at 0x1000 then drops back to 1. If a also goes away in one of those ways, the reference count drops to 0. At this point, the Python memory manager recognizes this and throws away the object that was at that memory location. Finally, the space is freed.

The sys module has a getrefcount() function that can be used to get the reference count. It takes the variable as its one parameter, but the downside is that passing the argument adds one reference to that object. There's also another way, using the ctypes module.

getrefcount()

    import sys

    # declare a list, print its id, and then get the reference count
    lst_1 = [1, 2, 3]
    print(id(lst_1))
    print(sys.getrefcount(lst_1))

    # --------------------- OUTPUT --------------------- #
    # 4389242752
    # 2

It says 2 because the call to getrefcount() also references the object, so the reference count increases. To get the actual reference count, just subtract 1 from the answer.

ctypes

    # with ctypes, you get the actual reference count, because you pass
    # the memory address (a plain integer) and not a reference to the object
    import ctypes

    def ref_count(address):
        return ctypes.c_long.from_address(address).value

    # here you get 1 as the reference count, which is correct
    print(id(lst_1))
    print(ref_count(id(lst_1)))

    # -------------------- OUTPUT ------------------ #
    # 4389242752
    # 1

This trick relies on standard CPython builds storing the reference count at the very start of every object, so treat it as CPython-specific.

Garbage Collection

Previously, we learned that as soon as the reference count goes to 0, the Python memory manager destroys the object at that memory location and frees up the memory. But this doesn't always work, and one of the cases where it doesn't is circular references.

Let's think of a scenario where a references b, and b references c. Now, let's say we delete a. Now the reference count of b is 0, but the reference count of c is still 1.
Because nothing references b any more, the second object will be destroyed; the reference count of the third object, c, will then become 0 and it'll get destroyed too.

Now, let's say c also references b (that is, c holds a reference back to b), and after that we remove a. Now both objects have a reference count of 1. Neither object is going to get destroyed, as both still have a reference count of 1, and this scenario is known as a circular reference. Reference counting alone can't eliminate these objects, and if we continue like this, it will result in a memory leak.

Here, the garbage collector comes to the rescue; it can handle this kind of issue. You can control the garbage collector programmatically using the gc module. You can call it manually and even do your own cleanup.

    import gc
    import ctypes

    # function to count the references
    def ref_count(address):
        return ctypes.c_long.from_address(address).value

I imported the gc and ctypes modules and defined the ref_count function to read the reference count.

    # report whether the object with the given id is tracked by the garbage collector
    def object_by_id(object_id):
        for obj in gc.get_objects():
            if id(obj) == object_id:
                return "Object exists"
        return "Not found"

This function takes the id of an object as an argument and returns "Object exists" if the garbage collector is currently tracking an object with that id; otherwise it returns "Not found". Note that gc.get_objects() lists every object the collector tracks, not only those that are part of a circular reference.

    # two classes to illustrate the circular reference concept
    class A:
        def __init__(self):
            self.b = B(self)
            print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))

    class B:
        def __init__(self, a):
            self.a = a
            print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))

Now, I created two classes, A and B, to illustrate the circular reference.

class A: - This line defines a new class called A.
def __init__(self): - This is the constructor of the A class. It is executed when a new instance of the class is created.
self.b = B(self) - This line creates a new instance of the B class and assigns it to the b attribute of the current instance of the A class. The self argument passed to the B constructor is a reference to the current instance of the A class.
print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b)))) - This line prints a message to the console. The message contains the hexadecimal representations of the memory addresses of the current instance of the A class and its b attribute.
class B: - This line defines a new class called B.
def __init__(self, a): - This is the constructor of the B class. It is executed when a new instance of the class is created. The argument a is a reference to an instance of the A class.
self.a = a - This line assigns the argument a to the a attribute of the current instance of the B class.
print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a)))) - This line prints a message to the console. The message contains the hexadecimal representations of the memory addresses of the current instance of the B class and its a attribute.

Next, we disable the garbage collector so that we can run it manually and also check the reference counts ourselves.

    gc.disable()

    # create an instance of class A
    my_var = A()

    # -------------- OUTPUT --------------- #
    # B: self: 0x11953e8c0, a: 0x11953d8d0
    # A: self: 0x11953d8d0, b:0x11953e8c0

I created an instance of A. This prints out the ids of a and b. The id of my_var and the id of a are the same, which the next snippet confirms. Addresses change from run to run, so compare values only within a single output block.

    print('a: \t{0}'.format(hex(id(my_var))))
    print('a.b: \t{0}'.format(hex(id(my_var.b))))
    print('b.a: \t{0}'.format(hex(id(my_var.b.a))))

    #----------------- OUTPUT ------------------#
    # a:    0x119554e50
    # a.b:  0x11953e680
    # b.a:  0x119554e50

    # store the ids of the a and b instances
    a_id = id(my_var)
    b_id = id(my_var.b)

These two variables are used to store the ids of a and b.
    # print the reference counts of a and b,
    # and whether each object is tracked by the garbage collector
    print('refcount(a) = {0}'.format(ref_count(a_id)))
    print('refcount(b) = {0}'.format(ref_count(b_id)))
    print('a: {0}'.format(object_by_id(a_id)))
    print('b: {0}'.format(object_by_id(b_id)))

    # --------------------- OUTPUT -------------------- #
    # refcount(a) = 2
    # refcount(b) = 1
    # a: Object exists
    # b: Object exists

Here, I'm printing the reference counts for a and b. I'm also checking whether these objects are tracked by the garbage collector. As you can see, the garbage collector tracks both objects and returns "Object exists" for each of them. The count for a is 2 because both my_var and b.a reference it.

Now, let's point my_var to None, so that only the circular reference remains.

    my_var = None
    print('refcount(a) = {0}'.format(ref_count(a_id)))
    print('refcount(b) = {0}'.format(ref_count(b_id)))
    print('a: {0}'.format(object_by_id(a_id)))
    print('b: {0}'.format(object_by_id(b_id)))

    # ------------------ OUTPUT -------------------- #
    # refcount(a) = 1
    # refcount(b) = 1
    # a: Object exists
    # b: Object exists

Here, you can see that the reference count of a decreased to 1, as we changed my_var to refer to None. Both objects are still alive, kept that way only by their references to each other.

    gc.collect()
    print('refcount(a) = {0}'.format(ref_count(a_id)))
    print('refcount(b) = {0}'.format(ref_count(b_id)))
    print('a: {0}'.format(object_by_id(a_id)))
    print('b: {0}'.format(object_by_id(b_id)))

    # --------------- OUTPUT ----------------- #
    # refcount(a) = 0
    # refcount(b) = 0
    # a: Not found
    # b: Not found

We ran the garbage collector manually with gc.collect(), and it removed both objects; you can see that neither object is found any more. Keep in mind that once the objects are freed, ref_count() is reading memory that no longer belongs to them, so treat the 0 values as illustrative and avoid doing this in real code.

Object Mutability

Changing the data inside an object is called modifying the internal state of the object. An object whose internal state can be changed is called mutable; otherwise, it's immutable.
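As a rough illustration of that definition, the sketch below contrasts an in-place change to a list with a failed attempt to change a string. The variable names are made up for this example, and it assumes nothing beyond standard Python 3 behaviour.

```python
# A mutable object: its internal state changes, but it stays the same object.
numbers = [1, 2, 3]
id_before = id(numbers)
numbers[0] = 100                 # modify the list in place
same_object = id(numbers) == id_before

# An immutable object: an attempt to change its internal state fails.
word = 'python'
try:
    word[0] = 'P'                # strings do not support item assignment
    string_changed = True
except TypeError:
    string_changed = False

print(numbers)         # [100, 2, 3]
print(same_object)     # True
print(string_changed)  # False
print(word)            # python
```

The list keeps its identity while its contents change, whereas the string refuses the change altogether.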
Immutable data types in Python are:

Numbers
Strings
Tuples
Frozen Sets
User-Defined Classes (when they are written to be immutable)

Mutable data types in Python are:

Lists
Sets
Dictionaries
User-Defined Classes (the default behaviour)

Now, let's see some examples of mutable and immutable datatypes and understand what happens under the hood when you change their values.

Immutable

Let's say we have a string s = 'python'. As we know, strings are immutable. So, if on the next line you write s = 'hello', Python first creates another object at a different memory address with the value 'hello'. Then the variable s points to this new object's address. After this, if nothing else references the previous object with the value 'python', the Python memory manager can destroy it and free up the space.

    s = 'python'
    print(s)
    print(hex(id(s)))

    # ----------------- OUTPUT ------------------ #
    # python
    # 0x10380b870

    s = 'hello'
    print(s)
    print(hex(id(s)))

    # -------------- OUTPUT ------------------- #
    # hello
    # 0x106232470

As you can see, the two addresses are different.

Mutable

Let's say we have a list a = [1, 2, 3]. As we know, lists are mutable, i.e. elements can be inserted, deleted and replaced. When we write a = [1, 2, 3], Python creates an object at some memory location, let's say 0x1000. Now, a points to the address 0x1000 where the list is stored. Let's say you wrote a.append(4) to append 4 to the list a. Unlike with immutable datatypes, Python won't create a new object; instead, it will add the value 4 to the object that is stored at 0x1000.
    # create a list and print the list and its address
    my_list = [1, 2, 3]
    print(my_list)
    print(hex(id(my_list)))

    # ------------------- OUTPUT ------------------ #
    # [1, 2, 3]
    # 0x11bf06340

    # check whether the address changes after modifying the list
    my_list.append(4)
    print(my_list)
    print(hex(id(my_list)))

    # ------------------ OUTPUT -------------------- #
    # [1, 2, 3, 4]
    # 0x11bf06340

You can see that the address remains the same. Let's take another example.

    # create a dictionary and print the dictionary and its address
    my_dict = {'key1': 1, 'key2': 2}
    print(my_dict)
    print(hex(id(my_dict)))

    # ------------------ OUTPUT --------------- #
    # {'key1': 1, 'key2': 2}
    # 0x11be9ea80

    # check whether the address changes after modifying the dictionary
    my_dict['key1'] = 10
    print(my_dict)
    print(hex(id(my_dict)))

    #---------------- OUTPUT -----------------#
    # {'key1': 10, 'key2': 2}
    # 0x11be9ea80

As a dictionary is mutable, its address remains the same.

Mutable Datatype Within Immutable Datatype

Let's take a tuple b = ([1,2,3], [4,5,6]). As we know, lists are mutable, i.e. elements can be inserted, deleted or replaced. Here, we can modify the lists that are in the tuple, but we can't make any changes to the tuple itself, i.e. we can't insert a new element into the tuple, delete an element from it, or replace one of its elements. The tuple still has the same elements, but as those elements are mutable, we can make changes to them.

    a = [1, 2]
    b = [3, 4]
    t = (a, b)
    print(hex(id(a)))
    print(hex(id(b)))
    print(hex(id(t)))

    #----------------- OUTPUT ----------------#
    # 0x10699f400
    # 0x106f1dbc0
    # 0x106f1d540

    a.append(3)
    b.append(5)
    print(t)
    print(hex(id(a)))
    print(hex(id(b)))
    print(hex(id(t)))

    # ---------------- OUTPUT ---------------- #
    # ([1, 2, 3], [3, 4, 5])
    # 0x10699f400
    # 0x106f1dbc0
    # 0x106f1d540

Here, we only modified the lists, and as they are mutable, their addresses haven't changed.
We haven't modified the tuple: it always held those two lists, and we didn't replace, delete, or insert anything in it. That's why its address also remains the same.

Function Arguments and Mutability

Let's see how the above concept of mutability and immutability affects function arguments. If the argument passed to a function is immutable, the function cannot change the caller's object. If the argument is mutable, the function can change it in place, and the caller will see the change. Let's work through an example to understand.

Immutable Objects as Arguments

Let's create a function process(s) that takes a string parameter. Remember, a string is an immutable object.

    # trying the same thing with mutable and immutable objects,
    # but now the objects are passed as arguments to a function
    def process(s):
        print('initial s # = {0}'.format(hex(id(s))))
        s = s + ' world'
        print('s after change # = {0}'.format(hex(id(s))))

First, I printed out the memory address where the string object is stored. Then, I concatenated another string, ' world', to the string variable s. As you already know, modifying immutable objects is not possible. So, Python first created another string object with the new concatenated value 'hello world', and then the string variable s points to the new object where 'hello world' is stored. In the end, you can see that the memory address that s points to is now different from the previous address.

    my_var = 'hello'
    print('my_var # = {0}'.format(hex(id(my_var))))

Here, I created a string called my_var, which references a memory address that has an object with the value 'hello'. Then I printed the memory address that my_var points to.

    process(my_var)

Then, I called the function process(my_var), passing my_var as an argument.

    print('my_var # = {0}'.format(hex(id(my_var))))

Now, you can see that the memory address of the string variable my_var is still the same, because my_var is still pointing to the memory address where the string object with the value 'hello' is stored.
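The walkthrough above can be condensed into one runnable sketch. It is a variation on the process function rather than the exact code from the post: instead of printing addresses, this version returns the ids it saw, so they can be compared directly. The extra return values and variable names are assumptions made for illustration.

```python
def process(s):
    """Rebind the local name s and report the ids seen inside the function."""
    id_on_entry = id(s)
    s = s + ' world'      # builds a new str object; the caller's object is untouched
    id_after_change = id(s)
    return id_on_entry, id_after_change, s

my_var = 'hello'
caller_id_before = id(my_var)
inner_before, inner_after, new_value = process(my_var)
caller_id_after = id(my_var)

print(caller_id_before == inner_before)     # True: the function received the same object
print(inner_before != inner_after)          # True: s was rebound to a new object
print(caller_id_before == caller_id_after)  # True: my_var never changed
print(my_var)                               # hello
print(new_value)                            # hello world
```

If the caller needs the concatenated string, the function has to return it, as new_value shows here.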
Mutable Objects as Arguments

Let's create a function modify_list(items) that takes a list parameter. Remember, a list is a mutable object.

    def modify_list(items):
        print('initial items # = {0}'.format(hex(id(items))))
        if len(items) > 0:
            items[0] = items[0] ** 2
        items.pop()
        items.append(5)
        print('final items # = {0}'.format(hex(id(items))))

First, I printed out the memory address that the parameter items points to. Then, I squared the first element of the list, removed the last element, and finally appended 5 to the list. As you already know, modifying mutable objects is possible, so Python simply modified the object in place. In the end, you can see that the memory address that the list parameter items points to is the same as the previous address.

    my_list = [2, 3, 4]
    print('my_list # = {0}'.format(hex(id(my_list))))

Here, I created a list my_list and printed out the memory address that the variable my_list points to.

    modify_list(my_list)

Now, I called the function modify_list(my_list) with the variable my_list as an argument.

    print(my_list)
    print('my_list # = {0}'.format(hex(id(my_list))))

Finally, I'm printing the variable my_list to check whether my_list was modified or not. You can see that my_list was modified, and this makes sense, as the list is mutable.

Mutable Objects inside Mutable Objects as Arguments

Let's create a function modify_tuple(t) that takes a tuple parameter. Remember, a tuple is an immutable object, while a list is a mutable object.

    def modify_tuple(t):
        print('initial t # = {0}'.format(hex(id(t))))
        t[0].append(100)
        print('final t # = {0}'.format(hex(id(t))))

First, I printed out the memory address that the parameter t points to. Then, I modified the first element of the tuple, which is a list; here I appended the value 100. As you already know, modifying mutable objects is possible, so Python simply modified the list.
In the end, you can see that the memory address that the tuple parameter t points to is the same as the previous address.

    a = [1, 2, 3]
    b = [10, 20, 30]
    my_tuple = (a, b)

I created two lists, a and b, and then created a tuple my_tuple containing these two lists.

    print(hex(id(my_tuple)))

Here, I'm printing the memory address that the tuple my_tuple points to.

    modify_tuple(my_tuple)

Now, I called the function modify_tuple(my_tuple) with my_tuple as an argument, which modifies the list a inside the tuple.

    print(my_tuple)

You can see that the tuple's content remains the same, i.e. it still holds the two lists a and b, but the data in list a has changed. This is possible because lists are mutable.

Shared References and Mutability

A shared reference is the concept of two variables referencing the same object, i.e. the same memory address. Let's say we create two variables, a = 10 and b = a. Let's say that a is pointing to the memory address 0x1000. So, b is also pointing to that same address. Hence, the reference count of that address is 2. Both variables refer to the same address.

    my_var_1 = 'hello'
    my_var_2 = my_var_1
    print(my_var_1)
    print(my_var_2)

Here, I created two variables, my_var_1 and my_var_2. my_var_1 is pointing to a memory address where an object with the value 'hello' is stored. my_var_2 is assigned from my_var_1, which in turn points to the memory address where the object with the value 'hello' is stored. So, my_var_2 is also referencing the same memory address as my_var_1. Finally, I'm printing the values of both variables.

    print(hex(id(my_var_1)))
    print(hex(id(my_var_2)))

Here, I'm printing the memory address that both of these variables are pointing to.

    # after this change the address will differ, as a string is immutable
    my_var_2 = my_var_2 + ' world!'

As I modified my_var_2, my_var_2 will point to some other location where the new object is stored.

    print(hex(id(my_var_1)))
    print(hex(id(my_var_2)))

Now, you can see that the two variables are pointing to different locations.
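The same check can be written more compactly with the is operator, which compares object identity directly instead of comparing printed addresses. This is only a small sketch; the names first and second are made up for the example.

```python
first = 'hello'
second = first                     # both names now refer to one object
shared_before = first is second    # same object, so this is True

second = second + ' world!'        # builds a new string and rebinds second
shared_after = first is second     # second now refers to the new object

print(shared_before)  # True
print(shared_after)   # False
print(first)          # hello
print(second)         # hello world!
```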
Now let's run the same experiment with mutable objects.

    # doing the same thing with mutable objects
    my_list_1 = [1, 2, 3]
    my_list_2 = my_list_1
    print(my_list_1)
    print(my_list_2)
    print(hex(id(my_list_1)))
    print(hex(id(my_list_2)))

As above, both of these names are pointing to the same location.

    # this changes what both names show, as a list is mutable
    my_list_2.append(4)

Here, I'm modifying the list through my_list_2.

    print(my_list_2)
    print(my_list_1)

You can see that both names show the modified list. Lists are mutable, so when I modified the object that my_list_2 points to, Python didn't create another object; instead, it modified the same object. Due to this, both names show the modified list.

    print(hex(id(my_list_1)))
    print(hex(id(my_list_2)))

You can see that both names are pointing to the same address even after modifying the list.

Python Interning

In Python, interning is a technique used to optimize memory use by storing commonly used objects in a cache, to avoid creating new objects each time they are needed. Two common types of objects that are interned in Python are integers and strings.

Integer Interning

Integer interning is the process of storing and reusing integer objects with values ranging from -5 to 256. When you create an integer object in this range, Python checks whether it already exists in memory. If it does, it returns a reference to the existing object instead of creating a new one. This can improve the performance of Python programs by reducing the number of objects created and the amount of memory used. For example:

    a = 10
    b = 10
    a is b
    # True

In this example, both a and b are assigned the integer value 10. Since this value is within the range of interned integers, Python reuses the cached object and assigns the same object to both a and b. Therefore, a is b returns True.

String Interning

String interning is the process of storing and reusing string objects with the same value.
When Python creates certain strings, it adds them to a cache of interned strings. If another string with the same value is created, Python can return a reference to the existing object instead of creating a new one. This can also improve the performance of Python programs by reducing the number of objects created and the amount of memory used. For example:

    a = 'hello'
    b = 'hello'
    a is b
    # True

In this example, both a and b are assigned the string 'hello'. Python interns this short, identifier-like string literal and assigns the same object to both a and b. Therefore, a is b returns True. Not every string is interned, so this result should not be expected for arbitrary strings.

It is important to note that interning is an implementation detail and may vary depending on the Python interpreter being used. Therefore, it is recommended to rely on the == operator to compare the values of integers and strings instead of using the is operator.

Variable Equality

We can compare variables in two ways: by memory address, and by the data/content inside the object.

To compare the memory addresses of variables, we can use the is operator, which is known as the identity operator.

    print("a is b: ", a is b)

To compare the data/content of the objects, we can use the == operator, which is known as the equality operator.

    print("a == b:", a == b)

If you want to check that the memory addresses of two variables are not equal, you can use the is not operator. If you want to check that the data/content of two variables is not equal, you can use the != operator.

The None object can be assigned to variables to indicate that they are not set in the way we would expect them to be. For example, let's say we have a variable a that we initialize with None because we don't have a proper value for it yet.

    a = None
    print(type(a))
    print(hex(id(a)))
    a is None

    # --------- OUTPUT ---------- #
    # True

    b = None
    hex(id(b))
    a is b
    a == b

The None object is a real object that is managed by the Python memory manager.
    hex(id(None))
    type(None)

The Python memory manager will always use a shared reference when assigning a variable to None.

Everything is an Object

In Python, everything is an object. This means that any value, variable, or function in Python is considered an object. An object in Python is a self-contained entity that bundles data with methods that can be accessed and manipulated. Objects are instances of classes, which are essentially blueprints that define the structure and behavior of the objects.

For example, if you declare a variable in Python, such as:

    x = 42

The value 42 is an object of the int class, which means it has built-in methods and attributes that can be accessed and manipulated.

Similarly, if you define a function in Python, such as:

    def my_function():
        print("Hello, World!")

The function my_function is an object of the function class, which means it can be passed around as a variable, returned from another function, or even assigned to a different name.

The concept of everything being an object in Python is a fundamental aspect of the language and is important for understanding how Python code is executed and how objects interact with one another. It also allows for powerful programming constructs such as dynamic typing, duck typing, and metaprogramming, which can make Python code more flexible and expressive.

Any object can be assigned to a variable, so functions (as a function is an object too) can also be assigned to a variable.
Any object can be passed to a function, therefore functions can be passed to a function.
Any object can be returned from a function, therefore functions can be returned from a function.

Conclusion

In conclusion, understanding variables and their behavior in Python is crucial for writing effective and efficient code. Variables are names that refer to objects stored in memory, and the mutability of those objects determines whether they can be changed in place or not.
Memory management is an important consideration when working with variables, as it can impact the performance of your code. Shared references can lead to unexpected results, so it is important to be aware of how they work. Finally, understanding function argument mutability can help you avoid errors when passing variables between functions. By keeping these concepts in mind, you can write better Python code and avoid common pitfalls.