Mastering Python Variables: Deep Dive into Memory, Mutability, and Beyond

Written by dotslashbit | Published 2023/05/24
Tech Story Tags: python | python-programming | python3 | programming | programming-languages | python-development | python-basics | python-tutorials

TLDRIn this blog post, we will dive into the world of Python variables. Variables are one of the fundamental concepts in programming and mastering them is essential for writing efficient and effective code. Let's first take a look at memory. You can think of memory as a set of blocks where each block has a unique address. Now, let's dive into variables.via the TL;DR App

Welcome to my blog post where we will dive into the world of Python variables. If you are new to programming, don't worry, I have got you covered! In my previous blog post, I provided a refresher on the basics of Python programming language.

Now, we will move on to the next level and take a closer look at variables in Python. Variables are one of the fundamental concepts in programming and mastering them is essential for writing efficient and effective code.

So, let's get started and explore Python variables in-depth!

Table of Contents

  • Memory
  • Variable
  • Reference Counting
    • getrefcount()
    • ctypes
  • Garbage Collection
  • Object Mutability
    • Immutable
    • Mutable
    • Mutable Datatype Within Immutable Datatype
  • Function Arguments and Mutability
    • Immutable Objects as Arguments
    • Mutable Objects as Arguments
    • Mutable Objects inside Mutable Objects as Arguments
  • Shared References and Mutability
  • Python Interning
    • Integer Interning
    • String Interning
  • Variable Equality
  • Everything is an Object
  • Conclusion


Memory

You can think of memory as a set of blocks where each block has a unique address. Think of it like a real-world example where each house on a street has a unique address. In the same way, each block has a unique address.

Now, let's dive into variables.

Variable

What happens when you write a = 5?

  • Python creates an object in memory in some address, let's say 0x1000.

  • In this object, the value 5 is stored.

  • Here, a is you can think of an alias for the memory address 0x1000.
  • Here, a doesn't represent the value 5 instead it refers to the memory address 0x1000 and the address 0x1000 refers to the data stored in the object and the data is 5.

  • To find out the memory address of the object that the variable is referencing, you can use the id() function.

    # declared a variable a and stored a value 10
    
    a = 10
    
    # printing the value, it's decimal memory address and it's hex memory address
    
    print(a)
    print(id(a)))
    print(hex(id(a)))
    
    # ---------------------- OUTPUT ------------------ #
    
    10 
    4376986128 
    0x104e38210
    

Here, I declared a variable a with a value of 10. Let's understand what happened under the hood.

  • First, python created an object at some memory address, let's say 0x1000.

  • In that object, python puts the value 10.

  • Finally, variable a refers to the memory address 0x1000 that holds the object with the value 10. In the above code, I have printed out the value, the decimal format of the address that the variable a is referring to and the hexadecimal format of the address.

Let's take a look at another example.

s = "hello"
print(s)
print(id(s))
print(hex(id(s)))

# ------------------------- OUTPUT --------------------- #
hello
4702080944
0x118440fb0

  • First, python created an object at some memory address with the value 'hello'.
  • Then, the variable s refers to the memory address that holds the object with the value 'hello'. In the above code, I have printed out the value, the decimal format of the address that the variable a is referring to and the hexadecimal format of the address.

Reference Counting

  • So, we just learned how the variables are referencing a memory address where an object is stored.

  • We can count how many variables are pointing to that same memory address.

  • Let's say we declared a variable a = 5 and let's say the memory address where the object gets created is 0x1000.

  • Then the reference count to that memory address is 1.

  • Let's say we declared another variable b = a, where b is not getting assigned to a value 5 instead b is referencing the variable a which in turn references the memory location 0x1000.

  • Hence, two variables are pointing to the memory address 0x1000.

  • So, the reference count of the memory address 0x1000 is 2.

  • Let's say b got removed either b is out-of-scope or maybe gets assigned to a different memory location, then the reference count goes to 1.

  • Let's say a also got removed in one of the above ways, then the reference count goes to 0.

  • At this point, the Python memory manager recognizes this and throws away the object that was there in that memory location.
  • Finally, the space is freed.

  • sys module has a getrefcount() function that can be used to get the reference count.
  • This takes one parameter the variables, but the downside is that it also adds one reference count to that object.
  • There's also another way using ctypes module.

getrefcount()

import sys

# delcared a list with respective values, print it's id and then get the reference count
lst_1 = [1,2,3]
print(id(lst_1))
sys.getrefcount(lst_1)

# --------------------- OUTPUT --------------------- #
4389242752
2

  • It says 2 as the method getrefcount() is also referencing the address, so the reference count increases, to get the actual reference count, just subtract 1 from the answer.

ctypes

# with `ctypes`, you can get the actual reference count as it takes the actual memory address and not the reference.
import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

# here you can see you get 1 as the reference count which is correct
print(id(lst_1))
ref_count(id(lst_1))

# -------------------- OUTPUT ------------------ #
4389242752
1

Garbage Collection

  • Previously, we learned that as soon as the reference count goes to 0, the Python memory manager destroys the object that's in the memory location and free's up the memory location.
  • But this doesn't work always and one of the cases where this doesn't work is circular references.
  • Let's think of a scenario where variable a is referencing the variable b and variable b is referencing the variable c.

  • Now, let's say we delete the variable a.

  • Now, the reference count of b is 0 but the reference count of c is 1.

  • So, the second object will be destroyed, then the reference count of the third object will become 0 and it'll get destroyed too.

  • Now, let's say the variable c is also referencing the variable b. i.e c = b and after that, we removed variable a.

  • Now, both object has reference count = 1.

  • Now, none of the objects are going to get destroyed as both have a reference count of 1 and this scenario is known as circular referencing.

  • As python memory manager can't eliminate these objects and if we continue like this, this will result in a memory leak.

  • Here, the Garbage collector comes to the rescue, it can handle this kind of issue.

  • You can control the garbage collector programmatically using gc module.

  • You can call it manually and even do your cleanup.

    import gc import ctypes
    
    # function to count the references
    def ref_count(address): 
      return ctypes.c_long.from_address(address).value
    

  • I imported the gc and ctypes modules and defined the reference count function to count the reference count.

    # this function will return if the given object_id is in the garbage collector or not
    def object_by_id(object_id): 
      for obj in gc.get_objects(): 
        if id(obj) == object_id: 
          return "Object exists" 
      return "Not found"
    

  • This function will take the id of an object as an argument and then it'll return "Object exists" if the garbage collector has tracked that this object is in some circular reference else it'll return "Not found" i.e. the given object is not in any circular reference.

    # created two classes to illustrate the circular reference concept
    class A: 
      def init(self): 
        self.b = B(self) 
        print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))
    
    class B: 
      def init(self, a): 
        self.a = a 
        print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))
    

    Now, I created two classes A and B to illustrate the circular reference.

  • class A: - This line defines a new class called A.

  • def __init__(self): - This is the constructor of the A class. It is executed when a new instance of the class is created.

  • self.b = B(self) - This line creates a new instance of the B class and assigns it to the b attribute of the current instance of the A class. The self-argument passed to the B constructor is a reference to the current instance of the A class.

  • print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b)))) - This line prints a message to the console. The message contains the hexadecimal representations of the memory addresses of the current instance of the A class and its b attribute.

  • class B: - This line defines a new class called B.

  • def __init__(self, a): - This is the constructor of the B class. It is executed when a new instance of the class is created. The a argument is a reference to an instance of the A class.

  • self.a = a - This line assigns the a argument to the a attribute of the current instance of the B class.

  • print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a)))) - This line prints a message to the console. The message contains the hexadecimal representations of the memory addresses of the current instance of the B class and its a attribute.

  • We disabled the garbage collector so that we can run it manually and also check the reference count.

    gc.disable()
    

Now, we disabled the garbage collector, so that we can run it manually.

# create an instance of class A
my_var = A()


# -------------- OUTPUT --------------- #
B: self: 0x11953e8c0, a: 0x11953d8d0
A: self: 0x11953d8d0, b:0x11953e8c0

  • I created an instance of A.

  • This prints out the ids of a and b.

  • The id of my_var and a is the same.

    print('a: \t{0}'.format(hex(id(my_var))))
    print('a.b: \t{0}'.format(hex(id(my_var.b))))
    print('b.a: \t{0}'.format(hex(id(my_var.b.a))))
    
    #----------------- OUTPUT ------------------#
    
    a:     0x119554e50 
    a.b:     0x11953e680 
    b.a:     0x119554e50
    
    # created two variables to store the ids of a and b instances
    
    a_id = id(my_var) 
    b_id = id(my_var.b)
    
    # These two variables are used to store the ids of a and b.
    
    # printing the refernce count of a and b
    
    # printing if the object is in garbage collector or not
    
    print('refcount(a) = {0}'.format(ref_count(a_id)))
    print('refcount(b) = {0}'.format(ref_count(b_id))) 
    print('a: {0}'.format(object_by_id(a_id))) 
    print('b: {0}'.format(object_by_id(b_id)))
    
    --------------------- OUTPUT --------------------
    
    refcount(a) = 2 
    refcount(b) = 1 
    a: Object exists 
    b: Object exists
    

  • Here, I'm printing the reference count for a and b.

  • I'm also checking if these objects are tracked by garbage collector or not.

  • As you can see, the garbage collector tracked these two variables and returned "Object exists" for both of them as both of them are in a circular reference.

Now, let's point my_var to None, so we'll only have a circular reference.

my_var= None


print('refcount(a) = {0}'.format(ref_count(a_id)))
print('refcount(b) = {0}'.format(ref_count(b_id)))
print('a: {0}'.format(object_by_id(a_id)))
print('b: {0}'.format(object_by_id(b_id)))

# ------------------ OUTPUT -------------------- #
refcount(a) = 1
refcount(b) = 1
a: Object exists
b: Object exists

  • Here, you can see that the reference count of a is decreased to 1 as we changed the my_var to refer to None.

    gc.collect() 
    print('refcount(a) = {0}'.format(ref_count(a_id))) 
    print('refcount(b) = {0}'.format(ref_count(b_id))) 
    print('a: {0}'.format(object_by_id(a_id))) 
    print('b: {0}'.format(object_by_id(b_id)))
    
    --------------- OUTPUT -----------------
    
    refcount(a) = 0 
    refcount(b) = 0 
    a: Not found 
    b: Not found
    

We enabled the garbage collector and then the garbage collector removed both the objects and you can see that both the objects are not found.

Object Mutability

  • Changing the data inside the object is called modifying the internal state of the object.

  • An object whose internal state can be changed is called mutable otherwise it's immutable.

  • Immutable data types in Python are:

    • Numbers
    • Strings
    • Tuples
    • Frozen Sets
    • User Defined Classes
  • Mutable data types in Python are:

    • Lists
    • Sets
    • Dictionaries
    • User-Defined Classes

Now, let's see some examples of mutable and immutable datatypes and understand what happens under the hood of mutable and immutable datatypes when you change their values.

Immutable

  • Let's say we have a string s = 'python'.

  • As we know, strings are immutable

  • So, if in the next line, you'll write s = 'hello'.

  • First, python will create another object at some different memory address with the value 'hello'.

  • Then the variable s will point to this new object's address.

  • After this, the previous object with the value 'python' will be destroyed as no one is referencing that object.

  • So, the python memory manager will destroy the object and free up the space.

s = 'python'
print(s)
print(hex(id(s)))

# ----------------- OUTPUT ------------------ #
python
0x10380b870


s = 'hello'
print(s)
print(hex(id(s)))

# -------------- OUTPUT ------------------- #
hello
0x106232470

As you can see both the addresses are different.

Mutable

  • Let's say we have a list a = [1, 2, 3].

  • As we know, lists are mutable i.e. elements can be inserted, deleted and replaced.

  • When we write a = [1, 2, 3], python creates an object at some memory location let's say 0x1000.

  • Now, a points to the address 0x1000 where the list is stored.
  • Let's say you wrote a.append(4), to append 4 in the list a.
  • Unlike Immutable datatypes, python won't create a new object instead it will add the value 4 to the object that is stored in 0x1000.

# creating a list and printing out the list and it's address
my_list = [1, 2, 3]
print(my_list)
print(hex(id(my_list)))

# ------------------- OUTPUT ------------------ #
[1, 2, 3]
0x11bf06340


# checking if the address is changed after modifying the list

my_list.append(4)

print(my_list)

print(hex(id(my_list)))

# ------------------ OUTPUT -------------------- #
[1, 2, 3, 4]
0x11bf06340

  • You can see that the address remains the same. Let's take another example

    # creating a dictionary and printing the dictionary and it's address
    
    my_dict = {'key1': 1, 'key2': 2} 
    print(my_dict) 
    print(hex(id(my_dict)))
    
    # ------------------ OUTPUT --------------- #
    
    {'key1': 1, 'key2': 2} 
    0x11be9ea80
    
    
    # checking if the address is changed after modifying the dictionary
    
    my_dict['key1'] = 10 print(my_dict) print(hex(id(my_dict)))
    
    #---------------- OUTPUT -----------------#
    
    {'key1': 10, 'key2': 2} 
    0x11be9ea80
    

  • As a dictionary is mutable, its address remains the same

Mutable Datatype Within Immutable Datatype

  • Let's take another tuple b = ([1,2,3], [4,5,6])

  • As we know, lists are mutable i.e. elements can be inserted, deleted or replaced.

  • Here, we can modify the lists that are in the tuple, but we can't make nay changes to the tuple i.e we can't insert a new element to the tuple, we can't delete an element from the tuple, we can't delete element from the tuple.

  • Tuple still has the same elements, but as the elements are mutable we can make changes to those elements.

    a = [1, 2] 
    b = [3, 4] 
    t = (a, b) 
    print(hex(id(a))) 
    print(hex(id(b))) 
    print(hex(id(t)))
    
    #----------------- OUTPUT ----------------#
    
    0x10699f400 
    0x106f1dbc0 
    0x106f1d540
    
    
    
    a.append(3) 
    b.append(5) 
    print(t) 
    print(hex(id(a))) 
    print(hex(id(b))) 
    print(hex(id(t)))
    
    ---------------- OUTPUT ----------------
    
    ([1, 2, 3], [3, 4, 5]) 
    0x10699f400 
    0x106f1dbc0 
    0x106f1d540
    

  • Here, we only modified the lists and as they are mutable, their addresses haven't changed.

  • We haven't modified the tuple, the tuple always had those two lists, we didn't replace, deleted, or inserted anything into this tuple, that's why its address also remains the same.

Function Arguments and Mutability

  • Let's see how the above concept of mutability and immutability affects function arguments.
  • If the argument of the function is immutable, then it won't change.
  • If the argument of the function is mutable, then it'll change.
  • Let's take the help of an example and understand.

Immutable Objects as Arguments

  • Let's create a function process(s) that takes a string parameter.

  • Remember, a string is an immutable object.

    # trying the similar thing with mutable and immutable objects but now the objects are passed as arguments to function
    
    def process(s): 
      print('initial s # = {0}'.format(hex(id(s)))) 
      s = s + ' world' 
      print('s after change # = {0}'.format(hex(id(s))))
    

  • First, I printed out the memory address where the string object is stored.

  • Then, I concatenated another string world to string variable s.

  • As you already know, modifying immutable objects is not possible.

  • So, first, python created another string object with this new concatenated value 'hello world', then string variable s points to the new object where the 'hello world' string is stored.

  • In the end, you can see that the memory address where the string variable s is pointing to is now different than the previous address.

    my_var = 'hello' 
    print('my_var # = {0}'.format(hex(id(my_var))))
    

  • Here, I created a string called my_var which is referencing a memory address that has an object with the value 'hello'.

  • Printing the memory address where the string variable my_var is pointing.

    process(my_var)

  • Then, I called the function process(my_var) and passed my_var as an argument.

    print('my_var # = {0}'.format(hex(id(my_var))))
    

  • Now, you can see that the memory address of the string variable my_var is still the same because my_var is still pointing to the memory address where the string object with the value 'hello' is stored.

Mutable Objects as Arguments

  • Let's create a function process(items) that takes a list parameter.

  • Remember, a list is a mutable object.

    def modify_list(items): print('initial items # = {0}'.format(hex(id(items)))) if len(items) > 0: items[0] = items[0] ** 2 items.pop() items.append(5) print('final items # = {0}'.format(hex(id(items))))

  • First, I printed out the memory address where the parameter items is pointing to.

  • Then, I modified every element of that list, removed an element from that list and finally append 5 to that list.

  • As you already know, modifying mutable objects is possible.

  • So, first, python simply modified the object.

  • In the end, you can see that the memory address where the list parameter items is pointing to, is same as the previous address.

    my_list = [2, 3, 4]

    print('my_list # = {0}'.format(hex(id(my_list))))

  • Here, I created a list my_list and print out the memory address where the my_list variable is pointing to.

    modify_list(my_list)

  • Now, I called the function modify_list(my_list) with my_list variable as an argument.

    print(my_list) print('my_list # = {0}'.format(hex(id(my_list))))

  • Finally, I'm printing the variable my_list, to check whether my_list is modified or not.

  • You can see that my_list is modified and this makes sense as the list is mutable.

Mutable Objects inside Mutable Objects as Arguments

  • Let's create a function modify_tuple(t) that takes a tuple parameter.

  • Remember, a tuple is an immutable object while a list is a mutable object.

    def modify_tuple(t): 
      print('initial t # = {0}'.format(hex(id(t)))) 
      t[0].append(100) 
      print('final t # = {0}'.format(hex(id(t))))
    

  • First, I printed out the memory address where the parameter t is pointing to.

  • Then, I modified the first element of the tuple which is a list, here I appended the value 100.

  • As you already know, modifying mutable objects is possible.

  • So, first, python simply modified the list.

  • In the end, you can see that the memory address where the tuple parameter t is pointing to, is the same as the previous address.

    a = [1,2,3] 
    b = [10,20,30] 
    my_tuple = (a,b)
    

  • I created two lists a and b and then created a tuple my_tuple containing these two lists.

    hex(id(my_tuple))

  • Here, I'm printing the memory address where the tuple my_tuple is pointing to.

    modify_tuple(my_tuple)

  • Now, I called the function modify_tuple(my_tuple) with my_tuple as an argument, that will modify the list a in the tuple.

    my_tuple

  • You can see that the tuple's content remains the same i.e. there are two lists a and b but the list a content/data is changed and it is possible as lists are mutable.

Shared References and Mutability

  • Shared reference is the concept of two variables referencing the same object or same memory address.

  • Let's say we create two variables a = 10 and b = a.

  • Let's say that a is pointing to the memory address `0x1000` .

  • So, b is also pointing to that same address.

  • Hence, the reference count of that address is 2.

  • Both variables refer to the same address

    my_var_1 = 'hello' 
    my_var_2 = my_var_1 
    print(my_var_1) 
    print(my_var_2)
    

  • Here, I created two variables my_var_1 and my_var_2.

  • my_var_1 is pointing to a memory address where an object with the value 'hello' is stored.

  • my_var_2 is referencing my_var_1 which in turn points to the memory address where the object with the value 'hello' is stored.

  • So, my_var_2 is also referencing the same memory address as my_var_1.

  • Finally, I'm printing the values of both variables.

    print(hex(id(my_var_1))) print(hex(id(my_var_2)))

  • Here, I'm printing the memory address that both of these variables are pointing to.

    # by modifying the address will change as string is immutable
    
    my_var_2 = my_var_2 + ' world!'
    

  • As I modified my_var_2 , my_var_2 will point to some other location where this new object is stored.

    print(hex(id(my_var_1))) print(hex(id(my_var_2)))

  • Now, you can see both the variables are pointing to different locations.

The same thing will happen with mutable objects.

# doing the same thing with mutable objects
my_list_1 = [1, 2, 3]
my_list_2 = my_list_1
print(my_list_1)
print(my_list_2)


print(hex(id(my_list_1)))
print(hex(id(my_list_2)))

  • Similarly, as above both these lists are pointing to the same location.

    # it'll change both the lists as list is mutable.
    
    my_list_2.append(4)
    

  • Here, I'm modifying the list my_list_2.

    print(my_list_2) 
    print(my_list_1)
    

  • You can see both the lists got modified as lists are mutable, so when I modified the object where the my_list_2 is pointing to, python didn't create another object instead python modified the same object.

  • Due to this, both the lists are showing the modified list.

    print(hex(id(my_list_1))) 
    print(hex(id(my_list_2)))
    

  • You can see both lists are pointing to the same address even after modifying a list.

Python Interning

  • In Python, interning is a technique used to optimize the use of memory by storing the commonly used objects in a cache to avoid creating new objects each time they are needed.
  • Two common types of objects that are interned in Python are integers and strings.

Integer Interning

  • Integer interning is the process of storing and reusing integer objects with values ranging from -5 to 256.

  • When you create an integer object in this range, Python checks if it already exists in memory. If it does, it returns the reference to the existing object instead of creating a new one.

  • This can improve the performance of Python programs by reducing the number of objects created and the amount of memory used. For example:

    a = 10 
    b = 10 
    a is b True
    

In this example, both a and b are assigned the integer value 10. Since this value is within the range of interned integers, Python interns it and assigns the same object to both a and b. Therefore, a is b returns True.

String Interning

  • String interning is the process of storing and reusing string objects with the same value.

  • When you create a string object in Python, it is added to a cache of commonly used strings. If another string with the same value is created, Python returns a reference to the existing object instead of creating a new one.

  • This can also improve the performance of Python programs by reducing the number of objects created and the amount of memory used. For example:

    a = 'hello' b = 'hello' a is b True

  • In this example, both a and b are assigned the string 'hello'. Since this string is commonly used, Python interns it and assigns the same object to both a and b. Therefore, a is b returns True.

It is important to note that interning is an implementation detail of the Python language and may vary depending on the Python interpreter being used. Therefore, it is recommended to rely on the == operator to compare values of integers and strings instead of using the is operator.

Variable Equality

  • We can compare variables in two ways, one way is Memory address and the other one is data/content inside the object.

  • To compare memory addresses of variables we can use is operator, which is known as an identity operator.

    print("a is b: ", a is b)

  • To compare the data/content of the objects, we can use == operator, which is known as an equality operator.

    print("a == b:", a == b)

  • If you want to check if two variables memory addresses are not equal, then you can use is not operator.

  • If you want to check if two variable's data/content is not equal, then you can use != operator.

  • The None object can be assigned to variables to indicate that they are not set in the way we would expect them to be.

  • For example, let's say we have a string s set to None, as we don't have any proper value, we just initialized the string with None.

    a = None print(type(a)) print(hex(id(a)))
    
    a is None
    # --------- OUTPUT ---------- #
    True
    
    
    b = None hex(id(b))
    
    a is b a == b
    

  • None object is a real object, that is managed by the Python memory manager.

    hex(id(None)) 
    type(None)
    

  • Python memory manager will always use a shared reference when assigning a variable to None.

Everything is an Object

  • In Python, everything is an object. This means that any value, variable, or function in Python is considered an object.

  • An object in Python is a self-contained piece of code that has data and methods that can be accessed and manipulated.

  • Objects are instances of classes, which are essentially blueprints that define the structure and behavior of the objects.

For example, if you declare a variable in Python, such as:

x = 42

The value 42 is an object of the int class, which means it has built-in methods and attributes that can be accessed and manipulated.

Similarly, if you define a function in Python, such as:

def my_function():
    print("Hello, World!")

The function my_function() is an object of the function class, which means it can be passed around as a variable, returned from another function, or even assigned to a different name.

The concept of everything being an object in Python is a fundamental aspect of the language and is important for understanding how Python code is executed and how objects interact with one another. It also allows for powerful programming constructs such as dynamic typing, duck typing, and metaprogramming, which can make Python code more flexible and expressive.

  • Any object can be assigned to a variable, so functions(as a function is an object too) can also be assigned to a function.
  • Any object can be passed to a function, therefore functions can be passed to a function.
  • Any object can be returned from a function, therefore functions can be returned from a function.

Conclusion

In conclusion, understanding variables and their behavior in Python is crucial for writing effective and efficient code. Variables are placeholders for data that are stored in memory, and their mutability determines whether they can be changed or not. Memory management is an important consideration when working with variables, as it can impact the performance of your code. Shared references can lead to unexpected results, so it is important to be aware of how they work. Finally, understanding function argument mutability can help you avoid errors when passing variables between functions. By keeping these concepts in mind, you can write better Python code and avoid common pitfalls.


Also published here.


Written by dotslashbit | I'm a Masters CS Student @IIT Kharagpur, India
Published by HackerNoon on 2023/05/24