Multithreading and Multiprocessing are the two most common ways of achieving concurrency and parallelism. However, many developers do not understand the difference between them and so struggle to choose the right one for a given task.
In this article, we will be discussing the differences between Multithreading and Multiprocessing and how to decide what to use and how to implement it in Python.
A thread is an independent flow of execution. It can be seen as a lightweight component of a process that can run in parallel with other threads. Threading is a feature usually provided by the operating system. A process can contain multiple threads that share the same memory space, which means that they share the code to be executed and the variables declared in the program.
To understand this better, let’s consider an example of the programs running on your laptop right now. You are probably reading this article with multiple tabs open in your browser. Meanwhile, you have the Spotify desktop app open for listening to music. The browser and the Spotify desktop app are two distinct processes, and each can employ several processes or threads to achieve parallelism. So, the different tabs in your browser might run in different threads. Similarly, Spotify can play music using one thread, use another to download your favorite song from the internet, and use a third to display the user interface. This is called Multithreading.
Multithreading, as the name suggests, is a technique in which a single process runs multiple threads at the same time. It is a popular way to handle several tasks concurrently, and it makes sharing resources among the threads, including the main thread, quick and easy.
The following image explains Multithreading in Python:
By default, a Python program runs sequentially in a single thread, but we can use the threading module from the standard library to understand and implement the concept of Multithreading in Python. The threading module offers an intuitive API for creating multiple threads, which is useful when a program has several tasks to run concurrently.
It can be used as shown below:
import threading
def testThread(num):
    # print is a function in Python 3
    print(num)
if __name__ == '__main__':
    for i in range(5):
        # the keyword is args, and it must be a tuple (note the trailing comma)
        t = threading.Thread(target=testThread, args=(i,))
        t.start()
In the above code snippet, target is used as the callable object, args to pass parameters to the function, and start to start the thread.
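The snippet above starts the threads but never waits for them to finish. A common pattern, sketched below, is to keep the Thread objects in a list and call join() on each one. The names record_square, run_threads and results are hypothetical and exist only for this illustration.

```python
import threading

results = {}  # hypothetical shared dict; each thread writes to its own key


def record_square(num):
    # Each thread stores its result under a distinct key.
    results[num] = num * num


def run_threads(count):
    threads = [
        threading.Thread(target=record_square, args=(i,))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this thread has finished
    return results


if __name__ == "__main__":
    print(run_threads(5))
```

Because every thread is joined before run_threads returns, the caller can rely on all results being present.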
Now, here comes something interesting - the lock.
There are often cases in programming where you would want your threads to be able to modify or use variables that are common to all of them. To do this safely, you will have to use a lock. In CPython, there is also an interpreter-wide lock known as the Global Interpreter Lock (GIL).
From the Python wiki:
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
At the interpreter level, CPython effectively serializes the execution of bytecode. Before a thread can run any Python code, it must first acquire the global lock. Because only one thread can hold that lock at a time, the interpreter ends up executing Python instructions one thread at a time. This design makes memory management thread-safe, but it means that threads running pure Python code cannot execute in parallel on multiple CPU cores. The GIL is released while a thread waits on blocking I/O, which is why threads still help with I/O-bound work.
The GIL protects the interpreter's internals, not the variables in your own program. For those, you use an explicit lock such as threading.Lock. Simply put, a thread acquires the lock before it uses or modifies a shared variable, and any other thread that wants to use or modify that variable has to wait until the lock is released.
Consider two functions that each increment a variable by one. You can use a lock to ensure that one function reads the variable, runs its computation, and writes the result back before the other function can, which avoids data corruption.
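The increment scenario can be sketched with threading.Lock as follows. This is a minimal illustration, not the only way to do it; the names counter, counter_lock, increment and run are assumptions made up for the example.

```python
import threading

counter = 0                      # shared variable modified by every thread
counter_lock = threading.Lock()  # guards each read-modify-write of counter


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time can be inside this block.
        with counter_lock:
            counter += 1


def run(num_threads=4, times=10_000):
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

Using the lock as a context manager (with counter_lock:) releases it automatically, even if the code inside the block raises an exception.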
Threading in Python is more helpful for I/O-bound or network-bound tasks, such as web scraping scripts, than for CPU-intensive tasks. Another example is TensorFlow, which uses a thread pool to transform data in parallel.
Other than these applications, Graphical User Interfaces (GUIs) use Multithreading all the time to make applications responsive and interactive. A common example could be a text editing program where, as soon as the user inputs text, it is displayed on the screen. Here one thread takes care of the user input while another thread handles displaying it. We can add more threads for more functionality, such as spell check, autocompletion, and so on.
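As a rough sketch of the I/O-bound case, the example below uses ThreadPoolExecutor from the standard library's concurrent.futures module. The function fake_download is hypothetical: it only sleeps to stand in for a network request, and the example.com URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Stand-in for a real network request; like blocking I/O,
    # time.sleep lets other threads run while this one waits.
    time.sleep(0.1)
    return f"contents of {url}"


def download_all(urls, max_workers=5):
    # The pool runs fake_download concurrently and returns
    # the results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(5)]
    start = time.perf_counter()
    pages = download_all(urls)
    elapsed = time.perf_counter() - start
    print(len(pages), "pages in", round(elapsed, 2), "seconds")
```

Since the waits overlap, the total time should be close to that of a single simulated request rather than the sum of all five.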
Now, having discussed threads in detail, let’s move on to processes.
A process is simply an instance of a computer program being executed. Each process has its own memory space that is used to store the instructions being run and any data that it needs to access or store for the execution of the code. Because of this, spawning a process is slower and more expensive than spawning a thread.
As we discussed earlier, when we are running multiple applications on our desktop, each application is a process, and when we are executing these processes at the same time, it is called Multiprocessing.
Multiprocessing is the ability of a system to execute several processes simultaneously, typically on multiple CPU cores. It allows you to create programs that run in parallel, bypass the Global Interpreter Lock (GIL), and use all the available CPU cores for efficient execution of tasks.
Although the concept of Multiprocessing is fundamentally different from Multithreading, their usage in Python is quite similar. Similar to the threading module, Python has a multiprocessing module for creating processes, where each process has its own Python interpreter and its own GIL.
Since the processes don’t share the same memory, they can’t modify the same memory concurrently, which greatly reduces the risk of data corruption. The trade-off is that data has to be passed between processes explicitly, for example through a queue or a pipe.
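The separate memory spaces can be seen in a small experiment. In this sketch, the names items, append_item and child are hypothetical; the child process appends to a module-level list and reports what it sees through a multiprocessing.Queue, while the parent's copy of the list is left untouched.

```python
import multiprocessing

items = []  # module-level list; every process works on its own copy


def append_item(value):
    items.append(value)
    return list(items)


def child(queue):
    # Runs in the child process and sends back what the child sees.
    queue.put(append_item("from child"))


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=child, args=(queue,))
    p.start()
    child_view = queue.get()  # read the result before joining
    p.join()
    print("child sees:", child_view)  # ['from child']
    print("parent sees:", items)      # []
```

The result has to travel back through the queue precisely because the append made in the child never reaches the parent's list.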
It can be used as shown below:
import multiprocessing
def spawn(num):
    print(num)
if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=spawn, args=(i,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for each process to finish
As we discussed earlier, Multiprocessing is the wiser choice when the tasks are CPU-intensive and don’t involve I/O operations or user interaction.
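For CPU-intensive work, one possible approach is a pool of worker processes. The sketch below uses multiprocessing.Pool; the prime-counting function count_primes is a hypothetical workload chosen only because it keeps the CPU busy, and count_primes_parallel is likewise an illustrative name.

```python
import multiprocessing


def count_primes(limit):
    """Count the primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=None):
    # processes=None lets Pool choose a worker count based on the machine's CPUs.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(count_primes_parallel([10_000, 20_000, 30_000, 40_000]))
```

Each limit is handled by a separate worker process with its own interpreter and GIL, so the computations can run on different cores at the same time.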
Here are some points to summarize the differences, merits and drawbacks of Multiprocessing and Multithreading:
We can draw the following conclusions from this discussion:
Now that you understand how Python Multiprocessing and Multithreading operate and how they compare, you can write code effectively and apply the two approaches in a variety of circumstances.
I hope you found this article helpful. Keep reading!