Understanding Multiprocessing and Multithreading in Python

Written by pragativerma | Published 2022/08/22

Multithreading and Multiprocessing are the two most common ways of achieving concurrency and parallelism. However, many developers do not understand the difference between them and struggle to choose the right one for a given task.

In this article, we will be discussing the differences between Multithreading and Multiprocessing and how to decide what to use and how to implement it in Python.

What is a thread?

A thread is an independent flow of execution. It can be seen as a lightweight component of a process that can run concurrently with other threads. Threading is a feature usually provided by the operating system. A process can contain multiple threads that share the same memory space, which means that they share the code to be executed and the variables declared in the program.

To understand this better, let’s consider an example of the programs running on your laptop right now. You are probably reading this article with multiple tabs open in your browser. Meanwhile, you have the Spotify desktop app open for listening to music. The browser and the Spotify desktop app are two distinct processes, and each of them can employ several processes or threads to achieve parallelism. So, the different tabs in your browser might run in different threads. Similarly, Spotify can play music using one thread, use another for downloading your favorite song from the internet, and use a third one to display the user interface. This is called Multithreading.

What is Multithreading in Python?

Multithreading, as the name suggests, is a technique in which a single process runs multiple threads at the same time. It is a popular way to keep several tasks making progress by switching between them quickly, and it makes sharing resources between the main thread and the other threads quick and easy.

[Image: Multithreading in Python]

Python runs code sequentially in a single thread by default, but we can use the threading module to understand and implement the concept of Multithreading in Python. The threading module offers an intuitive API for creating multiple threads, which can be used when a program needs to work on several tasks at once.

It can be used as shown below:

import threading

def testThread(num):
    # Runs inside its own thread and prints the number it was given
    print(num)

if __name__ == '__main__':
    for i in range(5):
        # args must be a tuple, hence the trailing comma in (i,)
        t = threading.Thread(target=testThread, args=(i,))
        t.start()

In the above code snippet, target is the callable object to run, args passes the arguments to that function, and start starts the thread.

Now, here comes something interesting - the lock.

Global Interpreter Lock

There are often cases in programming where you want your threads to be able to use or modify variables that are common to all of them. To do this safely, you need a lock. Python also has a lock of its own that applies to every thread in the interpreter, known as the Global Interpreter Lock (GIL).

From the Python wiki:

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.

At the interpreter level, Python basically serializes the instructions. For any thread to run Python code, it must first acquire the global lock. Because only one thread can hold that lock at a time, the interpreter ends up executing the instructions serially. This design makes memory management thread-safe, but it means that the threads of a single Python process cannot execute Python code on multiple CPU cores at the same time.

A lock that you create in your own code, such as threading.Lock, applies the same idea to your own data. Simply put, before a function uses or modifies a shared variable, it acquires the lock that guards that variable, and any other function that wants to use or modify the same variable has to wait until the lock is released.

Consider two functions that each increment a variable by one. You can use a lock to ensure that one function reads the variable, runs its computation, and writes the result back before the other function can do the same, which avoids data corruption.
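
The idea of guarding a shared variable can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the names counter, counter_lock, increment_many, and run_threads are hypothetical and chosen only for this example, while threading.Lock and threading.Thread come from the standard library.

```python
import threading

counter = 0                      # shared variable (hypothetical example data)
counter_lock = threading.Lock()  # guards every update to counter


def increment_many(times):
    """Add 1 to the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread at a time can be inside this block
            counter += 1         # read, add, and write back while holding the lock


def run_threads(num_threads=4, times=10_000):
    """Reset the counter, run several incrementing threads, and return the final value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every thread has finished
    return counter


if __name__ == '__main__':
    print(run_threads())         # prints 40000
```

Because every update happens while the lock is held, the final value equals the number of threads multiplied by the number of increments per thread. Without the lock, two threads could read the same old value and one of the updates could be lost.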

Use Cases for Multithreading in Python

Threading in Python is most helpful for I/O-bound or network-bound tasks, such as running web scraping scripts, rather than for CPU-intensive tasks. Another example is TensorFlow, which uses a thread pool to transform data in parallel.
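
To illustrate why threads suit I/O-bound work, here is a small sketch that simulates network requests with time.sleep instead of contacting a real website. It uses concurrent.futures.ThreadPoolExecutor from the standard library rather than creating threading.Thread objects by hand. The function names fake_download and download_all and the "page-N" strings are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page_id):
    """Stand-in for a network request: sleep briefly instead of fetching a real page."""
    time.sleep(0.2)              # simulated network wait; other threads can run meanwhile
    return f"page-{page_id}"


def download_all(page_ids, max_workers=5):
    """Run the simulated downloads in a pool of threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, page_ids))


if __name__ == '__main__':
    start = time.perf_counter()
    results = download_all(range(5))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"finished in about {elapsed:.2f}s")
```

With five worker threads, the five waits overlap, so the total time should be much closer to a single 0.2 second wait than to the full second that running them one after another would take. The exact timing will vary from machine to machine.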

Other than these applications, Graphical User Interfaces (GUIs) use Multithreading all the time to make applications responsive and interactive. A common example could be a text editing program where, as soon as the user inputs text, it is displayed on the screen. Here, one thread takes care of the user input while another thread handles the task of displaying it. We can add more threads for more functionality, such as spell check, autocompletion, and so on.

Now, having discussed threads in detail, let’s move on to processes.

What is a process?

A process is simply an instance of a computer program being executed. Each process has its own memory space that is used to store the instructions being run and any data that it needs to access or store for the execution of the code. Because of this, spawning a process is slower and more resource-intensive than spawning a thread.

As we discussed earlier, when we are running multiple applications on our desktop, each application is a process, and when we are executing these processes at the same time, it is called Multiprocessing.

What is Multiprocessing in Python?

Multiprocessing is the ability of a system to execute several tasks simultaneously in separate processes. It allows you to create programs that run in parallel, bypass the Global Interpreter Lock (GIL), and use all the available CPU cores for efficient execution of tasks.

Although the concept of Multiprocessing is fundamentally different from Multithreading, their syntax and usage in Python are quite similar. Like the threading module, Python has a multiprocessing module that helps in creating processes, where each process has its own Python interpreter and its own GIL.

Since the processes don’t share the same memory, they can’t modify the same memory concurrently, which greatly reduces the risk of deadlocks and data corruption.

It can be used as shown below:

import multiprocessing

def spawn(num):
  print(num)

if __name__ == '__main__':
  processes = []
  for i in range(5):
    p = multiprocessing.Process(target=spawn, args=(i,))
    p.start()
    processes.append(p)
  for p in processes:
    p.join() # wait for every process to finish before exiting

Use Cases for Multiprocessing in Python

As we discussed earlier, Multiprocessing is the wiser choice when the tasks are CPU-intensive and don’t involve much I/O or user interaction.
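
As a rough sketch of a CPU-intensive use case, the example below spreads a pure-Python calculation over a pool of worker processes with multiprocessing.Pool. The function sum_of_squares and the input sizes are assumptions made up for this illustration, not part of any library.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound work: add up the squares of 0..n-1 in pure Python."""
    return sum(i * i for i in range(n))


if __name__ == '__main__':
    inputs = [200_000, 300_000, 400_000, 500_000]   # hypothetical workloads
    # With no argument, Pool picks the number of workers from the CPU count
    with multiprocessing.Pool() as pool:
        # Each input can be handled by a different worker process
        results = pool.map(sum_of_squares, inputs)
    print(results)
```

Each worker process has its own interpreter and its own GIL, so the calculations can run on different CPU cores at the same time. For very small workloads, the cost of starting processes and passing data between them can outweigh the gain.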

Differences, Merits and Drawbacks

Here are some points to summarize the differences, merits and drawbacks of Multiprocessing and Multithreading:

  • Threads share the same memory space whereas each process has its own memory space.
  • Sharing objects between threads is simpler, but you must take extra precautions for object synchronization to ensure that two threads do not write to the same object at the same time and that a race condition does not arise.
  • Multithreaded programming is more prone to bugs than Multiprocessing because of the added programming overhead for object synchronization.
  • Spawning a process takes more time and resources than spawning a thread, because threads have a lower overhead than processes.
  • Threads cannot achieve complete parallelism by leveraging multiple CPU cores due to GIL constraints in Python. There are no such constraints with multiprocessing.
  • The operating system handles process scheduling. Python threads are also operating system threads, but in CPython the GIL decides which of them may execute Python code at any given moment.
  • Child processes can be interrupted and killed, but child threads cannot. You must wait for the threads to finish or join.
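
The last point in the list above can be sketched with multiprocessing.Process.terminate, which asks the operating system to stop a child process. threading.Thread has no equivalent method. The function run_forever is a hypothetical stand-in for a task that never finishes on its own.

```python
import multiprocessing
import time


def run_forever():
    """Simulates a task that never finishes on its own."""
    while True:
        time.sleep(0.1)


if __name__ == '__main__':
    worker = multiprocessing.Process(target=run_forever)
    worker.start()
    time.sleep(0.5)          # let the worker run for a moment
    worker.terminate()       # ask the operating system to stop the child process
    worker.join()            # wait until it has actually exited
    print("worker alive after terminate:", worker.is_alive())  # prints False
```

A thread running the same loop would have to check a flag, such as a threading.Event, and return on its own, because it cannot be stopped from the outside in this way.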

Conclusion

We can draw the following conclusions from this discussion:

  • Threading is recommended for programs that are I/O-bound or need user interaction.
  • For CPU-bound, computation-intensive apps, multiprocessing should be employed.

Now that you understand how Python Multiprocessing and Multithreading operate and how they compare, you can write code effectively and apply the two approaches in a variety of circumstances.

I hope you found this article helpful. Keep reading!


Written by pragativerma | I am a Software Developer with a keen interest in tech content writing.
Published by HackerNoon on 2022/08/22