This article was originally posted on my blog.

A few weeks ago, I was working on a Python script to extract books' metadata for a content-based recommender. After a couple of hours, I realized that I needed to make thousands of requests to the Google Books API to get the data. So I thought there had to be a way of speeding up the process.

As I enjoy learning, especially when it's also a chance to procrastinate on my goals, I decided to build a project using asyncio. Afterward, feeling guilty for the time wasted, I decided to write this tutorial with what I learned in the process.

This article aims to provide the basics of how to use asyncio for making asynchronous requests to an API. I focus mostly on the actual code and skip most of the theory (besides the short introduction below). However, if you are looking for a more in-depth introduction to asyncio, check the recommendations in the references[1].

asyncio in 30 Seconds or Less

asyncio is a Python library that allows you to execute some tasks in a seemingly concurrent[2] manner. It is commonly used in web servers and database connections. It is also useful for speeding up IO-bound tasks, like services that require making many requests or that do lots of waiting for external APIs.

The essence of asyncio is that it allows the program to continue executing other instructions while waiting for specific processes to finish (e.g., a request to an API). In this tutorial, we will see how to use asyncio to accelerate a program that makes multiple requests to an API.

Sequential vs. Asynchronous

So let's get down to business. To get the most out of this tutorial, try running the code yourself. You can copy and paste the code into a Jupyter Notebook and run it without modifications. Just remember to first install the required libraries (requests and aiohttp).

We'll build a sequential and an asynchronous version of a small program and compare their results and structure. Both programs will do the same:

1. Read a list of ISBNs (the international identifier of books)
2. Request the books' metadata from the Google Books API
3. Parse the results from the requests
4. Print the results to the screen

The algorithm would look something like the diagram below. We'll now compare two possible approaches for building this algorithm. First, Option A, which executes the requests sequentially. Then, Option B, which uses asyncio to run the requests asynchronously.
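To make the "seemingly concurrent" idea concrete before we dive into the two options, here's a toy sketch. It is not part of the books program; `say_after` and the delays are made up purely for illustration:

```python
import time
import asyncio


async def say_after(delay, message):
    await asyncio.sleep(delay)  # suspends this coroutine, letting others run
    print(message)


async def main():
    start = time.perf_counter()
    # Both coroutines wait at the same time, so this takes ~2s instead of ~3s
    await asyncio.gather(say_after(2, "world"), say_after(1, "hello"))
    print(f"Elapsed: {time.perf_counter() - start:.1f}s")


asyncio.run(main())  # in a Jupyter Notebook, use `await main()` instead
```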
Option A: Sequential Algorithm

A sequential version of that algorithm could look as follows:

```python
import os
import json

import requests
from requests.exceptions import HTTPError

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes?q=isbn:"
LIST_ISBN = [
    '9780002005883',
    '9780002238304',
    '9780002261982',
    '9780006163831',
    '9780006178736',
    '9780006280897',
    '9780006280934',
    '9780006353287',
    '9780006380832',
    '9780006470229',
]


def extract_fields_from_response(item):
    """Extract fields from API's response"""
    volume_info = item.get("volumeInfo", {})
    title = volume_info.get("title", None)
    subtitle = volume_info.get("subtitle", None)
    description = volume_info.get("description", None)
    published_date = volume_info.get("publishedDate", None)
    return (
        title,
        subtitle,
        description,
        published_date,
    )


def get_book_details_seq(isbn, session):
    """Get book details using Google Books API (sequentially)"""
    url = GOOGLE_BOOKS_URL + isbn
    response = None
    try:
        response = session.get(url)
        response.raise_for_status()
        print(f"Response status ({url}): {response.status_code}")
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")
    response_json = response.json()
    items = response_json.get("items", [{}])[0]
    return items


with requests.Session() as session:
    for isbn in LIST_ISBN:
        try:
            response = get_book_details_seq(isbn, session)
            parsed_response = extract_fields_from_response(response)
            print(f"Response: {json.dumps(parsed_response, indent=2)}")
        except Exception as err:
            print(f"Exception occurred: {err}")
```

Now, let's break down the code to understand what's going on.

As usual, we start by importing the required libraries (including json, which we need for json.dumps when printing the parsed responses). Then, we define two variables:

- GOOGLE_BOOKS_URL, for specifying the URL of the Google API we'll use for the requests.
- LIST_ISBN, which is a sample list of ISBNs for testing the program.

For illustration purposes, this is how a request to the Google Books API looks:

https://www.googleapis.com/books/v1/volumes?q=isbn:9780002005883

Next, we define the extract_fields_from_response function. This function takes as input the response from the API and extracts the fields we're interested in.

The parsing process in extract_fields_from_response is based on the response's structure from the Google Books API, which looks as follows:

```json
{
  "kind": "books#volumes",
  "totalItems": 1,
  "items": [
    {
      "kind": "books#volume",
      "id": "3Mx4QgAACAAJ",
      "etag": "FWJF/JY16xg",
      "selfLink": "https://www.googleapis.com/books/v1/volumes/3Mx4QgAACAAJ",
      "volumeInfo": {
        "title": "Mapping the Big Picture",
        "subtitle": "Integrating Curriculum and Assessment, K-12",
        ...
      }
    }
  ]
}
```
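To see what extract_fields_from_response does with that structure, here's a quick check you can run after the snippet above. The sample_item dict is a trimmed-down, made-up stand-in for one element of the "items" list, not a real API response:

```python
# A trimmed-down stand-in for one element of the API's "items" list
sample_item = {
    "kind": "books#volume",
    "volumeInfo": {
        "title": "Mapping the Big Picture",
        "subtitle": "Integrating Curriculum and Assessment, K-12",
        "publishedDate": "1997",
        # "description" is left out on purpose: .get() falls back to None
    },
}

print(extract_fields_from_response(sample_item))
# ('Mapping the Big Picture', 'Integrating Curriculum and Assessment, K-12', None, '1997')
```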
Finally, we go into the most relevant part of the program: how we make requests to the Google Books API.

```python
def get_book_details_seq(isbn, session):
    """Get book details using Google Books API (sequentially)"""
    url = GOOGLE_BOOKS_URL + isbn
    response = None
    try:
        response = session.get(url)
        response.raise_for_status()
        print(f"Response status ({url}): {response.status_code}")
    except HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")
    response_json = response.json()
    items = response_json.get("items", [{}])[0]
    return items


with requests.Session() as session:
    for isbn in LIST_ISBN:
        try:
            response = get_book_details_seq(isbn, session)
            parsed_response = extract_fields_from_response(response)
            print(f"Response: {json.dumps(parsed_response, indent=2)}")
        except Exception as err:
            print(f"Exception occurred: {err}")
```

There are two major pieces here:

- get_book_details_seq, which is the function that executes the requests. It takes as input an ISBN and a session object[4] and returns the response from the API as a JSON structure. It also handles possible errors, like providing a wrong URL or going over your daily quota of requests.
- The code block under with requests.Session() as session, which is where the actual execution of requests happens. It iterates through the list of ISBNs, getting the book details, parsing them, and finally printing them to the screen.

For me, executing this process takes from 4 to 6 seconds. If you only need to do this a couple of times, you will not find much benefit from using asyncio. However, if instead of 10 requests you need to make 10,000, having some concurrency in your program pays off. In the next section, we'll see how to make this algorithm faster using asyncio.

Option B: Asynchronous Algorithm

An asynchronous version of the same algorithm may look something like the following:

```python
import os
import json
import asyncio

import aiohttp
from aiohttp import ClientSession

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes?q=isbn:"
LIST_ISBN = [
    '9780002005883',
    '9780002238304',
    '9780002261982',
    '9780006163831',
    '9780006178736',
    '9780006280897',
    '9780006280934',
    '9780006353287',
    '9780006380832',
    '9780006470229',
]


def extract_fields_from_response(response):
    """Extract fields from API's response"""
    item = response.get("items", [{}])[0]
    volume_info = item.get("volumeInfo", {})
    title = volume_info.get("title", None)
    subtitle = volume_info.get("subtitle", None)
    description = volume_info.get("description", None)
    published_date = volume_info.get("publishedDate", None)
    return (
        title,
        subtitle,
        description,
        published_date,
    )


async def get_book_details_async(isbn, session):
    """Get book details using Google Books API (asynchronously)"""
    url = GOOGLE_BOOKS_URL + isbn
    try:
        response = await session.request(method='GET', url=url)
        # raise_for_status() raises aiohttp.ClientResponseError on 4xx/5xx
        response.raise_for_status()
        print(f"Response status ({url}): {response.status}")
    except aiohttp.ClientResponseError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")
    response_json = await response.json()
    return response_json


async def run_program(isbn, session):
    """Wrapper for running program in an asynchronous manner"""
    try:
        response = await get_book_details_async(isbn, session)
        parsed_response = extract_fields_from_response(response)
        print(f"Response: {json.dumps(parsed_response, indent=2)}")
    except Exception as err:
        print(f"Exception occurred: {err}")


# Top-level await like this works in a Jupyter Notebook
async with ClientSession() as session:
    await asyncio.gather(*[run_program(isbn, session) for isbn in LIST_ISBN])
```
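A quick note before breaking this code down: the async with block at the bottom relies on top-level await, which runs fine in a Jupyter Notebook but raises a SyntaxError in a plain Python script. If you want to run it as a script, a minimal wrapper (my addition, not part of the original program) would look like this:

```python
async def main():
    async with ClientSession() as session:
        await asyncio.gather(*[run_program(isbn, session) for isbn in LIST_ISBN])


# asyncio.run() creates the event loop and runs main() to completion (Python 3.7+)
asyncio.run(main())
```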
First, check the get_book_details_async function. An async keyword prepends it. This keyword tells Python that your function is a coroutine. Then, in the function's body, there are two await keywords. These tell the coroutine to suspend execution and give back control to the event loop while the operation it is awaiting finishes.

A coroutine is a type of generator function in Python that, instead of producing values, consumes values[5]. The interesting thing about it is that its execution pauses while it waits for new data to be sent to it. In our case, this allows the execution of other parts of the program to continue in a seemingly concurrent manner.

In this case, the execution of get_book_details_async is suspended while the request is being performed: await session.request(method='GET', url=url).

It is suspended again while the request's response is being parsed into a JSON structure: await response.json().

Next, we have the run_program coroutine. This one is simply a wrapper around the pipeline of getting a response from the API, parsing it, and printing the results to the screen. It awaits the execution of the get_book_details_async coroutine.

Finally, we have the code block under async with ClientSession() as session:. Using the asyncio.gather syntax, we tell the program to schedule all the tasks based on the list of coroutines we provided. This is what allows us to execute tasks concurrently.

For me, running this process takes around 800-1000 milliseconds.

Results

Comparing both versions, we see that the asynchronous one is around 4 to 7.5 times faster than the sequential version. If we increase the number of requests, you'll likely get an even higher speedup. Besides, the version using asyncio is not much more complicated than the sequential one, which makes asyncio an excellent option for the kind of task we reviewed in this tutorial.

Additional recommendations

Here are some tips I gathered while working with asyncio:

- asyncio keeps changing all the time, so be wary of old Stack Overflow answers. Many of them are not up to date with the current best practices.
- APIs will not allow you to run unlimited concurrent requests. To overcome that, take a look at asyncio's Semaphore. It will let you limit the concurrency of your application (there's a short sketch after the references below).
- Not all programs can be sped up with asyncio. Research the type of issue you are facing before doing any substantial modification of your code. Other alternatives might work for you (e.g., threading or multiprocessing).

I made a complete version of the program we went through in this tutorial for getting the metadata of almost 7 thousand books. Here's a link to it: Google Books Crawler.

Notes and References

[1] Real Python has two amazing articles introducing asyncio: Async IO in Python and Speed Up Your Python Program With Concurrency

[2] It is not strictly concurrent execution. But in practical terms, it looks like it is.

[3] S. Buczyński, (2017) What Is the Use Case of Coroutines and asyncio in Python 3.6?

[4] The session object is a functionality from the requests library that allows you to persist certain parameters across requests. This usually results in requests with lower latency. Read more here.

[5] D. Beazley, (2009) A Curious Course on Coroutines and Concurrency
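As promised in the recommendations above, here's a minimal sketch of how asyncio's Semaphore could be plugged into the program from Option B. This is my own illustration, not part of the original crawler, and the limit of 5 concurrent requests is an arbitrary example value:

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 5  # arbitrary example value; tune it to the API's quota


async def run_program(isbn, session, semaphore):
    """Same wrapper as in Option B, but never more than 5 requests in flight."""
    async with semaphore:  # waits here if the limit is already reached
        response = await get_book_details_async(isbn, session)
        parsed_response = extract_fields_from_response(response)
        print(f"Response: {json.dumps(parsed_response, indent=2)}")


async def main():
    # Create the semaphore inside the running event loop
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    async with ClientSession() as session:
        await asyncio.gather(
            *[run_program(isbn, session, semaphore) for isbn in LIST_ISBN]
        )


asyncio.run(main())  # or `await main()` in a Jupyter Notebook
```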