We used Octokit’s GitHub REST API client extensively while working on the Git Repository Analyzer project. Node.js asynchronous calls let us issue HTTP requests concurrently, but retrieving the required data still took too many HTTP calls. Let’s see how GitHub’s GraphQL API can help overcome these problems. The requirements were as follows:
- Search GitHub repositories based on search parameters, e.g. language:java.
- Sort these repositories in descending order of their star count.
- For each repository, get the top 30 pull requests.
- For each pull request, get the issue and patch/diff data. Issue data and patch data are obtained through 2 separate HTTP calls. A patch is a small file that indicates what was changed in a repository.
Using the GitHub REST API v3
Based on the above requirements, we retrieve the data in the following manner.
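A minimal sketch of the requests this flow involves, assuming the standard REST v3 URL shapes. It only builds the URLs and Accept headers, it does not send anything. The helper names (`searchRequest`, `pullsRequest`, `pullDetailRequests`) and the `PlannedRequest` type are hypothetical, and since the requirements do not say what "top" means for pull requests, the sketch simply takes the first page of 30.

```typescript
// Hypothetical helpers: only the URL shapes and media types come from GitHub's REST API v3.
const API = "https://api.github.com";
const JSON_ACCEPT = "application/vnd.github.v3+json";
const PATCH_ACCEPT = "application/vnd.github.v3.patch";

interface PlannedRequest {
  url: string;
  accept: string; // value to send in the Accept header
}

// Step 1: one search call, sorted by stars in descending order.
function searchRequest(query: string, repoCount: number): PlannedRequest {
  const q = encodeURIComponent(query);
  return {
    url: `${API}/search/repositories?q=${q}&sort=stars&order=desc&per_page=${repoCount}`,
    accept: JSON_ACCEPT,
  };
}

// Step 2: one call per repository for its first page of 30 pull requests.
// fullName is "owner/repo", as returned in the search results.
function pullsRequest(fullName: string): PlannedRequest {
  return { url: `${API}/repos/${fullName}/pulls?per_page=30`, accept: JSON_ACCEPT };
}

// Steps 3 and 4: two calls per pull request, one for the issue data
// and one for the patch (same pull request URL, different media type).
function pullDetailRequests(fullName: string, prNumber: number): PlannedRequest[] {
  return [
    { url: `${API}/repos/${fullName}/issues/${prNumber}`, accept: JSON_ACCEPT },
    { url: `${API}/repos/${fullName}/pulls/${prNumber}`, accept: PATCH_ACCEPT },
  ];
}
```

Each level depends on the response of the previous one, so the calls fan out: one search, then one list call per repository, then two calls per pull request.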
Let’s look at the problems faced with the above approach:
- Number of HTTP calls
Let’s say we want to get the required data from 5 GitHub repositories.
- An HTTP call for getting the repository details (count: 1)
- For each repository, we make 1 HTTP call for getting the data of its 30 pull requests (count: 5)
- For each pull request, we make 1 HTTP call for getting the Issue data for the pull request (count: 30*5 = 150)
- For each pull request, we make 1 HTTP call for getting the patch/diff data for the pull request (count: 30*5 = 150)
In total, we make 1 + 5 + 150 + 150 = 306 HTTP calls to get the issue and patch data for 5 repositories based on the search query.
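The count above generalizes to any number of repositories and pull requests. A small sketch of the arithmetic, where `restCallCount` is a hypothetical helper name:

```typescript
// Hypothetical helper: total REST calls for the flow described above.
function restCallCount(repoCount: number, pullsPerRepo: number): number {
  const searchCalls = 1;                        // one search for the repositories
  const pullListCalls = repoCount;              // one pull request list per repository
  const issueCalls = repoCount * pullsPerRepo;  // one issue call per pull request
  const patchCalls = repoCount * pullsPerRepo;  // one patch/diff call per pull request
  return searchCalls + pullListCalls + issueCalls + patchCalls;
}

const totalForFiveRepos = restCallCount(5, 30); // 1 + 5 + 150 + 150 = 306
```

The total grows as 1 + R + 2·R·P for R repositories and P pull requests each, so the two per-pull-request calls dominate.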
- Deleting Unused Data
After each HTTP call, only a few fields from the response are used for the subsequent calls. The unused data has to be deleted from each response to keep the size of the final output down.
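As an illustration of that trimming step, here is a minimal sketch that keeps three fields from a pull request response and drops the rest. The `TrimmedPull` type and `trimPull` function are hypothetical, and the choice of fields is an assumption; `number`, `title`, and `html_url` are field names from GitHub's pull request JSON.

```typescript
// Hypothetical trimmed shape: which fields the project actually kept is an assumption.
interface TrimmedPull {
  number: number;
  title: string;
  url: string;
}

// Copy only the fields we need; everything else in the raw response is discarded.
function trimPull(raw: Record<string, unknown>): TrimmedPull {
  return {
    number: Number(raw["number"]),
    title: String(raw["title"]),
    url: String(raw["html_url"]),
  };
}
```

Building a new object from the wanted fields tends to be simpler than deleting the unwanted ones one by one, since the response contains far more fields than are kept.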
What is GraphQL?
From graphql.org:
GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.
Why use GitHub’s GraphQL API?
- Drastic Reduction in HTTP calls
In the above scenario, using GitHub’s REST API, we made 306 HTTP calls to retrieve the data. With GraphQL, the same data can be retrieved with very few HTTP calls. We will see the exact count of HTTP calls and the implementation details in the next article.
- Only get the required fields
We only receive the data we ask for, nothing more and nothing less. We no longer need to delete the unused data.
- Single Endpoint
We no longer need to use different URLs for retrieving the data, as is the case with REST API calls. GraphQL has a single endpoint, regardless of the data required.
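To make these points concrete, here is a hedged sketch of a request against GitHub's GraphQL endpoint that asks for repositories and their pull requests in one query. It only builds the request body and sends nothing. The `buildRequest` helper, the `GraphQLRequest` type, and the selection of fields are assumptions for illustration, and patch/diff text is not part of this sketch.

```typescript
// The single GraphQL endpoint; requests are POSTed here with an
// "Authorization: bearer <token>" header (the token is not included in this sketch).
const GRAPHQL_ENDPOINT = "https://api.github.com/graphql";

// Only the fields named here come back in the response.
const QUERY = `
query TopRepositories($searchQuery: String!, $repoCount: Int!, $pullCount: Int!) {
  search(query: $searchQuery, type: REPOSITORY, first: $repoCount) {
    nodes {
      ... on Repository {
        nameWithOwner
        stargazers { totalCount }
        pullRequests(first: $pullCount) {
          nodes { number title body url }
        }
      }
    }
  }
}`;

interface GraphQLRequest {
  url: string;
  method: "POST";
  body: string;
}

// Hypothetical helper: builds the POST body for the query above.
function buildRequest(language: string, repoCount: number, pullCount: number): GraphQLRequest {
  return {
    url: GRAPHQL_ENDPOINT,
    method: "POST",
    body: JSON.stringify({
      query: QUERY,
      variables: {
        searchQuery: `language:${language} sort:stars-desc`,
        repoCount,
        pullCount,
      },
    }),
  };
}
```

For example, `buildRequest("java", 5, 30)` describes the search, the star counts, and the pull request fields in one request body, where the REST flow needed a separate URL for each level.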
Currently, GitHub’s GraphQL API doesn’t support the above use case. A request has been made for the same purpose.
Originally published at https://www.linkedin.com on August 18, 2018.