The website content discusses how to solve the N+1 query problem in Python Graphene Django when creating a GraphQL API by using Dataloader for batching and caching database queries.
Abstract
The provided website content outlines an approach to improve the efficiency of GraphQL APIs in Python Graphene Django by addressing the N+1 query problem. This problem arises when an API request triggers a multitude of similar database queries, which is highly inefficient. The article explains the concept of Dataloader, a utility that batches multiple database requests into a single query and caches the results to optimize performance. By implementing Dataloader, the number of database queries is significantly reduced, resulting in faster response times for the API. The author provides a step-by-step guide, including code examples and explanations, to demonstrate how to integrate Dataloader into a Graphene Django application. The guide covers creating a middleware to handle all requests, updating DjangoObjectType, and generating Dataloader classes for both One-to-Many and Many-to-One relationships in the data models. The result is a more efficient and performant GraphQL API, as evidenced by the reduction in response time from 2.79 seconds to 1.46 seconds in the author's example.
Opinions
The author believes that the N+1 problem is a significant issue that developers will encounter as they develop their GraphQL APIs.
Dataloader is presented as a powerful tool that can greatly enhance the performance of a GraphQL API by reducing the number of database requests.
The article suggests that users should compare the performance of the API before and after implementing Dataloader to appreciate the improvement.
The author emphasizes the importance of understanding the relationship between data models, such as One-to-Many and Many-to-One, when implementing Dataloader.
The author provides additional resources and references to further assist developers in understanding and implementing Dataloader in their projects.
By using Dataloader, the author conveys a strong opinion that developers can achieve a more efficient and scalable GraphQL API.
Solving N+1 Problem With Dataloader in Python Graphene Django
A guide with examples on how to use dataloader in Graphene-Django to optimize Python GraphQL
There’s one big problem that you will eventually uncover as you spend more hours developing your GraphQL API. The problem doesn’t give you any errors nor it is obvious to developers — the “N+1” problem.
The goal of this article is to shed some light on how one could use Dataloader to address the “N+1” problem in Graphene Django while creating a GraphQL API for the One-to-Many and Many-to-One relationship between tables.
In case you are lost somewhere in between or you would like to try out everything on your own, you may find the source code here at GitHub.
I will also be using Insomnia REST Client to test out the APIs in this article. If you are interested to learn more about Insomnia, check out this article:
For this article, all the assumptions will be made based on the data models below shown in the models.py file embedded. It is also important to take note that the relationship between Reporter and Article are as below:
One Reporter can have many Articles (One-to-Many)
Each Article can only have one Reporter (Many-to-One)
Problem Statement
Imagine if we want to query a list of 500 reporters and all the articles associated with them from our GraphQL API. We would perform such a query as below:
While the query doesn’t raise any exceptions or errors, the SQL queries executed are as below.
NOTE: To view the SQL queries generated for the underlying GraphQL queries made using Django Graphene, check out the documentation here. It allows us to view various debug information about the current request/response.
The “N+1” Problem
SELECT * FROM starter_reporter LIMIT500
SELECT * WHERE starter_article.reporter_id = '1'
SELECT * WHERE starter_article.reporter_id = '2'
...
SELECT * WHERE starter_article.reporter_id = '499'
SELECT * WHERE starter_article.reporter_id = '500'
The query for reporters → articles
As you can tell, this is highly inefficient where 500 identical database queries were sent to our database.
The entire request took our server about 2.8 seconds on average to respond where we can do better.
Goal
Ultimately, we want to reduce the identical 500 SQL queries into 1 single query such as below with the help of batching using Dataloader.
-- Our Final Goal --
SELECT * FROM starter_article WHERE starter_article.reporter_id IN ('1', '2',..., '499', '500')
Solution
Dataloader to the rescue 🚀🚀🚀
Dataloader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching. (source)
Concept
Conceptually, here are the 3 key steps of what a Dataloader do:
Collects a list of keys (object IDs)
Calls a batch loading function with the list of keys
Returns a Promise which resolves to a list of values
On top of these 3 key steps, there are 2 main constraints this function must uphold (source):
The array of values must be the same length as the array of keys.
Each index in the array of values must correspond to the same index in the array of keys.
Code Example
Our implementation of Dataloader will be divided into two separate parts:
Part 1: Fetch a list of Articles for each Reporter (reporters → articles)
Part 2: Fetch a single Reporter for each Article (articles → reporter)
Creating a Middleware to Handle All Requests
NOTE: In this example, we would need to create newself.*_loaders in__init__ every time we want to create a new Dataloader for a resolver.
We would then need to update our Djangosettings.py to include our newly created LoaderMiddleware.
Next, let’s use the newly created LoaderMiddleware at our ReporterNode type and create a new field with a resolver called dataloaderArticles.
NOTE: Sure we could override the default articles query, but in this article I want to allow users to compare the performance difference side by side and hence we will create a new field called dataloaderArticles instead.
Creating the Dataloader
Part 1: Fetch a list of Articles for each Reporter (reporters → articles)
This is the code example of a generic Dataloader function where the Type is passed in as a variable.
Explanation — Whenever a query is executed to get fetch all the available articles associated with the reporter, the Dataloader would collect a list of keys (i.e. Reporter ids) from our resolve_dataloder_articles resolver.
The Dataloader would then look up the Article table (passed in via middleware.py) and fetch all Article objects using the Reporter ids from keys.
A dictionary map of resuts_by_ids would be populated as below and converted into an array where each index in the array of values corresponds to the same index in the array of keys.
# NOTE:# '1'...'500' are the Reporter IDs.# Each Reporter ID key would contain a list of the related Article object
Lastly, the array is then passed to a Promise which resolves to a list of values.
Part 1 Query Result
The query for reporters → articles (using dataloaderArticles)
Let’s query for articles again using our newly created field dataloaderArticles.
If we were to look at our SQL query, here’s what it looks like now:
SELECT * FROM starter_reporter LIMIT500
SELECT * FROM starter_article WHERE starter_article.reporter_id IN ('1', '2',...,'499', '500')
This request only took 1.46 seconds this time, previously at 2.79 seconds.
Part 2: Fetch a single Reporter for each Article (articles → reporter)
In this section, we will work on the relationship query between Article and Reporter in reverse. Let’s start by adding a new Dataloader class within loaders.py.
Within middlewares.py, let’s update our Loader class with reporter_by_article_loader.
classLoaders:
def__init__(self):
self.reporter_by_article_loader = loaders.generate_loader(ReporterNode, "id")() # Part 2
Next is to update the ArticleNode with resolve_dataloder_reporter resolver.
Part 2 Query Result
Final Thoughts
By using batch and caching, Dataloader enables our GraphQL API to be much more efficient while fetching data by greatly decreasing the number of requests made to our database.
You may find the complete version of the code at source code here. Good luck!