Database — Individual Columns vs JSON Column
Read About Storing JSON in Database vs Having a New Column for Each Key in Just 5 Minutes (with GraphQL as Example)
When designing a new data model, there are many possible data models for storing information. We could choose to store information with a JSON blob or create a database table with columns for each information.
In this blog post, I am going to provide an example scenario with two different approaches, and then compare these two approaches in GraphQL terms.
I also wrote a short piece on how to boost your GraphQL query performance using Dataloader in Python:
Scenario
- Imagine we are going to design an admin dashboard (page A), where we need to display our users’ information such as
date, name, age, sex, job, salaryetc. - This application has both mobile and web view
- Using GraphQL API with relational database
Proposed Solutions
1. Approach “Schema A”
# GraphQL Query
query {
employees {
date
name
age
salary
sex
job
}
}// JSON Response
{
"data": {
"employees": [
{
"date": "2020-09-25 00:00:00+08:00",
"name": "Jack",
"age": 30,
"salary": 3000,
"sex": "M",
"job": "Software Engineer"
},
{
"date": "2020-09-26 00:00:00+08:00",
"name": "Jimmy",
"age": 32,
"salary": 4000,
"sex": "M",
"job": "Senior Software Engineer"
},
{
"date": "2020-09-27 00:00:00+08:00",
"name": "Jessie",
"age": 38,
"salary": 10000,
"sex": "F",
"job": "CTO"
}
]
}
}- Each of the user’s information is saved as an individual column in the database
- Whenever we need to add a new piece of information, i.e.
phone_no, we will need to run database migration, add a new field to the employees' query, etc. - Front-end has the flexibility to always get exactly what they need
2. Approach “Schema B”
# Schema B GraphQL Query
query {
employees {
date
metadata
}
}// JSON Response
{
"data": {
"employees": [
{
"date": "2020-09-25 00:00:00+08:00",
"metadata": {
"name": "Jack",
"age": 30,
"salary": 3000,
"sex": "M",
"job": "Software Engineer"
}
},
{
"date": "2020-09-26 00:00:00+08:00",
"metadata": {
"name": "Jimmy",
"age": 32,
"salary": 4000,
"sex": "M",
"job": "Senior Software Engineer"
}
},
{
"date": "2020-09-27 00:00:00+08:00",
"metadata": {
"name": "Jessie",
"age": 38,
"salary": 10000,
"sex": "F",
"job": "CTO"
}
}
]
}
}- In this approach, we save all the employee information as a single JSON data type column in our database, called
metadata - This solution can also be known as schemaless data where employee data does not conform to a rigid schema
- No migrations needed if we were to add new pieces of information about the employee
- JSON is extremely flexible where it allows us to update or add new information very quickly
- Whenever we add or update a new key-value pair to the
metadatafield, mobile developer does not require to push a new build to receive that new piece of information (this is great because unlike web applications, not every user tend to update their mobile applications)
Comparing “Schema A” vs “Schema B”
In our current scenario when designing page A, Schema B seems like a possible solution to go for, we get everything we need (date, name, job , age, salary, sex) by querying the metadata field alone. However…
Over-fetching
In Schema B, lumping all the information together as a single metadata column will result in having to retrieve a lot of unnecessary information when accessing single information. Here’s an example:
Three weeks down the road… We need page B
- Page B needs
nameandage,jobfields only - Sure we could still use schema B, but we are fetching ALL the information because the
metadatadoes not provide us the flexibility to choose — Over-fetch 😕
Another five weeks later… We need page C
- Page C needs
salaryandnameonly — Over-fetch 😟
A year down the road… Yikes! We need page D with different sets of data️
- Frontend loses the ability to only ask for what they want where we end up over-fetching ☹️
- Over-fetching is bad because it slows down our application as it scales
# Schema B GraphQL Query
query {
employees {
date
metadata # What happens if you want to omit `age`? You can't -- Over-fetch
}
}# Schema A GraphQL Query
query {
employees {
date
name
job
salary
sex
age # If you do not need to get the `age` information, you can just omit this
}
}- Sure, one could argue that whenever we have a new
ratingfield, the front-end does not need to do anything becausemetadatafield can be updated behind the scene to includeratingdata inside it - But how often are we not going to update the frontend code whenever new data comes in? 💭
Development Pitfalls
- Another temptation of using JSON is to avoid migration schema — however, if a team is not disciplined enough, it is very easy for the team to stick to yet another key/value pair into a JSON field
- Since there is no migration for it, years down the road, no one will remember what it was for and there is no validation to it
- One day, this flexibility will come at a cost and it will become a real pain point as it will become harder and harder to change because you might have already built so many things on top of this design decision
- Over time, adding new features, having the data in JSON field will lead to more complicated looking queries as compared to using traditional columns
So, when to use JSON Columns?
- If you are dependent on data whose schema is not fixed
- Storing configuration data that has no defined schema
- If you are dependent on data coming from an external API as JSON, you can avoid the process of normalizing this data into different tables. Instead, you can store this data in the same format and structure that you received it in
Final Thoughts
- Like most things, it depends. There is no wrong/good or bad in and of itself to store data in columns or JSON. It boils down to what you need and understanding what the technical trade-offs are
- Looking backward, JSON column’s flexibility has allowed us to get something out of the door and iterate fairly quickly
- However, over a period of time as the team size grows, the JSON fields became junk drawers for little pieces of this and that where there are no consistency, validation, or integrity between information
