Tek Loon

Summary

The article discusses the use of partial indexes in MongoDB to improve performance, particularly in a hotel booking site scenario like Agoda.

Abstract

The article, inspired by the author's experience with the MongoDB Performance Course, delves into the concept of partial indexes as a means to enhance database performance. It defines partial indexes as indexes that cover only a subset of a collection's documents, which can lead to reduced storage and improved write operation performance. The author provides a use case for a hotel booking site, Agoda, to demonstrate when and how to implement partial indexes effectively. The article outlines the benefits of partial indexes, such as faster query execution and more efficient read and write operations, especially when dealing with frequently accessed data like hotel bookings filtered by categories such as star ratings. Through data preparation, problem identification, and solution optimization, the author illustrates the performance gains achieved by using partial indexes, highlighting improved write performance in the author's trials and a roughly fourfold improvement in read performance (query time falling from 55ms to 13ms).

Opinions

  • The author believes that partial indexes are crucial for maintaining good user experience by improving application speed, especially as data grows rapidly.
  • They suggest that partial indexes should be used when only a subset of data is frequently queried, such as preferential searches for higher-rated hotels.
  • The author emphasizes that while having multiple indexes can be beneficial for query optimization, it can also negatively impact write performance.
  • They argue that the strategic use of partial indexes can significantly reduce the cost of index maintenance without compromising query performance.
  • The author expresses that the decision to implement partial indexes should be based on the frequency of write operations and the specific needs of the business.
  • They conclude that understanding when and how to use partial indexes is essential for database optimization and can lead to substantial performance improvements.

MongoDB Performance 101: When & How You Should Use Partial Indexes

Photo by Denny Müller on Unsplash

This article was inspired by my experience taking the MongoDB Performance Course exam. I thought I understood the material quite well, but my score turned out to be pretty bad. That led me to write this article, which lets me study the topic in depth and relate it to a practical, present-day use case.

What is a Partial Index?

A partial index is an index that covers only a subset of the documents (data) in a collection. In layman's terms, imagine you have a million records but only want to index part of them, e.g., 500K.
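
To make the definition concrete, here is a tiny in-memory sketch in plain JavaScript. It is only an illustration of the idea (an index that holds entries for some documents and not others); the sample hotels and the buildIndex helper are made up, and this is not how MongoDB stores indexes internally.

```javascript
// Illustrative only: models a partial index as "index entries for the
// documents that pass a filter". The data and helper are hypothetical.
const hotels = [
  { _id: 1, hotel_name: "A", star: 5 },
  { _id: 2, hotel_name: "B", star: 2 },
  { _id: 3, hotel_name: "C", star: 3 },
  { _id: 4, hotel_name: "D", star: 1 },
];

// Build [key, _id] entries sorted by key, skipping documents that fail the filter.
function buildIndex(docs, field, filter = () => true) {
  return docs
    .filter(filter)
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => a[0] - b[0]);
}

const fullIndex = buildIndex(hotels, "star");
const partialIndex = buildIndex(hotels, "star", (doc) => doc.star >= 3);

console.log(fullIndex.length);    // 4 entries: every document is indexed
console.log(partialIndex.length); // 2 entries: only hotels with star >= 3
```

The partial version has fewer entries to store and fewer entries to update on each write, which is the whole point of the technique.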

Why Partial Index?

At this point, you might be wondering: why create a partial index at all? Why not just index everything?

You can, but a full index costs more to store and maintain, which can hurt performance and slow down your application. You definitely don't want that, especially when your data is growing fast.

Give your users a better experience by improving your application's speed.

Implementing a partial index brings two benefits:

  • Less storage is needed for the index.
  • Fewer index entries to maintain leads to better write performance.

When should I use a Partial Index?

Now that we know what a partial index is and why it matters, the next question is: when should I use it?

These are my current rules of thumb for implementing a partial index. They may change over time.

  • The data can be filtered by category, e.g., 5-star hotels, 4-star hotels, and so on.
  • Only a subset of the data is queried frequently. For example, users tend to search for five-star hotels rather than one-star hotels.

Now we understand the conditions under which a partial index makes sense.

Let's get our hands dirty with a real use case: the well-known hotel booking site, Agoda.

Use Case: Hotel Booking Site, Agoda

Before I get started, there are some disclaimers that I want to make:

  • I do not work at Agoda.
  • What I suggest and wrote here is just my personal opinion and interpretation, which means I could be wrong.

Introduction & Scenario

Recently, I was planning a trip to Bangkok, so I tried to book a room via Agoda.

After the booking process, I realized this would be a very good, practical use case for a partial index. To make a booking, several fields are required. Let's see which fields they are.

Agoda Booking Screenshot

From the screenshot above, we can search hotels using city name, country name and filter using the star rating.

  1. Destination or Hotel Name
  2. Star Rating

Now that we understand the flow, let's move on to the next step, where we create the database and mock the dataset.

Data Preparation

I wrote a step-by-step guide previously on how to prepare the database model and generate dummy data for performance optimization.

What do I need?

  1. I need a list of records that contain hotel information: hotel_name, city, country, and star. These are the minimum fields our collection needs in order to create the partial index. I will also add two more fields, price and average_review, to make the data more closely resemble real production data.
  2. I also require a data generation tool to generate a million-document dataset into our collection as that allows us to discover more significant findings, especially in query time. After some googling, I discovered a tool on GitHub, mgodatagen, which allows me to generate random data in MongoDB with very minimal configuration.
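
To give a feel for the document shape, here is a small hand-rolled generator in plain JavaScript. It is a hypothetical stand-in and not the mgodatagen configuration I used; the country list, price range, and review scale are arbitrary assumptions chosen only so the fields have plausible values.

```javascript
// Hypothetical stand-in for the generated data. The value ranges below
// are assumptions for illustration, not taken from the real dataset.
const COUNTRIES = ["Thailand", "Indonesia", "Malaysia"];

// Build one hotel document deterministically from its position `i`.
function makeHotel(i) {
  return {
    hotel_name: `Hotel ${i}`,
    city: `City ${i % 10}`,
    country: COUNTRIES[i % COUNTRIES.length],
    star: (i % 5) + 1,          // 1 to 5
    price: 50 + (i % 20) * 10,  // arbitrary nightly price
    average_review: i % 11,     // arbitrary 0 to 10 score
  };
}

const sample = Array.from({ length: 5 }, (_, i) => makeHotel(i));
console.log(sample[0]);
```

In mongosh, a small batch like this could be loaded with db.hotel5m.insertMany(sample), but for millions of documents a dedicated tool is the more practical choice.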

The detailed steps are written up in this article (it's free!!). Feel free to check it out.

Problem identification

This section describes the performance problem we ran into. We have to know what the problem is before we can optimize for it.

What motivates you to implement a partial index? The answer: having so many indexes that they affect your performance. Let's set up the following prerequisites and run an experiment.

  • Create a dataset of five million documents.
  • Create single-field indexes on the following fields: hotel_name, city, country, and star.
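
The second prerequisite could be scripted roughly as below. This is a sketch I have not run against a live server: it assumes you are in mongosh, have already switched to the property database, and that the collection is named hotel5m. The createSingleFieldIndexes helper is my own name; createIndex is the standard collection method.

```javascript
// Creates one ascending single-field index per field on the given collection.
// `coll` is expected to be a mongosh collection object such as db.hotel5m.
function createSingleFieldIndexes(coll, fields) {
  const created = [];
  for (const field of fields) {
    // { [field]: 1 } means "index this field in ascending order".
    created.push(coll.createIndex({ [field]: 1 }));
  }
  return created;
}

const SEARCH_FIELDS = ["hotel_name", "city", "country", "star"];

// Only touch the database when running inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(createSingleFieldIndexes(db.hotel5m, SEARCH_FIELDS));
}
```

Each call should produce an index with the default name, for example star_1 for the star field.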

The size of the indexes grows in proportion to the amount of data.

This is why we're generating so much data. Besides, users tend to search for hotels by city, country, star rating, and hotel name, so it is good practice to index all of these fields to optimize query performance.

Let's have a look at the indexes we created. According to the screenshot below, they take up almost 144.1MB.

We created 144.1MB Indexes.

Let's try to insert a hotel into the database and check the write speed.

Before you can check the write speed, you have to configure the database profiling level. Run the following mongo shell command.

db.setProfilingLevel(2)

Setting the profiling level to 2 means the profiler collects data for all operations. You can refer to the MongoDB docs for more details about the database profiler.
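
Because level 2 records every operation, it can be handy to switch it back once the measurement is done. Here is a minimal sketch of that idea; withFullProfiling is a hypothetical helper of mine, while getProfilingStatus and setProfilingLevel are the standard database methods. I have not run it against a live server.

```javascript
// Runs `work` with the profiler at level 2, then restores the previous level.
// `database` is expected to be a mongosh database object such as db.
function withFullProfiling(database, work) {
  // `was` holds the profiling level that is currently configured.
  const previous = database.getProfilingStatus().was;
  database.setProfilingLevel(2);
  try {
    return work();
  } finally {
    // Restore the old level even if `work` throws.
    database.setProfilingLevel(previous);
  }
}
```

In mongosh this could be used as withFullProfiling(db, () => db.hotel5m.insertOne({ hotel_name: "Example Hotel", city: "Bangkok", country: "Thailand", star: 4 })), where the document values are made up. The entries already written to system.profile stay there after the level is restored.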

Now, let’s try to insert a hotel into the DB.

Setting up Profiling Level and Insert New Hotel Information

To check the write performance, we can run the following mongo shell command, which filters the profiler output by our collection name and the insert operation.

db.system.profile.find({ ns: "property.hotel5m", op: "insert"}).pretty()
Using 16ms to insert a new hotel

From the screenshot above, inserting a new record into the database takes 16ms. The extra indexes don't only affect inserts; they also affect the execution time of updates. If you frequently update the hotel collection, this is something you might want to optimize.
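
A single insert is a noisy measurement, so it helps to average the millis field over several profiler entries. The helper below is a small sketch in plain JavaScript; averageMillis is my own name, and the sample entries are made-up numbers shaped like documents from system.profile.

```javascript
// Average the `millis` field over a list of profiler documents.
function averageMillis(entries) {
  if (entries.length === 0) return NaN; // nothing to average
  const total = entries.reduce((sum, entry) => sum + entry.millis, 0);
  return total / entries.length;
}

// Made-up sample entries, shaped like documents from db.system.profile.
const sampleEntries = [
  { op: "insert", ns: "property.hotel5m", millis: 16 },
  { op: "insert", ns: "property.hotel5m", millis: 12 },
  { op: "insert", ns: "property.hotel5m", millis: 14 },
];

console.log(averageMillis(sampleEntries)); // 14
```

In mongosh, the real entries could be passed in with averageMillis(db.system.profile.find({ ns: "property.hotel5m", op: "insert" }).toArray()).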

Solution Optimization

Now, let's try optimizing by removing the star index and creating a partial index for hotels with three stars and above.

Step 1: Remove the star single-field index

You can easily remove an index using MongoDB Compass. Referring to the screenshot below, click the trash can icon in the red box, then type the index name into the field in the confirmation dialog that appears.

Remove star_1 indexes
Confirmation Dialog to Drop Index
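
If you prefer the shell over Compass, the same step can be sketched as below. This assumes the index has the default name star_1 and that you are in mongosh on the property database; dropIndexIfPresent is a hypothetical helper, while getIndexes and dropIndex are standard collection methods. I have not run it against a live server.

```javascript
// Drops an index by name if it exists; returns true when something was dropped.
// `coll` is expected to be a mongosh collection object such as db.hotel5m.
function dropIndexIfPresent(coll, indexName) {
  const exists = coll.getIndexes().some((index) => index.name === indexName);
  if (exists) {
    coll.dropIndex(indexName);
  }
  return exists;
}

// Only touch the database when running inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  print(dropIndexIfPresent(db.hotel5m, "star_1"));
}
```

Checking getIndexes first avoids an error when the index has already been removed.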

Step 2: Verify write performance again!!

At this stage, we have removed the star index. Let's try inserting some new data into the collection.

The execution time (millis) dropped from 16 to 8. Mind-blowing!!!

On average, over several trials, write performance improved by 20–30%. So creating a partial index might be worth the effort if we really care about write performance for this particular collection.

However, removing this index comes with a much bigger drawback: queries that search or sort by star rating become very slow. Refer to the screenshot below.

Search for Hotel in Indonesia

We searched for 5-star hotels in Indonesia. There are 20546 hotels in Indonesia, but only 4264 of them are 5-star hotels, which is only about 21% of the documents examined. Imagine doing $100 worth of work but getting paid only about $20 at the end of the day. Isn't that frustrating?

Besides, the query execution time is 55ms. You might think 55ms is reasonable, but trust me, you can make this better.
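
The ratio can be worked out from the execution statistics. The sketch below uses the two counts quoted above and the field names nReturned and totalDocsExamined from MongoDB's explain output; docsEfficiency is my own helper name.

```javascript
// Fraction of examined documents that were actually returned (1 is ideal).
function docsEfficiency(stats) {
  return stats.nReturned / stats.totalDocsExamined;
}

// Counts taken from the search for 5-star hotels in Indonesia.
const withoutStarIndex = { nReturned: 4264, totalDocsExamined: 20546 };

console.log((docsEfficiency(withoutStarIndex) * 100).toFixed(1) + "%"); // 20.8%
```

In mongosh, the same statistics should be available from db.hotel5m.find({ country: "Indonesia", star: 5 }).explain("executionStats").executionStats.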

Let's create a partial compound index in the next step and see whether things get better!

Step 3: Create a partial compound index for hotels with 3 stars and above

Creating a partial index is very simple in this scenario. You just have to specify the condition a document must meet to be indexed. All of this can be done with a single command.

db.hotel5m.createIndex({
  country: 1,  
  star: 1
}, { partialFilterExpression: {
  star: { $gte: 3 }
}});

This command tells MongoDB to create an index on the country and star fields; star: 1 means ascending order. In addition, the partialFilterExpression ensures that only documents whose star is greater than or equal to 3 get indexed. Pretty straightforward, right?
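
One thing worth keeping in mind: MongoDB only uses a partial index when the query's own condition guarantees that every match falls inside the indexed subset. The sketch below is a deliberately simplified check for this one filter (star >= 3), written in plain JavaScript. It is not MongoDB's query planner, which handles many more cases, and coveredByPartialFilter is a hypothetical helper.

```javascript
// Simplified check for this one index only (filter: star >= minStar).
// Returns true when every document matching `starPredicate` has star >= minStar.
function coveredByPartialFilter(starPredicate, minStar = 3) {
  if (starPredicate === undefined) return false; // no condition on star at all
  if (typeof starPredicate === "number") return starPredicate >= minStar; // equality match
  if (typeof starPredicate.$gte === "number") return starPredicate.$gte >= minStar;
  if (typeof starPredicate.$gt === "number") return starPredicate.$gt >= minStar;
  return false; // anything else: assume the index cannot be used
}

const queries = [
  { country: "Indonesia", star: 5 },            // inside the indexed subset
  { country: "Indonesia" },                     // could match 1 or 2 star hotels
  { country: "Indonesia", star: { $gte: 2 } },  // could match 2 star hotels
];

for (const query of queries) {
  console.log(JSON.stringify(query), coveredByPartialFilter(query.star));
}
```

So a search for 5-star hotels in Indonesia can use the partial index, while a search for all hotels in Indonesia cannot, because the index does not contain the 1 and 2 star hotels.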

After you run the command above, you will be able to see the new partial index in MongoDB Compass, as in the screenshot below.

Partial & Compound Index Created
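
The same check can be done from the shell. This is a sketch I have not run against a live server: it assumes mongosh on the property database, and listPartialIndexes is my own helper built on the standard getIndexes method, which reports a partialFilterExpression field for partial indexes.

```javascript
// Returns the indexes on `coll` that were created with a partialFilterExpression.
// `coll` is expected to be a mongosh collection object such as db.hotel5m.
function listPartialIndexes(coll) {
  return coll
    .getIndexes()
    .filter((index) => index.partialFilterExpression !== undefined);
}

// Only touch the database when running inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(listPartialIndexes(db.hotel5m));
}
```

With the default naming, the index created above should show up as country_1_star_1.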

Step 4: Verify the Write and Read Performance

We created the partial index in the previous step. Next, we should verify the performance with it in place.

On average, write performance improved compared to the initial 16ms. In the following screenshot, we even achieved 3ms. Write performance is a bit hard to assess at this stage because it fluctuates.

Using 3ms to insert data into the database

However, read performance improved significantly. Aside from the shorter query execution time, the efficiency also rose to 100%. Refer to the screenshot below.

On average, we achieved a 13ms query execution time, roughly 4 times faster than the 55ms from the previous step. Besides, the query was very efficient. The ratio between documents returned and index keys examined is 1, which is superb. The read performance is optimized.

Conclusion

After doing this experiment, I found that indexes had only a small impact on write performance here. The impact was not significant.

However, it really depends on the number of indexes you have in a collection and on how frequently you write to it. For example, if you frequently update or insert into a collection, you might be more concerned about write performance than read performance.

Besides, if your business needs allow you to create a partial index so that you don't have to index everything, then that is the best practice to follow.

That's the end of this article. I hope it gives you some understanding of partial indexes and a hint on when and how you should implement them.

Thanks for reading.
