Why We Need Indexes for Database Tables
Introduce B+Tree index without any formula and computer science theory
If you are not a DBA or a Database Developer, you may not know the mechanisms of the database index. But as long as you can write some SQL queries, you must have been heard of database indexes and know that indexes can improve the performance of the SQL queries.
In this article, I’ll try to use the simplest language and diagrams to illustrate how the B+tree index can improve the performance of your SQL queries. The reason why I use the B+tree index as the example is that
- It is used by most Relational Database Management Systems such as MySQL, SQL Server and Oracle
- It can improve the performance of most types of SQL queries, rather than specific types
How does it look like?
Let’s keep this instruction simple, here is a simplified diagram illustrating the structure of the B+tree index.

In the above B+Tree example, each rectangle stands for a block in the hard disk, while the blue filled dot stands for a pointer that links the blocks together.

Please be noted that this diagram has extremely simplified the B+Tree for demonstration purposes, as it assumes that each hard disk block can only contain 2 keys. In practice, this will be much larger.
It is important to understand how a B+tree index is constructed. We need to know that the “Leaf Nodes” level is supposed to have all the values of the field that this index was created on. In the above example, it is apparent that we have only 9 rows in this table column, and their values are from 1 to 9.
If you’re interested in how the above B+Tree was built, please refer to another article of mine: How B+Tree Indexes Are Built In A Database?
How does it work?
B+tree can help most of the database query scenarios, and this is also the reason why it is useful.
Query on Equal Testing
Let’s suppose that our SQL query is retrieving on “equal” where condition, for example:
SELECT *
FROM TABLE
WHERE ID = 3
To find the ID equals 3, the B+tree is used as follows.

- Starts from the top level of the tree, 3 is less than 5, so we need to use the pointer on the left of the number 5
- On the next level, 3 is in between 2 and 4, so we need to use the pointer in the middle
- We’ve got the block on the leaf node, 3 is in here
Query on Comparison
What if our SQL query is searching in a range? For example, here is the SQL query:
SELECT *
FROM TABLE
WHERE ID BETWEEN 3 AND 7
Here is the demonstration of the process.

- Starts from the top level of the tree, 3 is less than 5, so we need to use the pointer on the left of the number 5
- On the next level, 3 is in between 2 and 4, so we need to use the pointer in the middle
- We’ve got the block on the leaf node, 3 is in here
- Since we’re querying on the comparison, so the cursor will keep fetching in this block, so we can get the number 4
- We haven’t got to 7 yet, so the cursor will keep going to the next (right) leaf node block
- We reached the next block, so we’ve got the numbers 5 and 6. But it has not finished yet, similar mechanism as the previous step will be used to get to the next block
- We reached the next block which contains the number 7
- We have reached the upper bound of the range, so the query finished
B+Tree Characteristics
The most important characteristic of B+Tree indexes is that it consists of leaf node level and search key level of the tree.
- All the values of this indexed column appear in the leaf nodes.
- The non-leaf nodes are only used for searching purposes, so there are only pointers to the lower level. In other words, they can’t lead to the actual data entries.
- Every key in the leaf nodes have an extra pointer to the data entry, so it can lead the cursor to find/fetch the data rows.
How B+Tree Improves Performance?
As shown in the above examples, B+Tree works for both “equal” and comparison conditions.
Leaf v.s. Non-leaf Levels
It can be seen that the query only need to go through the search keys on the non-leaf nodes to find the expected values. Therefore, when the SQL query is retrieving on the column on which the B+Tree index was created, only a few levels of non-leaf nodes need to go through.
You must be thinking that the non-leaf nodes must be kind of overheads, and when there are a lot of data rows it will be slow down because there might be numerous non-leaf levels.
Partially correct. Yes, the non-leaf nodes need to be scanned to get the expected value. In fact, the number of scanned times exactly equals the number of non-leaf levels. However, in practice, the block on our hard disk will be much larger than the above examples. Typically, a table with 10 million entries can be put in a B+Tree with only 3 non-leaf levels. Even though the table is extremely big such as billions scaled, normally the number of non-leaf levels of the B+Tree is usually 4 or 5.
Hence, using B+Tree indexes can dramatically reduce the number of hard disk blocks scanned in a SQL query.
Why the Number of Blocks Scanned Matter?
I suppose the audience of this article might not have a computer science background, so I guess a simple explanation of the “block” might be necessary for a better understanding of the problem.
In our hard disk, the data is stored not always in sequence. A single file might be split and stored into different blocks. So, when we are reading a file/dataset/table, in order to scan the whole file, it will be necessary to jump between different blocks.
Typically, for a mechanical hard drive, there is a magnetic head that can only move up and down. When it is necessary to read data from a different location, the whole hard drive disk will rotate the location to where the magnetic head is so that the head can read the data.
Imagine that we are scanning 1000 blocks. The worst case is that the disk will need to be rotated 1000 times. If we’re using indexes, this number will be reduced to 4–5 times. That’s why an index can help to the performance.
Summary
In this article, I’ve shared how B+tree looks like and how it works and facilitate a SQL query, typically with equal and comparing conditions.
It turns out B+Tree is not the most advanced database index anymore, but I believe as probably the most classic index which is still pervasively used in most RDBMS, it is still the best example to demonstrate why we need indexes for a database table and how it works. Hope this is interesting enough for you.
If you feel my articles are helpful, please consider joining Medium Membership to support me and thousands of other writers! (Click the link above)