avatarYang Zhou

Summary

This article provides a comprehensive guide on implementing heap sort in Python, detailing the heap data structure, the heap sort algorithm, and its practical application using Python's heapq module.

Abstract

The article titled "A Deep Dive into Heap and Heap Sort in Python: From Beginner to Expert" offers an in-depth exploration of heap data structures and the heap sort algorithm. It introduces the concept of heaps as complete binary trees with specific properties and distinguishes between max heaps and min heaps. The author explains the heap sort algorithm's mechanism, which involves building a heap from an input list and then extracting elements to sort the list efficiently. The article also covers the representation of heaps in Python using lists, the process of heapifying a list to create a heap, and maintaining the heap property after element extraction. Additionally, the article demonstrates how to leverage Python's built-in heapq module to simplify the implementation of heap sort, emphasizing the module's efficiency and ease of use. The time complexity of heap sort is discussed, highlighting its O(n*logn) time complexity and O(1) space complexity, which is particularly advantageous for sorting large datasets.

Opinions

  • The author suggests that heap sort is a high-performance sorting algorithm suitable for large datasets due to its O(n*logn) time complexity and O(1) space complexity.
  • The article conveys the idea that understanding the heap data structure is crucial for mastering heap sort and that the heapq module in Python simplifies heap operations significantly.
  • The author implies that the heap sort algorithm is underutilized, despite its efficiency, and encourages readers to adopt it to enhance their Python sorting capabilities.
  • The use of visual aids, such as diagrams and GIFs, is advocated by the author to illustrate the heap sort process, indicating a preference for visual learning methods.
  • By providing both a manual implementation and the use of the heapq module, the author demonstrates flexibility in approaching heap sort, catering to both educational and practical programming needs.
  • The author expresses enthusiasm for the Python community and encourages readers to support writers on Medium by joining through the provided referral link.

Algorithm, Data Structure

A Deep Dive into Heap and Heap Sort in Python: From Beginner to Expert

Master the heap and heap sort and take your Python skills to the next level

Photo by Adam Bichler on Unsplash

Are you tired of slow, inefficient sorting algorithms that take forever to process large datasets?

It’s time to check out the heap sort, a high-performance sorting algorithm that can quickly and efficiently sort datasets in O(Nlg(N)) time complexity and O(1) space complexity.

In this article, we’ll take a deep dive into heap and heap sort in Python, exploring the inner workings of this powerful algorithm and learning how to implement it step by step.

Table of Contents · Introduction to Heap: A Tree-Based Data Structure · How Does Heap Sort WorkHow to represent a heap in Python?How to build a heap?How to update the heap after extracting one element? · Implementing Heap Sort in Python · Use the heapq Module of Python · Time and Space Complexity of Heap Sort

Introduction to Heap: A Tree-Based Data Structure

A heap is a complete binary tree that satisfies the heap property, which states that the value of each node in the tree is greater than or equal to the values of its children.

There are two types of heaps:

  • Max heap: In a max heap, the value of each node is greater than or equal to the values of its children, and the root node has the maximum value in the tree.
  • Min heap: In a min heap, the value of each node is less than or equal to the values of its children, and the root node has the minimum value in the tree.

How Does Heap Sort Work

Heap sort is a comparison-based sorting algorithm that uses a heap data structure to sort a list of elements. It works by building a heap from the input list, and then repeatedly extracting the root element (the maximum or minimum value) and placing it at the end of the sorted list. This process is repeated until the heap is empty and the list is fully sorted.

Simply put, there are two steps:

  1. Building a heap
  2. Extracting the elements from the heap

How to represent a heap in Python?

First and foremost, we need to know how to represent a heap properly before building it.

Given that a heap is a complete binary tree, we can just use a Python list to represent the heap.

The list representation of a tree means the real tree is in our mind. We always operate the list in the real code.

It works because of the following special relations between the list and the tree:

  • The root of the tree is always the element at index 0.
  • The left child of a node at index i is stored at index 2i+1, and the right child is stored at index 2i+2.

Therefore, we can always access the tree’s nodes easily through the index of the list.

For example, here is an input list named arr.

Its elements are [5, 2, 7, 1, 3].

It represents an original complete binary tree and we need to convert it as a heap like the following:

The original tree and the target heap
  • The root of the tree is arr[0]
  • The left child of the root is arr[2*0+1], and the right child is arr[2*0+2].

How to build a heap?

The process of building a heap from a list is named heapify.

The original list is already a representation of a binary tree (the tree is in our mind) as shown above. What we need to do is converting the original tree into a heap.

Since the max heap and the min heap share the similar idea. Let’s just talk about how to build a max heap.

The idea is to traverse the non-leaf nodes of the binary tree from bottom to top to construct a max heap, for each non-leaf node, compare it with its left and right children, and swap the largest value with the parent node of this subtree.

Why start from non-leaf nodes rather than the last node?

Because there are no child nodes below leaf nodes, there is no need to operate.

Why traverse from bottom to top rather than from top to bottom?

If we go up from the bottom, the largest value will be put to the root node for sure (similar idea as bubble sort).

How to update the heap after extracting one element?

After converting the original list into a heap, the rest is simple. We just need to repeatedly extract the root element of the heap, which is the maximum/minimum element, and put it into the end of the sorted list that needs to be returned.

However, there is one thing we need to do — update the whole heap after extracting the root node.

It needs two steps:

  1. Replace the root node with the last element in the heap: Remove the root node (which has the largest value in a max heap or the smallest value in a min heap) and replace it with the last element in the heap.
  2. Heapify the whole heap again.

Here is a visualized example of the whole heap building and sorting process:

The process of heap sort

Implementing Heap Sort in Python

Talk is cheap, let’s see the code:

The key function is the heapify and heapify_node. They are used to build a heap from the original list.

Use the heapq Module of Python

Fortunately, we don’t need to implement the fundamental functions to build a heap every time. Python has a built-in module named heapq.

The heapq module provides functionalities for working with heaps, including:

  • heappush(heap, elem): Push an element onto the heap, maintaining the heap property.
  • heappop(heap): Pop and return the smallest element from the heap, maintaining the heap property.
  • heapify(x): Transform a regular list into a heap, in-place, in linear time.
  • nlargest(k, iterable, key=None): Return a list with the k largest elements from the iterable specified and satisfy the heap property.
  • nsmallest(k, iterable, key=None): Return a list with the k smallest elements from the iterable specified and satisfy the heap property.

The heapq module is implemented as a heap queue, which is a priority queue implemented as a heap. It is a useful tool for efficiently sorting and searching lists and other iterable objects.

Now, let’s use it to do the heap sort:

import heapq

def heap_sort(arr):
    # Build a heap
    heapq.heapify(arr)

    # Extract elements from the heap one by one
    sorted_arr = []
    while arr:
        sorted_arr.append(heapq.heappop(arr))

    return sorted_arr

# Test the heap sort function
arr = [5, 21, 207, 19, 3]
print(heap_sort(arr))
# Output: [3, 5, 19, 21, 207]

Simple and elegant, isn’t it? 😎

Time and Space Complexity of Heap Sort

The time complexity of heap sort is O(n*logn), where n is the number of elements in the list. This is because the heapify function takes O(logn) time to heapify a single node, and it is called O(n) times (once for each node in the heap). The heappop function also takes O(logn) time to extract the root element from the heap and restore the heap property, and it is called O(n) times (once for each element in the list).

The space complexity of heap sort is O(1), because the sorting is done in place and no additional space is required. This is also a big advantage of heap sort. Cause it saves lots of memory usage, especially when the original datasets are large.

Thanks for reading. ❤️

Join Medium through my referral link to access millions of great articles:

Algorithms
Data Structures
Python
Programming
Data Science
Recommended from ReadMedium