avatarVincent Blanchon

Summary

The provided web content is an in-depth exploration of Go's map data structure, detailing its hash table implementation, including buckets, hashing, and load factor.

Abstract

The article "Go: Map Design by Example — Part I" is the first in a series that delves into the intricacies of Go's built-in map type. It explains that the map is implemented as a hash table, which utilizes buckets to store key/value pairs. Each bucket can hold up to 8 pairs, and when the map's capacity is reached, the number of buckets is doubled to accommodate more pairs. The article also covers how keys are hashed to determine their bucket placement and how an internal top hash table is used for efficient data access. Additionally, it discusses the concept of overflow buckets when a bucket's slots are filled, and how Go manages map growth through a process called evacuation. The load factor, set at 6.5 for Go's map, is a critical metric that determines when the map should expand to maintain performance. The article concludes by referencing a study that informed the choice of the load factor and hints at further exploration of map implementation in the subsequent article.

Opinions

  • The author suggests that readers follow the series in order, implying that the content is structured to build upon previous parts.
  • The choice of a load factor of 6.5 is presented as a balance between minimizing overflow buckets and wasting space, indicating a trade-off that was carefully considered.
  • The article implies that the performance of Go's map can degrade if too many overflow buckets are created, highlighting the importance of the map's growth strategy.
  • The inclusion of a study on load factors suggests that the Go team has conducted empirical research to optimize map performance.
  • The author's use of examples and diagrams conveys a preference for visual and practical explanations to complement the technical discussion.

Go: Map Design by Example — Part I

Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is the first of a series of three articles. Each article will cover a different part of the map. I suggest you read them in order.

The built-in type map provided by Go implements a hash table. In this article, we will explore the implementation of the different parts of this hash table: the buckets (structure that stores the key/value pairs), the hashing (index of those pairs), and the load factor (metric that defines the capacity of the map should grow).

Buckets

Go stores the key/value pairs in a list of buckets where each bucket will hold 8 pairs, and when the map is running out of capacity, the number of hash buckets doubles. Here is a diagram of the rough representation of a map with 4 buckets:

map with a list of buckets

We will see in the next article how the key/value pairs in the bucket are stored. If the map grows again, the number of buckets will double to 8, then 16, and so on.

When a pair comes into the map, it will be dispatched to a bucket according to the hash calculated from the key.

Hashing

When a key/value pair is assigned to the map, Go will generate a hash based on this key.

Let’s take an example with the insertion of key/value pair "foo" = 1. The hash generated could be 15491954468309821754. This value will be applied to a bit operation with the mask corresponding to the number of buckets minus 1. In our example with 4 buckets, it will give a mask of 3 and the following operation:

value dispatched in the buckets

The hash is not only used to dispatch the value in the bucket. An extra operation will be applied on the hash. Each bucket uses the top byte of the hash and keeps it in an internal array that will allow Go to index the key and keep track of the empty slots in the bucket. Let’s have a look at an example with the binary representation of the hash:

top hash table in bucket

Thanks to this internal table in the bucket, called top hash, Go will be able to use them during data access and compare it with the hash of the requested key. According to the usage of the map in our programs, Go needs a way to grow the map to manage more key/values pairs.

Map growth

If a bucket receives a key/value pair to be stored, it will store it inside the 8 available slots in the bucket. If none of them are available, an overflow bucket will be created and linked to the current bucket:

overflow bucket

This overflow property concludes the internal structure of the bucket. However, adding overflow buckets will deteriorate the performance of the map. To remedy this, Go will allocate new buckets (double the current number), keep a link between the old buckets and the new ones, and will incrementally evacuate the buckets. Indeed, after this new allocation, each bucket involved in a writing operation will be evacuated if it has not been done already. All the key/value pairs of the bucket to be evacuated will be re-dispatched in the new buckets, which means pairs stored together previously in a similar bucket could now be dispatched in different buckets.

Go uses its load factor to know when it should start the allocation of the new buckets and this evacuation process.

Loading factor

Go has chosen a load factor of 6.5 for its map. You can find the study related to this load factor in the code:

// Picking loadFactor: too large and we have lots of overflow
// buckets, too small and we waste a lot of space. I wrote
// a simple program to check some stats for different loads:
// (64-bit, 8 byte keys and values)
//  loadFactor    %overflow  bytes/entry     hitprobe    missprobe
//        4.00         2.13        20.77         3.00         4.00
//        4.50         4.05        17.30         3.25         4.50
//        5.00         6.85        14.77         3.50         5.00
//        5.50        10.55        12.94         3.75         5.50
//        6.00        15.27        11.67         4.00         6.00
//        6.50        20.90        10.79         4.25         6.50
//        7.00        27.14        10.15         4.50         7.00
//        7.50        34.03         9.73         4.75         7.50
//        8.00        41.10         9.40         5.00         8.00
//
// %overflow   = percentage of buckets which have an overflow bucket
// bytes/entry = overhead bytes used per key/value pair
// hitprobe    = # of entries to check when looking up a present key
// missprobe   = # of entries to check when looking up an absent key

The map will grow if the average number of values per bucket exceed 6.5. Since the distribution based on the hash of the key could not be evenly distributed, choosing a value close to 8 could result in a lot of overflow buckets, as we saw in the study.

⏭ The next article of this series, “Go: Map Design by Code”, will explain the internal implementation of the map.

Golang
Go
Internals
Recommended from ReadMedium