Yong Cui

Summary

The provided content discusses advanced features of Python generators, emphasizing their memory efficiency and lesser-known characteristics. Understanding these characteristics can help developers avoid common pitfalls and bugs.

Abstract

Python generators are a powerful tool for implementing lazy evaluation, which is particularly useful for handling large datasets without consuming excessive memory. The article delves into seven advanced features and potential pitfalls of generators, including their lack of a length, their nature as exhaustive iterators, the use of generator expressions, the ability to omit parentheses in certain cases, and the versatility of the yield from statement for chaining multiple generators. Additionally, the article explores the two-way communication capabilities of generators using the send() and throw() methods, which can be leveraged to create more dynamic and interactive generator-based code.

Opinions

  • The author assumes that readers have a basic understanding of Python generators and their common use cases before discussing more advanced features.
  • There is an emphasis on the importance of understanding the exhaustive nature of generators to prevent unexpected behavior in code, particularly when reusing generators in multiple loops.
  • The author suggests that the ability to send information back to a generator makes it a more interesting and versatile tool, implying that this feature enhances the interactivity and functionality of generators.
  • The article implies that a lack of knowledge about these advanced generator features could lead to bugs, highlighting the need for a deeper understanding of generators beyond their basic functionality.
  • The author provides practical examples and comparisons, such as the similarity between generator expressions and list comprehensions, to clarify complex concepts and prevent confusion.
  • The use of the yield from statement is presented as a convenient way to integrate multiple generators, suggesting that it can simplify code that would otherwise require more complex iteration logic.
  • The article encourages readers to explore the dis module for a deeper understanding of the underlying implementation mechanisms of generators, indicating a value for thorough technical exploration.

Python: 7 Advanced Features That You May Not Know About Generators

A deeper understanding of Python generators

Photo by Andreas Dress on Unsplash

Introduction

Generators

Generators are a great Python feature that implements the lazy evaluation paradigm. Unlike containers such as lists and sets, which hold all their elements in memory, generators produce elements one at a time, on request. Thus, they are very memory efficient and particularly useful when memory-expensive operations are involved. The following is a very common use case for generators in Python.

Suppose that we’re working with a very large file. When we open it, Python creates a file object for us, which behaves much like a generator: it is an iterator that reads the file lazily. When we iterate over the file object, we get one line of data at a time (i.e., one element at a time), which saves lots of memory compared with approaches that load all lines into memory at once, such as calling readlines().
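A minimal sketch of this pattern follows. Because the original large file isn’t available, the snippet writes a small temporary file as a stand-in; the file contents and the count_lines helper are hypothetical, chosen only to show lazy, line-by-line iteration over a file object.

```python
import os
import tempfile

def count_lines(path):
    """Hypothetical helper: count lines without loading the whole file into memory."""
    total = 0
    with open(path) as file:
        # Iterating over the file object reads one line per step.
        for _line in file:
            total += 1
    return total

# Create a small stand-in for a "very large" file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("row 1\nrow 2\nrow 3\n")
    sample_path = tmp.name

print(count_lines(sample_path))  # 3
os.remove(sample_path)
```

Only one line is held in memory per loop step, so the same function works unchanged on a file far larger than available RAM.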

Generator creation

To create a custom generator, we usually define a generator function, which is often described as a factory function. Being a factory means that calling the function makes something for us; in this case, calling a generator function makes a generator. Let’s see the following example of creating a simple generator.
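Here is a minimal sketch of such a generator function. The function name abc_generator is a hypothetical choice; abc_gen is the variable name the text uses for the created generator.

```python
def abc_generator():
    """A generator function: calling it returns a generator object."""
    yield "a"  # execution pauses here until the next element is requested
    yield "b"
    yield "c"

abc_gen = abc_generator()
print(type(abc_gen))  # <class 'generator'>

for letter in abc_gen:
    print(letter)
# a
# b
# c
```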

The magic of a generator function is the yield keyword, which produces the desired element for the caller. Importantly, the generator remembers where it paused in its sequence, so when the next element is requested, it resumes from that point and produces it. As mentioned before, calling the generator function creates a generator, which is assigned here to a variable called abc_gen. We can then use this generator in a for loop. We can also use the introspection function type() to verify that what we get from the generator function is indeed a generator object.

Another common way to create a generator is to use a generator expression, which has the following syntax: (expression for x in iterable). The syntax is very similar to that of a list comprehension, except that it uses parentheses instead of square brackets. Please don’t confuse the two techniques: if you accidentally use a list comprehension, a list holding all the elements is created, which defeats the purpose of saving memory with generators.

Let’s see a simple example of creating a generator using the generator expression.
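A small sketch contrasting the two forms, with illustrative variable names. sys.getsizeof() is used here only as a rough indicator: it reports the size of the list object itself, not of the integers it references, but it is enough to show that the list holds every result while the generator does not.

```python
import sys

squares_list = [x * x for x in range(10_000)]  # list comprehension: builds every element now
squares_gen = (x * x for x in range(10_000))   # generator expression: builds elements on demand

print(type(squares_list))  # <class 'list'>
print(type(squares_gen))   # <class 'generator'>

# The list stores all 10,000 results; the generator stores only its paused state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both produce the same values when consumed.
print(sum(squares_gen) == sum(squares_list))  # True
```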

The premise

After this brief overview, I hope you have a basic understanding of Python generators. From this point on, I’ll assume that you already understand generators well and use them where applicable, such as in the examples given above. However, there are several generator features that some of you may not know, and a lack of such knowledge may lead to unexpected bugs in your code. In this article, I’d like to highlight seven lesser-known generator features and pitfalls.

Feature/Pitfall Highlights

1. Generators don’t have a length

The examples shown above all use generators in for loops. As you know, a for loop works by stepping through an iterator that it obtains from an iterable. The most common iterables, which we learn about early on, are built-in data types such as lists and sets. Notably, these data types all have a length: we can find out how many elements they contain using the built-in len() function.

Thus, it may be tempting to apply the same reasoning when we want to find out the length of a generator. For example, after consuming some elements of a generator, we may want to know how many elements are left to be yielded. Unfortunately, generators don’t support len(), as shown below. Please note that, for simplicity, I’ll use generator expressions to create generators in the code demos unless noted otherwise.
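A minimal demonstration follows. The counting workaround at the end is an illustration added here, not something the text prescribes: the only way to learn how many items remain is to consume them, which leaves the generator empty.

```python
gen = (x for x in range(5))

try:
    len(gen)
except TypeError as exc:
    print(exc)  # object of type 'generator' has no len()

# Counting the remaining items consumes them.
next(gen)                        # consume 0
remaining = sum(1 for _ in gen)  # counts 1, 2, 3, 4
print(remaining)  # 4
print(list(gen))  # [] because counting used up the generator
```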

As shown above, we’re not able to get the length of the generator. In fact, this lack of length isn’t specific to generators; it applies to iterators in general. You may wonder how generators and iterators are related, which leads us to the next section.

2. Generators are iterators

We all know that for loops use iterators, and as shown above, they work with generators as well. So you may postulate that generators are iterators too, and you’d be right: generators are a special kind of iterator. Iterators are commonly obtained by calling the built-in iter() function on an iterable. For example, the following code shows you how we create iterators for lists and dictionaries using the iter() function.
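A short sketch with illustrative data. The type names shown are those reported by CPython. The isinstance check against collections.abc.Iterator is an addition beyond the original text, included to confirm that a generator passes the same iterator test and that iter() on a generator simply returns the generator itself.

```python
from collections.abc import Iterator

numbers = [1, 2, 3]
ages = {"Ann": 30, "Bob": 25}

numbers_iter = iter(numbers)
ages_iter = iter(ages)  # iterating over a dict yields its keys

print(type(numbers_iter))  # <class 'list_iterator'>
print(type(ages_iter))     # <class 'dict_keyiterator'>

gen = (x for x in numbers)
print(isinstance(gen, Iterator))  # True
print(iter(gen) is gen)           # True
```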

A more notable signature feature of iterators is that they work with the built-in next() function, which retrieves the next item. Because generators are iterators too, they work with next() as well. Consider the following trivial example of this feature. As you can see, calling next() on a generator retrieves one item at a time, just as it does with other iterators.
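A trivial sketch of next() behaving identically on a generator and on a list iterator (the variable names are illustrative):

```python
letters_gen = (letter for letter in "xyz")
letters_iter = iter(["x", "y", "z"])

# next() advances each iterator by exactly one item.
print(next(letters_gen), next(letters_iter))  # x x
print(next(letters_gen), next(letters_iter))  # y y
```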

3. Generators are exhaustive

To begin with, this feature isn’t unique to generators: all iterators can be exhausted. What does exhaustive mean here? Let’s see a simple example first, before I explain it.
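A minimal example of exhausting a generator. The final line, which passes a default value to next(), is an extra illustration: with a default, next() returns that value instead of raising StopIteration.

```python
gen = (x for x in range(2))

print(next(gen))  # 0
print(next(gen))  # 1

try:
    next(gen)  # no elements are left
except StopIteration:
    print("The generator is exhausted")

# With a default argument, next() returns it instead of raising.
print(next(gen, "done"))  # done
```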

As shown above, after calling next() several times, we encounter the StopIteration exception. This exception means that we have “exhausted” the elements of the generator (or, more generally, of the iterator). This is why people say that we “consume” the elements of a generator: once consumed, an element won’t be yielded again.

Notably, StopIteration is handled internally by standard operations such as for loops and comprehensions, so we never see this exception in these operations. However, not knowing the exhaustive nature of generators can be the root of some bugs in your code.

Consider the following possible scenario. As you can see, if someone uses the same generator in multiple for loops, the later loops do nothing, because the first loop has already exhausted the generator. More importantly, no exception is raised, because the for loop handles StopIteration internally.
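A sketch of this silent bug with illustrative data. Storing the results in a list, shown at the end, is one common fix when the values must be traversed more than once; re-creating the generator before each loop is another.

```python
evens = (n for n in range(10) if n % 2 == 0)

first_pass = []
for n in evens:
    first_pass.append(n)

second_pass = []
for n in evens:  # already exhausted, so this loop body never runs
    second_pass.append(n)

print(first_pass)   # [0, 2, 4, 6, 8]
print(second_pass)  # [] and no exception is raised

# One fix: keep the values in a list when they are needed more than once.
evens_list = [n for n in range(10) if n % 2 == 0]
print(sum(evens_list), max(evens_list))  # 20 8
```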

As a side note on exhaustion, infinite iterators are an exception. For example, iter(int, 1) keeps calling int(), which returns 0 every time, and would stop only when the call returned the sentinel value 1, which never happens; so it produces 0s forever. Thus, we’ll never exhaust such infinite iterators, although they’re not used very often in real-life projects.
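Because such an iterator never ends, it has to be sliced to be used safely. The sketch below uses itertools.islice() for that, and itertools.count() as a second infinite iterator; both are additions beyond the original example.

```python
from itertools import count, islice

zeros = iter(int, 1)  # calls int() repeatedly; int() returns 0, never the sentinel 1
print(list(islice(zeros, 5)))  # [0, 0, 0, 0, 0]

# itertools.count() is another infinite iterator: 10, 11, 12, ...
print(list(islice(count(10), 3)))  # [10, 11, 12]
```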

4. The parentheses may be omitted sometimes

As shown previously, we need parentheses to enclose a generator expression, and I warned against confusing it with a list comprehension, which uses square brackets. However, there is a piece of syntactic sugar: when a generator expression is the only argument in a function call, we can omit its own parentheses. Let’s consider some simple examples below.
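A few illustrative calls follow (the data is made up). The last part shows the limit of this shortcut: as soon as the call has another argument, including a keyword argument, the generator expression must be parenthesized, or Python raises a SyntaxError.

```python
words = ["generators", "are", "lazy"]

# The generator expression is the only argument, so its own parentheses are dropped.
total = sum(x * x for x in range(5))  # 0 + 1 + 4 + 9 + 16
longest = max(len(word) for word in words)
sentence = " ".join(word.upper() for word in words)

print(total)     # 30
print(longest)   # 10
print(sentence)  # GENERATORS ARE LAZY

# With more than one argument, the parentheses are required:
# sorted(x for x in range(3), reverse=True) is a SyntaxError.
print(sorted((x for x in range(3)), reverse=True))  # [2, 1, 0]
```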

In case you’re wondering how we can tell whether a generator or something else (e.g., a list) is used as the intermediate object, here’s an interesting trick to examine how the expression is interpreted. As shown in the following code snippet, we call the built-in type() function with a generator expression that has no parentheses of its own. As you can see, the interpreted object is indeed a generator. If you want a more in-depth understanding of the underlying implementation, you can use the dis module to disassemble the function call, and you’ll find that a generator is created as an intermediate step.
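A sketch of both checks. dis.dis() accepts a string of source code and compiles it before disassembling. The bytecode it prints differs between Python versions, so no expected output is given for that call; look for a nested generator-expression code object in the listing.

```python
import dis

# type() receives a single generator expression, so no extra parentheses are needed.
print(type(x * 2 for x in range(3)))  # <class 'generator'>

# Disassemble a call that passes an unparenthesized generator expression.
dis.dis("sum(x * 2 for x in range(3))")
```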

5. You can choose where to yield from

We already know that we use the yield keyword to produce an element when the caller requests one. However, combining yield with from, as in yield from, lets a generator delegate to another iterable (including another generator) and yield each of its elements in turn. It’s pretty useful when you have multiple generators to work with and want to integrate them. The following code shows you how we can mimic the behavior of the chain() function from the itertools module in the standard library.
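A minimal sketch of such a function, assuming it accepts any number of iterables through *iterables (the name custom_chain comes from the text; the inputs are illustrative):

```python
from itertools import chain

def custom_chain(*iterables):
    """Yield every element of each iterable in turn, like itertools.chain()."""
    for iterable in iterables:
        # Delegate to the current iterable until it is exhausted.
        yield from iterable

letters = "ab"
numbers = (n for n in range(3))  # a generator works as an input, too

print(list(custom_chain(letters, numbers)))  # ['a', 'b', 0, 1, 2]
print(list(chain("ab", range(3))))           # ['a', 'b', 0, 1, 2]
```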

As you can see, we supply the custom_chain function with two iterables. Using a for loop, we go through these iterables sequentially, and for each one, we use yield from to pass all of its elements on to the caller. From the output, you can tell that our custom chain function, built on yield from, does the same job as itertools.chain().

6. Send information back to the generator

So far, generators have worked as one-way traffic: they only output values when requested. Although this serves our needs in most scenarios, one-way traffic is kind of boring, isn’t it? Actually, we can send information back into a generator, turning it into more interesting two-way traffic. Let’s explore this feature with the following trivial example.
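A minimal sketch of the money-pool game described below. The function name money_pool and the handling of a None value are assumptions; only the behavior (a $100 starting pool, then bets of 20 and 50) comes from the text.

```python
def money_pool(initial=100):
    """Yield the running total; values passed in with send() are added as bets."""
    total = initial
    while True:
        bet = yield total  # send(value) resumes here with bet = value
        if bet is not None:  # a plain next() resumes with None
            total += bet

pool = money_pool()
print(next(pool))     # 100 (runs to the first yield; required before send())
print(pool.send(20))  # 120
print(pool.send(50))  # 170
```

The initial next() call matters: sending a non-None value to a generator that hasn’t started yet raises a TypeError, because there is no paused yield to receive it.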

In the above code, we define a generator function that yields the amount of money in the pool. After the initial game setup, the pool has $100, which we learn by calling next() to retrieve the starting value. As the game goes on, players bet more money, which we pass in by calling the generator’s send() method to inject the data into the generator. In this case, the bets are 20 and 50 for two rounds. As you can see, these amounts are added to the running totals that the generator produces.

7. Throw exceptions with generators

The above section reviews how we use the send() method to communicate with generators. There is another way to have this kind of two-way communication: the throw() method. Let’s see how it works with the following example, which has a realistic context.
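A minimal sketch of the sand timer described below. The exception class FlipTimer, the function name sand_timer, and its parameters are hypothetical choices; the numbers (start at 100, drop 5 per step, a flip restarts from 100) follow the text. throw() raises the exception at the paused yield and returns the next value the generator yields.

```python
class FlipTimer(Exception):
    """Hypothetical exception that signals the sand timer was flipped."""

def sand_timer(full=100, step=5):
    """Yield the remaining sand level, dropping by `step` per block of time."""
    level = full
    while level > 0:
        try:
            yield level
        except FlipTimer:
            level = full  # flipping refills the timer
        level -= step

timer = sand_timer()
print(next(timer))               # 100
print(next(timer))               # 95
print(next(timer))               # 90
print(timer.throw(FlipTimer()))  # 95 (refilled to 100, then one step elapses)
print(next(timer))               # 90
```

If the generator doesn’t catch the thrown exception, it propagates out of the throw() call and the generator is finished.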

In the above code, we mimic how a sand timer works. In the beginning, the level is 100 units, and we assume that it drops five units with every block of time elapsed. However, if we flip the timer, it starts from 100 units again. To mimic flipping, we call the throw() method on the generator, which raises an exception inside the generator at the point where it is paused. Notably, after catching and handling the exception, the generator continues to run and produces the next level (i.e., 95), which throw() returns to us.

Conclusions

In this article, we reviewed seven advanced features of generators in Python. Understanding these features will not only help you avoid bugs caused by incorrect usage of generators but also give you the opportunity to take advantage of these features to create more robust generators (e.g., with two-way interactions through sending and throwing) in your Python code.

Programming
Data Science
Python
Technology
Software Development