Dacio Romero

Fantastic Iterators and How to Make Them

Photo by John Matychuk on Unsplash

The Problem

While learning at Make School I’ve seen my peers write functions that create lists of items.

s = 'baacabcaab'
p = 'a'
def find_char(string, character):
  indices = list()
  for index, str_char in enumerate(string):
    if str_char == character:
      indices.append(index)
  return indices
print(find_char(s, p)) # [1, 2, 4, 7, 8]

This implementation works, but it poses a few problems:

  • What if we only want the first result; will we need to make an entirely new function?
  • What if all we do is loop over the result once; do we need to store every element in memory?

Iterators are the ideal solution to these problems. They function like “lazy lists”: instead of returning a list containing every value, an iterator produces and returns each element one at a time.

Iterators lazily return values, saving memory.

So let’s dive into learning about them!

Built-In Iterators

The iterators used most often are enumerate() and zip(). Both lazily return values each time next() is called on them.

range(), however, is not an iterator but a “lazy iterable.”

We can convert range() into an iterator with iter(), so we’ll do that for our examples for the sake of learning.

my_iter = iter(range(10))
print(next(my_iter)) # 0
print(next(my_iter)) # 1

Upon each call of next() we get the next value in our range; makes sense, right? If you want to convert an iterator to a list, you just pass it to the list constructor.

my_iter = iter(range(10))
print(list(my_iter)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

If we mimic this behavior we’ll start to understand more about how iterators work.

my_iter = iter(range(10))
my_list = list()
try:
  while True:
    my_list.append(next(my_iter))
except StopIteration:
  pass
print(my_list) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can see that we needed to wrap it in a try/except statement. That’s because iterators raise StopIteration when they’ve been exhausted.

So if we call next on our exhausted range iterator, we’ll get that error.

next(my_iter) # Raises: StopIteration
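To make the iterator/iterable distinction from this section concrete, here is a minimal sketch that uses only built-ins. It also uses the optional second argument of next(), a default that is returned instead of raising StopIteration. The variable names are just for illustration.

```python
# zip() (like enumerate()) returns an iterator, so next() works on it directly.
pairs = zip("ab", [1, 2])
first_pair = next(pairs)  # ('a', 1)

# range() is an iterable but not an iterator: next() rejects it with TypeError.
try:
    next(range(3))
    range_is_iterator = True
except TypeError:
    range_is_iterator = False

# A range can be looped over again and again...
numbers = range(3)
twice = [list(numbers), list(numbers)]  # [[0, 1, 2], [0, 1, 2]]

# ...while an iterator made from it is used up after a single pass.
numbers_iter = iter(numbers)
once = [list(numbers_iter), list(numbers_iter)]  # [[0, 1, 2], []]

# Passing a default to next() avoids StopIteration on an exhausted iterator.
leftover = next(numbers_iter, None)  # None
```

The default form of next() is handy when “no more values” is an expected outcome rather than an error.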

Making an Iterator

Let’s try making an iterator that behaves like range with only the stop argument, looking at three common ways to create iterators: classes, generator functions (yield), and generator expressions.

Class

The old way of creating an iterator was through an explicitly defined class. For an object to be an iterator it must implement __iter__() that returns itself and __next__() which returns the next value.

class my_range:
  def __init__(self, stop):
    self._stop = stop
    self._current = -1  # per-instance state; the first next() bumps it to 0
  def __iter__(self):
    return self
  def __next__(self):
    self._current += 1
    if self._current >= self._stop:
      raise StopIteration
    return self._current
r = my_range(10)
print(list(r)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

That wasn’t too hard, but unfortunately, we have to keep track of variables between calls of next(). Personally, I don’t like the boilerplate or changing how I think about loops because it isn’t a drop-in solution, so I prefer generators.

The main benefit of a class is that we can add additional functions that modify its internal variables, such as _stop, or that create new iterators.

Class iterators have the downside of needing boilerplate; however, they can have additional functions that modify state.
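As a rough sketch of what “additional functions that modify state” could look like, the class below adds an extend() method that pushes the stop value further out while iteration is still in progress. Both the class name AdjustableRange and the extend() method are made up for this illustration; they are not part of the original my_range.

```python
class AdjustableRange:
    """Counts from 0 up to (not including) stop; the stop can be moved later."""

    def __init__(self, stop):
        self._stop = stop
        self._current = -1

    def __iter__(self):
        return self

    def __next__(self):
        # Check before incrementing so that _current never runs past _stop.
        if self._current + 1 >= self._stop:
            raise StopIteration
        self._current += 1
        return self._current

    def extend(self, amount):
        # Hypothetical helper: move the stop value while iterating.
        self._stop += amount


r = AdjustableRange(3)
first = next(r)  # 0
r.extend(2)      # the stop is now 5
rest = list(r)   # [1, 2, 3, 4]
```

Note that extend() is called before the iterator is exhausted. The Python documentation expects an iterator to keep raising StopIteration once it has done so, so reviving a finished iterator is best avoided.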

Generators

PEP 255 introduced “simple generators” using the yield keyword.

Today, generators are iterators that are just easier to make than their class counterparts.

Generator Function

Generator functions are what was ultimately being discussed in that PEP and are my favorite type of iterator, so let’s start with that.

def my_range(stop):
  index = 0
  while index < stop:
    yield index
    index += 1
r = my_range(10)
print(list(r)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Do you see how beautiful those 4 lines of code are? It’s significantly shorter than our class implementation, to top it off!

Generator functions create iterators with less boilerplate than classes and with a normal logic flow.

Generator functions automagically “pause” execution and return the specified value with every call of next(). This means that no code is run until the first next() call.

This means the flow is like this:

  1. next() is called.
  2. Code is executed up to the next yield statement.
  3. The value on the right of yield is returned.
  4. Execution is paused.
  5. Steps 1–4 repeat for every next() call until the function body finishes.
  6. StopIteration is raised.
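The steps above can be observed directly by recording what the generator body does. This is only a small sketch: the traced() function and the events list are names invented for the demonstration.

```python
events = []

def traced():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2
    events.append("finished")

gen = traced()
after_creation = list(events)  # [] : calling the function runs none of its body

first = next(gen)              # runs up to the first yield, then pauses
after_first = list(events)     # ['started']

second = next(gen)             # resumes after the first yield, pauses at the second

try:
    next(gen)                  # runs the remaining code, then the body ends
    raised = False
except StopIteration:
    raised = True
```

Creating the generator records nothing; each next() call only advances the body to the following yield.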

Generator functions also let you use the yield from expression, which delegates future next() calls to another iterable until that iterable has been exhausted.

def yielded_range():
  yield from my_range(10)
print(list(yielded_range())) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

That wasn’t a particularly complex example. But you can even do it recursively!

def my_range_recursive(stop, current=0):
  if current >= stop:
    return
  yield current
  yield from my_range_recursive(stop, current + 1)
r = my_range_recursive(10)
print(list(r)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
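Recursive yield from tends to be most useful on data that is itself nested. As an illustration, here is a sketch of a flatten() generator (a name chosen for this example, not something from the standard library) that walks arbitrarily nested lists.

```python
def flatten(items):
    """Yield the non-list elements of a nested list, depth first."""
    for item in items:
        if isinstance(item, list):
            # Delegate to a nested generator for the sub-list.
            yield from flatten(item)
        else:
            yield item


nested = [1, [2, [3, 4]], 5]
flat = list(flatten(nested))  # [1, 2, 3, 4, 5]
```

Keep in mind that every level of recursion adds another nested generator, so very deep recursion, such as my_range_recursive with a large stop, can run into Python’s recursion limit.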

Generator Expression

Generator expressions let us create iterators as one-liners and are a good fit when the iterator doesn’t need any additional functions of its own. Unfortunately, we can’t build another my_range from scratch with an expression, because an expression needs an existing iterable to draw from, but we can work on iterables such as our last my_range function.

my_doubled_range_10 = (x * 2 for x in my_range(10))
print(list(my_doubled_range_10)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The cool thing about this is that it does the following:

  1. The list asks my_doubled_range_10 for its next value.
  2. my_doubled_range_10 asks my_range for its next value.
  3. my_doubled_range_10 returns my_range’s value multiplied by 2.
  4. The list appends the value to itself.
  5. Steps 1–4 repeat until my_doubled_range_10 raises StopIteration, which happens when my_range does.
  6. The list is returned containing each value returned by my_doubled_range_10.

We can even do filtering using generator expressions!

my_even_range_10 = (x for x in my_range(10) if x % 2 == 0)
print(list(my_even_range_10)) # [0, 2, 4, 6, 8]

This is very similar to the previous example, except my_even_range_10 only returns values that match the given condition, so only the even values in the range [0, 10).

Throughout all of this, we only create a list because we told it to.
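To check that values really are pulled through one at a time, the sketch below uses a variant of my_range that records each value it produces in a log list. The logging is added purely for this demonstration.

```python
log = []

def my_range(stop):
    index = 0
    while index < stop:
        log.append(f"produce {index}")  # record when a value is actually made
        yield index
        index += 1

doubled = (x * 2 for x in my_range(3))
log_before = list(log)       # [] : building the expression produces nothing

first = next(doubled)        # 0
log_after_first = list(log)  # ['produce 0'] : only one value has been made

rest = list(doubled)         # [2, 4]
```

Nothing is produced until something asks for it, and each request pulls exactly one value through the chain.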

The Benefit

Figure source: https://nvie.com/posts/iterators-vs-generators/

Generators are iterators, iterators are iterables, and iterators lazily return values. This means we can create objects that only give us values when we ask for them, and only as many as we like.

It also means we can pass generators straight into functions that reduce an iterable to a single value, such as sum().

print(sum(my_range(10))) # 45

Calculating the sum in this way avoids creating a list when all we’re doing is adding the values together and then discarding them.
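One rough way to see the saving is sys.getsizeof(), which reports the size of the container object itself (not of the integers it refers to). The exact numbers depend on the Python version and platform, so this sketch only compares them; the size n is an arbitrary choice.

```python
import sys

n = 100_000
as_list = [x for x in range(n)]  # every element is stored up front
as_gen = (x for x in range(n))   # only the paused generator state is stored

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small, and does not depend on n

total = sum(as_gen)  # values are produced and discarded one at a time
```

The generator stays the same size no matter how large n gets, while the list grows with it.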

We can rewrite the very first example to be much better using a generator function!

s = 'baacabcaab'
p = 'a'
def find_char(string, character):
  for index, str_char in enumerate(string):
    if str_char == character:
      yield index
print(list(find_char(s, p))) # [1, 2, 4, 7, 8]

At first glance there may be no obvious benefit, but let’s return to my first question: “what if we only want the first result; will we need to make an entirely new function?”

With a generator function we don’t need to rewrite as much logic.

print(next(find_char(s, p))) # 1

We could retrieve the first value of the list that our original solution returned, but this way we get only the first match and stop iterating over the string as soon as it is found. The generator is then discarded and no list is ever created, which saves memory.
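Two small variations on this idea, sketched with the same find_char generator: next() with a default covers the case where there is no match at all, and itertools.islice() takes just the first few matches without scanning the rest of the string.

```python
from itertools import islice

def find_char(string, character):
    for index, str_char in enumerate(string):
        if str_char == character:
            yield index

s = 'baacabcaab'

first = next(find_char(s, 'a'), None)    # 1
missing = next(find_char(s, 'z'), None)  # None instead of StopIteration

# Take at most three matches; iteration stops after the third one.
first_three = list(islice(find_char(s, 'a'), 3))  # [1, 2, 4]
```

In both cases the same generator function is reused, and only the way it is consumed changes.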

Conclusion

If you’re ever creating a function that accumulates values in a list like this:

def foo(bar):
  values = []
  for x in bar:
    # some logic
    values.append(x)
  return values

Consider making it return an iterator with a class, generator function, or generator expression like so:

def foo(bar):
  for x in bar:
    # some logic
    yield x
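When “some logic” is a simple transformation, the generator expression form can be shorter still. In this sketch, transform() is a hypothetical stand-in for that logic.

```python
def transform(x):
    # Hypothetical stand-in for "some logic".
    return x * x

def foo(bar):
    # Returns a lazy iterator; nothing is computed until it is consumed.
    return (transform(x) for x in bar)

squares = foo([1, 2, 3])
result = list(squares)  # [1, 4, 9]
```

Callers that really do need a list can still wrap the call in list(), so little is lost by returning an iterator.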

Resources and Sources

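PEPs

  • Generators (PEP 255): https://www.python.org/dev/peps/pep-0255/
  • Generator Expressions (PEP 289): https://www.python.org/dev/peps/pep-0289/
  • Yield From (PEP 380): https://www.python.org/dev/peps/pep-0380/

Articles and Threads

  • Iterators: https://www.programiz.com/python-programming/iterator
  • Iterable vs Iterator: https://www.geeksforgeeks.org/python-difference-iterable-iterator/
  • Generator Documentation: https://wiki.python.org/moin/Generator
  • Iterators vs Generators: https://nvie.com/posts/iterators-vs-generators/
  • Generator Expression vs Function: https://stackoverflow.com/a/1995585
  • Recursive Generators: https://stackoverflow.com/a/8991864

Definitions

  • Iterable: https://docs.python.org/3/glossary.html#term-iterable
  • Iterator: https://docs.python.org/3/glossary.html#term-iterator
  • Generator: https://docs.python.org/3/glossary.html#term-generator
  • Generator Iterator: https://docs.python.org/3/glossary.html#term-generator-iterator
  • Generator Expression: https://docs.python.org/3/glossary.html#term-generator-expression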

Originally published at https://blog.dacio.dev/2019/05/03/python-iterators-and-generators/.
