the time window, and then returned. At the same time, it managed all this without ever loading the entire file into memory, only loading one line at a time.</p><p id="e1b8">The functional version, while definitely prettier, completely throws performance considerations out of the window. Not only does it load the entire file into memory, but it traverses it multiple times — first when it loads it into memory, then twice more when it filters out the messages outside the time window, and then when it searches for the appropriate message. For the worst case scenario where the time window encompasses all the messages, and the message we’re searching for is at the end, this means the new version will take on the order of <b>4x longer than the original one</b>. And this problem is further compounded with each operation we stack up.</p><p id="675c">There are certainly things we could to do optimize the new version, but you would find out that there’s only so much that can be done. In other words, it would seem that we’re forced to decide between readable code, and performant code.</p><p id="edb2">Thankfully, you’ll soon see that’s not the case.</p><h2 id="ec6b">Traversal Precedence</h2><p id="f79c">Before we move on, it is important to understand what the fundamental difference between the two approaches above is.</p><p id="42ab">Notice that, in the imperative version, we applied <i>all</i> operations to a given element before moving on to the next element — we took an element, checked if it was within the time window, checked if it matched the user we were searching for, and then possibly transformed it. <b>For each element, this version traversed all the operations and applied them</b>.</p><p id="55ef">Contrast that with how the functional version works. The functional version takes the first operation, and applies it to all the elements. It then takes the second operation, and applies it to all elements, and so on. <b>For each operation, this version traversed all the elements and applied the operation to them</b>.</p><p id="175b">In other words, the fundamental difference between the two approaches is <b>whether traversing elements is given precedence over traversing operations, or the other way around</b>.</p><p id="0097">The contrast between these two approaches actually makes a lot of sense when you think about it. You have more than one operation, more than one element, and need to apply each operation to each element. You can represent this by making a table of operations and elements, like so:</p><figure id="bbf6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*bQlRFi5WO6oliOC4TQ3TPw.png"><figcaption></figcaption></figure><p id="2bd3">You need to traverse every cell in this table, and you have only two (reasonable) choices — either you go from <i>left to right</i> (give precedence to traversing elements), or from <i>top to bottom</i> (give precedence to traversing operations).</p><figure id="c29f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*uZL44R3iOL46dBZxKogbHA.png"><figcaption>Element traversal has precedence (used by functional version)</figcaption></figure><figure id="0ff0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*XgZY4QnvDNrrb6Aayx201g.png"><figcaption>Operation traversal has precedence (used by imperative version)</figcaption></figure><p id="393b">Our first, imperative, implementation of <code>getFirstMessageOf</code> above gave precedence to traversing operations, while our second, functional, implementation gave precedence to traversing elements.</p><h2 id="97a6">Choosing a Style</h2><p id="c5a1">When using an imperative code style, we can easily choose which of the above we want. Let’s illustrate this on a simple task:</p><p id="e4c5"><i>Take a list of integers, filter out the even ones, and divide them by two.</i></p>
<figure id="49e8">
<div>
<div>
<img class="ratio" src="http://placehold.it/16x9">
<iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fpl.kotl.in%2FRKESH8rvb&display_name=Kotlin+Playground&url=https%3A%2F%2Fpl.kotl.in%2FRKESH8rvb&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=kotl" allowfullscreen="" frameborder="0" height="300" width="800">
</div>
</div>
</figure></iframe></div></div></figure><p id="819d">However, when we want to use a functional approach, the methods we have
Options
at our disposal (defined on <code>Iterable</code>) are all of the element-traversal kind. It is, therefore, natural to ask — is there an implementation which gives precedence to traversing operations?</p><p id="ef6c">The answer is yes — that’s exactly what <code>Sequence</code> is.</p><h2 id="4ba5">Iterable vs. Sequence</h2><figure id="d697"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*KvKWa8OIvEAcGexP.png"><figcaption><a href="https://typealias.com/guides/inside-kotlin-sequences/#sequences-iterables-and-iterators">Sequences, Iterables, Iterators on Typealias.com</a></figcaption></figure><p id="eb21">You already know <code>Iterable</code> well — it sits at the base of all the collections we’ve talked so far, such as <code>List</code>, <code>Set</code>, <code>Map</code> etc.</p><div id="ea74"><pre><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title class_">Iterable</span><<span class="hljs-type">out T</span>> {
<span class="hljs-comment">/**
* Returns an iterator over the elements of this object.
/</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">iterator</span><span class="hljs-params">()</span></span>: Iterator<T>
}</pre></div><p id="c735">Operations defined on <code>Iterable</code> give precedence to traversing <i>elements</i>, i.e. the first approach illustrated above.</p><p id="ead0">At first look, it seems that <code>Sequence</code> is identical to <code>Iterable</code>, and this is often a source of confusion. Why define the same interface twice?</p><div id="b9be"><pre><span class="hljs-keyword">public</span> <span class="hljs-keyword">interface</span> <span class="hljs-title class_">Sequence</span><<span class="hljs-type">out T</span>> {
<span class="hljs-comment">/*
* Returns an [Iterator] that returns the values from
* the sequence.
*
* Throws an exception if the sequence is constrained
* to be iterated once and iterator is invoked the second
* time.
*/</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">iterator</span><span class="hljs-params">()</span></span>: Iterator<T>
}</pre></div><p id="5315">The answer is that, while the structure of the interface is identical, the <i>contract</i> that it represents is different — operations defined on <code>Sequence</code> give precedence to traversing <i>operations</i>, i.e. the second approach illustrated above.</p><p id="e64b">Let’s put what we learned to good use, and rewrite the example in the introduction to use <code>Sequence</code>:</p>
<figure id="4877">
<div>
<div>
<img class="ratio" src="http://placehold.it/16x9">
<iframe class="" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fpl.kotl.in%2F1cTmq0bu-&display_name=Kotlin+Playground&url=https%3A%2F%2Fpl.kotl.in%2F1cTmq0bu-&key=a19fcc184b9711e1b4764040d3dc5c07&type=text%2Fhtml&schema=kotl" allowfullscreen="" frameborder="0" height="300" width="800">
</div>
</div>
</figure></iframe></div></div></figure><p id="e69f">Notice that the code is <i>identical </i>to the code we wrote when we first translated the imperative version to the functional one, except for the call to <code>File.useLines</code>, which passes a <code>Sequence</code> of all the lines to its lambda parameter, and then closes the file. In other words, we are able to keep exactly the same benefits in readability, while recovering the effectiveness of the imperative version.</p><p id="3463">Once again, Kotlin allows us to have our cake, and eat it too.</p><p id="8e74">Go back to <a href="https://readmedium.com/collection-operations-grouping-5c6312c3dd79">Collection Operations: Grouping</a>, jump to the <a href="https://readmedium.com/table-of-contents-c52573cfa291">Table of Contents</a>, or continue to <a href="https://readmedium.com/sequences-how-they-work-how-to-use-them-c5f17282ef40">Sequences: How They Work & How To Use Them</a>.</p><figure id="8ecd"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*biBSB579iezsNvEQ_NMLBg.png"><figcaption></figcaption></figure></article></body>
Sequences: Why We Need Them
A practical example demonstrating how collections based on Iterable fall short, a novel explanation of why this happens using traversal precedence, and showing how Sequence naturally arises from this view.
— — — — — — — — — — — — — — —
THE CURRENT VERSION OF THIS ARTICLE IS PUBLISHED HERE.
This article is part of the Kotlin Primer, an opinionated guide to the Kotlin language, which is indented to help facilitate Kotlin adoption inside Java-centric organizations. It was originally written as an organizational learning resource for Etnetera a.s. and I would like to express my sincere gratitude for their support.
The Sequence interface, and it’s inheritors, form a completely separate collection hierarchy from the one we’ve been talking about until now (List, Set etc.), which is based on Iterable. In this article, we’ll give an example that demonstrates how what we’ve learned so far can fall short, and proceed to take a look at the fundamental reason why that happens, which also turns out to be the fundamental difference between Sequence and “normal” collections.
Let’s go!
Problem Statement
Imagine you’re reading from a file that contains interactions in a chat room. The format of each line is [<timestamp>] <username>: <message>, as follows:
You want to find the first message that was sent by a given user in a given time window.
Here’s how you might implement it using an imperative style:
Ugh, that’s horrible. We just spent multiple chapters talking about all the awesome things in the standard library, and we’re not using any of them!
Let’s give it another go:
Much better. Right? Well…no. This “functional” version really reads much better, but there’s a big problem.
The imperative version only did the minimum number of operations necessary — it traversed the lines until it either found the one it was looking for, or moved beyond the time window, and then returned. At the same time, it managed all this without ever loading the entire file into memory, only loading one line at a time.
The functional version, while definitely prettier, completely throws performance considerations out of the window. Not only does it load the entire file into memory, but it traverses it multiple times — first when it loads it into memory, then twice more when it filters out the messages outside the time window, and then when it searches for the appropriate message. For the worst case scenario where the time window encompasses all the messages, and the message we’re searching for is at the end, this means the new version will take on the order of 4x longer than the original one. And this problem is further compounded with each operation we stack up.
There are certainly things we could to do optimize the new version, but you would find out that there’s only so much that can be done. In other words, it would seem that we’re forced to decide between readable code, and performant code.
Thankfully, you’ll soon see that’s not the case.
Traversal Precedence
Before we move on, it is important to understand what the fundamental difference between the two approaches above is.
Notice that, in the imperative version, we applied all operations to a given element before moving on to the next element — we took an element, checked if it was within the time window, checked if it matched the user we were searching for, and then possibly transformed it. For each element, this version traversed all the operations and applied them.
Contrast that with how the functional version works. The functional version takes the first operation, and applies it to all the elements. It then takes the second operation, and applies it to all elements, and so on. For each operation, this version traversed all the elements and applied the operation to them.
In other words, the fundamental difference between the two approaches is whether traversing elements is given precedence over traversing operations, or the other way around.
The contrast between these two approaches actually makes a lot of sense when you think about it. You have more than one operation, more than one element, and need to apply each operation to each element. You can represent this by making a table of operations and elements, like so:
You need to traverse every cell in this table, and you have only two (reasonable) choices — either you go from left to right (give precedence to traversing elements), or from top to bottom (give precedence to traversing operations).
Element traversal has precedence (used by functional version)Operation traversal has precedence (used by imperative version)
Our first, imperative, implementation of getFirstMessageOf above gave precedence to traversing operations, while our second, functional, implementation gave precedence to traversing elements.
Choosing a Style
When using an imperative code style, we can easily choose which of the above we want. Let’s illustrate this on a simple task:
Take a list of integers, filter out the even ones, and divide them by two.
However, when we want to use a functional approach, the methods we have at our disposal (defined on Iterable) are all of the element-traversal kind. It is, therefore, natural to ask — is there an implementation which gives precedence to traversing operations?
The answer is yes — that’s exactly what Sequence is.
You already know Iterable well — it sits at the base of all the collections we’ve talked so far, such as List, Set, Map etc.
publicinterfaceIterable<out T> {
/**
* Returns an iterator over the elements of this object.
*/publicoperatorfuniterator(): Iterator<T>
}
Operations defined on Iterable give precedence to traversing elements, i.e. the first approach illustrated above.
At first look, it seems that Sequence is identical to Iterable, and this is often a source of confusion. Why define the same interface twice?
publicinterfaceSequence<out T> {
/**
* Returns an [Iterator] that returns the values from
* the sequence.
*
* Throws an exception if the sequence is constrained
* to be iterated once and `iterator` is invoked the second
* time.
*/publicoperatorfuniterator(): Iterator<T>
}
The answer is that, while the structure of the interface is identical, the contract that it represents is different — operations defined on Sequence give precedence to traversing operations, i.e. the second approach illustrated above.
Let’s put what we learned to good use, and rewrite the example in the introduction to use Sequence:
Notice that the code is identical to the code we wrote when we first translated the imperative version to the functional one, except for the call to File.useLines, which passes a Sequence of all the lines to its lambda parameter, and then closes the file. In other words, we are able to keep exactly the same benefits in readability, while recovering the effectiveness of the imperative version.
Once again, Kotlin allows us to have our cake, and eat it too.