The article discusses the performance differences between Iterable and Sequence in Kotlin, providing guidelines on when to use each, and compares Kotlin Sequence with Java Streams.
Abstract
The article "Sequences: Iterable vs. Sequence vs. Java Stream" provides an in-depth analysis of the performance characteristics of Kotlin's Iterable and Sequence types, emphasizing the importance of benchmarks in determining their efficiency in real-world scenarios. It suggests that Sequence is generally more performant for large collections and complex operation pipelines due to its lazy evaluation and avoidance of intermediate list creation. The article also contrasts Kotlin Sequence with Java Streams, highlighting Kotlin's advantages such as null safety and a richer API through extension functions, while noting the absence of a parallel() method in Sequence. The Kotlin team's decision to implement Sequence is rationalized by its ability to leverage Kotlin-specific features, and the article provides conversion functions for interoperability between Kotlin Sequence and Java Streams.
Opinions
The author emphasizes the importance of benchmarks in assessing the performance of Iterable and Sequence and cautions against making assumptions without empirical data.
Sequence is recommended for large collections and complex operations due to its efficiency in avoiding unnecessary intermediate collections and processing.
The article suggests that Sequence can enhance code readability by allowing a chain of simpler calls without the performance penalty of multiple iterations.
The author points out that Sequence is the only choice for working with infinite data streams, which cannot be represented with Iterable.
Kotlin Sequence is considered superior to Java Streams due to Kotlin's language features like null safety and extension functions, despite the lack of a parallel() method, which is seen as a feature that should be used with caution.
The author provides utility functions to convert between Kotlin Sequence and Java Streams, acknowledging the need for interoperability between the two.
The article encourages the use of Kotlin Flows as a safer and more powerful alternative to Java Stream's parallel() method for concurrent processing.
Sequences: Iterable vs. Sequence vs. Java Stream
A discussion on performance of Sequence vs. Iterable, when to use which, and the differences between Sequence and Java Streams.
— — — — — — — — — — — — — — —
THE CURRENT VERSION OF THIS ARTICLE IS PUBLISHED HERE.
This article is part of the Kotlin Primer, an opinionated guide to the Kotlin language, which is intended to help facilitate Kotlin adoption inside Java-centric organizations. It was originally written as an organizational learning resource for Etnetera a.s. and I would like to express my sincere gratitude for their support.
One of the main differences between Iterable and Sequence, and most often the reason one is used over the other, is performance. However, before we get into this subject, let me just remind you that there is only one correct way to determine which of the two performs better in a given situation: benchmarks. As you will see, there are multiple factors affecting the performance of Iterable and Sequence, and which of them is better really depends on the actual, real-world conditions in which they are used.
That being said, there are certain guidelines that you can keep in mind when trying to estimate which choice is more appropriate when you’re first writing the code (i.e. before you have the ability to benchmark), and all of them can be deduced from looking at the imperative equivalents of what Iterable and Sequence do, as we did at the beginning of this series.
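For reference, the imperative equivalents can be sketched roughly as follows. This is a reconstruction under assumptions, not the listing from earlier in the series: the concrete pipeline (keep even numbers, multiply by 10, take at most `limit` results) is chosen purely for illustration, and only the names likeIterable and likeSequence come from the text.

```kotlin
// Illustrative pipeline (an assumption): filter even numbers, multiply by 10, take `limit`.

// Roughly what input.filter { ... }.map { ... }.take(limit) does.
fun likeIterable(input: List<Int>, limit: Int): List<Int> {
    require(limit >= 0) { "limit must be non-negative" }

    // filter: one full pass, one intermediate list
    val filtered = mutableListOf<Int>()
    for (element in input) {
        if (element % 2 == 0) filtered.add(element)
    }

    // map: another full pass, another intermediate list
    val mapped = mutableListOf<Int>()
    for (element in filtered) {
        mapped.add(element * 10)
    }

    // take: only now is the limit applied
    val result = mutableListOf<Int>()
    for (element in mapped) {
        if (result.size == limit) break
        result.add(element)
    }
    return result
}

// Roughly what input.asSequence().filter { ... }.map { ... }.take(limit).toList() does.
fun likeSequence(input: List<Int>, limit: Int): List<Int> {
    require(limit >= 0) { "limit must be non-negative" }

    val result = mutableListOf<Int>()
    if (limit == 0) return result

    // A single pass: each element goes through the whole pipeline before the next one is read.
    for (element in input) {
        if (element % 2 != 0) continue          // filter
        result.add(element * 10)                // map
        if (result.size == limit) break         // take: stop as soon as we have enough
    }
    return result
}

fun main() {
    val input = (1..10).toList()
    println(likeIterable(input, 2)) // [20, 40]
    println(likeSequence(input, 2)) // [20, 40]
}
```

Both functions return the same result; they differ only in how much work they do to get there.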
Right off the bat, we notice several large differences that affect performance.
The likeIterable version iterates the entire collection for each operation and creates an intermediate List instance for each one. Even worse, it inserts the elements one by one. The default backing implementation of MutableList is ArrayList, which stores its elements in an array of fixed capacity, so once that capacity is exceeded, ArrayList must internally allocate a larger array and copy all existing elements over before it can insert the new one. A lot of processing power is therefore wasted on copying, and the memory (and therefore GC) footprint can be large.
In contrast, the likeSequence version only iterates the entire collection once, and does not create any intermediate List instances. In the presence of a positive limit, the difference is even more pronounced, because the likeSequence version stops iterating the moment it has found the required number of elements.
So, does that mean that we should always use Sequence? No, it does not. For one thing, keep in mind that the above is not actually the code that gets executed when using Iterable or Sequence. In reality, we know that applying an intermediate (i.e. non-terminal) operation to a Sequence creates a new Sequence that wraps the previous one, and those objects are not trivial. For smaller collections, it can be much faster to just create a few intermediate lists containing 10 elements each than to build up a hierarchy of nested objects that recursively call each other once a terminal operation is applied. That’s why it’s important to always benchmark whatever you write, and not take anything for granted.
Therefore, the general guideline is that the larger the number of operations and the larger the collection of elements we want to apply them to, the larger the likelihood that Sequence will be more performant than Iterable. Whenever you’re dealing with a collection that’s large enough that fitting it all into memory starts to be a concern, Sequences should be your go-to solution.
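The difference in the amount of work can be made visible by counting lambda invocations. The following is a minimal sketch, not a benchmark; the function names and the pipeline (double each number, keep multiples of 3, take 2) are made up for illustration.

```kotlin
// Counts how many times the map and filter lambdas run for an eager (Iterable) pipeline.
fun countEagerCalls(numbers: List<Int>): Int {
    var calls = 0
    numbers
        .map { calls++; it * 2 }
        .filter { calls++; it % 3 == 0 }
        .take(2)
    return calls
}

// Counts how many times the same lambdas run for a lazy (Sequence) pipeline.
fun countLazyCalls(numbers: List<Int>): Int {
    var calls = 0
    numbers.asSequence()
        .map { calls++; it * 2 }
        .filter { calls++; it % 3 == 0 }
        .take(2)
        .toList() // terminal operation: nothing runs before this point
    return calls
}

fun main() {
    val numbers = (1..1_000).toList()
    println(countEagerCalls(numbers)) // 2000: every element is mapped, then every element is filtered
    println(countLazyCalls(numbers))  // 12: processing stops after the 6th element yields the 2nd match
}
```

Fewer lambda calls do not automatically mean faster code, since each Sequence step carries its own overhead, so this only illustrates the mechanism.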
Another benefit of the likeSequence method is that a single complicated call to a given intermediate operation can be broken up into a chain of simpler calls, but without having to traverse the collection each time.
Therefore, Sequence can, in theory, be used to enhance readability while limiting the impact on performance.
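As an illustration, here is one complicated operation next to an equivalent chain of simpler ones. The function names and the task (normalizing lines of a config-like text) are hypothetical and only serve to show the shape of the refactoring.

```kotlin
// One operation that does everything at once.
fun normalizeInOneCall(lines: List<String>): List<String> =
    lines.asSequence()
        .mapNotNull { line ->
            val trimmed = line.trim()
            if (trimmed.isNotEmpty() && !trimmed.startsWith("#")) trimmed.lowercase() else null
        }
        .toList()

// The same logic as a chain of small steps. Because this is a Sequence,
// the input is still traversed only once and no intermediate lists are created.
fun normalizeInSmallSteps(lines: List<String>): List<String> =
    lines.asSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .filter { !it.startsWith("#") }
        .map { it.lowercase() }
        .toList()

fun main() {
    val lines = listOf("  Foo ", "# comment", "", "BAR")
    println(normalizeInOneCall(lines))    // [foo, bar]
    println(normalizeInSmallSteps(lines)) // [foo, bar]
}
```

With Iterable, the second version would traverse the data four times and allocate three intermediate lists before producing the result.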
Sequences can also be preferred in situations where the pipeline of operations is not specified in a single place, but rather built up by various different services/methods.
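Such a pipeline might look like the sketch below. The service names match the ones used in this section, but their methods and behavior are assumptions made for illustration, and disk access is simulated with a list of file names and a counter so the example stays self-contained.

```kotlin
// Hypothetical service: simulates reading files from disk and counts how many were read.
class DiskAccessService(private val fileNames: List<String>) {
    var filesRead = 0
        private set

    // Lazy: nothing is "read" until a terminal operation pulls elements through.
    fun readFiles(): Sequence<String> =
        fileNames.asSequence().map { name ->
            filesRead++
            "contents of $name"
        }
}

// Hypothetical service: adds a censoring step to whatever pipeline it is given.
class CensorService {
    fun censor(contents: Sequence<String>): Sequence<String> =
        contents.map { it.replace("secret", "******") }
}

// Hypothetical service: only here do we learn that just a few elements are needed.
class BusinessService(
    private val disk: DiskAccessService,
    private val censor: CensorService
) {
    fun firstReports(count: Int): List<String> =
        censor.censor(disk.readFiles())
            .filter { "report" in it }
            .take(count)
            .toList()
}

fun main() {
    val disk = DiskAccessService(
        listOf("report-1.txt", "secret-notes.txt", "report-2.txt", "report-3.txt", "log.txt")
    )
    val business = BusinessService(disk, CensorService())

    println(business.firstReports(2)) // [contents of report-1.txt, contents of report-2.txt]
    println(disk.filesRead)           // 3: the remaining two files were never touched
}
```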
If we did the above using Iterable, we’d probably have processed a large portion of the files on disk in DiskAccessService and CensorService, only to throw almost all of them away in BusinessService.
Finally, there are certain situations where you simply can’t use Iterable, notably when you want to work with infinite sequences of elements. Because every intermediate operation on an Iterable eagerly processes all elements, there is no practical way to model this using Iterable, so your choice is between Sequence and coding in an imperative style. Of those two, always choose the former if you can.
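A minimal sketch of an infinite sequence, using the standard library's generateSequence; the powersOfTwo function is just an illustrative example.

```kotlin
// An infinite sequence: each element is computed from the previous one, on demand.
fun powersOfTwo(): Sequence<Long> = generateSequence(1L) { it * 2 }

fun main() {
    // Only as many elements are computed as the terminal operation needs.
    println(powersOfTwo().take(5).toList())     // [1, 2, 4, 8, 16]
    println(powersOfTwo().first { it > 1000 })  // 1024
}
```

Calling a terminal operation that needs every element, such as toList() without a preceding take(), would never finish on a sequence like this.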
There are some other things that you can take into consideration, which you can read about here and here.
Kotlin Sequences vs. Java Streams
You might have already noticed that Kotlin Sequences are very similar to Java Streams, and may be wondering why the Kotlin team decided to reimplement something that was already present in Java.
The reason is that designing Sequences for Kotlin first allows them to take advantage of all the features available in Kotlin that are not available in Java. An obvious one is null safety — when using Stream, all types in all lambdas will be platform types. Another is extension functions, which allow Sequences to have a (much) richer, and arguably simpler, API. You can read about further advantages here.
It should be mentioned that one thing Streams have that Sequences don’t is the parallel() method. However, this is something you’re actually discouraged from using, and the same thing can be achieved in a safer and much more powerful manner using Kotlin Flows.
In any case, if you ever need to move between these two worlds, the following functions are at your disposal:
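The sketch below uses the asSequence() and asStream() extension functions from the kotlin.streams package of the Kotlin standard library (JVM only). The wrapper functions around them are hypothetical and exist only to demonstrate a round trip in each direction.

```kotlin
import java.util.stream.Collectors
import java.util.stream.Stream
import kotlin.streams.asSequence
import kotlin.streams.asStream

// Java Stream -> Kotlin Sequence, then continue with Kotlin's Sequence API.
fun streamToSequenceExample(): List<String> =
    Stream.of("a", "b", "c")
        .asSequence()
        .map { it.uppercase() }
        .toList()

// Kotlin Sequence -> Java Stream, then continue with Java's Stream API.
fun sequenceToStreamExample(): List<Int> =
    sequenceOf(1, 2, 3)
        .asStream()
        .map { it * 2 }
        .collect(Collectors.toList())

fun main() {
    println(streamToSequenceExample()) // [A, B, C]
    println(sequenceToStreamExample()) // [2, 4, 6]
}
```

Keep in mind that a Java Stream can be consumed only once, so a Sequence obtained from asSequence() should also be iterated only once.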