Uğur Taş

Java Stream API Best Practices

Java 8 introduced the Java Stream API to provide a functional programming style for handling collections of data. Streams allow you to declaratively describe what you want to do with data without worrying about how it gets done. The Stream API makes many common data operations easy and concise. There are some best practices to follow when using streams for maximum efficiency and readability.

Embrace Declarative Style

Streams encourage a declarative approach, focusing on what you want to achieve rather than how. Instead of iterating through elements and manipulating state, you express your desired transformations on the data. This leads to cleaner, more readable code.

// Imperative loop
List<String> longNames = new ArrayList<>();
for (String name : names) {
  if (name.length() > 10) {
    longNames.add(name);
  }
}
// Declarative stream (same result; named differently so both snippets compile together)
List<String> longNamesFromStream = names.stream()
                                        .filter(name -> name.length() > 10)
                                        .collect(Collectors.toList());

Handle Exceptions Properly

Lambdas passed to stream operations cannot throw checked exceptions, and an unchecked exception thrown inside a pipeline aborts the whole terminal operation. Catch checked exceptions inside the lambda and either recover or rethrow them as an unchecked exception that names the failing element.

List<String> paths = Arrays.asList("/path/to/file1", "/path/to/file2");
// getCanonicalFile() throws the checked IOException, so handle it inside the lambda
List<File> files = paths.stream()
                        .map(path -> {
                            try {
                                return new File(path).getCanonicalFile();
                            } catch (IOException e) {
                                throw new UncheckedIOException("Error processing file: " + path, e);
                            }
                        })
                        .collect(Collectors.toList());
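
When several pipelines need the same try-catch, the wrapping can be moved into a small adapter. The following is a minimal sketch, not a standard API: the `IOFunction` interface, the `unchecked` helper, and `parseLength` (a stand-in for real I/O) are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class UncheckedDemo {

    /** Hypothetical helper interface: like Function, but allowed to throw IOException. */
    @FunctionalInterface
    interface IOFunction<T, R> {
        R apply(T input) throws IOException;
    }

    /** Adapts an IOFunction to a plain Function, rethrowing IOException unchecked. */
    static <T, R> Function<T, R> unchecked(IOFunction<T, R> fn) {
        return input -> {
            try {
                return fn.apply(input);
            } catch (IOException e) {
                throw new UncheckedIOException("Failed on input: " + input, e);
            }
        };
    }

    /** Stand-in for real I/O: rejects blank input with a checked exception. */
    static int parseLength(String text) throws IOException {
        if (text.trim().isEmpty()) {
            throw new IOException("blank input");
        }
        return text.trim().length();
    }

    /** The pipeline stays readable because the try-catch lives in the adapter. */
    static List<Integer> lengths(List<String> inputs) {
        return inputs.stream()
                     .map(unchecked(UncheckedDemo::parseLength))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("java", " stream "))); // prints [4, 6]
    }
}
```

The first failing element still aborts the whole pipeline. If partial results are wanted instead, the lambda has to recover, for example by returning an Optional.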

Follow Best Practices for Nulls

Dealing with null values requires some care when using streams:

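  • Remove nulls early with filter(Objects::nonNull), which behaves the same in sequential and parallel streams.
  • Both findFirst() and findAny() throw NullPointerException when the selected element is null, so filter nulls out first and then supply a fallback with orElse(default).
  • Use flatMap with Stream.ofNullable() (Java 9+) or Optional.stream() to drop absent values; a mapper that returns a null stream is treated as returning an empty one.
  • Know which collectors accept nulls: Collectors.toList() tolerates null elements, while Collectors.toMap() rejects null values and Collectors.toUnmodifiableList() rejects null elements.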

Adhering to these best practices avoids common pitfalls when nulls are present.
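
As a rough illustration of null-safe pipelines, the sketch below filters nulls with `Objects::nonNull`, falls back to a default after `findFirst()`, and uses `Stream.ofNullable` (which assumes Java 9 or later). The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NullSafeStreams {

    /** Drops null elements before any operation that would dereference them. */
    static List<String> upperCaseNonNull(List<String> input) {
        return input.stream()
                    .filter(Objects::nonNull)
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    /** Returns the first non-null element, or the fallback when there is none. */
    static String firstOrDefault(List<String> input, String fallback) {
        return input.stream()
                    .filter(Objects::nonNull) // findFirst() would throw on a null element
                    .findFirst()
                    .orElse(fallback);
    }

    /** Stream.ofNullable (Java 9+) turns a null element into an empty stream. */
    static long countNonNull(List<String> input) {
        return input.stream()
                    .flatMap(Stream::ofNullable)
                    .count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ada", null, "linus");
        System.out.println(upperCaseNonNull(names));                            // prints [ADA, LINUS]
        System.out.println(firstOrDefault(Arrays.asList(null, null), "nobody")); // prints nobody
        System.out.println(countNonNull(names));                                // prints 2
    }
}
```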

Chain Operations, Not Loops

Streams are all about chaining intermediate operations like filter, map, and distinct. Think of them as building blocks in a pipeline, transforming the data step-by-step. This promotes code clarity and makes complex logic easy to express.

// Find the average length of words longer than 5 characters
double avgLength = names.stream()
                       .filter(name -> name.length() > 5)
                       .mapToInt(String::length)
                       .average()
                       .orElse(0.0); // getAsDouble() would throw if no name is longer than 5 characters

Limit Filtering and Mapping to Start of Pipeline

Apply filter operations as early in the pipeline as the logic allows, ahead of map steps. This avoids performing unnecessary mapping work on elements that are discarded anyway.

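// Don't do this - every string is upper-cased, then most are discarded
list.stream()
  .map(x -> x.toUpperCase())
  .filter(s -> s.startsWith("A"))
  .forEach(System.out::println);
// Do this instead - filter first, ignoring case so the same elements match
list.stream()
  .filter(s -> s.regionMatches(true, 0, "A", 0, 1))
  .map(String::toUpperCase)
  .forEach(System.out::println);

Here, by filtering before mapping, we avoid upper-casing strings we will later discard. Reordering is only safe when the predicate accepts the same elements before and after the mapping, which is why the second version compares the first letter case-insensitively.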
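
To make the saving visible, the sketch below counts how often the mapping function runs in each ordering. The counter is instrumentation for the demonstration only, and the sample input is assumed to be already capitalized so that both orderings keep the same elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class FilterFirstDemo {

    /** Counts mapper invocations when map() runs before filter(). */
    static int mapCallsWhenMappingFirst(List<String> input) {
        AtomicInteger calls = new AtomicInteger();
        input.stream()
             .map(s -> { calls.incrementAndGet(); return s.toUpperCase(); })
             .filter(s -> s.startsWith("A"))
             .collect(Collectors.toList());
        return calls.get();
    }

    /** Counts mapper invocations when filter() runs before map(). */
    static int mapCallsWhenFilteringFirst(List<String> input) {
        AtomicInteger calls = new AtomicInteger();
        input.stream()
             .filter(s -> s.startsWith("A"))
             .map(s -> { calls.incrementAndGet(); return s.toUpperCase(); })
             .collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Anna", "Carl");
        System.out.println(mapCallsWhenMappingFirst(names));   // prints 4
        System.out.println(mapCallsWhenFilteringFirst(names)); // prints 2
    }
}
```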

Do Not Reuse Stream Instances

Java designed streams for one-time use: once a terminal operation has run, calling another operation on the same instance throws IllegalStateException. Creating a stream from a collection is cheap, so obtain a fresh stream from the source for each pipeline instead of holding on to a Stream variable.

List<String> words = Arrays.asList("hello", "world", "java", "stream");
// Each pipeline gets its own stream from the same source
long count = words.stream()
                  .filter(word -> word.length() > 4)
                  .count();
// A second pipeline needs a new stream; reusing the first would throw IllegalStateException
List<String> longWords = words.stream()
                              .filter(word -> word.length() > 5)
                              .collect(Collectors.toList());

Prefer Method References and Lambda Expressions

The Java Stream API supports both lambda expressions and method references for defining operations on stream elements. Both are valid, but when a lambda only forwards its argument to an existing method, a method reference is usually shorter and easier to read.

List<String> words = Arrays.asList("hello", "world", "java", "stream");
// Using lambda expression
words.stream()
     .forEach(word -> System.out.println(word));
// Using method reference
words.stream()
     .forEach(System.out::println);

Prefer Collection Library Methods

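The Stream API overlaps with existing collection methods such as Collection.removeIf(), List.replaceAll(), and Iterable.forEach(). When the goal is to modify a mutable collection in place, prefer these methods over acquiring and consuming a stream.

// Don't do this - unnecessary stream when the goal is to update the list itself
list = list.stream().map(x -> x.toUpperCase()).collect(Collectors.toList());
// Do this instead - updates the existing (mutable) list in place
list.replaceAll(String::toUpperCase);

Collection methods are simpler and avoid the one-use restriction of streams. They mutate the collection, so keep the stream version when the original list must stay unchanged or is immutable.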
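
A short sketch of the in-place collection methods working together. The `cleanUp` method is a hypothetical example; it copies its argument first because `replaceAll`, `removeIf`, and `sort` all require a mutable list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CollectionMethodsDemo {

    /** Trims, removes empty entries, and sorts without creating a stream. */
    static List<String> cleanUp(List<String> names) {
        List<String> copy = new ArrayList<>(names); // work on a mutable copy
        copy.replaceAll(String::trim);              // instead of stream().map(...)
        copy.removeIf(String::isEmpty);             // instead of stream().filter(...)
        copy.sort(Comparator.naturalOrder());       // instead of stream().sorted()
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" bob", "", "alice ");
        System.out.println(cleanUp(raw)); // prints [alice, bob]
        raw.forEach(System.out::println); // Iterable.forEach, no stream needed
    }
}
```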

Use Streams for One-Off Operations, not Reusable Logic

The Stream API is designed for one-off operations on data. Each stream can only be consumed once, so running the same pipeline again means acquiring a new stream from the source each time.

Therefore, it's best to treat a stream as ephemeral and to keep reusable logic in ordinary methods that create the stream they need.

// Don't do this - pipelines inlined wherever they are needed
public void printNamesAndAges(List<Person> people) {
  people.stream().map(p -> p.getName()).forEach(System.out::println);
  people.stream().map(p -> p.getAge()).forEach(System.out::println);
}
// Do this instead - extract each pipeline into a named, reusable method
public void printNamesAndAges(List<Person> people) {
  printNames(people);
  printAges(people);
}
private void printNames(List<Person> people) {
  people.stream().map(Person::getName).forEach(System.out::println);
}
private void printAges(List<Person> people) {
  people.stream().map(Person::getAge).forEach(System.out::println);
}

By extracting reusable logic into separate methods, each pipeline gets a name and can be called from anywhere. Every call still creates its own short-lived stream, which is cheap.
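
The single-use rule is easy to verify. In the sketch below, a second terminal operation on the same stream instance fails with IllegalStateException, while a `Supplier<Stream<String>>` hands out a fresh stream for every pipeline. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class SingleUseDemo {

    /** Returns true if a second terminal operation on the same stream fails. */
    static boolean secondTraversalFails(List<String> words) {
        Stream<String> stream = words.stream();
        stream.count();         // first terminal operation consumes the stream
        try {
            stream.count();     // same instance again
            return false;
        } catch (IllegalStateException e) {
            return true;        // "stream has already been operated upon or closed"
        }
    }

    /** A Supplier describes the pipeline once and creates a new stream per use. */
    static long[] totalAndDistinctLongWords(List<String> words) {
        Supplier<Stream<String>> longWords =
                () -> words.stream().filter(w -> w.length() > 4);
        long total = longWords.get().count();
        long distinct = longWords.get().distinct().count();
        return new long[] {total, distinct};
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "java", "stream", "hello");
        System.out.println(secondTraversalFails(words));                       // prints true
        System.out.println(Arrays.toString(totalAndDistinctLongWords(words))); // prints [4, 3]
    }
}
```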

Avoid Stateful Operations

Stateful operations like sorted() and distinct() can be useful, but they may have to see the entire input before they can emit a result. They buffer elements and can be expensive, especially in parallel streams, where they may require several passes over the data. Passing a comparator does not change this: sorted(Comparator.naturalOrder()) is just as stateful as sorted(). Prefer stateless operations such as filter and map where they suffice, and shrink the stream before any stateful step. For instance:

List<String> words = Arrays.asList("hello", "world", "java", "stream");
// Costly - sorts every word, then discards some of them
List<String> sortedThenFiltered = words.stream()
                                       .sorted() // Stateful operation
                                       .filter(word -> word.length() > 4)
                                       .collect(Collectors.toList());
// Better - the stateless filter runs first, so sorted() buffers fewer elements
List<String> filteredThenSorted = words.stream()
                                       .filter(word -> word.length() > 4)
                                       .sorted() // Still stateful, but on less data
                                       .collect(Collectors.toList());
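
The buffering behavior of sorted() can be observed directly. This sketch records when each element reaches and leaves sorted() in a sequential stream; the peek calls and the shared event list are instrumentation for the demonstration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SortedBarrierDemo {

    /** Records when each element enters and leaves sorted() in a sequential stream. */
    static List<String> trace(List<Integer> numbers) {
        List<String> events = new ArrayList<>();
        numbers.stream()
               .peek(n -> events.add("in:" + n))   // instrumentation only
               .sorted()
               .peek(n -> events.add("out:" + n))  // instrumentation only
               .collect(Collectors.toList());
        return events;
    }

    public static void main(String[] args) {
        // Every element enters sorted() before the first one leaves it
        System.out.println(trace(Arrays.asList(3, 1, 2)));
        // prints [in:3, in:1, in:2, out:1, out:2, out:3]
    }
}
```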

Avoid Side Effects in Functional Operations

Stream operations should ideally be free of side effects to ensure predictability and maintainability of code. Intermediate stream operations like map and filter should remain side effect free. Where a side effect is unavoidable, such as printing, keep it in a terminal operation like forEach, and build results with collect or reduce rather than by mutating outside state.

// Don't do this - side effect in map()
list.stream()
  .map(x -> {
    System.out.println("Mapping " + x);
    return x.toUpperCase();
  })
  .forEach(System.out::println);
// Do this instead - side effect only in forEach()  
list.stream()
  .map(String::toUpperCase)
  .forEach(x -> {
    System.out.println("ForEach " + x); 
  });

Isolating side effects avoids inadvertently depending on the order of operations. Avoid modifying external state or performing I/O operations within stream functions. For instance:

List<String> words = Arrays.asList("hello", "world", "java", "stream");
// Side effect: Modifying external state
AtomicInteger count = new AtomicInteger(0);
List<String> result = words.stream()
                          .peek(word -> count.incrementAndGet())
                          .collect(Collectors.toList());
System.out.println("Total words processed: " + count.get()); // Avoid such side effects
// Instead, use a separate counting mechanism
long wordCount = words.stream()
                     .count();
System.out.println("Total words processed: " + wordCount); // Preferable approach
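
A common variant of the same problem is filling an external list from forEach. The sketch below contrasts it with letting a collector build the result; the method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SideEffectFreeCollect {

    /** Fragile: mutates a shared ArrayList, which breaks if the stream becomes parallel. */
    static List<Integer> lengthsWithSideEffect(List<String> words) {
        List<Integer> lengths = new ArrayList<>();
        words.stream()
             .map(String::length)
             .forEach(lengths::add); // side effect on external state
        return lengths;
    }

    /** Preferred: the collector owns the result container, sequential or parallel. */
    static List<Integer> lengths(List<String> words) {
        return words.parallelStream()
                    .map(String::length)
                    .collect(Collectors.toList()); // keeps encounter order
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "java", "stream");
        System.out.println(lengths(words)); // prints [5, 5, 4, 6]
    }
}
```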

Use Parallel Streams for CPU Intensive Tasks

Java Stream API offers support for parallel execution through the stream().parallel() or parallelStream() method. For intensive computational tasks, this approach can provide performance benefits by leveraging multiple CPU cores. Use parallel streams when:

  • Processing large amounts of data
  • Applying CPU intensive mappings like complex calculations
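  • Functions passed to terminal operations like reduce are associative and stateless, and forEach actions do not depend on encounter order

// Process large list using parallel stream; forEach prints in no guaranteed order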
list.parallelStream()
  .map(x -> doExpensiveCalculation(x))
  .forEach(System.out::println);

However, parallel streams incur overhead for splitting the work and merging the results, so it is crucial to profile and test them to confirm an actual performance gain. They pay off mainly for CPU-bound work on large data sets.

You can use the example below to test this yourself. Replace 1_000_000 with 100_000_000 to see how the timings change. A list of that many boxed Integer objects needs a large heap, and single-run timings like these are only a rough indication.

List<Integer> numbers = IntStream.rangeClosed(1, 1_000_000)
                                .boxed()
                                .collect(Collectors.toList());
// Sequential stream
long startTime = System.currentTimeMillis();
long count = numbers.stream()
                    .filter(n -> n % 2 == 0)
                    .count();
long endTime = System.currentTimeMillis();
System.out.println("Sequential Stream: " + count + ", Time: " + (endTime - startTime) + " ms");
// Parallel stream
startTime = System.currentTimeMillis();
count = numbers.parallelStream()
               .filter(n -> n % 2 == 0)
               .count();
endTime = System.currentTimeMillis();
System.out.println("Parallel Stream: " + count + ", Time: " + (endTime - startTime) + " ms");
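
Whatever the timings show, a parallel result is only trustworthy when the reduction is associative. The following sketch uses addition, which is associative, so the sequential and parallel runs agree; the class and method names are made up for the example.

```java
import java.util.stream.LongStream;

final class ParallelReduceDemo {

    /** Sums the squares of 1..upTo, optionally in parallel. */
    static long sumOfSquares(int upTo, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(n -> n * n).sum();
    }

    /** Explicit reduce: 0 is the identity and Long::sum is associative. */
    static long sumWithReduce(int upTo) {
        return LongStream.rangeClosed(1, upTo)
                         .parallel()
                         .reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, false)); // prints 333833500
        System.out.println(sumOfSquares(1000, true));  // prints 333833500
        System.out.println(sumWithReduce(100));        // prints 5050
    }
}
```

A non-associative function such as subtraction in the reduce call could produce different results from run to run in parallel, because the order in which partial results are combined is not fixed.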

Take Advantage of Infinite Streams

Infinite streams are streams that don't end on their own, such as:

  • Stream of random numbers
  • Stream of values produced by Stream.generate() or Stream.iterate()
  • Stream of events from a sensor

Infinite streams provide a constant source of input:

new Random().ints()
  .filter(x -> x > 0)
  .limit(10)
  .forEach(System.out::println);

This prints 10 random positive numbers. Because the source never ends, the pipeline needs a short-circuiting operation such as limit() to terminate; in exchange, the source itself needs no end condition.
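
As a further sketch, `Stream.iterate` produces an unbounded sequence that can be cut off either by count with `limit()` or by condition with `takeWhile()` (the latter assumes Java 9 or later). The method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InfiniteStreamDemo {

    /** First n powers of two from an unbounded source, bounded by limit(). */
    static List<Long> powersOfTwo(int n) {
        return Stream.iterate(1L, x -> x * 2)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    /** Powers of two below a bound, using takeWhile (Java 9+) as the stopping rule. */
    static List<Long> powersOfTwoBelow(long bound) {
        return Stream.iterate(1L, x -> x * 2)
                     .takeWhile(x -> x < bound)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(5));       // prints [1, 2, 4, 8, 16]
        System.out.println(powersOfTwoBelow(20)); // prints [1, 2, 4, 8, 16]
    }
}
```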

Leverage Lazy Evaluation

One of the key advantages of Java streams is lazy evaluation: intermediate operations only describe the pipeline, and elements are processed when a terminal operation runs, and only as far as that operation needs. This can significantly improve performance, especially for large datasets. Take advantage of it with short-circuiting terminal operations such as anyMatch and findFirst, which stop as soon as the answer is known.

// Check if there are any names starting with "A"
boolean exists = names.stream()
                     .anyMatch(name -> name.startsWith("A"));
// No need to process further if none exists
if (!exists) {
  return;
}

Here is another example:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
// Filter even numbers and then double them
List<Integer> result = numbers.stream()
                              .filter(n -> n % 2 == 0)
                              .map(n -> n * 2)
                              .collect(Collectors.toList());

Here, filtering and mapping operations are performed only when the terminal operation collect is invoked.
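
Laziness can be checked with a counter. In this sketch, the filter never runs while the pipeline has no terminal operation, and findFirst() stops testing elements after the first match. The counter is instrumentation for the demonstration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class LazyEvaluationDemo {

    /** Builds a pipeline without a terminal operation and reports filter calls. */
    static int callsWithoutTerminalOperation(List<Integer> numbers) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = numbers.stream()
                .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; });
        return calls.get(); // nothing has been processed yet
    }

    /** findFirst() stops at the first match, so later elements are never tested. */
    static int callsUntilFirstEven(List<Integer> numbers) {
        AtomicInteger calls = new AtomicInteger();
        numbers.stream()
               .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; })
               .findFirst();
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(callsWithoutTerminalOperation(numbers)); // prints 0
        System.out.println(callsUntilFirstEven(numbers));           // prints 2
    }
}
```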

Explore Advanced Techniques

Streams offer more than just basic operations. Discover advanced techniques like flatMap and the groupingBy and partitioningBy collectors for powerful data manipulation.

// Find all unique words across all sentences
List<String> uniqueWords = sentences.stream()
                                   .flatMap(sentence -> Arrays.stream(sentence.split(" ")))
                                   .distinct()
                                   .collect(Collectors.toList());
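
The grouping collectors can be sketched in the same style. The example below groups words by length, partitions them by a length threshold, and counts per group into a sorted map; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupingDemo {

    /** Groups words by their length. */
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                    .collect(Collectors.groupingBy(String::length));
    }

    /** Splits words into those with at least minLength characters (true) and the rest (false). */
    static Map<Boolean, List<String>> partitionByLength(List<String> words, int minLength) {
        return words.stream()
                    .collect(Collectors.partitioningBy(w -> w.length() >= minLength));
    }

    /** Counts words per length, with keys sorted thanks to the TreeMap factory. */
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                    .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "java", "stream");
        System.out.println(byLength(words).get(5));              // prints [hello, world]
        System.out.println(partitionByLength(words, 5).get(false)); // prints [java]
        System.out.println(countByLength(words));                // prints {4=1, 5=2, 6=1}
    }
}
```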

Leverage Common Stream Operations

The Stream API provides many pre-defined operations and collectors to handle common data processing patterns:

Mapping: map and flatMap transform each element into something else.

Filtering: filter omits elements that don't match a predicate.

Slicing: limit and skip select a subset of elements.

Matching: anyMatch, allMatch, and noneMatch check predicate matches.

Finding: findFirst and findAny return an optional matching element.

Reducing: reduce and collect aggregate stream elements.

Sorting: sorted returns a sorted stream.

Looping: forEach and forEachOrdered iterate over elements.

These common patterns are optimized and avoid the need to hand-code implementations.
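
A few of these operations combined, as a minimal sketch: slicing with skip and limit to implement paging, matching with noneMatch, and reducing with reduce. The `page`, `allNonEmpty`, and `totalLength` methods are hypothetical helpers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CommonOpsDemo {

    /** Slicing: returns the zero-based page of the given size. */
    static List<String> page(List<String> items, int pageIndex, int pageSize) {
        return items.stream()
                    .skip((long) pageIndex * pageSize)
                    .limit(pageSize)
                    .collect(Collectors.toList());
    }

    /** Matching: true when no element is the empty string. */
    static boolean allNonEmpty(List<String> items) {
        return items.stream().noneMatch(String::isEmpty);
    }

    /** Reducing: adds up the lengths of all elements. */
    static int totalLength(List<String> items) {
        return items.stream()
                    .map(String::length)
                    .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "bb", "ccc", "dddd", "eeeee");
        System.out.println(page(items, 1, 2));  // prints [ccc, dddd]
        System.out.println(allNonEmpty(items)); // prints true
        System.out.println(totalLength(items)); // prints 15
    }
}
```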

Beware Stream Performance Gotchas

While streams provide a concise way to express operations on data, certain patterns can lead to poor performance:

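Boxed streams: A Stream<Integer> boxes every value; primitive streams like IntStream avoid that cost.

Reusing streams: As noted earlier, a stream cannot be reused; each new pipeline requires a new stream from the source.

Large intermediates: Certain operations like distinct() and sorted() can create large intermediate collections.

Improper short-circuiting: Apply limit() before stateful steps like sorted() when the logic allows. In a sequential stream, lazy evaluation already ensures that a map() followed by limit(n) maps only n elements, but limit() on an ordered parallel stream can be expensive.

No short-circuiting: Terminal operations like forEach and collect consume every element that reaches them. Bound the stream with limit(), or use short-circuiting operations like anyMatch and findFirst, when only part of the data is needed.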

Profiling and testing are key to identify and resolve any stream performance issues.

Follow Best Practices for Primitive Streams

The primitive stream types like IntStream and LongStream avoid boxing costs of the generalized Stream<T>. But they have some distinct characteristics to keep in mind:

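  • Use mapToInt(), mapToLong(), or mapToDouble() to move from a Stream<T> to a primitive stream, and boxed() or mapToObj() to go back.
  • sum() and count() return primitives, while min(), max(), and average() return an OptionalInt, OptionalLong, or OptionalDouble because the stream may be empty.
  • Primitive streams can be parallelized with parallel(), just like object streams.
  • Use flatMapToInt() and its siblings to flatten a Stream<T> into a primitive stream; within a primitive stream, use flatMap().

int sum = numbers.stream()
  .filter(x -> x > 0)
  .mapToInt(Integer::intValue)
  .sum();

Adhering to primitive stream best practices avoids unnecessary boxing and the performance pitfalls that come with it.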
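
A short sketch of these points in one place: a primitive sum, an OptionalInt result from max(), a parallel IntStream, and boxing back to a list. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PrimitiveStreamDemo {

    /** sum() returns a plain int; an empty stream sums to 0. */
    static int sumOfPositives(List<Integer> numbers) {
        return numbers.stream()
                      .mapToInt(Integer::intValue)
                      .filter(x -> x > 0)
                      .sum();
    }

    /** max() returns OptionalInt because there may be no elements. */
    static OptionalInt max(int[] values) {
        return IntStream.of(values).max();
    }

    /** Primitive streams support parallel() like any other stream. */
    static long parallelEvenCount(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                        .parallel()
                        .filter(n -> n % 2 == 0)
                        .count();
    }

    /** boxed() converts back to Stream<Integer> when a collection is needed. */
    static List<Integer> squares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                        .map(n -> n * n)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfPositives(Arrays.asList(-1, 2, 3))); // prints 5
        System.out.println(max(new int[0]).isPresent());             // prints false
        System.out.println(parallelEvenCount(1000));                 // prints 500
        System.out.println(squares(3));                              // prints [1, 4, 9]
    }
}
```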

Conclusion

The Java Stream API enables declarative data processing with many advantages. By following best practices like isolating side effects, filtering early, and using parallel streams judiciously, you can reap the full benefits of streams. Avoid performance pitfalls and favor collection methods when they do the job. With a sound understanding of stream capabilities and limitations, you can use them effectively in your programs.
