Sea Breeze

Summary

The web content provides an in-depth guide to mastering Java Stream, detailing its creation, intermediate operations like map and flatMap, and terminal operations, with examples illustrating its practical use and benefits over traditional coding methods.

Abstract

The article "Mastering Java Stream in One Article" is a comprehensive tutorial aimed at Java developers who wish to enhance their understanding and application of the Java Stream API. It begins by introducing the concept of Streams and their role in functional programming within Java. The author then breaks down the process of using Streams into three main stages: creating a Stream, performing intermediate operations to transform Streams, and executing terminal operations to produce a result. Intermediate operations such as map, flatMap, filter, distinct, sorted, and limit are explained with examples, emphasizing the functional style and ease of use that Streams bring to data manipulation. The article also covers the use of peek and forEach for debugging and processing elements. Terminal operations like max, min, count, findAny, findFirst, anyMatch, allMatch, and noneMatch are discussed, along with the generation of collections and concatenated strings using collect. The author further explores the efficiency of parallel Streams and cautions against the pitfalls of Stream operations, such as the laziness of Streams and the inability to reuse a Stream after a terminal operation. The article concludes with an invitation for readers to practice using Streams in their projects and to engage in discussions in the comments section.

Opinions

  • The author advocates for the use of Java Streams to write more elegant and maintainable code, suggesting that Streams abstract away the details of data processing, allowing developers to focus on business logic.
  • There is a clear preference for using functional interfaces and lambda expressions when working with Streams, which the author believes leads to cleaner and more readable code.
  • The article promotes the idea that parallel Streams can improve performance by utilizing multiple cores of a CPU, making them particularly useful for large datasets.
  • The author emphasizes the importance of understanding the characteristics of intermediate and terminal operations to avoid common mistakes, such as attempting to use a Stream after a terminal operation has been called.
  • There is an acknowledgment of the potential for misuse of the peek method, advising readers to use it primarily for debugging rather than as part of the core logic of a Stream pipeline.
  • The author encourages hands-on practice and engagement with the community to master the Stream API, indicating a belief in learning through application and collaboration.

Mastering Java Stream in One Article

In the previous article, Lambda expression in Java, I briefly mentioned the use of Stream. In this article, we will explore it in depth. First, let's take the familiar Student class as an example. Suppose there is a group of students:

// @Getter and @AllArgsConstructor are Lombok annotations; Lists comes from Guava (com.google.common.collect)
@Getter
@AllArgsConstructor
public static class Student {
    private String name;
    private Integer age;
}

List<Student> students = Lists.newArrayList(
    new Student("Bob", 18),
    new Student("Ted", 17),
    new Student("Zeka", 19));

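Now suppose we have the following requirement:

From a given list of students, return at most two students aged 18 or older, sorted by age in descending order.

In the procedural style of Java 7 and earlier, we would implement it as follows (the lambda in the sort step is used only for brevity):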

public List<Student> getTwoOldestStudents(List<Student> students) {
    List<Student> result = new ArrayList<>();
    // 1. Loop over the students and keep those who are at least 18
    for (Student student : students) {
      if (student.getAge() >= 18) {
        result.add(student);
      }
    }
    // 2. Sort the eligible students by age in descending order
    result.sort((s1, s2) -> s2.getAge() - s1.getAge());
    // 3. If there are more than 2 results, keep only the first two
    if (result.size() > 2) {
      result = result.subList(0, 2);
    }
    return result;
}

In Java 8 and later, with the help of Stream, we can write the same logic more elegantly:

public List<Student> getTwoOldestStudentsByStream(List<Student> students) {
    return students.stream()
            .filter(s -> s.getAge() >= 18)
            .sorted((s1, s2) -> s2.getAge() - s1.getAge())
            .limit(2)
            .collect(Collectors.toList());
}

Differences between the two implementations:

From a functional perspective, the procedural implementation couples collection elements, loop iteration, and conditional logic, exposing too many details. As requirements change and grow more complex, procedural code becomes difficult to understand and maintain.

The functional solution decouples implementation details from business logic. Like an SQL statement, it expresses "what to do" rather than "how to do it", allowing programmers to focus on business logic and write code that is cleaner and easier to understand and maintain.

Based on my day-to-day project experience, I have summarized the core points, commonly confused usages, and typical usage scenarios of Stream. I hope this helps you gain a more comprehensive understanding of Stream and apply it more efficiently in your projects.

First introduction to Stream

Java 8 added the Stream API, which lets users manipulate data structures such as List and other Collection types in a simpler, functional way, and can run the processing in parallel without the user having to manage threads explicitly.

In a nutshell, to perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of the following three parts:

  • Create Stream (create it from a data source, which may be an array, a collection, a generator function, an I/O channel, etc.);
  • Intermediate Operations (zero or more operations that transform a stream into another stream, such as filter(Predicate));
  • Terminal Operation (which produces a result or side effect, such as count() or forEach(Consumer)).

The following diagram illustrates these processes:

Each Stream pipeline operation type contains several API methods. Let's first list the functions of each API method.

1. Start the pipeline

This stage creates a new Stream, either directly or from an existing array, List, Set, Map, or other collection object.

In addition to Stream, which is a stream of object references, there are primitive specializations for IntStream, LongStream, and DoubleStream, all of which are referred to as "streams" and conform to the characteristics and restrictions described here.

2. Intermediate pipeline

This stage processes the Stream and returns a new Stream object. Intermediate operations can be chained one after another.

3. Terminate pipeline

As the name implies, a terminal operation ends the Stream. It may perform some final processing or return a result, as required.

Stream API detailed usage

1.Create Stream

// Stream.of, IntStream.of...
Stream<String> nameStream = Stream.of("Bob", "Ted", "Zeka");
IntStream ageStream = IntStream.of(18, 17, 19);

// stream, parallelStream
Stream<Student> studentStream = students.stream();
Stream<Student> studentParallelStream = students.parallelStream();

In most cases, we create a Stream based on an existing collection list.
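Collections are not the only possible source. The other sources mentioned earlier (arrays, Map views, generator functions, primitive ranges) can be turned into streams as well. The following is a minimal sketch that uses only JDK classes; the class name CreateStreamSketch and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class, used only for this illustration.
final class CreateStreamSketch {

    // From an array: Arrays.stream streams the array elements in order.
    static List<String> fromArray() {
        String[] names = {"Bob", "Ted", "Zeka"};
        return Arrays.stream(names).collect(Collectors.toList());
    }

    // From a Map: a Map is not a Collection, so we stream one of its views
    // (entrySet, keySet or values). LinkedHashMap keeps insertion order.
    static List<String> fromMap() {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("Bob", 18);
        ages.put("Ted", 17);
        return ages.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    // From a generator function: iterate is infinite, so limit bounds it.
    static List<Integer> fromGenerator() {
        return Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(Collectors.toList());
    }

    // From a primitive range: boxed() turns the IntStream into a Stream<Integer>.
    static List<Integer> fromRange() {
        return IntStream.rangeClosed(17, 19)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromArray());      // [Bob, Ted, Zeka]
        System.out.println(fromMap());        // [Bob=18, Ted=17]
        System.out.println(fromGenerator());  // [1, 2, 4, 8, 16]
        System.out.println(fromRange());      // [17, 18, 19]
    }
}
```

Whatever the source, the intermediate and terminal operations described in the rest of the article work the same way.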

2.Intermediate Operations

2.1 map

Both map and flatMap are used to convert existing elements to other element types. The difference is:

  • map must be one-to-one, that is, each element can only be converted into a new element
  • flatMap can be one-to-many, that is, each element can be converted into zero or more new elements

Let's look at the map method first. The current requirement is as follows:

Convert the previous list of student objects to a list of student names and output:

/**
 * Use of map: one-to-one
 *
 * @param students the students to convert
 * @return the names of the students, in the same order
 */
public List<String> objectToString(List<Student> students) {
    return students.stream()
            .map(Student::getName)
            .collect(Collectors.toList());
}

Output:

[Bob, Ted, Zeka]

As you can see, there are three Students in the input, and there will be three Student names in the output.

2.2 flatMap

Now let's expand the requirements.

The school requires every student to join a team. Suppose Bob, Ted, and Zeka join the basketball team. Alan, Anne, and Davis join the football team.

@Getter
@AllArgsConstructor
public static class Team {
    private String type;
    private List<Student> students;
}

List<Student> basketballStudents = Lists.newArrayList(
    new Student("Bob", 18),
    new Student("Ted", 17),
    new Student("Zeka", 19));

List<Student> footballStudent = Lists.newArrayList(
    new Student("Alan", 19),
    new Student("Anne", 21),
    new Student("Davis", 21));

Team basketballTeam = new Team("basketball", basketballStudents);
Team footballTeam = new Team("football", footballStudent);
// Lists comes from Guava (com.google.common.collect)
List<Team> teams = Lists.newArrayList(basketballTeam, footballTeam);

Now we need to collect the students of all teams into a single merged list. How would you implement this requirement?

We try to use the map method to achieve it, as follows:

List<List<Student>> allStudents = teams.stream()
    .map(Team::getStudents)
    .collect(Collectors.toList());

As we can see, this does not work: the returned result type is List<List<Student>>, while what we actually want is List<Student>.

Of course, we can easily solve this problem in the style of Java 7 and earlier, as follows:

List<Student> allStudents = new ArrayList<>();
for (Team team : teams) {
    for (Student student : team.getStudents()) {
      allStudents.add(student);
    }
}

But this code with two nested for loops is not elegant. This is where flatMap comes into play.

List<Student> allStudents = teams.stream()
    .flatMap(t -> t.getStudents().stream())
    .collect(Collectors.toList());

Isn't that neat? It is done in a single statement. The flatMap method accepts a function (here a lambda expression) whose return value must be a Stream. flatMap then merges all the returned streams into one new Stream, which the map method cannot do.
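To see the difference without Lombok or Guava, here is a small self-contained sketch. It models each team simply as a list of names; the class name FlatMapSketch and the empty middle team are assumptions added for illustration. The empty team shows that one input element may contribute no output elements at all.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
final class FlatMapSketch {

    // Three teams, the second one has no members yet.
    static final List<List<String>> TEAMS = Arrays.asList(
            Arrays.asList("Bob", "Ted", "Zeka"),
            Collections.<String>emptyList(),
            Arrays.asList("Alan", "Anne", "Davis"));

    // map is one-to-one: three teams in, three sizes out.
    static List<Integer> sizes() {
        return TEAMS.stream()
                .map(List::size)
                .collect(Collectors.toList());
    }

    // flatMap turns every inner list into a stream and merges the streams.
    static List<String> flatten() {
        return TEAMS.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sizes());    // [3, 0, 3]
        System.out.println(flatten());  // [Bob, Ted, Zeka, Alan, Anne, Davis]
    }
}
```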

The following diagram clearly illustrates the processing logic of flatMap:

2.3 filter, distinct, sorted, limit

These are commonly used Stream intermediate operations, and they are often used together. For specific function descriptions, you can see the previous table. Let's look directly at the requirement this time:

Regarding the list of students in all teams just now, we now need to know the ages of the second and third oldest among these students. They need to be at least 18 years old. In addition, if there are duplicates, they can only be counted as one.

List<Integer> topTwoAges = allStudents.stream()
      .map(Student::getAge)         // [18, 17, 19, 19, 21, 21]
      .filter(a -> a >= 18)         // [18, 19, 19, 21, 21]
      .distinct()                   // [18, 19, 21]
      .sorted((a1, a2) -> a2 - a1)  // [21, 19, 18]
      .skip(1)                      // [19, 18]
      .limit(2)                     // [19, 18]
      .collect(Collectors.toList());

System.out.println(topTwoAges);

Output:

[19, 18]

Note: Since there are only two elements left after the skip operation, the limit step can actually be omitted.

I believe you can clearly understand the functions of these methods without much explanation. It is highly recommended that you take the initiative to code and try it yourself!
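The descending comparators above subtract one age from the other. That is fine for ages, but subtraction can overflow for arbitrary int values, so a comparator from the JDK is often the safer habit. The sketch below repeats the same pipeline with Comparator.reverseOrder(); the class name SortedSketch and the plain list of ages are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
final class SortedSketch {

    // Ages of all students from both teams, as in the example above.
    static final List<Integer> AGES = Arrays.asList(18, 17, 19, 19, 21, 21);

    static List<Integer> secondAndThirdOldest(List<Integer> ages) {
        return ages.stream()
                .filter(a -> a >= 18)               // drop minors
                .distinct()                         // count duplicates once
                .sorted(Comparator.reverseOrder())  // descending, no subtraction
                .skip(1)                            // skip the oldest age
                .limit(2)                           // keep at most two ages
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(secondAndThirdOldest(AGES)); // [19, 18]
    }
}
```

When sorting objects instead of plain numbers, Comparator.comparing(Student::getAge).reversed() plays the same role.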

2.4 peek, forEach

Both peek and forEach can be used to traverse elements and process them one by one, so we put them together for comparison. Note, however, that peek is an intermediate operation while forEach is a terminal operation.

As described before, an intermediate operation is only a processing step in the middle of the Stream pipeline. It does not run on its own; it is executed only when a terminal operation follows it. As a terminal operation with no return value, forEach performs its action directly.

For example, let's say "Hello, xxx" to each student on the basketball team using peek and forEach respectively.

// peek
System.out.println("------start peek------");
basketballTeam.getStudents().stream().peek(s -> System.out.println("Hello, " + s.getName()));
System.out.println("------end peek------");

System.out.println();
// forEach
System.out.println("------start foreach------");
basketballTeam.getStudents().stream().forEach(s -> System.out.println("Hello, " + s.getName()));
System.out.println("------end foreach------");

As can be seen from the output, peek is not executed when called alone, while forEach is executed directly:

------start peek------
------end peek------

------start foreach------
Hello, Bob
Hello, Ted
Hello, Zeka
------end foreach------

If you add a terminal operation after peek, it can be executed.

System.out.println("------start peek------");
basketballTeam.getStudents().stream().peek(s -> System.out.println("Hello, " + s.getName())).count();
System.out.println("------end peek------");

// output
------start peek------
Hello, Bob
Hello, Ted
Hello, Zeka
------end peek------

It is worth mentioning that peek should be used with caution for business logic. Think about it: as long as there is a terminal operation, will peek always be executed? Not necessarily! It depends on the Java version and on the terminal operation.

For example, in Java 8 the peek call above is executed normally, but in Java 17 the pipeline is optimized and the logic in peek is not executed (count() has been allowed to skip the pipeline since Java 9). For the details, see the official JDK 17 API documentation.

The reason is that count() may compute the result directly from the source when its size is known, in which case peek never runs, and short-circuiting operations such as findFirst() stop as soon as they have a result, so peek runs only for the elements actually consumed.

As the comments in peek's source code explain, peek exists mainly to support debugging. You can use it to print each element as it flows past a certain point in the pipeline, which helps with debugging and problem analysis during development. True to its name, it is meant only for peeking at the data during execution.
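The following sketch makes this behaviour observable by recording which elements peek actually sees. It deliberately avoids count(), whose behaviour depends on the Java version, and uses a plain list of names; the class name PeekSketch is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
final class PeekSketch {

    static final List<String> NAMES = Arrays.asList("Bob", "Ted", "Zeka");

    // No terminal operation: the pipeline is only described, never run.
    static List<String> seenWithoutTerminal() {
        List<String> seen = new ArrayList<>();
        NAMES.stream().peek(seen::add);
        return seen;
    }

    // findFirst short-circuits: only the elements actually pulled reach peek.
    static List<String> seenWithFindFirst() {
        List<String> seen = new ArrayList<>();
        NAMES.stream().peek(seen::add).findFirst();
        return seen;
    }

    // collect has to visit every element, so peek sees all of them.
    static List<String> seenWithCollect() {
        List<String> seen = new ArrayList<>();
        NAMES.stream().peek(seen::add).collect(Collectors.toList());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithoutTerminal()); // []
        System.out.println(seenWithFindFirst());   // [Bob]
        System.out.println(seenWithCollect());     // [Bob, Ted, Zeka]
    }
}
```

Because the number of peek invocations depends on the terminal operation, logic that must run for every element belongs in forEach or in the collected result, not in peek.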

3.Terminal Operation

Here we divide terminal operations into two categories.

The first category returns simple results, mainly via max, min, count, findAny, findFirst, anyMatch, allMatch and similar methods. "Simple" here means that the result is a number, a boolean, or an Optional value.

The second category returns collections. In most scenarios we want a collection object such as a List, Set, or Map, which is mainly done with the collect method.

3.1 Simple result types

(1) max, min

max() and min() return the maximum/minimum element of the stream. The result is wrapped in an Optional. For the use of Optional, please refer to the previous article, Lambda combined with Optional makes Java's handling of null more elegant.

Let's look directly at an example:

What are the maximum and minimum ages on the football team?

//max
footballTeam.getStudents().stream()
    .map(Student::getAge)
    .max(Comparator.comparing(a -> a))
    .ifPresent(a -> System.out.println("The maximum age for a football team is " + a));
//min
footballTeam.getStudents().stream()
    .map(Student::getAge)
    .min(Comparator.comparing(a -> a))
    .ifPresent(a -> System.out.println("The minimum age for a football team is " + a));

Output:

The maximum age for a football team is 21
The minimum age for a football team is 19

(2) findAny, findFirst

findAny() and findFirst() are mainly used to terminate stream processing as soon as a matching element is found. For sequential streams findAny() usually returns the same element as findFirst(), and for parallel streams it can be more efficient because it does not have to respect encounter order.

Assume that the basketball team has a new student, Tom, who is 19 years old.

List<Student> basketballStudents = Lists.newArrayList(
    new Student("Bob", 18),
    new Student("Ted", 17),
    new Student("Zeka", 19),
    new Student("Tom", 19));

Take a look at the results of the following two requirements:

  1. Return the name of the first student on the basketball team whose age is 19;
  2. Return the name of any student on the basketball team who is 19 years old.

//findFirst
basketballStudents.stream()
    .filter(s -> s.getAge() == 19)
    .findFirst()
    .map(Student::getName)
    .ifPresent(name -> System.out.println("findFirst: " + name));
//findAny
basketballStudents.stream()
    .filter(s -> s.getAge() == 19)
    .findAny()
    .map(Student::getName)
    .ifPresent(name -> System.out.println("findAny: " + name));

Output:

findFirst: Zeka
findAny: Zeka

As we can see, the two methods return the same result here on a sequential Stream, although findAny() does not guarantee this. The differences in parallel processing will be introduced later.

(3) count

Since a new student has been added to the basketball team, how many students are there in the basketball team now?

System.out.println("The number of students on the basketball team: " + basketballStudents.stream().count());

Output:

The number of students on the basketball team: 4

(4) anyMatch, allMatch, noneMatch

As the names suggest, these three methods check whether the elements satisfy a condition and return a boolean value. Look at the following three examples:

  1. Is there a student named Alan on the football team?
  2. Are all students on the football team under 22 years old?
  3. Are there no students on the football team who are over 20 years old?

// anyMatch
System.out.println("anymatch: "
                   + footballStudent.stream().anyMatch(s -> s.getName().equals("Alan")));
// allMatch
System.out.println("allmatch: "
                   + footballStudent.stream().allMatch(s -> s.getAge() < 22));
// noneMatch
System.out.println("noneMatch: "
                   + footballStudent.stream().noneMatch(s -> s.getAge() > 20));

Output:

anymatch: true
allmatch: true
noneMatch: false

3.2 Result collection types

(1) Generate a collection

Generating collections should be considered the most commonly used scenario of collect. In addition to the List mentioned before, Sets, Maps, etc. can also be produced, as follows:

//Get the latest student age distribution for the basketball team, no duplicates allowed
Set<Integer> ageSet = basketballStudents.stream().map(Student::getAge).collect(Collectors.toSet());
System.out.println("set: " + ageSet);
//Get a map of the names and ages of all students on the basketball team
Map<String, Integer> nameAndAgeMap = basketballStudents.stream().collect(Collectors.toMap(Student::getName, Student::getAge));
System.out.println("map: " + nameAndAgeMap);

Output:

set: [17, 18, 19]
map: {Ted=17, Tom=19, Bob=18, Zeka=19}
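One thing to watch out for with Collectors.toMap is duplicate keys: the two-argument form throws an IllegalStateException when two elements map to the same key. The sketch below keys the same four students by age instead of name to show this, and then shows two common ways around it: a merge function and groupingBy. The class CollectSketch and its nested Student class are simplified stand-ins written without Lombok for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
final class CollectSketch {

    // Simplified stand-in for the article's Student class.
    static final class Student {
        private final String name;
        private final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static final List<Student> STUDENTS = Arrays.asList(
            new Student("Bob", 18), new Student("Ted", 17),
            new Student("Zeka", 19), new Student("Tom", 19));

    // Two students are 19, so the two-argument toMap rejects the duplicate key.
    static boolean duplicateKeyFails() {
        try {
            STUDENTS.stream().collect(Collectors.toMap(Student::getAge, Student::getName));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A merge function decides which value wins; here the first one is kept.
    // TreeMap keeps the keys sorted so the printed result is stable.
    static Map<Integer, String> oneNamePerAge() {
        return STUDENTS.stream().collect(Collectors.toMap(
                Student::getAge, Student::getName, (first, second) -> first, TreeMap::new));
    }

    // groupingBy keeps every value by collecting them into a list per key.
    static Map<Integer, List<String>> namesByAge() {
        return STUDENTS.stream().collect(Collectors.groupingBy(
                Student::getAge, TreeMap::new,
                Collectors.mapping(Student::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeyFails()); // true
        System.out.println(oneNamePerAge());     // {17=Ted, 18=Bob, 19=Zeka}
        System.out.println(namesByAge());        // {17=[Ted], 18=[Bob], 19=[Zeka, Tom]}
    }
}
```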

(2) Generate concatenated strings

In addition to generating collections, collect can also be used to concatenate strings.

For example, after we get the names of all students on the basketball team, we want to join all the names with "," and return a single String.

System.out.println(basketballStudents.stream().map(Student::getName).collect(Collectors.joining(",")));

Output:

Bob,Ted,Zeka,Tom

You may ask: can't String.join() do this? Why use a stream at all? The charm of Stream is that it can be combined with other business logic, making the code more natural and coherent. If all you need is plain string concatenation, there is really no need to use Stream. After all, there is no need to kill a chicken with a big knife!

In addition, Collectors.joining() also supports defining prefixes and suffixes, which is more powerful.

System.out.println(basketballStudents.stream().map(Student::getName).collect(Collectors.joining(",", "(",")")));

Output:

(Bob,Ted,Zeka,Tom)

(3) Generate statistical results

Another scenario, perhaps less common in practice, is using collect to compute numeric statistics such as an average or a summary. Let's take a brief look at it.

//Calculate average
System.out.println("average age: "
                + basketballStudents.stream().map(Student::getAge).collect(Collectors.averagingInt(a -> a)));
//Summary statistics
IntSummaryStatistics summary = basketballStudents.stream()
    .map(Student::getAge)
    .collect(Collectors.summarizingInt(a -> a));
System.out.println("summary: " + summary);

In the above example, collect performs some mathematical operations on the ages. With the four basketball students, the results are as follows:

average age: 18.25
summary: IntSummaryStatistics{count=4, sum=73, min=17, average=18.250000, max=19}

Parallel Stream

Mechanism description

Parallel streams can make use of multiple CPU cores and may speed up processing, especially for large data sets. A parallel Stream splits the whole stream into multiple segments, runs the processing logic on each segment in parallel, and finally combines the results of all segments into the overall result.

As shown in the diagram below, filter for numbers greater than or equal to 18:
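In code, the same filter might look like the sketch below. The class name ParallelFilterSketch and the sample ages are assumptions made for illustration. Note that collecting into a List preserves the encounter order of the source even when the segments are processed in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only for this illustration.
final class ParallelFilterSketch {

    static final List<Integer> AGES = Arrays.asList(18, 17, 19, 19, 21, 21);

    // The list is split into segments, each segment is filtered on its own,
    // and the partial results are combined in encounter order.
    static List<Integer> adults(List<Integer> ages) {
        return ages.parallelStream()
                .filter(a -> a >= 18)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(adults(AGES)); // [18, 19, 19, 21, 21]
    }
}
```

For a list this small the parallel version is unlikely to be faster than the sequential one, since splitting and combining have their own cost. The lambdas passed to a parallel stream should also avoid modifying shared state.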

Use findAny() efficiently

As mentioned before, findAny() can be more efficient in a parallel Stream, and the API documentation notes that the result may differ from one execution to the next.

Let's execute findAny() 10 times using parallelStream to find any student who is at least 18 years old; Bob, Zeka, and Tom all satisfy the condition.

for (int i = 0; i < 10; i++) {
    basketballStudents.parallelStream()
        .filter(s -> s.getAge() >= 18)
        .findAny()
        .map(Student::getName)
        .ifPresent(name -> System.out.println("findAny in parallel stream: " + name));
}

Output:

findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Tom
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Bob
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka

This output confirms that the result of findAny() is not deterministic.

For more information about Parallel Stream, I will further analyze and discuss it in a later article.

Additional information

1. Delayed execution

Streams are lazy; computation on the source data is only performed when a terminal operation is initiated, and source elements are consumed only as needed. The peek method discussed earlier in this article is a good example.
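Laziness is also what makes infinite sources usable. In the sketch below, Stream.iterate could produce numbers forever, but only as many elements are generated as limit needs. The class name LazySketch and the counter are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, used only for this illustration.
final class LazySketch {

    // Counts how many source elements were actually pulled through the pipeline.
    static final AtomicInteger GENERATED = new AtomicInteger();

    static List<Integer> firstThreeEven() {
        GENERATED.set(0);
        return Stream.iterate(1, n -> n + 1)            // 1, 2, 3, ... without end
                .peek(n -> GENERATED.incrementAndGet())  // debugging aid only
                .filter(n -> n % 2 == 0)
                .limit(3)                                // stop after three matches
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThreeEven());  // [2, 4, 6]
        System.out.println(GENERATED.get());   // 6
    }
}
```

Only the numbers 1 to 6 are ever generated, because the pipeline stops as soon as limit has received its third element.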

2. Avoid performing terminal operations twice

One more reminder: once a terminal operation has been invoked on a Stream, that Stream cannot be used for any further operation; otherwise an exception is thrown. See the following example:

Stream<Student> studentStream = basketballStudents.stream().filter(s -> s.getAge() == 19);
// Calculate the number of students
System.out.println("the number of students: " + studentStream.count());
// If you try it again, an error will be reported
try {
    System.out.println("the number of students: " + studentStream.count());
} catch (Exception e) {
    e.printStackTrace();
}

Output:

the number of students: 2
java.lang.IllegalStateException: stream has already been operated upon or closed
			at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
			...
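If the same pipeline is needed more than once, one common workaround is to wrap its construction in a Supplier, so that every call builds a fresh Stream. This is a sketch of that idea, not the only option (collecting into a List once and reusing the List works too); the class name ReuseSketch and the plain list of ages are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical helper class, used only for this illustration.
final class ReuseSketch {

    static final List<Integer> AGES = Arrays.asList(18, 17, 19, 19);

    // Each call to get() builds a new pipeline over the same source.
    static final Supplier<Stream<Integer>> NINETEENS =
            () -> AGES.stream().filter(a -> a == 19);

    // Reusing one Stream instance fails on the second terminal operation.
    static boolean secondTerminalOpFails() {
        Stream<Integer> stream = NINETEENS.get();
        stream.count();
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(NINETEENS.get().count());      // 2
        System.out.println(NINETEENS.get().count());      // 2, a fresh stream each time
        System.out.println(secondTerminalOpFails());      // true
    }
}
```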

Summary

This article shows, through multiple examples, how to use Stream to write more elegant code, and briefly introduces what each API does. Due to space limitations, it only touches on the usage of collect and parallel Stream. I will discuss them in depth in subsequent articles.

In addition, to master Stream, reading about it is not enough; you also need to practice it in your projects. If you have any questions, please feel free to discuss them in the comments section!

Source code address

Github: https://github.com/junfeng0828/JavaBasic

Directory: src/main/java/stream/StreamCase.java

Finally, if the article was helpful, please clap 👏 and follow, thank you! ╰(*°▽°*)╯

I'm Sea Breeze, looking forward to progressing with you. ❤️
