Mastering Java Stream in One Article
In the previous article, Lambda expression in java, I briefly mentioned the use of Stream. In this article, we will explore it in depth. First, let's take the familiar Student class as an example. Suppose there is a group of students:
@Getter
@AllArgsConstructor
public static class Student {
    private String name;
    private Integer age;
}

List<Student> students = Lists.newArrayList(
        new Student("Bob", 18),
        new Student("Ted", 17),
        new Student("Zeka", 19));
Now suppose we have the following requirement:
From a given list of students, return those aged 18 and older, sorted by age in descending order, keeping at most 2.
In Java 7 and earlier, we would implement it as follows:
public List<Student> getTwoOldestStudents(List<Student> students) {
    List<Student> result = new ArrayList<>();
    // 1. Loop over the students and keep those who meet the age requirement
    for (Student student : students) {
        if (student.getAge() >= 18) {
            result.add(student);
        }
    }
    // 2. Sort the eligible students by age, oldest first
    //    (anonymous Comparator, since lambdas do not exist before Java 8)
    Collections.sort(result, new Comparator<Student>() {
        @Override
        public int compare(Student s1, Student s2) {
            return s2.getAge() - s1.getAge();
        }
    });
    // 3. If more than two students remain, keep only the first two
    if (result.size() > 2) {
        result = result.subList(0, 2);
    }
    return result;
}
In Java 8 and later, with the help of Stream, we can write the same logic more elegantly:
public List<Student> getTwoOldestStudentsByStream(List<Student> students) {
    return students.stream()
            .filter(s -> s.getAge() >= 18)
            .sorted((s1, s2) -> s2.getAge() - s1.getAge())
            .limit(2)
            .collect(Collectors.toList());
}
Differences between the two implementations:
The procedural implementation couples the collection elements, the loop, and the various conditional checks, exposing too many details. As requirements change and become more complex, procedural code becomes difficult to understand and maintain.
The functional solution decouples code details from business logic. Similar to SQL statements, it expresses "what to do" rather than "how to do it", allowing programmers to focus on business logic and write cleaner code that is easier to understand and maintain.
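One detail of the comparator above deserves a note: subtracting two ints is fine for ages, but it can overflow for arbitrary values. Below is a minimal sketch of the same pipeline using Comparator.comparingInt(...).reversed() instead. The class name OldestStudents and the plain Student class (a stand-in for the Lombok version) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class OldestStudents {
    // Plain stand-in for the article's Lombok-based Student class
    static class Student {
        private final String name;
        private final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<String> twoOldestAdultNames() {
        List<Student> students = Arrays.asList(
                new Student("Bob", 18), new Student("Ted", 17), new Student("Zeka", 19));
        return students.stream()
                .filter(s -> s.getAge() >= 18)
                // comparingInt avoids the overflow risk of "s2.getAge() - s1.getAge()"
                .sorted(Comparator.comparingInt(Student::getAge).reversed())
                .limit(2)
                .map(Student::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(twoOldestAdultNames()); // [Zeka, Bob]
    }
}
```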
Based on my day-to-day project experience, I have summarized the core points, easily confused usages, and typical usage scenarios of Stream in detail. I hope it helps everyone gain a more comprehensive understanding of Stream and apply it to project development more efficiently.
First introduction to Stream
Java 8 added the Stream API, which lets you manipulate data in collections such as List and Set in a functional and simpler way, and lets you run the same pipeline in parallel (via parallelStream()) without writing any threading code yourself.
In a nutshell: to perform a computation, stream operations are composed into a stream pipeline. The stream pipeline consists of the following three parts:
- Create Stream (created from source data, which may be an array, a collection, a generator function, an I/O channel, etc.)
- Intermediate Operations (there may be zero or more; each transforms a stream into another stream, such as filter(Predicate))
- Terminal Operation (which produces a result or side effect, such as count() or forEach(Consumer))
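The three parts can be seen in even the smallest pipeline. Here is a minimal sketch; the class and method names are hypothetical and only serve the illustration.

```java
import java.util.Arrays;
import java.util.List;

class PipelineParts {
    static long countLongNames(List<String> names) {
        return names.stream()                   // 1. create the stream from a source
                .filter(n -> n.length() > 3)    // 2. intermediate operation: returns a new stream
                .count();                       // 3. terminal operation: produces the result
    }

    public static void main(String[] args) {
        System.out.println(countLongNames(Arrays.asList("Bob", "Ted", "Zeka"))); // 1
    }
}
```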
The following diagram illustrates these processes:
Each Stream pipeline operation type contains several API methods. Let's first list the functions of each API method.
1. Start the pipeline
Mainly responsible for directly creating a new Stream, or creating a new Stream based on existing array, List, Set, Map and other collection type objects.
In addition to Stream, which is a stream of object references, there are primitive specializations for IntStream, LongStream, and DoubleStream, all of which are referred to as "streams" and conform to the characteristics and restrictions described here.
2. Intermediate pipeline
This step is responsible for processing the Stream and returning a new Stream object. Intermediate operations can be chained one after another.
3. Terminate pipeline
As the name implies, a terminal operation ends the Stream. It may perform some final processing on the elements or return a result, as required.
Stream API detailed usage
1. Create Stream
// Stream.of, IntStream.of...
Stream<String> nameStream = Stream.of("Bob", "Ted", "Zeka");
IntStream ageStream = IntStream.of(18, 17, 19);

// stream, parallelStream
Stream<Student> studentStream = students.stream();
Stream<Student> studentParallelStream = students.parallelStream();
In most cases, we create a Stream based on an existing collection list.
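Collections are not the only source. As a hedged sketch of a few other standard entry points (arrays, generator functions, and the primitive IntStream mentioned earlier), consider the following; the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamSources {
    static int sumOfArray(int[] ages) {
        // Arrays.stream on an int[] yields an IntStream, which has sum() built in
        return Arrays.stream(ages).sum();
    }

    static List<Integer> firstEvenNumbers(int howMany) {
        // Stream.iterate is infinite, so it has to be bounded with limit
        return Stream.iterate(0, n -> n + 2).limit(howMany).collect(Collectors.toList());
    }

    static List<Integer> rangeBoxed(int fromInclusive, int toInclusive) {
        // boxed() converts an IntStream back into a Stream<Integer>
        return IntStream.rangeClosed(fromInclusive, toInclusive).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfArray(new int[] {18, 17, 19})); // 54
        System.out.println(firstEvenNumbers(3));                // [0, 2, 4]
        System.out.println(rangeBoxed(1, 3));                   // [1, 2, 3]
    }
}
```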
2. Intermediate Operations
2.1 map
Both map and flatMap are used to convert existing elements to other element types. The difference is:
- map is one-to-one: each element is converted into exactly one new element
- flatMap can be one-to-many: each element can be converted into zero or more new elements
Let's look at the map method first. The requirement is:
Convert the previous list of student objects to a list of student names and output it:
/**
 * Use of map: one-to-one
 *
 * @param students the students to convert
 * @return the students' names, in the same order
 */
public List<String> objectToString(List<Student> students) {
    return students.stream()
            .map(Student::getName)
            .collect(Collectors.toList());
}
Output:
[Bob, Ted, Zeka]
As you can see, there are three Students in the input, and there will be three Student names in the output.
2.2 flatMap
Now let's expand the requirements.
The school requires every student to join a team. Suppose Bob, Ted, and Zeka join the basketball team. Alan, Anne, and Davis join the football team.
@Getter
@AllArgsConstructor
public static class Team {
    private String type;
    private List<Student> students;
}

List<Student> basketballStudents = Lists.newArrayList(
        new Student("Bob", 18),
        new Student("Ted", 17),
        new Student("Zeka", 19));

List<Student> footballStudent = Lists.newArrayList(
        new Student("Alan", 19),
        new Student("Anne", 21),
        new Student("Davis", 21));

Team basketballTeam = new Team("basketball", basketballStudents);
Team footballTeam = new Team("football", footballStudent);
// Lists comes from Guava (com.google.common.collect)
List<Team> teams = Lists.newArrayList(basketballTeam, footballTeam);
Now we need to collect the students of all teams and return them in one merged list. How would you implement this requirement?
Let's try to achieve it with the map method, as follows:
List<List<Student>> allStudents = teams.stream()
        .map(Team::getStudents)
        .collect(Collectors.toList());
As we can see, this does not work: the returned type is List<List<Student>>, whereas what we actually want is List<Student>.
Of course, in Java 7 and earlier we could solve this easily, as follows:
List<Student> allStudents = new ArrayList<>();
for (Team team : teams) {
    for (Student student : team.getStudents()) {
        allStudents.add(student);
    }
}
But this code with two nested for loops is not elegant. This is where flatMap comes into play.
List<Student> allStudents = teams.stream()
        .flatMap(t -> t.getStudents().stream())
        .collect(Collectors.toList());
Pretty cool, right? It is done in a single statement. The flatMap method accepts a function (here a lambda expression) whose return value must be a stream. flatMap then merges all the returned streams into a single new Stream, which the map method cannot do.
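To see the merging behavior in isolation, here is a minimal self-contained sketch that flattens nested lists of names. It also shows that an element may contribute nothing at all to the result (the empty list). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class FlattenTeams {
    static List<String> flatten(List<List<String>> teams) {
        return teams.stream()
                // each inner list becomes a stream; flatMap concatenates them in order
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> teams = Arrays.asList(
                Arrays.asList("Bob", "Ted"),
                Collections.<String>emptyList(),   // contributes zero elements
                Arrays.asList("Alan"));
        System.out.println(flatten(teams)); // [Bob, Ted, Alan]
    }
}
```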
The following diagram clearly illustrates the processing logic of flatMap:
2.3 filter, distinct, sorted, limit
These are commonly used intermediate operations, and they are often used together. For descriptions of what each one does, see the previous table. Let's go straight to the requirement this time:
From the list of students in all teams above, we now need the ages of the second and third oldest students. They must be at least 18 years old, and duplicate ages count only once.
List<Integer> topTwoAges = allStudents.stream()
        .map(Student::getAge)           // [18, 17, 19, 19, 21, 21]
        .filter(a -> a >= 18)           // [18, 19, 19, 21, 21]
        .distinct()                     // [18, 19, 21]
        .sorted((a1, a2) -> a2 - a1)    // [21, 19, 18]
        .skip(1)                        // [19, 18]
        .limit(2)                       // [19, 18]
        .collect(Collectors.toList());
System.out.println(topTwoAges);
Output:
[19, 18]
Note: Since only two elements are left after the skip step, the limit step can actually be omitted here.
I believe you can clearly understand the functions of these methods without much explanation. It is highly recommended that you take the initiative to code and try it yourself!
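If you want something to run right away, the following is a minimal sketch of the same chain on a plain list of ages. It uses Comparator.reverseOrder() in place of the subtraction lambda; the class and method names are assumptions for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SecondAndThirdAges {
    static List<Integer> secondAndThirdOldest(List<Integer> ages) {
        return ages.stream()
                .filter(a -> a >= 18)               // drop minors
                .distinct()                         // duplicates count once
                .sorted(Comparator.reverseOrder())  // oldest first
                .skip(1)                            // skip the oldest
                .limit(2)                           // keep at most the next two
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(secondAndThirdOldest(Arrays.asList(18, 17, 19, 19, 21, 21))); // [19, 18]
    }
}
```

When fewer elements remain than skip asks for, the result is simply an empty list rather than an error.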
2.4 peek, forEach
Both peek and forEach can be used to traverse elements and process them one by one, so we cover them together for comparison. Note, however, that peek is an intermediate operation and forEach is a terminal operation.
As described before, an intermediate operation is only a processing step in the middle of the Stream pipeline. It cannot be executed on its own to produce a result; it only runs when a terminal operation follows it. As a terminal operation with no return value, forEach performs its action directly.
For example, let's say "Hello, xxx" to each student on the basketball team using peek and forEach respectively.
// peek
System.out.println("------start peek------");
basketballTeam.getStudents().stream().peek(s -> System.out.println("Hello, " + s.getName()));
System.out.println("------end peek------");

System.out.println();
// forEach
System.out.println("------start foreach------");
basketballTeam.getStudents().stream().forEach(s -> System.out.println("Hello, " + s.getName()));
System.out.println("------end foreach------");
As can be seen from the output, peek is not executed when called alone, while forEach is executed directly:
------start peek------
------end peek------

------start foreach------
Hello, Bob
Hello, Ted
Hello, Zeka
------end foreach------
If you add a terminal operation after peek, it can be executed.
System.out.println("------start peek------");
basketballTeam.getStudents().stream()
        .peek(s -> System.out.println("Hello, " + s.getName()))
        .count();
System.out.println("------end peek------");

// output (Java 8)
------start peek------
Hello, Bob
Hello, Ted
Hello, Zeka
------end peek------
It is worth mentioning that peek should be used with caution to carry business logic. Think about it: as long as there is a terminal operation, will the peek action always be executed? Not necessarily. It depends on the JDK version and on the pipeline.
For example, in Java 8 the peek action above is executed normally, but in Java 17 the logic inside peek is not executed. The reason is given in the JDK 17 API documentation: since Java 9, count() may skip executing the pipeline entirely when the number of elements can be computed directly from the source, in which case no intermediate operations are evaluated.
Short-circuiting operations behave similarly: findFirst stops pulling elements as soon as it has a result, so peek only sees the elements that were actually consumed. In both cases peek is treated as an operation that does not affect the result, so it may not run for every element, or at all.
The comments in peek's source code state that its recommended use is debugging. You can use peek to print each element as it flows through the pipeline, which helps with debugging and problem analysis during development. As the name suggests, it is only meant to peek at the data during execution.
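The laziness of peek can be checked without relying on console output. The sketch below records the elements peek sees, once without a terminal operation and once with collect, which has to traverse every element (unlike count(), whose behavior varies by JDK version). All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeekIsLazy {
    static List<String> seenWithoutTerminalOp() {
        List<String> seen = new ArrayList<>();
        // Only builds the pipeline; nothing flows through it yet
        Stream<String> pipeline = Stream.of("Bob", "Ted", "Zeka").peek(seen::add);
        return seen;
    }

    static List<String> seenWithCollect() {
        List<String> seen = new ArrayList<>();
        // collect is a terminal operation that visits every element
        Stream.of("Bob", "Ted", "Zeka").peek(seen::add).collect(Collectors.toList());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithoutTerminalOp()); // []
        System.out.println(seenWithCollect());       // [Bob, Ted, Zeka]
    }
}
```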
3. Terminal Operation
Here we divide terminal operations into two categories.
One type obtains simple results, mainly including max, min, count, findAny, findFirst, anyMatch, allMatch and other methods. "Simple" here means that the result is a number, a boolean value, an Optional, and so on.
The other type collects the results. In most scenarios the goal is a collection object such as a List, Set or HashMap, which is mainly produced by the collect method.
3.1 Simple result types
(1) max, min
max() and min() return the maximum/minimum element after stream processing. The result is wrapped in an Optional. For the use of Optional, please refer to the previous article, Lambda combined with Optional makes Java's handling of null more elegant.
Let's look directly at an example:
What are the ages of the oldest and youngest students on the football team?
// max
footballTeam.getStudents().stream()
        .map(Student::getAge)
        .max(Comparator.comparing(a -> a))
        .ifPresent(a -> System.out.println("The maximum age for a football team is " + a));
// min
footballTeam.getStudents().stream()
        .map(Student::getAge)
        .min(Comparator.comparing(a -> a))
        .ifPresent(a -> System.out.println("The minimum age for a football team is " + a));
Output:
The maximum age for a football team is 21
The minimum age for a football team is 19
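If you need the student rather than just the age, max can be applied to the objects directly. Here is a minimal sketch, assuming a plain Student class in place of the Lombok one and a hypothetical "nobody" fallback for an empty list:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class OldestByComparator {
    // Plain stand-in for the article's Lombok-based Student class
    static class Student {
        private final String name;
        private final int age;
        Student(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static String oldestName(List<Student> students) {
        return students.stream()
                .max(Comparator.comparingInt(Student::getAge)) // Optional<Student>
                .map(Student::getName)                         // Optional<String>
                .orElse("nobody");                             // fallback for an empty list
    }

    public static void main(String[] args) {
        System.out.println(oldestName(Arrays.asList(
                new Student("Alan", 19), new Student("Anne", 21)))); // Anne
    }
}
```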
(2) findAny, findFirst
findAny() and findFirst() terminate stream processing as soon as an element that meets the conditions is found. On a serial Stream, findAny() usually returns the same element as findFirst(), although the API does not guarantee it. On a parallel Stream it is more efficient, because it may return whichever matching element is found first.
Assume that the basketball team has a new student, Tom, who is 19 years old.
List<Student> basketballStudents = Lists.newArrayList(
        new Student("Bob", 18),
        new Student("Ted", 17),
        new Student("Zeka", 19),
        new Student("Tom", 19));
Take a look at the results of the following two requirements
- Return the name of the first student on the basketball team whose age is 19;
- Return the name of any student on the basketball team who is 19 years old;
// findFirst
basketballStudents.stream()
        .filter(s -> s.getAge() == 19)
        .findFirst()
        .map(Student::getName)
        .ifPresent(name -> System.out.println("findFirst: " + name));
// findAny
basketballStudents.stream()
        .filter(s -> s.getAge() == 19)
        .findAny()
        .map(Student::getName)
        .ifPresent(name -> System.out.println("findAny: " + name));
Output:
findFirst: Zeka
findAny: Zeka
As you can see, the two methods returned the same result on this serial Stream. The differences in parallel processing will be introduced later.
(3) count
Since a new student has been added to the basketball team, how many students are there in the basketball team now?
System.out.println("The number of students on the basketball team: " + basketballStudents.stream().count());
Output:
The number of students on the basketball team: 4
(4) anyMatch, allMatch, noneMatch
As the names suggest, these three methods check whether elements meet a condition and return a boolean value. Look at the following three examples:
- Is there a student named Alan on the football team?
- Are all students on the football team under 22 years old?
- Are there no students on the football team who are over 20 years old?
// anyMatch
System.out.println("anymatch: "
        + footballStudent.stream().anyMatch(s -> s.getName().equals("Alan")));
// allMatch
System.out.println("allmatch: "
        + footballStudent.stream().allMatch(s -> s.getAge() < 22));
// noneMatch
System.out.println("noneMatch: "
        + footballStudent.stream().noneMatch(s -> s.getAge() > 20));
Output:
anymatch: true
allmatch: true
noneMatch: false
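One edge case is easy to overlook: on an empty stream, allMatch and noneMatch both return true, while anyMatch returns false. A small sketch, with hypothetical names, makes this visible:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MatchOnEmpty {
    // Returns { anyMatch, allMatch, noneMatch } for the predicate "age > 20"
    static boolean[] matchResults(List<Integer> ages) {
        return new boolean[] {
                ages.stream().anyMatch(a -> a > 20),
                ages.stream().allMatch(a -> a > 20),
                ages.stream().noneMatch(a -> a > 20)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(matchResults(Arrays.asList(19, 21, 21))));
        // [true, false, false]
        System.out.println(Arrays.toString(matchResults(Collections.<Integer>emptyList())));
        // [false, true, true]
    }
}
```

So a check such as "are all students under 22?" passes for a team with no students, which may or may not be what the business logic expects.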
3.2 Result collection types
(1) Generate a collection
Generating collections is probably the most common use of collect. In addition to the List shown before, a Set, a Map, and so on can also be produced, as follows:
// Get the latest student age distribution for the basketball team, no duplicates allowed
Set<Integer> ageSet = basketballStudents.stream()
        .map(Student::getAge)
        .collect(Collectors.toSet());
System.out.println("set: " + ageSet);
// Get a map of the names and ages of all students on the basketball team
Map<String, Integer> nameAndAgeMap = basketballStudents.stream()
        .collect(Collectors.toMap(Student::getName, Student::getAge));
System.out.println("map: " + nameAndAgeMap);
Output:
set: [17, 18, 19]
map: {Ted=17, Tom=19, Bob=18, Zeka=19}
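A caveat with the two-argument Collectors.toMap: it throws an IllegalStateException when two elements produce the same key. The three-argument overload takes a merge function to resolve such collisions. The sketch below keys names by their length purely to provoke a collision; the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMerge {
    static Map<Integer, String> byLengthStrict(List<String> names) {
        // Two-argument toMap: a duplicate key causes an IllegalStateException
        return names.stream().collect(Collectors.toMap(String::length, n -> n));
    }

    static Map<Integer, String> byLengthMerged(List<String> names) {
        // Three-argument toMap: the merge function combines values that share a key
        return names.stream()
                .collect(Collectors.toMap(String::length, n -> n, (a, b) -> a + "/" + b));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bob", "Ted", "Zeka");
        System.out.println(byLengthMerged(names)); // {3=Bob/Ted, 4=Zeka}
        try {
            byLengthStrict(names);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```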
(2) Generate concatenated strings
In addition to generating collections, collect can also be used to concatenate strings.
For example, after we get the names of all students on the basketball team, we want to join all the names with "," and return a String.
System.out.println(basketballStudents.stream().map(Student::getName).collect(Collectors.joining(",")));
Output:
Bob,Ted,Zeka,Tom
You may ask: can't String.join() do this? Why use a stream at all? The charm of Stream is that it can be combined with other business logic, making the code more natural and coherent. If all you need is plain string concatenation, there is indeed no need to use Stream. After all, there is no need to kill a chicken with a big knife!
In addition, Collectors.joining() also supports defining a prefix and a suffix, which makes it more powerful.
System.out.println(basketballStudents.stream().map(Student::getName).collect(Collectors.joining(",", "(",")")));
Output:
(Bob,Ted,Zeka,Tom)
(3) Generate statistical results
Another scenario, perhaps rarely used in practice, is using collect to compute statistics over numeric data. Let's take a brief look at it.
// Calculate average
System.out.println("average age: "
        + basketballStudents.stream().map(Student::getAge).collect(Collectors.averagingInt(a -> a)));
// Summary statistics
IntSummaryStatistics summary = basketballStudents.stream()
        .map(Student::getAge)
        .collect(Collectors.summarizingInt(a -> a));
System.out.println("summary: " + summary);
The example above uses collect to perform some mathematical operations on the ages. With the four students currently on the basketball team, the results are as follows:
average age: 18.25
summary: IntSummaryStatistics{count=4, sum=73, min=17, average=18.250000, max=19}
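The same statistics are also available without a collector, through the primitive IntStream. The following is a hedged sketch with hypothetical names; the 0.0 fallback for an empty list is an assumption of this example, not something the article prescribes.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class AgeStatistics {
    static IntSummaryStatistics summarize(List<Integer> ages) {
        // mapToInt switches to IntStream, which offers summaryStatistics() directly
        return ages.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    static double average(List<Integer> ages) {
        // average() returns an OptionalDouble because the stream may be empty
        return ages.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(18, 17, 19, 19);
        System.out.println(average(ages));            // 18.25
        System.out.println(summarize(ages).getSum()); // 73
    }
}
```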
Parallel Stream
Mechanism description
Using parallel streams can make better use of multiple CPU cores and may speed up execution, especially for large data sets or expensive per-element work. A parallel Stream splits the whole stream into multiple fragments, executes the processing logic on each fragment in parallel, and finally merges the results of the fragments into a single result.
As shown in the diagram below, filter for numbers greater than or equal to 18:
Use findAny() efficiently
As mentioned before, findAny() is more efficient in a parallel Stream, and the API documentation states that each execution of this method may return a different result.
Let's execute findAny() 10 times using parallelStream to find the name of any student who is at least 18 years old. Bob, Zeka, and Tom all satisfy the condition.
for (int i = 0; i < 10; i++) {
    basketballStudents.parallelStream()
            .filter(s -> s.getAge() >= 18)
            .findAny()
            .map(Student::getName)
            .ifPresent(name -> System.out.println("findAny in parallel stream: " + name));
}
Output:
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Tom
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Bob
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
findAny in parallel stream: Zeka
This output confirms that the result of findAny() is nondeterministic on a parallel Stream.
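By contrast, findFirst() respects the encounter order of an ordered source even on a parallel Stream, so it returns the same element every time, at the cost of some of the parallel speed-up. A minimal sketch with hypothetical names and made-up values:

```java
import java.util.Arrays;
import java.util.List;

class ParallelFindFirst {
    static int firstAtLeast(List<Integer> values, int min) {
        return values.parallelStream()
                .filter(v -> v >= min)
                .findFirst()      // first match in encounter order, even in parallel
                .orElse(-1);      // -1 is an arbitrary "not found" marker for this sketch
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(17, 20, 19, 18);
        for (int i = 0; i < 10; i++) {
            System.out.println(firstAtLeast(ages, 18)); // always 20
        }
    }
}
```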
For more information about Parallel Stream, I will further analyze and discuss it in a later article.
Additional information
1. Delayed execution
Streams are lazy; calculations on source data are only performed when a terminal operation is initiated, and source elements are consumed only when needed. For example, the peek method mentioned before in the article is a good example.
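"Consumed only when needed" can be observed directly. In the sketch below, a sequential stream stops pulling elements as soon as findFirst has its answer, so the later elements are never visited. The class name and sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyShortCircuit {
    static List<Integer> visitedBeforeFirstMatch() {
        List<Integer> visited = new ArrayList<>();
        Stream.of(17, 18, 19, 21)
                .peek(visited::add)        // records every element pulled from the source
                .filter(a -> a >= 18)
                .findFirst();              // stops once the first match (18) is found
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedBeforeFirstMatch()); // [17, 18]
    }
}
```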
2. Avoid performing terminal operations twice
It is necessary to add a reminder here. Once a Stream is terminated, it cannot be used to perform other operations later, otherwise an error will be reported. See the following example:
Stream<Student> studentStream = basketballStudents.stream().filter(s -> s.getAge() == 19);
// Calculate the number of students
System.out.println("the number of students: " + studentStream.count());
// If you try it again, an error will be reported
try {
    System.out.println("the number of students: " + studentStream.count());
} catch (Exception e) {
    e.printStackTrace();
}
Output:
the number of students: 2
java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
...
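When the same pipeline really is needed more than once, one common workaround is to wrap its construction in a Supplier, so that every call builds a fresh Stream. This is a sketch of that idea with hypothetical names, not the only possible approach:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class ReusableStreams {
    static long[] countTwice(List<Integer> ages) {
        // Each get() builds a new stream, so each terminal operation has its own
        Supplier<Stream<Integer>> nineteen = () -> ages.stream().filter(a -> a == 19);
        return new long[] {nineteen.get().count(), nineteen.get().count()};
    }

    static boolean secondUseFails(List<Integer> ages) {
        Stream<Integer> stream = ages.stream();
        stream.count();
        try {
            stream.count();           // the stream has already been consumed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(18, 17, 19, 19);
        System.out.println(Arrays.toString(countTwice(ages))); // [2, 2]
        System.out.println(secondUseFails(ages));              // true
    }
}
```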
Summary
This article introduced how to use Stream to write more elegant code through multiple examples, and briefly introduced what each API does. Due to space limitations, it only briefly covered collect and Parallel Stream. I will discuss them in depth in subsequent articles.
In addition, if you want to master Stream, reading about it is not enough; you also need to practice it in real projects. If you have any questions, please feel free to discuss them in the comments section!
Source code address
Github: https://github.com/junfeng0828/JavaBasic
Directory: src/main/java/stream/StreamCase.java
I'm Sea Breeze, looking forward to progressing with you.