Eric Anicet

Summary

The web content provides an in-depth exploration of how to leverage Spring Batch 5.X with Spring Boot 3.X and MongoDB to handle batch processing tasks effectively.

Abstract

The article "Spring Batch 5 — Read from MongoDB and generate CSV files: Part 1" delves into the capabilities of Spring Batch 5.X when used in conjunction with Spring Boot 3.X and MongoDB. It offers an overview of Spring Batch, discussing its architecture and the new features introduced in version 5.X, such as JDK 17 baseline support, dependencies upgrades, batch infrastructure and testing configuration updates, improved Java Records support, full GraalVM native support, and the introduction of the new cursor-based MongoItemReader. The article also touches on the deprecation and removal of APIs and provides insights into the performance benefits of using Spring Batch with GraalVM for cloud-native batch workloads. The content is aimed at developers looking to implement robust batch applications, particularly those involving data read operations from MongoDB and the generation of CSV files.

Opinions

  • The author positions Spring Batch as a vital tool for enterprise systems, emphasizing its lightweight nature and ease of use.
  • The article suggests that the improvements in Spring Batch 5.X, such as Java 17 support and GraalVM native compilation, are significant advancements for batch processing performance.
  • The introduction of DefaultBatchConfiguration and @SpringBatchTest annotation for JUnit 5 indicates a commitment to modernizing the framework and simplifying configuration and testing.
  • The renaming of MongoItemReader to MongoPagingItemReader and the addition of MongoCursorItemReader reflect a focus on performance optimization for large data sets in MongoDB.
  • The author implies that the removal of deprecated APIs and the migration guide are important for developers transitioning to Spring Batch 5.X to ensure a smooth migration process.

Spring Batch 5 — Read from MongoDB and generate CSV files: Part 1

In this story, we are going to explore how to use Spring Batch 5.X with Spring Boot 3.X and MongoDB to perform business operations.

  • Overview: Spring Batch Overview, Spring Batch Architecture
  • What’s New in Spring Batch 5.x: JDK 17 baseline, Dependencies upgrade, Batch Infrastructure Configuration Updates, Batch Testing Configuration Updates, Java Records Support Improvement, Full GraalVM native support, New Cursor-based MongoItemReader, Other features, API deprecation and removal, Support Lifecycle
  • Continue reading
  • References

Overview

Spring Batch Overview

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications that are vital for the daily operations of enterprise systems. Spring Batch builds upon the characteristics of the Spring Framework that people have come to expect (productivity, POJO-based development approach, and general ease of use) while making it easy for developers to access and use more advanced enterprise services when necessary. Spring Batch is not a scheduling framework.

Spring Batch Architecture

A batch application is organized into four logical tiers: Run, Job, Application, and Data.

  1. Run Tier: The Run Tier is concerned with the scheduling and launching of the application.
  2. Job Tier: The Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.
  3. Application Tier: The Application Tier contains the components required to execute the program. It contains the specific tasklets that address the required batch functionality and enforces policies around tasklet execution (e.g., commit intervals and capture of statistics).
  4. Data Tier: The Data Tier provides the integration with the physical data sources that might include databases, files, or queues.

Let’s take a look at the main components of Spring Batch used in processing huge volumes of data:

  • JobLauncher represents a simple interface for launching a Job with a given set of JobParameters. It can be used directly in application code. Alternatively, a batch process can be started from the command line by running CommandLineJobRunner with the java command.
  • Job is an entity that encapsulates an entire batch process. In Spring Batch, a Job is simply a container for Step instances. It combines multiple steps that logically belong together in a flow and allows for the configuration of properties global to all steps, such as restartability. The job configuration contains the name of the job, the definition and ordering of Step instances, and whether or not the job is restartable.
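@Bean
public Job footballJob(JobRepository jobRepository) {
    // playerLoad(), gameLoad(), and playerSummarization() are assumed to be
    // Step definitions declared elsewhere in the same configuration
    return new JobBuilder("footballJob", jobRepository)
            .start(playerLoad())            // first step of the job
            .next(gameLoad())               // runs after playerLoad completes
            .next(playerSummarization())    // last step
            .build();
}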
  • Step is a domain object that encapsulates an independent, sequential phase of a batch job. It is the unit of processing that makes up a Job: one job can contain one to N steps. Dividing a job into multiple steps makes it possible to reuse processing, run steps in parallel, and branch conditionally. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used).
  • JobRepository is the persistence mechanism that provides CRUD operations for JobLauncher, Job, and Step implementations on the database based on the table schema specified by Spring Batch.
  • ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has exhausted the items it can provide, it indicates this by returning null.
  • ItemProcessor is an abstraction that represents the business processing of an item. It provides an access point to transform or apply other business processes.
  • ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an ItemWriter does not know the input it should receive next and knows only the item that was passed in its current invocation.
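
To make the reader/processor/writer contract described above more concrete, here is a minimal plain-Java sketch of a chunk-oriented loop. It does not use Spring Batch itself: SketchReader, SketchWriter, and ChunkStepSketch are hypothetical names invented for this illustration, and the real framework adds transactions, restart metadata, and error handling on top of this idea.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-ins for the Spring Batch abstractions
// (these are NOT the real ItemReader/ItemWriter interfaces).
interface SketchReader<I> {
    I read(); // returns null when the input is exhausted
}

interface SketchWriter<O> {
    void write(List<O> chunk); // receives one chunk of items at a time
}

final class ChunkStepSketch {

    // Reads items one at a time, processes each one, and writes them chunk by chunk.
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(SketchReader<I> reader, Function<I, O> processor,
                          SketchWriter<O> writer, int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.apply(item));
            if (chunk.size() == chunkSize) {
                writer.write(List.copyOf(chunk));
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) { // flush the final, partial chunk
            writer.write(List.copyOf(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    // Demo: upper-cases five names in chunks of two and returns the written chunks.
    static List<List<String>> demo() {
        Iterator<String> source = List.of("ann", "bob", "cy", "dee", "eve").iterator();
        List<List<String>> written = new ArrayList<>();
        ChunkStepSketch.<String, String>run(
                () -> source.hasNext() ? source.next() : null,
                String::toUpperCase,
                written::add,
                2);
        return written;
    }
}
```

Calling ChunkStepSketch.demo() hands three chunks to the writer: two full chunks of two items and a final chunk with the remaining item. Note that the writer only ever sees the chunk passed to its current invocation, as described above.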

What’s New in Spring Batch 5.x

JDK 17 baseline

Spring Batch 5 is based on Spring Framework 6 which requires Java 17 as a minimum version. Therefore, we need to use Java 17+ to run Spring Batch 5 applications.

Dependencies upgrade

Spring Batch 5 updates its Spring dependencies across the board to the following versions:

  • Spring Framework 6
  • Spring Integration 6
  • Spring Data 3
  • Spring LDAP 3
  • Spring AMQP 3
  • Spring for Apache Kafka 3
  • Micrometer 1.10

This release also marks the migration to:

  • Jakarta EE 9
  • Hibernate 6

Batch Infrastructure Configuration Updates

Spring Batch 5 introduces a new configuration class named DefaultBatchConfiguration, as an alternative to the @EnableBatchProcessing annotation. It provides all infrastructure beans with a default configuration, which can be customized as needed. It is also now possible to specify the transaction manager used by the JobExplorer and to customize its transaction attributes.

@Configuration
class MyJobConfiguration extends DefaultBatchConfiguration {

    @Bean
    public Job job(JobRepository jobRepository) {
        return new JobBuilder("myJob", jobRepository)
                // define job flow as needed
                .build();
    }

}

Batch Testing Configuration Updates

In Spring Batch 5, the entire test suite has been migrated to JUnit 5. The @SpringBatchTest annotation can be specified on a test class that runs Spring Batch-based tests and works with JUnit 5.

@SpringBatchTest
@SpringJUnitConfig(SkipSampleConfiguration.class)
public class SkipSampleFunctionalTests { ... }
  • @SpringJUnitConfig indicates that the class should use Spring’s JUnit facilities
  • @SpringBatchTest injects Spring Batch test utilities (such as the JobLauncherTestUtils and JobRepositoryTestUtils) in the test context

Java Records Support Improvement

With Java 17 as a baseline, Spring Batch 5 also improves support for Java records in different parts of the framework. For example, the FlatFileItemReaderBuilder is now able to detect whether the item type is a record or a regular class and configure the corresponding FieldSetMapper implementation accordingly (i.e., RecordFieldSetMapper for records and BeanWrapperFieldSetMapper for regular classes).

public record Person(int id, String name){}

@Bean
FlatFileItemReader<Person> myReader() {
   return new FlatFileItemReaderBuilder<Person>()
         .name("recordItemReader")
         .resource(new ClassPathResource("persons.csv"))
         .delimited()
         .names("id", "name")
         .fieldSetMapper(new RecordFieldSetMapper<>(Person.class))
         .build();
}
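
To get an intuition for what a record-based mapper has to do, the following plain-Java sketch maps a delimited line to a record through its canonical constructor and picks a mapper name based on Class.isRecord(). This is only an illustration under stated assumptions, not Spring Batch's actual implementation: PersonRow, RecordLineMapperSketch, and both methods are hypothetical names, and only int and String components are handled.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.regex.Pattern;

// Hypothetical item type used only in this sketch.
record PersonRow(int id, String name) {}

final class RecordLineMapperSketch {

    // Names the kind of mapper the text above describes for a given item type.
    static String mapperNameFor(Class<?> type) {
        return type.isRecord() ? "RecordFieldSetMapper" : "BeanWrapperFieldSetMapper";
    }

    // Maps one delimited line to a record by calling its canonical constructor.
    // Only int and String components are supported, to keep the sketch short.
    static <T extends Record> T map(String line, String delimiter, Class<T> type) {
        String[] tokens = line.split(Pattern.quote(delimiter), -1);
        RecordComponent[] components = type.getRecordComponents();
        if (tokens.length != components.length) {
            throw new IllegalArgumentException(
                    "Expected " + components.length + " fields but got " + tokens.length);
        }
        Class<?>[] paramTypes = new Class<?>[components.length];
        Object[] args = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            paramTypes[i] = components[i].getType();
            String token = tokens[i].trim();
            args[i] = paramTypes[i] == int.class ? (Object) Integer.valueOf(token) : token;
        }
        try {
            // The canonical constructor takes the components in declaration order.
            Constructor<T> canonical = type.getDeclaredConstructor(paramTypes);
            return canonical.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

For example, RecordLineMapperSketch.map("1, Alice", ",", PersonRow.class) builds a PersonRow with id 1 and name Alice. Records have no setters, which is why a bean-style mapper that populates properties after construction cannot be used for them.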

Full GraalVM native support

The native support has been improved significantly in v5.0 by providing the necessary Ahead-Of-Time processing and runtime hints to natively compile Spring Batch applications with GraalVM.

Benchmark source: https://spring.io/blog/2022/11/24/spring-batch-5-0-goes-ga/

The values shown here are the average of 10 executions of the sample using the following software and hardware setup:

  • JVM: OpenJDK version "17" 2021-09-14
  • GraalVM: OpenJDK Runtime Environment GraalVM CE 22.0.0.2
  • macOS Big Sur v11.6.2 (CPU: 2.4 GHz 8-Core Intel Core i9, Memory: 32 GB 2667 MHz DDR4)

As these benchmarks show, a native Spring Batch application is two times faster at startup and almost ten times faster at runtime! This really is a game changer for cloud-native batch workloads!

New Cursor-based MongoItemReader

Spring Batch 5.1.0 introduces the MongoCursorItemReader, a new cursor-based item reader for MongoDB. This implementation uses cursors instead of paging to read data from MongoDB, which improves the performance of reads on large collections. For consistency with other cursor/paging readers, the current MongoItemReader has been renamed to MongoPagingItemReader.
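
To see why a cursor can help on large collections, consider the toy cost model below. It assumes that the paging reader fetches each page with a skip/limit style query, where skipped documents still have to be walked over, while a cursor streams the result set in a single pass. ReadStrategySketch and its methods are hypothetical names for this illustration. Nothing here talks to MongoDB, and real costs depend on indexes, sort order, and batch sizes.

```java
// A toy cost model, not a benchmark: it only counts how many documents
// each access pattern has to walk over to read the whole collection.
final class ReadStrategySketch {

    // Simulates skip/limit paging. Returns {itemsRead, documentsTouched}.
    static long[] readWithPaging(int total, int pageSize) {
        long read = 0;
        long touched = 0;
        for (int page = 0; ; page++) {
            int from = Math.min(page * pageSize, total);
            int to = Math.min(from + pageSize, total);
            touched += to;       // `from` skipped documents plus (to - from) returned ones
            if (to == from) {
                break;           // an empty page signals that the input is exhausted
            }
            read += to - from;
        }
        return new long[] {read, touched};
    }

    // Simulates a cursor: every document is visited exactly once.
    static long[] readWithCursor(int total) {
        long read = 0;
        long touched = 0;
        for (int position = 0; position < total; position++) {
            touched++;
            read++;
        }
        return new long[] {read, touched};
    }
}
```

In this model, both strategies read the same number of items, but the number of documents touched by paging grows roughly quadratically with the collection size for a fixed page size, while the cursor stays linear.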

Other features

  1. v5.0

  • New Maven Bill Of Materials for Spring Batch modules
  • Default encoding to UTF-8 in all areas of the framework
  • Batch tracing with Micrometer
  • Add support to use any type as a job parameter
  • Full MariaDB support

  2. v5.1

  • Virtual Threads support
  • Bulk inserts support in MongoItemWriter
  • New item reader and writer for Redis
  • Memory management improvement in the JpaItemWriter
API deprecation and removal

All APIs that were deprecated in previous versions have been removed. Moreover, some APIs have been deprecated in v5.0 and are scheduled for removal in v5.2. Finally, some APIs have been moved or removed without deprecation for practical reasons.

Please refer to the migration guide for more details about these changes.

Support Lifecycle

https://spring.io/projects/spring-batch/#support

Continue reading

The second part of the story, which explains the implementation of Spring Batch 5 with Spring Boot 3 and MongoDB, is available here.
