avatarSuraj Mishra

Summary

The web content provides an in-depth exploration of Spring Data JPA's batch deletion methods, including practical examples and performance considerations for deleting records in batches.

Abstract

The article "Spring Data JPA: deleteAllInBatch*() Methods Explained!" delves into the internal mechanisms and practical applications of batch deletion operations in Spring Data JPA. It begins by setting up a sample Campaign table with a substantial dataset of 1 million records to demonstrate the deletion methods. The article then explains three key batch deletion methods provided by JpaRepository: deleteAllInBatch(), deleteAllInBatch(Iterable<T> entities), and deleteAllByIdInBatch(Iterable<ID> ids). Through detailed code examples and generated logs, the author illustrates how these methods work under the hood, the SQL queries they generate, and their performance implications. The article emphasizes the importance of batch operations for efficient data management, especially when dealing with large datasets, and suggests that developers should consider using these methods to avoid the overhead of individual delete operations. It also touches on the potential use of custom queries and the benefits of batch processing for database operations. The conclusion advises developers to choose the appropriate deletion method based on their specific use case and to continuously upskill in areas like SQL and Spring Data JPA to enhance their development capabilities.

Opinions

  • The author advocates for the use of batch deletion methods in Spring Data JPA to improve efficiency when removing large numbers of records from a database.
  • It is suggested that developers should be mindful of the potential performance impact when deleting all records in a table and consider using the truncate operation for such cases.
  • The article implies that developers have the flexibility to write custom queries if the provided batch deletion methods do not meet their specific needs.
  • There is an emphasis on the importance of continuous learning and skill improvement, with the author providing resources for developers to upgrade their knowledge in SQL and Spring Data JPA.
  • The author's opinion is that batch processing is not only more efficient but also necessary for maintaining good database performance, especially when dealing with large datasets.

Spring Data JPA: deleteAllInBatch*() Methods Explained!

Internal code exploration with sample dataset, examples, and more

Introduction

  • Spring Data JPA provides a lot of handy features to interact with databases for all kinds of operations.
  • One of the much-used operations is the deletion of the records. But deleting large numbers of records one by one is not resource efficient, hence doing it them in batches makes much sense.
  • Spring Data JPA provides helper methods that allow us to delete records in batches. In this article, we will deep dive into them, we will explore the internal code of these methods, and write logic along with the execution.

For the best java and spring-related content please subscribe to the newsletter.

Sample Table/Model

  • First of all, let's create a sample table Campaign with some mock data.
CREATE TABLE Campaign (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    budget DECIMAL(10, 2) NOT NULL
);


postgres=# \d campaign
                                      Table "public.campaign"
   Column   |          Type          | Collation | Nullable |               Default
------------+------------------------+-----------+----------+--------------------------------------
 id         | integer                |           | not null | nextval('campaign_id_seq'::regclass)
 name       | character varying(255) |           | not null |
 start_date | date                   |           | not null |
 end_date   | date                   |           | not null |
 budget     | numeric(10,2)          |           | not null |
Indexes:
    "campaign_pkey" PRIMARY KEY, btree (id)

Inserting Sample Data

  • Now let's insert 1M records to the Campaign table. The below script (plpgsql) does the job.
-- Create a function to generate and insert mock data
CREATE OR REPLACE FUNCTION generate_mock_data()
RETURNS VOID AS $$
DECLARE
    counter INT := 1;
BEGIN
    WHILE counter <= 1000000 LOOP
        INSERT INTO Campaign (name, start_date, end_date, budget)
        VALUES (
            'Campaign ' || counter,
            CURRENT_DATE + (counter * INTERVAL '1 day'),
            CURRENT_DATE + ((counter + 30) * INTERVAL '1 day'),
            ROUND(RANDOM()::numeric * 10000 + 1000, 2)
        );

        counter := counter + 1;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Execute the function to generate and insert 1 million rows
SELECT generate_mock_data();
  • Now our table contains 1M records that we can delete from Spring Data JPA.
postgres=# select count(*) from campaign;
  count
---------
 1000000
(1 row)

JpaRepository

  • JpaRepository is one of the common repositories that we extend when we define our repository interface for managing our entities.
  • JpaRepository provides 3 helper methods that can delete records in batches: 1. deleteAllInBatch() 2. deleteAllInBatch(Iterable<T> entities) 3. deleteAllByIdInBatch(Iterable<ID> ids)
  • Since JpaRepository is an interface, its implementation exists in the SimpleJpaRepository class.

Now let's deep dive into these methods and write some code along with execution.

deleteAllInBatch()

  • This method deletes all the records in the database in one batch.
  • If we look at the internal implementation of this method in the SimpleJpaRepository.java class, we see that all it's doing is getting a delete query, adding query hints if it's passed, and then applying executedUpdate().
@Transactional
public void deleteAllInBatch() {
   Query query = this.entityManager.createQuery(this.getDeleteAllQueryString());
   this.applyQueryHints(query);
   query.executeUpdate();
}


private String getDeleteAllQueryString() {
  return QueryUtils.getQueryString("delete from %s x", 
                   this.entityInformation.getEntityName());
}
  • If we use this method in our logic class, we just need to extend JpaRepository to campaignRepository and we will get access to deleteAllInBatch() since this method is an interface method and implemented by SimpleJpaRepository.java.
public interface CampaignRepository extends JpaRepository<Campaign, Long> { }
public void deleteAllCampaigns(){
     System.out.println("deleting campaigns");
     long l = System.currentTimeMillis();
     campaignRepository.deleteAllInBatch();
     System.out.println("exec time(ms) : "+(System.currentTimeMillis()-l));
}
  • Below are the generated logs for our logic execution. As we can see the generated query is “delete from campaign” .
Executing SQL Query: delete from campaign
Hibernate: delete from campaign

exec time(ms) : 1819
23:56:43.529 INFO  o.h.e.i.StatisticalLoggingSessionEventListener  - Session Metrics {
    7791152 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    41620 nanoseconds spent preparing 1 JDBC statements;
    1806712607 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    4662 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}
  • One thing we should keep in mind is that this method removes all the records from the database in one batch, so if we have a large table it might take too much of the time to finish the operation. In that case, the truncate operation may be faster to implement.
  • This method is not useful if we want to delete certain batches of records but not all.

If we want to delete certain batches of records, then we can use deleteAllInBatch(Iterable<T> entities) and deleteAllByIdInBatch(Iterable<ID> ids) methods.

deleteAllInBatch(Iterable<T> entities)

  • deleteAllInBatch(Iterable<T> entities) takes a collection of entities that we want to delete and builds a delete query for that many records.
  • If we see the internal code of the method, we will see the base query is “delete from %s x” but the applyAndBind method builds another part of the query that includes the target ids of the query.
@Transactional
public void deleteAllInBatch(Iterable<T> entities) {
   Assert.notNull(entities, "Entities must not be null");
   if (entities.iterator().hasNext()) {
        QueryUtils.applyAndBind(
             QueryUtils.getQueryString(
                 "delete from %s x", 
                  this.entityInformation.getEntityName()), 
                  entities, 
                  this.entityManager
              ).executeUpdate();
    }
}
public static <T> Query applyAndBind(String queryString, Iterable<T> entities, EntityManager entityManager) {
    Assert.notNull(queryString, "Querystring must not be null");
    Assert.notNull(entities, "Iterable of entities must not be null");
    Assert.notNull(entityManager, "EntityManager must not be null");
    Iterator<T> iterator = entities.iterator();
    if (!iterator.hasNext()) {
        return entityManager.createQuery(queryString);
     } else {
        String alias = detectAlias(queryString);
        StringBuilder builder = new StringBuilder(queryString);
        builder.append(" where");
        int i = 0;
        
        // building other part of the query with target ids
        while(iterator.hasNext()) {
           iterator.next();
           Object[] var10002 = new Object[]{alias, null};
           ++i;
           var10002[1] = i;
           builder.append(String.format(" %s = ?%d", var10002)) 
           if (iterator.hasNext()) {
               builder.append(" or");
            }
        }

        Query query = entityManager.createQuery(builder.toString());
        iterator = entities.iterator();
        i = 0;

         while(iterator.hasNext()) {
           ++i;
           query.setParameter(i, iterator.next());
         }

     return query;
   }
}
  • Now we can write a simple method to delete a batch of entities. In the below example, we first query certain campaigns and then pass them to deleteAllInBatch(campaign) to delete those many campaigns.
    public void deleteAllCampaignsInDefinedBatch(){
        LocalDate currentDate = LocalDate.now();
        List<Campaign> campaigns = campaignRepository.findByStartDateBetween(currentDate.plusMonths(1),  currentDate.plusDays(10));
        System.out.println(campaigns.size());
        long l = System.currentTimeMillis();
        campaignRepository.deleteAllInBatch(campaigns);
        System.out.println("millisecond: "+(System.currentTimeMillis()-l));
    }
  • Generated logs for the above method are listed below. As we can see deleteAllInBatch(campaign) generates a query following the query “delete from campaign where id=? or id=? or id=?”
  • Here ids are the ids of the campaign entities that we passed.
Executing SQL Query: delete from campaign where id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=?
Hibernate: delete from campaign where id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=?
millisecond: 53
00:16:01.733 INFO  o.h.e.i.StatisticalLoggingSessionEventListener  - Session Metrics {
    6932689 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    9986550 nanoseconds spent preparing 2 JDBC statements;
    78803811 nanoseconds spent executing 2 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    1400175 nanoseconds spent executing 1 flushes (flushing a total of 30 entities and 0 collections);
    7571121 nanoseconds spent executing 1 partial-flushes (flushing a total of 30 entities and 30 collections)
}
  • deleteAllInBatch(Iterable<T> entities) is a useful method when we want to delete certain batches of records. If we have to delete let's say 500k records then we can certainly build sub batches and loop over them and delete 10k records or 100k records at a time so that we don’t overwhelm our database.

deleteAllByIdInBatch(Iterable<ID> ids)

  • Like deleteAllInBatch(Iterable<T> entities), we can also use deleteAllByIdInBatch(Iterable<ID> ids) which takes a collection of IDs instead of entities but does the same job.
  • This method internally calls to deleteAllInBatch(Iterable<T> entities). Below is the internal code implementation for deleteAllByIdInBatch(Iterable<ID> ids).
    @Transactional
    public void deleteAllByIdInBatch(Iterable<ID> ids) {
        Assert.notNull(ids, "Ids must not be null");
        if (ids.iterator().hasNext()) {
            if (this.entityInformation.hasCompositeId()) {
                List<T> entities = new ArrayList();
                ids.forEach((id) -> {
                    entities.add(this.getReferenceById(id));
                });
                // calls to deleteAllInBatch(entities)
                this.deleteAllInBatch(entities);
            } else {
                String queryString = String.format("delete from %s x where %s in :ids", this.entityInformation.getEntityName(), this.entityInformation.getIdAttribute().getName());
                Query query = this.entityManager.createQuery(queryString);
                if (Collection.class.isInstance(ids)) {
                    query.setParameter("ids", ids);
                } else {
                    Collection<ID> idsCollection = (Collection)StreamSupport.stream(ids.spliterator(), false).collect(Collectors.toCollection(ArrayList::new));
                    query.setParameter("ids", idsCollection);
                }

                this.applyQueryHints(query);
                query.executeUpdate();
            }

        }
    }
  • Our simple logic queries the database for a range of campaigns and gets the IDs and targets for deletion by passing them to the deleteAllByIdInBatch(ids).
    public void deleteAllCampaignsByIds(){
        LocalDate currentDate = LocalDate.now();
        currentDate = currentDate.plusMonths(1);
        List<Campaign> campaigns = campaignRepository.findByStartDateBetween(currentDate,  currentDate.plusDays(10));
        System.out.println(campaigns.size());
        List<Long> ids = campaigns.stream().map(Campaign::getId).toList();
        System.out.println("ids: "+ids);
        campaignRepository.deleteAllByIdInBatch(ids);
    }
  • Generated logs show that the below query was executed. “delete from campaign where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)”
Executing SQL Query: select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?
[select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?]
Hibernate: select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?
11
ids: [23000102, 23000103, 23000104, 23000105, 23000106, 23000107, 23000108, 23000109, 23000110, 23000111, 23000112]

Executing SQL Query: delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)
[select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?, delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)]
Hibernate: delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)
00:06:45.019 INFO  o.h.e.i.StatisticalLoggingSessionEventListener  - Session Metrics {
    11824981 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    22250505 nanoseconds spent preparing 2 JDBC statements;
    189448302 nanoseconds spent executing 2 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    1401303 nanoseconds spent executing 1 flushes (flushing a total of 11 entities and 0 collections);
    8131438 nanoseconds spent executing 1 partial-flushes (flushing a total of 11 entities and 11 collections)
}
  • If we have to delete large batch-size records then we can always paginate them and delete them in sub-batches instead of sending let's say 500k records in one query.

Conclusion

  • We should always keep in mind that operating over batches instead of doing all the work in one go makes sense for databases.
  • Spring Data JPA provides multiple helper methods that can delete records from the database. Depending on the use case we can choose one or the other.
  • Apart from using these helper methods, we can always write our own custom queries since Spring Data JPA allows that too.

Become Better Dev:

To upgrade your developer skills checkout below resources:

Popular Blogs:

Jpa
Hibernate
Deleteall
Bulkdelete
Spring Boot
Recommended from ReadMedium