How to read more than 1000 items from S3 in Java

Summary

This article explains how to retrieve more than 1000 items from an AWS S3 bucket with the AWS SDK for Java by paginating through the results with getNextContinuationToken.

Abstract

The AWS SDK for Java simplifies interactions with S3, but a single list request returns at most 1000 items, so developers must paginate to list everything in a larger bucket. The article points to a separate piece on Spring Boot and Amazon S3 integration, and its code snippets show how to list items with a ListObjectsV2Request and use getNextContinuationToken to fetch further pages. The loop checks whether each response is truncated and keeps requesting pages until all items are listed.

Opinions

The AWS SDK for S3 is recognized for providing API abstractions that facilitate development.
It is noted that the AWS S3 API has a limit of 1000 items per request, necessitating pagination for larger buckets.
The use of getNextContinuationToken and isTruncated is emphasized as essential for retrieving all items beyond the initial 1000.
The author invites discussion on alternative methods for handling more than 1000 items in S3, suggesting there may be multiple valid approaches to this common task.

How to read more than 1000 items from S3 in Java

The AWS SDK for Java provides an API for managing S3 buckets and objects, along with higher-level abstractions that simplify development. A separate article explains how to integrate S3 with a Spring Boot application and make API requests to S3.

A single S3 list request returns at most 1000 items. To list every item in a bucket that holds more than that, we need to work through several pages of API responses.
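As a rough illustration of what "several pages" means, the sketch below estimates how many list requests a bucket of a given size needs. It assumes every page except the last is filled to the limit, so treat the result as a lower bound; `PageCountSketch` and `requestsNeeded` are hypothetical names, not part of the AWS SDK.

```java
final class PageCountSketch {

    /** Minimum number of list requests for totalKeys objects at maxKeysPerPage per response. */
    static long requestsNeeded(long totalKeys, int maxKeysPerPage) {
        if (totalKeys < 0 || maxKeysPerPage <= 0) {
            throw new IllegalArgumentException("totalKeys must be >= 0 and maxKeysPerPage > 0");
        }
        // Ceiling division: a partially filled last page still costs one request.
        long pages = (totalKeys + maxKeysPerPage - 1) / maxKeysPerPage;
        // Even an empty bucket takes one request to find out that it is empty.
        return Math.max(1, pages);
    }

    public static void main(String[] args) {
        System.out.println(requestsNeeded(1000, 1000)); // prints 1
        System.out.println(requestsNeeded(1001, 1000)); // prints 2
        System.out.println(requestsNeeded(2500, 1000)); // prints 3
    }
}
```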

To list the items in an S3 bucket, we can use the following code snippet.

// Returns only the first page of results (at most 1000 items).
ListObjectsV2Request listObjectsV2Request =
        new ListObjectsV2Request().withBucketName(myBucketName);
List<S3ObjectSummary> objectSummaries =
        amazonS3Client.listObjectsV2(listObjectsV2Request).getObjectSummaries();

This snippet works when the bucket holds 1000 items or fewer. If there are more than 1000 items, we need getNextContinuationToken, which returns the token used to fetch the next page of results.

According to the Javadoc, NextContinuationToken is sent when isTruncated is true, meaning there are more keys in the bucket that can be listed. The next list request to Amazon S3 can be continued by providing this NextContinuationToken.
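To make that contract concrete, here is a minimal, self-contained sketch that mimics it with an in-memory list instead of a real bucket. `FakeBucket`, `Page`, and the use of a numeric offset as the token are assumptions made purely for illustration: real S3 tokens are opaque strings, and none of these names belong to the AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;

final class ContinuationTokenSketch {

    /** One page of results; a hypothetical stand-in for ListObjectsV2Result. */
    static final class Page {
        final List<String> keys;
        final String nextContinuationToken; // null when nothing is left to list
        final boolean truncated;

        Page(List<String> keys, String nextContinuationToken, boolean truncated) {
            this.keys = keys;
            this.nextContinuationToken = nextContinuationToken;
            this.truncated = truncated;
        }
    }

    /** In-memory stand-in for a bucket; not part of the AWS SDK. */
    static final class FakeBucket {
        private final List<String> keys = new ArrayList<>();

        FakeBucket(int objectCount) {
            for (int i = 0; i < objectCount; i++) {
                keys.add(String.format("object-%05d", i));
            }
        }

        /** Returns up to maxKeys keys, starting where the token points (null = start). */
        Page list(String continuationToken, int maxKeys) {
            int from = continuationToken == null ? 0 : Integer.parseInt(continuationToken);
            int to = Math.min(from + maxKeys, keys.size());
            boolean truncated = to < keys.size();
            // A token is handed out only when more keys remain.
            String next = truncated ? String.valueOf(to) : null;
            return new Page(new ArrayList<>(keys.subList(from, to)), next, truncated);
        }
    }

    public static void main(String[] args) {
        FakeBucket bucket = new FakeBucket(2500);
        Page first = bucket.list(null, 1000);
        System.out.println(first.keys.size() + " keys, truncated=" + first.truncated);
        Page second = bucket.list(first.nextContinuationToken, 1000);
        Page third = bucket.list(second.nextContinuationToken, 1000);
        System.out.println(third.keys.size() + " keys, truncated=" + third.truncated);
    }
}
```

With 2500 fake objects, the first two pages are full and truncated, and the third carries the remaining 500 keys with no token. Against the real SDK, the equivalent loop looks like this: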

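import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.ArrayList;
import java.util.List;

// Restrict the listing to one bucket and key prefix.
final ListObjectsV2Request listObjectsV2Request = new ListObjectsV2Request();
listObjectsV2Request.setBucketName(myBucketName);
listObjectsV2Request.setPrefix(path);

final List<S3ObjectSummary> objectSummariesList = new ArrayList<>();
ListObjectsV2Result listObjectsV2Result;
do {
    // Fetch one page (at most 1000 summaries) and collect it.
    listObjectsV2Result = amazonS3Client.listObjectsV2(listObjectsV2Request);
    objectSummariesList.addAll(listObjectsV2Result.getObjectSummaries());
    // Carry the token forward so the next request resumes after this page.
    listObjectsV2Request.setContinuationToken(listObjectsV2Result.getNextContinuationToken());
} while (listObjectsV2Result.isTruncated());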

The client passes NextContinuationToken to each subsequent call until isTruncated() returns false for a response.
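The loop above collects every summary into one list, which may be more than you want to hold in memory for a very large bucket. One possible variation, sketched here under assumptions, is a lazy iterator that requests the next page only when the current one runs out. `Page`, `PagedIterator`, and `fakeSource` are hypothetical names for illustration; with the real SDK, the `fetchPage` function would wrap a call to `amazonS3Client.listObjectsV2` after setting the continuation token on the request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class LazyPagingSketch {

    /** Hypothetical page type: items plus the token for the next page (null on the last page). */
    static final class Page<T> {
        final List<T> items;
        final String nextToken;

        Page(List<T> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    /** Iterates over all items, requesting a new page only when the current one is used up. */
    static final class PagedIterator<T> implements Iterator<T> {
        private final Function<String, Page<T>> fetchPage; // token (null = first page) -> page
        private Iterator<T> current = Collections.emptyIterator();
        private String nextToken;
        private boolean exhausted; // true once the last page has been fetched

        PagedIterator(Function<String, Page<T>> fetchPage) {
            this.fetchPage = fetchPage;
        }

        @Override
        public boolean hasNext() {
            // Loop instead of fetching once: a page may be empty yet still carry a token.
            while (!current.hasNext() && !exhausted) {
                Page<T> page = fetchPage.apply(nextToken);
                current = page.items.iterator();
                nextToken = page.nextToken;
                exhausted = (nextToken == null);
            }
            return current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Fake page source over the integers 0..total-1; the token is simply the next offset. */
    static Function<String, Page<Integer>> fakeSource(int total, int pageSize) {
        return token -> {
            int from = token == null ? 0 : Integer.parseInt(token);
            int to = Math.min(from + pageSize, total);
            List<Integer> items = new ArrayList<>();
            for (int i = from; i < to; i++) {
                items.add(i);
            }
            return new Page<>(items, to < total ? String.valueOf(to) : null);
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new PagedIterator<>(fakeSource(2500, 1000));
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count); // prints 2500
    }
}
```

The trade-off is that each item is processed as it arrives and only one page is held at a time, at the cost of network calls happening inside `hasNext()`.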

Have you handled reading more than 1000 items from S3 with other approaches? Let me know in the comments.