Loading Data In Bulk: Multi Insert vs Copy
Comparing bulk loading data with multi-insert and copy
Introduction
- When it comes to loading bulk data into a database table, we have two options: we can load it with a multi-valued INSERT or with the COPY command.
- In this article, we will discuss these two options and compare their execution times.
Table
- We have a book_reviews table that records reviews from different users for different books.
postgres=# \d book_reviews
                                   Table "public.book_reviews"
   Column    |  Type   | Collation | Nullable |                     Default
-------------+---------+-----------+----------+--------------------------------------------------
 reviews_id  | integer |           | not null | nextval('book_reviews_reviews_id_seq'::regclass)
 user_id     | text    |           |          |
 isbn        | text    |           |          |
 book_rating | text    |           |          |
Indexes:
    "book_reviews_pkey" PRIMARY KEY, btree (reviews_id)
Multi-values Insert
- Data is inserted with the INSERT command, which inserts one row at a time; the two statements below are equivalent.
INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);
- But it is inefficient to load bulk data this way, since each row requires its own INSERT statement to be parsed and executed.
- With a multi-valued INSERT, we only need to parse and execute the query once. For example,
INSERT INTO products (product_no, name, price) VALUES
(1, 'Cheese', 9.99),
(2, 'Bread', 1.99),
(3, 'Milk', 2.99);
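If the row values come from user input, concatenating them into the SQL string risks injection; the same multi-valued statement can instead be built with bind-parameter placeholders and the values bound through PreparedStatement. A minimal sketch (the MultiValuesSql helper is hypothetical, not part of the article's code):

```java
import java.util.Collections;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MultiValuesSql {

    // Build "insert into t (c1, c2) values (?,?),(?,?),..." for rowCount rows,
    // so the actual values can be bound with PreparedStatement.setXxx calls
    // instead of being concatenated into the SQL text.
    public static String build(String table, String[] columns, int rowCount) {
        String placeholders =
                "(" + String.join(",", Collections.nCopies(columns.length, "?")) + ")";
        return "insert into " + table
                + " (" + String.join(", ", columns) + ") values "
                + IntStream.range(0, rowCount)
                        .mapToObj(i -> placeholders)
                        .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // insert into products (product_no, name, price) values (?,?,?),(?,?,?)
        System.out.println(build("products", new String[]{"product_no", "name", "price"}, 2));
    }
}
```

The whole statement is still parsed once, so the performance characteristics stay the same as the literal multi-valued INSERT above.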
- In the example below, we construct a multi-valued INSERT command and execute it as a single prepared statement.
public static void insertAll() throws SQLException {
    // Generate one million rows of test data.
    List<BookReview> bookReviews = IntStream.rangeClosed(1, 1000000)
            .mapToObj(i -> getBookReview())
            .toList();
    String query = "insert into book_reviews (user_id, isbn, book_rating) values ";
    // Concatenating values into the SQL is acceptable here only because the
    // data is generated; user-supplied data should use bind parameters.
    String values = bookReviews.stream()
            .map(a -> "('" + a.getUserId() + "','" + a.getIsbn() + "','" + a.getBookRating() + "')")
            .collect(Collectors.joining(","));
    try (Connection connection = getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(query + values)) {
        long start = System.currentTimeMillis();
        long rows = preparedStatement.executeUpdate();
        connection.commit();
        long end = System.currentTimeMillis();
        System.out.println("inserted " + rows + " rows in " + (end - start) + " milliseconds");
    }
}
private static BookReview getBookReview() {
BookReview bookReview = new BookReview();
String seed = UUID.randomUUID().toString();
bookReview.setIsbn("isbn"+seed);
bookReview.setBookRating("2.3");
bookReview.setUserId(seed);
return bookReview;
}
private static Connection getConnection() throws SQLException {
Connection connection = DriverManager.getConnection("jdbc:postgresql://localhost/postgres",
$user, $pass);
if(connection != null){
System.out.println("Connected to database");
}
connection.setAutoCommit(false);
return connection;
}
- The run takes a total of 20.9 seconds for 1,000,000 rows, roughly 48,000 rows per second.
- Although a multi-valued INSERT is faster than row-by-row inserts, we can do better with the COPY command.
Copy
- We can use a single COPY command to load bulk data into the table, which is more performant than both plain INSERT and multi-valued INSERT.
- COPY has some built-in optimizations that make it fast (such as lower commit cost, less network latency, fewer context switches, etc.).
- In the example below, we load bulk data using the COPY command.
public static void copyAll() throws SQLException, IOException {
    // Generate one million rows of test data.
    List<BookReview> bookReviews = IntStream.rangeClosed(1, 1000000)
            .mapToObj(i -> getBookReview())
            .toList();
    // One CSV line per row; the whole payload is fed to COPY ... FROM STDIN.
    String values = bookReviews.stream()
            .map(LoadDataWithCopy::toCsv)
            .collect(Collectors.joining("\n"));
    try (Connection connection = getConnection()) {
        CopyManager copyManager = ((PGConnection) connection).getCopyAPI();
        InputStream is = new ByteArrayInputStream(values.getBytes(StandardCharsets.UTF_8));
        long start = System.currentTimeMillis();
        long rows = copyManager.copyIn(
                "COPY book_reviews(user_id, isbn, book_rating) FROM STDIN delimiter ',' csv encoding 'UTF-8'",
                is);
        long end = System.currentTimeMillis();
        System.out.println("copied " + rows + " rows in " + (end - start) + " milliseconds");
    }
}
public static Connection getConnection() throws SQLException {
Connection connection = DriverManager.getConnection("jdbc:postgresql://localhost/postgres",
$user, $pass);
if(connection != null){
System.out.println("Connected to database");
}
return connection;
}
private static BookReview getBookReview() {
BookReview bookReview = new BookReview();
String seed = UUID.randomUUID().toString();
bookReview.setIsbn("isbn"+seed);
bookReview.setBookRating("2.3");
bookReview.setUserId(seed);
return bookReview;
}
private static String toCsv(BookReview a) {
    // Plain comma delimiter, no surrounding spaces: a space after the comma
    // would become part of the next field's value.
    return a.getUserId() + "," + a.getIsbn() + "," + a.getBookRating();
}
Conclusion
- If we need to load bulk data into a table, we should consider using COPY over single-row or multi-valued INSERT.
- The COPY command has built-in optimizations that make it a good option for saving execution time.