3,793 reads

Spring and PostgreSQL: Make Your Database Inserts 30 Times Faster

by Sergei DzeboevOctober 18th, 2023

Too Long; Didn't Read

This article explores the integration of PostgreSQL's "upsert" operation within the Spring framework, showcasing a substantial increase in data handling efficiency. While abstraction eases transitions between implementations, it often neglects unique database features. By leveraging upsert, developers can enjoy enhanced performance. However, this method is not without its limitations, being restricted to flat structures and exclusive to PostgreSQL. Through examples and code snippets, the article guides readers on implementing this approach, emphasizing a balance between its incredible performance and inherent constraints.

featured image - Spring and PostgreSQL: Make Your Database Inserts 30 Times Faster

One popular stack for Java backend applications combines Spring with PostgreSQL for the database. If you’re utilizing this pairing, then this article is specifically tailored for you. Even if you're not, some insights here could prove useful.

Explore how to optimize data inserts by creating a convenient query generator that handles the heavy lifting, freeing you to focus on routine data-saving functionalities.

Efficiency of Unique Database Features

While abstraction offers the advantage of simplifying transitions between implementations, it can also result in the loss of unique database features. This is because abstraction necessitates reliance on universally applicable functions, negating the benefits of specialized features.

The standard implementation of JPA in Spring exemplifies this limitation. Although its default methods operate flawlessly, they aren't always the most efficient. This raises a pertinent question: why select a database if one doesn't fully leverage its distinct capabilities?

It’s comparable to owning a high-performance car but only tapping into a fraction of its potential, perhaps constrained by the apprehension of an imminent model change.

An example of efficiency is the so-called "upsert" operation in the PostgreSQL database. This operation is primarily used for inserts, but if an entry already exists in the database, an update is performed instead - hence, the term "upsert," a blend of "update" and "insert."

Tests indicate that upsert operations are, on average, 30 times faster than the standard save methods employed by JPA repositories. However, writing a distinct upsert query for each object can be a tedious task. In the following sections, we'll explore how to devise a custom repository capable of automatically generating and executing upsert queries for any object.

@Entity and @Table for Entity Verification

Since we are working with an Entity, we will actively use annotations to generate the script. To execute the query, we will use NamedParameterJdbcTemplate from Spring. This class enables easy execution of queries with a script and a list of elements to be written to the database.

Let’s begin by creating a method that reads the @Table and @Entity annotations to ensure we are working with the appropriate class:

private Table extractTableAnnotation(Class<?> aClass) {
    Table tableAnnotation = aClass.getAnnotation(Table.class);
    Entity entityAnnotation = aClass.getAnnotation(Entity.class);
    if (isNull(entityAnnotation)) {
        throw new UnsupportedOperationException("Query can only be generated for Entities");
    }

    if (isNull(tableAnnotation)) {
        throw new UnsupportedOperationException("Query can be generated only for Entities annotated with @Table");
    }
    return tableAnnotation;
}

Extracting Table Path

Next, we will extract the table name and schema from the @Table annotation, giving us the full path to the table. If the table name is not specified, we'll use the class name, converting it to a lower underscore case.

If the schema is not provided, we'll use only the table name, defaulting to the standard schema:

private String extractTablePath(Table tableAnnotation, String className) {
        var table = tableAnnotation.name();

        if (!StringUtils.hasText(table)){
            table =  toLowerUnderscoreCase(className);
        }

        if (StringUtils.hasText(tableAnnotation.schema())) {
            return tableAnnotation.schema() + "." + table;
        }
        return table;
    }

Column and Field Information Extraction

Next, gather information about the columns. Before doing this, prepare a method that converts the field name to the column name by changing the camel case to the underscore case. The following regular expression will aid in this conversion:

"(?<=[a-z])[A-Z]"

Create a pattern field with this value:

private final Pattern camelCasePattern = Pattern.compile("(?<=[a-z])[A-Z]");

Now, using this pattern, create a simple method:

private String toLowerUnderscoreCase(String camelCaseText) {
         return camelCasePattern.matcher(camelCaseText).replaceAll(match -> "_" + match.group())
                 .toLowerCase();
     }

Proceed to collect fields and columns. The column name will be derived from the @Column annotation for each field. If the annotation is absent or the name value is not provided, use the above method to generate the column name.

Additionally, extract the table’s ID using the @Id annotation, which is mandatory. To facilitate data transfer, create a small nested record:

private record EntityFieldsDescription(List<String> columns, List<String> fields, String id) {}

With all the necessary tools in place, proceed to extract all required names:

private EntityFieldsDescription extractEntityInfo(Class<?> aClass) {
    Field[] declaredFields = aClass.getDeclaredFields();
    List<String> columns = new LinkedList<>();
    List<String> fields = new LinkedList<>();
    var id = "id";
    for (Field field : declaredFields) {
        String fieldName = field.getName();

        Column columnAnnotation = field.getAnnotation(Column.class);
        String columnName;
        if (Objects.isNull(columnAnnotation) || !StringUtils.hasText(columnAnnotation.name())) {
            columnName = toLowerUnderscoreCase(fieldName);
        } else {
            columnName = columnAnnotation.name();
        }

        columns.add(columnName);
        fields.add(fieldName);

        if (Objects.nonNull(field.getAnnotation(Id.class))) {
            id = columnName;
        }
    }

    return new EntityFieldsDescription(columns, fields, id);
}

Generating the Upsert Script

After collecting all the necessary data, we can proceed to create a method for generating the script. With the table name, columns, and ID at hand, we will prepare an upsert script, taking cues from the PostgreSQL documentation.

According to the documentation, the syntax for upsert is akin to insert, distinguished by the additional ON CONFLICT command at the end. This command specifies the actions to take if a database record with the same ID already exists – either do nothing or update specific columns.

We'll opt for the latter. In case of an ID conflict, our script will entirely update the record. The EXCLUDED keyword marks the new values, facilitating the update of column values. With this knowledge, we can draft a method to generate the essential elements:

private String generateUpsertQuery(EntityFieldsDescription fieldsDescription, String tableSchema) {
    return "INSERT INTO " +
            tableSchema +
            " (" +
            String.join(",", fieldsDescription.columns()) +
            ") VALUES (" +
            String.join(",", fieldsDescription.fields().stream().map(s -> ":" + s).toList()) +
            ") ON CONFLICT (" +
            fieldsDescription.id() +
            ") DO UPDATE SET " +
            String.join(",", fieldsDescription.columns().stream().map(s -> s + " = EXCLUDED." + s).toList()) +
            ";";
}

Here’s an example of a complete, generated script:

INSERT INTO students (id,first_name,last_name,email,date_of_birth) 
VALUES (:id,:firstName,:lastName,:email,:dateOfBirth) 
ON CONFLICT (id) 
DO UPDATE 
SET id = EXCLUDED.id,
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
email = EXCLUDED.email,
date_of_birth = EXCLUDED.date_of_birth;

Bringing It All Together

With the methods to obtain all necessary parts now prepared, it’s time to consolidate them into a single method. This consolidated method will accept any Class<?> as a parameter, enabling dynamic script generation for any Entity:

public String generateUpsertQuery(Class<?> aClass) {
    Table tableAnnotation = extractTableAnnotation(aClass);
    EntityFieldsDescription fieldsDescription = extractEntityInfo(aClass);
    String tablePath = extractTablePath(tableAnnotation, aClass.getSimpleName());

    return generateUpsertQuery(fieldsDescription, tablePath);
}

Final Step: Executing the Upsert Operation

There's just a little bit left to do. After naming the script and setting the Entity, we can execute this operation to write the data to the database. Inject NamedParameterJdbcTemplate using @RequiredArgsConstructor from Lombok to create a private final field:

private final NamedParameterJdbcTemplate template;

Then, add a public method that accepts a List of any objects. Upon receiving the first element from the list, the method will generate the necessary script and execute an upsert into the database:

public <T> void upsertAll(List<T> values) {
    if (!CollectionUtils.isEmpty(values)) {
        Class<?> aClass = values.get(0).getClass();
        template.batchUpdate(generateUpsertQuery(aClass), values.stream().map(BeanPropertySqlParameterSource::new)
                .toArray(BeanPropertySqlParameterSource[]::new));
    }
}

Testing the Upsert

All the functionalities have been written and are ready for testing. Let’s create a Student Entity and compare the write speed with that of the default JPA repository:

@Data
@Entity
@Table(name = "students")
public class Student {

    @Id
    private Integer id;

    @Column(name = "first_name")
    private String firstName;

    private String lastName;

    @Column(name = "email")
    private String email;

    @Column
    private LocalDate dateOfBirth;
}

In the test, we will randomly generate 10k to 100k values and write them to the database using both the default and our custom repository. Each entry's ID will be created based on its position in the list. This approach simulates a scenario of an ID conflict to initiate the row update:

private long testExecutionTime() {
        List<Student> students = generateStudentList(random.nextInt(10_000, 100_000));

        long upsertSavingMillis = saveUsingUpsert(students);
        long repoSavingMillis = saveUsingRepository(students);

        Assertions.assertThat(upsertSavingMillis).isLessThan(repoSavingMillis);

        long rate = repoSavingMillis / upsertSavingMillis;
        System.out.printf("Upsert is %d times faster\n", rate);

        return rate;
    }

Let's repeat this operation several times to measure the average performance:

@Test
    void shouldFindExecutionSpeedForUpsertAndRepoSaving() {
        List<Long> testResults = new LinkedList<>();
        for (int i = 1; i <= 5; i++) {
            System.out.println("____________Run #" + i + "______________");
            testResults.add(testExecutionTime());
            System.out.println("____________Run #" + i + "______________\n");
        }

        int average = (int) testResults.stream().mapToDouble(value -> value).average().orElseThrow();
        System.out.printf("Upsert is %d times faster on average\n", average);
    }

Let’s see the results:

____________Run #5______________
Saving using upsert:
	Time: 322 ms
	Students Processed: 17459
Saving using repository:
	Time: 11625 ms
	Students Processed: 17459
Upsert is 36 times faster
____________Run #5______________

Upsert is 33 times faster on average

As a result, on average, the upsert script delivers performance that is 33 times better, clearly making it a preferable choice over the standard JPA method.

Pros and Cons of the Upsert Approach

Of course, this approach has its drawbacks. The script will not work with nested entities and is only suitable for flat structures. However, if you are certain that you have exactly the same data format to save, there is no reason not to utilize the more efficient upsert.

Below are the pros and cons of this approach compared to JPA:

Pros:

Incredible performance
Capability to generate a script for any entity

Cons:

Limited to flat structures
Exclusive to PostgreSQL

Conclusion

Utilizing a database's specific features and tricks often leads to enhanced performance. It is typically more advantageous to embrace these features rather than avoid them for solution universality. Implementing a straightforward solution can significantly boost performance and elevate your database experience.

The complete code is available on my GitHub page.