Learn Why and How to Use Relational Database Migrations  by@artemsutulov


Artem Sutulov

I'm a professional FullStack Software Engineer, currently working for Revolut as Software Engineer (Backend).


When developing backend services, it’s easy to create problems if database integration is implemented incorrectly. This article covers some best practices for working with relational databases in modern services and shows why automatically generating and updating the schema is arguably not a good idea.


I will use Flyway for database migrations, Spring Boot for easy setup, and H2 as an example database.
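With Spring Boot, Flyway runs automatically at startup once it is on the classpath and a datasource is configured. Here is a minimal sketch of the setup; the property names are the standard Spring Boot ones, and the in-memory H2 URL is just an example:

```properties
# Example application.properties - assumes flyway-core and h2
# are on the classpath alongside spring-boot-starter-data-jpa
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driver-class-name=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=

# Default location for versioned scripts; shown here for clarity
spring.flyway.locations=classpath:db/migration
```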


This article doesn’t cover the basics of what migrations are and how they work; Flyway’s documentation has good introductory articles on that.


The problem

A long time ago, developers initialized and updated databases by applying scripts separately from the application. Hardly anyone does that these days because it’s hard to develop and keep in a proper state, which leads to severe trouble.


Nowadays, developers mostly use two approaches:


  1. Automatic generation, e.g., JPA or Hibernate - the database is initialized and kept up to date by comparing the classes with the current DB state; if changes are needed, they are applied.

  2. Database migrations - developers incrementally update the database with versioned scripts, and the changes are applied automatically on startup.

    Also, if we talk about Spring, there’s a basic database initialization out of the box, but it’s way less advanced than its analogs such as Flyway or Liquibase.


Hibernate automatic generation

To demonstrate how it works, let’s use a simple example: a users table with three fields - id, user_name, and email:


Users table representation



Let’s have a look at the table automatically generated by Hibernate.

Hibernate entity:

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;
}

To keep the schema up to date, we need this line in the Spring Boot config; Hibernate then updates the schema on startup:

spring.jpa.hibernate.ddl-auto=update

And the log from Hibernate when the application starts:

Hibernate: create table users (id binary(255) not null, email varchar(128), user_name varchar(64) not null, primary key (id))


After automatic generation, the id column is binary(255). A maximum size of 255 bytes is far more than needed: a UUID is a 128-bit (16-byte) value, and even its canonical string form is only 36 characters. So we’d rather use the uuid type, but Hibernate doesn’t generate it that way by default. This can be fixed by adding a column definition to the annotation:

@Column(name = "id", columnDefinition = "uuid")


However, now we’re writing an SQL definition into the column mapping, which breaks the abstraction between Java and SQL.
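For context, a quick check of why binary(255) is oversized (plain Java, independent of Hibernate):

```java
import java.util.UUID;

public class UuidSize {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        // Canonical text form: 32 hex digits + 4 hyphens = 36 characters
        System.out.println(id.toString().length());
        // The raw value is 128 bits = 16 bytes, so binary(16) would suffice
        System.out.println(2 * Long.BYTES);
    }
}
```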


And let’s fill the table with some users:

insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);
insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', '[email protected]');

Adding a new column

Let’s imagine, for example, that after some time we want to add notifications to our app, and consequently track if a user wants to receive them. So we decided to add a column receive_notifications to table users and make it non-nullable.


Users table after adding a new column



That means that in Hibernate entity, we add the new column:

@Column(name = "receive_notifications", nullable = false)
private Boolean receiveNotifications;


After starting the app, we see an error in the logs and no new column. That’s because the table is not empty, and we need to set a default value for the existing rows:

Error executing DDL "alter table users add column receive_notifications boolean not null" via JDBC Statement


We can set a default value by adding SQL column definition again:

columnDefinition = "boolean default true"


And from Hibernate logs, we can see that it worked:

Hibernate: alter table users add column receive_notifications boolean default true not null


However, suppose we needed receive_notifications to be something more complex - for example, true or false depending on whether the email is filled. It’s impossible to implement that logic with Hibernate alone, so we need migrations anyway.


More complex default value for receive_notifications



To sum up, the main drawbacks of the automatically generated and updated schema approach:


  1. It is Java-first and consequently not flexible in terms of SQL: it is unpredictable and sometimes doesn’t produce the SQL you expect. You can add SQL definitions to steer it, but that is limited compared to pure SQL DDL.

  2. Sometimes it’s impossible to update existing tables and migrate their data, so we need SQL scripts anyway. In most cases, this ends up as automatic schema updating plus migrations for updating data. It’s always easier to avoid automatic generation and do everything related to the database layer in migrations.


    Also, it’s not convenient for parallel development because it doesn’t support versioning, and it’s tough to tell what’s going on with the schema.

Solution

Here is how it looks without automatically generating and updating schema:


Script for initializing DB:

resources/db/migration/V1__db_initialization.sql

create table if not exists users
(
    id        uuid        not null primary key,
    user_name varchar(64) not null,
    email     varchar(128)
);


Filling database with some users:

resources/db/migration/V2__users_some_data.sql

insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);

insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', '[email protected]');


Adding the new field and setting the not-trivial default value to existing rows:

resources/db/migration/V3__users_add_receive_notification.sql

alter table users
    add column if not exists receive_notifications boolean;

-- It's not really safe with a huge amount of data, but good enough for the example
update users
set users.receive_notifications = email is not null;

alter table users
    alter column receive_notifications set not null;


And nothing stops us from using Hibernate if we choose to. In the configs, we need to set this property:

spring.jpa.hibernate.ddl-auto=validate


Now Hibernate won’t generate anything; it will only check that the Java representation matches the DB. Moreover, we no longer need to mix Java and SQL to steer Hibernate’s generation, so the entity can be concise and free of extra responsibility:

@Entity
@Table(name = "users")
public class User {
    @Id
    @Column(name = "id")
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;

    @Column(name = "receive_notifications", nullable = false)
    private Boolean receiveNotifications;
}

How To Use Migrations Right

  1. Every migration must be idempotent, meaning that if a migration is applied several times, the database state stays the same. If we ignore that, we can end up with errors after rollbacks, or with partially applied changes that lead to failures. Idempotency can usually be achieved by adding checks like if not exists / if exists, as we did above.
  2. When writing some DDL, it’s better to add as much as reasonably possible in one migration, not create several ones. The main reason is readability. It’s way better if related changes, made in one pull request, are in one file.
  3. Don’t change already existing migrations. It’s an obvious but required one. Once migration is written, merged, and deployed it must stay untouched. Some related changes must be done in a separate one.
  4. Each developer needs a separate database environment, usually a local one. If migrations under development are applied to a shared environment, failures will follow later because of the way migration tools track applied versions.
  5. It’s convenient to have integration tests that run all the migrations against a test database and check that everything works. This is really handy in builds that check a PR before merging, and a lot of elementary mistakes can be avoided. The example on GitHub includes integration tests that do this out of the box.
  6. It’s better to use the V{version+1}__description.sql pattern for naming migrations than V{datetime}__description.sql. The timestamp-based one is convenient and avoids version-number conflicts in parallel development. But sometimes it’s better to have a naming conflict than to successfully apply migrations without developers controlling the versions.
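As a sketch of why the numeric naming scheme matters, here is roughly how a tool like Flyway resolves the apply order from V{version}__description.sql file names. This is a hypothetical simplified resolver for illustration, not Flyway’s actual code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MigrationOrder {
    // Matches names like V3__users_add_receive_notification.sql
    private static final Pattern NAME = Pattern.compile("V(\\d+)__\\w+\\.sql");

    // Returns the scripts sorted by numeric version (so V10 comes after V2,
    // unlike a plain lexicographic sort). Versions must be unique, which is
    // exactly why naming conflicts in parallel development are a useful signal.
    public static List<String> applyOrder(List<String> files) {
        List<String> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((String f) -> {
            Matcher m = NAME.matcher(f);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad migration name: " + f);
            }
            return Integer.parseInt(m.group(1));
        }));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(applyOrder(List.of(
            "V3__users_add_receive_notification.sql",
            "V1__db_initialization.sql",
            "V2__users_some_data.sql")));
    }
}
```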

Conclusion


That was a lot of information, but I hope you find it helpful. If you use automatic schema generation/updating, take a close look at what is going on with the schema, because it can behave unexpectedly, and it’s always a good idea to add as much definition detail as possible to steer it.


But next time, consider migrations instead: they relieve Java entities of excess responsibility and give you much more control over the DDL.


To sum up best practices:

  • Write idempotent migrations.
  • Test all migrations together on a test database by writing integration tests.
  • Include related changes into one file.
  • Each developer needs their own DB environment.
  • Take a close look at versions when writing migrations.
  • Don’t change already existing ones.


You can find the fully working example on GitHub.
