When developing backend services, it’s easy to create problems for yourself if database integration is implemented incorrectly. This article covers some best practices for working with relational databases in modern services and also shows why automatically generating and updating the schema is arguably not a good idea.
I will use Flyway for database migrations, Spring Boot for easy setup, and H2 as an example database.
I won’t cover the basics of what migrations are and how they work; the Flyway documentation has good introductory articles on that.
A long time ago, developers initialized and updated databases by applying scripts manually, separately from the application. Hardly anyone does that these days because it’s hard to keep the database in a proper state that way, and it leads to serious trouble.
Nowadays, developers mostly use two approaches:
Automatic generation, e.g., with JPA/Hibernate - the database is initialized and kept up to date by comparing the entity classes with the current DB state; if changes are needed, they are applied.
Database migrations - developers update the database incrementally with versioned scripts, and the changes are applied automatically on startup.
Also, if we talk about Spring, there’s basic database initialization out of the box, but it’s far less advanced than tools such as Flyway or Liquibase.
To demonstrate how it works, let’s use a simple example: a users table with three fields - id, user_name, and email.
Let’s have a look at the schema automatically generated by Hibernate.
Hibernate entity:
@Entity
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;
}
To keep the schema up to date automatically, we need this line in the Spring Boot config, and Hibernate starts doing it on startup:
spring.jpa.hibernate.ddl-auto=update
And here is the log from Hibernate when the application starts:
Hibernate: create table users (id binary(255) not null, email varchar(128), user_name varchar(64) not null, primary key (id))
After the automatic generation, the id column was created as binary with a maximum size of 255, which is too much because a UUID consists of only 36 characters. So we’d rather use the UUID type instead; however, Hibernate doesn’t generate it that way by default. It can be fixed by adding this annotation:
@Column(name = "id", columnDefinition = "uuid")
However, we’re now writing a SQL definition for the column inside the Java code, which breaks the abstraction between Java and SQL.
And let’s fill the table with some users:
insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);
insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', '[email protected]');
Let’s imagine, for example, that after some time we want to add notifications to our app and, consequently, track whether a user wants to receive them. So we decided to add a column receive_notifications to the users table and make it non-nullable.
That means we add a new field to the Hibernate entity:
@Column(name = "receive_notifications", nullable = false)
private Boolean receiveNotifications;
After starting the app, we see an error in the logs and no new column. That’s because the table is not empty, and we need to set a default value for the existing rows:
Error executing DDL "alter table users add column receive_notifications boolean not null" via JDBC Statement
We can set a default value by adding a SQL column definition again:
columnDefinition = "boolean default true"
And from Hibernate logs, we can see that it worked:
Hibernate: alter table users add column receive_notifications boolean default true not null
However, let’s imagine we needed receive_notifications to have a more complex default, for example, true or false depending on whether the email is filled in or not. It’s impossible to implement that logic with Hibernate alone, so we need migrations anyway.
To sum up, the main drawbacks of the automatically generated and updated schema approach:
It is Java-first and consequently not flexible in terms of SQL: it’s not always predictable and sometimes doesn’t produce the SQL you expect. You can add SQL definitions to guide it, but that’s limited compared to pure SQL DDL.
Sometimes it’s impossible to update existing tables and migrate data this way, and we need SQL scripts anyway. In most cases it ends up as a mix: the schema is updated automatically while migrations are kept for updating data. It’s easier to avoid automatic generation altogether and do everything related to the database layer in migrations.
Also, it’s inconvenient for parallel development because it doesn’t support versioning, and it’s tough to tell what’s going on with the schema.
Here is how it looks without automatic schema generation and updating:
Script for initializing the DB:
resources/db/migration/V1__db_initialization.sql
create table if not exists users
(
id uuid not null primary key,
user_name varchar(64) not null,
email varchar(128)
);
Filling the database with some users:
resources/db/migration/V2__users_some_data.sql
insert into users (id, user_name, email)
values ('297a848d-d406-4055-8a6f-4a4118a44001', 'Artem', null);
insert into users (id, user_name, email)
values ('921a9d42-bf14-4c3f-9893-60f79cdd0825', 'Antonio', '[email protected]');
Adding the new field and setting a non-trivial default value for the existing rows:
resources/db/migration/V3__users_add_receive_notification.sql
alter table users
add column if not exists receive_notifications boolean;
-- Not really safe with a huge amount of data, but fine for this example
update users
set receive_notifications = email is not null;
alter table users
alter column receive_notifications set not null;
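With Spring Boot, Flyway picks these scripts up from resources/db/migration and applies them automatically on startup as soon as flyway-core is on the classpath. Outside of Spring Boot, you could run the same scripts programmatically; here is a minimal sketch, assuming an in-memory H2 database (the URL and credentials are placeholders, not part of the example project):
import org.flywaydb.core.Flyway;

public class MigrationRunner {
    public static void main(String[] args) {
        // Point Flyway at the scripts in resources/db/migration and apply every
        // version that is not yet recorded in the schema history table.
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:h2:mem:users;DB_CLOSE_DELAY=-1", "sa", "")
                .locations("classpath:db/migration")
                .load();
        flyway.migrate();
    }
}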
And nothing stops us from using Hibernate if we choose to. In the configs, we need to set this property:
spring.jpa.hibernate.ddl-auto=validate
Now Hibernate won’t generate anything; it will only check that the Java representation matches the DB. Moreover, we no longer need to mix Java and SQL to guide Hibernate’s automatic generation, so the entity can stay concise and free of extra responsibility:
@Entity
@Table(name = "users")
public class User {

    @Id
    @Column(name = "id")
    @GeneratedValue
    private UUID id;

    @Column(name = "user_name", length = 64, nullable = false)
    private String userName;

    @Column(name = "email", length = 128, nullable = true)
    private String email;

    @Column(name = "receive_notifications", nullable = false)
    private Boolean receiveNotifications;
}
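And the entity works as usual with the rest of the stack, for example through a Spring Data JPA repository (the repository below is not part of the example project, just a sketch of how the entity might be consumed):
import java.util.List;
import java.util.UUID;
import org.springframework.data.jpa.repository.JpaRepository;

// Derived query methods are resolved against the entity mapping, which Hibernate
// has already validated against the real, migration-managed schema on startup.
public interface UserRepository extends JpaRepository<User, UUID> {
    List<User> findByReceiveNotificationsTrue();
}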
It was a lot of information, but I hope you’ll find it helpful. If you use automatic schema generation and updating, keep a close eye on what is going on with the schema, because it can behave unexpectedly, and it’s always a good idea to describe the columns as precisely as possible to guide it.
Next time, though, consider migrations: they relieve the Java entities of extra responsibility and give you a lot of control over the DDL.
To sum up the best practices:
Prefer database migrations to automatic schema generation, and keep Hibernate in validate mode.
Make the migration scripts safe to re-run by using if not exists / if exists, as we did above.
Use the V{version+=1}__description.sql pattern for naming migrations instead of V{datetime}__description.sql. The second one is convenient and helps to avoid version number conflicts in parallel development, but sometimes it’s better to get a naming conflict than to successfully apply migrations without developers controlling the versions.
You can find the fully working example on GitHub.