The Clean Database Firstly, this article speaks about . And secondly, about - Online Transaction Processing databases. It means that the article is going to describe some important principles of how to design your database to be fast, flexible, and easy to use for usual business operations. Note that business operations for an OLTP database mean that each operation usually needs just one entity. So, a clean OLTP database means that your database is correctly designed for a SOLID code. relational databases OLTP databases The text briefly explains some crucial points in database design. Single Responsibility Principle for the table Remember that . It is easier to understand when you think about how to organize your entities and DAO classes. If your tables in the database are not SRP compliant you cannot easily follow SRP in DAO classes. So, the first, and probably the main, rule in designing an OLTP database is fully relying on the SRP principle. all objects in the databases should comply with the Single Responsibility Principle And sometimes it is not so obvious how to prepare your tables according to SRP. Let’s check on a table of clients: At the first glance, this looks like it is all right. But looking closer, we can see that this table consists of two different sets of fields: the first set is just for client login operations and the second set contains contact info that is usually required by another set of operations. Going deeper - you also can have some service accounts that actually don’t need addresses or login_attempts. Also, you can add some additional fields in the future like last_activity, presige_level, and so on. And for each type of operation, you need to retrieve all data for each client even if it is a login attempt or an email search, for notification purposes. It is not just more difficult for the database but it also makes your objects heavier and codes more cumbersome. Furthermore, some databases (for example PostgreSQL) rewrite the whole row on each update operation even if you change just a single boolean field. It means that if your table consists of 100 columns instead of 10 it would write 10 times more data on each update. And write more often than if you split the table. At the end, in the worst scenarios, the database writes dozens of times more data compared to small tables. How to design it right? For example: Or the client-address relation may be many-to-many. In that situation, you also can use the address table for other entities in the system and use the same functionality and UI to update the addresses of different objects. In the end, we have a different set of operations, for each set of operations we manipulate a different set of data. And it is more important that in general, our operations generate less workload on disks under the database. Also, each operation is blocked by other operations much less since from the database perspective they work with different tables. In the end, we have a more flexible set of tables, they generate less workload on the database and give us the wriggle room to write more SRP-compliant code. One additional benefit is that our tables are just way lighter and each query in the future migrations will be executed much faster. ALTER TABLE Single Responsibility Principle for the column The same principle works for columns as well. Let’s look at the next table: Sometimes developers decide to add additional functionality to tables to avoid migration. In the example above there was the table with operation types and . To avoid migration someone decided to add an additional flag and put it to the field operation_type. For example b2b flag. And now we have , , , and . received sent received sent b2b received b2b sent From some point of view, it could be reasonable but what happens next? For example, we need to add an operation bonus. Should we add ” and ”? What if a bonus can be received and also sent? Should it be 8 possible values? (2 for and ) * (2 for and ) * (2 for and ). How it should be managed in the logic? bonus b2b bonus received sent b2b non-b2b bonus non-bonus We should never mix different functionality and flag in a single field. NULLs There is an important rule - . Let’s look at the table again: avoid NULLs transaction These fields look like some enum. They are also only updated on status change operations and are needed in the next usual operation or status changes. If we need to add 4 more statuses? Add 4 more columns? So why do we store that data in such an important and heavy table as ? It could be changed to a set of 2 tables: _on transaction In that structure, we literally extract half of the data outside of the core table without losing any data. Also, we add some new opportunities to add some additional data. For example, we can add and log users who change the status of a transaction. user_id The Clean Database There are still other points about designing databases to optimize operations but this is a topic for another article. Designing a clean database is a very important part of an engineer’s work. When the database is already populated with data - it is much harder to change the database structure. The worse a database is organized the harder it is to keep the design clean. So . do it as early as possible