I was recently going through my notes of SQL Antipatterns and was shocked to realize how actual this book still is. I'm going to share my summary of the book. I guarantee you'll find at least a few antipatterns in your current database design.
Antipattern: Storing a delimiter-separated value string in a varchar field instead of creating an intersection table. It might be easier, but it makes queries harder. And updating the field is also a pain. And your field might have all kinds of consistency errors (for instance, ids are normally numbers, but now you have varchar. The DB cannot ensure consistency for you).
Legitimate Uses of the Antipattern: when the data you're storing in the varchar is not needed to be used in the queries (gets displayed as-is).
Solution: create an intersection table.
Antipattern: When you have multiple level parent -> child relationship and you represent it using an Adjacency list (you have a parent id in every row). It becomes impossible to get the whole tree in one query (because by default you can only get the next level).
Legitimate Uses of the Antipattern: when you're certain that you will only have 1 level of queries. The alternatives are harder to implement.
Solutions:
Reference: "Joe Celko’s Trees and Hierarchies in SQL for Smarties"
Antipattern: Using a primary key that is not the appropriate primary key for this table (e.g., there is a natural primary key like social security number or there could be a compound key in there). If a table has a unique column then, most probably that could be the primary key.
Solution: declare a primary key on a field that is easy to index and makes the most sense for that particular table
Antipattern: when you don't use the constraints offered by the DB (especially foreign keys).
Legitimate Uses of the Antipattern: never, unless your DB doesn't support it
Solution: always declare constraints
Antipattern: For example, by appending the year onto the base table name, we’ve combined a data value with a metadata identifier.
This is the reverse of mixing data with metadata from the Entity-Attribute-Value and Polymorphic Associations antipatterns. In those cases, we stored metadata identifiers (a column name and table name) as string data.
In Multicolumn Attributes and Metadata Tribbles, we’re making a data value into a column name or a table name. If you use any of these antipatterns, you create more problems than you solve.
Antipattern: when you have a table with the columns: entity_id, attribute_name, attribute_value
It appears when you have OOP inheritance and the children have various fields. If you use it, you lose referential integrity and data type validation.
Legitimate Uses of the Antipattern: you shouldn't use this in a relational DB. Just use a non-relational DB or one of the solutions below.
Solutions:
Antipattern: When you need to reference from a single table, or multiple parent tables (e.g., comments to features/bugs). You're using the pattern if you store in a column the name of your type of parent table.
Solutions:
Antipattern: When an entity's attribute can have multiple values, you create multiple columns called attribute1, attribute2, etc.
Solutions:
Antipattern: When to support scalability and performance, you clone tables or new columns to support new partitions: ex: sales_2013, sales_2014, etc. (tables of the same schema with different entities). Basically, in your database, you will have a table per "some data value".
Legitimate Uses of the Antipattern: when splitting the database sensibly makes database administration tasks easier after the database size passes a certain threshold.
Solutions:
Antipattern: When you use float, double, or any other related DB type. The problem is how the number is represented internally by the DB. It will do all kinds of unpredictable roundings.
Legitimate Uses of the Antipattern: scientific applications.
Solution: use NUMERIC or decimal
Antipattern: When you want to restrict a column to specific values (e.g., the status of a ticket), you define the allowed values in the table's schema or as a trigger.
Legitimate Uses of the Antipattern: when you have very few values that will never change. But you might still get it wrong (e.g., gender).
Solution: create a lookup table.
Antipattern: When you want to store Store Images or Other Bulky Media (when you reference them via the DB), and you use the FS directly, instead of the DB. These need to be specifically backed up, and the DB does not manage them.
Legitimate Uses of the Antipattern: when the data files might affect how your DB behaves.
Solution: Use BLOB Data Types As Needed
Antipattern: When you want to Optimize the performance of the DB, and you throw indexes here and there without thinking too much about it (no indexes, too many indexes, queries that cannot benefit from indexes).
Legitimate Uses of the Antipattern: none
Solution:
Antipattern: thinking that null is just another value. Actually, SQL treats null as a special value, different from zero, false, or an empty string. As a general rule, all operations with null will return null. Or using a value to represent null.
Solution: Treat null as a unique value. Declare Columns NOT NULL when it makes sense. Also, consider if using default makes sense.
Antipattern: What you're already doing: Get Row with Greatest Value per Group. When you're using group by and are trying to get extra information from the row that you selected (like other columns that are not included in the group by expression), in other words, referencing Nongrouped Columns in the select list of the query.
Every column in the select list of a query must have a single value row per row group. This is called the Single-Value Rule. Columns named in the GROUP BY clause are guaranteed to be exactly one value per group, no matter how many rows the group matches.
Legitimate Uses of the Antipattern: none because most DBs will throw an error
Solutions:
Antipattern: When you must fetch a random sample from one of your tables, you sort data randomly -> ORDER BY RAND(). This operation does a full table scan.
Legitimate Uses of the Antipattern: only when you know for sure that the size of the data that you're randomizing will not be bigger than 50-100 rows.
Solutions:
Antipattern: When you want to achieve everything in only one SQL query (solving a complex problem in one step,
Legitimate Uses of the Antipattern: when you're going to use that query as a data source in a 3rd party app
Solutions:
Antipattern: Store Password in Plain Text
Solution: hash your passwords with salt directly in your application.
Antipattern: when writing dynamic SQL queries, you end up executing unverified input as code.
Solutions:
Antipattern: when you want to code fast, completely skip engineering best practices. Make SQL a Second-Class Citizen, not giving the DB the same importance that the code gets in terms of quality.
Solution:
Antipattern: when designing an MVC application and you don't consider enough how to separate the app logic between the M, V, and C. (Own note: I'll probably need to re-read this chapter from the book)
Solution: Active record is a design pattern that maps objects to DB tables. The Model Has an Active Record -> basically, build services that compose the models. Aim to make your model a domain model, not a database model.
Should be considered only in the context of your app. How much do you want to normalize/denormalize? It's a tradeoff. And it would be best if you benchmarked it.
The objectives of normalization:
Antipattern: discarding DB error messages or not looking at the raw SQLs that get run when debugging the application.
Solution: Recover from Errors Gracefully -> log potential exceptions every time.
Antipattern: when you want to reduce typing (like explicitly mentioning all columns in the select list), you use:
Legitimate Uses of the Antipattern: when you're writing queries to try things out
Solution: Name Columns Explicitly
Antipattern: when you want to do a full-text search, you use pattern-matching predicates (e.g., LIKE, REGEXP).
Solution: don't use SQL. Elasticsearch has made this one easy. If you have to use SQL (for simple cases, each vendor has a solution for this already)
Antipattern: when you want to have contiguous keys in a table (e.g., when you have deletions in the DB), you want to fill in the gaps.
Instead of allocating a new primary key value using the automatic pseudo key mechanism, you might want to make any new row use the first unused primary key value Renumbering Existing Rows -> changing the existing keys so that they are contiguous.
Legitimate Uses of the Antipattern: none.
Solution: just let it be. You can see the primary key id as UUIDs. The fact that they are consecutive is just a coincidence of the implementation.
Also published here.