John Doe just started a new job as a CTO in a Uber-like startup. He will have to scale, disrupt and innovate. But for now, he needs a users table.
John could have met the day-to-day needs in an infinite number of ways. Let’s examine 3 scenarii amongst others (John chooses PostgreSQL).
John creates a users
table with 6 columns: id (uuid, primary key), email (varchar), password (varchar), role (enum: customer, driver, admin), created_at and updated_at (timestamps).
John alters the users
table and adds a phone column (varchar, nullable). Phone is nullable: the users created on day one do not have a phone number, and John will maybe never know their phone numbers.
John alters again the users
table and adds country (varchar, nullable), gender (enum, nullable), and company (varchar, nullable).
Biographies are only for drivers. There are at least two possibilities:
drivers_biographies
table with FK on users
and a bio text column.Each possibility has its pros and cons, John chooses to add a new table because he dislikes having too much NULL and he is certain he never will add biography to other account types. Maybe he is wrong.
Clients can now ask for custom fields on their users. So it’s some kind of dynamic variable properties. John could add nullable new column on demand on its users table (client ask for users’ pets names, John adds a pet_name column). But it does not sound scalable to him, maybe he is wrong.
So, he decided to create two new tables: properties
and users_properties
. The properties
table has 2 columns:
The users_properties
has 3 columns:
_users_
table)_users_properties_
table) Example: the id of “Pet name”Now every time a client ask for new properties on its users, John’s application adds a new entry in properties
table. Then, for each new user with custom properties, some lines are added in users_properties table for its own properties (EAV-like pattern).
There seems to be some problems with this approach:
drivers_biographies
is a one to one table, it’s overkill.users
table and some other (mobile_phone, pet_name) in users_properties
table (with a different schema)? John’s database architecture could be considered inconsistent.users
table has to be nullable: theses columns were added after the first insertions of users, which have no value.
Quick and dirty implementation of the Bigmouth Buffalo’s Path.
Maybe John could have built a better architecture.
John is foresighted. As of Buffalo Mouth’s Path, he creates a users
table with id, email, password and role. But he also creates immediately a users_properties
table (Entity-Attribute-Value model), similar to Wordpress approach. users_properties
is a table with user_id (foreign key on users table), key (varchar, example: name, company, phone, etc.), and value (text).
John is relaxed, he does not have to alter its table schemas. Its program just adds new users in database with their new properties. He created an admin panel to quickly add new properties.
There is one benefit over the previous design. It’s simple to explain and understand, each property is located in one unique table. EAV is a well known pattern, even if John had not heard about it, he would have invented it himself. There are only two tables for describing users metadata, and there will not be more. But there are also caveats:
SELECT name FROM users WHERE age=50 AND gender='f'
becomes:
SELECTname.meta_value AS name,FROMusers_properties age,users_properties name,users_properties genderWHERE age.key = ’age’AND age.value = ’50'AND age.user_id = name.user_idAND gender.user_id = age.user_idAND gender.key = ’gender’ and gender.value = ’f’
Unreadable random table found on Google Images
As many else, John took an interest years ago in “NoSQL”. He tried and abandoned MongoDB to return to its first love, PostgreSQL. He remembers feeling schema-less data can have benefits in some cases, especially in variable metadata. So this time, John creates a users
table with id, email, password, role, metadata (JSONB) created_at and updated_at. Its metadata column is schema-less, he could store objects like:
{"phone": "+33612345678", "company": "My company", "gender": "f"}
Table schema does not change.
There are some benefits with this design over previous paths:
select * from users where metadata->'age'=50
There are some warnings too:
{ "bookings": [{ "id": "aaa-bb-cc", "date": "2017–01–12", "duration": "10 days"},{ "id": "xxx-yy-zz", "date": "2017–02–10", "duration": "1 day"},]}
In my tinkerer career, I created users
table following these three paths (not precisely, but similar) and some other: we could think about dynamically adding column and tables, changing for another DBMS, etc. For the last few months, I’ve been following the third path. I’m sure there is no « right » way, and this path has many hidden caveats too (I should read more) but I’m OK with it for my day-to-day work. I migrated a legacy system with 60+ half-useless tables to about 10 tables by removing metadata tables. Not sure it’s better anyway, and I’m not saying less table is better, but in this specific case, I think code and database are easier to read, create, update and delete. I know I will discover a new path in a few month and be ashamed of what I have done.
Feel free to comment with advices, feedback and criticism. I would be really happy to learn more.
Sorry for long post, here is a potato