Django is a popular framework that you can select to develop an application for your company. But what if you want to create a SaaS application that multiple clients will use? What architecture should you choose? Let’s see how this task can be approached.
The most straightforward approach is to create a separate instance for each client you have. Let’s say we have a Django application and a database. Then, for each client, we need to run its own database and application instance. That means that each application instance has only one tenant.
This approach is simple to implement: you just need to start a new instance of every service that you have. But at the same time, it can cause a problem: each client will significantly increase the cost of the infrastructure. It may not be a big deal if you plan to have just a few clients or if each instance is tiny.
However, let’s assume that we are building a large company that provides a corporate messenger to 100,000 organizations. Imagine how expensive it can be to duplicate the whole infrastructure for each new client! And when we need to update the application version, we have to deploy it for each client, so deployments slow down too.
There is another approach that can help when we have a lot of clients for the application: a multi-tenant architecture. It means that we have multiple clients, which we call tenants, but they all use a single instance of the application.
While this architecture solves the problem of the high cost of dedicated instances for each client, it introduces a new problem: how can we be sure that the client’s data is securely isolated from other clients?
We will discuss the following approaches:
Using a shared database and shared database schema: We can identify which tenant owns the data by the foreign key that we need to add to each database table.
Using a shared database, but separate database schemas: This way, we won’t need to maintain multiple database instances but will get a good level of tenant data isolation.
Using separate databases: This looks similar to the single-tenant example, but it isn’t the same, as we will still use a shared application instance and select which database to use by checking the tenant.
Let’s dive deeper into these ideas and see how to integrate them with the Django application.
This option may be the first that comes to mind: to add a ForeignKey to the tables, and use it to select appropriate data for each tenant. However, it has a huge disadvantage: the tenants’ data is not isolated at all, so a small programming error can be enough to leak the tenant’s data to the wrong client.
Let’s take an example of a database structure from the Django documentation:
from django.db import models


class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")


class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
We’ll need to identify which records are owned by which tenant. So, we need to add a Tenant table and a foreign key in each existing table:
from django.db import models


class Tenant(models.Model):
    name = models.CharField(max_length=200)


class Question(models.Model):
    # Every row records which tenant owns it.
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")


class Choice(models.Model):
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
To simplify the code a little bit, we can create an abstract base model that every other model we create will inherit from.
from django.db import models


class Tenant(models.Model):
    name = models.CharField(max_length=200)


class BaseModel(models.Model):
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)

    class Meta:
        # Abstract: no table is created for BaseModel itself.
        abstract = True


class Question(BaseModel):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")


class Choice(BaseModel):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
As you can see, there are at least two major risks here: a developer can forget to add a tenant field to the new model, or a developer can forget to use this field while filtering the data.
The source code for this example can be found on GitHub: https://github.com/bp72/django-multitenancy-examples/tree/main/01_shared_database_shared_schema.
Keeping in mind the risks of the shared schema, let’s consider another option: the database will still be shared, but we’ll create a dedicated schema for each tenant. For the implementation, we can look at a popular library, django-tenants (documentation).
Let’s add django-tenants to our small project (the official installation steps can be found in the django-tenants documentation).
The first step is the library installation via pip:
pip install django-tenants
Change the models: the Tenant model will now be in a separate app, and the Question and Choice models won’t have a connection with the tenant anymore. As different tenants’ data will be in separate schemas, we won’t need to link the individual records with the tenant rows anymore.
The file tenants/models.py:
from django.db import models
from django_tenants.models import TenantMixin, DomainMixin


class Tenant(TenantMixin):
    name = models.CharField(max_length=200)

    # default true, schema will be automatically created and synced when it is saved
    auto_create_schema = True


class Domain(DomainMixin):  # a required table for django-tenants too
    ...
The file polls/models.py:
from django.db import models


class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")


class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
Notice that Question and Choice don’t have a foreign key to Tenant anymore!
The other thing that changed is that Tenant is now in a separate app. This is not only about separating the domains: the tenants tables have to be stored in the shared schema, while the polls tables will be created in each tenant schema.
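To make the split between shared and per-tenant tables more concrete, here is a small illustration that does not use PostgreSQL or django-tenants at all. It emulates schemas with SQLite’s `ATTACH DATABASE`, which also lets several namespaces hold tables with the same name. The table names mirror the ones Django would generate for the apps above, but treat the whole block as an analogy under that assumption rather than as what the library does internally.

```python
import sqlite3

# Autocommit mode keeps ATTACH outside of any implicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# The main database plays the role of the shared "public" schema.
conn.execute("CREATE TABLE tenants_tenant (schema_name TEXT PRIMARY KEY)")

# Trusted constants, so formatting them into SQL is acceptable in this sketch.
TENANT_SCHEMAS = ("tenant1", "tenant2")

for schema in TENANT_SCHEMAS:
    # Each attached in-memory database stands in for one tenant schema.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(
        f"CREATE TABLE {schema}.polls_question "
        "(id INTEGER PRIMARY KEY, question_text TEXT)"
    )
    conn.execute("INSERT INTO tenants_tenant VALUES (?)", (schema,))

# A row written for tenant1 is invisible when reading tenant2's table.
conn.execute(
    "INSERT INTO tenant1.polls_question (question_text) VALUES (?)",
    ("Only for tenant 1",),
)
tenant1_rows = conn.execute(
    "SELECT question_text FROM tenant1.polls_question"
).fetchall()
tenant2_rows = conn.execute(
    "SELECT question_text FROM tenant2.polls_question"
).fetchall()
```

The list of tenants exists once, while each tenant gets its own copy of the polls table, so a query against one tenant’s table cannot return another tenant’s rows even if the application code forgets a filter.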
Make changes to the settings.py file to support multiple schemas and tenants:
DATABASES = {
    'default': {
        'ENGINE': 'django_tenants.postgresql_backend',
        # ..
    }
}

DATABASE_ROUTERS = (
    'django_tenants.routers.TenantSyncRouter',
)

MIDDLEWARE = (
    'django_tenants.middleware.main.TenantMainMiddleware',
    #...
)

TEMPLATES = [
    {
        #...
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.request',
                #...
            ],
        },
    },
]

SHARED_APPS = (
    'django_tenants',  # mandatory
    'tenants',  # you must list the app where your tenant model resides in
    'django.contrib.contenttypes',
    # everything below here is optional
    'django.contrib.auth',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.admin',
)

TENANT_APPS = (
    # your tenant-specific apps
    'polls',
)

INSTALLED_APPS = list(SHARED_APPS) + [app for app in TENANT_APPS if app not in SHARED_APPS]

TENANT_MODEL = "tenants.Tenant"
TENANT_DOMAIN_MODEL = "tenants.Domain"
Next, let’s create and apply the migrations:
python manage.py makemigrations
python manage.py migrate_schemas --shared
As a result, we’ll see that the public schema will be created and will contain only shared tables.
We’ll need to create a default tenant for the public schema:
python manage.py create_tenant --domain-domain=default.com --schema_name=public --name=default_tenant
Set is_primary to True if asked.
And then, we can start creating the real tenants of the service:
python manage.py create_tenant --domain-domain=tenant1.com --schema_name=tenant1 --name=tenant_1
python manage.py create_tenant --domain-domain=tenant2.com --schema_name=tenant2 --name=tenant_2
Notice that there are now two more schemas in the database, and each of them contains the polls tables.
Now, you’ll get the Questions and Choices from different schemas when you call APIs on the domains that you set up for the tenants - all done!
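The tenant is chosen per request from the domain it arrived on. django-tenants does this in its middleware using the Domain table; the sketch below only illustrates the idea with a hypothetical in-memory mapping built from the domains created above. The function name `schema_for_host` is an assumption for this example and is not part of the library.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for the rows of the Domain table created above.
DOMAINS: Mapping[str, str] = {
    "default.com": "public",
    "tenant1.com": "tenant1",
    "tenant2.com": "tenant2",
}


def schema_for_host(host: str, domains: Mapping[str, str] = DOMAINS) -> Optional[str]:
    """Map a Host header value to a schema name, ignoring port and case.

    IPv6 literals such as "[::1]:8000" are not handled in this sketch.
    """
    hostname = host.split(":", 1)[0].strip().lower()
    # None means "unknown domain"; a real application would answer with a 404.
    return domains.get(hostname)
```

For example, a request sent to `tenant1.com:8000` resolves to the `tenant1` schema, while an unknown domain resolves to nothing and should not be served any tenant’s data.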
Although the setup looks more complicated, and may be even harder if you are migrating an existing app, the approach still has significant advantages, such as the security of the data.
The code of the example can be found here.
The last approach that we will discuss today is going even further and having separate databases for the tenants.
This time, we’ll have a few databases. We’ll store the shared data, such as the mapping of tenants to database names, in default_db and create a separate database for each tenant.
Then we’ll need to set the databases config in the settings.py:
DATABASES = {
    'default': {
        'NAME': 'default_db',
        # ...
    },
    'tenant_1': {
        'NAME': 'tenant_1',
        # ...
    },
    'tenant_2': {
        'NAME': 'tenant_2',
        # ...
    },
}
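Writing one entry per tenant by hand gets tedious as the number of tenants grows. One possible way to keep the settings short is to generate the entries from a list of tenant names, as in the sketch below. The helper `build_databases` is hypothetical, the PostgreSQL engine is only an example choice, and connection details such as host and credentials are deliberately left out.

```python
from typing import Dict, Iterable


def build_databases(
    tenant_names: Iterable[str],
    engine: str = "django.db.backends.postgresql",
) -> Dict[str, dict]:
    """Build a DATABASES-style dict: one shared entry plus one per tenant."""
    databases = {"default": {"ENGINE": engine, "NAME": "default_db"}}
    for name in tenant_names:
        if name in databases:
            raise ValueError(f"duplicate or reserved database alias: {name!r}")
        # The alias and the database name are the same, as in the example above.
        databases[name] = {"ENGINE": engine, "NAME": name}
    return databases


DATABASES = build_databases(["tenant_1", "tenant_2"])
```

Because settings are read when the application starts, the list of names would have to come from something available at that point, such as an environment variable or a file, and adding a tenant would still require a restart under this scheme.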
And now, we’ll be able to get the data for each tenant by calling the using QuerySet method:
Question.objects.using('tenant_1')…
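Passing the database alias at every call site is easy to forget. Django’s database routers offer a way to make that choice in one place. The following is a minimal sketch, not a complete solution: the `current_db_alias` context variable and the `SHARED_APP_LABELS` set are assumptions made for this example, and something (for instance a middleware) would still have to set the variable for each request.

```python
from contextvars import ContextVar

# Hypothetical: alias of the tenant database for the request being handled.
current_db_alias: ContextVar[str] = ContextVar("current_db_alias", default="default")

# Assumption: apps whose tables live only in default_db.
SHARED_APP_LABELS = {"tenants"}


class TenantRouter:
    """Database router that sends tenant apps to the active tenant database."""

    def _alias_for(self, app_label):
        if app_label in SHARED_APP_LABELS:
            return "default"
        return current_db_alias.get()

    def db_for_read(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database.
        return obj1._state.db == obj2._state.db

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Shared apps migrate only on "default", tenant apps only elsewhere.
        return (db == "default") == (app_label in SHARED_APP_LABELS)
```

The router would be registered through the `DATABASE_ROUTERS` setting with its dotted path. With the rule above, any app that is not listed in `SHARED_APP_LABELS` is treated as tenant-specific, so the set needs to be adjusted to match which apps really are shared.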
The downside of the method is that you’ll need to apply all migrations on each database by using:
python manage.py migrate --database=tenant_1
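Running that command by hand for every tenant does not scale, so it is usually wrapped in a loop. Below is a small sketch of one way to do it. The helper names `migrate_commands` and `migrate_all` are hypothetical, and the script assumes it is started from the directory that contains manage.py.

```python
import subprocess
import sys
from typing import Iterable, List


def migrate_commands(aliases: Iterable[str], manage_py: str = "manage.py") -> List[List[str]]:
    """Build one `migrate --database=<alias>` command per database alias."""
    return [
        [sys.executable, manage_py, "migrate", f"--database={alias}"]
        for alias in aliases
    ]


def migrate_all(aliases: Iterable[str]) -> None:
    """Run the migrations database by database, stopping at the first failure."""
    for command in migrate_commands(aliases):
        # check=True raises CalledProcessError if a migration run fails.
        subprocess.run(command, check=True)
```

Calling `migrate_all(["default", "tenant_1", "tenant_2"])` would then apply the migrations to each database in turn. Stopping at the first failure keeps the remaining databases on the old version instead of leaving an unknown mix, but it also means a deployment has to cope with tenants that are temporarily on different schema versions.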
It may also be less convenient to create a new database for each tenant, compared to using django-tenants or just a foreign key as in the shared schema approach.
On the other hand, the isolation of the tenant’s data is really good: the databases can be physically separated. Another advantage is that we aren’t limited to PostgreSQL, as django-tenants requires; we can select any engine that suits our needs.
More information on the multiple databases topic can be found in the Django documentation.
| | Single-tenant | MT with shared schema | MT with separate schema | MT with separate databases |
|---|---|---|---|---|
| Data isolation | ✅ High | ❌ Lowest | ✅ High | ✅ High |
| Risk of leaking data accidentally | ✅ Low | ❌ High | ✅ Low | ✅ Low |
| Infrastructure cost | ❌ Higher with each tenant | ✅ Lower | ✅ Lower | ✅❌ Lower than single-tenant |
| Deployment speed | ❌ Lower with each tenant | ✅ | ✅❌ Migrations will be slower as they need to be executed for each schema | ✅❌ Migrations will be slower as they need to be executed for each database |
| Easy to implement | ✅ | ❌ Requires a lot of changes if the service was already implemented as a single-tenant app | ✅ | ✅ |
To summarize all of the above, it looks like there is no silver bullet for the problem. Each approach has its pros and cons, so it’s up to the developers to decide which trade-offs they can accept.
Separate databases provide the best isolation for the tenant’s data and are simple to implement; however, they cost more to maintain: there are n databases to update, and the number of database connections is higher.
A shared database with separate schemas is a bit complex to implement and might have some problems with migrations.
Single-tenant is the simplest to implement, but it costs you in resource over-consumption, since you have an entire copy of your service per tenant.