This article will walk through how we correctly persist static & media files for a Django application hosted on Heroku. As a bonus, it will also explain how we can satisfy the additional constraint of specifying private versus public media files based on model definitions.
Before I begin, this post builds on Michael Herman's TestDriven.io article, which was written a while back. I refer to it often when setting up my projects and have built some extra functionality on top of it over the years. I decided to create a more focused post that covers Heroku & Bucketeer with these extra features after helping an individual on StackOverflow.
So without further ado, let's first dive into what static & media files are and how Heroku dynos manage their filesystem.
If you are working with a Django project, then you inevitably have all of your Python application code written in a bunch of .py files. These are the code paths of your application, and the end-user - hopefully - never actually sees these files or their contents.
Outside of these business-logic files, it is common to serve files to users directly from your server's file system. Django doesn't need to run any code for these static files; the framework looks up the file and returns its contents for the requesting user to view.
Some examples of static files include:
Non-templated HTML
CSS & JavaScript files to make your page look nice
User profile pictures
Generated PDFs
Media files in Django are a particular variant of static files. Media files are read from the server's file system as well. Unlike static files, though, they are usually uploaded by users or generated by your application and are associated with a model's FileField or ImageField. In the examples above, user profile pictures and generated PDFs are typical examples of media files.
When a new media file is uploaded to a Django web application, the framework looks at the DEFAULT_FILE_STORAGE setting to determine how to store that file. By default, it uses the django.core.files.storage.FileSystemStorage class, which is what most projects start with. This implementation looks at the MEDIA_ROOT setting defined in the settings.py file and copies the uploaded file contents to a deterministically-created file path under that MEDIA_ROOT.
For example, if MEDIA_ROOT is set to /var/www/media, all uploaded files will be copied and written to a location under /var/www/media/.
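To make the path handling concrete, here is a minimal sketch of how a file ends up under MEDIA_ROOT. It is not Django's actual implementation: the helper name local_media_path and the avatars upload directory are hypothetical, and the sketch only mirrors the idea that the field's upload path and the file name are appended to MEDIA_ROOT.

```python
from pathlib import PurePosixPath


def local_media_path(media_root: str, upload_to: str, filename: str) -> str:
    """Approximate where FileSystemStorage would write an uploaded file.

    Hypothetical helper for illustration only; Django's real logic also
    sanitizes file names and avoids collisions.
    """
    return str(PurePosixPath(media_root) / upload_to / filename)


print(local_media_path("/var/www/media", "avatars", "me.png"))
# /var/www/media/avatars/me.png
```

The important takeaway is that everything lands on the server's own disk, which is exactly what becomes a problem in the next section.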
Storing these static files on your server's disk file system is okay until you start to work with a containerization platform such as Heroku. To explain why this is the case, it helps to take a step back.
When downloading files on your personal computer, it's okay that these get written to the file system - usually under ~/Downloads or somewhere similar. This works because you expect your computer's file system to persist across restarts and shutdowns; if you download a file and restart your computer, that downloaded file should still be there once the computer has finished restarting.
Heroku uses containerization to execute customer workloads. One fact of this environment is that the associated file systems do not persist across restarts and reschedules. Heroku dynos are ephemeral, and they can be destroyed, restarted, and moved without any warning, which replaces the associated filesystem. This means that any uploaded files referenced by FileFields and ImageFields are deleted without a trace every time the dyno is restarted, moved, or scaled.
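As a loose analogy, the sketch below uses a temporary directory to stand in for a dyno's disk. This is not how Heroku is implemented, and the function name simulate_dyno_restart is hypothetical; it only shows that a file written to a throwaway filesystem is gone once that filesystem is replaced.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def simulate_dyno_restart(filename: str, content: bytes) -> Tuple[bool, bool]:
    """Write a file to a throwaway directory, then discard the directory."""
    with tempfile.TemporaryDirectory() as dyno_disk:
        uploaded = Path(dyno_disk) / filename
        uploaded.write_bytes(content)
        existed_before_restart = uploaded.exists()
    # Leaving the "with" block deletes the directory, much like a dyno
    # being replaced with a fresh filesystem.
    existed_after_restart = uploaded.exists()
    return existed_before_restart, existed_after_restart


print(simulate_dyno_restart("report.pdf", b"example bytes"))
# (True, False)
```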
I will be stepping through the process of configuring the Django application for Heroku & S3-compatible storage, but feel free to reference the repository below for the complete code to browse through.
https://github.com/dstarner/django-heroku-static-file-example
This tutorial aims to help you retrofit an existing Django project with S3-compatible storage, but I'll quickly go through the steps I used to set up the example Django application. It may help those new to Django & Heroku or those who encounter bugs following the rest of the setup process.
You can view the tagged project before the storage change at commit 299bbe2.
The Django project is named example
It uses poetry for dependency management
The Django code lives in the example package, and the manage.py file is in the root. I've always found this structure cleaner than the Django apps defined in the project root.
It uses the django-heroku package to automatically configure ALLOWED_HOSTS, DATABASE_URL, and more. This reduces the headache of deploying Django on Heroku considerably.
A Procfile that runs a gunicorn process for managing the WSGI application
An app.json is defined with some fundamental configuration values and resources defined for the project to work
A release process definition in the Procfile and an associated scripts/release.sh script that runs staticfile collection and database migrations
Before we can start managing static and media files, the Django application needs a persistent place to store the files. Again, we can look to Heroku's extensive list of Add-Ons for S3-compatible storage. Our choice will be one called Bucketeer.
Heroku's Bucketeer add-on provides an AWS S3 storage bucket to upload and download files for our application. The Django application will use this configured bucket to store files uploaded to the server and download them from S3 when a user requests the files.
If you'd like to learn more about AWS S3, the widely-popular data storage solution that Bucketeer is built upon, you can read the S3 user documentation.
It is worth mentioning that the base plan for Bucketeer - Hobbyist - is $5 per month. If you plan on spinning up the one-click example posted above, it should only cost a few cents if you proactively destroy the application when you are done using it.
To include the Bucketeer add-on in our application, we can configure it through the Heroku CLI, the web dashboard, or the project's app.json file. We will use the third method and include the add-on in an app.json file.
If the project does not have one already, we can create the basic structure listed below, with the critical part being the addition of the "addons" configuration. This array defines the "bucketeer:hobbyist" resource that our application will use, and Heroku will install the add-on into our application if it does not already exist. We also include the "as" keyword, which prefixes the associated configuration variables with the term BUCKETEER. This prefix keeps the generated configuration variable names deterministic because, by default, Heroku will generate the prefix as a random color.
{
  // ... rest above
  "addons": [
    // ...other addons...
    {
      "plan": "bucketeer:hobbyist",
      "as": "BUCKETEER"
    }
  ]
}
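As a rough illustration of what the "as" value buys us, the sketch below checks an environment mapping for the prefixed config vars that the settings code in this tutorial reads. The helper missing_bucketeer_vars is hypothetical, and the four suffixes are only the ones used in this article; the add-on may define others.

```python
import os

# Suffixes of the config vars this tutorial's settings.py reads.
# Assumption: the add-on may set more variables than these four.
BUCKETEER_SUFFIXES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BUCKET_NAME",
    "AWS_REGION",
)


def missing_bucketeer_vars(env, prefix="BUCKETEER"):
    """Return the expected config var names that are absent from env."""
    expected = [f"{prefix}_{suffix}" for suffix in BUCKETEER_SUFFIXES]
    return [name for name in expected if name not in env]


if __name__ == "__main__":
    # Prints an empty list when every expected variable is present.
    print(missing_bucketeer_vars(os.environ))
```

Because the prefix is fixed to BUCKETEER, a check like this can be hard-coded instead of having to discover a randomly generated name.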
With the required resources being defined, we can start integrating with our storage add-on.
The django-storages package is a collection of custom, reusable storage backends for Django. It aids immensely in saving static and media files to different cloud & storage provider options. One of the supported storage providers is S3, which our Bucketeer add-on is built on. We will leverage the S3 django-storages backend to handle different file types.
Begin by installing the django-storages package and the related boto3 package used to interface with AWS's S3. We will also lock our dependencies to ensure poetry and our Heroku deployment continue to work as expected.
poetry add django-storages boto3 && poetry lock
Then, just like most Django-related packages, django-storages will need to be added to the project's INSTALLED_APPS in the project's settings.py file. This will allow Django to load the appropriate code as the application starts up.
# example/config/settings.py
INSTALLED_APPS = [
    # ... django.X.Y apps above
    'storages',
    # ... custom project apps below
]
We will return to the settings.py file later to configure the usage of django-storages, but before that can be done, we will implement three custom storage backends:
A storage backend for static files - CSS, JavaScript, and publicly accessible images - that are stored in version control - aka git - and shipped with the application
A public storage backend for dynamic media files that are not stored in version control, such as uploaded files and attachments
A private storage backend for dynamic media files that are not stored in version control and that require extra access to be viewed, such as per-user reports and potentially profile images. Files managed by this backend require an access key and will block access to those without a valid key.
We can extend django-storages's S3Boto3Storage storage backend to create these. The following code can be copied and pasted directly into your project. The different settings attributes read in the module will be written shortly, so do not expect this code to work if you import it right now.
# FILE: example/utils/storage_backends.py
from django.conf import settings
from storages.backends.s3boto3 import S3Boto3Storage


class StaticStorage(S3Boto3Storage):
    """Used to manage static files for the web server"""
    location = settings.STATIC_LOCATION
    default_acl = settings.STATIC_DEFAULT_ACL


class PublicMediaStorage(S3Boto3Storage):
    """Used to store & serve dynamic media files with no access expiration"""
    location = settings.PUBLIC_MEDIA_LOCATION
    default_acl = settings.PUBLIC_MEDIA_DEFAULT_ACL
    file_overwrite = False


class PrivateMediaStorage(S3Boto3Storage):
    """
    Used to store & serve dynamic media files using access keys
    and short-lived expirations to ensure more privacy control
    """
    location = settings.PRIVATE_MEDIA_LOCATION
    default_acl = settings.PRIVATE_MEDIA_DEFAULT_ACL
    file_overwrite = False
    custom_domain = False
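For a rough sense of what the location attribute in the classes above does, the sketch below joins a location prefix with a file's relative name to form an object key in the bucket. This is a simplification rather than the django-storages implementation, and the helper name object_key and the sample file names are hypothetical.

```python
import posixpath


def object_key(location: str, name: str) -> str:
    """Approximate the bucket key for a file stored under a location prefix."""
    return posixpath.join(location, name)


print(object_key("media/public", "logos/acme.png"))
# media/public/logos/acme.png
print(object_key("media/private", "reports/2022-q1.pdf"))
# media/private/reports/2022-q1.pdf
```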
The attributes listed in each storage backend class perform the following:
location: This dictates the parent directory used in the S3 bucket for associated files. This is concatenated with the generated path provided by a FileField or ImageField's upload_to option.
default_acl: This dictates the access policy required for reading the files, through values of None, public-read, and private. django-storages and the S3Boto3Storage parent class will translate these into object policies.
file_overwrite: In most cases, it's better not to overwrite existing files if we update a specific path. With this set to False, a unique suffix will be appended to the path to prevent naming collisions.
custom_domain: Disabled here, but you can enable it if you want to use AWS's CloudFront and django-storages to serve from it.
With our storage backends defined, we can configure them to be used in different situations via the settings.py file. However, it is challenging to use S3 and these different cloud storage backends while in development, and I've always been a proponent of keeping all resources and files "local" to the development machine, so we will create a logic path that will:
Use the local filesystem to store static and media files by default, for convenience. The Django server will be responsible for serving these files directly.
Use the custom S3 storage backends when an environment variable is enabled. We will use the S3_ENABLED variable to control this, enabling it in our Heroku configuration variables.
First, we will assume that you have a relatively vanilla settings.py file concerning the static- & media-related variables. For reference, a new project should have a block that looks similar to the following:
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/4.0/howto/static-files/
STATIC_URL = 'static/'
STATIC_ROOT = BASE_DIR / 'collected-static'
We will design a slightly advanced control flow that will seamlessly handle the two cases defined above. In addition, it will provide enough control to override each part of the configuration as needed.
Since there are already default values for the static file usage, we can add default values for media file usage. These will be used when serving files locally from the server while in development mode.
STATIC_URL = '/static/'
STATIC_ROOT = BASE_DIR / 'collected-static'
MEDIA_URL = '/media/'
MEDIA_ROOT = BASE_DIR / 'collected-media'
To begin the process of including S3, let's create the controls to manage whether we serve static & media files from the local server or through the S3 storage backend. We will create three variables:
S3_ENABLED: controls whether media & static files should use S3 storage by default
LOCAL_SERVE_MEDIA_FILES: controls whether media files are served from the local server instead of S3. Defaults to the negated S3_ENABLED value
LOCAL_SERVE_STATIC_FILES: controls whether static files are served from the local server instead of S3. Defaults to the negated S3_ENABLED value
from decouple import config  # import explained below

# ...STATIC and MEDIA settings here...

# The following configs determine if files get served from the server or an S3 storage
S3_ENABLED = config('S3_ENABLED', cast=bool, default=False)
LOCAL_SERVE_MEDIA_FILES = config('LOCAL_SERVE_MEDIA_FILES', cast=bool, default=not S3_ENABLED)
LOCAL_SERVE_STATIC_FILES = config('LOCAL_SERVE_STATIC_FILES', cast=bool, default=not S3_ENABLED)

if (not LOCAL_SERVE_MEDIA_FILES or not LOCAL_SERVE_STATIC_FILES) and not S3_ENABLED:
    raise ValueError('S3_ENABLED must be true if either media or static files are not served locally')
In the example above, we are using the python-decouple package to make it easier to read and cast environment variables to Python variables. I highly recommend this package when working with settings.py configurations. We also include a value check to ensure consistency across these three variables. If the values conflict with one another, for example when LOCAL_SERVE_MEDIA_FILES is False while S3_ENABLED is not set, the program will raise an error.
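To see how the three switches interact, here is a small sketch that reproduces the same defaulting and consistency check against a plain dictionary instead of real environment variables. The function resolve_storage_flags is hypothetical, and its boolean parsing is a simplification of what python-decouple does.

```python
def resolve_storage_flags(env: dict) -> tuple:
    """Return (S3_ENABLED, LOCAL_SERVE_MEDIA_FILES, LOCAL_SERVE_STATIC_FILES)."""

    def as_bool(name, default):
        # Simplified parsing; python-decouple's bool cast is stricter.
        raw = env.get(name)
        if raw is None:
            return default
        return raw.strip().lower() in {"1", "true", "yes", "on", "y", "t"}

    s3_enabled = as_bool("S3_ENABLED", False)
    local_media = as_bool("LOCAL_SERVE_MEDIA_FILES", not s3_enabled)
    local_static = as_bool("LOCAL_SERVE_STATIC_FILES", not s3_enabled)

    if (not local_media or not local_static) and not s3_enabled:
        raise ValueError(
            "S3_ENABLED must be true if either media or static files are not served locally"
        )
    return s3_enabled, local_media, local_static


print(resolve_storage_flags({}))                      # (False, True, True)
print(resolve_storage_flags({"S3_ENABLED": "True"}))  # (True, False, False)
```

Setting only LOCAL_SERVE_MEDIA_FILES to False, without enabling S3, is the conflicting case that raises the ValueError.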
We can now start configuring the different configuration variables required by our file storage backends based on those control variables' value(s). We begin by including some S3 configurations required whether we are serving static, media, or both types of files.
if S3_ENABLED:
    AWS_ACCESS_KEY_ID = config('BUCKETEER_AWS_ACCESS_KEY_ID')
    AWS_SECRET_ACCESS_KEY = config('BUCKETEER_AWS_SECRET_ACCESS_KEY')
    AWS_STORAGE_BUCKET_NAME = config('BUCKETEER_BUCKET_NAME')
    AWS_S3_REGION_NAME = config('BUCKETEER_AWS_REGION')
    AWS_DEFAULT_ACL = None
    AWS_S3_SIGNATURE_VERSION = config('S3_SIGNATURE_VERSION', default='s3v4')
    AWS_S3_ENDPOINT_URL = f'https://{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com'
    AWS_S3_OBJECT_PARAMETERS = {'CacheControl': 'max-age=86400'}
The above defines some of the variables required by the django-storages S3 backend and sets the values to environment configurations that are provided by the Bucketeer add-on. As previously mentioned, all of the add-on environment variables are prefixed with BUCKETEER_. The S3_SIGNATURE_VERSION environment variable is not required and most likely does not need to be included.
With the S3 configuration together, we can reference the LOCAL_SERVE_MEDIA_FILES and LOCAL_SERVE_STATIC_FILES control variables to override the default static and media file settings if they should be served via S3.
if not LOCAL_SERVE_STATIC_FILES:
    STATIC_DEFAULT_ACL = 'public-read'
    STATIC_LOCATION = 'static'
    STATIC_URL = f'{AWS_S3_ENDPOINT_URL}/{STATIC_LOCATION}/'
    STATICFILES_STORAGE = 'example.utils.storage_backends.StaticStorage'
Notice the last line where STATICFILES_STORAGE is set to the custom backend we created. That ensures it follows the location & ACL (Access Control List) policies that we configured initially. With this configuration, all static files will be placed under /static/ in the bucket, but feel free to update STATIC_LOCATION if desired.
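As a quick sanity check of the URL that results from these settings, the sketch below builds the same string outside of Django. The bucket name my-example-bucket is made up for illustration, and s3_base_url is a hypothetical helper that simply repeats the f-string logic from the settings above.

```python
def s3_base_url(bucket_name: str, location: str) -> str:
    """Build the base URL for a location prefix inside a bucket."""
    endpoint = f"https://{bucket_name}.s3.amazonaws.com"
    return f"{endpoint}/{location}/"


print(s3_base_url("my-example-bucket", "static"))
# https://my-example-bucket.s3.amazonaws.com/static/
```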
We can configure a very similar situation for media files.
if not LOCAL_SERVE_MEDIA_FILES:
    PUBLIC_MEDIA_DEFAULT_ACL = 'public-read'
    PUBLIC_MEDIA_LOCATION = 'media/public'
    MEDIA_URL = f'{AWS_S3_ENDPOINT_URL}/{PUBLIC_MEDIA_LOCATION}/'
    DEFAULT_FILE_STORAGE = 'example.utils.storage_backends.PublicMediaStorage'

    PRIVATE_MEDIA_DEFAULT_ACL = 'private'
    PRIVATE_MEDIA_LOCATION = 'media/private'
    PRIVATE_FILE_STORAGE = 'example.utils.storage_backends.PrivateMediaStorage'
The big difference here is that we have configured two different storage backends for media files: one for publicly accessible objects and one for objects that require an access token. When the file is requested, this token will be generated internally by django-storages, so you do not have to worry about anonymous public access.
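One way to tell the two kinds of URL apart, for example in a test, is to look for signing parameters in the query string. The sketch below assumes Signature Version 4 (the s3v4 default configured earlier), where presigned URLs carry parameters such as X-Amz-Signature and X-Amz-Expires. The helper looks_presigned and both sample URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def looks_presigned(url: str) -> bool:
    """Return True if the URL carries Signature Version 4 query parameters."""
    query = parse_qs(urlsplit(url).query)
    return "X-Amz-Signature" in query and "X-Amz-Expires" in query


# Made-up URLs for illustration; real signatures are much longer.
public_url = "https://my-example-bucket.s3.amazonaws.com/media/public/logo.png"
private_url = (
    "https://my-example-bucket.s3.amazonaws.com/media/private/report.pdf"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc123"
)

print(looks_presigned(public_url))   # False
print(looks_presigned(private_url))  # True
```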
Since we will have S3_ENABLED set to False in our local development environment, it will serve static and media files locally through the Django server instead of from S3. We will need to configure the URL routing to handle this scenario. We can configure our urls.py file to serve the appropriate files like so:
from django.conf import settings
from django.conf.urls.static import static
from django.contrib import admin
from django.urls import path

urlpatterns = [
    path('admin/', admin.site.urls),
]

if settings.LOCAL_SERVE_STATIC_FILES:
    urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)

if settings.LOCAL_SERVE_MEDIA_FILES:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
This will locally serve the static or media files based on the values of the LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES settings variables we defined.
We can enable these storages and our add-on in the app.json file to start using these storage backends. This will effectively disable LOCAL_SERVE_STATIC_FILES and LOCAL_SERVE_MEDIA_FILES to start serving both via S3 when deployed to Heroku.
{
  // ...rest of configs...
  "env": {
    // ...rest of envs...
    "S3_ENABLED": {
      "description": "Enable to upload & serve static and media files from S3",
      "value": "True"
    }
  }
}
By default, Django will use the PublicMediaStorage class for uploading media files, meaning the contents will be publicly accessible to anyone with the link. However, a model can utilize the PrivateMediaStorage backend when desired, which will create short-lived access tokens that prevent the public from viewing the associated object.
Below is an example of using public and private media files on the same model.
from django.db import models

from example.utils.storage_backends import PrivateMediaStorage


class Organization(models.Model):
    """A sample Organization model with public and private file field usage"""

    logo = models.ImageField(help_text='A publicly accessible company logo')
    expense_report = models.FileField(
        help_text='The private expense report requires a short-lived access token',
        storage=PrivateMediaStorage(),  # will create private files
    )
You can see the code for this complete example at commit 265becc. This configuration will allow your Django project to scale efficiently on Heroku using Bucketeer.
In a future post, we will discuss how to upload and set these files using vanilla Django & Django REST Framework.
As always, if you find any bugs, issues, or unclear explanations, please reach out to me so I can improve the tutorial & experience for future readers.
Take care, everyone!