Code Quality Guardians: The Power of Continuous Integration in Software Development

Written by hnduke | Published 2023/10/04
Tech Story Tags: software-engineering | software-development | continuous-integration | code-quality | ci-cd-pipeline | code-testing | codebase-defense | devops


In the ongoing epic struggle between developers and buggy software, we need all of the tools at our disposal to hold back the ravening hordes of runtime and logic errors.

But here is a warrior who holds back the evil armies of Tech Debt even as they infiltrate our own ranks:

Continuous Integration.

Why focus on the “CI” of the CI/CD pipeline?

Continuous Delivery (CD) offers great benefits, but on its own it also carries substantial risk. A bug written at 9:35 a.m. might break production before lunch and then spend the whole afternoon demoralizing the team and possibly losing sales.

Continuous Integration (CI) is the complementary tool designed to make that scenario far less likely, because a failed CI run means the CD never happens. If CI fails, the code shouldn’t even be allowed to merge into the main branch.

The CI process augments the code reviewer’s power to protect the repository by offering some concrete standards. It frees human eyes to focus on other things like trade-off choices and how the problem was approached.

It helps keep the project in a maintainable state. Developers don’t have to be afraid to refactor when needed, even in a less familiar part of the code. A good CI process also enforces a certain level of uniform coding styles, reducing the risk of a big hodge-podge.

It also reduces unpleasant surprises. A developer might run checks locally and then confidently check in their commits, believing all is well, but surprise! Sometimes, the commits don’t behave as well in a non-local environment.

In short, you might think of CI as the warrior who stands tall and strong between mistakes and production.

Caveats

An epic hero fending off an entire army of bugs and errors needs to be properly outfitted. Such a warrior needs the finest armor and sharpest weapons to be effective — like a well-designed test suite that covers every edge case.

But if your static checks ignore too many things or if your tests have been neglected, you may be sending out your hero with only a jacket and a baseball bat. It’s better than nothing, surely, but that poor hero will have an awfully hard time defending the castle.

Deploying your CI

Different projects will have different quality check needs, but at a high level, you will want to do two things, sketched below:

  1. Run the static analysis tools for your project’s language(s)
  2. Run your tests and ideally enforce a degree of coverage
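
As a rough sketch (generic placeholders only; the concrete workflow for my project is built up step by step below), those two requirements translate into something like this:

name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Static analysis
      run: make lint    # placeholder for your linters of choice
    - name: Tests with a coverage gate
      run: make test    # placeholder, e.g. pytest --cov --cov-fail-under=90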

Choose your tool

There are a number of CI/CD tools available, including Travis CI, Jenkins, CircleCI, AWS CodePipeline, and more. The major repository hosting sites also have their own built-in tooling.

My most recent project was a Django website with an app named “enterprises.” I decided to create my CI by writing a YAML workflow for GitHub Actions.

Define the events

on: [push, pull_request]

Since it’s just me working on the project, I don’t need anything elaborate. I have defined only two events for triggering the CI workflow: push and pull requests. Whenever a PR is created, and after any pushes to that PR, the workflow will run. It also runs whenever the main branch is updated.
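
If the duplicate runs ever became a nuisance (an unfiltered push trigger fires for every branch, so PR branches get checked twice), the push event could be restricted to the main branch, something like:

on:
  push:
    branches: [main]    # run on pushes to main only; PRs still get checked
  pull_request: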

Define the jobs

A GitHub Action job is a set of steps to be taken, and any workflow can have one or more jobs. Jobs will run in parallel by default but can be configured to wait if one job is dependent on the outcome of another.
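
For example, if I later added a (purely hypothetical) deploy job, I could make it wait on the build job with the needs keyword:

jobs:
  build:
    runs-on: ubuntu-latest
    # ... build steps ...
  deploy:
    needs: build            # runs only after "build" has succeeded
    runs-on: ubuntu-latest
    # ... deployment steps ...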

First, I gave my job a name (I settled on “build”).

jobs:
  build:

And an operating system.

jobs:
  build:
    runs-on: ubuntu-latest

My project is a website, so there’s no need to support multiple operating systems.
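
If I ever did need to test across several platforms or Python versions, GitHub Actions supports a build matrix; a minimal sketch (not part of my workflow) would look like:

jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
        python-version: ["3.10", "3.11"]
    runs-on: ${{ matrix.os }}    # each os/python combination gets its own job run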

Define the services

Not all jobs will need a service, but I chose to set up a PostgreSQL service for my tests.

The ability to run tests locally using SQLite is wonderful. It’s lightweight and requires no maintenance. However, since the occasional problem does pop up due to database differences, I wanted a more production-like environment for my CI.

services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

There’s nothing fancy going on here; it’s almost a perfect copy/paste from GitHub’s documentation. To run your tests in a Postgres database, you’ll need to

  • Specify the image you need. In my case, GitHub will pull and spin up the postgres:15 Docker image.
  • Set up your environment variables.
  • Map the ports you plan to use.
  • And then add any other options you choose, like health checks.

Define the steps

Each job is made up of a series of steps. The first steps ought to set up your project.

steps:
- uses: actions/checkout@v2

- name: Set up Python 3.11
  uses: actions/setup-python@v2
  with:
    python-version: 3.11

- name: Install Poetry
  run: curl -sSL https://install.python-poetry.org | python3 -

- name: Install dependencies
  run: poetry install

The first two steps above use predefined actions, which are very handy for common tasks like checking out the repository and setting up Python 3.11 to use with it.

Since I use Poetry for dependency management, the next step installs it with Poetry’s official installer. Finally, I installed my project’s requirements.
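
One optional refinement, not part of my workflow, is to cache Poetry’s download cache between runs with the actions/cache action, keyed on the lock file, so repeat builds don’t re-fetch every dependency:

- name: Cache Poetry downloads
  uses: actions/cache@v3
  with:
    path: ~/.cache/pypoetry                          # Poetry’s default cache directory on Linux
    key: poetry-${{ hashFiles('**/poetry.lock') }}   # new cache whenever the lock file changes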


And now my CI is ready to do battle:

- name: Run isort
  run: poetry run isort --check-only .

- name: Run black
  run: poetry run black --check .

- name: Run flake8
  run: poetry run flake8 .

The first three steps (isort, black, and flake8) comprise a common triad of style checkers.

Isort is an import sorter, enforcing consistency and readability. By default, it organizes imports by standard library, third-party library, and application library imports.

Black is an opinionated formatter whose output conforms to PEP 8. Flake8 is a linter that flags problems Black can’t fix automatically, such as unused imports and undefined names.

Together, they make it harder for a developer to contribute unreadable code.
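
One detail worth knowing: out of the box, isort and Black can disagree about how long import lines get wrapped, so it’s common to run isort with its Black-compatible profile (set either in pyproject.toml or on the command line), for example:

- name: Run isort
  run: poetry run isort --check-only --profile black .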


- name: Run PyCQA/Bandit
  run: poetry run bandit -r enterprises

Bandit is a security linter that checks for common issues and reports them by severity and confidence level.
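
If the report ever gets noisy, Bandit can be limited to medium-or-higher severity findings with its -ll flag, something like:

- name: Run PyCQA/Bandit
  run: poetry run bandit -r enterprises -ll    # report only medium severity and above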


- name: Setup Database
  run: PGPASSWORD=postgres psql -h localhost -U postgres -c "CREATE DATABASE test_db;"

- name: Set up environment
  run: |
    echo "SECRET_KEY=$(poetry run python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())')" >> $GITHUB_ENV
    echo "DATABASE_HOST=localhost" >> $GITHUB_ENV
    echo "DATABASE_PORT=5432" >> $GITHUB_ENV
    echo "DATABASE_NAME=test_db" >> $GITHUB_ENV
    echo "DATABASE_USER=postgres" >> $GITHUB_ENV
    echo "DATABASE_PASSWORD=postgres" >> $GITHUB_ENV

- name: Run Tests
  run: poetry run pytest --cov=enterprises --cov-fail-under=100

And finally, the most powerful weapon of continuous integration: the tests.

Since I chose to run the tests in Postgres instead of SQLite, the first two items of business are to create the database and set up the environment. I configured my workflow to generate a SECRET_KEY on the fly since it will just be thrown away.

Finally, I ran my tests. The best part is that my CI job will fail if the coverage falls under a given percentage, preventing quality from falling prey to expediency.

(Side note: not every project needs 100% coverage, and sometimes prioritizing expediency is an unavoidable necessity that pushes the threshold down. If that happens, gradually ratchet the threshold back up toward its original level, or the loss will become permanent.)
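
As an illustration, a temporarily relaxed gate might look like the step below, with the number nudged back up in later commits until it reaches the original target:

- name: Run Tests
  run: poetry run pytest --cov=enterprises --cov-fail-under=90    # temporarily lowered from 100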


Implementing a CI process to protect your project doesn’t have to be complicated. You just need to know which tools make it easy to keep your standards high, then define a process that applies them to every proposed change to the code.

See the full gist here.

What are your experiences with designing and implementing CI/CD pipelines? Let me know!



