3 Major Python Problems and How to Solve Them

by David Finson, January 21st, 2023

Too Long; Didn't Read

Managing complex development environments has often posed some challenges as projects scaled up. The solution is the use of containerization, which is a method of packaging an application and its dependencies into a self-contained unit that can be easily deployed and run on any platform. I'll demonstrate how to set up and use a containerized development environment using Docker and Docker Compose.

While working with Python has, more often than not, been a fantastic experience for me, managing complex development environments has often posed some challenges as projects scaled up.

To name just a few examples, here are 3 major issues with Python that I've run into:

1. Applications that depend on environment variables may need these variables to be set before the app can run.

2. Applications that use auth certificates for communication between different services may require the generation of these certificates locally before running the application.

3. Dependency versioning clashes can occur between different microservices within the same project.

Things can get especially gnarly when working with multiple microservices which depend on each other, and, frankly, as a developer, I don't really want to be managing all of this overhead just to get up and running. This is especially true if I'm just onboarding to a new project.

One common solution I've seen used when developing Python apps, is to use Python virtual environments, which are isolated environments that contain a Python installation and required packages. However, managing multiple virtual environments and other environment-related configurations can still be time-consuming and cumbersome, as the virtual environment only provides isolation at the Python interpreter level. This means that other environment-related setup, such as environment variables and port allocation, is still shared globally for all project components.
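For reference, a minimal virtual environment workflow typically looks something like this (a sketch, assuming a Unix-like shell and a project with a requirements.txt file):

# Create and activate an isolated Python environment
python -m venv .venv
source .venv/bin/activate
# Install the project's dependencies into it
pip install -r requirements.txt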

The solution I'll demonstrate in this article is the use of containerization, which is a method of packaging an application and its dependencies into a self-contained unit that can be easily deployed and run on any platform. Docker is a popular platform for developing, deploying, and running containerized applications, and Docker Compose is a tool that makes it easy to define and run multi-container Docker applications using a single YAML file (typically named docker-compose.yml). Although there are alternative solutions such as minikube, for simplicity's sake, I'll stick to using Docker and Docker Compose in this example.

I'll demonstrate how to set up and use a containerized development environment using Docker and Docker Compose. I'll also discuss some of the challenges of using a containerized development environment, and how to overcome them by configuring Docker and Docker Compose to fit the following key requirements for an effective development environment:

1. Run - Running end-to-end scenarios that simulate execution on the target production environment.

2. Deploy - Making code changes and redeploying quickly, as with a non-containerized application runtime stack.

3. Debug - Setting breakpoints and using a debugger to step through code, as with a non-containerized application runtime stack, to identify and fix errors.

Project setup

To illustrate this by example, I'll define a simple Python application that uses the lightweight Python web framework, Flask, to create a RESTful API for querying information about authors and their posts. The API has a single endpoint, /authors/{author_id}, which can be used to retrieve information about a particular author by specifying their ID as a path parameter. The application then uses the requests module to make HTTP requests to a separate posts service, which is expected to provide a list of posts by that author. To keep the code concise, all data will be randomly generated on the fly using the Faker library.

To start off, I'll initialize and then open an empty directory for the project. Next, I'll create two sub-directories: the first will be called authors_service, and the second posts_service. Inside each of these directories, I'll create 3 files (the resulting layout is sketched just after this list):

1. app.py: The main entry point for the Flask app, which defines the app, sets up routes, and specifies the functions to be called when a request is made to those routes.

2. requirements.txt: A plain text file that specifies the Python packages that are required for the application to run.

3. Dockerfile: A text file containing instructions for building a Docker image, which, as mentioned above, is a lightweight, stand-alone, and executable package that includes everything needed to run the application, including the code, a runtime, libraries, environment variables, and pretty much anything else.
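Assuming the names above, the project layout looks like this (the docker-compose.yml file at the root will be added later in this guide):

.
├── docker-compose.yml
├── authors_service
│   ├── app.py
│   ├── requirements.txt
│   └── Dockerfile
└── posts_service
    ├── app.py
    ├── requirements.txt
    └── Dockerfile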

In each app.py file, I'll implement a Flask microservice with the desired logic.

For authors_service, the app.py file looks as follows:

import os
import flask
import requests
from faker import Faker

app = flask.Flask(__name__)

@app.route("/authors/<string:author_id>", methods=["GET"])
def get_author_by_id(author_id: str):
    author = {
        "id": author_id,
        "name": Faker().name(),
        "email": Faker().email(),
        "posts": _get_authors_posts(author_id)
    }
    return flask.jsonify(author)

def _get_authors_posts(author_id: str):
    response = requests.get(
        f'{os.environ["POSTS_SERVICE_URL"]}/{author_id}'
    )
    return response.json()

if __name__ == "__main__":
    app.run(
        host=os.environ['SERVICE_HOST'],
        port=int(os.environ['SERVICE_PORT'])
    )


This code sets up a Flask app and defines a route to handle GET requests to the endpoint /authors/{author_id}. When this endpoint is accessed, it generates mock data for an author with the provided ID and retrieves a list of posts for that author from the separate posts service. It then runs the Flask app, listening on the hostname and port specified in the corresponding environment variables.
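For illustration, a response from this endpoint might look something like the following; every value except the id is generated by Faker, so the details will differ on each call:

{
  "id": "123",
  "name": "Allison Hill",
  "email": "donaldgarcia@example.net",
  "posts": [
    {
      "id": "b1a5c9e2-...",
      "author_id": "123",
      "title": "Whose group through despite.",
      "body": "Manage wall whole population."
    }
  ]
}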

Note that the above logic depends on the flask, requests and Faker packages. To account for this, I'll add them to the authors service's requirements.txt file, as follows:

flask==2.2.2
requests==2.28.1
Faker==15.3.4

Note that there are no specific package versioning requirements for any of the dependencies referenced throughout this guide. The versions used were the latest available at the time of writing.

For the posts_service, app.py looks as follows:

import os
import uuid
from random import randint
import flask
from faker import Faker

app = flask.Flask(__name__)

@app.route('/posts/<string:author_id>', methods=['GET'])
def get_posts_by_author_id(author_id: str):
    posts = [
        {
            "id:": str(uuid.uuid4()),
            "author_id": author_id,
            "title": Faker().sentence(),
            "body": Faker().paragraph()
        }
        for _ in range(randint(1, 5))
    ]
    return flask.jsonify(posts)

if __name__ == '__main__':
    app.run(
        host=os.environ['SERVICE_HOST'],
        port=int(os.environ['SERVICE_PORT'])
    )


In this code, when a client (i.e. authors_service) sends a GET request to the route /posts/{author_id}, the function get_posts_by_author_id is called with the specified author_id as a parameter. The function generates mock data for between 1 and 5 posts written by the author using the Faker library, and returns the list of posts as a JSON response to the client.

I'll also need to add the flask and Faker packages to the posts service's requirements.txt file, as follows:

flask==2.2.2
Faker==15.3.4

Before containerizing these services, let's consider one example of why I'd want to package and run them in isolation from each other in the first place.

Both services use the environment variables SERVICE_HOST and SERVICE_PORT to define the socket on which the Flask server will be launched. While SERVICE_HOST is not an issue (multiple services can listen on the same host), SERVICE_PORT can cause problems. If I were to install all dependencies in a local Python environment and run both services, the first service to start would use the specified port, causing the second service to crash because it couldn't use the same port. One simple solution is to use separate environment variables (e.g., AUTHORS_SERVICE_PORT and POSTS_SERVICE_PORT) instead. However, modifying the source code to adapt to environmental constraints can become complex when scaling up.

Containerization helps to avoid issues like this by setting up the environment to be adapted for the application, rather than the other way around. In this case, I can set the SERVICE_PORT environment variable to a different value for each service, and each service will be able to use its own port without interference from other services.

To containerize the services, I'll fill in the Dockerfile in each service's directory. The contents of this file (for both services) are as follows:

FROM python:3.8
RUN mkdir /app
WORKDIR /app
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app/
CMD ["python", "app.py"]

This Dockerfile builds off of a Python 3.8 parent image and sets up a working directory for the application in the container. It then copies the requirements.txt file from the host machine to the container and installs the dependencies listed in that file. Finally, it copies the rest of the application code from the host machine to the container and runs the main application script when the container is started.
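One optional refinement worth knowing about (not strictly required for this demo) is a .dockerignore file next to each Dockerfile, so that local artifacts aren't pulled into the image by the COPY . /app/ instruction; a minimal sketch:

# .dockerignore - paths excluded from the Docker build context
__pycache__/
*.pyc
.venv/
.vscode/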

Next, I'll create a file named docker-compose.yml in the root project directory. As briefly mentioned above, this file is used to define and run multi-container Docker applications. In the docker-compose.yml file, I can define the services that make up the application, specify the dependencies between them, and configure how they should be built and run. In this case, it looks as follows:

---
# Specify the version of the Docker Compose file format
version: '3.9'

services:
  # Define the authors_service service
  authors_service:
    # This service relies on, and is therefore dependent on, the below `posts_service` service
    depends_on:
      - posts_service
    # Specify the path to the Dockerfile for this service
    build:
      context: ./authors_service
      dockerfile: Dockerfile

    # Define environment variables for this service
    environment:
      - SERVICE_HOST=0.0.0.0
      - PYTHONPATH=/app
      - SERVICE_PORT=5000
      - POSTS_SERVICE_URL=http://posts_service:6000/posts

    # Map port 5000 on the host machine to port 5000 on the container
    ports:
      - "5000:5000"

    # Mount the authors_service source code directory on the host to the working directory on the container
    volumes:
      - ./authors_service:/app

  # Define the posts_service service
  posts_service:
    # Specify the path to the Dockerfile for this service
    build:
      context: ./posts_service
      dockerfile: Dockerfile

    # Define environment variables for this service
    environment:
      - PYTHONPATH=/app
      - SERVICE_HOST=0.0.0.0
      - SERVICE_PORT=6000

    # Mount the posts_service source code directory on the host to the working directory on the container
    volumes:
      - ./posts_service:/app


Running the application

The containers can be started with the docker-compose up command. The first time this is run, the Docker images will be built automatically.
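As a quick sanity check, I can hit the authors endpoint from the host machine once the containers are up; the port and path below follow the docker-compose.yml above, and any author ID works since the data is randomly generated:

docker-compose up
# Then, from another terminal on the host:
curl http://localhost:5000/authors/123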

This satisfies the first core requirement above: "Run".

Redeploying

Note that in the docker-compose.yml file, volume mounts are used to share the source code directories for the authors_service and posts_service services between the host machine and the containers. This allows for code to be edited on the host machine with the changes automatically reflected in the containers (and vice versa).

For example, the following line mounts the ./authors_service directory on the host machine to the /app directory in the authors_service container:

volumes:
  - ./authors_service:/app

Changes made on the host machine are immediately available on the container, and changes made in the container are immediately persisted to the host machine's source code directory. This allows for quickly redeploying changes by restarting the relevant container without rebuilding the image, effectively satisfying the second core requirement of "Deploy."
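For example, after editing the authors service's code on the host machine, redeploying it comes down to a single command:

docker-compose restart authors_service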

Debugging

This is where it gets a bit more involved. Debuggers in Python use the debugging tools provided by the interpreter to pause the execution of a program and inspect its state at certain points. This includes registering a trace function with sys.settrace(), which the interpreter invokes for each line of code so the debugger can check for breakpoints, as well as using features like the call stack and variable inspection. Debugging a Python interpreter running inside a container can potentially add complexity compared to debugging a Python interpreter running on a host machine, because the container environment is isolated from the host machine.
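To make that mechanism concrete, here is a toy illustration of sys.settrace - not how anyone would debug in practice, just the primitive that debuggers build on:

import sys

def tracer(frame, event, arg):
    # The interpreter invokes this when a new frame is entered ("call");
    # returning it makes it the local trace function, which then receives
    # a "line" event for every line executed in that frame - the hook a
    # debugger uses to check whether the current line holds a breakpoint.
    if event == "line":
        print(f"tracing {frame.f_code.co_filename}:{frame.f_lineno}")
    return tracer

def add(a, b):
    total = a + b
    return total

sys.settrace(tracer)
add(1, 2)            # each line of add() triggers a "line" event
sys.settrace(None)   # disable tracing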

To overcome this, one of the following two general tracks can be taken: The code can be debugged from within the container itself, or it can be debugged using a remote debug server.

First, I will be using VSCode as the editor of choice to demonstrate how to go about this. Afterwards, I will explain how to work similarly with JetBrains PyCharm.

Debugging the code from within the container itself

To develop and debug code from within a running Docker container using VSCode, I will:

1. Ensure the Docker extension for VSCode is installed and enabled.

2. Ensure the container I want to attach to is up and running.

3. Open the Docker extension's explorer view by clicking on the Docker icon in the left sidebar.

4. In the explorer view, expand the "Running Containers" section and select the container I want to attach to.

5. Right-click on the container and select the "Attach Visual Studio Code" option from the context menu.

This will attach Visual Studio Code to the selected container and open a new VSCode window within the container. In this new window, I can write, run and debug code as I normally would on a local environment.

In order to avoid having to install VSCode extensions such as Python every time the container restarts, I can mount a volume inside the container that stores the VSCode extensions. This way, when the container is restarted, the extensions will still be available because they are stored on the host machine. To do this using Docker Compose in this demo project, the docker-compose.yml file can be modified as follows:

---
# Specify the version of the Docker Compose file format
version: '3.9'

services:
  # Define the authors_service service
  authors_service:
    ...
    # Mount the authors_service source code directory on the host to the working directory on the container
    volumes:
      - ./authors_service:/app
      # Mount the vscode extension directory on the host to the vscode extension directory on the container
      - /path/to/host/extensions:/root/.vscode/extensions

  # Define the posts_service service
  posts_service:
    ...

Note that the VSCode extensions can typically be found at ~/.vscode/extensions on Linux and macOS, or %USERPROFILE%\.vscode\extensions on Windows.

Using a remote Python debug server

The above method of debugging works well for standalone scripts or for writing, running, and debugging tests. However, debugging a logical flow involving multiple services running in different containers is more complex.

When a container is started, the service it contains is typically launched immediately. In this case, the Flask servers on both services are already running by the time VSCode is attached. Clicking "Run and debug" and launching another instance of the Flask server is therefore not practical: it would result in multiple instances of the same service running on the same container and competing with each other, which is usually not a reliable debugging flow.

This brings me to option number two: using a remote Python debug server. A remote Python debug server is a Python interpreter that is running on a remote host and is configured to accept connections from a debugger. This allows a debugger running locally to examine and control a Python process running in a remote environment.

It's important to note that the term "remote" does not necessarily refer to a physically remote machine or even a local but isolated environment such as a Docker container running on a host machine. A Python remote debug server can also be useful for debugging a Python process that is running within the same environment as the debugger. In this context, I'll use a remote debug server that is running within the same container as the process I'm debugging. The key difference between this method and the first option for debugging we covered is that I'll be attaching to a pre-existing process instead of creating a new one every time I want to run and debug the code.

To get started, the first step is to add the debugpy package to the requirements.txt files for both services. debugpy is a high-level, open-source Python debugger that can be used to debug Python programs locally or remotely. I'll add the following line to both requirements.txt files:

debugpy==1.6.4

Now I need to rebuild the images in order to install debugpy on the Docker images for each service. I'll run the docker-compose build command to do this. Then I'll run docker-compose up to launch the containers.
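For reference, that sequence is just:

# Rebuild the images so debugpy gets installed, then relaunch
docker-compose build
docker-compose up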

Next, I'll attach VSCode to the running container containing the process I want to debug, as I did above.

In order to attach a debugger to the running Python application, I'll need to add the following snippet to the code at the point from which I wish to begin debugging:

import debugpy; debugpy.listen(5678)

This snippet imports the debugpy module and calls the listen function, which starts a debugpy server that listens for connections from a debugger on the specified port number (in this case, 5678).
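Note that execution continues past this call by default. If I instead want the process to block until a debugger actually attaches - useful when debugging startup code - debugpy also provides wait_for_client() and breakpoint(). A short sketch:

import debugpy

debugpy.listen(5678)        # start the debug adapter on port 5678
debugpy.wait_for_client()   # optional: block until a debugger attaches
debugpy.breakpoint()        # optional: pause here once attached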

If I wanted to debug the authors_service, I could place the above snippet just before the get_author_by_id function declaration within the app.py file, as follows:

import os
import flask
import requests
from faker import Faker

app = flask.Flask(__name__)

import debugpy; debugpy.listen(5678)

@app.route("/authors/<string:author_id>", methods=["GET"])
def get_author_by_id(author_id: str):
...

This would start a debugpy server on application startup, as the app.py script is executed.

The next step is to create a VSCode launch configuration for debugging the application. In the root directory of the service whose container I've attached to (and on which I'm running the VSCode window), I'll create a folder named .vscode. Then, within this folder, I'll create a file named launch.json, with the following contents:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Remote Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            }
        }
    ]
}

This configuration specifies that VSCode should attach to a Python debugger running on the local machine (i.e. the current container) on port 5678 - which, importantly, was the port specified when calling the debugpy.listen function above.

I will then save all changes. In the Docker extension's explorer view, I will right-click the container I'm currently attached to and select "Restart container" from the context menu (done on the local VSCode instance). After restarting the container, the VSCode window within the container will display a dialog asking if I want to reload the window - the correct answer is yes.

Now all that remains is to see it in action! To start debugging, within the VSCode instance running on the container, I'll open the script I want to debug and press F5 to start the debugger with the "Python: Remote Attach" configuration. The debugger will attach to the running process, and the debugger controls in the Debug tab can then be used to set breakpoints, step through the code, and inspect variables.

This satisfies the above "Debug" requirement.

Remote development and debugging with the JetBrains PyCharm IDE

Per the official docs, there are two ways to go about this when using PyCharm: an interpreter can be retrieved from a Docker image using the remote interpreter feature, and/or a remote debug server configuration can be used. Note that these two options are not mutually exclusive. I personally typically rely primarily on the remote interpreter feature for development, and use a remote debug server configuration if and when necessary.



Setting up a remote interpreter

To set up a remote interpreter on PyCharm, I will:

1. Click the interpreters tab pop-up menu in the bottom right corner of the IDE window.

2. Click Add New Interpreter, and then select On Docker Compose... from the pop-up menu.

3. In the next pop-up window, select the relevant Docker Compose file, and then select the relevant service from the dropdown. PyCharm will now attempt to connect to the Docker image and retrieve the available Python interpreters.

4. In the next window, select the Python interpreter I wish to use (e.g. /usr/local/bin/python). Once the interpreter has been selected, click "Create".

PyCharm will then index the new interpreter, after which I can run or debug code as usual - PyCharm will orchestrate Docker Compose behind the scenes for me whenever I wish to do so.

Setting up a remote debug server configuration

In order to set up a remote debug server configuration, I first need to add two dependencies to the relevant requirements.txt file(s): pydevd and pydevd-pycharm. These are similar in function to the debugpy package demonstrated above, but, as its name suggests, pydevd-pycharm is specifically designed for debugging with PyCharm. In the context of this demo project, I'll add the following two lines to both requirements.txt files:

pydevd~=2.9.1
pydevd-pycharm==223.8214.17

Once this is done and the Docker images have been rebuilt, I can embed the following code snippet at the point in the code from which I wish to begin debugging; it connects the running process to a PyCharm debug server:

import pydevd_pycharm; pydevd_pycharm.settrace('host.docker.internal', port=5678)

Note that unlike with debugpy, here I specified a hostname address with the value "host.docker.internal", which is a DNS name that resolves to the internal IP address of the host machine from within a Docker container. This is because I'm not running PyCharm on the container; the debug server that PyCharm starts listens on port 5678 of the host machine, and the process running in the container connects out to it.

This option also exists with debugpy, but since in that case I was running an instance of VSCode on the container itself, it simplified things to just let the hostname address default to "localhost" (i.e. the loopback interface of the container itself, not the host machine).
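For reference, a fuller settrace call might look like the following; the extra keyword arguments are optional, and suspend=True (the default) is what pauses execution at this line once the connection is made:

import pydevd_pycharm

pydevd_pycharm.settrace(
    'host.docker.internal',   # where the PyCharm debug server is listening
    port=5678,
    stdoutToServer=True,      # mirror the process's stdout in PyCharm
    stderrToServer=True,      # mirror stderr as well
    suspend=True,             # pause execution here once connected
)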

The final step is to set up a run configuration which PyCharm can use to connect to the remote debug server.

To do this, I will:

1. Open the Run/Debug Configuration dialog by selecting Run > Edit Configurations from the main menu.

2. Click the + button in the top-left corner of the dialog and select Python Remote Debug from the drop-down menu.

3. In the Name field, enter a name for the run configuration.

4. In the Script path field, specify the path to the script I want to debug.

5. In the Host field, enter the IP address of the host machine where the debugger server will run. In this example, it's "localhost".

6. In the Port field, enter the port number that the debugger server will listen on. In this example, it's 5678.

7. In the Path mappings section, I can specify how the paths on the host machine map to paths within the container. This is useful if I'm debugging code that is mounted into the container from the host, as the paths may not be the same in both environments. In this example, I'll want to map path/to/project/on/host/authors_service on the host machine to /app in the container for debugging authors_service, or path/to/project/on/host/posts_service to /app in the container for debugging posts_service (these would need to be two separate run configurations).

8. Click OK to save the run configuration.

To start debugging, I'll select the above run configuration from the Run drop-down menu and click the Debug button, and then spin up the relevant container(s) with the docker-compose up command. The PyCharm debugger will attach to the script and pause execution at the line where the pydevd_pycharm.settrace function is called, allowing me to begin smashing those bugs.


In summary

In this guide I've given a general, yet practical overview of what containerized Python development environments are, why they are useful, and how to go about writing, deploying and debugging Python code using them. Please note that this is in no way a comprehensive guide to working with these environments; it is merely a starting point from which to expand. Here are a few useful links to that end:

1. Overview of containerization by Red Hat

2. Official Docker docs

3. Official JetBrains PyCharm docs for remote debugging

4. Official VSCode docs for developing Python in dev containers

I hope you found this guide to be helpful - thanks for reading!