Implementing TypeState Pattern in Python

by @0xc0ff3370c0d3

Disclaimer: The details mentioned below are just my thoughts on an approach towards software correctness. I consider myself not to be an expert in Python, Rust, software verification, or in general software development.

Please use your own judgement when designing/implementing software. That said, I am interested in practices to make software safer, and would highly appreciate any constructive criticism/feedback.

What is TypeState Pattern?

The basic idea behind the TypeState pattern is to encode an object's state in its type, so that operations that are invalid in a given state are simply not available on that type.

While the TypeState pattern is a generic pattern that can be used in multiple languages, I first encountered it while developing software in Rust.

There is a multitude of resources (see References below) that discuss the TypeState pattern in Rust in further detail, so I will not delve into the details of the pattern here.
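As a minimal sketch of the idea (the `Draft` and `Published` classes are hypothetical names chosen for illustration, not taken from any library), each state can be a separate class that exposes only the operations valid in that state, and a transition returns an object of the next state's type:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Published:
    """State: the article is published and can be read, but not edited."""

    text: str

    def read(self) -> str:
        return self.text


@dataclass(frozen=True)
class Draft:
    """State: the article is still a draft and can be edited or published."""

    text: str

    def edit(self, new_text: str) -> "Draft":
        # Editing stays in the same state, so it returns another Draft.
        return Draft(new_text)

    def publish(self) -> Published:
        # The state transition is visible in the return type.
        return Published(self.text)


article = Draft("hello").edit("hello, world").publish()
print(article.read())
```

Since `Published` has no `edit` method, an attempt to edit a published article is something a static type checker can report before the code runs.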

Example of TypeState pattern in Python

Let's explore an example API: say we have a connection API over which we can establish connections and communicate with web servers using HTTP. The connection can exist in two states: the connection to the remote end has not been established, or the connection to the remote end has been established. A real connection can have more than two states, but for this example we will consider only these two.

One approach to represent this API using TypeState pattern would be as follows:

# Approach 1
import socket

from typing import Optional


class HTTPConnection():
    """HTTP Connection without the connection established."""

    def __init__(self, host: bytes, port: int) -> None:
        """Initialize HTTP connection instance object with connection details."""
        self._host = host
        self._port = port
        self._connected_socket: Optional[HTTPConnection._HTTPConnectionConnectedState] = None

    class _HTTPConnectionConnectedState():
        """HTTP connection with connection to remote end already established"""

        def __init__(self, socket_: socket.socket, host: bytes) -> None:
            """Initialize HTTPConnectionConnectedState instance object."""
            self._socket = socket_
            self._host = host

        def get_request(self, url: bytes) -> bytes:
            """Send GET request to a URL over an HTTP connection."""
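            # url and host are bytes, so build the request as bytes; formatting
            # them with {!r} would send the literal text "b'/'" to the server.
            request = (b"GET " + url + b" HTTP/1.1\r\nHost: " + self._host +
                       b"\r\nConnection: close\r\n\r\n")
            self._socket.sendall(request)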
            result = b""
            while True:
                data = self._socket.recv(4096)
                if data:
                    result += data
                else:
                    break
            return result

    def __enter__(self) -> "HTTPConnection._HTTPConnectionConnectedState":
        """Manage starting of an HTTP connection."""
        s = socket.socket()
        s.connect((self._host, self._port))
        self._connected_socket = self._HTTPConnectionConnectedState(
            s, self._host)
        return self._connected_socket

    def __exit__(self, _exc_type, _exc_value, _traceback) -> None:
        """Manage closing of an HTTP connection."""
        if self._connected_socket:
            if self._connected_socket._socket:
                self._connected_socket._socket.close()
        # The following operation will not delete the instance object if the
        # user of this API stored reference to the instance object inside a
        # variable outside the scope of context manager
        self._connected_socket = None


def print_example_homepage(
    http_conn: HTTPConnection._HTTPConnectionConnectedState
) -> None:
    """Fetch and print response data over an HTTP connection."""
    print(http_conn.get_request(b'/'))

The following shows an approach to implement a similar logic without using TypeState pattern, by maintaining the state information inside an attribute:

# Approach 2
import socket

from typing import Optional


class HTTPConnection():
    """HTTP connection to connect to a remote endpoint."""

    def __init__(self, host: bytes, port: int) -> None:
        """Initialize the HTTP connection details."""
        self._host = host
        self._port = port
        self._socket: Optional[socket.socket] = None
        self._is_connected = False

    def get_request(self, url: bytes) -> bytes:
        """Send GET request to a URL over an HTTP connection."""
        if not self._socket or not self._is_connected:
            raise Exception("HTTP connection not established")
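        # url and host are bytes, so build the request as bytes; formatting
        # them with {!r} would send the literal text "b'/'" to the server.
        request = (b"GET " + url + b" HTTP/1.1\r\nHost: " + self._host +
                   b"\r\nConnection: close\r\n\r\n")
        self._socket.sendall(request)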
        result = b""
        while True:
            data = self._socket.recv(4096)
            if data:
                result += data
            else:
                break
        return result

    def __enter__(self) -> 'HTTPConnection':
        """Manage starting of an HTTP connection."""
        self._socket = socket.socket()
        self._socket.connect((self._host, self._port))
        self._is_connected = True
        return self

    def __exit__(self, _exc_type, _exc_value, _traceback) -> None:
        """Manage closing of an HTTP connection."""
        if self._socket and self._is_connected:
            self._socket.close()
        self._socket = None
        self._is_connected = False


def print_example_homepage(http_conn: HTTPConnection) -> None:
    """Fetch and print response data over an HTTP connection."""
    print(http_conn.get_request(b'/'))

What benefit does Approach 1 provide over Approach 2? Let's consider two usage examples for the above-mentioned API:

# Usage 1
http_conn = HTTPConnection(b"www.example.com", 80)
print_example_homepage(http_conn)
# Usage 2
with HTTPConnection(b"www.example.com", 80) as http_conn:
    print_example_homepage(http_conn)

For this simple example, it is obvious that the second usage example is correct, but this might not be the case in large codebases.

The following table shows the results for execution of both usage examples with each approach:

|             | Approach 1                                                             | Approach 2                                 |
|-------------|------------------------------------------------------------------------|--------------------------------------------|
|   Usage 1   | AttributeError: 'HTTPConnection' object has no attribute 'get_request' | Exception: HTTP connection not established |
|   Usage 2   | Prints the HTTP response as expected                                   | Prints the HTTP response as expected       |

Both approaches raise an exception when used incorrectly and print the expected response when used correctly. So what advantage does the first approach provide over the second? If anything, the exception message is more user-friendly in the second case. The advantage of the first approach shows up when it is combined with static type checking.

Static type checking in Python

Newer versions of Python support type hints inside code. Static type checking tools such as mypy can use these hints to provide some of the guarantees of static typing (similar to those in compiled languages such as C, C++, and Rust) while keeping the benefits of Python's duck typing. In addition, type hints can be added to a codebase gradually.
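A small sketch of what gradual typing can look like (the function names are made up for illustration): an annotated function sits next to one that has not been annotated yet, and both keep working at run time.

```python
from typing import get_type_hints


def parse_port(value: str) -> int:
    """Annotated: a type checker can verify the body and the callers."""
    return int(value)


def legacy_parse_port(value):
    """Not annotated yet: a checker treats the parameter and result as Any."""
    return int(value)


# The hints are ordinary metadata and do not change run-time behaviour.
print(get_type_hints(parse_port))
print(parse_port("80") + legacy_parse_port("8080"))
```

With mypy's default settings, the bodies of unannotated functions are not checked, which is what lets annotations be introduced one function at a time.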

While proper unit testing can help catch many issues with API usage, it can sometimes be hard to cover all the scenarios because of the dynamic nature of this approach. In comparison, static analyses are usually more conservative and tend to cover all flow scenarios.

In my opinion, using the best of both worlds to establish correctness of code is the way to go forward (unless Formal Verification becomes feasible for generic programming, in which case I would prefer using formal verification procedures). Let's do static type checking using mypy on both the usage examples with each approach for API design:

|         | Approach 1                                                                                                              | Approach 2      |
|---------|-------------------------------------------------------------------------------------------------------------------------|-----------------|
| Usage 1 | Argument 1 to "print_example_homepage" has incompatible type "HTTPConnection"; expected "_HTTPConnectionConnectedState" | No issues found |
| Usage 2 | No issues found                                                                                                         | No issues found |
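To try the check without a network connection, the following stripped-down sketch keeps the same two-state shape as Approach 1. `Connection`, `Connected`, `send_ping`, and `ping` are hypothetical stand-ins, not the classes from the listings above. The commented-out call is the kind of line a checker such as mypy is expected to reject as an incompatible argument type, whereas at run time it would only fail with an AttributeError once executed.

```python
from types import TracebackType
from typing import Optional, Type


class Connected:
    """State: connection established; only this type can send."""

    def send_ping(self) -> str:
        return "pong"


class Connection:
    """State: not connected; entering the context yields a Connected."""

    def __enter__(self) -> Connected:
        return Connected()

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_value: Optional[BaseException],
        traceback: Optional[TracebackType],
    ) -> None:
        return None


def ping(conn: Connected) -> str:
    """Accept only the connected state."""
    return conn.send_ping()


# ping(Connection())  # a static checker should flag this argument type
with Connection() as conn:
    print(ping(conn))
```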

Conclusion

As observed above, the approach of using TypeState pattern allows us to find a certain class of errors without even executing the code. NOTE that the implementation above does not prevent all the issues (such as socket.error while using the socket), since the remote end can close the connection anytime after the connection has been established.
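Because such failures only show up at run time, callers still need ordinary error handling around the connected state. A minimal sketch, assuming a hypothetical helper `fetch_or_none` and a stand-in class `FlakyConnection` that simulates the remote end dropping the connection:

```python
from typing import Optional


class FlakyConnection:
    """Stand-in for a connected state whose peer has gone away."""

    def get_request(self, url: bytes) -> bytes:
        # Simulate the remote end resetting the connection.
        raise ConnectionResetError("connection reset by peer")


def fetch_or_none(conn: FlakyConnection, url: bytes) -> Optional[bytes]:
    """Return the response, or None if the socket fails at run time."""
    try:
        return conn.get_request(url)
    except OSError as exc:
        # socket.error is an alias of OSError in Python 3.
        print("request failed:", exc)
        return None


print(fetch_or_none(FlakyConnection(), b"/"))
```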

The only guarantee this approach tries to provide is that the user of an API is using it as intended by the API developer (assuming that the user is following other generic best practices, such as not using protected members directly), where the intention is encoded in the defined types and type signatures.

In my opinion, the decision of whether to use the TypeState pattern or not depends on the particular use case. The example implementation using TypeState pattern discussed here has an extra cost of memory. Each state has been encoded using a Python class, which requires memory to be allocated inside the Python process.

Using a single attribute would take less memory than using a new class to store the type information. (I am not a Python expert, so there may be a more efficient way to implement the TypeState pattern.) Python provides __slots__ to optimize memory allocation for class instances, which could help in the examples above.
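As a rough sketch of the `__slots__` idea (the `ConnectedState` class here is a hypothetical simplification, not the class from the listings above), declaring the attribute names up front tells Python not to create a per-instance `__dict__`:

```python
class ConnectedState:
    """Connected state storing its attributes in slots."""

    __slots__ = ("_socket", "_host")

    def __init__(self, socket_: object, host: bytes) -> None:
        self._socket = socket_
        self._host = host


state = ConnectedState(None, b"www.example.com")
# Instances of a slotted class have no per-instance __dict__ ...
print(hasattr(state, "__dict__"))
# ... so assigning an undeclared attribute raises AttributeError.
try:
    state.extra = 1  # type: ignore[attr-defined]
except AttributeError as exc:
    print("rejected:", exc)
```

How much memory this saves depends on the Python version and the number of instances, so it is worth measuring for the actual workload.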

Personally, I would consider not using the TypeState pattern in Python when developing for platforms with tight memory constraints (that is, if I have to develop in Python at all; otherwise I would prefer a compiled language like C, C++, or Rust). But if performance is not as major a concern as correctness of code, I would consider using the TypeState pattern along with static type checking, at least for the core APIs used in 80% of the codebase.

References

1. TypeState pattern

2. Python context manager

3. Python socket module
