Apache Kafka is the new hotness when it comes to adding realtime messaging capabilities to your system. At its core, it is an open source distributed messaging system that uses a publish-subscribe model for building realtime data pipelines. But, more broadly speaking, it is a distributed and horizontally scalable commit log.
In a Kafka cluster, you will have topics, producers, consumers, and brokers.
Take a deep dive into Kafka here.
Overall, Kafka provides fast, highly scalable and redundant messaging through a publish-subscribe model.
A pub-sub model is a messaging pattern where publishers categorize published messages into topics without knowledge of which subscribers will receive those messages (if any). Likewise, subscribers express interest in one or more topics and only receive messages that are of interest, without knowing anything about the publishers (source).
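The decoupling described above can be illustrated with a toy in-memory broker (this is an illustration of the pattern only, not how Kafka is implemented):

```python
from collections import defaultdict

class Broker:
    """Toy in-memory pub-sub broker: publishers and subscribers
    only share a topic name, never a direct reference to each other."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        # The publisher has no knowledge of who (if anyone) receives this.
        for cb in self._subs[topic]:
            cb(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.publish("orders", "order-1")
broker.publish("payments", "pay-1")  # no subscribers: silently dropped
print(received)  # ['order-1']
```

Note that the publisher of `payments` never learns that nobody was listening; that indifference is the heart of the pattern.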
As a messaging system, Kafka has some transformative strengths that have catalyzed its rising popularity.
Due to its intrinsic architecture, Kafka is not optimized to provide API consumers with friendly access to realtime data. As such, many orgs are hesitant to expose their Kafka endpoints publicly.
In other words, it is difficult to expose Kafka across a public API boundary using traditional protocols (like WebSockets or HTTP).
To overcome this limitation, we can integrate Pushpin into our Kafka ecosystem to handle these more traditional protocols and expose our public API in a more accessible and standardized way.
Server-sent events (SSE) is a technology where a browser receives automatic updates from a server over an HTTP connection (standardized as part of HTML5). Kafka doesn't natively support this protocol, so we need to add an additional service to make this happen.
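The SSE wire format itself is simple enough to frame by hand: each event is a series of `field: value` lines terminated by a blank line. A minimal encoder:

```python
def sse_frame(data, event=None):
    """Encode one Server-Sent Event as it appears on the wire.

    Each field is a "name: value" line; a blank line terminates the event.
    Multi-line data becomes multiple "data:" lines, per the SSE spec.
    """
    lines = []
    if event is not None:
        lines.append("event: %s" % event)
    for chunk in str(data).split("\n"):
        lines.append("data: %s" % chunk)
    return "\n".join(lines) + "\n\n"

print(sse_frame("hello", event="message"))
# event: message
# data: hello
```

A browser consuming this stream with `EventSource` would fire a `message` event carrying `hello`.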
Pushpin’s primary value prop is that it is an open source solution that enables realtime push — a requisite of evented APIs (GitHub Repo). At its core, it is a reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. Structurally, Pushpin communicates with backend web applications using regular, short-lived HTTP requests.
Integrating Pushpin and Kafka provides you with some notable benefits:
In this next example, we will expose Kafka messages via an HTTP streaming API.
This example project reads messages from a Kafka service and exposes the data over a streaming API using the Server-Sent Events (SSE) protocol over HTTP. It is written using Python & Django, and relies on Pushpin for managing the streaming connections.
In this demo, we drop a Pushpin instance on top of our Kafka broker. Pushpin acts as a Kafka consumer, subscribes to all topics, and re-publishes received messages to connected clients. Clients listen to events via Pushpin.
More granularly, we use views.py to set up an SSE endpoint, while relay.py handles the messaging input and output.
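The project's actual views.py isn't reproduced here, but the core idea is that a Pushpin-aware endpoint instructs the proxy, via GRIP headers (`Grip-Hold`, `Grip-Channel`), to hold the HTTP response open as a stream subscribed to a channel. A hypothetical sketch (the view name and wiring are assumptions, not code from the project):

```python
def grip_stream_headers(channel):
    """Headers instructing Pushpin (via the GRIP protocol) to hold this
    HTTP response open as a stream and subscribe it to `channel`."""
    return {
        "Content-Type": "text/event-stream",  # SSE media type
        "Grip-Hold": "stream",                # keep the connection open
        "Grip-Channel": channel,              # Pushpin channel to subscribe to
    }

def events(request, topic):
    """Hypothetical Django view for /events/{topic}/ (illustration only)."""
    from django.http import HttpResponse  # lazy import; requires Django
    resp = HttpResponse("event: stream-open\ndata: \n\n")
    for name, value in grip_stream_headers(topic).items():
        resp[name] = value
    return resp

print(grip_stream_headers("test"))
```

The client's request passes through Pushpin to the Django backend; the backend replies instantly with these headers, and Pushpin keeps the client connection open, injecting any data later published to the channel.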
1. Set up a Python 3 virtual environment and install dependencies:

virtualenv --python=python3 venv
. venv/bin/activate
pip install -r requirements.txt
2. Create a suitable **.env** file with Kafka and Pushpin settings:
KAFKA_CONSUMER_CONFIG={"bootstrap.servers":"localhost:9092","group.id":"mygroup"}
GRIP_URL=http://localhost:5561
3. Run the Django server:
python manage.py runserver
4. Run Pushpin:
pushpin --route="* localhost:8000"
5. Run the **relay** command:
python manage.py relay
The relay command sets up a Kafka consumer according to KAFKA_CONSUMER_CONFIG, subscribes to all topics, and re-publishes received messages to Pushpin, wrapped in SSE format.
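That consume-and-republish loop could be sketched as follows. This is not the project's relay.py; it assumes the `confluent-kafka` and `gripcontrol` libraries, and the topic-discovery step is a simplification:

```python
import json
import os

def to_sse(payload):
    """Wrap a Kafka message payload in SSE format, one event per message."""
    return "event: message\ndata: %s\n\n" % payload

def run_relay():
    # Third-party deps imported here so to_sse stays importable without them.
    from confluent_kafka import Consumer
    from gripcontrol import GripPubControl

    consumer = Consumer(json.loads(os.environ["KAFKA_CONSUMER_CONFIG"]))
    pub = GripPubControl({"control_uri": os.environ["GRIP_URL"]})

    # Subscribe to every topic currently known to the broker.
    topics = list(consumer.list_topics().topics)
    consumer.subscribe(topics)

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        body = msg.value().decode("utf-8")
        # Publish to the Pushpin channel named after the Kafka topic;
        # Pushpin injects the SSE frame into every held /events/{topic}/ stream.
        pub.publish_http_stream(msg.topic(), to_sse(body))
```

Because the Pushpin channel name mirrors the Kafka topic, a client holding open `/events/test/` receives exactly the messages produced to the `test` topic.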
Clients can listen to events by making a request (through Pushpin) to /events/{topic}/:
curl -i http://localhost:7999/events/test/
The output stream might look like this:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: Transfer-Encoding
event: message
data: hello

event: message
data: world