In this post we’ll look at building a highly efficient event processing topology by leveraging Apache Pulsar’s Functions framework and Project Flogo’s event-driven app framework.
Perhaps you’re scratching your head and asking yourself why do we need this, what’s the point in using Apache Pulsar Functions vs something else for my event processing logic?. Good question, let’s examine just a few of the benefits of Pulsar’s functions framework before we move on:
Additionally, as mentioned above, we’ll use Project Flogo to build our event processing logic. If you’re not familiar, Project Flogo is an open source, event-driven app framework designed from the ground up specifically around the event-driven paradigm and for serverless and FaaS compute models. Some of the benefits of the framework include:
Before we move on let’s spend a few minutes reviewing some of the ideal use cases for Apache Pulsar Functions + Project Flogo.
The above list is just a few of the more common scenarios, however as you begin to combine the above into interconnected logic via Pulsar topics + Functions the myriad of problems you can solve with minimal infrastructure becomes quite impressive.
I’m really not going to get into the details of Apache Pulsar itself, as there are countless articles that cover this in great detail and the Apache Pulsar docs are pretty awesome!
So what exactly are Pulsar Functions? Well, they are compute processes that can be co-located with the cluster nodes. They are designed to consume messages from input topic(s), execute user-defined processing logic (the function itself) and publish results to an output topic. Optionally you can leverage Apache BookKeeper for state persistence and audit to a logging topic.
Put another way, Pulsar Functions provide a platform that is similar to other FaaS platforms, such as that provided by AWS Lambda, Azure Functions, etc. However, the system resources for Pulsar Functions are shared with the Pulsar node(s) and therefore not serverless, just FaaS. You may hear people refer to Pulsar Functions as serverless-like or serverless-inspired for this reason.
The benefits of leveraging such an approach, include:
I suggest you take a look at docs for Pulsar Functions for a more detailed overview.
I’m not going to spend time setting up our Pulsar environment in this article. If you’re interested in setting up your own environment take a look at the Pulsar Docs. Again, they’re quite good!
However, if you’d like to follow along with me I’ll be using the Pulsar Docker Image from DockerHub. Go ahead and fetch the image. I'll wait.
You’ll recall that we’ve chosen to use Project Flogo as the app framework for building our function. This leaves us with several possibilities as to how we’d like to handle incoming events.
Project Flogo exposes a trigger & action construct. The trigger, in this case, is simply the event from a Pulsar topic. The action, a bit more interesting, encapsulates the intricacies of different event processing paradigms, for example:
Oftentimes stream processing is used as a pre-processor for simple event processing. For example, the average temp from a sensor is calculated over a period of time; after the time-bound window has expired a derived event is then published and consumed by a simple event processing function for some action to be taken (notify someone, shut something down, etc).
Consider the following diagram illustrating the basic concept that we're targeting. Note that we're deploying the function in cluster mode, hence co-locating alongside the broker.
The above yields a few major benefits:
This paradigm leaves you to focus only on the function logic itself, no need to worry about how to consume messages or how to publish the results to another topic or consumer. We also need not fret over where to run our functions!
Before we can start building we’ll need to grab the latest Flogo Web UI from DockerHub.
docker run -it -p 3303:3303 flogo/flogo-docker eula-accept
After the container starts successfully you'll be presented with the following message:
======================================================
___ __ __ __ TM
|__ | / \ / _` / \
| |___ \__/ \__| \__/
[success] open http://localhost:3303 in your browser
======================================================
Now that we’ve got the Flogo Web UI up and running, we’ll need to install the Pulsar Function Trigger. This trigger is a specialized Flogo trigger called a shim trigger, which basically allows the trigger to override the main entry point, hence bypassing the generic Flogo engine startup.
To install the trigger:
github.com/project-flogo/messaging-contrib/pulsar/function
After the trigger has successfully installed, we can begin designing our function:
New
from the landing pagePulsarFunctions
Create an action
or New Action
, name it anything you’d likeYou’ll be presented with a blank canvas where you can begin designing your flow (function) logic.
To get started, let's first configure our flow input and outputs.
Add the following activities (activities are the core building block of a Flogo app, essentially the unit of work for a specific task — read from a database, invoke a REST service, etc) to your flow and configure them as discussed below:
Return
: Returns the specified data to the trigger. This allows us to return from the *flow* and map the output params.msgResponse
and insert the following into the mapper: string.concat("Hello ", $flow.msgPayload)
Save
when doneFeel free to add any additional activities based on your specific use case (take a look at the Flogo Showcase to browse community contributed activities/triggers).
We'll need to add the Pulsar Function trigger to our flow, this acts as the entry point into this flow (I’ve taken a bit of liberty and used flow and function interchangeably, as they are conceptually the same thing in this context.).
After the trigger has been added, click the trigger on the left side of the canvas to open the trigger config modal. You'll want to map the input and output data to/from the trigger.
Note, trigger mappings enable you to build your flow logic independent of the trigger itself, we could easily add another trigger and support a different deployment model without any changes to our app logic.
To compile your binary, navigate back to the list of resources within your Flogo app, select the Build button and in the drop down, select
pulsar-function-trigger
(the name of your Pulsar Function trigger).Flogo will build the binary and return a ELF bin that we can deploy to our Pulsar cluster.
Publishing our function to Pulsar in cluster mode is pretty easy:
/pulsar/bin/pulsar-admin functions create \
--go /root/pulsar-functions_linux_amd64 \
-i persistent://public/default/to-flogo \
-o persistent://public/default/from-flogo \
--classname empty
You should see the message
"Created successfully"
`. One odd thing to note, if you don't specify the --classname
switch it will fail to deploy your function... 🤷♂️Good question, let’s find out!
to-flogo
. To publish the message you can use the pulsar-client
CLI:bin/pulsar-client produce -m "Flogo" \
persistent://public/default/to-flogo
from-flogo
topic, which, as you'll recall, is the output topic specified when we deployed our function.bin/pulsar-client consume -s tst-sub \
persistent://public/default/from-flogo
The result of the
from-flogo
topic is as expected, the return value of our Flogo flow:----- got message -----
"Hello Flogo"
On the Flogo front, I think it would be great to see a few enhancements:
I really love the concept of hosting event processing logic alongside the Pulsar node(s), especially considering the reduction in latency and a simplified app architecture.
I work at TIBCO Software and would be remiss if I didn’t at least mention our support for Apache Pulsar as part of TIBCO Messaging.