The rapid rise in Big Data use cases over the last decade has been accelerated by popular massively scalable open-source technologies such as Apache Cassandra® for storage, Apache Kafka® for streaming, and OpenSearch® for search. Now there’s a new member of the peloton, Cadence, for workflow orchestration.
My flatmate used to carry his cello on a bike to music lessons, a bit like this, but music and cycling also have something more in common: Cadence!
In music, a Cadence is a musical phrase that sounds complete. And in cycling, Cadence is how fast the cyclist is pedaling, which influences efficiency and speed.
Workflow orchestration is the process of defining and executing a series of tasks and the order in which they are to be carried out. You may want some tasks to be performed in sequence, but others may be performed concurrently. You also see this idea in orchestral music scores, such as in this example, which provides the conductor, musicians, and choir instructions on what notes to play and when.
In typical workflow terminology (e.g., BPMN), a workflow has start and end events, activities (atomic or compound tasks), gateways (decisions), and sequences or flows—the order of activities in a workflow.
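To make the difference between sequential and concurrent flows concrete, here is a minimal plain-Java sketch. It does not use any workflow engine, and the names (OrchestrationSketch, activity, runInSequence, runConcurrently) are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Plain-Java illustration only; no workflow engine is involved.
class OrchestrationSketch {

    // A hypothetical "activity": takes an input and returns a result.
    static String activity(String name, String input) {
        return input + "->" + name;
    }

    // Sequence flow: B starts only after A has finished, and uses A's result.
    static String runInSequence(String input) {
        String a = activity("A", input);
        return activity("B", a);
    }

    // Concurrent flow: B and C both start from the same input, and the
    // join() calls wait for both before their results are combined.
    static String runConcurrently(String input) {
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> activity("B", input));
        CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> activity("C", input));
        return b.join() + " + " + c.join();
    }

    public static void main(String[] args) {
        System.out.println(runInSequence("start"));   // start->A->B
        System.out.println(runConcurrently("start")); // start->B + start->C
    }
}
```

The sequence and the join point here play the role of the flows and gateways in the workflow terminology above.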
In bygone eras of computing, I’ve experienced some of the previous attempts at implementing and modeling workflows, including Enterprise JavaBeans (Stateful Session Beans were used for workflows), ebXML, BPEL, and modeling and evaluating workflows using discrete event simulation. Traditional workflow modeling uses the semantics of formal and semi-formal notations, including Finite State Machines (FSM), Markov Models, discrete event simulation, and Petri nets, to specify, visualize, and analyze workflows.
Open source Cadence was developed by Uber and based on the AWS SWF (Simple Workflow Service). It uses Apache Cassandra (and other databases) for persistence and is designed to be highly fault-tolerant and scalable. Perhaps the most surprising feature of Cadence, compared to other workflow engine approaches I’ve encountered previously, is that it is focused on helping developers write workflows primarily either in Java or Go (other languages are available), and therefore doesn’t come with any visualization notation or tool to specify workflows, and the semantics are just plain old code. However, it does have the ability (with the Cadence Web client) to visualize workflows as they execute.
Now, riding a bike needs a lot of gear. For example, you need a bike (logically), helmet, shoes, water bottle, glasses, gloves, a light, pump, lycra (optional!), etc. Likewise, successful use of Cadence needs a whole bunch of things, including workflows, activities, domains, workflow clients, and workers. Let’s take a look at each in turn.
Let’s start with the most basic Cadence concept, workflows. I’ll be using the Cadence Java client, which you need to download and configure to compile in your IDE of choice (I use Eclipse and Maven, and for simplicity, I’ve omitted the imports). There are a number of Java client examples that inspired mine. First, we need a workflow interface and implementation.
Workflow interface:
static final String activityName = "ExampleActivity";
public interface ExampleWorkflow {
    @WorkflowMethod(executionStartToCloseTimeoutSeconds = 120, taskList = activityName)
    void startWorkflow(String name);
}
Workflow implementation:
public static class ExampleWorkflowImpl implements ExampleWorkflow {
    private ExampleActivities activities = null;
    public ExampleWorkflowImpl() {
        this.activities = Workflow.newActivityStub(ExampleActivities.class);
    }
    @Override
    public void startWorkflow(String name) {
        System.out.println("Started workflow! ID=" + Workflow.getWorkflowInfo().getWorkflowId());
        String r1 = activities.task1(name);
        String r2 = activities.task2(r1);
        System.out.println("Workflow result = " + r2 + " ID=" + Workflow.getWorkflowInfo().getWorkflowId());
    }
}
In Cadence, there’s always exactly one method with the @WorkflowMethod annotation, which acts as the start event for the workflow and contains the sequence flow and activities. Calling it starts a stateful workflow instance, which eventually ends.
In the implementation, the method startWorkflow() calls two other methods in sequence (task1, task2), giving us the workflow logic (simple sequential) and 2 activities.
Each running workflow instance has a unique ID, which we’ve used in the above example. You need a workflow ID as workflows can have multiple instances.
However, there is some more complexity here. In reality, Cadence isn’t just a POJI/POJO; it is a platform to execute the stateful workflows scalably and reliably.
You’ll notice a timeout, activities, and a taskList above. Timeouts are needed as workflows, like music and cycling, must eventually end.
Activities are a core feature of Cadence. Workflows are stateful and fault-tolerant, but in a distributed microservices architecture, you will eventually want to call remote APIs to perform tasks. However, remote API calls can fail and can’t be made fault-tolerant as they are external to the Cadence workflow.
The solution Cadence uses is to allow any code, including remote calls, to be wrapped in a Cadence Activity, with the caveat that Cadence doesn’t recover activity state on failure, and activities are guaranteed to be executed at most once. Idempotent activities can be automatically retried on failure by configuring RetryOptions (more information on RetryOptions can be found here), whereas non-idempotent activity failures need to be handled by business-specific logic, including compensation activities.
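To illustrate why retrying is only safe for idempotent calls, here is a minimal plain-Java sketch of a retry loop. This is not the Cadence API: the retry helper and the simulated flaky call are hypothetical, and a real retry policy would typically also include backoff intervals and an expiration time.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Plain-Java illustration of retrying an idempotent call; this is not the Cadence API.
class RetrySketch {

    // Hypothetical helper: calls the task up to maxAttempts times and returns
    // the first successful result, or rethrows the last failure.
    static <T> T retry(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky remote call: fails twice, then succeeds.
        Supplier<String> flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated remote failure");
            }
            return "ok";
        };
        System.out.println(retry(flaky, 5) + " after " + calls.get() + " calls");
    }
}
```

The call is attempted three times here, which is harmless for an idempotent operation such as a lookup. If the call had a side effect, such as charging a customer, it could be applied more than once, which is why non-idempotent failures need business-specific handling instead.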
Activities are just a POJO/POJI pair as well, containing the tasks/methods that are to be executed. Each method has a @ActivityMethod annotation with options.
public interface ExampleActivities
{
    @ActivityMethod(scheduleToCloseTimeoutSeconds = 60)
    String task1(String name);
    @ActivityMethod(scheduleToCloseTimeoutSeconds = 60)
    String task2(String name);
}
public static class ExampleActivitiesImpl implements ExampleActivities
{
    public String task1(String arg) {
        System.out.println("task1 " + arg + " in " + Activity.getWorkflowExecution().getWorkflowId());
        return arg+"task1";
    }
    public String task2(String arg) {
        System.out.println("task2 " + arg + " in " + Activity.getWorkflowExecution().getWorkflowId());
        return arg+"task2";
    }
}
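We’ve already created the activities stub in the workflow constructor above (the activity implementation itself is registered later, with a worker). Note that getting the workflow ID inside an activity works slightly differently: it uses Activity.getWorkflowExecution() rather than Workflow.getWorkflowInfo(). So are a workflow and activities all we need to get it running?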
Cadence has the concept of domains, which are basically just namespaces that workflows live in. You have to create a domain (or reuse an existing one) before you can start workflows or workers. Here’s an example of a method to register a new domain (or return if it already exists):
public static void registerDomain(String host, String domain)
{
    String nameDescription = "a new domain";
    // Connect to the Cadence frontend service on its default port
    IWorkflowService cadenceService = new WorkflowServiceTChannel(ClientOptions.newBuilder().setHost(host).setPort(7933).build());
    RegisterDomainRequest request = new RegisterDomainRequest();
    request.setDescription(nameDescription);
    request.setEmitMetric(false);
    request.setName(domain);
    int retentionPeriodInDays = 1;
    request.setWorkflowExecutionRetentionPeriodInDays(retentionPeriodInDays);
    try {
        cadenceService.RegisterDomain(request);
        System.out.println(
            "Successfully registered domain "
            + domain
            + " with retentionDays="
            + retentionPeriodInDays);
    } catch (DomainAlreadyExistsError e) {
        // The domain already exists, so there is nothing more to do
        System.out.println("Domain " + domain + " is already registered");
    } catch (TException e) {
        // Any other Thrift-level failure (e.g., server unreachable)
        throw new RuntimeException(e);
    }
}
And just in case you are wondering what IWorkflowService and WorkflowServiceTChannel are (I did, and we’ll also use these below): they are in the ServiceClient package, which doesn’t seem well documented, but this appears to be the main way you connect to the Cadence server.
“The ServiceClient is an RPC service client that connects to the cadence service. It also serves as the building block of the other clients.”
In the Java client code, I found the following client types (in the client and serviceclient packages): client/ActivityCompletionClient, client/WorkflowClient, serviceclient/IWorkflowService (there may be more).
We’re now ready to try the example. In the main method, here is some sample code to create a domain and a new WorkflowClient instance:
String host = "Cadence Server IP";
String domainName = "example-domain";
registerDomain(host, domainName);
WorkflowClient workflowClient =
WorkflowClient.newInstance(
new WorkflowServiceTChannel(ClientOptions.newBuilder().setHost(host).setPort(7933).build()),
WorkflowClientOptions.newBuilder().setDomain(domainName).build());
If you try and run the above example, you’ll find that none of the tasks are progressed, and the workflow just times out eventually. What’s gone wrong? The way that activities are executed is by Workers. Workers for the activities need to be created before any work is performed (similar to musicians in an orchestra—you can have a conductor and the score, but the score isn’t actually played until the musicians are on the stage with their instruments, ready to perform, and the conductor starts them off).
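The following plain-Java sketch is only an analogy for this behaviour, not Cadence code: a queue stands in for a task list, a thread stands in for a worker, and all of the names (TaskListSketch, runTask) are hypothetical. It shows that a task placed on the list makes no progress until something polls it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy only: a queue stands in for a task list and a thread for a worker.
class TaskListSketch {

    // Puts one task on the task list and waits up to timeoutMillis for a result.
    // If startWorker is false, nobody polls the list, so the wait times out
    // and null is returned.
    static String runTask(String task, boolean startWorker, long timeoutMillis) {
        BlockingQueue<String> taskList = new LinkedBlockingQueue<>();
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        if (startWorker) {
            Thread worker = new Thread(() -> {
                try {
                    String t = taskList.take(); // block until a task is available
                    results.put(t + " done");   // report the completed task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
        try {
            taskList.put(task);
            return results.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("No worker:   " + runTask("task1", false, 200));
        System.out.println("With worker: " + runTask("task1", true, 2000));
    }
}
```

Without a worker the call simply times out, which mirrors the workflow above timing out when no worker is polling its task list.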
So now that we have a WorkflowClient, we’re ready to create a worker. You have to specify the task list name (the String activityName, which matches the taskList in the workflow interface), and register both the Workflow and Activities implementations with the worker before starting it (if you forget to register the activities, there’s no error, but nothing happens as Cadence will just wait for a worker process to appear, at least until the workflow times out!):
WorkerFactory factory = WorkerFactory.newInstance(workflowClient);
Worker worker = factory.newWorker(activityName);
worker.registerWorkflowImplementationTypes(ExampleWorkflowImpl.class);
worker.registerActivitiesImplementations(new ExampleActivitiesImpl());
factory.start();
Note that the workers would typically be run in a separate process to the workflow and for scalability reasons on well resourced (and multiple) servers, as this is where the actual workflow task “work” is really being done.
But how many workers do we actually have? There are default concurrency and rate limits, which are set in WorkerOptions.
I suspect that if you were clever and had sufficient monitoring of the workflows and activities (number and times in each), you could use Little’s Law (which relates concurrency, throughput, and time) to estimate the magic numbers more accurately. Here’s an example of changing the WorkerOptions defaults, including an explanation of what makes sense.
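As a rough worked example of the Little’s Law arithmetic, here is a small sketch. The throughput and duration figures are assumptions made up for illustration, not measurements or Cadence defaults, and the class and method names are hypothetical.

```java
// Little's Law: average concurrency = throughput x average time in the system.
class LittlesLawSketch {

    // Average number of activities in flight, given the arrival rate
    // (activities per second) and the average activity duration (seconds).
    static double averageConcurrency(double throughputPerSecond, double averageSeconds) {
        return throughputPerSecond * averageSeconds;
    }

    // Rearranged: the throughput that a fixed concurrency limit can sustain.
    static double maxThroughput(int concurrencyLimit, double averageSeconds) {
        return concurrencyLimit / averageSeconds;
    }

    public static void main(String[] args) {
        // Assumed numbers for illustration only: 200 activities/s, 0.5 s each.
        System.out.println(averageConcurrency(200, 0.5)); // 100.0
        // Assumed concurrency limit of 100 with the same 0.5 s duration.
        System.out.println(maxThroughput(100, 0.5));      // 200.0
    }
}
```

In other words, under these assumed numbers a worker would need to run about 100 activities concurrently to keep up, and a concurrency limit below that would cap the achievable throughput.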
It’s now time to start executing a workflow instance as follows:
ExampleWorkflow exampleWorkflow = workflowClient.newWorkflowStub(ExampleWorkflow.class);
exampleWorkflow.startWorkflow("workflow 1");
Workflows can be started synchronously or asynchronously. To start two instances synchronously:
ExampleWorkflow exampleWorkflow = workflowClient.newWorkflowStub(ExampleWorkflow.class);
exampleWorkflow.startWorkflow("workflow 1 sync");
ExampleWorkflow exampleWorkflow2 = workflowClient.newWorkflowStub(ExampleWorkflow.class);
exampleWorkflow2.startWorkflow("workflow 2 sync");
And asynchronously:
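ExampleWorkflow exampleWorkflow3 = workflowClient.newWorkflowStub(ExampleWorkflow.class);
// Each stub can start only one workflow execution, so the second instance gets its own stub
ExampleWorkflow exampleWorkflow4 = workflowClient.newWorkflowStub(ExampleWorkflow.class);
// startWorkflow() returns void, so the futures are CompletableFuture<Void>
CompletableFuture<Void> r3 = WorkflowClient.execute(exampleWorkflow3::startWorkflow, "workflow async");
CompletableFuture<Void> r4 = WorkflowClient.execute(exampleWorkflow4::startWorkflow, "workflow 2 async");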
try {
r3.get();
r4.get();
} catch (InterruptedException e1) {
e1.printStackTrace();
} catch (ExecutionException e1) {
e1.printStackTrace();
}
Just remember to wait for the asynchronous workflows to complete (as in this example using get()); otherwise, nothing much will happen (as the workers are running in the same process in this example).
Workflows aren’t the only things that can be called asynchronously; activities can be executed concurrently as well. This is helpful when you want to run a large number of concurrent tasks at once and then wait for them all to complete. For example, let’s replace the original startWorkflow(String name) method with a new method startConcurrentWorkflow(int concurrency), and we’ll just reuse the original task1() activity but execute it concurrently:
public void startConcurrentWorkflow(int concurrency) {
    List<Promise<String>> promises = new ArrayList<>();
    for (int i = 0; i < concurrency; i++) {
        // Start each activity asynchronously and keep its promise
        Promise<String> x = Async.function(activities::task1, "subpart " + i);
        promises.add(x);
    }
    // allOf converts a list of promises to a single promise that contains a list
    // of each promise value.
    Promise<List<String>> promiseList = Promise.allOf(promises);
    // get() blocks on all tasks completing
    List<String> results = promiseList.get();
    int count = 0;
    for (String s : results) {
        System.out.println("Processed " + s);
        count++;
    }
    System.out.println("Processed " + count + " concurrent tasks");
}
Note that we have to use the Async.function(activities::task1, arg) method to start each activity without blocking, collect the returned Promise objects in a list, and then use Promise.allOf() and get() to block correctly until all of the activities are complete.
The way that Cadence manages workflow state is “clever”! Rather than maintaining the state of each stateful workflow instance in a database (which is a typical approach for persistent and reliable workflow engines, but has scalability limitations), Cadence emits state changes to the database as history events, and then if something goes wrong (or even if you sleep in a workflow) it just restarts the workflow and replays the history to determine what the next task is to run. This uses the event sourcing pattern (similar to the way that Kafka Streams can compute state from event streams or event streams from the state).
This approach supports large numbers of workflow instances running at once (which is likely for long-running and high throughput workflows) but does have negative implications for long workflows, which require increasingly large histories and replaying of many events to regain the correct state.
However, this approach does introduce some important restrictions on workflow implementation that you need to be aware of—basically, they have to be deterministic. The Cadence Workflow class has lots of useful methods that can be used in place of unsafe/non-deterministic/side-effect riddled standard Java methods, in particular for time, sleeping, and random numbers.
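The replay idea, and why it needs deterministic workflow code, can be sketched in plain Java. This is a deliberately simplified illustration and not how Cadence is implemented: the History class, its step() method, and the workflow() function are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java illustration of replaying recorded results; not how Cadence is implemented.
class ReplaySketch {

    // Hypothetical history: results of completed steps, in the order they ran.
    static class History {
        final List<String> events = new ArrayList<>();
        int executed = 0;       // steps actually executed (not replayed)
        private int cursor = 0; // position in the history during the current run

        void startRun() {
            cursor = 0;
        }

        // Returns the recorded result if this step completed in an earlier
        // run; otherwise executes the step and records its result.
        String step(Supplier<String> work) {
            if (cursor < events.size()) {
                return events.get(cursor++);
            }
            String result = work.get();
            executed++;
            events.add(result);
            cursor++;
            return result;
        }
    }

    // A deterministic "workflow": the same input and history always lead to
    // the same steps in the same order.
    static String workflow(History h, String name) {
        h.startRun();
        String r1 = h.step(() -> name + "task1");
        String r2 = h.step(() -> r1 + "task2");
        return r2;
    }

    public static void main(String[] args) {
        History h = new History();
        // First run: pretend the process crashed right after task1 completed.
        h.startRun();
        h.step(() -> "workflow 1" + "task1");
        // Recovery: rerun the workflow from the start. task1 is replayed from
        // the history, and only task2 is actually executed.
        String result = workflow(h, "workflow 1");
        System.out.println(result + ", steps executed: " + h.executed);
    }
}
```

Each step is executed only once even though the workflow function runs twice. The sketch also hints at the determinism requirement: if the workflow code chose its steps based on something like the wall-clock time or an unrecorded random number, the replayed history would no longer line up with the steps the code asks for.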
So that’s just about all I know about Cadence so far. However, there’s a lot more to discover, including how it handles exceptions, retries, long-running transactions, different sorts of timeouts, compensation actions, signals, queries, child workflows, and how you can call remote APIs and integrate it with Kafka (e.g., to send and block/receive a message as an activity).