Bernd Ruecker is the co-founder and chief technologist of Camunda.
On Wednesday, March 11, 2020, I conducted the webinar titled “Monitoring & Orchestrating Your Microservices Landscape using Workflow Automation.” Not only was I overwhelmed by the number of attendees, but we also got a huge list of interesting questions before and especially during the webinar. Some of them were answered, but a lot of them were not.
So I want to answer all open questions in this series of seven blog posts covering.
Note that we also started to experiment with the Camunda’s question corner and discuss to make this more frequent, so keep an eye to our community for more opportunities to ask anything (especially as in-person events are canceled for some time).
Architecture wise I would not only look at the business process but at the system boundaries of your microservices. Let’s assume you have the order fulfillment microservices and the checkout microservice (the latter is what I would relate to sales — bare with me if you had other things in mind).
Within one microservice, you have some kind of freedom of choice, it mostly depends on the technologies used. I would consider all of these approaches valid:
If you cross the boundary of one microservice, you will have different applications that need to be decoupled. So you need to go via a well-defined API, which is most often either REST or Messaging (but could also mean Kafka, gRPC, SOAP or similar). That means you cannot do direct Java calls or use call activities (the italic bullet points above) — both would couple the microservices together too closely in terms of technology.
This is related to the last answer. A call activity makes the assumption, that the called subprocess is also expressed in BPMN and deployed on the same workflow engine. If this is the case, it is a very easy way to invoke that subprocess. You will see the link in monitoring (e.g. Camunda Cockpit) and get further support, for example around canceling parent or child workflows.
So if you don’t have any issues with the restriction to run in the same workflow engine, call activities are great. This is typically the case within one microservice.
If you cross the boundary of one microservice, you don’t want to make that assumption. You don’t even want to know if the other microservice uses a workflow engine or not. You simply want to call its public API.
So in this case the call activity cannot be used, you use a service task or a send/receive task pair instead.
This should be partly answered by the last two answers. End-to-end workflows very often cross the boundaries of one service. Let‘s use the end-to-end order fulfillment business process as an example. It is triggered by the checkout service, needs to retrieve payment, fetch goods from the warehouse and so on.
The whole business process happens because of ping-pong of different services. The interesting part is to look at how this ping-pong is happening. And to end up with a manageable system you have to find the right balance between orchestration (one service commands another service to do something) and choreography (services reacting on events):
That means that only parts of the whole business process will be modeled as an executable workflow in a workflow engine. And some parts mind end up in a different workflow engine. And other parts end up hardcoded somewhere. In the above example payment might also have its own workflow:
This is an implementation detail. It is not visible from the outside. You just call the payment API.
You can learn more about that in my talk Complex Event Flows in Distributed Systems.
You already learned a bit about that in the last answer:
Events are facts, it is the information that something has happened. It is always past tense. If you issue an event you should not care about who is picking it up. You simply tell the world: hey — something happened. One important consequence of this thinking is: You should be OK if nobody picks up your event!
This is a good litmus test if your event is really an event. If you do care that somebody is doing something with it, it would probably be better a command.Commands are messages where you want something to happen. So you send a command to something that you know can act on it. And you expect it to do it. There is no choice. It might be still asynchronous though, so it is not guaranteed that you get an immediate reply.
I talked about that at length in my talk Opportunities and Pitfalls of Event-Driven Utopia (2nd half).
If you need a good metaphor: if you send a Tweet, this is an event. You just tell the world something that you think is interesting. You have no idea what will happen with it, might be that nobody looks at it, but it can also trigger tons of reactions. You simply can’t know in advance. If you want your colleague to do something for you, send them an email. This is a command, as you expect them to do something with it.
Of course you could also switch to synchronous communication and call them, still a command, just blocking and synchronous now. This is another important remark: commands or events are not about communication protocols. Of course it is intuitive to send events via messaging (topics) and do commands via REST calls. But you could also publish events as REST feeds, and you can send commands via messages,
This is a super interesting topic. We build architectures more and more based on asynchronous communication. In some situations you still want to return synchronous responses, e.g. when your web UI waits for a REST call.
At least most projects still think they need that. I doubt, that we need it that often, we should better think twice about user experience and frontend technologies. I wrote about it in Leverage the full potential of reactive architectures and design reactive business processes. But let’s do assume that we need that synchronous response.
In this case you have to implement a synchronous facade. In short, this facade waits for a response within a given timeout. For Camunda BPM I did an example with a semaphore once (not the only possibility of course — just an example):
For Camunda Cloud (Zeebe) we even have this functionality build in with awaitable workflow outcomes.
A couple of years back I wrote Saga: How to implement complex business transactions without two-phase commit — which might be a bit outdated, but still gives you the important basics. Or if you prefer you might tune into this talk: Lost in transaction.
TL-DR: You can’t use technical ACID transactions with remote communications, so never between microservices. You need to handle potential inconsistencies on the business level with one of three strategies:
This continues the answer from the last question. So a Saga is a “long-lived transaction”. One way to implement this is by using BPMN compensation events. The canonical example is this trip booking (I also used in Saga: How to implement complex business transactions without two-phase commit):
Using the order fulfillment example from the webinar, you could also leverage this for payments. Assume you deduct money from the customer’s account first (e.g. vouchers) before charging the credit card.
Then you have to roll that back in case the credit card fails:
Please have a look in the documentation, it is not super easy to find (sorry!), but basically, two places might be worth checking: Transactions in Processes (as the foundation) and Failed Jobs for the specific retry configuration.
For idempotent start, typical strategies are (quoted from Remote workers and idempotency):
- Set the so called businessKey in workflow instances and add a unique constraint on the businessKey field in the Camunda database. This is possible and you don’t loose support when doing it. When starting the same instance twice, the the second instance will not be created due to key violation in this case.
- Add some check to a freshly instantiated workflow instance, if there is already another instance running for the same data. Depending on the exact environment this might be very easy — or quite complex to avoid any race condition. An example can be found in the forum.
In Zeebe we built-in some capabilities for idempotent start using unique “message keys”, see A message can be published idempotent.
You can easily restart workflows as you can learn about in the docs. Note that this might conflict with unique business constraints added for idempotency.
This is quite a lengthy question and hard to answer without further refinement. But a I was consultant for a long time, I answer it anyway :-) I might be way off — then just send me an email and I am happy to refine my answer.
If you separate business logic into independent microservices, you can indeed no longer use database consistency checks (like foreign keys) or query capabilities spanning data from multiple microservices. Let’s use the example given in the question:
I could insert a treatment, where the patient id is invalid. And I could not easily answer the question, what people in Germany got a vaccination in April.
Both are true. But, this is the intention of the design! You want to probably evolve the treatment microservice independently of the patients. Maybe you will do treatments online in the future, where patients are registered differently. Maybe you have different patient microservices depending on location. Or whatsoever. Hard to discuss without a real use case at hand.
So you have to tackle these requirements on a different level now. First you could think of a specific microservice that is there purely for data analysis and does now data from both microservices. Like a data warehouse or the like. You might use events to keep that in sync.
And second you will probably have workflows that care about consistency. So, for example, you could think of a workflow to register a new treatment:
This could live within the treatment service. Then it might ask the patient service if the patient is registered and valid. Another design could be that the treatment microservice also listens to all events and builds a local cache of patients, so it can validate it locally.
And a third design is that there might even be a separate microservice that is responsible for planning treatments, that leverages the other two. Planning a treatment might get more complicated anyway — as you might need to check interactions with treatments in history and much more.
I could go on for ages, but the bottom line: it all depends on how you design the boundaries of your microservices. If you get them right you will not have too many of these problems. If you do them in an unfortunate way, you might get in trouble.
Very often microservices boundaries don’t follow the static entities (= tables) as you might think in the first place. There was a good article on that, unfortunately, I don’t find it right now. But probably that’s a good moment for you to google for good boundaries :-)
Once again these questions need some additional discussion to be clear what is asked. From what I read I would simply say, that it depends (of course ;-)).
You might add logic around package sales and other complex requirements into the order fulfillment microservice. You might also decide that this should be a separate microservice. The important thing is that it matches your organization structure. So if one team should do it all, why not have one microservice? If it is too much for a team it does make sense to split it up.
In the latter case you need to think about the end-to-end business process, so especially about how you can include that special package sales service without hard coding it at many different places.
In the example from the webinar I could imagine that the button (=checkout microservice) or some complex package sales microservice emit the order submitted event, and order fulfillment takes it from there. But looking into more details it gets complicated quickly (what’s with payment? What if I can’t send one package? …?).
This is super interesting to discuss, but only make sense in real-life scenarios. I am happy to join such a discussion — ping me!
Previously published at https://blog.bernd-ruecker.com/microservices-webinar-faq-1a9741f4481c#67b4