Implementing Stateful Process Adapters: Embedded BPMN or AWS Step Functions

Based on recent experience, I am going to put this out there: AWS Step Functions are great for technical state machines that move from one activity to another, but they are not really designed for stateful process orchestration and definitely not for implementing the Saga pattern.

Serverless Step Functions from AWS or BPMN Engines?

When building microservices, a MuleSoft-type platform handles “stateless” request/response and async interfaces really well. But for “stateful” things, especially ones that need the capabilities listed below, I think AWS Step Functions are a half-baked option.

This is because there are good embedded BPMN engines that can do the following:

  • Do stateful end-to-end flows and show them in a dashboard
  • Do stateful flows whose activities are synchronous or asynchronous (request/response, i.e. a one-way request and then a wait for a message); with AWS Step Functions, you code your way out of this
  • Provide out-of-the-box RESTful APIs for starting a process, getting the task state for a process, pushing the state forward, etc.
  • Do business-friendly diagrams
  • Do operational views with a real-time “per process” view of current state, or amazing historical views with heat-maps
  • Are easy to manage and maintain by the lowest common denominator in your team; let’s face it, the cost of maintenance depends on the cost of the resource supporting it, and not everyone is AWS-skilled and cheap
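
To make the asynchronous case above concrete (a one-way request and then a wait for a message), here is a minimal Java sketch of the correlation you end up coding yourself when the platform does not do it for you. The class name `ReplyCorrelator` and the ids are hypothetical, and the wait state lives only in memory; a BPMN engine would persist it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical correlator: parks a process step until a reply with the same id arrives. */
final class ReplyCorrelator {
    private final Map<String, CompletableFuture<String>> waiting = new ConcurrentHashMap<>();

    /** Called right after the one-way request is sent; returns a future the step can wait on. */
    CompletableFuture<String> awaitReply(String correlationId) {
        return waiting.computeIfAbsent(correlationId, id -> new CompletableFuture<>());
    }

    /** Called by the message listener; returns false when nobody is waiting for this id. */
    boolean onMessage(String correlationId, String payload) {
        CompletableFuture<String> pending = waiting.remove(correlationId);
        return pending != null && pending.complete(payload);
    }

    public static void main(String[] args) throws Exception {
        ReplyCorrelator correlator = new ReplyCorrelator();
        CompletableFuture<String> reply = correlator.awaitReply("order-42");
        // Simulates the response notification arriving later on another channel.
        correlator.onMessage("order-42", "SHIPPED");
        System.out.println(reply.get(1, TimeUnit.SECONDS));
    }
}
```

Note that a reply arriving before `awaitReply` is called is simply dropped in this sketch, and nothing survives a restart. Those are exactly the gaps you have to close by hand.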

The only argument I had heard for AWS was that it beat the embedded BPM engines because we did not need to manage a database. We threw that argument out when our Step Functions had to use DynamoDB to store the complex state.

[Figure: Comparing the two offerings]


Given my experience at a few clients with embedded BPM engines and AWS Step Functions in implementing long-running processes, I have found that Step Functions are great at simple state transitions but not easy to maintain and operate, with issues around handling async activities and rollbacks. These can be done, but you need to code for them!

Existing lightweight BPM engines like Camunda offer a better alternative, with self-managed and even hosted options, and I love the way they present process states visually, especially the heat-maps with historical information.

If you want a lot of simple state machines at scale, pick the serverless option; but if you want a solid orchestration option, my preference is a BPMN engine like Camunda.


Stateful microservices pattern

What are stateful microservices?

Stateful microservices hold state while performing tasks with longer-than-normal execution times. They have the following characteristics:

  1. They have an API to start a new instance and an API to read the current state of a given instance
  2. They orchestrate a bunch of actions that may be part of a single end-to-end transaction. It is not necessary to have these steps as a single transaction
  3. They have tasks which wrap callouts to external APIs, DBs, messaging systems etc.
  4. Their tasks can define error handling and rollback conditions
  5. They store their current state and details about completed tasks


Why stateful?

Stateless microservice requests are generally optimised for short-lived request-response type applications. There are scenarios where long-running one-way request handling is required, along with the ability to provide the client with the status of the request and to perform distributed transaction handling and rollback (because XA sucked!).

So you need stateful because

  • there is a group of tasks that must be done together as a step that is asynchronous with no guaranteed response time, or asynchronous one-way with a response notification due later
  • or there is a group of tasks where each step individually may have a short response time but the aggregated response time is large
  • or there is a group of tasks that are part of a single distributed transaction: if one fails, you need to roll back all of them
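
The last case (roll back everything if one task fails) is usually handled with compensating actions rather than a real distributed transaction. Below is a minimal Java sketch of that idea, assuming each task comes with an action that undoes it. `SagaSketch`, `Step` and the step names are hypothetical and only illustrate the mechanics; a BPM engine would also persist progress between steps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical saga runner: executes steps in order and compensates completed ones on failure. */
final class SagaSketch {
    /** One unit of work plus the action that undoes it. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    /** Returns true if every step succeeded; otherwise undoes completed steps in reverse order. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(List.of(
            new Step("createUser", () -> log.add("createUser"), () -> log.add("deleteUser")),
            new Step("reserveEmail", () -> log.add("reserveEmail"), () -> log.add("releaseEmail")),
            new Step("sendWelcome", () -> { throw new IllegalStateException("mail down"); }, () -> { })
        ));
        System.out.println(ok + " " + log);
    }
}
```

The failing step itself is not compensated, only the steps that completed before it, and they are undone newest first.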

Stateful microservice API

Microservices implementing this pattern generally provide two endpoints:

  1. An endpoint to initiate: for example, an HTTP POST which responds with a status code of “Created” or “Accepted” (depending on what you do with the request) and returns a Location header
  2. An endpoint to query request state: for example, an HTTP GET using the process id from the initiate response. The response is the current state of the process with information about past states

Sample use case: User Signup

  1. The process of signing up or registering a new user requires multiple steps, and the interaction starts with a request to create the registration [Command]
  2. The client can then check the status of the registration periodically [Query]


POST /registrations HTTP/1.1
Content-Type: application/json
Host:

{ "firstName": "foo", "lastName": "bar", "email": "" }

HTTP/1.1 201 Created
Location: /registrations/12345


GET /registrations/12345 HTTP/1.1
Accept: application/json
Host:

HTTP/1.1 200 OK

{ "id": "12345", "status": "Pending", "data": { "firstName": "foo", "lastName": "bar", "email": "" } }
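
As a rough sketch of what sits behind those two endpoints, the Java below keeps one status per process instance, with a command method for the POST and a query method for the GET. `RegistrationProcesses` and its methods are hypothetical names, and the in-memory map stands in for whatever state store the engine or product provides.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory backing for POST /registrations and GET /registrations/{id}. */
final class RegistrationProcesses {
    enum Status { PENDING, COMPLETED, FAILED }

    private final Map<String, Status> statusById = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(12345);

    /** Command side: records a new process instance and returns the Location to hand back. */
    String start() {
        String id = Long.toString(nextId.getAndIncrement());
        statusById.put(id, Status.PENDING);
        return "/registrations/" + id;
    }

    /** Query side: current status, or empty if the id is unknown (a 404 in HTTP terms). */
    Optional<Status> status(String id) {
        return Optional.ofNullable(statusById.get(id));
    }

    /** Called by the orchestration when the long-running work finishes. */
    void complete(String id) {
        statusById.replace(id, Status.COMPLETED);
    }

    public static void main(String[] args) {
        RegistrationProcesses processes = new RegistrationProcesses();
        String location = processes.start();
        String id = location.substring(location.lastIndexOf('/') + 1);
        System.out.println(location + " -> " + processes.status(id).orElseThrow());
        processes.complete(id);
        System.out.println(location + " -> " + processes.status(id).orElseThrow());
    }
}
```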



While the pattern is simple, I have seen the implementation vary, with some key anti-patterns. These anti-patterns make the end solution brittle over time, leading to issues with stateful microservice implementation and management:

  1. Enterprise business process orchestration: This makes it complex and couples various contexts. Keep it simple!
  2. Hand-rolling your own orchestration solution: Unlike regular services, operating long-running services requires additional tools for end-to-end observability and error handling
  3. Implementing via a stateless service platform and bootstrapping a database: The database can become the bottleneck and prevent your stateful services from scaling. Use available services/products, as they have optimised their datastores to be highly scalable and consistent
  4. Leaking the internal process id: Your end consumer should see a mapped id, not the internal id of the stateful microservice. This abstraction is necessary for security (a malicious user cannot guess different ids and query them) and dependency management
  5. Picking a state machine product without “rollback”: Distributed transaction rollback and error handling are two big things we are going to need to implement this pattern, so it is important to pick a product that lets you do both. A lightweight BPM engine is great for this; otherwise you may need to hack around to achieve it in other tools
  6. Using stateful process microservices for everything: Just don’t! Use the stateless pattern, as it is optimal for short-lived request/response use cases. I have, for example, implemented request/response services with a BPEL engine (which holds state) and lived to regret it
  7. Orchestrating when choreography is needed: If the steps do not make sense within a single context, do not require a common transaction boundary/rollback, or have no specific ordering, with action rules living in other microservices, then use event-driven choreography
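
For the fourth anti-pattern (leaking the internal process id), one simple way to get the mapped id is to issue a random public id and keep the mapping inside the service. This is only a sketch under that assumption; `PublicIdMap` and the sample engine id are hypothetical, and a real service would keep the mapping in durable storage.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical mapping between unguessable public ids and internal engine process ids. */
final class PublicIdMap {
    private final Map<String, String> internalByPublic = new ConcurrentHashMap<>();

    /** Issues a random public id for an internal process id; only the public id leaves the service. */
    String expose(String internalProcessId) {
        String publicId = UUID.randomUUID().toString();
        internalByPublic.put(publicId, internalProcessId);
        return publicId;
    }

    /** Translates a public id from a client request back to the internal id, if it is known. */
    Optional<String> resolve(String publicId) {
        return Optional.ofNullable(internalByPublic.get(publicId));
    }

    public static void main(String[] args) {
        PublicIdMap ids = new PublicIdMap();
        String publicId = ids.expose("engine-process-7");
        System.out.println(publicId + " -> " + ids.resolve(publicId).orElseThrow());
    }
}
```

Because the public id is random rather than sequential, a client cannot walk through neighbouring ids, and the engine behind the service can be swapped without changing ids that clients already hold.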


Stateful microservices are a thing! Welcome to my world. They let you orchestrate long-running or a bunch of short-running tasks and provide an abstraction over the process to allow clients to fire-and-forget and then come back to ask for status


Like everything, it is easy to fall into common traps when implementing this pattern and the best-practice is to look for a common boundary where orchestration makes sense



Complex Form Evaluation with Drools


Complex business rules are best implemented using a ‘Rules Engine’. Drools is an open-source Business Rules Management product.



In this blog we will cover a few basics of using the Drools rule engine, specifically using a Domain Specific Language (DSL) which is a language that is more user focused. The blog comes with a demo project which can be downloaded and used along with this document.

Demo Use Case

Our demo use case covers evaluating an ‘Application Form’ with multiple ‘Sections’.

Each form section has a ‘rule’ which the current form evaluators (a manual task) use to evaluate the ‘Questions’ in the form. Each form question has one or more ‘options’ selected.

For example:

   - Section1
        - Question1
            - Option1
        - Question2
            - OptionA,OptionB
   - Section2
        - Question1
            - OptionX,OptionY

Now let us assume a use case with a few simple questions and conditions associated with a particular form, for example, a ‘weekend work approval’ form. We can ask a few simple questions:


  • Section1:
    • Question1: “Is this necessary work”
      • options: [Yes, No]
      • rule: “Approved if Yes is selected”
  • Section2:
    • Question1: “When is this work to be done”
      • options: [Weekend Work, Regular time]
      • rule: “Manager approval required if this is weekend work”
  • Section3:
    • Question1: “Is this an emergency”
      • options: [Non-Emergency, Emergency]
      • rule: “If this is an emergency fix then it is approved”

As you can see in our sample use case, we have only one question per section but this can be more.
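
To make the shape of the data concrete, here is a minimal Java sketch of the section, question and option nesting described above. `FormSketch` is a hypothetical stand-in written for this post's example; it is not the FormFact class from the demo project.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical form model: section name -> question name -> selected options. */
final class FormSketch {
    private final Map<String, Map<String, Set<String>>> sections = new LinkedHashMap<>();

    /** Records that an option was selected for a question in a section. */
    FormSketch select(String section, String question, String option) {
        sections.computeIfAbsent(section, s -> new LinkedHashMap<>())
                .computeIfAbsent(question, q -> new LinkedHashSet<>())
                .add(option);
        return this;
    }

    /** True if the given option was selected for that question in that section. */
    boolean isSelected(String section, String question, String option) {
        return sections.getOrDefault(section, Map.of())
                       .getOrDefault(question, Set.of())
                       .contains(option);
    }

    public static void main(String[] args) {
        FormSketch form = new FormSketch()
            .select("Section1", "Question1", "Yes")
            .select("Section2", "Question1", "Weekend Work")
            .select("Section3", "Question1", "Non-Emergency");
        System.out.println(form.isSelected("Section2", "Question1", "Weekend Work"));
        System.out.println(form.isSelected("Section3", "Question1", "Emergency"));
    }
}
```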


Code Repository

You can download the source code from here using

Execution Instructions

Run the main method of the form.demo.rules.RulesExecutor Java class to run the demo

First Steps – a simple condition

A simple rule is implemented in a file called rule.dslr

package form.demo.rules;
import form.demo.rules.facts.*
import function form.demo.rules.RulesLogger.log

expander rule.dsl

// ----------------------------------------------------
// Rule #1
// ----------------------------------------------------
rule "Section1 Rule1.1"
when
  Form has a section called "Section1"
then
  Section outcome is "No Further Review Required"
end

The DSLR file imports facts from the package form.demo.rules.facts and a function called log, defined in the form.demo.rules.RulesLogger class. The expander line declares rule.dsl as the DSL used to expand the sentences in the DSLR

The DSL for the rule is in the rule.dsl file

#  Rule DSL
[when]Form has a section called {name}=$form:FormFact(getSection({name}) != null)
[when] And = and
[when] OR = or
[then]Section outcome is {outcome}=$form.getSection({name}).setOutcome({outcome});log(drools,"Section:"+{name}+", Outcome:"+{outcome}+", Rule Applied:"+ drools.getRule().getName() );

When executed against a set of facts

  FormFact formWithFacts = new FormFact();
  formWithFacts.addSection("Section1", "Question1", "Yes");
  FormAssessmentInfo assessmentInfo = new RulesExecutor().eval(formWithFacts);

the rule fires and the log shows

Dec 01, 2015 3:49:09 PM form.demo.rules.RulesLogger log
INFO: Rule:"Section1 Rule1.1", Matched --> [ Section:Section1, Outcome:No Further Review Required, Rule Applied:Section1 Rule1.1]
Not Evaluated
     Section1->No Further Review Required

Adding a second condition

If a given option is selected in a section, set the section outcome to a value

// ----------------------------------------------------
// Rule #1
// ----------------------------------------------------
rule "Section1 Rule1.1"
when
  Form has a section called "Section1"
  -"Yes" is ticked in "Question1"
then
  Section outcome is "No Further Review Required"
end

The rule DSL gets a matching line for the new condition

#  Rule DSL
[when]Form has a section called {name}=$form:FormFact(getSection({name}) != null)
[when]-{option} is ticked in {question}=eval($form.getSection({name}).has({question},{option}))
[when] And = and
[when] OR = or
[then][Form]Section outcome is {outcome}=$form.getSection({name}).setOutcome({outcome});log(drools,"Section:"+{name}+", Outcome:"+{outcome}+", Rule Applied:"+ drools.getRule().getName() );

Adding Global Rules

  • If a section acts as a global flag (for example: Emergency Approval) then ignore all outcomes and select this
  • If there is no global flag then if any of the sections have outcome ‘foo’ then set the form outcome to ‘bar’ otherwise set the form outcome to ‘baz’

In the rule DSL we add the following. Notice how a new FormFact pattern is bound, this time without matching a section name

[when]The Form=$form:FormFact()
[when]-has a section with outcome {outcome}=eval($form.hasSectionWithOutcome({outcome}))
[when]-has no section with outcome {outcome}=eval($form.hasSectionWithOutcome({outcome}) == false)
[then]Form outcome is {outcome}=$form.setOutcome({outcome});log(drools,"Form Outcome Set to "+{outcome});

In the DSLR we implement a few global rules

// ----------------------------------------------------
// Global Rule #1
// ----------------------------------------------------
rule "Global Rule1.1"
when
  The Form
  -has a section with outcome "Emergency Work"
then
  Form outcome is "Approved"
end

// ----------------------------------------------------
// Global Rule #2.1
// ----------------------------------------------------
rule "Global Rule2.1"
when
  The Form
  -has no section with outcome "Emergency Work"
  -has a section with outcome "Manager Review Required"
then
  Form outcome is "Manager Review Required"
end

// ----------------------------------------------------
// Global Rule #2.2
// ----------------------------------------------------
rule "Global Rule2.2"
when
  The Form
  -has no section with outcome "Emergency Work"
  -has no section with outcome "Manager Review Required"
then
  Form outcome is "Manager Review Required"
end
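
If it helps to see the precedence outside of Drools, the three global rules read roughly like the if/else chain below. This is a plain-Java restatement for illustration only, not how the demo evaluates forms. `GlobalOutcomeSketch` is a hypothetical name, and the fallback outcome is a parameter so the last branch can carry whatever outcome Global Rule 2.2 assigns.

```java
import java.util.Collection;
import java.util.List;

/** Plain-Java restatement of the global rule precedence; not the Drools implementation. */
final class GlobalOutcomeSketch {
    static String formOutcome(Collection<String> sectionOutcomes, String fallbackOutcome) {
        if (sectionOutcomes.contains("Emergency Work")) {
            return "Approved";                 // Global Rule 1.1: the global flag wins
        }
        if (sectionOutcomes.contains("Manager Review Required")) {
            return "Manager Review Required";  // Global Rule 2.1
        }
        return fallbackOutcome;                // Global Rule 2.2: neither outcome is present
    }

    public static void main(String[] args) {
        List<String> outcomes = List.of("No Further Review Required", "Manager Review Required");
        System.out.println(formOutcome(outcomes, "Manager Review Required"));
    }
}
```

The rules engine version needs the explicit “has no section with outcome” conditions because rules have no else branch; every rule has to state the full condition under which it fires.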

Camunda BPM – Manual Retry

Camunda BPM is a lightweight, open-source BPM platform.

The “Cockpit” application within Camunda is the admin dashboard where deployed processes can be viewed at a glance and details of running processes are displayed by process instance id. Clicking on a process instance id reveals runtime details while the process is running: process variables, task variables, etc. If an error is thrown from any of the services in a flow, Cockpit allows “retrying” the service manually after the coded automatic retries have been exhausted.

The steps below show how to retry a failed process from the Cockpit administrative console

Manual Retry

From the Process Detail screen, select a failed process (click on the GUID link).

Then, on the right-hand side under “Runtime”, you should see a “semi-circle with arrow” icon indicating “recycle” or “retry”. Click on it.

This launches a pop-up with check-boxes showing the IDs of the failed service tasks and an option to replay them. Notice these are only the tasks for “this” instance of the “flow”.

After “Retry” is clicked, another pop-up indicates the status of the action, as in “was the engine able to process the retry request”. The actual services may still have failed again.
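
The same retry can also be scripted. As far as I know, the Camunda 7 REST API has a “set job retries” call (PUT /job/{id}/retries with a JSON body), and giving a failed job a retry is what makes the engine pick it up again. The Java sketch below only builds that request and does not send it. The base URL and job id are placeholders, and `JobRetryRequest` is a hypothetical helper, so check the endpoint against the REST documentation for your Camunda version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a "set job retries" request for a Camunda 7 engine. */
final class JobRetryRequest {
    /** Builds (does not send) PUT {engineRestBase}/job/{jobId}/retries with a JSON body. */
    static HttpRequest build(String engineRestBase, String jobId, int retries) {
        return HttpRequest.newBuilder(URI.create(engineRestBase + "/job/" + jobId + "/retries"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"retries\": " + retries + "}"))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder base URL and job id; replace with values from your own engine.
        HttpRequest request = build("http://localhost:8080/engine-rest", "some-job-id", 1);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to a java.net.http.HttpClient. As with the Cockpit pop-up, a successful response only means the engine accepted the retry, not that the service task succeeded.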