Domain Centric Product Teams: Pulling a Reverse Conway Manoeuvre to deliver better Software Products

Modern software engineering is oriented towards building networked distributed features for a highly connected and web savvy customer base in varying contexts. Traditional team structures within the enterprise have evolved from technical SME cliques as engineers who “Ate lunch together wrote Software together”

Good product strategy requires thinking about product features are built by engineering teams because Conway’s Law drives the outcome – to build, maintain and change great functional products we need to deliberately fix the team organisation

Conway’s Law and Layered Architecture

Melvin Conway described a law where he states that the parts of a software system are directly proportional to the organisational structure

  • “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” – Melvin Conway [1]
  • “If you have four groups working on a compiler, you’ll get a 4-pass compiler” – Eric S Raymond [2]

We can think of the modern teams oriented around the customer and end-systems to deliver integrated products in the following manner (circa 2020)

Thus, broadly speaking, teams orient around technical layers due to the cohesiveness of the “technical domain” expertise & customer’s (software client) needs.

Screen Shot 2020-08-17 at 1.29.17 pm

 

 

If you have been part of  similar teams in the past you must be familiar with the “layered architectural style” that gives rise to this team

SOALayers

 

Technical Teams = Technical Product

While the layered teams do a great job of grouping the teams by technical expertise, business needs for new end-to-end features are contextual, especially in a large enterprise. Things are more contextual in the real-world

Screen Shot 2020-08-17 at 2.52.24 pm

While the layered team structure may work initially for one context, we find adding new features and domain contexts makes it harder to maintain software components across a layer. This leads to slower software delivery due to various reasons

It is becomes increasing difficult for a single team to know and own aspects of a technical layer used in various contexts

Screen Shot 2020-08-17 at 2.59.30 pm

Conway’s law: With layered teams, we build layered software components focussed on the technical correctness vs end-to-end functionality

Techno-functional Teams = Functional Products

Applying the “reverse Conway manoeuvre” – fixing our team organisation to shape our software we break the technical teams into smaller functional teams. This process naturally builds a more “Domain Centric Product team” which can focus on functional needs and stay true to the end-to-end feature

This organisation allows for a closer relationship across technology layers as we group experts in different technical domains to deliver a common outcome and if done well, allows a smallish technical team to own a contextual solution

Screen Shot 2020-08-17 at 3.10.07 pm

 

Pitfalls of Functional Teams

Functional teams are not the perfect solution. Functional teams are highly agile and can own a specific end-to-end solutions through the entire software lifecycle but they operate in their own bubble

When organisations have different “technical practices” within their IT with different software engineering standards (one for the Portal team, one for the Integration team, one for the CRM team etc) it becomes harder to enforce technical standards within a technical context when a member is “outsourced” to another team

Summary

Modern software engineering is oriented towards building networked distributed features for a highly connected and web savvy customer base in varying contexts. Traditional team structures within the enterprise have evolved from technical SME cliques as engineers who “Ate lunch together wrote Software together”

Good product strategy requires thinking about breaking up the technical groups into highly effective functional teams and keeping the “band together”  through  the  lifecycle  of the software  (the end-to-end product)

 

 

 

Stateful microservices pattern

What are stateful microservices?

Microservices holding state while performing some longer-than-normal execution time type tasks. They have the following characteristics

  1. They have an API to start a new instance and an API to read the current state of a given instance
  2. They orchestrate a bunch of actions that may be part of a single end-to-end transaction. It is not necessary to have these steps as a single transaction
  3. They have tasks which wrap callouts to external APIs, DBs, messaging systems etc.
  4. Their Tasks can define error handling and rollback conditions
  5. They store their current state and details about completed tasks

Screen Shot 2020-03-13 at 7.52.57 pm

Why stateful?

Stateless microservice requests are generally optimised for short-lived request-response type applications.  There are scenarios where long-running one-way request handling is required along with the ability to provide the client with the status of the request and the ability to perform distributed transaction handling and rollback (because XA sucked!)

So you need stateful because

  • there are a group of tasks that need to be done together as a step that is asynchronous with no guaranteed response-time or asynchronous one-way with a response notification due later
  • or there are a group of tasks where each step individually may have a short response time but  aggregated response-time is large
  • or there are a group of tasks which are part of a single distributed transaction if one fails you need to rollback all

Stateful microservice API

Microservices implementing this pattern generating provide two endpoints

  1. An endpoint to initiate: for example, HTTP POST which responds with a status code of “Created” or “Accepted” (depending on what you do with the request) and responds back with a location
  2. An endpoint to query request state: for example, HTTP GET using the process id from the initiate process response. The response is then the current state of the process with information about the past states

Sample use case: User Signup

  1. The process of signing-up or registering a new user requires multiple steps and interaction looks like this [Command]
  2. The client can then check the status of the registration periodically [Query]

Command

POST /registrations HTTP/1.1Content-Type: application/jsonHost: myapi.org

{ "firstName": "foo","lastName":"bar",email:"foo@bar.com" }
HTTP/1.1 201 Created  
Location: /registrations/12345

Query

GET /registrations/12345 HTTP/1.1Content-Type: application/jsonHost: myapi.org

{ "firstName": "foo","lastName":"bar",email:"foo@bar.com" }
HTTP/1.1 200 Ok  

{ "id":"12345", "status":"Pending", "data": { "firstName": "foo","lastName":"bar",email:"foo@bar.com" }}

Screen Shot 2020-03-13 at 7.38.41 pm

Anti-patterns

While the pattern is simple, I have seen the implementation vary with some key anti-patterns. These anti-patterns make the end solution brittle over time leading to issues with stateful microservice implementation and management

  1. Enterprise business process orchestration: Makes it complex, couples various contexts. Keep it simple!
  2. Hand rolling your own orchestration solution: Unlike regular services, operating long-running services requires additional tools for end-to-end observability and handling errors
  3. Implementing via a stateless service platform and bootstrapping a database: The database can become the bottleneck and prevent your stateful services from scaling. Use available services/products as they optimised their datastores to make them highly scalable and consistent
  4. Leaking internal process id: Your end consumer should see some mapped id not the internal id of the stateful microservice. This abstraction is necessary for security (malicious user cannot guess different ids and query them) and dependency management
  5. Picking a state machine product without “rollback”: Given that distributed transaction rollback and error-handling are two big things we are going need to implement this pattern, it is important to pick a product that lets you do this. A lightweight BPM engine is great for this otherwise you may need to hack around to achieve this in other tools
  6. Using stateful process microservices for everything: Just don’t! Use the stateless pattern as they are optimal for the short-lived request/responses use cases. I have, for example, implemented request/response services with a BPEL engine (holds state) and lived to regret it
  7. Orchestrate when Choreography is needed: If the steps do not make sense within a single context, do not require a common transaction boundary/rollback or the steps have no specific ordering with action rules in other microservices then use event-driven choreography

Summary

Stateful microservices are a thing! Welcome to my world. They let you orchestrate long-running or a bunch of short-running tasks and provide an abstraction over the process to allow clients to fire-and-forget and then come back to ask for status

Screen Shot 2020-03-13 at 8.37.14 pm

Like everything, it is easy to fall into common traps when implementing this pattern and the best-practice is to look for a common boundary where orchestration makes sense

Screen Shot 2020-03-13 at 8.33.59 pm

 

Tackling complexity: Using Process maps to improve visibility of integrated system features

“Entropy always increases– second law of thermodynamics

Enterprise systems are similar to isolated physical systems, where the entropy or hidden-information always increases. As the business grows, our technology footprint grows as new systems are implemented, new products and cross-functional features are imagined and an amazing network of integrations emerge

Knowing how information flows and managing the chaos is therefore critical organisations are to move beyond “Functional-1.0” into “Lean-2.0” and “Strategic-3.0”  in their implementations. We discuss how current documentation and technical registries simply “tick the box” and there is a better approach to manage increasing complexity through better context

Enterprise Integration Uses Integrations

Current state: Integration Interface Registries with little context

The network of integrations/interfaces (blue-circles above) are often captured in a technically oriented document called “Interface Registry” in a tabular form by teams performing systems integration. While these tables provide details around “who” (producer/consumer details) and “how” (the type of integration) they cannot describe “when” and “why” (use case).  As projects grow and interfaces grow or are re-used the number of when and whys increase over time and entropy (hidden information) around these interfaces grows; this leads to chaos as teams struggle to operate, manage and change them without proper realisation of the end-to-end picture

As a result, only maintaining a technical integration Interface registry leads to poor traceability (business capability to technical implementation), increased maintenance-cost of interfaces ( hard to test for all scenarios) and leads to duplication of effort over time ( as change becomes complex, teams rewrite)

Screen Shot 2019-11-21 at 6.22.21 pm

Integration Interface Repository

Therefore without proper context around Integration Interfaces, organisations will struggle to manage and map cross-functional features leading to slower lead-time, recovery etc over time. We propose that documenting integration use-cases, in a business-friendly visual language and related them to technical interface lists and enterprise capabilities is the key to mastering the chaos

Mastering the chaos: Building a context map

Context is key as it

  1. It drives product-centric thinking vs project-based thinking
  2. It makes our solution more operable, maintainable and re-useable 

In order to  provide better context and do it in a clear visually-oriented format, we believe documenting integration user-stories as technical process flows is a good start

Consider the following use-case: “As a user, I must be able to search/register/update etc in a system”.  Use-cases begin all start with some activation point – a user, timer or notification and then involve orchestration of services or choreography of events resulting in actions within microservices or end-systems eventually delivering some value through a query or command. We can render such a use-case into a map showing the systems, interfaces and actions in them (activation point, services, orchestrations, value) and do so in a standard manner

Screen Shot 2019-11-21 at 5.58.33 pm

For example, we leveraged the Business Process Management Notation – BPMN 2.0 standards to map integration technical use-case flows where we used general concepts like “swim-lanes” for user and systems, “arrows” for Interfaces (solid for request-response interfaces, dotted-lines for async messages) etc.

The Picture below shows this concept along with the “Interface” lines and “Messages” connecting the boxes (actions) between systems. Each interface or message then was linked to the Integration Interface Registry so that it was easy to trace reuse and dependencies

Screen Shot 2019-11-21 at 4.26.04 pm

It was also important that the context picture above is fairly lean as it avoids documenting too much to avoid becoming a single giant end-to-end picture with everything on it. It is best to stay within a bounded-context and only refer to a specific use-case such as “User Registration” or “Order submission” or “Customer Management” etc. This has the added advantage of helping teams which speak a ubiquitous language talk to a collection of pictures belonging to their domain and integration-practitioners to identify a collection of such teams (bounded-contexts)

Building a library and related to EA

The journey to improve visibility and maintenance of integration artefacts then involves capturing these integration use-case context maps, storing them in a version-controlled repository, relating them to other technical and business repositories

This collection of context maps would contain similar information to a “high-level enterprise system integration view” but with a greater degree of clarity

Screen Shot 2020-02-24 at 7.02.43 pm

This collection can also be linked to the Enterprise Architecture (EA) Repository for full end-to-end traceability of Business Capabilities into Technical Implementations. In fact, the TOGAF framework describes an external Business Architecture repository pattern as part of Solution building blocks (see TOGAF structural framework )

We imagine the Integration Context Map repository linked to the Enterprise Architecture Repository and the Integration Interface repository as shown below – this would provide immense value to cross-functional teams and business stakeholders, allowing both to see a common picture

Screen Shot 2019-11-19 at 7.54.49 pm

Sequence flows or process flows?

Sequence diagrams can also be used to document technical use-cases with systems and interfaces, however similar to the Integration interface list, they then to be difficult to consume for the non-technical users and lack the clarity provided by process maps

Screen Shot 2019-11-19 at 6.06.42 pm

As a general rule of thumb we found the following segregation to be useful:

  1. What: Technical process flows for end-to-end visibility, especially useful in complex long-running distributed features.  Sequence diagrams for technical component designs, best for describing how classes or flows/sub-flows (in Mule, for example) interact
  2. Who:  Context maps by Business Analysts (BA) or Architect and Sequence flows by Developers
  3. When: Context maps by Business Analysts (BA) as early as during project Discovery, providing inputs to sizing and visual map of what-is-to-be (sketch?). Sequence flows by Developers, as a task in Development story
Screen Shot 2019-11-14 at 6.43.16 pm

Let us talk tools

There are a variety of tools that can help document process context maps in a standard BPMN 2.0 format. The key criteria here is to produce a standard artefact – a BPMN 2.0 diagram so that it can be managed by standard version-control tools and rendered to documents, team wikis etc. though tools/plugins

Below is a list of tools you can try, we recommend not getting too hung up on tools and instead focus on the practice of documenting integration use-cases

Tools

Recap

  1. As enterprise projects deliver more integrated solutions, it becomes harder to manage and change integration interfaces without proper traceability
  2. Improve traceability of a single end-to-end use-case through a context map
  3. You can use BPMN 2.0 for a standardised notation to do this and use tools to generate these context maps as .bpmn files
  4. You can version control these .bpmn  files and build a collection of context maps
  5. You can link these context maps to Integration Interface registry and Enterprise Business capability registry for increased traceability across the enterprise
  6. There are many tools to help you write the .bpmn files, don’t get hung up on the tools. Start documenting and linking to the interface registry

Conclusion

The context map collection then becomes very useful for enterprise architecture, integration operations, new project teams, testing etc. as a common visual artefact as it relates to the users, systems and interfaces they use 

Enterprise Integration process maps then become a powerful tool over time as they greatly improve visibility across the landscape and help teams navigate a complex eco-system through a contextual and meaningful visual tool; this leads to better open and maintainable integration products leading to reuse and cost-efficiency