Observing distributed systems: Monitoring, Logging, Auditing and Historical analysis

“Knowing the quality of your services at any given moment in time before your customers do and using this information to continuously improve customer experience is part of modern software delivery and critical to the success of organisations”

In this post, we explain why it is important to observe and manage our systems and solutions proactively and outline the various mechanisms available for observing and reacting. We discuss distributed services observability through monitoring, logging, tracing and contextual heatmaps

TL;DR: 5 key takeaways in this post

  1. Observing distributed system components is important for operational efficiency
  2. Observations types vary based on observation context and need
  3. Monitoring, Alerting, Logging, Auditing, Tracing and Historical views are types of observations we can make
  4. Observability is available out-of-the-box on platforms (AWS, Mulesoft etc) and in 3rd party products (Dynatrace, AppDynamics, New Relic etc). Some things you still need to bake into your API or Microservices framework or code to achieve über monitoring
  5. Use observations not just to react now but also to improve and evolve your technology platform

Why

Thanks to efficient software delivery practices we are delivering more integrated solution features and bolting on more integrated systems to accelerate digital transformation. This means a lot of old internal systems and external services are being wired onto shiny new enterprise services over a standard platform to enable the flow of data back and forth

Keeping the lights on is simply not enough then, we need to know if the fuse is going to blow before the party guests get here!

Businesses therefore need to

  • proactively observe their systems for fraud and malicious activity
  • watch and act through both active and passive means
  • regularly interact with their systems as their users would to discover faults before users do
  • track a single interaction end-to-end over simple and complex transactions for faster resolution of complaints and issues
  • and evolve their features by listening to their systems over a period of time


Observation contexts and approach

I have, over time, realised that how we observe depends on what we want to observe and when. There are multiple ways to observe; most of us are familiar with terms like Monitoring, Alerting, Logging, Distributed Tracing etc. but these are useful within an observation context. These contexts are real-time active or passive, incident management, historical analysis etc.

Let us look at some of these contexts in detail:

  • If we want to know at any instant whether the platform or services are up or down then we use a Monitoring approach
  • If we want to be notified of our monitored components hitting some threshold (CPU, heap, response time etc.) then we use Alerting
  • If we want the system to take some action based on monitoring thresholds (scale-out, deny requests, circuit-break etc.) then we use Alert Actioning
  • If we want more contextual, focussed deep-dive for tracking an incident or defect then we use Logging and Tracking (with tracking IDs)
  • If we want to track activity (user or system) due to concerns around information security or privacy then we implement Log Auditing 
  • If we want to detect bottlenecks, find trends, look for hot spots, improve and evolve the architecture etc. then we use Historical Logs, Distributed Tracing and Contextual flow maps 

Monitoring

Monitoring enables operators of a system to track metrics and know its status at any given point in time. It can be provided via out-of-the-box plugins or external products and enabled at all levels of an integrated solution: bottom-up from Platform to Services and side-to-side from a client-specific service to domain services, system services etc


A key thing to note here is that monitoring in the traditional sense was driven by simply “watching the moving parts” but with modern monitoring products, we can “interact” with the services as a “hypothetical user” to detect issues before real users do. This is called synthetic transaction monitoring and in my experience has been invaluable in delivering a proactive response to incidents and improving customer experience

For example:

  • Cloud Service Provider Monitoring: AWS Monitoring offers monitoring of its cloud platform and the AWS services   [ Example: https://docs.aws.amazon.com/step-functions/latest/dg/procedure-cw-metrics.html ]
  • Platform As A Service (PaaS) Provider Monitoring: Mulesoft offers an “Integration Services” platform as a service and provides monitoring for its on-prem or cloud-offerings which includes monitoring for the platform and its runtime components (mule applications) [Example: https://www.mulesoft.com/platform/api/monitoring-anypoint]
  • Monitoring Products: Products like New Relic, Dynatrace, AppDynamics etc. work great if your enterprise spans a variety of cloud or on-prem services, needs a centralised monitoring solution and requires advanced features such as synthetic transactions, custom plugins etc
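If your monitoring product does not offer synthetic transactions, a simple version can be scripted. Below is a minimal Node.js sketch (the base URL, endpoints and thresholds are hypothetical, and the built-in fetch assumes Node 18+) that periodically exercises a service the way a user would and records the result for the monitoring system to pick up:

// Minimal synthetic transaction monitor (sketch, not a product substitute)
const BASE_URL = process.env.BASE_URL || 'https://api.example.internal'; // hypothetical service
const SLA_MS = 2000; // hypothetical response-time threshold

async function syntheticCheck() {
  const started = Date.now();
  try {
    const health = await fetch(`${BASE_URL}/health`);                 // step 1: is it up?
    const search = await fetch(`${BASE_URL}/api/orders?status=OPEN`); // step 2: act like a user would
    const elapsed = Date.now() - started;
    const ok = health.ok && search.ok && elapsed <= SLA_MS;
    // In a real setup this result would be pushed to the monitoring/alerting system
    console.log(JSON.stringify({ check: 'order-search', ok, elapsed, at: new Date().toISOString() }));
  } catch (err) {
    console.error(JSON.stringify({ check: 'order-search', ok: false, error: err.message }));
  }
}

setInterval(syntheticCheck, 60000); // run every minute, like a hypothetical user
syntheticCheck();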

Alerting and Actions

Alerting allows users to be notified when monitored resources cross a threshold (or trip some management rule). Alerting depends on monitoring and is a proactive approach to knowing how systems are performing at any point in time

While alerts can be great, they can quickly overwhelm a human if there are too many. One strategy is for the system to take automatic action when an alert threshold is reached and let the human know it has done something to mitigate the situation (a circuit-breaker sketch follows the examples below). For example:

  • If the API is overloaded (504 – Gateway timeout) but still processing requests, then spin up a new instance of the component to serve the API from a new runtime
  • If a downstream service has gone down (503 – Service Unavailable) or is timing out (408 – Request Timeout) then trip the circuit breaker i.e. return 504 from this API
  • If there is a known issue with the runtime heap memory which causes the application to become unresponsive every 20ish hours, then start a new instance when heap reaches a certain threshold and restart this service
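A very small circuit breaker in front of a downstream call, as per the second example above, could look like the sketch below. Express is used to match the rest of this blog; the endpoint, thresholds and status codes are illustrative assumptions (built-in fetch assumes Node 18+):

// Minimal circuit-breaker sketch for an Express route calling a downstream service
const express = require('express');
const app = express();

const FAILURE_LIMIT = 5;    // hypothetical failure count before tripping
const COOL_DOWN_MS = 30000; // hypothetical cool-down period
let failures = 0;
let openUntil = 0;          // the circuit is "open" (tripped) until this timestamp

app.get('/api/customers/:id', async (req, res) => {
  if (Date.now() < openUntil) {
    // Circuit open: fail fast instead of hammering the broken downstream
    return res.status(504).json({ error: 'Downstream unavailable, circuit open' });
  }
  try {
    const downstream = await fetch(`https://downstream.example.internal/customers/${req.params.id}`);
    if (!downstream.ok) throw new Error(`Downstream returned ${downstream.status}`);
    failures = 0; // a healthy response closes the circuit again
    res.json(await downstream.json());
  } catch (err) {
    failures += 1;
    if (failures >= FAILURE_LIMIT) {
      openUntil = Date.now() + COOL_DOWN_MS;
      console.error(`ALERT: circuit opened after ${failures} failures: ${err.message}`); // tell the human
    }
    res.status(504).json({ error: 'Downstream call failed' });
  }
});

app.listen(8080);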


A sample Dynatrace dashboard is shown below with the status of microservices and metrics over time per instance

Screen Shot 2020-03-03 at 9.42.02 am

Logging, Auditing and Transaction Tracking

This tells us about a specific functional context at a point-in-time and is provided by logging solutions over our microservices and end systems. Generally, this type of information is queried from the logs using a transaction id or some customer detail and happens after an issue or defect is detected in the system. This is achieved through logging or distributed tracing

Logging:

  • Use log levels – DEBUG, INFO, ERROR – and at each level log only what you need, to avoid log streams filling up quickly and a call from your friendly enterprise logging team
  • Avoid logging personally identifiable information (PII) (name, email, phone, driver’s licence etc) – imagine this was your data flowing through someone’s logs, what would you like them to store and see?
  • Log the HTTP method and path if your framework does not do that by default (a minimal sketch of these points follows)
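Below is one way these three points could look using plain console logging; the field names and the redaction list are assumptions, and a real service would normally use a logging library with configured levels:

// Minimal logging sketch: log levels, no PII, method and path included
const LEVELS = { DEBUG: 10, INFO: 20, ERROR: 40 };
const ACTIVE_LEVEL = LEVELS[process.env.LOG_LEVEL || 'INFO'];
const PII_FIELDS = ['name', 'email', 'phone', 'driversLicence']; // assumed list of fields to redact

function log(level, message, context = {}) {
  if (LEVELS[level] < ACTIVE_LEVEL) return; // respect the configured level
  const safeContext = { ...context };
  PII_FIELDS.forEach((f) => { if (f in safeContext) safeContext[f] = '***redacted***'; });
  console.log(JSON.stringify({ level, message, ...safeContext, at: new Date().toISOString() }));
}

// Log the HTTP method and path, but never the customer's PII
log('INFO', 'request received', { method: 'POST', path: '/api/customers', email: 'jane@example.com' });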

Auditing:

  • Is the logging of user actions to track access, especially to protected resources
  • Involves logging information about “who”, “when” and “which resource”
  • Is compact and concise to enable faster detection (less noise in the logs the better)
  • Usually kept separate from functional logs but can be combined if it suits – a sketch of a single audit entry follows this list
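The sketch below shows what a single audit entry could capture; the field names here are assumptions, not a standard:

// Minimal audit-entry sketch: who, when, which resource and the outcome
function audit(userId, action, resource, allowed) {
  // Kept deliberately compact; typically written to a separate audit stream
  console.log(JSON.stringify({
    type: 'AUDIT',
    who: userId,                 // who performed the action
    action: action,              // e.g. 'READ', 'UPDATE'
    resource: resource,          // e.g. '/api/accounts/42'
    allowed: allowed,            // was access granted?
    when: new Date().toISOString()
  }));
}

audit('user-123', 'READ', '/api/accounts/42', true);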

Tracking:

  • Useful for looking at things end-to-end, from the User Interface to the backend systems (a middleware sketch follows this list)
  • Uses trackingIDs to track transactions with each point forwarding the trackingID to the next point downstream
  • Each downstream point must respond back with the same trackingID to close the loop
  • The entry-point, i.e. service client (Mobile app, Web app etc) must generate the trackingID. If this is not feasible then the first service accepting the request must generate this unique ID and pass it along
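A minimal Express middleware sketch of this idea; the x-tracking-id header name is an assumption (any agreed header, or a standard such as the W3C traceparent header, works the same way), and the built-in fetch assumes Node 18+:

// Minimal tracking-ID propagation sketch for an Express microservice
const express = require('express');
const { randomUUID } = require('crypto');
const app = express();

app.use((req, res, next) => {
  // Reuse the caller's tracking ID if present, otherwise generate one (first hop)
  req.trackingId = req.headers['x-tracking-id'] || randomUUID();
  // Echo it back so the caller can close the loop
  res.setHeader('x-tracking-id', req.trackingId);
  next();
});

app.get('/api/orders/:id', async (req, res) => {
  // Forward the same ID to the next point downstream
  const downstream = await fetch(`https://downstream.example.internal/orders/${req.params.id}`, {
    headers: { 'x-tracking-id': req.trackingId }
  });
  res.json(await downstream.json());
});

app.listen(8080);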


 

Heatmaps and historical views

This type of view is constructed by looking at long-term data across a chain of client-microservices-provider interactions. Think of a heatmap of flows and errors which emerge over time through traces in the system. This information is obviously only available after a number of interactions and is highly useful in building strategies to detect bottlenecks in the solution and improve service quality for consumers

A historical view with heatmaps is achieved through aggregated logs overlaid on visual flow maps grouped by some processID or scenarioID
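As a sketch of the aggregation idea: given trace or log entries tagged with a scenarioID and a step, counting calls and errors per step produces the raw data behind such a heatmap (the entry shape below is an assumption for illustration):

// Minimal heatmap aggregation sketch: group traced log entries by scenario and step
function buildHeatmap(entries) {
  const cells = new Map();
  for (const e of entries) {
    const key = `${e.scenarioId}::${e.step}`;
    const cell = cells.get(key) || { scenarioId: e.scenarioId, step: e.step, calls: 0, errors: 0 };
    cell.calls += 1;
    if (e.error) cell.errors += 1;
    cells.set(key, cell);
  }
  // Hot spots are the cells with high call volume or a high error rate
  return [...cells.values()].map((c) => ({ ...c, errorRate: c.errors / c.calls }));
}

console.log(buildHeatmap([
  { scenarioId: 'order-submit', step: 'validate', error: false },
  { scenarioId: 'order-submit', step: 'payment', error: true },
  { scenarioId: 'order-submit', step: 'payment', error: false }
]));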

One example of this is the view below from a tool called Camunda Cockpit. Camunda is a lightweight embedded BPMN engine used for orchestrating services in a distributed transaction context (learn more from Bernd Ruecker here https://blog.bernd-ruecker.com/saga-how-to-implement-complex-business-transactions-without-two-phase-commit-e00aa41a1b1b)

 

heatmap

Summary

  1. Observing distributed system components is important for operational efficiency
  2. Observations types vary based on observation context and need
  3. Monitoring, Alerting, Logging, Auditing, Tracing and Historical views are types of observations we can make
  4. Observability is available out-of-the-box on platforms (AWS, Mulesoft etc) and in 3rd party products (Dynatrace, AppDynamics, New Relic etc). Some things you still need to bake into your API or Microservices framework or code to achieve über monitoring
  5. Use observations not just to react now but also to improve and evolve your technology platform

How did we get to Microservices?

If you have struggled with decisions when designing APIs or Microservices – it is best to take a step back and look at how we got here. It helps not only renew our appreciation for the rapid changes we have seen over the past 10-20 years but also puts into perspective why we do what we do

I still believe a lot of us cling to old ways of thinking when the world has moved on or is moving faster than we can process. APIs and microservices are not just new vendor-driven fads; rather they are key techniques, processes and practices for businesses to survive in a rapidly evolving eco-system. Without an API strategy, for example, your business might not be able to provide the services your competition can, or with the quality of service consumers have come to expect (in real-time)

So, with that in mind let us take a step back and look at the evolution of technology within the enterprise and remember that this aligns to the business strategy

Long time ago

There were monolithic applications – mainframe systems, for example, processing Order, Pricing, Shipment etc. You still see such monoliths written within startups because they make sense – no network calls, just inter-process communication, and if you can join a bunch of tables you can make a “broker” happy with a mashup of data

1. First, there was just the legacy monolith application

Circa 2000s MVC app world

No ESB yet. We were exploring the JVM and Java for enterprise application development. JSF was new and the EJB framework was still deciding what type of beans to use. Data flowed via custom connectors from the Mainframe to these Java applications, which cached it and allowed viewing and querying of this information

There were also functional foundation applications for enterprise logging, business rules, pricing, rating, identity etc. that we saw emerging and data standards were often vague. EAI patterns were being observed but not standardized and we were more focused on individual service design patterns and the MVC model

2. Then we built lots of MVC applications and legacy monolith, integration was point-to-point

Services and Service-Oriented Architecture

The next wave began when the number of in-house custom applications started exploding and there was a need for data standardisation, a common language to describe enterprise objects and de-coupled services with standard requests and responses

Some organisations started developing their own XML based engines around message queues and JMS standards while others adopted the early service bus products from vendors

Thus Service-Oriented Architecture (SOA) was born with lofty goals: build canonical enterprise data models, reduce point-to-point services (Java applications had a build-time dependency on the services they consumed, other Java services), add standardised security, build a service registry etc

We also saw a general adoption and awareness around EAI patterns – we finally understood what a network can do to consistency models and the choice between availability and consistency in a partition. Basically, stuff already known by those with a Computer Science degree working on distributed computing or collective communication in a parallel computing cluster

One key observation is that the vendor products supporting SOA were runtime monoliths in their own right. It was a single product (J2EE EAR) running on one or more application servers with a single database for stateful processes etc. The web services we developed over this product were mere XML configuration executed by one giant application

Also, the core concerns were “service virtualisation” and “message-based routing”, which was a purely stateless and transformation-only concept. This worked best when coupled with an in-house practice of building custom services and failed where there was none and the SOA product had to simply transform and route (i.e. it did not solve problems by itself as an integration layer)

3. We started to make integration standardised and flexible, succeed within the Enterprise but failed to scale for the Digital world. Not ready for mobile or cloud

API and Microservices era

While the SOA phase helped us move away from the ugly file-based integrations of the past and really supercharged enterprise application integration, it failed miserably in the digital customer domain. The SOA solutions were not built to scale, they were not built for the web, and the web was scaling and getting jazzier by the day; people were expecting more self-service portals and XML parsing was dragging response times down!

Those of us who were lucky enough to let go of the earlier dogma (vendor kool-aid) around the “web services” we were building started realising there was nothing webby about them. After a few failed attempts at getting the clunky web portals working, we realised that the SOA way of serving information was not suited to this class of problems and we needed something better

We have come full circle back to custom build teams and custom services for foundation tasks and abstractions over end-systems – we call these “micro-services” and build them not for the MVC architecture but as pure services. These services speak HTTP natively as the language of the web, without the custom standards SOAP had introduced earlier, and use the representational state transfer (REST) style to align with hypermedia best-practices; we call them web APIs and standardise around using JSON as the data format (instead of XML)

4. Microservices, DevOps, APIs early on – it was on-prem and scalable

The API and Microservices era comes with changes in how we organise (Dev-Ops), where we host our services (scalable platforms on-prem or available as-a-service) and a fresh look at integration patterns (CQRS, streaming, caching, BFF etc.). The runtime for these new microservices-based integration applications is now broken into smaller chunks as there is no centralised bus 🚎

5. Microservices, DevOps, APIs on externalised highly scalable platforms (cloud PaaS)

Recap

Enterprise system use has evolved over time from depending on one thing that did everything, to multiple in-house systems, to in-house and cloud-based services. The theme has been a gradual move from a singular application to a network-partitioned landscape of systems to an eco-system of modular value-based services

Microservices serve traditional integration needs between enterprise systems but more importantly enable organisations to connect to clients and services on the web (cloud) in a scalable and secure manner – something that SOA products failed to do (since they were built for the enterprise context only). APIs enable microservices to communicate with the service consumers and providers in a standard format and bring with them best practices such as contract-driven development, policies, caching etc that makes developing and operating them at scale easier

De-mystifying the Enterprise Application Integration (EAI) landscape: Actors, terminology, cadence and protocols

Any form of Enterprise Application Integration (EAI) [1] work for data synchronization,  digital transformation or customer self-service web implementation involves communication between the service providers and service consumers. A web of connections grows over time between systems, facilitated by tools specialising in “system-integration”; this article covers how the clients, services and integration tools communicate and nuances around this observed in the wild

EAI Actors

Depending on the context, a system becomes a data-consumer or data-provider. These consumers and providers can be internal to a business enterprise or external to them. External providers can be pure software-as-a-service or partners with platforms and build teams entrenched in the “client-site”

1. Services, Consumers within and outside a business enterprise

The provider and consumer systems are the key actors within an internal or external system-integration context. Cadences vary as some provider services are stable and mature, while others are developed with the client applications

Service/API Contract

Providers and Consumers communicate with each other using one of many standard protocols; also consumers have a direct dependency on service provider’s “service contract” to know about these protocols and service details

A service contract is a document describing one or more services offered by a service provider and covers details such as protocol, methods, data type and structure etc.

2. Consumer, Service provider and Service Contract

A good service contract contains well-documented details of the service as well as “examples” of the request, response and errors. RAML [3] and Swagger/OAS [4] are two of the popular tools used in documenting service contracts

For example, the “Address search” contract by a SaaS vendor below describes the method, URI, query parameters and provides the user with the ability to “try out” the service. This approach allows consumers to iterate faster when developing solutions that use address search without having to engage the SaaS vendor teams (self-service)

3. A Service Contract on an API Portal [2]

Service Contract Cadence: Contract Driven Development

Contract cadence is about when a service contract is available compared to when the client wants to build their application. There are 2 cadences – mature and in-flight.

For a mature/pre-existing service, a good service contract allows the service consumer to develop a client application without having to engage a service provider person. Meanwhile, for a service being developed with the client application, there is an opportunity for both teams to author the service contract together during the elaboration phase of a project

4. Deliver Cadence based on Service Contract Availability

Service contracts are key to consuming a service; when consuming mature services look for descriptive, sandbox-enabled, self-service services to supercharge your delivery. For services being developed in a project (along with the client applications), by internal teams or external vendors, ask for contracts upfront and ask for consumers & service providers to co-author the contracts to remove ambiguities sooner 

Avoid generating service contracts from developed components (Java Class to WSDL) as this technique leads to isolated, one-way, ambiguous specifications requiring considerable hand-holding and has the most defects during integration testing (from experience). Use Contract Driven Development [8], which facilitates the writing of a contract by the Service Provider and the Consumer together during the Elaboration phase (sign in blood if adventurous)

API Styles: REST, GraphQL, Streaming, gRPC, SOAP, JSON-RPC and OData

Now that we have services and contracts out of the way, we can dig deeper into how messages get over the wire in real-time services. We will ignore “B2B protocols” for this discussion and leave them for the future

The common real-time service styles we see are all built over HTTP, use a JSON or XML content-type and differ in their implementation. Below is a short description of each

  • REST
    • Is the most common style for front-end applications and modern enterprise APIs. It is stateless, cacheable, uniform, layered etc and came from this [6] (a REST vs JSON-RPC sketch follows this list)
    • A resource is a key abstraction in REST and a hard concept for integration-practitioners to master
    • Mature REST APIs look like Hypertext, i.e. you can navigate them like you would the web
    • Uses HTTP methods e.g. GET, PUT, POST, PATCH for query and command
    • Uses HTTP status codes for communicating the response e.g. 2xx, 4xx, 5xx
    • Open to custom or standard request/response data-types. Standard hypermedia types include HAL, JSON API, JSON-LD, Siren etc. [5]
    • Requires resource-centric thinking
  • GraphQL
    • Exposes a single HTTP endpoint (typically POST) with a typed schema; clients specify exactly the fields they need in a query
  • Streaming APIs
    • HTTP Streaming
    • WebSockets
    • SSE (Server-Sent Events)
    • HTTP/2 Push
    • gRPC streaming
  • gRPC
    • Uses HTTP/2 for transport and Protocol Buffers [9] for the contract and binary message encoding
    • Popular for service-to-service calls where performance and strongly-typed contracts matter
  • SOAP
    • Used to be popular before REST came along
    • Uses HTTP POST with the body containing request details – method-name, request etc in XML
    • Uses XML schemas for describing data types
  • JSON-RPC
    • Semi-popular with some legacy clients
    • Uses HTTP POST with the body containing request details – method-name, request etc in JSON
    • Uses JSON object to define the method and data
  • OData
    • Used in CRM, ERP etc enterprise systems to reduce client-side development through config driven service consumptions
    • Presents the end service as a “data source” to the consumer, allowing SQL like query
    • Uses schema for describing data source objects and atom/XML  for transmitting over the wire
    • Requires custom parsers for the URL, which contains the “query” to be mapped to the back-end service
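To make the stylistic difference concrete, below is a sketch of the same “get customer” call in the REST style versus the JSON-RPC style (the endpoints and field names are hypothetical, and the built-in fetch assumes Node 18+):

async function compareStyles() {
  // REST style: the resource is in the URI, the HTTP method carries the intent,
  // and the HTTP status code carries the outcome
  const restResponse = await fetch('https://api.example.com/customers/42'); // GET a resource
  const customer = restResponse.ok ? await restResponse.json() : null;      // 2xx = success, 404 = no such resource

  // JSON-RPC style: always POST to a single endpoint; the method name and arguments
  // travel in the body, and the outcome also comes back in the body
  const rpcResponse = await fetch('https://api.example.com/rpc', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', method: 'getCustomer', params: { id: 42 }, id: 1 })
  });
  const rpcResult = await rpcResponse.json(); // { jsonrpc: '2.0', result: {...}, id: 1 } or an error object

  return { customer, rpcResult };
}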

There is plenty to cover in this space and in a future post I will compare these protocols further and flavour them from experience. The key takeaway here, though, is that there are many, many ways in which service providers and service consumers can communicate; most choose REST or SOAP over HTTP, there are passionate conversations over REST/SOAP, JSON/XML, HAL/JSON-API/Siren, all the while OData remains a mystery to us – until we need to deal with it

EAI Patterns

There are heaps to learn about “how” these service providers and consumers communicate in a networked environment, but below is a quick overview of the patterns. These patterns emerge because of the CAP Theorem [7] and the network partition between the systems looking to exchange data

5. Integration Communication Patterns used by Service Providers and Consumers

Recap

  1. Enterprise Integration involves internal, external, batch and real-time services
  2. Key actors are service providers, consumers and mediators
  3. Service contracts are key documents in integrating
  4. The cadence between provider and consumer impacts delivery velocity
  5. Service protocols vary but there are 4 main types: REST, SOAP, JSON-RPC and OData

References

[1] EAI Patterns https://www.enterpriseintegrationpatterns.com/

[2] Experian REST API https://www.edq.com/documentation/apis/address-validate/rest-verification/#/Endpoints/Search_Address

[3] RAML https://raml.org/

[4] SWAGGER https://swagger.io/

[5] Choosing Hypermedia Formats https://sookocheff.com/post/api/on-choosing-a-hypermedia-format/

[6] Roy Fielding’s dissertation https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

[7] CAP Theorem or Brewer’s Theorem https://en.wikipedia.org/wiki/CAP_theorem

[8] Contract Driven Development https://link.springer.com/chapter/10.1007/978-3-540-71289-3_2

[9] gRPC https://developers.google.com/protocol-buffers/docs/proto3

 

Tackling complexity: Using Process maps to improve visibility of integrated system features

“Entropy always increases” – second law of thermodynamics

Enterprise systems are similar to isolated physical systems, where the entropy or hidden-information always increases. As the business grows, our technology footprint grows as new systems are implemented, new products and cross-functional features are imagined and an amazing network of integrations emerge

Knowing how information flows and managing the chaos is therefore critical if organisations are to move beyond “Functional-1.0” into “Lean-2.0” and “Strategic-3.0” in their implementations. We discuss how current documentation and technical registries simply “tick the box” and propose a better approach to manage increasing complexity through better context

Enterprise Integration Uses Integrations

Current state: Integration Interface Registries with little context

The network of integrations/interfaces (blue circles above) is often captured by teams performing systems integration in a technically oriented, tabular document called an “Interface Registry”. While these tables provide details around “who” (producer/consumer details) and “how” (the type of integration) they cannot describe “when” and “why” (use case). As projects grow and interfaces grow or are re-used, the number of whens and whys increases over time and the entropy (hidden information) around these interfaces grows; this leads to chaos as teams struggle to operate, manage and change them without a proper realisation of the end-to-end picture

As a result, maintaining only a technical Integration Interface registry leads to poor traceability (business capability to technical implementation), increased maintenance cost of interfaces (hard to test for all scenarios) and duplication of effort over time (as change becomes complex, teams rewrite)


Integration Interface Repository

Therefore, without proper context around Integration Interfaces, organisations will struggle to manage and map cross-functional features, leading to slower lead-time, recovery etc over time. We propose that documenting integration use-cases in a business-friendly visual language and relating them to technical interface lists and enterprise capabilities is the key to mastering the chaos

Mastering the chaos: Building a context map

Context is key as it

  1. drives product-centric thinking vs project-based thinking
  2. makes our solution more operable, maintainable and re-useable

In order to provide better context, and do so in a clear, visually-oriented format, we believe documenting integration user-stories as technical process flows is a good start

Consider the following use-case: “As a user, I must be able to search/register/update etc in a system”. Use-cases all start with some activation point – a user, timer or notification – and then involve orchestration of services or choreography of events, resulting in actions within microservices or end-systems that eventually deliver some value through a query or command. We can render such a use-case into a map showing the systems, interfaces and actions in them (activation point, services, orchestrations, value) and do so in a standard manner


For example, we leveraged the Business Process Model and Notation – BPMN 2.0 – standard to map integration technical use-case flows, where we used general concepts like “swim-lanes” for users and systems, “arrows” for interfaces (solid for request-response interfaces, dotted lines for async messages) etc.

The picture below shows this concept along with the “Interface” lines and “Messages” connecting the boxes (actions) between systems. Each interface or message was then linked to the Integration Interface Registry so that it was easy to trace reuse and dependencies

Screen Shot 2019-11-21 at 4.26.04 pm

It is also important that the context picture above stays fairly lean and avoids documenting too much, so it does not become a single giant end-to-end picture with everything on it. It is best to stay within a bounded-context and only refer to a specific use-case such as “User Registration”, “Order submission” or “Customer Management” etc. This has the added advantage of helping teams which speak a ubiquitous language talk to a collection of pictures belonging to their domain, and integration-practitioners to identify a collection of such teams (bounded-contexts)

Building a library and relating it to EA

The journey to improve visibility and maintenance of integration artefacts then involves capturing these integration use-case context maps, storing them in a version-controlled repository, relating them to other technical and business repositories

This collection of context maps would contain similar information to a “high-level enterprise system integration view” but with a greater degree of clarity


This collection can also be linked to the Enterprise Architecture (EA) Repository for full end-to-end traceability of Business Capabilities into Technical Implementations. In fact, the TOGAF framework describes an external Business Architecture repository pattern as part of Solution building blocks (see the TOGAF structural framework)

We imagine the Integration Context Map repository linked to the Enterprise Architecture Repository and the Integration Interface repository as shown below – this would provide immense value to cross-functional teams and business stakeholders, allowing both to see a common picture

Screen Shot 2019-11-19 at 7.54.49 pm

Sequence flows or process flows?

Sequence diagrams can also be used to document technical use-cases with systems and interfaces; however, similar to the Integration Interface list, they tend to be difficult to consume for non-technical users and lack the clarity provided by process maps


As a general rule of thumb we found the following segregation to be useful:

  1. What: Technical process flows for end-to-end visibility, especially useful in complex long-running distributed features.  Sequence diagrams for technical component designs, best for describing how classes or flows/sub-flows (in Mule, for example) interact
  2. Who:  Context maps by Business Analysts (BA) or Architect and Sequence flows by Developers
  3. When: Context maps by Business Analysts (BA) as early as during project Discovery, providing inputs to sizing and visual map of what-is-to-be (sketch?). Sequence flows by Developers, as a task in Development story

Let us talk tools

There are a variety of tools that can help document process context maps in the standard BPMN 2.0 format. The key criterion here is to produce a standard artefact – a BPMN 2.0 diagram – so that it can be managed by standard version-control tools and rendered to documents, team wikis etc. through tools/plugins

Below is a list of tools you can try, we recommend not getting too hung up on tools and instead focus on the practice of documenting integration use-cases

Tools

Recap

  1. As enterprise projects deliver more integrated solutions, it becomes harder to manage and change integration interfaces without proper traceability
  2. Improve traceability of a single end-to-end use-case through a context map
  3. You can use BPMN 2.0 for a standardised notation to do this and use tools to generate these context maps as .bpmn files
  4. You can version control these .bpmn  files and build a collection of context maps
  5. You can link these context maps to Integration Interface registry and Enterprise Business capability registry for increased traceability across the enterprise
  6. There are many tools to help you write the .bpmn files, don’t get hung up on the tools. Start documenting and linking to the interface registry

Conclusion

The context map collection then becomes very useful for enterprise architecture, integration operations, new project teams, testing etc. as a common visual artefact as it relates to the users, systems and interfaces they use 

Enterprise Integration process maps then become a powerful tool over time as they greatly improve visibility across the landscape and help teams navigate a complex eco-system through a contextual and meaningful visual tool; this leads to more open and maintainable integration products, leading to reuse and cost-efficiency

 

Raspberry Pi Setup – Part II: IoT API Platform

Intro

This is the second part in a series of posts on setting up a Raspberry Pi to fully utilize the Software & Hardware functionality of this platform to build interesting internet of things (IOT) applications.

Part-I is here https://techiecook.wordpress.com/2016/10/10/raspberry-pi-ssh-headless-access-from-mac-and-installing-node-processing/ … this covered the following:

  • Enable SSH service on the Pi
  • Connect to Pi without a display or router – headless access via Mac

 

Part II: is this blog and it covers the following goals:

  • Install Node, Express on the Pi
  • Write a simple HTTP service
  • Write an API to access static content ( under some /public folder)
  • Write an API to POST data to the Pi
  • Write an API to read GPIO pin information

So let’s jump in …

 

Install Node.js

Be it a static webserver or a complex API – you need this framework on the Pi (unless you like doing things in Python and not Javascript)

How do I install this on the Pi?

wget https://nodejs.org/download/release/v0.10.0/node-v0.10.0-linux-arm-pi.tar.gz
cd /usr/local
sudo tar xzvf ~/node-v0.10.0-linux-arm-pi.tar.gz --strip=1
node -v 
npm -v

 

Install Express

It is easy to build APIs using the Express framework

How do I install this on the Pi?

npm install express --save

Testing Node + Express

Write your first simple express HTTP API

I modified the code from here http://expressjs.com/en/starter/hello-world.html as shown below to provide a /health endpoint and a JSON response

  • Create a folder called ‘simple-http’
    • Mine is under /projects/node/
  • Initialize your Node Application with
    npm init
  • Install express
    npm install express --save
  • Write the application
    • 2 endpoints – ‘/’ and ‘/health’
      var express = require('express');
      var app = express();
      
      app.get('/', function (req, res) {
       res.send('API Running');
      });
      
      app.get('/health', function (req, res) {
       res.send('{"status":"ok"}');
      });
      
      app.listen(8080, function () {
       console.log('API Running ...');
      });


GET Static Content

  • Create a “public” folder and put some content in it
  • Setup the Node/Express app to use “express.static()”
    var express = require('express');
    var app = express();
    var path = require('path');
    app.use('/static', express.static(path.join('/home/pi/projects' + '/public')));
    app.get('/health', function (req, res) {
      res.send('{"status":"ok"}');
    });
    app.listen(8080, function () {
      console.log('API Running ...');
    });
  • Test by accessing the “/static” endpoint

POST data

Install body-parser

npm install body-parser --save

Use the body-parser

var bodyParser = require('body-parser');
app.use(bodyParser.json()); 
app.use(bodyParser.urlencoded({ extended: true })); 

Write the POST function … in my case I want to be able to do POST /api/readings

app.post('/api/readings', function(req, res) {
    var d = new Date();
    var value = req.body.value;
    var type = req.body.type;
    res.send('{ "status":"updated"}');
    console.log('Reading | '+d.toLocaleDateString()+'|'+d.toLocaleTimeString()+' | value='+value+' | type='+type);
});

Test It!

To test it we can issue a curl command from a computer connected to the same network as the Raspberry Pi….

curl -X POST -H "Content-Type: application/json"  -d '{
 "value":7.5,
 "type":""
}' "http://192.168.3.2:8080/api/readings"


Note: In the example above, I run the command from the terminal and later in Postman from the Mac hosting the Raspberry Pi … remember the IP for my Pi was obtained via the following command

netstat -rn -finet

 

Read GPIO Pin Information

What are GPIO Pins?

The Raspberry Pi comes with input/output pins that let you connect to electronic components and read from them or write to them … these general purpose input output (GPIO) pins vary in number based on the model of your Pi – A, B, B+ etc

See  http://pinout.xyz/ for how pins are labelled …


Our Goal: API for accessing GPIO pins

Once we know which pins we want to read/write to, we need to be able to access this from our application … in our example this would be the Node.js application we are writing.

So we would like to be able to do something like

  • GET /api/gpio/pins and list all the pin details (id, type etc)
  • GET /api/gpio/pins/{pinId}  and get the value of a pin
  • PUT /api/gpio/pins/{pinId} and set it to high/low or on/off

Setup Javascript Libraries for GPIO access

  1. Install “gpio-admin” so that you do not have to run your Node application as root to read / write to the gpio pins (Read https://github.com/rakeshpai/pi-gpio)
    git clone git://github.com/quick2wire/quick2wire-gpio-admin.git
    cd quick2wire-gpio-admin
    make
    sudo make install
    sudo adduser $USER gpio
  2. Install the npm gpio package (Read https://www.npmjs.com/package/rpi-gpio )
    npm install rpi-gpio
  3. Use this in your application
    1. Watch out: The gpio operations are async … this means that capturing the ‘value’ of a pin, for instance, cannot be done like ‘var value = readInput()’; in Node.js we need to use “Promises” (Read https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise – Thanks Alvonn!)
    2. “ReferenceError: Promise is not defined”: It appears the version of Node for the Pi (v0.10.0) does not have this library … ouch! Luckily others have hit this problem with Node at some point and have posted the following solution which worked …
      1. Install the “es6-promise” library manually
        npm install es6-promise@3.1.2
      2.  Declare Promise as the following in your application

        var Promise = require('es6-promise').Promise;
    3. Finally make the GPIO call
      1. Declare a ‘gpio’ variable and setup (attach to a pin)
        var gpio = require('rpi-gpio');
        gpio.setup(7, gpio.DIR_IN, readInput);
        function readInput() {
           gpio.read(7, function handleGPIORead(err, value) {
             console.log('Read pin 7, value=' + value);
           });
        }
      2. Read from it Async in a HTTP GET /api/gpio/pins/{pinId} call
        var Promise = require('es6-promise').Promise;

        app.get('/api/gpio/pins/:pinId', function (req, res) {
          var pinId = req.params.pinId;

          // Wrap the async gpio.read in a Promise so we can respond once the value arrives.
          // Note: this demo reads pin 7 regardless of the pinId in the URL, since only pin 7 was set up above.
          var p1 = new Promise(function (resolve, reject) {
            gpio.read(7, function handleGPIORead(err, value) {
              if (err) { return reject(err); }
              console.log('GET /api/gpio/pins/:pinId | Pin 7 | value=' + value);
              resolve(value);
            });
          });

          p1.then(function (val) {
            console.log('Fulfillment value from promise is ' + val);
            res.send('{"pinId": ' + pinId + ', "value": ' + val + '}');
          }, function (err) {
            console.log('Result from promise is', err);
            res.send('{"pinId": ' + pinId + ', "value": "Error"}');
          });
        });
      3. Output

Show me the code

You can find the code for this project here:

git@bitbucket.org:arshimkola/raspi-gpio-demo-app.git

 

Summary

So by now we should have a platform primed to be used as an IoT device with a simple API running. We could, of course, have done all of the above in Python.

Also, we could have written a scheduled service and “pushed” data from the Pi to some public API … in the next part of the series, we will talk about organising IoT sensors in a distributed network

 

 

 

References

[1] https://scotch.io/tutorials/use-expressjs-to-get-url-and-post-parameters

[2] https://www.npmjs.com/package/rpi-gpio

[3] https://github.com/rakeshpai/pi-gpio

[4] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise

 

 

 

Raspberry Pi Setup – Part I : Boot, SSH and Headless access

Intro

Part I : Documenting the workflow/steps to setup my Raspberry Pi (2 B) – from Installing Raspbian to installing Node, Processing etc

Goals:

  • Enable SSH service on the Pi
  • Connect to Pi without a display or router – headless access via Mac
Raspberry Pi with GPIO Ribbon cable

Steps

  1. Install Raspbian
  2. Boot your Pi
  3. Setup SSH
  4. Connect to Mac
  5. Troubleshoot Connection to Mac
  6. Install Node
  7. Install Processing

Install Raspbian

You can download it here https://www.raspberrypi.org/downloads/raspbian/


Boot Pi

Use the Raspbian image to boot up the Pi and you should see a command shell (once you have connected your Pi to a display of course!)

Setup SSH

Start by typing the following command in your shell

sudo raspi-config


This launches the Config Tool UI, follow the screenshots below

Screen Shot 2016-10-10 at 1.16.15 PM.png

Screen Shot 2016-10-10 at 1.19.27 PM.png

Screen Shot 2016-10-10 at 1.16.26 PM.png

Test by SSH’ing into the pi from the pi

ssh pi@localhost


Copy Network Card’s Address

Do an “ifconfig” in the Pi’s shell and note down the “HWaddr” … we will use this later to search for the IP assigned by the Mac

ifconfig


Connect to Mac Headless

Next we will connect the Pi to the Mac in “headless” mode, using the Mac to assign an IP to the Pi and then SSHing into the Pi from a terminal on the Mac …

  1. Ensure your Mac is connected to the network (wifi)
  2. On the Mac go to System Preferences -> Sharing
  3. Connect Pi to Mac via Ethernet Cable and then power it up
  4. Wait until the “green lights” on the Pi have stopped blinking
  5. Check connection using the following command
    1. See Routing Table with the following command
      netstat -rn -finet
    2. Your Pi’s IP should be in the 1st column, next to the “HWaddr” in the 2nd column
    3. If you do not see your “HWaddr” listed then there is some connection issue … go to “Troubleshoot Connection to Mac” in the next section
    4. Once you can see your IP, set up X11 forwarding with XQuartz
    5. Finally SSH in with the following command
      ssh -X pi@192.168.3.2
      1. If prompted for a password enter “raspberry” (which is your default password and you should change it at some point)
      2. You can setup passwordless login using the tutorial here https://www.raspberrypi.org/documentation/remote-access/ssh/passwordless.md
    6. … finally this is what it looks like

 

Troubleshoot Connection to Mac

If you are unable to see an IP assigned to your Pi or unable to Ping it or SSH then there could be several reasons for this

  • Check your ethernet cable – I was using a bad cable when connecting to the Mac and a good one when connecting to the router (when testing standalone), this cost me a lot of time!
  • Check your Mac Settings
    • I did not change the System Preferences -> Network -> Ethernet
    • The Mac by default assigns the 192.168.2.2 IP to your Pi; this may not get used if your Pi has a “static” IP setup … validate by doing the following
      • cat /etc/network/interfaces
      • ensure you use “dhcp” instead of “static”
  • Check your Pi
    • while hardware issues are rare, you could have a problem there. Compare with another Pi
    • is it powered on? are the ethernet lights on when you connect the cable?

Software Install

… after you are done setting up the SSH / headless pi access, you can finally start building a usable platform

see Part II of the journey to build services over the Pi to send out and read in data!

 

 

Complex Form Evaluation with Drools

Introduction

Complex business rules are best implemented using a ‘Rules Engine’. Drools is an open source Business Rules Management Product. See here


 

In this blog we will cover a few basics of using the Drools rule engine, specifically using a Domain Specific Language (DSL), which is a more user-focused language. The blog comes with a demo project which can be downloaded and used along with this document.

Demo Use Case

Our demo use case will cover evaluating an ‘Application Form’ with multiple ‘Sections’

Each form section has a ‘rule’ which the current form evaluators (a manual task) use to evaluate the ‘Questions’ in the form. Each form question has one or more ‘options’ selected.

For example:

 Form
   - Section1
        - Question1
            - Option1
        - Question2
            - OptionA,OptionB
   - Section2
        - Question1
            - OptionX,OptionY

Now let us assume a use case with a few simple questions and conditions associated with a particular form, for example, a ‘weekend work approval’ form. We can ask a few simple questions

Form:

  • Section1:
    • Question1: “Is this necessary work”
      • options: [Yes, No]
      • rule: “Approved if Yes is selected”
  • Section2:
    • Question1: “When is this work to be done”
      • options: [Weekend Work, Regular time]
      • rule: “Manager approval required if this is weekend work”
  • Section3:
    • Question1: “Is this an emergency”
      • options: [Non-Emergency, Emergency]
      • rule: “If this is an emergency fix then it is approved”

As you can see, in our sample use case we have only one question per section but there can be more.


Code Repository

You can download the source code from here using

git@bitbucket.org:arshimkola/drools-forms-demo.git

Execution Instructions

Run the form.demo.rules.RulesExecutor Java main class to run the demo

First Steps – a simple condition

A simple rule is implemented in a file called rule.dslr

package form.demo.rules;
import form.demo.rules.facts.*
import function form.demo.rules.RulesLogger.log

expander rule.dsl


// ----------------------------------------------------
// Rule #1
// ----------------------------------------------------
rule "Section1 Rule1.1"
when 
  Form has a section called "Section1" 
 then
  Section outcome is "No Further Review Required" 
end

The DSLR file imports facts from the package form.demo.rules.facts. There is a function called log defined in the form.demo.rules.RulesLogger class. The expander declaration tells the engine to use rule.dsl to translate the DSLR statements into executable rules

The DSL for the rule is in the rule.dsl file

#---------------------------------------------------------------------------------------
#  Rule DSL
#---------------------------------------------------------------------------------------
[when]Form has a section called {name}=$form:FormFact(getSection({name}) != null)
[when] And = and
[when] OR = or
[then]Section outcome is {outcome}=$form.getSection({name}).setOutcome({outcome});log(drools,"Section:"+{name}+", Outcome:"+{outcome}+", Rule Applied:"+ drools.getRule().getName() );

When executed against a set of facts (the code is followed by the resulting log output):

  FormFact formWithFacts = new FormFact();
  formWithFacts.addSection("Section1", "Question1", "Yes");
  FormAssessmentInfo assessmentInfo = new RulesExecutor().eval(formWithFacts);
Dec 01, 2015 3:49:09 PM form.demo.rules.RulesLogger log
INFO: Rule:"Section1 Rule1.1", Matched --> [ Section:Section1, Outcome:No Further Review Required, Rule Applied:Section1 Rule1.1]
Not Evaluated
     Section1->No Further Review Required

Adding a second condition

– If some option is selected in a section then set the outcome to a value

// ----------------------------------------------------
// Rule #1
// ----------------------------------------------------
rule "Section1 Rule1.1"
when 
  Form has a section called "Section1"
  -"Yes" is ticked in "Question1"
then
  Section outcome is "No Further Review Required" 
end
#---------------------------------------------------------------------------------------
#  Rule DSL
#---------------------------------------------------------------------------------------
[when]Form has a section called {name}=$form:FormFact(getSection({name}) != null)
[when]-{option} is ticked in {question}=eval($form.getSection({name}).has({question},{option}))
[when] And = and
[when] OR = or
[then][Form]Section outcome is {outcome}=$form.getSection({name}).setOutcome({outcome});log(drools,"Section:"+{name}+", Outcome:"+{outcome}+", Rule Applied:"+ drools.getRule().getName() );

Adding Global Rules

  • If a section acts as a global flag (for example: Emergency Approval) then ignore all outcomes and select this
  • If there is no global flag then if any of the sections have outcome ‘foo’ then set the form outcome to ‘bar’ otherwise set the form outcome to ‘baz’

In the Rule DSL we add the following, notice how a new instance of the FormFact is created – this time without matching a section name

[when]The Form=$form:FormFact()
[when]-has a section with outcome {outcome}=eval($form.hasSectionWithOutcome({outcome}))
[when]-has no section with outcome {outcome}=eval($form.hasSectionWithOutcome({outcome}) == false)
[then]Form outcome is {outcome}=$form.setOutcome({outcome});log(drools,"Form Outcome Set to "+{outcome});

In the DSLR we implement a few global rules

// ----------------------------------------------------
// Global Rule #1
// ----------------------------------------------------
rule "Global Rule1.1"
when 
  The Form 
  -has a section with outcome "Emergency Work"
then
  Form outcome is "Approved" 
end 


// ----------------------------------------------------
// Global Rule #2.1
// ----------------------------------------------------
rule "Global Rule2.1"
when 
  The Form 
  -has no section with outcome "Emergency Work"
  -has a section with outcome "Manager Review Required"
then
  Form outcome is "Manager Review Required" 
end 

// ----------------------------------------------------
// Global Rule #2.2
// ----------------------------------------------------
rule "Global Rule2.2"
when 
  The Form 
  -has no section with outcome "Emergency Work"
  -has no section with outcome "Manager Review Required"
then
  Form outcome is "Manager Review Required" 
end

Why is this not an API contract?

Why is this … my Swagger UI, generated from code not a contract? It describes my service, therefore it must be a Service Provider Contract. No? 

This was a common theme for a few of our clients with mobile/web teams as consumers of enterprise services.  Service providers generated contracts, and would sometimes create a contract in the API portal as well.

Service consumers would then read the contract from the API portal and consume the service. What’s the problem then? …

SwaggerGenWhatIsIt.png

… the problem is that the red box, i.e. the Contract, is generated after the service is implemented and not vice-versa.

Why is this a problem then? It is a problem because the contract is forced upon the consumer and, worse, there are 2 versions of this document.

So what? Well as you can imagine, changes to the service implementation over time will generate the provider contract (red box), while consumers continue to read the out-of-sync contract.

So? A contract is an agreement between the two parties – consumer and provider. In the above use-case, though, this is not the case.

Key Points:

  • A generated Swagger UI is documentation, not a contract
  • A contract is a collaborative effort between Providers and Consumers
  • A product (API Gateway) cannot solve this problem, it is cultural
  • The above process will create 3 layers of responsibility – Service provider, Service consumer and middleware provider
  • The 3 layers of responsibility make it harder to test APIs

Side note: I believe this was a big problem with SOA – the “Enterprise Business Service (EBS)” was owned by the middleware team and “Application Business Services (ABS)” was owned by the services teams.

The fix?

Collaborative contracts that help define what a service should do!

This contract is used by Consumers to build their client-code and more importantly the providers use the contract to build the service and test it!

Contract.png
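As a sketch of “both sides test against the contract”: consumer and provider agree on the expected shape up front, and the provider runs a check like the one below against its own implementation before releasing. The endpoint and fields are hypothetical, a consumer-driven contract testing tool would typically do this job in practice, and the built-in fetch assumes Node 18+:

// Minimal provider-side contract check (sketch)
// Agreed contract: GET /customers/{id} returns 200 with JSON { id, name, status }
const assert = require('assert');

async function checkContract(baseUrl) {
  const res = await fetch(`${baseUrl}/customers/42`);
  assert.strictEqual(res.status, 200, 'contract expects 200 for an existing customer');
  assert.ok((res.headers.get('content-type') || '').includes('application/json'), 'contract expects JSON');

  const body = await res.json();
  // Assert the agreed fields exist with the agreed types
  assert.strictEqual(typeof body.id, 'number');
  assert.strictEqual(typeof body.name, 'string');
  assert.ok(['ACTIVE', 'INACTIVE'].includes(body.status), 'status must be one of the agreed values');
  console.log('Provider satisfies the agreed contract');
}

checkContract(process.env.BASE_URL || 'http://localhost:8080').catch((err) => {
  console.error('Contract broken:', err.message);
  process.exit(1);
});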

 

 

Lessons from API integration

A general transition has been happening with the nature of work that I do in the integration space … we have been doing less SOAP/XML/RPC web services and more RESTful APIs for “digital enablement” of enterprise services. This brought a paradigm shift and valuable lessons were learnt (rightly or wrongly) … and of course the process of learning and comparing never stops!

Below are my observations …

It is not about SOAP vs REST … it is about products that we integrate, the processes we use and the stories we tell as we use one architectural style vs the other

  1. Goals: Both architectural styles work to achieve the same goal – integration. The key differences lie in where they are used. SOAP/XML has seen some b2b but adoption by web/mobile clients is low
  2. Scale: SOAP/XML is good but will always remain enterprise scale … REST/JSON is webscale. What do I mean by that? REST over HTTP with JSON feels easier to communicate, understand and implement.
  3. Protocol: The HTTP protocol is used by REST better than SOAP, so the former wins in making it easy to adopt your service
  4. Products: The big monolith integration product vendors are now selling “API Gateways” (similar to a “SOA product suite”) … The gateway should be a lightweight policy layer, IMHO, and traditional vendors like to sell “app server” licenses which will blow up an API gateway product (buyer beware!)
  5. Contract: RAML/YAML is a lot easier to read than a WSDL ….which is why contract first works better when doing REST/JSON
  6. Process: The biggest paradigm change we have experienced is doing Contract Driven Development by writing the API definition in RAML / YAML … Compare this to generating WSDL or RAML from services. Contract driven, as I am learning, is much more collaborative!
  7. Schemas: XML schemas were great until they started describing restrictions … REST with JSON schema is good but may be repeating similar problems at Web scale
  8. Security: I have used OAuth for APIs and watched them enable token authentication and authorization easily with clients outside the enterprise – we are replacing traditional web session ids with OAuth tokens to enable faster integration with 3rd party clients … all this through simple configuration in an API gateway, compared to some of the struggle with SAML setup within an organisation with the heavy monolithic SOA and security products!

It would be easy to then conclude that we ought to rip out all our existing SOAP/XML integrations and replace them with REST APIs, no? Well not quite … as always, “horses for courses”.

Enterprise grade integration may require features currently missing in REST/JSON (WS-* , RPC) and legacy systems may not be equipped to do non XML integration

My goal was to show you my experience in integrating systems using APIs with contract-driven development and gateways with policies, versus traditional web service development using SOA products … hope to hear about your experience, how do you like APIs?

Camunda BPM – Manual Retry

The Camunda BPM is a lightweight, opensource BPM platform (See here: https://camunda.com/bpm/features).

The “Cockpit” application within Camunda is the admin dashboard where deployed processes can be viewed at a glance and details of running processes are displayed by their process instance ids. Clicking on a process instance id reveals runtime details while the process is running – process variables, task variables etc. If there is an error thrown from any of the services in a flow, the Cockpit application allows “retrying” the service manually after the coded automatic retries have completed.

The steps below show how to retry a failed process from the Cockpit administrative console

Manual Retry

From the Process Detail Screen select a failed process (click on the GUID link)

Then on the Right Hand side you should see under “Runtime” a “semi-circle with arrow” indicating “recycle” or “retry” – click on this

This launches a pop-up with check-boxes with IDs for the failed service tasks and option to replay them. Notice these are only tasks for “this” instance of the “flow”

After “Retry” is clicked, another pop-up indicates the status of the action (as in “was the engine able to process the retry request”; the actual services may have failed again)